1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566
|
2020/07/07 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
------------------------------------------------------------
A number of contributors are often embarrassed with coding style issues, they
don't always know if they're doing it right, especially since the coding style
has elvoved along the years. What is explained here is not necessarily what is
applied in the code, but new code should as much as possible conform to this
style. Coding style fixes happen when code is replaced. It is useless to send
patches to fix coding style only, they will be rejected, unless they belong to
a patch series which needs these fixes prior to get code changes. Also, please
avoid fixing coding style in the same patches as functional changes, they make
code review harder.
A good way to quickly validate your patch before submitting it is to pass it
through the Linux kernel's checkpatch.pl utility which can be downloaded here :
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/scripts/checkpatch.pl
Running it with the following options relaxes its checks to accommodate to the
extra degree of freedom that is tolerated in HAProxy's coding style compared to
the stricter style used in the kernel :
checkpatch.pl -q --max-line-length=160 --no-tree --no-signoff \
--ignore=LEADING_SPACE,CODE_INDENT,DEEP_INDENTATION \
--ignore=ELSE_AFTER_BRACE < patch
You can take its output as hints instead of strict rules, but in general its
output will be accurate and it may even spot some real bugs.
When modifying a file, you must accept the terms of the license of this file
which is recalled at the top of the file, or is explained in the LICENSE file,
or if not stated, defaults to LGPL version 2.1 or later for files in the
'include' directory, and GPL version 2 or later for all other files.
When adding a new file, you must add a copyright banner at the top of the
file with your real name, e-mail address and a reminder of the license.
Contributions under incompatible licenses or too restrictive licenses might
get rejected. If in doubt, please apply the principle above for existing files.
All code examples below will intentionally be prefixed with " | " to mark
where the code aligns with the first column, and tabs in this document will be
represented as a series of 8 spaces so that it displays the same everywhere.
1) Indentation and alignment
----------------------------
1.1) Indentation
----------------
Indentation and alignment are two completely different things that people often
get wrong. Indentation is used to mark a sub-level in the code. A sub-level
means that a block is executed in the context of another block (eg: a function
or a condition) :
| main(int argc, char **argv)
| {
| int i;
|
| if (argc < 2)
| exit(1);
| }
In the example above, the code belongs to the main() function and the exit()
call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
which allows any developer to configure their preferred editor to use their
own tab size and to still get the text properly indented. Exactly one tab is
used per sub-level. Tabs may only appear at the beginning of a line or after
another tab. It is illegal to put a tab after some text, as it mangles displays
in a different manner for different users (particularly when used to align
comments or values after a #define). If you're tempted to put a tab after some
text, then you're doing it wrong and you need alignment instead (see below).
Note that there are places where the code was not properly indented in the
past. In order to view it correctly, you may have to set your tab size to 8
characters.
1.2) Alignment
--------------
Alignment is used to continue a line in a way to makes things easier to group
together. By definition, alignment is character-based, so it uses spaces. Tabs
would not work because for one tab there would not be as many characters on all
displays. For instance, the arguments in a function declaration may be broken
into multiple lines using alignment spaces :
| int http_header_match2(const char *hdr, const char *end,
| const char *name, int len)
| {
| ...
| }
In this example, the "const char *name" part is aligned with the first
character of the group it belongs to (list of function arguments). Placing it
here makes it obvious that it's one of the function's arguments. Multiple lines
are easy to handle this way. This is very common with long conditions too :
| if ((len < eol - sol) &&
| (sol[len] == ':') &&
| (strncasecmp(sol, name, len) == 0)) {
| ctx->del = len;
| }
If we take again the example above marking tabs with "[-Tabs-]" and spaces
with "#", we get this :
| [-Tabs-]if ((len < eol - sol) &&
| [-Tabs-]####(sol[len] == ':') &&
| [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
| [-Tabs-][-Tabs-]ctx->del = len;
| [-Tabs-]}
It is worth noting that some editors tend to confuse indentations and alignment.
Emacs is notoriously known for this brokenness, and is responsible for almost
all of the alignment mess. The reason is that Emacs only counts spaces, tries
to fill as many as possible with tabs and completes with spaces. Once you know
it, you just have to be careful, as alignment is not used much, so generally it
is just a matter of replacing the last tab with 8 spaces when this happens.
Indentation should be used everywhere there is a block or an opening brace. It
is not possible to have two consecutive closing braces on the same column, it
means that the innermost was not indented.
Right :
| main(int argc, char **argv)
| {
| if (argc > 1) {
| printf("Hello\n");
| }
| exit(0);
| }
Wrong :
| main(int argc, char **argv)
| {
| if (argc > 1) {
| printf("Hello\n");
| }
| exit(0);
| }
A special case applies to switch/case statements. Due to my editor's settings,
I've been used to align "case" with "switch" and to find it somewhat logical
since each of the "case" statements opens a sublevel belonging to the "switch"
statement. But indenting "case" after "switch" is accepted too. However in any
case, whatever follows the "case" statement must be indented, whether or not it
contains braces :
| switch (*arg) {
| case 'A': {
| int i;
| for (i = 0; i < 10; i++)
| printf("Please stop pressing 'A'!\n");
| break;
| }
| case 'B':
| printf("You pressed 'B'\n");
| break;
| case 'C':
| case 'D':
| printf("You pressed 'C' or 'D'\n");
| break;
| default:
| printf("I don't know what you pressed\n");
| }
2) Braces
---------
Braces are used to delimit multiple-instruction blocks. In general it is
preferred to avoid braces around single-instruction blocks as it reduces the
number of lines :
Right :
| if (argc >= 2)
| exit(0);
Wrong :
| if (argc >= 2) {
| exit(0);
| }
But it is not that strict, it really depends on the context. It happens from
time to time that single-instruction blocks are enclosed within braces because
it makes the code more symmetrical, or more readable. Example :
| if (argc < 2) {
| printf("Missing argument\n");
| exit(1);
| } else {
| exit(0);
| }
Braces are always needed to declare a function. A function's opening brace must
be placed at the beginning of the next line :
Right :
| int main(int argc, char **argv)
| {
| exit(0);
| }
Wrong :
| int main(int argc, char **argv) {
| exit(0);
| }
Note that a large portion of the code still does not conforms to this rule, as
it took years to get all authors to adapt to this more common standard which
is now preferred, as it avoids visual confusion when function declarations are
broken on multiple lines :
Right :
| int foo(const char *hdr, const char *end,
| const char *name, const char *err,
| int len)
| {
| int i;
Wrong :
| int foo(const char *hdr, const char *end,
| const char *name, const char *err,
| int len) {
| int i;
Braces should always be used where there might be an ambiguity with the code
later. The most common example is the stacked "if" statement where an "else"
may be added later at the wrong place breaking the code, but it also happens
with comments or long arguments in function calls. In general, if a block is
more than one line long, it should use braces.
Dangerous code waiting of a victim :
| if (argc < 2)
| /* ret must not be negative here */
| if (ret < 0)
| return -1;
Wrong change :
| if (argc < 2)
| /* ret must not be negative here */
| if (ret < 0)
| return -1;
| else
| return 0;
It will do this instead of what your eye seems to tell you :
| if (argc < 2)
| /* ret must not be negative here */
| if (ret < 0)
| return -1;
| else
| return 0;
Right :
| if (argc < 2) {
| /* ret must not be negative here */
| if (ret < 0)
| return -1;
| }
| else
| return 0;
Similarly dangerous example :
| if (ret < 0)
| /* ret must not be negative here */
| complain();
| init();
Wrong change to silent the annoying message :
| if (ret < 0)
| /* ret must not be negative here */
| //complain();
| init();
... which in fact means :
| if (ret < 0)
| init();
3) Breaking lines
-----------------
There is no strict rule for line breaking. Some files try to stick to the 80
column limit, but given that various people use various tab sizes, it does not
make much sense. Also, code is sometimes easier to read with less lines, as it
represents less surface on the screen (since each new line adds its tabs and
spaces). The rule is to stick to the average line length of other lines. If you
are working in a file which fits in 80 columns, try to keep this goal in mind.
If you're in a function with 120-chars lines, there is no reason to add many
short lines, so you can make longer lines.
In general, opening a new block should lead to a new line. Similarly, multiple
instructions should be avoided on the same line. But some constructs make it
more readable when those are perfectly aligned :
A copy-paste bug in the following construct will be easier to spot :
| if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
| if (idiv % omult == 0) { idiv /= omult; omult = 1; }
| if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
| if (odiv % imult == 0) { odiv /= imult; imult = 1; }
than in this one :
| if (omult % idiv == 0) {
| omult /= idiv;
| idiv = 1;
| }
| if (idiv % omult == 0) {
| idiv /= omult;
| omult = 1;
| }
| if (imult % odiv == 0) {
| imult /= odiv;
| odiv = 1;
| }
| if (odiv % imult == 0) {
| odiv /= imult;
| imult = 1;
| }
What is important is not to mix styles. For instance there is nothing wrong
with having many one-line "case" statements as long as most of them are this
short like below :
| switch (*arg) {
| case 'A': ret = 1; break;
| case 'B': ret = 2; break;
| case 'C': ret = 4; break;
| case 'D': ret = 8; break;
| default : ret = 0; break;
| }
Otherwise, prefer to have the "case" statement on its own line as in the
example in section 1.2 about alignment. In any case, avoid to stack multiple
control statements on the same line, so that it will never be the needed to
add two tab levels at once :
Right :
| switch (*arg) {
| case 'A':
| if (ret < 0)
| ret = 1;
| break;
| default : ret = 0; break;
| }
Wrong :
| switch (*arg) {
| case 'A': if (ret < 0)
| ret = 1;
| break;
| default : ret = 0; break;
| }
Right :
| if (argc < 2)
| if (ret < 0)
| return -1;
or Right :
| if (argc < 2)
| if (ret < 0) return -1;
but Wrong :
| if (argc < 2) if (ret < 0) return -1;
When complex conditions or expressions are broken into multiple lines, please
do ensure that alignment is perfectly appropriate, and group all main operators
on the same side (which you're free to choose as long as it does not change for
every block. Putting binary operators on the right side is preferred as it does
not mangle with alignment but various people have their preferences.
Right :
| if ((txn->flags & TX_NOT_FIRST) &&
| ((req->flags & BF_FULL) ||
| req->r < req->lr ||
| req->r > req->data + req->size - global.tune.maxrewrite)) {
| return 0;
| }
Right :
| if ((txn->flags & TX_NOT_FIRST)
| && ((req->flags & BF_FULL)
| || req->r < req->lr
| || req->r > req->data + req->size - global.tune.maxrewrite)) {
| return 0;
| }
Wrong :
| if ((txn->flags & TX_NOT_FIRST) &&
| ((req->flags & BF_FULL) ||
| req->r < req->lr
| || req->r > req->data + req->size - global.tune.maxrewrite)) {
| return 0;
| }
If it makes the result more readable, parenthesis may even be closed on their
own line in order to align with the opening one. Note that should normally not
be needed because such code would be too complex to be digged into.
The "else" statement may either be merged with the closing "if" brace or lie on
its own line. The later is preferred but it adds one extra line to each control
block which is annoying in short ones. However, if the "else" is followed by an
"if", then it should really be on its own line and the rest of the if/else
blocks must follow the same style.
Right :
| if (a < b) {
| return a;
| }
| else {
| return b;
| }
Right :
| if (a < b) {
| return a;
| } else {
| return b;
| }
Right :
| if (a < b) {
| return a;
| }
| else if (a != b) {
| return b;
| }
| else {
| return 0;
| }
Wrong :
| if (a < b) {
| return a;
| } else if (a != b) {
| return b;
| } else {
| return 0;
| }
Wrong :
| if (a < b) {
| return a;
| }
| else if (a != b) {
| return b;
| } else {
| return 0;
| }
4) Spacing
----------
Correctly spacing code is very important. When you have to spot a bug at 3am,
you need it to be clear. When you expect other people to review your code, you
want it to be clear and don't want them to get nervous when trying to find what
you did.
Always place spaces around all binary or ternary operators, commas, as well as
after semi-colons and opening braces if the line continues :
Right :
| int ret = 0;
| /* if (x >> 4) { x >>= 4; ret += 4; } */
| ret += (x >> 4) ? (x >>= 4, 4) : 0;
| val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
Wrong :
| int ret=0;
| /* if (x>>4) {x>>=4;ret+=4;} */
| ret+=(x>>4)?(x>>=4,4):0;
| val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
they might be confused with they binary counterpart, nor before commas or
semicolons :
Right :
| bit = !!(~len++ ^ -(unsigned char)*x);
Wrong :
| bit = ! ! (~len++ ^ - (unsigned char) * x) ;
Note that "sizeof" is a unary operator which is sometimes considered as a
language keyword, but in no case it is a function. It does not require
parenthesis so it is sometimes followed by spaces and sometimes not when
there are no parenthesis. Most people do not really care as long as what
is written is unambiguous.
Braces opening a block must be preceded by one space unless the brace is
placed on the first column :
Right :
| if (argc < 2) {
| }
Wrong :
| if (argc < 2){
| }
Do not add unneeded spaces inside parenthesis, they just make the code less
readable.
Right :
| if (x < 4 && (!y || !z))
| break;
Wrong :
| if ( x < 4 && ( !y || !z ) )
| break;
Language keywords must all be followed by a space. This is true for control
statements (do, for, while, if, else, return, switch, case), and for types
(int, char, unsigned). As an exception, the last type in a cast does not take
a space before the closing parenthesis). The "default" statement in a "switch"
construct is generally just followed by the colon. However the colon after a
"case" or "default" statement must be followed by a space.
Right :
| if (nbargs < 2) {
| printf("Missing arg at %c\n", *(char *)ptr);
| for (i = 0; i < 10; i++) beep();
| return 0;
| }
| switch (*arg) {
Wrong :
| if(nbargs < 2){
| printf("Missing arg at %c\n", *(char*)ptr);
| for(i = 0; i < 10; i++)beep();
| return 0;
| }
| switch(*arg) {
Function calls are different, the opening parenthesis is always coupled to the
function name without any space. But spaces are still needed after commas :
Right :
| if (!init(argc, argv))
| exit(1);
Wrong :
| if (!init (argc,argv))
| exit(1);
5) Excess or lack of parenthesis
--------------------------------
Sometimes there are too many parenthesis in some formulas, sometimes there are
too few. There are a few rules of thumb for this. The first one is to respect
the compiler's advice. If it emits a warning and asks for more parenthesis to
avoid confusion, follow the advice at least to shut the warning. For instance,
the code below is quite ambiguous due to its alignment :
| if (var1 < 2 || var2 < 2 &&
| var3 != var4) {
| /* fail */
| return -3;
| }
Note that this code does :
| if (var1 < 2 || (var2 < 2 && var3 != var4)) {
| /* fail */
| return -3;
| }
But maybe the author meant :
| if ((var1 < 2 || var2 < 2) && var3 != var4) {
| /* fail */
| return -3;
| }
A second rule to put parenthesis is that people don't always know operators
precedence too well. Most often they have no issue with operators of the same
category (eg: booleans, integers, bit manipulation, assignment) but once these
operators are mixed, it causes them all sort of issues. In this case, it is
wise to use parenthesis to avoid errors. One common error concerns the bit
shift operators because they're used to replace multiplies and divides but
don't have the same precedence :
The expression :
| x = y * 16 + 5;
becomes :
| x = y << 4 + 5;
which is wrong because it is equivalent to :
| x = y << (4 + 5);
while the following was desired instead :
| x = (y << 4) + 5;
It is generally fine to write boolean expressions based on comparisons without
any parenthesis. But on top of that, integer expressions and assignments should
then be protected. For instance, there is an error in the expression below
which should be safely rewritten :
Wrong :
| if (var1 > 2 && var1 < 10 ||
| var1 > 2 + 256 && var2 < 10 + 256 ||
| var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
| return 1;
Right (may remove a few parenthesis depending on taste) :
| if ((var1 > 2 && var1 < 10) ||
| (var1 > (2 + 256) && var2 < (10 + 256)) ||
| (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
| return 1;
The "return" statement is not a function, so it takes no argument. It is a
control statement which is followed by the expression to be returned. It does
not need to be followed by parenthesis :
Wrong :
| int ret0()
| {
| return(0);
| }
Right :
| int ret0()
| {
| return 0;
| }
Parenthesisis are also found in type casts. Type casting should be avoided as
much as possible, especially when it concerns pointer types. Casting a pointer
disables the compiler's type checking and is the best way to get caught doing
wrong things with data not the size you expect. If you need to manipulate
multiple data types, you can use a union instead. If the union is really not
convenient and casts are easier, then try to isolate them as much as possible,
for instance when initializing function arguments or in another function. Not
proceeding this way causes huge risks of not using the proper pointer without
any notification, which is especially true during copy-pastes.
Wrong :
| void *check_private_data(void *arg1, void *arg2)
| {
| char *area;
|
| if (*(int *)arg1 > 1000)
| return NULL;
| if (memcmp(*(const char *)arg2, "send(", 5) != 0))
| return NULL;
| area = malloc(*(int *)arg1);
| if (!area)
| return NULL;
| memcpy(area, *(const char *)arg2 + 5, *(int *)arg1);
| return area;
| }
Right :
| void *check_private_data(void *arg1, void *arg2)
| {
| char *area;
| int len = *(int *)arg1;
| const char *msg = arg2;
|
| if (len > 1000)
| return NULL;
| if (memcmp(msg, "send(", 5) != 0)
| return NULL;
| area = malloc(len);
| if (!area)
| return NULL;
| memcpy(area, msg + 5, len);
| return area;
| }
6) Ambiguous comparisons with zero or NULL
------------------------------------------
In C, '0' has no type, or it has the type of the variable it is assigned to.
Comparing a variable or a return value with zero means comparing with the
representation of zero for this variable's type. For a boolean, zero is false.
For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
use the '!' unary operator to compare with zero, as it is shorter and easier to
remind or understand than a plain '0'. Since the '!' operator is read "not", it
helps read code faster when what follows it makes sense as a boolean, and it is
often much more appropriate than a comparison with zero which makes an equal
sign appear at an undesirable place. For instance :
| if (!isdigit(*c) && !isspace(*c))
| break;
is easier to understand than :
| if (isdigit(*c) == 0 && isspace(*c) == 0)
| break;
For a char this "not" operator can be reminded as "no remaining char", and the
absence of comparison to zero implies existence of the tested entity, hence the
simple strcpy() implementation below which automatically stops once the last
zero is copied :
| void my_strcpy(char *d, const char *s)
| {
| while ((*d++ = *s++));
| }
Note the double parenthesis in order to avoid the compiler telling us it looks
like an equality test.
For a string or more generally any pointer, this test may be understood as an
existence test or a validity test, as the only pointer which will fail to
validate equality is the NULL pointer :
| area = malloc(1000);
| if (!area)
| return -1;
However sometimes it can fool the reader. For instance, strcmp() precisely is
one of such functions whose return value can make one think the opposite due to
its name which may be understood as "if strings compare...". Thus it is strongly
recommended to perform an explicit comparison with zero in such a case, and it
makes sense considering that the comparison's operator is the same that is
wanted to compare the strings (note that current config parser lacks a lot in
this regards) :
strcmp(a, b) == 0 <=> a == b
strcmp(a, b) != 0 <=> a != b
strcmp(a, b) < 0 <=> a < b
strcmp(a, b) > 0 <=> a > b
Avoid this :
| if (strcmp(arg, "test"))
| printf("this is not a test\n");
|
| if (!strcmp(arg, "test"))
| printf("this is a test\n");
Prefer this :
| if (strcmp(arg, "test") != 0)
| printf("this is not a test\n");
|
| if (strcmp(arg, "test") == 0)
| printf("this is a test\n");
7) System call returns
----------------------
This is not directly a matter of coding style but more of bad habits. It is
important to check for the correct value upon return of syscalls. The proper
return code indicating an error is described in its man page. There is no
reason to consider wider ranges than what is indicated. For instance, it is
common to see such a thing :
| if ((fd = open(file, O_RDONLY)) < 0)
| return -1;
This is wrong. The man page says that -1 is returned if an error occurred. It
does not suggest that any other negative value will be an error. It is possible
that a few such issues have been left in existing code. They are bugs for which
fixes are accepted, even though they're currently harmless since open() is not
known for returning negative values at the moment.
8) Declaring new types, names and values
----------------------------------------
Please refrain from using "typedef" to declare new types, they only obfuscate
the code. The reader never knows whether he's manipulating a scalar type or a
struct. For instance it is not obvious why the following code fails to build :
| int delay_expired(timer_t exp, timer_us_t now)
| {
| return now >= exp;
| }
With the types declared in another file this way :
| typedef unsigned int timer_t;
| typedef struct timeval timer_us_t;
This cannot work because we're comparing a scalar with a struct, which does
not make sense. Without a typedef, the function would have been written this
way without any ambiguity and would not have failed :
| int delay_expired(unsigned int exp, struct timeval *now)
| {
| return now >= exp->tv_sec;
| }
Declaring special values may be done using enums. Enums are a way to define
structured integer values which are related to each other. They are perfectly
suited for state machines. While the first element is always assigned the zero
value, not everybody knows that, especially people working with multiple
languages all the day. For this reason it is recommended to explicitly force
the first value even if it's zero. The last element should be followed by a
comma if it is planned that new elements might later be added, this will make
later patches shorter. Conversely, if the last element is placed in order to
get the number of possible values, it must not be followed by a comma and must
be preceded by a comment :
| enum {
| first = 0,
| second,
| third,
| fourth,
| };
| enum {
| first = 0,
| second,
| third,
| fourth,
| /* nbvalues must always be placed last */
| nbvalues
| };
Structure names should be short enough not to mangle function declarations,
and explicit enough to avoid confusion (which is the most important thing).
Wrong :
| struct request_args { /* arguments on the query string */
| char *name;
| char *value;
| struct misc_args *next;
| };
Right :
| struct qs_args { /* arguments on the query string */
| char *name;
| char *value;
| struct qs_args *next;
| }
When declaring new functions or structures, please do not use CamelCase, which
is a style where upper and lower case are mixed in a single word. It causes a
lot of confusion when words are composed from acronyms, because it's hard to
stick to a rule. For instance, a function designed to generate an ISN (initial
sequence number) for a TCP/IP connection could be called :
- generateTcpipIsn()
- generateTcpIpIsn()
- generateTcpIpISN()
- generateTCPIPISN()
etc...
None is right, none is wrong, these are just preferences which might change
along the code. Instead, please use an underscore to separate words. Lowercase
is preferred for the words, but if acronyms are upcased it's not dramatic. The
real advantage of this method is that it creates unambiguous levels even for
short names.
Valid examples :
- generate_tcpip_isn()
- generate_tcp_ip_isn()
- generate_TCPIP_ISN()
- generate_TCP_IP_ISN()
Another example is easy to understand when 3 arguments are involved in naming
the function :
Wrong (naming conflict) :
| /* returns A + B * C */
| int mulABC(int a, int b, int c)
| {
| return a + b * c;
| }
|
| /* returns (A + B) * C */
| int mulABC(int a, int b, int c)
| {
| return (a + b) * c;
| }
Right (unambiguous naming) :
| /* returns A + B * C */
| int mul_a_bc(int a, int b, int c)
| {
| return a + b * c;
| }
|
| /* returns (A + B) * C */
| int mul_ab_c(int a, int b, int c)
| {
| return (a + b) * c;
| }
Whenever you manipulate pointers, try to declare them as "const", as it will
save you from many accidental misuses and will only cause warnings to be
emitted when there is a real risk. In the examples below, it is possible to
call my_strcpy() with a const string only in the first declaration. Note that
people who ignore "const" are often the ones who cast a lot and who complain
from segfaults when using strtok() !
Right :
| void my_strcpy(char *d, const char *s)
| {
| while ((*d++ = *s++));
| }
|
| void say_hello(char *dest)
| {
| my_strcpy(dest, "hello\n");
| }
Wrong :
| void my_strcpy(char *d, char *s)
| {
| while ((*d++ = *s++));
| }
|
| void say_hello(char *dest)
| {
| my_strcpy(dest, "hello\n");
| }
9) Getting macros right
-----------------------
It is very common for macros to do the wrong thing when used in a way their
author did not have in mind. For this reason, macros must always be named with
uppercase letters only. This is the only way to catch the developer's eye when
using them, so that they double-check whether they are taking a risk or not. First,
macros must never ever be terminated by a semi-colon, or they will close the
wrong block once in a while. For instance, the following will cause a build
error before the "else" due to the double semi-colon :
Wrong :
| #define WARN printf("warning\n");
| ...
| if (a < 0)
| WARN;
| else
| a--;
Right :
| #define WARN printf("warning\n")
If multiple instructions are needed, then use a do { } while (0) block, which
is the only construct which respects *exactly* the semantics of a single
instruction :
| #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
| ...
|
| if (a < 0)
| WARN;
| else
| a--;
Second, do not put unprotected control statements in macros, they will
definitely cause bugs :
Wrong :
| #define WARN if (verbose) printf("warning\n")
| ...
| if (a < 0)
| WARN;
| else
| a--;
Which is equivalent to the undesired form below :
| if (a < 0)
| if (verbose)
| printf("warning\n");
| else
| a--;
Right way to do it :
| #define WARN do { if (verbose) printf("warning\n"); } while (0)
| ...
| if (a < 0)
| WARN;
| else
| a--;
Which is equivalent to :
| if (a < 0)
| do { if (verbose) printf("warning\n"); } while (0);
| else
| a--;
Macro parameters must always be surrounded by parenthesis, and must never be
duplicated in the same macro unless explicitly stated. Also, macros must not be
defined with operators without surrounding parenthesis. The MIN/MAX macros are
a pretty common example of multiple misuses, but this happens as early as when
using bit masks. Most often, in case of any doubt, try to use inline functions
instead.
Wrong :
| #define MIN(a, b) a < b ? a : b
|
| /* returns 2 * min(a,b) + 1 */
| int double_min_p1(int a, int b)
| {
| return 2 * MIN(a, b) + 1;
| }
What this will do :
| int double_min_p1(int a, int b)
| {
| return 2 * a < b ? a : b + 1;
| }
Which is equivalent to :
| int double_min_p1(int a, int b)
| {
| return (2 * a) < b ? a : (b + 1);
| }
The first thing to fix is to surround the macro definition with parenthesis to
avoid this mistake :
| #define MIN(a, b) (a < b ? a : b)
But this is still not enough, as can be seen in this example :
| /* compares either a or b with c */
| int min_ab_c(int a, int b, int c)
| {
| return MIN(a ? a : b, c);
| }
Which is equivalent to :
| int min_ab_c(int a, int b, int c)
| {
| return (a ? a : b < c ? a ? a : b : c);
| }
Which in turn means a totally different thing due to precedence :
| int min_ab_c(int a, int b, int c)
| {
| return (a ? a : ((b < c) ? (a ? a : b) : c));
| }
This can be fixed by surrounding *each* argument in the macro with parenthesis:
| #define MIN(a, b) ((a) < (b) ? (a) : (b))
But this is still not enough, as can be seen in this example :
| int min_ap1_b(int a, int b)
| {
| return MIN(++a, b);
| }
Which is equivalent to :
| int min_ap1_b(int a, int b)
| {
| return ((++a) < (b) ? (++a) : (b));
| }
Again, this is wrong because "a" is incremented twice if below b. The only way
to fix this is to use a compound statement and to assign each argument exactly
once to a local variable of the same type :
| #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
| ((__a) < (__b) ? (__a) : (__b)); \
| })
At this point, using static inline functions is much cleaner if a single type
is to be used :
| static inline int min(int a, int b)
| {
| return a < b ? a : b;
| }
10) Includes
------------
Includes are as much as possible listed in alphabetically ordered groups :
- the includes more or less system-specific (sys/*, netinet/*, ...)
- the libc-standard includes (those without any path component)
- includes from the local "import" subdirectory
- includes from the local "haproxy" subdirectory
Each section is just visually delimited from the other ones using an empty
line. The two first ones above may be merged into a single section depending on
developer's preference. Please do not copy-paste include statements from other
files. Having too many includes significantly increases build time and makes it
hard to find which ones are needed later. Just include what you need and if
possible in alphabetical order so that when something is missing, it becomes
obvious where to look for it and where to add it.
All files should include <haproxy/api.h> because this is where build options
are prepared.
HAProxy header files are split in two, those exporting the types only (named
with a trailing "-t") and those exporting variables, functions and inline
functions. Types, structures, enums and #defines must go into the types files
which are the only ones that may be included by othertype files. Function
prototypes and inlined functions must go into the main files. This split is
because of inlined functions which cross-reference types from other files,
which cause a chicken-and-egg problem if the functions and types are declared
at the same place.
Include files must be protected against multiple inclusion using the common
#ifndef/#define/#endif trick with a tag derived from the include file and its
location.
11) Comments
------------
Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
are tolerated for very short comments (eg: a word or two) but should be avoided
as much as possible. Multi-line comments are made with each intermediate line
starting with a star aligned with the first one, as in this example :
| /*
| * This is a multi-line
| * comment.
| */
If multiple code lines need a short comment, try to align them so that you can
have multi-line sentences. This is rarely needed, only for really complex
constructs.
Do not tell what you're doing in comments, but explain why you're doing it if
it seems not to be obvious. Also *do* indicate at the top of function what they
accept and what they don't accept. For instance, strcpy() only accepts output
buffers at least as large as the input buffer, and does not support any NULL
pointer. There is nothing wrong with that if the caller knows it.
Wrong use of comments :
| int flsnz8(unsigned int x)
| {
| int ret = 0; /* initialize ret */
| if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
| }
| ...
| bit = ~len + (skip << 3) + 9; /* update bit */
Right use of comments :
| /* This function returns the position of the highest bit set in the lowest
| * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
| * a 32-bit value as a lookup table to return one of 4 values for the
| * highest 16 possible 4-bit values.
| */
| int flsnz8(unsigned int x)
| {
| int ret = 0;
| if (x >> 4) { x >>= 4; ret += 4; }
| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
| }
| ...
| bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
12) Use of assembly
-------------------
There are many projects where use of assembly code is not welcome. There is no
problem with use of assembly in haproxy, provided that :
a) an alternate C-form is provided for architectures not covered
b) the code is small enough and well commented enough to be maintained
It is important to take care of various incompatibilities between compiler
versions, for instance regarding output and cloberred registers. There are
a number of documentations on the subject on the net. Anyway if you are
fiddling with assembly, you probably know that already.
Example :
| /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
| * function when you know for sure that the result fits in 32 bits, because
| * it is optimal on x86 and on 64bit processors.
| */
| static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
| {
| unsigned int result;
| #ifdef __i386__
| asm("divl %2"
| : "=a" (result)
| : "A"(o1), "rm"(o2));
| #else
| result = o1 / o2;
| #endif
| return result;
| }
13) Pointers
------------
A lot could be said about pointers, there's enough to fill entire books. Misuse
of pointers is one of the primary reasons for bugs in haproxy, and this rate
has significantly increased with the use of threads. Moreover, bogus pointers
cause the hardest to analyse bugs, because usually they result in modifications
to reassigned areas or accesses to unmapped areas, and in each case, bugs that
strike very far away from where they were located. Some bugs have already taken
up to 3 weeks of full time analysis, which has a severe impact on the project's
ability to make forward progress on important features. For this reason, code
that doesn't look robust enough or that doesn't follow some of the rules below
will be rejected, and may even be reverted after being merged if the trouble is
detected late!
13.1) No test before freeing
----------------------------
All platforms where haproxy is supported have a well-defined and documented
behavior for free(NULL), which is to do nothing at all. In other words, free()
does test for the pointer's nullity. As such, there is no point in testing
if a pointer is NULL or not before calling free(). And further, you must not
do it, because it adds some confusion to the reader during debugging sessions,
making one think that the code's authors weren't very sure about what they
were doing. This will not cause a bug but will result in your code to get
rejected.
Wrong call to free :
| static inline int blah_free(struct blah *blah)
| {
| if (blah->str1)
| free(blah->str1);
| if (blah->str2)
| free(blah->str2);
| free(blah);
| }
Correct call to free :
| static inline int blah_free(struct blah *blah)
| {
| free(blah->str1);
| free(blah->str2);
| free(blah);
| }
13.2) No dangling pointers
--------------------------
Pointers are very commonly used as booleans: if they're not NULL, then the
area they point to is valid and may be used. This is convenient for many things
and is even emphasized with threads where they can atomically be swapped with
another value (even NULL), and as such provide guaranteed atomic resource
allocation and sharing.
The problem with this is when someone forgets to delete a pointer when an area
is no longer valid, because this may result in the pointer being accessed later
and pointing to a wrong location, one that was reallocated for something else
and causing all sort of nastiness like crashes or memory corruption. Moreover,
thanks to the memory pools, it is extremely likely that a just released pointer
will be reassigned to a similar object with comparable values (flags etc) at
the same positions, making tests apparently succeed for a while. Some such bugs
have gone undetected for several years.
The rule is pretty simple:
+-----------------------------------------------------------------+
| NO REACHABLE POINTER MAY EVER POINT TO AN UNREACHABLE LOCATION. |
+-----------------------------------------------------------------+
By "reachable pointer", here we mean a pointer that is accessible from a
reachable structure or a global variable. This means that any pointer found
anywhere in any structure in the code may always be dereferenced. This can
seem obvious but this is not always enforced.
This means that when freeing an area, the pointer that was used to find that
area must be overwritten with NULL, and all other such pointers must as well
if any. It is one case where one can find more convenient to write the NULL
on the same line as the call to free() to make things easier to check. Be
careful about any potential "if" when doing this.
Wrong use of free :
| static inline int blah_recycle(struct blah *blah)
| {
| free(blah->str1);
| free(blah->str2);
| }
Correct use of free :
| static inline int blah_recycle(struct blah *blah)
| {
| free(blah->str1); blah->str1 = NULL;
| free(blah->str2); blah->str2 = NULL;
| }
Sometimes the code doesn't permit this to be done. It is not a matter of code
but a matter of architecture. Example:
Initialization:
| static struct foo *foo_init()
| {
| struct foo *foo;
| struct bar *bar;
|
| foo = pool_alloc(foo_head);
| bar = pool_alloc(bar_head);
| if (!foo || !bar)
| goto fail;
| foo->bar = bar;
| ...
| }
Scheduled task 1:
| static inline int foo_timeout(struct foo *foo)
| {
| free(foo->bar);
| free(foo);
| }
Scheduled task 2:
| static inline int bar_timeout(struct bar *bar)
| {
| free(bar);
| }
Here it's obvious that if "bar" times out, it will be freed but its pointer in
"foo" will remain here, and if foo times out just after, it will lead to a
double free. Or worse, if another instance allocates a pointer and receives bar
again, when foo times out, it will release the old bar pointer which now points
to a new object, and the code using that new object will crash much later, or
even worse, will share the same area as yet another instance having inherited
that pointer again.
Here this simply means that the data model is wrong. If bar may be freed alone,
it MUST have a pointer to foo so that bar->foo->bar is set to NULL to let foo
finish its life peacefully. This also means that the code dealing with foo must
be written in a way to support bar's leaving.
13.3) Don't abuse pointers as booleans
--------------------------------------
Given the common use of a pointer to know if the area it points to is valid,
there is a big incentive in using such pointers as booleans to describe
something a bit higher level, like "is the user authenticated". This must not
be done. The reason stems from the points above. Initially this perfectly
matches and the code is simple. Then later some extra options need to be added,
and more pointers are needed, all allocated together. At this point they all
start to become their own booleans, supposedly always equivalent, but if that
were true, they would be a single area with a single pointer. And things start
to fall apart with some code areas relying on one pointer for the condition and
other ones relying on other pointers. Pointers may be substituted with "flags"
or "present in list" etc here. And from this point, things quickly degrade with
pointers needing to remain set even if pointing to wrong areas, just for the
sake of not being NULL and not breaking some assumptions. At this point the
bugs are already there and the code is not trustable anymore.
The only way to avoid this is to strictly respect this rule: pointers do not
represent a functionality but a storage area. Of course it is very frequent to
consider that if an optional string is not set, a feature is not enabled. This
can be fine to some extents. But as soon as any slightest condition is added
anywhere into the mux, the code relying on the pointer must be replaced with
something else so that the pointer may live its own life and be released (and
reset) earlier if needed.
13.4) Mixing const and non-const
--------------------------------
Something often encountered, especially when assembling error messages, is
functions that collect strings, assemble them into larger messages and free
everything. The problem here is that if strings are defined as variables, there
will rightfully be build warnings when reporting string constants such as bare
keywords or messages, and if strings are defined as constants, it is not
possible to free them. The temptation is sometimes huge to force some free()
calls on casted strings. Do not do that! It will inevitably lead to someone
getting caught passing a constant string that will make the process crash (if
lucky). Document the expectations, indicate that all arguments must be freeable
and that the caller must be capable of strdup(), and make your function support
NULLs and document it (so that callers can deal with a failing strdup() on
allocation error).
One valid alternative is to use a secondary channel to indicate whether the
message may be freed or not. A flag in a complex structure can be used for this
purpose, for example. If you are certain that your strings are aligned to a
certain number of bytes, it can be possible to instrument the code to use the
lowest bit to indicate the need to free (e.g. by always adding one to every
const string). But such a solution will require good enough instrumentation so
that it doesn't constitute a new set of traps.
13.5) No pointer casts
----------------------
Except in rare occasions caused by legacy APIs (e.g. sockaddr) or special cases
which explicitly require a form of aliasing, there is no valid reason for
casting pointers, and usually this is used to hide other problems that will
strike later. The only suitable type of cast is the cast from the generic void*
used to store a context for example. But in C, there is no need to cast to nor
from void*, so this is not required. However those coming from C++ tend to be
used to this practice, and others argue that it makes the intent more visible.
As a corollary, do not abuse void*. Placing void* everywhere to avoid casting
is a bad practice as well. The use of void* is only for generic functions or
structures which do not have a limited set of types supported. When only a few
types are supported, generally their type can be passed using a side channel,
and the void* can be turned into a union that makes the code more readable and
more verifiable.
An alternative in haproxy is to use a pointer to an obj_type enum. Usually it
is placed at the beginning of a structure. It works like a void* except that
the type is read directly from the object. This is convenient when a small set
of remote objects may be attached to another one because a single of them will
match a non-null pointer (e.g. a connection or an applet).
Example:
| static inline int blah_free(struct blah *blah)
| {
| /* only one of them (at most) will not be null */
| pool_free(pool_head_connection, objt_conn(blah->target));
| pool_free(pool_head_appctx, objt_appctx(blah->target));
| pool_free(pool_head_stream, objt_stream(blah->target));
| blah->target = NULL;
| }
13.6) Extreme caution when using non-canonical pointers
-------------------------------------------------------
It can be particularly convenient to embed some logic in the unused bits or
code points of a pointer. Indeed, when it is known by design that a given
pointer will always follow a certain alignment, a few lower bits will always
remain zero, and as such may be used as optional flags. For example, the ebtree
code uses the lowest bit to differentiate left/right attachments to the parent
and node/leaf in branches. It is also known that values very close to NULL will
never represent a valid pointer, and the thread-safe MT_LIST code uses this to
lock visited pointers.
There are a few rules to respect in order to do this:
- the deviations from the canonical pointers must be exhaustively documented
where the pointer type is defined, and the whole control logic with its
implications and possible and impossible cases must be enumerated as well ;
- make sure that the operations will work on every supported platform, which
includes 32-bit platforms where structures may be aligned on as little as
32-bit. 32-bit alignment leaves only two LSB available. When doing so, make
sure the target structures are not labelled with the "packed" attribute, or
that they're always perfectly aligned. All platforms where haproxy runs
have their NULL pointer mapped at address zero, and use page sizes at least
4096 bytes large, leaving all values form 1 to 4095 unused. Anything
outside of this is unsafe. In particular, never use negative numbers to
represent a supposedly invalid address. On 32-bits platforms it will often
correspond to a system address or a special page. Always try a variety of
platforms when doing such a thing.
- the code must not use such pointers as booleans anymore even if it is known
that "it works" because that keeps a doubt open for the reviewer. Only the
canonical pointer may be tested. There can be a rare exception which is if
this is on a critical path where severe performance degradation may result
from this. In this case, *each* of the checks must be duly documented and
the equivalent BUG_ON() instances must be placed to prove the claim.
- some inline functions (or macros) must be used to turn the pointers to/from
their canonical form so that the regular code doesn't have to see the
operations, and so that the representation may be easily adjusted in the
future. A few comments indicating to a human how to turn a pointer back and
forth from inside a debugger will be appreciated, as macros often end up
not being trivially readable nor directly usable.
- do not use int types to cast the pointers, this will only work on 32-bit
platforms. While "long" is usually fine, it is not recommended anymore due
to the Windows platform being LLP64 and having it set to 32 bits. And
"long long" isn't good either for always being 64 bits. More suitable types
are ptrdiff_t or size_t. Note that while those were not available everywhere
in the early days of hparoxy, size_t is now heavily used and known to work
everywhere. And do not perform the operations on the pointers, only on the
integer types (and cast back again). Some compilers such as gcc are
extremely picky about this and will often emit wrong code when they see
equality conditions they believe is impossible and decide to optimize them
away.
13.7) Pointers in unions
------------------------
Before placing multiple aliasing pointers inside a same union, there MUST be a
SINGLE well-defined way to figure them out from each other. It may be thanks to
a side-channel information (as done in the samples with a defined type), it may
be based on in-area information (as done using obj_types), or any other trusted
solution. In any case, if pointers are mixed with any other type (integer or
float) in a union, there must be a very simple way to distinguish them, and not
a platform-dependent nor compiler-dependent one.
|