<drbdsetup_options>
<drbdsetup_option name="al-extents">
<term xml:id="al-extents"><option>al-extents <replaceable>extents</replaceable></option>
</term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>al-extents</secondary>
</indexterm> DRBD automatically maintains a "hot" or "active" disk area
likely to be written to again soon based on the recent write activity.
The "active" disk area can be written to immediately, while "inactive"
disk areas must be "activated" first, which requires a meta-data write.
We also refer to this active disk area as the "activity log".</para>
<para>The activity log saves meta-data writes, but the whole log must be
resynced upon recovery of a failed node. The size of the activity log is
a major factor of how long a resync will take and how fast a replicated
disk will become consistent after a crash.</para>
<para>The activity log consists of a number of 4-Megabyte segments; the
<replaceable>al-extents</replaceable> parameter determines how many of
those segments can be active at the same time. The default value for
<replaceable>al-extents</replaceable> is 1237, with a minimum of 7 and a
maximum of 65536.</para>
<para>
Note that the effective maximum may be smaller, depending on how
you created the device meta data; see also
<citerefentry><refentrytitle>drbdmeta</refentrytitle><manvolnum>8</manvolnum></citerefentry>.
The effective maximum is 919 * (available on-disk activity-log ring-buffer area / 4 kB - 1);
the default 32 kB ring buffer yields an effective maximum of 6433
(covering more than 25 GiB of data).
We recommend keeping this well within the amount your backend storage
and replication link are able to resync within about five minutes.
</para>
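<para>For example, a minimal <option>disk</option> section enlarging the
activity log (the resource name and the value are illustrative only):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    # 3389 extents * 4 MiB each ~= 13 GiB of "hot" area;
    # after a crash, at most this much data is marked for resync.
    al-extents 3389;
  }
}
]]></programlisting>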
</definition>
</drbdsetup_option>
<drbdsetup_option name="al-updates">
<term xml:id="al-updates"><option>al-updates
<group choice="req" rep="norepeat">
<arg choice="plain" rep="norepeat">yes</arg>
<arg choice="plain" rep="norepeat">no</arg>
</group>
</option>
</term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>al-updates</secondary>
</indexterm> With this parameter, the activity log can be turned off
entirely (see the <option>al-extents</option> parameter). This will speed
up writes because fewer meta-data writes will be necessary, but the
entire device needs to be resynchronized upon recovery of a failed
primary node. The default value for <option>al-updates</option> is
<option>yes</option>.
</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="c-delay-target">
<term xml:id="c-delay-target"><option>c-delay-target <replaceable>delay_target</replaceable></option></term>
<term xml:id="c-fill-target"><option>c-fill-target <replaceable>fill_target</replaceable></option></term>
<term xml:id="c-max-rate"><option>c-max-rate <replaceable>max_rate</replaceable></option></term>
<term xml:id="c-plan-ahead"><option>c-plan-ahead <replaceable>plan_time</replaceable></option></term>
<definition>
<para>Dynamically control the resync speed. This mechanism is enabled by
setting the <option>c-plan-ahead</option> parameter to a positive value.
The goal is to either fill the buffers along the data path with a defined
amount of data if <option>c-fill-target</option> is defined, or to have a
defined delay along the path if <option>c-delay-target</option> is
defined. The maximum bandwidth is limited by the
<option>c-max-rate</option> parameter.</para>
<para>The <option>c-plan-ahead</option> parameter defines how fast drbd
adapts to changes in the resync speed. It should be set to five times
the network round-trip time or more. Common values for
<option>c-fill-target</option> for "normal" data paths range from 4K to
100K. If drbd-proxy is used, it is advised to use
<option>c-delay-target</option> instead of <option>c-fill-target</option>. The
<option>c-delay-target</option> parameter is used if the
<option>c-fill-target</option> parameter is undefined or set to 0. The
<option>c-delay-target</option> parameter should be set to five times the
network round-trip time or more. The <option>c-max-rate</option> option
should be set to either the bandwidth available between the DRBD-hosts and the
machines hosting DRBD-proxy, or to the available disk bandwidth.</para>
<para>The default values of these parameters are:
<option>c-plan-ahead</option> = 20 (in units of 0.1 seconds),
<option>c-fill-target</option> = 0 (in units of sectors),
<option>c-delay-target</option> = 1 (in units of 0.1 seconds),
and <option>c-max-rate</option> = 102400 (in units of KiB/s).</para>
<para>Dynamic resync speed control is available since DRBD 8.3.9.</para>
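<para>As a sketch, the dynamic resync controller could be tuned in the
<option>disk</option> section like this (all values are illustrative and
need to be adapted to the actual link and storage):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    c-plan-ahead  20;    # plan 2 seconds ahead (units of 0.1 s)
    c-fill-target 1M;    # aim to keep about 1 MiB in flight
    c-max-rate    100M;  # never resync faster than ~100 MiB/s
  }
}
]]></programlisting>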
</definition>
</drbdsetup_option>
<drbdsetup_option name="c-min-rate">
<term xml:id="c-min-rate"><option>c-min-rate <replaceable>min_rate</replaceable></option></term>
<definition>
<para>A node which is primary and sync-source has to schedule application
I/O requests and resync I/O requests. The <option>c-min-rate</option>
parameter limits how much bandwidth is available for resync I/O; the
remaining bandwidth is used for application I/O.</para>
<para>A <option>c-min-rate</option> value of 0 means that there is no
limit on the resync I/O bandwidth. This can slow down application I/O
significantly. Use a value of 1 (1 KiB/s) for the lowest possible resync
rate.</para>
<para>The default value of <option>c-min-rate</option> is 4096, in units of
KiB/s.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="disk-barrier">
<term xml:id="disk-barrier"><option>disk-barrier</option></term>
<term xml:id="disk-flushes"><option>disk-flushes</option></term>
<term xml:id="disk-drain"><option>disk-drain</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>disk-barrier</secondary>
</indexterm>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>disk-flushes</secondary>
</indexterm>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>disk-drain</secondary>
</indexterm>
<para>DRBD has three methods of handling the ordering of dependent write
requests:
<variablelist>
<varlistentry>
<term><option>disk-barrier</option></term>
<listitem>
<para>Use disk barriers to make sure that requests are written to
disk in the right order. Barriers ensure that all requests
submitted before a barrier make it to the disk before any
requests submitted after the barrier. This is implemented using
'tagged command queuing' on SCSI devices and 'native command
queuing' on SATA devices. Only some devices and device stacks
support this method. The device mapper (LVM) only supports
barriers in some configurations.</para>
<para>Note that on systems which do not support
disk barriers, enabling this option can lead to data loss or
corruption. Until DRBD 8.4.1, <option>disk-barrier</option> was
turned on if the I/O stack below DRBD did support barriers.
Kernels since linux-2.6.36 (or RHEL6's 2.6.32) no longer allow DRBD to
detect whether barriers are supported. Since drbd-8.4.2,
this option is off by default and needs to be enabled explicitly.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>disk-flushes</option></term>
<listitem>
<para>Use disk flushes between dependent write requests, also
referred to as 'force unit access' by drive vendors. This forces
all data to disk. This option is enabled by default.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>disk-drain</option></term>
<listitem>
<para>Wait for the request queue to "drain" (that is, wait for
the requests to finish) before submitting a dependent write
request. This method requires that requests are stable on disk
when they finish. Before DRBD 8.0.9, this was the only method
implemented. This option is enabled by default. Do not disable it
in production environments.
</para>
</listitem>
</varlistentry>
</variablelist>
Of these three methods, drbd will use the first that is enabled and
supported by the backing storage device. If all three of these options
are turned off, DRBD will submit write requests without regard for their
dependencies. Depending on the I/O stack, write requests can be
reordered, and they can be submitted in a different order on different
cluster nodes. This can result in data loss or corruption. Therefore,
turning off all three methods of controlling write ordering is strongly
discouraged.
</para>
<para>A general guideline for configuring write ordering is to use disk
barriers or disk flushes when using ordinary disks (or an ordinary disk
array) with a volatile write cache. On storage without cache or with a
battery backed write cache, disk draining can be a reasonable
choice.</para>
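<para>For example, on storage with a battery-backed write cache, one might
disable flushes and rely on draining (a sketch; verify your hardware's
guarantees first):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    disk-barrier no;   # off by default since drbd-8.4.2
    disk-flushes no;   # safe only with a non-volatile (battery-backed) cache
    # disk-drain remains enabled (the default)
  }
}
]]></programlisting>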
</definition>
</drbdsetup_option>
<drbdsetup_option name="disk-timeout">
<term xml:id="disk-timeout"> <option>disk-timeout</option>
</term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>disk-timeout</secondary>
</indexterm>
<para>If the lower-level device on which a DRBD device stores its data does
not finish an I/O request within the defined
<option>disk-timeout</option>, DRBD treats this as a failure. The
lower-level device is detached, and the device's disk state advances to
Diskless. If DRBD is connected to one or more peers, the failed request
is passed on to one of them.</para>
<para>This option is <emphasis>dangerous and may lead to kernel panic!</emphasis></para>
<para>"Aborting" requests, or force-detaching the disk, is intended for
completely blocked/hung local backing devices which do no longer
complete requests at all, not even do error completions. In this
situation, usually a hard-reset and failover is the only way out.</para>
<para>By "aborting", basically faking a local error-completion,
we allow for a more graceful swichover by cleanly migrating services.
Still the affected node has to be rebooted "soon".</para>
<para>By completing these requests, we allow the upper layers to re-use
the associated data pages.</para>
<para>If later the local backing device "recovers", and now DMAs some data
from disk into the original request pages, in the best case it will
just put random data into unused pages; but typically it will corrupt
meanwhile completely unrelated data, causing all sorts of damage.</para>
<para>Which means delayed successful completion,
especially for READ requests, is a reason to panic().
We assume that a delayed *error* completion is OK,
though we still will complain noisily about it.</para>
<para>The default value of
<option>disk-timeout</option> is 0, which stands for an infinite timeout.
Timeouts are specified in units of 0.1 seconds. This option is available
since DRBD 8.3.12.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="fencing">
<term xml:id="fencing"><option>fencing <replaceable>fencing_policy</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>fencing</secondary>
</indexterm> <option>Fencing</option> is a preventive measure to avoid
situations where both nodes are primary and disconnected. This is also
known as a split-brain situation. DRBD supports the following fencing
policies:</para>
<variablelist>
<varlistentry>
<term xml:id="dont-care"><option>dont-care</option></term>
<listitem>
<para>No fencing actions are taken. This is the default policy.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="resource-only"><option>resource-only</option></term>
<listitem>
<para>If a node becomes a disconnected primary, it tries to fence the peer.
This is done by calling the <option>fence-peer</option> handler. The
handler is supposed to reach the peer over an alternative communication path
and call '<option>drbdadm outdate minor</option>' there.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="resource-and-stonith"><option>resource-and-stonith</option></term>
<listitem>
<para>If a node becomes a disconnected primary, it freezes all its IO operations
and calls its fence-peer handler. The fence-peer handler is supposed to reach
the peer over an alternative communication path and call
'<option>drbdadm outdate minor</option>' there. In case it cannot
do that, it should stonith the peer. IO is resumed as soon as
the situation is resolved. In case the fence-peer handler fails,
I/O can be resumed manually with '<option>drbdadm
resume-io</option>'.</para>
</listitem>
</varlistentry>
</variablelist>
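<para>A typical pairing of a fencing policy with a
<option>fence-peer</option> handler might look as follows (the handler path
is an example for Pacemaker clusters; note that the section containing
<option>fencing</option> differs between DRBD versions):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    fencing resource-and-stonith;  # in DRBD 8.4, fencing is a disk-section option
  }
  handlers {
    # example handler shipped with drbd-utils; adjust for your cluster manager
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  }
}
]]></programlisting>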
</definition>
</drbdsetup_option>
<drbdsetup_option name="md-flushes">
<term xml:id="md-flushes"><option>md-flushes</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>md-flushes</secondary>
</indexterm>
<para>Enable disk flushes and disk barriers on the meta-data device.
This option is enabled by default. See the <option>disk-flushes</option>
parameter.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="on-io-error">
<term xml:id="on-io-error"><option>on-io-error <replaceable>handler</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>on-io-error</secondary>
</indexterm> Configure how DRBD reacts to I/O errors on a
lower-level device. The following policies are defined:
<variablelist>
<varlistentry>
<term xml:id="pass_on"><option>pass_on</option></term>
<listitem>
<para>Change the disk status to Inconsistent, mark the failed
block as inconsistent in the bitmap, and retry the I/O operation
on a remote cluster node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="call-local-io-error"><option>call-local-io-error</option></term>
<listitem>
<para>Call the <option>local-io-error</option> handler (see the
<option>handlers</option> section).</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="detach"><option>detach</option></term>
<listitem>
<para>Detach the lower-level device and continue in diskless mode.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
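<para>For example, to detach from a failing local disk and continue in
diskless mode (the resource name is illustrative):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    on-io-error detach;
  }
}
]]></programlisting>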
</definition>
</drbdsetup_option>
<drbdsetup_option name="read-balancing">
<term xml:id="read-balancing"><option>read-balancing <replaceable>policy</replaceable></option>
</term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>read-balancing</secondary>
</indexterm>
<para>
Distribute read requests among cluster nodes as defined by
<replaceable>policy</replaceable>. The supported policies are
<option xml:id="prefer-local">prefer-local</option> (the default),
<option xml:id="prefer-remote">prefer-remote</option>, <option xml:id="round-robin">round-robin</option>,
<option xml:id="least-pending">least-pending</option>, <option xml:id="when-congested-remote">when-congested-remote</option>,
<option xml:id="_32K-striping">32K-striping</option>, <option xml:id="_64K-striping">64K-striping</option>,
<option xml:id="_128K-striping">128K-striping</option>, <option xml:id="_256K-striping">256K-striping</option>,
<option xml:id="_512K-striping">512K-striping</option> and <option xml:id="_1M-striping">1M-striping</option>.</para>
<para>This option is available since DRBD 8.4.1.</para>
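<para>As a sketch, serving reads from the peer while the local node is
congested (assuming <option>read-balancing</option> is configured in the
<option>disk</option> section):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    read-balancing when-congested-remote;
  }
}
]]></programlisting>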
</definition>
</drbdsetup_option>
<drbdsetup_option name="discard-zeroes-if-aligned">
<term xml:id="discard-zeroes-if-aligned"><option>discard-zeroes-if-aligned <group choice="req" rep="norepeat">
<arg choice="plain" rep="norepeat">yes</arg>
<arg choice="plain" rep="norepeat">no</arg>
</group></option></term>
<definition>
<para>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>discard-zeroes-if-aligned</secondary>
</indexterm>
There are several aspects to discard/trim/unmap support on Linux
block devices. Even if discard is supported in general, it may fail
silently, or may partially ignore discard requests. Devices also
announce whether reading from unmapped blocks returns defined data
(usually zeroes), or undefined data (possibly old data, possibly
garbage).
</para><para>
If on different nodes, DRBD is backed by devices with differing discard
characteristics, discards may lead to data divergence (old data or
garbage left over on one backend, zeroes due to unmapped areas on the
other backend). Online verify would then potentially report many
spurious differences. While probably harmless for most use cases
(such as fstrim on a file system), DRBD cannot tolerate it.
</para><para>
To play it safe, we have to disable discard support if our local backend
(on a Primary) does not support "discard_zeroes_data=true". We also have to
translate discards to explicit zero-out on the receiving side, unless
the receiving side (Secondary) supports "discard_zeroes_data=true",
thereby allocating areas that were supposed to be unmapped.
</para><para>
There are some devices (notably the LVM/DM thin provisioning) that are
capable of discard, but announce discard_zeroes_data=false. In the case of
DM-thin, discards aligned to the chunk size will be unmapped, and
reading from unmapped sectors will return zeroes. However, unaligned
partial head or tail areas of discard requests will be silently ignored.
</para><para>
If we now add a helper to explicitly zero-out these unaligned partial
areas, while passing on the discard of the aligned full chunks, we
effectively achieve discard_zeroes_data=true on such devices.
</para><para>
Setting <option>discard-zeroes-if-aligned</option> to <option>yes</option>
will allow DRBD to use discards, and to announce discard_zeroes_data=true,
even on backends that announce discard_zeroes_data=false.
</para><para>
Setting <option>discard-zeroes-if-aligned</option> to <option>no</option>
will cause DRBD to always fall back to zero-out on the receiving side,
and to not even announce discard capabilities on the Primary,
if the respective backend announces discard_zeroes_data=false.
</para><para>
We used to ignore the discard_zeroes_data setting completely. To not
break established and expected behaviour, and suddenly cause fstrim on
thin-provisioned LVs to run out-of-space instead of freeing up space,
the default value is <option>yes</option>.
</para><para>
This option is available since 8.4.7.
</para>
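<para>On a DM-thin backend, for example, one might state this explicitly
(a sketch; <option>yes</option> is already the default):</para>
<programlisting><![CDATA[
resource r0 {
  disk {
    discard-zeroes-if-aligned yes;
  }
}
]]></programlisting>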
</definition>
</drbdsetup_option>
<drbdsetup_option name="rs-discard-granularity">
<term>
<option>rs-discard-granularity <replaceable>byte</replaceable></option>
</term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>rs-discard-granularity</secondary>
</indexterm>
<para>
When <option>rs-discard-granularity</option> is set to a non-zero, positive
value, DRBD tries to perform resync operations in requests of this size.
In case such a block contains only zero bytes on the sync source node,
the sync target node will issue a discard/trim/unmap command for
the area.</para>
<para>The value is constrained by the discard granularity of the backing
block device. If <option>rs-discard-granularity</option> is not a
multiple of the discard granularity of the backing block device, DRBD
rounds it up. The feature becomes active only if the backing block device
reads back zeroes after a discard command.</para>
<para>The default value is 0. This option is available since 8.4.7.
</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="resync-after">
<term xml:id="resync-after">
<only-drbdsetup>
<option>resync-after <replaceable>minor</replaceable></option>
</only-drbdsetup>
<only-drbd-conf>
<option>resync-after <replaceable>res-name</replaceable>/<replaceable>volume</replaceable></option>
</only-drbd-conf>
</term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>resync-after</secondary>
</indexterm> Define that a device should only resynchronize after the
specified other device. By default, no order between devices is
defined, and all devices will resynchronize in parallel. Depending on
the configuration of the lower-level devices, and the available
network and disk bandwidth, this can slow down the overall resync
process. This option can be used to form a chain or tree of
dependencies among devices.</para>
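<para>For example, to serialize two resources so that
<literal>r1</literal> only resynchronizes after volume 0 of
<literal>r0</literal> (names are illustrative):</para>
<programlisting><![CDATA[
resource r1 {
  disk {
    resync-after r0/0;   # wait for resource r0, volume 0
  }
}
]]></programlisting>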
</definition>
</drbdsetup_option>
<drbdsetup_option name="resync-rate">
<term xml:id="resync-rate"><option>resync-rate <replaceable>rate</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>resync-rate</secondary>
</indexterm> Define how much bandwidth DRBD may use for
resynchronizing. DRBD allows "normal" application I/O even during a
resync. If the resync takes up too much bandwidth, application I/O
can become very slow. This parameter allows that to be avoided. Note
that this option only works when the dynamic resync controller is
disabled.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="size">
<!-- NOTE: This description is neither used in drbd.conf.xml.in nor in
drbdsetup.xml.in. -->
<term xml:id="size"><option>size <replaceable>size</replaceable></option></term>
<definition>
<para>Specify the size of the lower-level device explicitly instead of
determining it automatically. The device size must be determined once
and is remembered for the lifetime of the device. In order to
determine it automatically, all the lower-level devices on all nodes
must be attached, and all nodes must be connected. If the size is
specified explicitly, this is not necessary. The <option>size</option>
value is assumed to be in units of sectors (512 bytes) by
default.</para>
<!-- FIXME:
The <option>- - size</option> option should only be used if you wish not
to use as much as possible from the backing block devices. If you do
not use <option>-d</option>, the <replaceable>device</replaceable> is
only ready for use as soon as it was connected to its peer once.
-->
<!--
<para>If you use the <replaceable>size</replaceable> parameter in
drbd.conf, we strongly recommend to add an explicit unit postfix.
drbdadm and drbdsetup used to have mismatching default units.</para>
-->
</definition>
</drbdsetup_option>
<drbdsetup_option name="dialog-refresh">
<term xml:id="dialog-refresh"><option>dialog-refresh <replaceable>time</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>dialog-refresh</secondary>
</indexterm> The DRBD init script can be used to configure and start
DRBD devices, which can involve waiting for other cluster nodes.
While waiting, the init script shows the remaining waiting time. The
<option>dialog-refresh</option> parameter defines the number of seconds between
updates of that countdown. The default value is 1; a value of 0 turns
off the countdown.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="disable-ip-verification">
<term xml:id="disable-ip-verification"><option>disable-ip-verification</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>disable-ip-verification</secondary>
</indexterm>
<para>
Normally, DRBD verifies that the IP addresses in the configuration
match the host names. Use the <option>disable-ip-verification</option>
parameter to disable these checks.
</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="usage-count">
<term xml:id="usage-count"><option>usage-count
<group choice="req" rep="norepeat">
<arg choice="plain" rep="norepeat">yes</arg>
<arg choice="plain" rep="norepeat">no</arg>
<arg choice="plain" rep="norepeat">ask</arg>
</group>
</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>usage-count</secondary>
</indexterm>
<para>As explained on DRBD's <ulink url="http://usage.drbd.org"><citetitle>
Online Usage Counter</citetitle></ulink> web page, DRBD includes a
mechanism for anonymously counting how many installations are using which
versions of DRBD. The results are available on the web page for anyone to
see.</para>
<para>This parameter defines if a cluster node participates in the usage
counter; the supported values are <option>yes</option>,
<option>no</option>, and <option>ask</option> (ask the user, the
default).</para>
<para>We would like to ask users to participate in the online usage
counter, as it provides us with valuable feedback for steering the
development of DRBD.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="udev-always-use-vnr">
<term xml:id="udev-always-use-vnr"><option>udev-always-use-vnr</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>udev-always-use-vnr</secondary>
</indexterm>
<para>When udev asks drbdadm for a list of device related symlinks,
drbdadm would suggest symlinks with differing naming conventions,
depending on whether the resource has explicit
<literal>volume VNR { }</literal> definitions,
or only one single volume with the implicit volume number 0:
<programlisting><![CDATA[
# implicit single volume without "volume 0 {}" block
DEVICE=drbd<minor>
SYMLINK_BY_RES=drbd/by-res/<resource-name>
SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
# explicit volume definition: volume VNR { }
DEVICE=drbd<minor>
SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
]]></programlisting>
</para>
<para>If you define this parameter in the global section,
drbdadm will always add the <literal>.../VNR</literal> part,
regardless of whether the volume definition was implicit or explicit.
</para>
<para>For legacy backward compatibility, this is off by default,
but we recommend enabling it.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="after-sb-0pri">
<term xml:id="after-sb-0pri"><option>after-sb-0pri <replaceable>policy</replaceable></option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>after-sb-0pri</secondary>
</indexterm>
<para>Define how to react if a split-brain scenario is detected and none
of the two nodes is in primary role. (We detect split-brain scenarios
when two nodes connect; split-brain decisions are always between two
nodes.) The defined policies are:</para>
<variablelist>
<varlistentry>
<term><option>disconnect</option></term>
<listitem>
<para>No automatic resynchronization; simply disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>discard-younger-primary</option></term>
<term><option>discard-older-primary</option></term>
<listitem>
<para>Resynchronize from the node which became primary first
(<option>discard-younger-primary</option>) or last
(<option>discard-older-primary</option>). If both nodes became
primary independently, the <option>discard-least-changes</option>
policy is used.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>discard-zero-changes</option></term>
<listitem>
<para>If only one of the nodes wrote data since the split brain
situation was detected, resynchronize from this node to the other.
If both nodes wrote data, disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>discard-least-changes</option></term>
<listitem>
<para>Resynchronize from the node with more modified blocks.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>discard-node-<replaceable>nodename</replaceable></option></term>
<listitem>
<para>Always resynchronize to the named node.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- FIXME: Refer to rr-conflict. -->
</definition>
</drbdsetup_option>
<drbdsetup_option name="after-sb-1pri">
<term xml:id="after-sb-1pri"><option>after-sb-1pri <replaceable>policy</replaceable></option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>after-sb-1pri</secondary>
</indexterm>
<para>Define how to react if a split-brain scenario is detected, with one
node in primary role and one node in secondary role. (We detect
split-brain scenarios when two nodes connect, so split-brain decisions
are always among two nodes.) The defined policies are:</para>
<variablelist>
<varlistentry>
<term><option>disconnect</option></term>
<listitem>
<para>No automatic resynchronization, simply disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>consensus</option></term>
<listitem>
<para>Discard the data on the secondary node if the
<option>after-sb-0pri</option> algorithm would also discard the
data on the secondary node. Otherwise, disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>violently-as0p</option></term>
<listitem>
<para>Always take the decision of the <option>after-sb-0pri</option> algorithm,
even if it causes an erratic change of the primary's view of the
data. This is only useful if a single-node file system (i.e., not
OCFS2 or GFS) with the <option>allow-two-primaries</option> flag
is used. This option can cause the primary node to crash, and
should not be used.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="discard-secondary"><option>discard-secondary</option></term>
<listitem>
<para>Discard the data on the secondary node.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="call-pri-lost-after-sb"><option>call-pri-lost-after-sb</option></term>
<listitem>
<para>Always take the decision of the
<option>after-sb-0pri</option> algorithm. If the decision is to
discard the data on the primary node, call the
<option xml:id="pri-lost-after-sb">pri-lost-after-sb</option> handler on the primary
node.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- FIXME: Refer to rr-conflict. -->
</definition>
</drbdsetup_option>
<drbdsetup_option name="after-sb-2pri">
<term xml:id="after-sb-2pri"><option>after-sb-2pri <replaceable>policy</replaceable></option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>after-sb-2pri</secondary>
</indexterm>
<para>Define how to react if a split-brain scenario is detected and both
nodes are in primary role. (We detect split-brain scenarios when two
nodes connect, so split-brain decisions are always among two nodes.) The
defined policies are:</para>
<variablelist>
<varlistentry>
<term><option>disconnect</option></term>
<listitem>
<para>No automatic resynchronization, simply disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="violently-as0p"><option>violently-as0p</option></term>
<listitem>
<para>See the <option>violently-as0p</option> policy for
<option>after-sb-1pri</option>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>call-pri-lost-after-sb</option></term>
<listitem>
<para>Call the <option>pri-lost-after-sb</option> helper program on one
of the machines unless that machine can demote to secondary. The helper
program is expected to reboot the machine, which brings the node into
a secondary role. Which machine runs the helper program is determined
by the <option>after-sb-0pri</option> strategy.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- FIXME: Refer to rr-conflict. -->
</definition>
</drbdsetup_option>
<drbdsetup_option name="allow-two-primaries">
<term xml:id="allow-two-primaries"><option>allow-two-primaries</option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>allow-two-primaries</secondary>
</indexterm> The most common way to configure DRBD devices is to allow
only one node to be primary (and thus writable) at a time.</para>
<para>In some scenarios it is preferable to allow two nodes to be
primary at once; a mechanism outside of DRBD then must make sure that
writes to the shared, replicated device happen in a coordinated way.
This can be done with a shared-storage cluster file system like OCFS2
and GFS, or with virtual machine images and a virtual machine manager
that can migrate virtual machines between physical machines.</para>
<para>The <option>allow-two-primaries</option> parameter tells DRBD to
allow two nodes to be primary at the same time. Never enable this
option when using a non-distributed file system; otherwise, data
corruption and node crashes will result!</para>
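<para>A dual-primary setup for a cluster file system could be sketched like
this (protocol C is required for dual-primary operation):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    protocol C;
    allow-two-primaries yes;   # only safe with OCFS2, GFS, or similar
  }
}
]]></programlisting>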
</definition>
</drbdsetup_option>
<drbdsetup_option name="always-asbp">
<term xml:id="always-asbp"><option>always-asbp</option></term>
<!-- FIXME: this option does not mke any sense anymore. How can we fix this? -->
<definition>
<para>Normally the automatic after-split-brain policies are only used if current
states of the UUIDs do not indicate the presence of a third node.</para>
<para>With this option you request that the automatic after-split-brain policies are
used as long as the data sets of the nodes are somehow related. This might cause a
full sync, if the UUIDs indicate the presence of a third node (or if double
faults have led to strange UUID sets).</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="connect-int">
<term xml:id="connect-int"><option>connect-int <replaceable>time</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>connect-int</secondary>
</indexterm> As soon as a connection between two nodes is configured
with <command moreinfo="none">drbdsetup connect</command>, DRBD
immediately tries to establish the connection. If this fails, DRBD
waits for <option>connect-int</option> seconds and then repeats. The
default value of <option>connect-int</option> is 10 seconds.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="cram-hmac-alg">
<term xml:id="cram-hmac-alg"><option>cram-hmac-alg <replaceable>hash-algorithm</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>cram-hmac-alg</secondary>
</indexterm> Configure the hash-based message authentication code
(HMAC) or secure hash algorithm to use for peer authentication. The
kernel supports a number of different algorithms, some of which may be
loadable as kernel modules. See the shash algorithms listed in
/proc/crypto. By default, <option>cram-hmac-alg</option> is unset.
Peer authentication also requires a <option>shared-secret</option> to
be configured.</para>
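<para>A minimal sketch of peer authentication in the <option>net</option>
section (the secret is a placeholder):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    cram-hmac-alg sha1;
    shared-secret "some-shared-secret";   # up to 64 characters
  }
}
]]></programlisting>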
</definition>
</drbdsetup_option>
<drbdsetup_option name="csums-alg">
<term xml:id="csum-alg"><option>csums-alg <replaceable>hash-algorithm</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>csums-alg</secondary>
</indexterm> Normally, when two nodes resynchronize, the sync target
requests a piece of out-of-sync data from the sync source, and the sync
source sends the data. With many usage patterns, a significant number of those blocks
will actually be identical.</para>
<para>When a <option>csums-alg</option> algorithm is specified, when
requesting a piece of out-of-sync data, the sync target also sends
along a hash of the data it currently has. The sync source compares
this hash with its own version of the data. It sends the sync target
the new data if the hashes differ, and tells it that the data are the
same otherwise. This reduces the network bandwidth required, at the
cost of higher cpu utilization and possibly increased I/O on the sync
target.</para>
<para>The <option>csums-alg</option> can be set to one of the secure
hash algorithms supported by the kernel; see the shash algorithms
listed in /proc/crypto. By default, <option>csums-alg</option> is
unset.</para>
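<para>For example, enabling checksum-based resync (the algorithm must be
available to the kernel; check /proc/crypto):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    csums-alg sha1;
  }
}
]]></programlisting>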
</definition>
</drbdsetup_option>
<drbdsetup_option name="csums-after-crash-only">
<term xml:id="csums-after-crash-only"><option>csums-after-crash-only</option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>csums-after-crash-only</secondary>
</indexterm> Enabling this option (and csums-alg, above) makes it possible to
use the checksum based resync only for the first resync after primary crash,
but not for later "network hiccups".</para>
<para>In most cases, blocks that are marked as need-to-be-resynced are in fact changed,
so calculating checksums, and both reading and writing the blocks on the resync target,
is effectively pure overhead.</para>
<para>The advantage of checksum based resync is mostly after primary crash recovery,
where the recovery marked larger areas (those covered by the activity log)
as need-to-be-resynced, just in case. Introduced in 8.4.5.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="data-integrity-alg">
<term xml:id="data-integrity-alg"><option>data-integrity-alg </option> <replaceable>alg</replaceable></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>data-integrity-alg</secondary>
</indexterm>
<para>DRBD normally relies on the data integrity checks built into the
TCP/IP protocol, but if a data integrity algorithm is configured, it will
additionally use this algorithm to make sure that the data received over
the network match what the sender has sent. If a data integrity error is
detected, DRBD will close the network connection and reconnect, which
will trigger a resync.</para>
<para>The <option>data-integrity-alg</option> can be set to one of the
secure hash algorithms supported by the kernel; see the shash algorithms
listed in /proc/crypto. By default, this mechanism is turned off.</para>
<para>Because of the CPU overhead involved, we recommend not using this
option in production environments. Also see the notes on data
integrity below.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="ko-count">
<term xml:id="ko-count"><option>ko-count <replaceable>number</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>ko-count</secondary>
</indexterm> If a secondary node fails to complete a write request in
<option>ko-count</option> times the <option>timeout</option> parameter,
it is excluded from the cluster. The primary node then sets the
connection to this secondary node to Standalone.
To disable this feature, you should explicitly set it to 0; defaults may change between versions.
</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="max-buffers">
<term xml:id="max-buffers"><option>max-buffers <replaceable>number</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>max-buffers</secondary>
</indexterm> Limits the memory usage per DRBD minor device on the receiving side,
or for internal buffers during resync or online-verify.
Unit is PAGE_SIZE, which is 4 KiB on most systems.
The minimum possible setting is hard coded to 32 (=128 KiB).
These buffers are used to hold data blocks while they are written to/read from disk.
To avoid possible distributed deadlocks on congestion, this setting is used
as a throttle threshold rather than a hard limit. Once more than max-buffers
pages are in use, further allocation from this pool is throttled.
Increase <option>max-buffers</option> if you cannot saturate the I/O backend on the
receiving side.</para>
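<para>For example, to allow more buffer pages on a fast storage backend
(the value is illustrative):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    max-buffers 8000;   # 8000 pages = ~32 MiB with 4 KiB pages
  }
}
]]></programlisting>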
</definition>
</drbdsetup_option>
<drbdsetup_option name="max-epoch-size">
<term xml:id="max-epoch-size"><option>max-epoch-size <replaceable>number</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>max-epoch-size</secondary>
</indexterm> Define the maximum number of write requests DRBD may issue
before issuing a write barrier. The default value is 2048, with a
minimum of 1 and a maximum of 20000. Setting this parameter to a value
below 10 is likely to decrease performance.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="on-congestion">
<term xml:id="on-congestion"><option>on-congestion <replaceable>policy</replaceable></option></term>
<term xml:id="congestion-fill"><option>congestion-fill <replaceable>threshold</replaceable></option></term>
<term xml:id="congestion-extents"><option>congestion-extents
<replaceable>threshold</replaceable></option></term>
<definition>
<para>By default, DRBD blocks when the TCP send queue is full. This prevents
applications from generating further write requests until more buffer
space becomes available again.</para>
<para>When DRBD is used together with DRBD-proxy, it can be better to use
the <option>pull-ahead</option> <option>on-congestion</option> policy,
which can switch DRBD into ahead/behind mode before the send queue is full.
DRBD then records the differences between itself and the peer in its
bitmap, but it no longer replicates them to the peer. When enough buffer
space becomes available again, the node resynchronizes with the peer and
switches back to normal replication.</para>
<para>This has the advantage of not blocking application I/O even when the
queues fill up, and the disadvantage that peer nodes can fall behind much
further. Also, while resynchronizing, peer nodes will become
inconsistent.</para>
<para>The available congestion policies are <option>block</option> (the
default) and <option>pull-ahead</option>. The
<option>congestion-fill</option> parameter defines how much data is
allowed to be "in flight" in this connection. The default value is 0,
which disables this mechanism of congestion control, with a maximum of
10 GiBytes. The <option>congestion-extents</option> parameter defines
how many bitmap extents may be active before switching into ahead/behind
mode, with the same default and limits as the <option>al-extents</option>
parameter. The <option>congestion-extents</option> parameter is
effective only when set to a value smaller than
<option>al-extents</option>.</para>
<para>Ahead/behind mode is available since DRBD 8.3.10.</para>
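<para>A sketch for a long-distance link through DRBD-proxy (the thresholds
are illustrative and depend on the buffer sizes along the path):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    protocol A;
    on-congestion pull-ahead;
    congestion-fill 400M;      # go ahead/behind once ~400 MiB are in flight
    congestion-extents 1000;   # or once 1000 bitmap extents are active
  }
}
]]></programlisting>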
</definition>
</drbdsetup_option>
<drbdsetup_option name="ping-int">
<term xml:id="ping-int"><option>ping-int <replaceable>interval</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>ping-int</secondary>
</indexterm> When the TCP/IP connection to a peer is idle for more than
<option>ping-int</option> seconds, DRBD will send a keep-alive packet
to make sure that a failed peer or network connection is detected
reasonably soon. The default value is 10 seconds, with a minimum of 1
and a maximum of 120 seconds. The unit is seconds.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="ping-timeout">
<term xml:id="ping-timeout"><option>ping-timeout <replaceable>timeout</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>ping-timeout</secondary>
</indexterm> Define the timeout for replies to keep-alive packets. If
the peer does not reply within <option>ping-timeout</option>, DRBD will
close and try to reestablish the connection. The default value is 0.5
seconds, with a minimum of 0.1 seconds and a maximum of 3 seconds. The
unit is tenths of a second.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="socket-check-timeout">
<term xml:id="socket-check-timeout"><option>socket-check-timeout <replaceable>timeout</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>socket-check-timeout</secondary>
</indexterm>In setups involving a DRBD-proxy and connections that experience a lot of
buffer-bloat, it might be necessary to set <option>ping-timeout</option> to an
unusually high value. By default, DRBD waits the same amount of time to check
whether a newly established TCP connection is stable. Since the DRBD-proxy is
usually located in the same data center, such a long wait time may hinder
DRBD's connect process.</para>
<para>In such setups, <option>socket-check-timeout</option> should be set to
at least the round-trip time between DRBD and DRBD-proxy; in most
cases, to 1.</para>
<para>The default unit is tenths of a second, the default value is 0 (which causes
DRBD to use the value of <option>ping-timeout</option> instead).
Introduced in 8.4.5.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="protocol">
<term xml:id="protocol"><option>protocol <replaceable>name</replaceable></option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>protocol</secondary>
</indexterm>
<para>Use the specified protocol on this connection. The supported
protocols are:
<variablelist>
<varlistentry>
<term xml:id="A"><option>A</option></term>
<listitem>
<para>Writes to the DRBD device complete as soon as they have
reached the local disk and the TCP/IP send buffer.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="B"><option>B</option></term>
<listitem>
<para>Writes to the DRBD device complete as soon as they have
reached the local disk, and all peers have acknowledged the
receipt of the write requests.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="C"><option>C</option></term>
<listitem>
<para>Writes to the DRBD device complete as soon as they have
reached the local and all remote disks.</para>
</listitem>
</varlistentry>
</variablelist>
</para>
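<para>For example, selecting fully synchronous replication:</para>
<programlisting><![CDATA[
resource r0 {
  net {
    protocol C;   # writes complete only after all disks have the data
  }
}
]]></programlisting>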
</definition>
</drbdsetup_option>
<drbdsetup_option name="rcvbuf-size">
<term xml:id="rcvbuf-size"><option>rcvbuf-size <replaceable>size</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>rcvbuf-size</secondary>
</indexterm> Configure the size of the TCP/IP receive buffer. A value
of 0 (the default) causes the buffer size to adjust dynamically.
This parameter usually does not need to be set, but it can be set
to a value up to 10 MiB. The default unit is bytes.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="rr-conflict">
<term xml:id="rr-conflict"><option>rr-conflict</option> <replaceable>policy</replaceable></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>rr-conflict</secondary>
</indexterm>
<para>This option helps resolve cases in which the outcome of the resync
decision is incompatible with the current role assignment in the cluster. The
defined policies are:</para>
<variablelist>
<varlistentry>
<term xml:id="disconnect"><option>disconnect</option></term>
<listitem>
<para>No automatic resynchronization, simply disconnect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="violently"><option>violently</option></term>
<listitem>
<para>Resync to the primary node is allowed, violating the assumption that data on
a block device are stable for one of the nodes. <emphasis>Do not
use this option, it is dangerous.</emphasis></para> <!-- What would happen? -->
</listitem>
</varlistentry>
<varlistentry>
<term xml:id="call-pri-lost"><option>call-pri-lost</option></term>
<listitem>
<para>Call the <option>pri-lost</option> handler on one of the machines. The handler is
expected to reboot the machine, which puts it into secondary role.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- FIXME: It is completely unclear how this option interacts with
after-sb-0pri, after-sb-1pri, and after-sb-2pri. -->
<!-- FIXME: Refer to after-sb-0pri, after-sb-1pri, and after-sb-2pri. -->
</definition>
</drbdsetup_option>
<drbdsetup_option name="shared-secret">
<term xml:id="shared-secret"><option>shared-secret <replaceable>secret</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>shared-secret</secondary>
</indexterm> Configure the shared secret used for peer authentication.
The secret is a string of up to 64 characters. Peer authentication also
requires the <option>cram-hmac-alg</option> parameter to be set.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="sndbuf-size">
<term xml:id="sndbuf-size"><option>sndbuf-size <replaceable>size</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>sndbuf-size</secondary>
</indexterm> Configure the size of the TCP/IP send buffer. Since DRBD
8.0.13 / 8.2.7, a value of 0 (the default) causes the buffer size to
adjust dynamically. Values below 32 KiB are harmful to the throughput
on this connection. Large buffer sizes can be useful especially when
protocol A is used over high-latency networks; the maximum value
supported is 10 MiB.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="tcp-cork">
<term xml:id="tcp-cork"><option>tcp-cork</option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>tcp-cork</secondary>
</indexterm>
<para>By default, DRBD uses the TCP_CORK socket option to prevent the
kernel from sending partial messages; this results in fewer and bigger
packets on the network. Some network stacks can perform worse with this
optimization. On these, the <option>tcp-cork</option> parameter can be
used to turn this optimization off.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="timeout">
<term xml:id="timeout"><option>timeout <replaceable>time</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>timeout</secondary>
</indexterm> Define the timeout for replies over the network: if a peer
node does not send an expected reply within the specified <option>timeout</option>,
it is considered dead and the TCP/IP connection is closed. The timeout
value must be lower than <option>connect-int</option> and lower than
<option>ping-int</option>. The default is 6 seconds; the value is
specified in tenths of a second.</para>
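<para>A sketch of consistent timeout settings (remember that
<option>timeout</option> is given in tenths of a second, while
<option>ping-int</option> and <option>connect-int</option> are given in
seconds):</para>
<programlisting><![CDATA[
resource r0 {
  net {
    timeout     60;   # 6 seconds; must stay below ping-int and connect-int
    ping-int    10;   # seconds
    connect-int 10;   # seconds
  }
}
]]></programlisting>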
</definition>
</drbdsetup_option>
<drbdsetup_option name="use-rle">
<term xml:id="use-rle"><option>use-rle</option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>use-rle</secondary>
</indexterm> Each replicated device on a cluster node has a separate
bitmap for each of its peer devices. The bitmaps are used for tracking
the differences between the local and peer device: depending on the
cluster state, a disk range can be marked as different from the peer in
the device's bitmap, in the peer device's bitmap, or in both bitmaps.
When two cluster nodes connect, they exchange their bitmaps, and each node
computes the union of the local and peer bitmap to determine the overall
differences.</para>
<para>Bitmaps of very large devices are also relatively large, but they
usually compress very well using run-length encoding. This can save
time and bandwidth for the bitmap transfers.</para>
<para>The <option>use-rle</option> parameter determines if run-length
encoding should be used. It is on by default since DRBD 8.4.0.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="verify-alg">
<term xml:id="verify-alg"><option>verify-alg <replaceable>hash-algorithm</replaceable></option></term>
<definition>
<para>Online verification (<command moreinfo="none">drbdadm
verify</command>) computes and compares checksums of disk blocks
(i.e., hash values) in order to detect if they differ. The
<option>verify-alg</option> parameter determines which algorithm to use
for these checksums. It must be set to one of the secure hash algorithms
supported by the kernel before online verify can be used; see the shash
algorithms listed in /proc/crypto.</para>
<para>We recommend to schedule online verifications regularly during
low-load periods, for example once a month. Also see the notes on data
integrity below.</para>
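<para>A sketch of such a setup: configure a checksum algorithm and run the
verification periodically (the resource name <replaceable>r0</replaceable>
and the algorithm choice are placeholders):</para>
<programlisting>net {
    verify-alg sha256;   # must be a shash algorithm listed in /proc/crypto
}</programlisting>
<screen># run on one node, e.g. from a monthly cron job
drbdadm verify r0</screen>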
</definition>
</drbdsetup_option>
<drbdsetup_option name="discard-my-data">
<term xml:id="discard-my-data"><option>discard-my-data</option></term>
<definition>
<para>Discard the local data and resynchronize with the peer that has the
most up-to-data data. Use this option to manually recover from a
split-brain situation.</para>
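<para>For example, to resolve a split brain manually, one would reconnect
with this option on the node whose modifications are to be thrown away
(the resource name <replaceable>r0</replaceable> is a placeholder):</para>
<screen># on the node whose data is to be discarded
drbdadm connect --discard-my-data r0</screen>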
</definition>
</drbdsetup_option>
<drbdsetup_option name="tentative">
<term xml:id="tentative"><option>tentative</option></term>
<definition>
<para>Only determine if a connection to the peer can be established and
if a resync is necessary (and in which direction) without actually
establishing the connection or starting the resync. Check the system
log to see what DRBD would do without the <option>--tentative</option>
option.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="auto-promote">
<term xml:id="auto-promote"><option>auto-promote <replaceable>bool-value</replaceable></option></term>
<definition>
<indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>auto-promote</secondary>
</indexterm>
<para>A resource must be promoted to primary role before any of its devices
can be mounted or opened for writing.</para>
<para>Before DRBD 9, this could only be done explicitly ("drbdadm
primary"). Since DRBD 9, the <option>auto-promote</option> parameter
allows to automatically promote a resource to primary role when one of
its devices is mounted or opened for writing. As soon as all devices are
unmounted or closed with no more remaining users, the role of the
resource changes back to secondary.</para>
<para>Automatic promotion only succeeds if the cluster state allows it
(that is, if an explicit <command moreinfo="none">drbdadm
primary</command> command would succeed). Otherwise, mounting or
opening the device fails as it already did before DRBD 9: the
<citerefentry><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>
system call fails with errno set to EROFS (Read-only file system); the
<citerefentry><refentrytitle>open</refentrytitle><manvolnum>2</manvolnum></citerefentry>
system call fails with errno set to EMEDIUMTYPE (wrong medium
type).</para>
<para>Irrespective of the <option>auto-promote</option> parameter, if a
device is promoted explicitly (<command moreinfo="none">drbdadm
primary</command>), it also needs to be demoted explicitly (<command
moreinfo="none">drbdadm secondary</command>).</para>
<para>The <option>auto-promote</option> parameter is available since DRBD
9.0.0, and defaults to <constant>yes</constant>.</para>
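<para>As an illustration of the resulting behavior with
<option>auto-promote</option> enabled (device name and mount point are
placeholders):</para>
<screen># promotes the resource to primary implicitly
mount /dev/drbd0 /mnt/data
# ... use the file system ...
# demotes the resource back to secondary once the last user is gone
umount /mnt/data</screen>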
</definition>
</drbdsetup_option>
<drbdsetup_option name="cpu-mask">
<term xml:id="cpu-mask"><option>cpu-mask <replaceable>cpu-mask</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>cpu-mask</secondary>
</indexterm> Set the cpu affinity mask for DRBD kernel threads. The
cpu mask is specified as a hexadecimal number. The default value is 0,
which lets the scheduler decide which kernel threads run on which CPUs.
CPU numbers in <option>cpu-mask</option> which do not exist in the
system are ignored.</para>
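<para>The mask is a bit mask over CPU numbers. For example, to pin the DRBD
kernel threads to CPUs 0 and 1 (binary 11, hexadecimal 3), a sketch
assuming a resource-level <option>options</option> section:</para>
<programlisting>options {
    cpu-mask 3;   # bits 0 and 1 set: restrict threads to CPUs 0 and 1
}</programlisting>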
</definition>
</drbdsetup_option>
<drbdsetup_option name="on-no-data-accessible">
<term xml:id="on-no-data-accessible"><option>on-no-data-accessible
<replaceable>policy</replaceable></option></term>
<definition>
<para>Determine how to deal with I/O requests when the requested data is
not available locally or remotely (for example, when all disks have
failed). The defined policies are:
<variablelist>
<varlistentry>
<term xml:id="io-error"><option>io-error</option></term>
<listitem><para>
System calls fail with errno set to EIO.
</para></listitem>
</varlistentry>
<varlistentry>
<term xml:id="suspend-io"><option>suspend-io</option></term>
<listitem><para>
The resource suspends I/O. I/O can be resumed by (re)attaching
the lower-level device, by connecting to a peer which has
access to the data, or by forcing DRBD to resume I/O with
<command moreinfo="none">drbdadm resume-io
<replaceable>res</replaceable></command>. When no data is
available, forcing I/O to resume will result in the same
behavior as the <option>io-error</option> policy.
</para></listitem>
</varlistentry>
</variablelist>
This setting is available since DRBD 8.3.9; the default policy is
<option>io-error</option>. </para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="peer-ack-window">
<term xml:id="peer-ack-window"><option>peer-ack-window <replaceable>value</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>peer-ack-window</secondary>
</indexterm>
On each node and for each device, DRBD maintains a bitmap of the
differences between the local and remote data for each peer device.
For example, in a three-node setup (nodes A, B, C) each with a single
device, every node maintains one bitmap for each of its peers.</para>
<para>When nodes receive write requests, they know how to update the
bitmaps for the writing node, but not how to update the bitmaps between
themselves. In this example, when a write request propagates from node
A to B and C, nodes B and C know that they have the same data as node
A, but not whether or not they both have the same data.</para>
<para>As a remedy, the writing node occasionally sends peer-ack packets
to its peers which tell them which state they are in relative to each
other.</para>
<para>The <option>peer-ack-window</option> parameter specifies how much
data a primary node may send before sending a peer-ack packet. A low
value causes increased network traffic; a high value causes less
network traffic but higher memory consumption on secondary nodes and
higher resync times between the secondary nodes after primary node
failures. (Note: peer-ack packets may be sent for other reasons as
well, e.g. membership changes or expiry of the
<option>peer-ack-delay</option> timer.)</para>
<para>The default value for <option>peer-ack-window</option> is 2 MiB;
the default unit is sectors. This option is available since DRBD
9.0.0.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="peer-ack-delay">
<term xml:id="peer-ack-delay"><option>peer-ack-delay <replaceable>expiry-time</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>peer-ack-delay</secondary>
</indexterm>
If no new write request is issued for <replaceable>expiry-time</replaceable>
after the last write request has finished, a peer-ack packet is sent. If a
new write request is issued before the timer expires, the timer is reset to
<replaceable>expiry-time</replaceable>. (Note: peer-ack packets may be sent
for other reasons as well, e.g. membership changes or the
<option>peer-ack-window</option> option.)</para>
<para>This parameter may influence resync behavior on remote nodes: peer
nodes need to wait until they receive a peer-ack before releasing a lock
on an AL extent, and resync operations between peers may need to wait for
these locks.</para>
<para>The default value for <option>peer-ack-delay</option> is 100
milliseconds; the default unit is milliseconds. This option is available
since DRBD 9.0.0.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="degr-wfc-timeout">
<term xml:id="degr-wfc-timeout"><option>degr-wfc-timeout <replaceable>timeout</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>degr-wfc-timeout</secondary>
</indexterm> Define how long to wait until all peers are
connected in case the cluster consisted of a single node only
when the system went down. This parameter is usually set to a
value smaller than <option>wfc-timeout</option>. The
assumption here is that peers which were unreachable before a
reboot are less likely to be reachable after the reboot, so
waiting is less likely to help.</para>
<para>The timeout is specified in seconds. The default value is 0,
which stands for an infinite timeout. Also see the
<option>wfc-timeout</option> parameter.</para>
<!-- FIXME: How does wfc-timeout vs. degr-wfc-timeout work with
more than two nodes in the cluster? If a cluster is only
"degraded" when only one node remains and only one out of
three nodes fails, we will still wait for that one node for
wfc-timeout, which might be forever. -->
</definition>
</drbdsetup_option>
<drbdsetup_option name="outdated-wfc-timeout">
<term xml:id="outdated-wfc-timeout"><option>outdated-wfc-timeout <replaceable>timeout</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>outdated-wfc-timeout</secondary>
</indexterm> Define how long to wait until all peers are
connected if all peers were outdated when the system went down.
This parameter is usually set to a value smaller than
<option>wfc-timeout</option>. The assumption here is that an
outdated peer cannot have become primary in the meantime, so we
don't need to wait for it as long as for a node which was alive
before.</para>
<para>The timeout is specified in seconds. The default value is 0,
which stands for an infinite timeout. Also see the
<option>wfc-timeout</option> parameter.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="wait-after-sb">
<term xml:id="wait-after-sb"><option>wait-after-sb</option></term>
<definition>
<para>This parameter causes DRBD to continue waiting in the init
script even when a split-brain situation has been detected, and
the nodes therefore refuse to connect to each other.</para>
</definition>
</drbdsetup_option>
<drbdsetup_option name="wfc-timeout">
<term xml:id="wfc-timeout"><option>wfc-timeout <replaceable>timeout</replaceable></option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>wfc-timeout</secondary>
</indexterm> Define how long the init script waits until all peers are
connected. This can be useful in combination with a cluster manager
which cannot manage DRBD resources: when the cluster manager starts,
the DRBD resources will already be up and running. With a more capable
cluster manager such as Pacemaker, it makes more sense to let the
cluster manager control DRBD resources. The timeout is specified in
seconds. The default value is 0, which stands for an infinite timeout.
Also see the <option>degr-wfc-timeout</option> parameter.</para>
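<para>These wait-for-connection timeouts belong to the
<option>startup</option> section. A sketch with finite timeouts (the values
are illustrative assumptions):</para>
<programlisting>startup {
    wfc-timeout          120;  # wait up to 2 minutes for all peers
    degr-wfc-timeout      60;  # shorter wait after a degraded shutdown
    outdated-wfc-timeout  30;  # shortest wait for peers known to be outdated
}</programlisting>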
</definition>
</drbdsetup_option>
<drbdsetup_option name="quorum">
<term xml:id="quorum"><option>quorum <replaceable>value</replaceable></option>
</term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>quorum</secondary>
</indexterm> When activated, a cluster partition requires quorum
in order to modify the replicated data set. That means a node in
the cluster partition can only be promoted to primary if the
cluster partition has quorum. Every node with a disk directly
connected to the node that should be promoted counts.
If a primary node needs to execute a write request but its
cluster partition has lost quorum, it will freeze I/O or reject
the write request with an error (depending on the
<option>on-no-quorum</option> setting). Upon losing quorum, a primary
always invokes the <option>quorum-lost</option> handler. The handler is
intended for notification purposes; its return code is ignored.</para>
<para>The option's value might be set to <option>off</option>,
<option>majority</option>, <option>all</option> or a numeric value. If you
set it to a numeric value, make sure that the value is greater than half
of your number of nodes.
Quorum is a mechanism to avoid data divergence, it might be used instead
of fencing when there are more than two repicas. It defaults to
<option>off</option></para>
<para>If all missing nodes are marked as outdated, a partition always has
quorum, no matter how small it is. I.e. If you disconnect all secondary
nodes gracefully a single primary continues to operate. In the moment a
single secondary is lost, it has to be assumed that it forms a partition
with all the missing outdated nodes. In case my partition might
be smaller than the other, quorum is lost in this moment.</para>
<para>In case you want to allow permanently diskless nodes to
gain quorum it is recommendet to not use <option>majority</option> or
<option>all</option>. It is recommended to specify an absolute number,
since DBRD's heuristic to determine the complete number of diskfull
nodes in the cluster is unreliable.</para>
<para>The quorum implementation is available starting with the DRBD kernel
driver version 9.0.7.</para>
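<para>A sketch for a three-node cluster, combining quorum with the
<option>on-no-quorum</option> policy described below (the values are
illustrative assumptions):</para>
<programlisting>options {
    quorum        majority;   # 2 out of 3 nodes required
    on-no-quorum  io-error;   # fail writes instead of freezing I/O
}</programlisting>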
</definition>
</drbdsetup_option>
<drbdsetup_option name="quorum-minimum-redundancy">
<term xml:id="quorum-minimum-redundancy"><option>quorum-minimum-redundancy <replaceable>value</replaceable></option>
</term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>quorum-minimum-redundancy</secondary>
</indexterm> This option sets the minimal required number of nodes with
an UpToDate disk to allow the partition to gain quorum. This is a different
requirement than the plain <option>quorum</option> option expresses.</para>
<para>The option's value might be set to <option>off</option>,
<option>majority</option>, <option>all</option> or a numeric value. If you
set it to a numeric value, make sure that the value is greater than half
of your number of nodes.</para>
<para>In case you want to allow permanently diskless nodes to
gain quorum it is recommendet to not use <option>majority</option> or
<option>all</option>. It is recommended to specify an absolute number,
since DBRD's heuristic to determine the complete number of diskfull
nodes in the cluster is unreliable.</para>
<para>This option is available starting with the DRBD kernel
driver version 9.0.10.</para>
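<para>For example, to require at least two UpToDate replicas before a
partition may gain quorum (a sketch; the numbers assume a three-node,
all-diskful cluster):</para>
<programlisting>options {
    quorum                    2;
    quorum-minimum-redundancy 2;
}</programlisting>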
</definition>
</drbdsetup_option>
<drbdsetup_option name="on-no-quorum">
<term xml:id="on-no-quorum"><option>on-no-quorum <group choice="req" rep="norepeat">
<arg choice="plain" rep="norepeat">io-error</arg>
<arg choice="plain" rep="norepeat">suspend-io</arg>
</group>
</option></term>
<definition>
<para><indexterm significance="normal">
<primary>drbd.conf</primary>
<secondary>quorum</secondary>
</indexterm> By default DRBD freezes IO on a device, that lost quorum.
By setting the <option>on-no-quorum</option> to <option>io-error</option> it
completes all IO operations with an error if quorum ist lost.</para>
<para>The <option>on-no-quorum</option> options is available starting with the DRBD kernel
driver version 9.0.8.</para>
</definition>
</drbdsetup_option>
</drbdsetup_options>