= The OCF Resource Agent Developer's Guide
== Introduction
This document serves as a guide and reference for all developers,
maintainers, and contributors working on OCF (Open Cluster Framework)
compliant cluster resource agents. It explains the anatomy and general
functionality of a resource agent, illustrates the resource agent API,
and provides valuable hints and tips to resource agent authors.
=== What is a resource agent?
A resource agent is an executable that manages a cluster resource. No
formal definition of a cluster resource exists, other than "anything a
cluster manages is a resource." Cluster resources can be as diverse as
IP addresses, file systems, database services, and entire virtual
machines -- to name just a few examples.
=== Who or what uses a resource agent?
Any Open Cluster Framework (OCF) compliant cluster management
application is capable of managing resources using the resource agents
described in this document. At the time of writing, two OCF compliant
cluster management applications exist for the Linux platform:
* _Pacemaker_, a cluster manager supporting both the Corosync and
Heartbeat cluster messaging frameworks. Pacemaker evolved out of the
Linux-HA project.
* _RGmanager_, the cluster manager bundled in Red Hat Cluster
Suite. It supports the Corosync cluster messaging framework
exclusively.
=== Which language is a resource agent written in?
An OCF compliant resource agent can be implemented in _any_
programming language. The API is not language specific. However, most
resource agents are implemented as shell scripts, which is why this
guide primarily uses example code written in shell language.
=== Is there a naming convention?
Yes! We have agreed to the following convention for resource agent
names: Please name resource agents using lower case letters, with
words separated by dashes (+example-agent-name+).
Existing agents may or may not follow this convention, but it is the
intention to make sure future agents follow this rule.
== API definitions
=== Environment variables
A resource agent receives all configuration information about the
resource it manages via environment variables. The names of these
environment variables are always the name of the resource parameter,
prefixed with +OCF_RESKEY_+. For example, if the resource has an +ip+
parameter set to +192.168.1.1+, then the resource agent will have
access to an environment variable +OCF_RESKEY_ip+ holding that value.
For any resource parameter that the user is not required to set
-- that is, one whose definition in the resource agent metadata
does not specify +required="true"+ -- the resource agent must do one
of the following:
* Provide a reasonable default. This should be advertised in the
metadata. By convention, the resource agent uses a variable named
+OCF_RESKEY_<parametername>_default+ that holds this default.
* Alternatively, cater correctly for the value being empty.
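
As a minimal sketch of the default convention, the fragment below sets
a default for the +superfrobnicate+ parameter used in this guide's
example metadata. The +${var=default}+ form assigns the default only
when the variable is unset, so an explicitly empty value is passed
through unchanged and still has to be handled by the agent.

[source,bash]
--------------------------------------------------------------------------
# Default for the optional "superfrobnicate" parameter (the parameter
# name is borrowed from the example metadata in this guide).
OCF_RESKEY_superfrobnicate_default="false"

# Fall back to the default only if the cluster manager did not set the
# parameter at all; a value configured by the user is left untouched.
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}

echo "superfrobnicate is set to: ${OCF_RESKEY_superfrobnicate}"
--------------------------------------------------------------------------

Keeping the default in its own variable makes it possible to reuse the
same value when printing the metadata.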
In addition, the cluster manager may also support _meta_ resource
parameters. These do not apply directly to the resource configuration,
but rather specify _how_ the cluster resource manager is expected to manage
the resource. For example, the Pacemaker cluster manager uses the
+target-role+ meta parameter to specify whether the resource should be
started or stopped.
Meta parameters are passed into the resource agent in the
+OCF_RESKEY_CRM_meta_+ namespace, with any hyphens converted to
underscores. Thus, the +target-role+ attribute maps to an environment
variable named +OCF_RESKEY_CRM_meta_target_role+.
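
The name mapping can be sketched as a small shell function.
+meta_param_env_name+ is a hypothetical helper written for this
illustration; it is not part of +ocf-shellfuncs+.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical helper: print the name of the environment variable that
# carries the given meta parameter (hyphens become underscores).
meta_param_env_name() {
    printf 'OCF_RESKEY_CRM_meta_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

meta_param_env_name target-role
# prints: OCF_RESKEY_CRM_meta_target_role
--------------------------------------------------------------------------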
The <<_script_variables>> section contains other system environment
variables.
=== Actions
Any resource agent must support one command-line argument which
specifies the action the resource agent is about to execute. The
following actions must be supported by any resource agent:
* +start+ -- starts the resource.
* +stop+ -- shuts down the resource.
* +monitor+ -- queries the resource for its state.
* +meta-data+ -- dumps the resource agent metadata.
In addition, resource agents may optionally support the following
actions:
* +promote+ -- turns a resource into the +Master+ role (Master/Slave
resources only).
* +demote+ -- turns a resource into the +Slave+ role (Master/Slave
resources only).
* +migrate_to+ and +migrate_from+ -- implement live migration of
resources.
* +validate-all+ -- validates a resource's configuration.
* +usage+ or +help+ -- displays a usage message when the resource
agent is invoked from the command line, rather than by the cluster
manager.
* +notify+ -- informs the resource about changes in the state of other clones.
* +status+ -- historical (deprecated) synonym for +monitor+.
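
The calling convention (parameters in the environment, the action as
the only argument) can be tried out with a stand-in. In the sketch
below, +demo_agent+ is a hypothetical shell function that plays the
role of an agent executable; it does not manage anything and its
+meta-data+ output is only a placeholder.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical stand-in for an agent executable. A real agent is a
# separate file that the cluster manager runs as "<agent> <action>".
demo_agent() {
    case "$1" in
        start|stop|monitor)
            echo "action=$1 ip=${OCF_RESKEY_ip}"
            return 0    # OCF_SUCCESS
            ;;
        meta-data)
            echo '<resource-agent name="demo"/>'    # placeholder only
            return 0
            ;;
        *)
            return 3    # OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

# Parameters arrive in the environment, the action as the sole argument.
OCF_RESKEY_ip=192.168.1.1
export OCF_RESKEY_ip

demo_agent monitor
demo_agent promote || echo "promote exited with $?"
--------------------------------------------------------------------------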
=== Timeouts
Action timeouts are enforced outside the resource agent proper. It is
the cluster manager's responsibility to monitor how long a resource
agent action has been running, and terminate it if it does not meet
its completion deadline. Thus, resource agents need not themselves
check for any timeout expiry.
Resource agents can, however, _advise_ the user of sensible timeout
values (which, when correctly set, will be duly enforced by the
cluster manager). See <<_metadata,the following section>> for details
on how a resource agent advertises its suggested timeouts.
=== Metadata
Every resource agent must describe its own purpose and supported
parameters in a set of XML metadata. This metadata is used by cluster
management applications for on-line help, and resource agent man pages
are generated from it as well. The following is a fictitious set of
metadata from an imaginary resource agent:
[source,xml]
--------------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">
This is a fictitious example resource agent written for the
OCF Resource Agent Developers Guide.
  </longdesc>
  <shortdesc lang="en">Example resource agent
for budding OCF RA developers</shortdesc>
  <parameters>
    <parameter name="eggs" unique="0" required="1">
      <longdesc lang="en">
Number of eggs, an example numeric parameter
      </longdesc>
      <shortdesc lang="en">Number of eggs</shortdesc>
      <content type="integer"/>
    </parameter>
    <parameter name="superfrobnicate" unique="0" required="0">
      <longdesc lang="en">
Enable superfrobnication, an example boolean parameter
      </longdesc>
      <shortdesc lang="en">Enable superfrobnication</shortdesc>
      <content type="boolean" default="false"/>
    </parameter>
    <parameter name="datadir" unique="0" required="1">
      <longdesc lang="en">
Data directory, an example string parameter
      </longdesc>
      <shortdesc lang="en">Data directory</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start"        timeout="20" />
    <action name="stop"         timeout="20" />
    <action name="monitor"      timeout="20"
                                interval="10" depth="0" />
    <action name="notify"       timeout="20" />
    <action name="reload"       timeout="20" />
    <action name="migrate_to"   timeout="20" />
    <action name="migrate_from" timeout="20" />
    <action name="meta-data"    timeout="5" />
    <action name="validate-all" timeout="20" />
  </actions>
</resource-agent>
--------------------------------------------------------------------------
The +resource-agent+ element, of which there must be exactly one per
resource agent, defines the resource agent +name+ and +version+. The
+version+ element specifies the version of the OCF standard that the
metadata complies with.
The +longdesc+ and +shortdesc+ elements in +resource-agent+ provide a
long and short description of the resource agent's
functionality. While +shortdesc+ is a one-line description of what
the resource agent does and is usually used in terse listings,
+longdesc+ should give a full-blown description of the resource agent
in as much detail as possible.
The +parameters+ element describes the resource agent parameters, and
should hold any number of +parameter+ children -- one for each
parameter that the resource agent supports.
Every +parameter+ should, like the +resource-agent+ as a whole, come
with a +shortdesc+ and a +longdesc+, and also a +content+ child that
describes the parameter's expected content.
Four attributes, split between the +parameter+ element and its
+content+ child, describe a parameter further:
* +type+ (on +content+) describes the parameter type (+string+,
+integer+, or +boolean+). If unset, +type+ defaults to +string+.
* +required+ (on +parameter+) indicates whether setting the parameter
is mandatory (+required="true"+) or optional (+required="false"+).
The example above writes these values as +1+ and +0+.
* For optional parameters, it is customary to provide a sensible
default via the +default+ attribute (on +content+).
* Finally, the +unique+ attribute (on +parameter+; allowed values:
+true+ or +false+) indicates that a specific value must be unique
across the cluster, for this parameter of this particular resource
type. For example, a highly available floating IP address is declared
+unique+ -- as that one IP address should run only once throughout the
cluster, avoiding duplicates.
The +actions+ list defines the actions that the resource agent
advertises as supported.
Every +action+ should list its own +timeout+ value. This is a hint to
the user as to the _minimum_ timeout that should be configured for the
action. It caters for the fact that some resources are quick to start
and stop (IP addresses or filesystems, for example), while others may
take several minutes to do so (such as databases).
In addition, recurring actions (such as +monitor+) should also specify
a recommended minimum +interval+, which is the time between two
consecutive invocations of the same action. Like +timeout+, this value
does not constitute a default -- it is merely a hint to the user as to
the minimum action interval to configure.
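
In a shell-based agent, the metadata is commonly printed from a
function by means of a here-document. The following is a sketch of
such a function for the fictitious +foobar+ agent, with the parameter
and action lists abridged. It assumes that the default for
+superfrobnicate+ is kept in +OCF_RESKEY_superfrobnicate_default+, so
that the advertised default and the one used by the agent cannot
drift apart.

[source,bash]
--------------------------------------------------------------------------
OCF_RESKEY_superfrobnicate_default="false"

# Print the (abridged) metadata. The here-document delimiter is left
# unquoted on purpose, so that the default variable is expanded.
foobar_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">
This is a fictitious example resource agent written for the
OCF Resource Agent Developers Guide.
  </longdesc>
  <shortdesc lang="en">Example resource agent</shortdesc>
  <parameters>
    <parameter name="superfrobnicate" unique="0" required="0">
      <longdesc lang="en">
Enable superfrobnication, an example boolean parameter
      </longdesc>
      <shortdesc lang="en">Enable superfrobnication</shortdesc>
      <content type="boolean" default="${OCF_RESKEY_superfrobnicate_default}"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start"        timeout="20" />
    <action name="stop"         timeout="20" />
    <action name="monitor"      timeout="20" interval="10" depth="0" />
    <action name="meta-data"    timeout="5" />
    <action name="validate-all" timeout="20" />
  </actions>
</resource-agent>
EOF
}
--------------------------------------------------------------------------

Calling +foobar_meta_data+ writes the XML document to standard output,
which is where the cluster manager reads it from when it invokes the
+meta-data+ action.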
== Return codes
For any invocation, resource agents must exit with a defined return
code that informs the caller of the outcome of the invoked
action. The return codes are explained in detail in the following
subsections.
=== +OCF_SUCCESS+ (0)
The action completed successfully. This is the expected return code
for any successful +start+, +stop+, +promote+, +demote+,
+migrate_from+, +migrate_to+, +meta-data+, +help+, and +usage+ action.
For +monitor+ (and its deprecated alias, +status+), however, a
modified convention applies:
* For primitive (stateless) resources, +OCF_SUCCESS+ from +monitor+
means that the resource is running. Non-running and gracefully
shut-down resources must instead return +OCF_NOT_RUNNING+.
* For master/slave (stateful) resources, +OCF_SUCCESS+ from +monitor+
means that the resource is running _in Slave mode_. Resources
running in Master mode must instead return +OCF_RUNNING_MASTER+, and
gracefully shut-down resources must instead return
+OCF_NOT_RUNNING+.
=== +OCF_ERR_GENERIC+ (1)
The action returned a generic error. A resource agent should use this
exit code only when none of the more specific error codes, defined
below, accurately describes the problem.
The cluster resource manager interprets this exit code as a _soft_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
+OCF_ERR_GENERIC+ in-place -- usually by restarting the resource on
the same node.
=== +OCF_ERR_ARGS+ (2)
The resource's configuration is not valid on this machine. For
example, it may refer to a location that is not found on the node.
NOTE: The resource agent should not return this error when instructed
to perform an action that it does not support. Instead, under those
circumstances, it should return +OCF_ERR_UNIMPLEMENTED+.
=== +OCF_ERR_UNIMPLEMENTED+ (3)
The resource agent was instructed to execute an action that the agent
does not implement.
Not all resource agent actions are mandatory. +promote+, +demote+,
+migrate_to+, +migrate_from+, and +notify+, are all optional actions
which the resource agent may or may not implement. When a non-stateful
resource agent is misconfigured as a master/slave resource, for
example, then the resource agent should alert the user about this
misconfiguration by returning +OCF_ERR_UNIMPLEMENTED+ on the +promote+
and +demote+ actions.
=== +OCF_ERR_PERM+ (4)
The action failed due to insufficient permissions. This may be due to
the agent not being able to open a certain file, to listen on a
specific socket, to write to a directory, or similar.
The cluster resource manager interprets this exit code as a _hard_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
this error by restarting the resource on a different node (where the
permission problem may not exist).
=== +OCF_ERR_INSTALLED+ (5)
The action failed because a required component is missing on the node
where the action was executed. This may be due to a required binary
not being executable, or a vital configuration file being unreadable.
The cluster resource manager interprets this exit code as a _hard_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
this error by restarting the resource on a different node (where the
required files or binaries may be present).
=== +OCF_ERR_CONFIGURED+ (6)
The action failed because the user misconfigured the resource. For
example, the user may have configured an alphanumeric string for a
parameter that really should be an integer.
The cluster resource manager interprets this exit code as a _fatal_
error. Since this is a configuration error that is present
cluster-wide, it would make no sense to recover such a resource on a
different node, let alone in-place. When a resource fails with this
error, the cluster manager will attempt to shut down the resource, and
wait for administrator intervention.
=== +OCF_NOT_RUNNING+ (7)
The resource was found not to be running. This is an exit code that
may be returned by the +monitor+ action exclusively. Note that this
implies that the resource has either _gracefully_ shut down, or has
never been started.
If the resource is not running due to an error condition, the
+monitor+ action should instead return one of the +OCF_ERR_+ exit
codes or +OCF_FAILED_MASTER+.
=== +OCF_RUNNING_MASTER+ (8)
The resource was found to be running in the +Master+ role. This
applies only to stateful (Master/Slave) resources, and only to
their +monitor+ action.
Note that there is no specific exit code for "running in slave
mode". This is because there is no functional distinction between a
primitive resource running normally, and a stateful resource running
as a slave. The +monitor+ action of a stateful resource running
normally in the +Slave+ role should simply return +OCF_SUCCESS+.
=== +OCF_FAILED_MASTER+ (9)
The resource was found to have failed in the +Master+ role. This
applies only to stateful (Master/Slave) resources, and only to their
+monitor+ action.
The cluster resource manager interprets this exit code as a _soft_
error. This means that unless specifically configured otherwise, the
resource manager will attempt to recover a resource which failed with
+OCF_FAILED_MASTER+ in-place -- usually by demoting, stopping,
starting and then promoting the resource on the same node.
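
For reference, the numeric values from the subsections above can be
written down as shell variables. In a real agent they are expected to
come from +ocf-shellfuncs+; the sketch below only provides stand-ins
for exercising agent functions outside a cluster, and
+foobar_rc_name+ is a hypothetical helper, not a library function.

[source,bash]
--------------------------------------------------------------------------
# Stand-in definitions; ": ${VAR=value}" leaves any value that is
# already set (for example by ocf-shellfuncs) untouched.
: ${OCF_SUCCESS=0}
: ${OCF_ERR_GENERIC=1}
: ${OCF_ERR_ARGS=2}
: ${OCF_ERR_UNIMPLEMENTED=3}
: ${OCF_ERR_PERM=4}
: ${OCF_ERR_INSTALLED=5}
: ${OCF_ERR_CONFIGURED=6}
: ${OCF_NOT_RUNNING=7}
: ${OCF_RUNNING_MASTER=8}
: ${OCF_FAILED_MASTER=9}

# Hypothetical helper: translate a numeric exit code back into its
# symbolic name, for example when reading test output.
foobar_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}

foobar_rc_name 7
# prints: OCF_NOT_RUNNING
--------------------------------------------------------------------------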
== Resource agent structure
A typical (shell-based) resource agent contains standard structural
items, in the order listed in this section. The section describes the
expected behavior of a resource agent with respect to the various
actions it supports, using a fictitious resource agent named +foobar+
as an example.
=== Resource agent interpreter
Any resource agent implemented as a script must specify its
interpreter using standard "shebang" (+#!+) header syntax.
[source,bash]
--------------------------------------------------------------------------
#!/bin/sh
--------------------------------------------------------------------------
If a resource agent is written in shell, specifying the generic shell
interpreter (+#!/bin/sh+) is generally preferred, though not
required. Resource agents declared as +/bin/sh+ compatible must not
use constructs native to a specific shell (such as, for example,
+${!variable}+ syntax native to +bash+). It is advisable to
occasionally run such resource agents through a sanitization utility
such as +checkbashisms+.
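
As an illustration of replacing such a construct, the sketch below
shows one common way to express the indirect expansion
+${!variable}+ in generic shell, namely with +eval+. The variable
names are chosen for this example only. Because +eval+ executes its
argument, the name is checked against a list of allowed characters
first.

[source,bash]
--------------------------------------------------------------------------
# bash only:  value=${!name}
# The lines below achieve the same with generic shell constructs.
name="OCF_RESKEY_ip"
OCF_RESKEY_ip="192.168.1.1"

# Refuse anything that is not a plain variable name before using eval.
case "$name" in
    ""|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2
        exit 1
        ;;
esac

eval "value=\${$name}"
echo "$value"
# prints: 192.168.1.1
--------------------------------------------------------------------------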
It is considered a regression to introduce a patch that will make a
previously +sh+ compatible resource agent suitable only for +bash+,
+ksh+, or any other non-generic shell. It is, however, perfectly
acceptable for a new resource agent to explicitly define a specific
shell, such as +/bin/bash+, as its interpreter.
=== Author and license information
The resource agent should contain a comment listing the resource agent
author(s) and/or copyright holder(s), and stating the license that
applies to the resource agent:
[source,bash]
--------------------------------------------------------------------------
#
# Resource Agent for managing foobar resources.
#
# License: GNU General Public License (GPL)
# (c) 2008-2010 John Doe, Jane Roe,
# and Linux-HA contributors
--------------------------------------------------------------------------
When a resource agent refers to a license for which multiple versions
exist, it is assumed that the current version applies.
=== Initialization
Any shell resource agent should source the +ocf-shellfuncs+ function
library. With the syntax below, this is done in terms of
+$OCF_FUNCTIONS_DIR+, which -- for testing purposes, and also for
generating documentation -- may be overridden from the command line.
[source,bash]
--------------------------------------------------------------------------
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
--------------------------------------------------------------------------
=== Functions implementing resource agent actions
What follows next are the functions implementing the resource agent's
advertised actions. The individual actions are described in detail in
<<_resource_agent_actions>>.
=== Execution block
This is the part of the resource agent that actually executes when the
resource agent is invoked. It typically follows a fairly standard
structure:
[source,bash]
--------------------------------------------------------------------------
# Make sure meta-data and usage always succeed
case $__OCF_ACTION in
meta-data)      foobar_meta_data
                exit $OCF_SUCCESS
                ;;
usage|help)     foobar_usage
                exit $OCF_SUCCESS
                ;;
esac

# Anything other than meta-data and usage must pass validation
foobar_validate_all || exit $?

# Translate each action into the appropriate function call
case $__OCF_ACTION in
start)          foobar_start;;
stop)           foobar_stop;;
status|monitor) foobar_monitor;;
promote)        foobar_promote;;
demote)         foobar_demote;;
notify)         foobar_notify;;
reload)         ocf_log info "Reloading..."
                foobar_start
                ;;
validate-all)   ;;
*)              foobar_usage
                exit $OCF_ERR_UNIMPLEMENTED
                ;;
esac
rc=$?

# The resource agent may optionally log a debug message
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc
--------------------------------------------------------------------------
== Resource agent actions
Each action is typically implemented in a separate function or method
in the resource agent. By convention, these are usually named
+<agent>_<action>+, so the function implementing the +start+ action in
+foobar+ would be named +foobar_start()+.
As a general rule, whenever the resource agent encounters an error
that it is not able to recover, it is permitted to immediately exit,
throw an exception, or otherwise cease execution. Examples for this
include configuration issues, missing binaries, permission problems,
etc. It is not necessary to pass these errors up the call stack.
It is the cluster manager's responsibility to initiate the appropriate
recovery action based on the user's configuration. The resource agent
should not guess at said configuration.
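
The "exit immediately" rule can be sketched as follows.
+foobar_require_binary+ is a hypothetical helper written for this
example, and the definitions of +OCF_ERR_INSTALLED+ and +ocf_log+ are
stand-ins so that the fragment runs outside a cluster; a real agent
would take both from +ocf-shellfuncs+.

[source,bash]
--------------------------------------------------------------------------
# Stand-ins for what ocf-shellfuncs would normally provide.
: ${OCF_ERR_INSTALLED=5}
ocf_log() { echo "$@" >&2; }

# Hypothetical helper: exit at once if a required binary is missing.
# The error is not passed up the call stack; the agent simply ends
# with the exit code that describes the problem.
foobar_require_binary() {
    if ! command -v "$1" >/dev/null 2>&1; then
        ocf_log err "Required binary $1 not found"
        exit $OCF_ERR_INSTALLED
    fi
}

# The subshell is used only so that this demonstration survives the
# exit; an agent would call the helper directly.
( foobar_require_binary no-such-binary-foobar ) \
    || echo "helper exited with $?"
--------------------------------------------------------------------------

The caller does not need to check a return value here, which keeps the
action functions short and leaves the choice of recovery to the
cluster manager.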
=== +start+ action
When invoked with the +start+ action, the resource agent must start
the resource if it is not yet running. This means that the agent must
verify the resource's configuration, query its state, and then start
it only if it is not running. A common way of doing this would be to
invoke the +validate_all+ and +monitor+ functions first, as in the
following example:
[source,bash]
--------------------------------------------------------------------------
foobar_start() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    # if resource is already running, bail out early
    if foobar_monitor; then
        ocf_log info "Resource is already running"
        return $OCF_SUCCESS
    fi

    # actually start up the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ...

    # After the resource has been started, check whether it started up
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # start up within the defined timeout, the cluster manager will
    # consider the start action failed
    while ! foobar_monitor; do
        ocf_log debug "Resource has not started yet, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +stop+ action
When invoked with the +stop+ action, the resource agent must stop the
resource, if it is running. This means that the agent must verify the
resource configuration, query its state, and then stop it only if it
is currently running. A common way of doing this would be to invoke
the +validate_all+ and +monitor+ functions first. It is important to
understand that +stop+ is a force operation -- the resource agent must
do everything in its power to shut down the resource, short of
rebooting the node or shutting it off. Consider the following example:
[source,bash]
--------------------------------------------------------------------------
foobar_stop() {
    local rc

    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?

    foobar_monitor
    rc=$?
    case "$rc" in
        "$OCF_SUCCESS")
            # Currently running. Normal, expected behavior.
            ocf_log debug "Resource is currently running"
            ;;
        "$OCF_RUNNING_MASTER")
            # Running as a Master. Need to demote before stopping.
            ocf_log info "Resource is currently running as Master"
            foobar_demote || \
                ocf_log warn "Demote failed, trying to stop anyway"
            ;;
        "$OCF_NOT_RUNNING")
            # Currently not running. Nothing to do.
            ocf_log info "Resource is already stopped"
            return $OCF_SUCCESS
            ;;
    esac

    # actually shut down the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ...

    # After the resource has been stopped, check whether it shut down
    # correctly. If the resource stops asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # shut down within the defined timeout, the cluster manager will
    # consider the stop action failed
    while foobar_monitor; do
        ocf_log debug "Resource has not stopped yet, waiting"
        sleep 1
    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
--------------------------------------------------------------------------
NOTE: The expected exit code for a successful stop operation is
+$OCF_SUCCESS+, _not_ +$OCF_NOT_RUNNING+.
IMPORTANT: A failed stop operation is a potentially dangerous
situation which the cluster manager will almost invariably try to
resolve by means of node fencing. In other words, the cluster manager
will forcibly evict from the cluster a node on which a stop operation
has failed. While this measure serves ultimately to protect data, it
does cause disruption to applications and their users. Thus, a
resource agent should make sure that it exits with an error only if
all avenues for proper resource shutdown have been exhausted.
=== +monitor+ action
The +monitor+ action queries the current status of a resource. It must
discern between three different states:
* resource is currently running (return +$OCF_SUCCESS+);
* resource has stopped gracefully (return +$OCF_NOT_RUNNING+);
* resource has run into a problem and must be considered failed
(return the appropriate +$OCF_ERR_+ code to indicate the nature of the
problem).
[source,bash]
--------------------------------------------------------------------------
foobar_monitor() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
ocf_run frobnicate --test
# This example assumes the following exit code convention
# for frobnicate:
# 0: running, and fully caught up with master
# 1: gracefully stopped
# any other: error
case "$?" in
0)
rc=$OCF_SUCCESS
ocf_log debug "Resource is running"
;;
1)
rc=$OCF_NOT_RUNNING
ocf_log debug "Resource is not running"
;;
*)
ocf_log err "Resource has failed"
exit $OCF_ERR_GENERIC
;;
esac
return $rc
}
--------------------------------------------------------------------------
Stateful (master/slave) resource agents may use a more elaborate
monitoring scheme where they can provide "hints" to the cluster
manager identifying which instance is best suited to assume the
+Master+ role. <<_specifying_a_master_preference>> explains the
details.
NOTE: The cluster manager may invoke the +monitor+ action for a
_probe_, which is a test of whether the resource is currently
running. Normally, the monitor operation would behave exactly the same
during a probe and a "real" monitor action. If a specific resource
does require special treatment for probes, however, the +ocf_is_probe+
convenience function is available in the OCF shell functions library
for that purpose.
=== +validate-all+ action
The +validate-all+ action tests for correct resource agent
configuration and a working environment. +validate-all+ should exit
with one of the following return codes:
* +$OCF_SUCCESS+ -- all is well, the configuration is valid and
usable.
* +$OCF_ERR_CONFIGURED+ -- the user has misconfigured the resource.
* +$OCF_ERR_INSTALLED+ -- the resource has possibly been configured
correctly, but a vital component is missing on the node where
+validate-all+ is being executed.
* +$OCF_ERR_PERM+ -- the resource is configured correctly and is not
missing any required components, but is suffering from a permission
issue (such as not being able to create a necessary file).
+validate-all+ is usually wrapped in a function that is not only
called when explicitly invoking the corresponding action, but also --
as a sanity check -- from just about any other function. Therefore,
the resource agent author must keep in mind that the function may be
invoked during the +start+, +stop+, and +monitor+ operations, and also
during probes.
Probes pose a separate challenge for validation. During a probe (when
the cluster manager may expect the resource _not_ to be running on the
node where the probe is executed), some required components may be
_expected_ to not be available on the affected node. For example, this
includes any shared data on storage devices not available for reading
during the probe. The +validate-all+ function may thus need to treat
probes specially, using the +ocf_is_probe+ convenience function:
[source,bash]
--------------------------------------------------------------------------
foobar_validate_all() {
# Test for configuration errors first
if ! ocf_is_decimal $OCF_RESKEY_eggs; then
ocf_log err "eggs is not numeric!"
exit $OCF_ERR_CONFIGURED
fi
# Test for required binaries
check_binary frobnicate
# Check for data directory (this may be on shared storage, so
# disable this test during probes)
if ! ocf_is_probe; then
if ! [ -d $OCF_RESKEY_datadir ]; then
ocf_log err "$OCF_RESKEY_datadir does not exist or is not a directory!"
exit $OCF_ERR_INSTALLED
fi
fi
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +meta-data+ action
The +meta-data+ action dumps the resource agent metadata to standard
output. The output must follow the metadata format as specified in
<<_metadata>>.
[source,bash]
--------------------------------------------------------------------------
foobar_meta_data() {
cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="foobar" version="0.1">
<version>1.0</version>
<longdesc lang="en">
...
EOF
}
--------------------------------------------------------------------------
=== +promote+ action
The +promote+ action is optional. It must only be supported by
_stateful_ resource agents, which means agents that discern between
two distinct _roles_: +Master+ and +Slave+. +Slave+ is functionally
identical to the +Started+ state in a stateless resource agent. Thus,
while a regular (stateless) resource agent only needs to implement
+start+ and +stop+, a stateful resource agent must also support the
+promote+ action to be able to make a transition between the +Started+
(+Slave+) and +Master+ roles.
[source,bash]
--------------------------------------------------------------------------
foobar_promote() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# test the resource's current state
foobar_monitor
rc=$?
case "$rc" in
"$OCF_SUCCESS")
# Running as slave. Normal, expected behavior.
ocf_log debug "Resource is currently running as Slave"
;;
"$OCF_RUNNING_MASTER")
# Already a master. Unexpected, but not a problem.
ocf_log info "Resource is already running as Master"
return $OCF_SUCCESS
;;
"$OCF_NOT_RUNNING")
# Currently not running. Need to start before promoting.
ocf_log info "Resource is currently not running"
foobar_start
;;
*)
# Failed resource. Let the cluster manager recover.
ocf_log err "Unexpected error, cannot promote"
exit $rc
;;
esac
# actually promote the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --master-mode || exit $OCF_ERR_GENERIC
# After the resource has been promoted, check whether the
# promotion worked. If the resource promotion is asynchronous, the
# agent may spin on the monitor function here -- if the resource
# does not assume the Master role within the defined timeout, the
# cluster manager will consider the promote action failed.
while true; do
foobar_monitor
if [ $? -eq $OCF_RUNNING_MASTER ]; then
ocf_log debug "Resource promoted"
break
else
ocf_log debug "Resource still awaiting promotion"
sleep 1
fi
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +demote+ action
The +demote+ action is optional. It must only be supported by
_stateful_ resource agents, which means agents that discern between
two distinct _roles_: +Master+ and +Slave+. +Slave+ is functionally
identical to the +Started+ state in a stateless resource agent. Thus,
while a regular (stateless) resource agent only needs to implement
+start+ and +stop+, a stateful resource agent must also support the
+demote+ action to be able to make a transition between the +Master+
and +Started+ (+Slave+) roles.
[source,bash]
--------------------------------------------------------------------------
foobar_demote() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# test the resource's current state
foobar_monitor
rc=$?
case "$rc" in
"$OCF_RUNNING_MASTER")
# Running as master. Normal, expected behavior.
ocf_log debug "Resource is currently running as Master"
;;
"$OCF_SUCCESS")
# Already running as slave. Nothing to do.
ocf_log debug "Resource is currently running as Slave"
return $OCF_SUCCESS
;;
"$OCF_NOT_RUNNING")
# Currently not running. Getting a demote action
# in this state is unexpected. Exit with an error
# and let the cluster manager recover.
ocf_log err "Resource is currently not running"
exit $OCF_ERR_GENERIC
;;
*)
# Failed resource. Let the cluster manager recover.
ocf_log err "Unexpected error, cannot demote"
exit $rc
;;
esac
# actually demote the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --unset-master-mode || exit $OCF_ERR_GENERIC
# After the resource has been demoted, check whether the
# demotion worked. If the resource demotion is asynchronous, the
# agent may spin on the monitor function here -- if the resource
# does not assume the Slave role within the defined timeout, the
# cluster manager will consider the demote action failed.
while true; do
foobar_monitor
if [ $? -eq $OCF_RUNNING_MASTER ]; then
ocf_log debug "Resource still demoting"
sleep 1
else
ocf_log debug "Resource demoted"
break
fi
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +migrate_to+ action
The +migrate_to+ action can serve one of two purposes:
* Initiate a native _push_ type migration for the resource. In other
words, instruct the resource to move _to_ a specific node from the
node it is currently running on. The resource agent knows about its
destination node via the +$OCF_RESKEY_CRM_meta_migrate_target+ environment
variable.
* Freeze the resource in a _freeze/thaw_ (also known as
_suspend/resume_) type migration. In this mode, the resource does
not need any information about its destination node at this point.
The example below illustrates a push type migration:
[source,bash]
--------------------------------------------------------------------------
foobar_migrate_to() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is not running, bail out early
if ! foobar_monitor; then
ocf_log err "Resource is not running"
exit $OCF_ERR_GENERIC
fi
# actually migrate the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --migrate \
--dest=$OCF_RESKEY_CRM_meta_migrate_target \
|| exit $OCF_ERR_GENERIC
...
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
In contrast, a freeze/thaw type migration may implement its freeze
operation like this:
[source,bash]
--------------------------------------------------------------------------
foobar_migrate_to() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is not running, bail out early
if ! foobar_monitor; then
ocf_log err "Resource is not running"
exit $OCF_ERR_GENERIC
fi
# actually freeze the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --freeze || exit $OCF_ERR_GENERIC
...
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +migrate_from+ action
The +migrate_from+ action can serve one of two purposes:
* Complete a native _push_ type migration for the resource. In other
words, check whether the migration has succeeded properly, and the
resource is running on the local node. The resource agent knows
about its migration source via the
+$OCF_RESKEY_CRM_meta_migrate_source+ environment variable.
* Thaw the resource in a _freeze/thaw_ (also known as
_suspend/resume_) type migration. In this mode, the resource usually
does not need any information about its source node at this point.
The example below illustrates a push type migration:
[source,bash]
--------------------------------------------------------------------------
foobar_migrate_from() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# After the resource has been migrated, check whether it resumed
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# run within the defined timeout, the cluster manager will
# consider the migrate_from action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not yet migrated, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
In contrast, a freeze/thaw type migration may implement its thaw
operation like this:
[source,bash]
--------------------------------------------------------------------------
foobar_migrate_from() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# actually thaw the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --thaw || exit $OCF_ERR_GENERIC
# After the resource has been migrated, check whether it resumed
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# run within the defined timeout, the cluster manager will
# consider the migrate_from action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not yet migrated, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
=== +notify+ action
With notifications, instances of clones (and of master/slave
resources, which are an extended kind of clones) can inform each other
about their state. When notifications are enabled, certain actions on
any instance of a clone carry a +pre+ and a +post+ notification.
The following actions trigger notifications:
* start
* stop
* promote
* demote
The cluster manager invokes the +notify+ operation on _all_ clone
instances. For +notify+ operations, additional environment variables
are passed into the resource agent during execution:
* +$OCF_RESKEY_CRM_meta_notify_type+ -- the notification type (+pre+
or +post+)
* +$OCF_RESKEY_CRM_meta_notify_operation+ -- the operation (action)
that the notification is about (+start+, +stop+, +promote+, +demote+
etc.)
* +$OCF_RESKEY_CRM_meta_notify_start_uname+ -- node name of the node
where the resource is being started (+start+ notifications only)
* +$OCF_RESKEY_CRM_meta_notify_stop_uname+ -- node name of the node
where the resource is being stopped (+stop+ notifications only)
* +$OCF_RESKEY_CRM_meta_notify_master_uname+ -- node name of the node
where the resource currently _is in_ the Master role
* +$OCF_RESKEY_CRM_meta_notify_promote_uname+ -- node name of the node
where the resource currently _is being promoted to_ the Master role
(+promote+ notifications only)
* +$OCF_RESKEY_CRM_meta_notify_demote_uname+ -- node name of the node
where the resource currently _is being demoted to_ the Slave role
(+demote+ notifications only)
Notifications come in particularly handy for master/slave resources
using a "pull" scheme, where the master is a publisher and the slave a
subscriber. Since the master is obviously only available as such when
a promotion has occurred, the slaves can use a "pre-promote"
notification to configure themselves to subscribe to the right
publisher.
Likewise, the subscribers may want to unsubscribe from the publisher
after it has relinquished its master status, and a "post-demote"
notification can be used for that purpose.
Consider the example below to illustrate the concept.
[source,bash]
--------------------------------------------------------------------------
foobar_notify() {
local type_op
type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
ocf_log debug "Received $type_op notification."
case "$type_op" in
'pre-promote')
ocf_run frobnicate --slave-mode \
--master=$OCF_RESKEY_CRM_meta_notify_promote_uname \
|| exit $OCF_ERR_GENERIC
;;
'post-demote')
ocf_run frobnicate --unset-slave-mode || exit $OCF_ERR_GENERIC
;;
esac
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
NOTE: A master/slave resource agent may support a _multi-master_
configuration, where there is possibly more than one master at any
given time. If that is the case, then the
+$OCF_RESKEY_CRM_meta_notify_*_uname+ variables may each contain a
space-separated list of host names, rather than a single host name as
shown in the example. Under those circumstances the resource agent
would have to properly iterate over this list.
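As a minimal sketch of such an iteration, the loop below walks a space-separated list of host names. The sample value +alice bob+ and the +promoted_nodes+ variable are assumptions made for this illustration only; in a real agent the cluster manager sets the +uname+ variable and the loop body would act on each node.

```shell
# Sample value for illustration only; in a real agent the cluster
# manager sets this variable before invoking the notify action.
OCF_RESKEY_CRM_meta_notify_promote_uname="alice bob"

promoted_nodes=""
# The variable is deliberately left unquoted so that the shell splits
# it into one word per host name.
for node in $OCF_RESKEY_CRM_meta_notify_promote_uname; do
    # A real agent would act on each node here (for example, by
    # subscribing to it). This sketch only collects the names.
    promoted_nodes="${promoted_nodes:+$promoted_nodes,}$node"
done

echo "Nodes being promoted: $promoted_nodes"
```

Relying on word splitting is adequate here because host names cannot contain whitespace.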
== Script variables
This section outlines variables typically available to resource agents,
primarily for convenience purposes. For additional variables
available while the agent is being executed, refer to
<<_environment_variables>> and <<_return_codes>>.
=== +$OCF_RA_VERSION_MAJOR+
The major version number of the resource agent API that the cluster
manager is currently using.
=== +$OCF_RA_VERSION_MINOR+
The minor version number of the resource agent API that the cluster
manager is currently using.
=== +$OCF_ROOT+
The root of the OCF resource agent hierarchy. This should never be
changed by a resource agent. This is usually +/usr/lib/ocf+.
=== +$OCF_FUNCTIONS_DIR+
The directory where the resource agent shell function library,
+ocf-shellfuncs+, resides. This is usually defined in terms of
+$OCF_ROOT+ and should never be changed by a resource agent. This
variable may, however, be overridden from the command line while
testing a new or modified resource agent.
=== +$OCF_EXIT_REASON_PREFIX+
Used as a prefix when printing error messages from the resource agent.
Script functions use this automatically, so no explicit use is required
for shell-based scripts.
=== +$OCF_RESOURCE_INSTANCE+
The resource instance name. For primitive (non-clone, non-stateful)
resources, this is simply the resource name. For clones and stateful
resources, this is the primitive name, followed by a colon and the
clone instance number (such as +p_foobar:0+).
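Should an agent need the two parts separately, plain shell parameter expansion is one way to split them. The following is only a sketch: the variable names +instance+, +primitive_name+ and +clone_number+ are made up for this example, and the sample value stands in for +$OCF_RESOURCE_INSTANCE+.

```shell
# Sample value for illustration; a real agent would read
# $OCF_RESOURCE_INSTANCE as set by the cluster manager.
instance="p_foobar:0"

# Everything before the first colon is the primitive name.
primitive_name="${instance%%:*}"

# The clone instance number is only present if there is a colon.
case "$instance" in
    *:*) clone_number="${instance##*:}" ;;
    *)   clone_number="" ;;
esac

echo "primitive=$primitive_name clone=$clone_number"
```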
=== +$OCF_RESOURCE_TYPE+
The resource type of the current resource, e.g. IPaddr2.
=== +$OCF_RESOURCE_PROVIDER+
The resource provider, e.g. heartbeat. This may not be set by all
cluster managers implementing Resource Agent API version 1.0.
=== +$__OCF_ACTION+
The currently invoked action. This is exactly the first command-line
argument that the cluster manager specifies when it invokes the
resource agent.
=== +$__SCRIPT_NAME+
The name of the resource agent. This is exactly the base name of the
resource agent script, with leading directory names removed.
=== +$HA_RSCTMP+
A temporary directory for use by resource agents. The system startup
sequence (on any LSB compliant Linux distribution) guarantees that
this directory is emptied on system startup, so this directory will
not contain any stale data after a node reboot.
== Convenience functions
=== Logging: +ocf_log+
Resource agents should use the +ocf_log+ function for logging
purposes. This convenient logging wrapper is invoked as follows:
[source,bash]
--------------------------------------------------------------------------
ocf_log <severity> "Log message"
--------------------------------------------------------------------------
It supports the following severity levels:
* +debug+ -- for debugging messages. Most logging configurations
suppress this level by default.
* +info+ -- for informational messages about the agent's behavior or
status.
* +warn+ -- for warnings. This is for any messages which reflect
unexpected behavior that does _not_ constitute an unrecoverable
error.
* +err+ -- for errors. As a general rule, this logging level should
only be used immediately prior to an +exit+ with the appropriate
error code.
* +crit+ -- for critical errors. As with +err+, this logging level
should not be used unless the resource agent also exits with an
error code. Very rarely used.
=== Testing for binaries: +have_binary+ and +check_binary+
A resource agent may need to test for the availability of a specific
executable. The +have_binary+ convenience function comes in handy
here:
[source,bash]
--------------------------------------------------------------------------
if ! have_binary frobnicate; then
ocf_log warn "Missing frobnicate binary, frobnication disabled!"
fi
--------------------------------------------------------------------------
If a missing binary is a fatal problem for the resource, then the
+check_binary+ function should be used:
[source,bash]
--------------------------------------------------------------------------
check_binary frobnicate
--------------------------------------------------------------------------
Using +check_binary+ is a shorthand method for testing for the
existence (and executability) of the specified binary, and exiting
with +$OCF_ERR_INSTALLED+ if it cannot be found or executed.
NOTE: Both +have_binary+ and +check_binary+ honor +$PATH+ when the
binary to test for is not specified as a full path. It is usually wise
to _not_ test for a full path, as binary installation paths may vary
by distribution or user policy.
=== Executing commands and capturing their output: +ocf_run+
Whenever a resource agent needs to execute a command and capture its
output, it should use the +ocf_run+ convenience function, invoked as
in this example:
[source,bash]
--------------------------------------------------------------------------
ocf_run frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
--------------------------------------------------------------------------
With the command specified above, the resource agent will invoke
+frobnicate --spam=eggs+ and capture its output and
exit code. If the exit code is nonzero (indicating an error),
+ocf_run+ logs the command output with the +err+ logging severity, and
the resource agent subsequently exits. If the exit code is zero
(indicating success), any command output will be logged with the +info+
logging severity.
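To make the capture-and-log behavior described above more concrete, the sketch below mimics it in plain shell. Note that +run_and_log+ is a hypothetical stand-in written for this illustration: it is not the +ocf_run+ implementation, and +echo+ with an +ERROR:+ or +INFO:+ prefix merely stands in for +ocf_log+.

```shell
# run_and_log is a hypothetical stand-in for illustration only; it is
# NOT the ocf_run implementation, and echo stands in for ocf_log.
run_and_log() {
    local output rc
    # Capture stdout and stderr together, then remember the exit code.
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Nonzero exit code: report the output as an error.
        echo "ERROR: $output"
    elif [ -n "$output" ]; then
        # Success: report any output as informational.
        echo "INFO: $output"
    fi
    # Hand the command's exit code back to the caller.
    return "$rc"
}

run_and_log echo "frobnication complete"
run_and_log sh -c 'echo "cannot frobnicate" >&2; exit 3' \
    || echo "command failed with exit code $?"
```

Because the wrapper returns the command's own exit code, the caller can still chain it with +||+, just as the +ocf_run+ examples in this section do.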
If the resource agent wishes to ignore the output of a successful
command execution, it can use the +-q+ flag with +ocf_run+. In the
example below, +ocf_run+ will only log output if the command exit code
is nonzero.
[source,bash]
--------------------------------------------------------------------------
ocf_run -q frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
--------------------------------------------------------------------------
Finally, if the resource agent wants to log the output of a command
with a nonzero exit code with a severity _other_ than error, it may do
so by adding the +-info+ or +-warn+ option to +ocf_run+:
[source,bash]
--------------------------------------------------------------------------
ocf_run -warn frobnicate --spam=eggs
--------------------------------------------------------------------------
=== Locks: +ocf_take_lock+ and +ocf_release_lock_on_exit+
Occasionally, there may be different resources of the same type in a
cluster configuration that should not execute actions in
parallel. When a resource agent needs to guard against parallel
execution on the same machine, it can use the +ocf_take_lock+ and
+ocf_release_lock_on_exit+ convenience functions:
[source,bash]
--------------------------------------------------------------------------
LOCKFILE=${HA_RSCTMP}/foobar
ocf_release_lock_on_exit $LOCKFILE
foobar_start() {
...
ocf_take_lock $LOCKFILE
...
}
--------------------------------------------------------------------------
+ocf_take_lock+ attempts to acquire the designated +$LOCKFILE+. When
it is unavailable, it sleeps a random amount of time between 0 and 1
seconds, and retries. +ocf_release_lock_on_exit+ releases the lock
file when the agent exits (for any reason).
=== Testing for numerical values: +ocf_is_decimal+
Specifically for parameter validation, it can be helpful to test
whether a given value is numeric. The +ocf_is_decimal+ function exists
for that purpose:
[source,bash]
--------------------------------------------------------------------------
foobar_validate_all() {
if ! ocf_is_decimal $OCF_RESKEY_eggs; then
ocf_log err "eggs is not numeric!"
exit $OCF_ERR_CONFIGURED
fi
...
}
--------------------------------------------------------------------------
=== Testing for boolean values: +ocf_is_true+
When a resource agent defines a boolean parameter, the value
for this parameter may be specified by the user as +0+/+1+,
+true+/+false+, or +on+/+off+. Since it is tedious to test for all
these values from within the resource agent, the agent should instead
use the +ocf_is_true+ convenience function:
[source,bash]
--------------------------------------------------------------------------
if ocf_is_true $OCF_RESKEY_superfrobnicate; then
ocf_run frobnicate --super
fi
--------------------------------------------------------------------------
NOTE: If +ocf_is_true+ is used against an empty or non-existent
variable, it always returns an exit code of +1+, which is equivalent
to +false+.
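The behavior described above can be pictured with a small stand-in. The function +is_true_sketch+ below is hypothetical and written only for this illustration. It is not the +ocf_is_true+ code from the shell functions library, which may accept further spellings. It covers just the values named in this section.

```shell
# is_true_sketch is a hypothetical stand-in for illustration only; it
# is NOT the ocf_is_true implementation from ocf-shellfuncs.
is_true_sketch() {
    case "$1" in
        1|true|on) return 0 ;;
        # Everything else, including an empty string, counts as false.
        *)         return 1 ;;
    esac
}

for value in 1 true on 0 false off ""; do
    if is_true_sketch "$value"; then
        echo "'$value' is treated as true"
    else
        echo "'$value' is treated as false"
    fi
done
```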
=== Version comparison: +ocf_version_cmp+
A resource agent may want to check the version of software
installed. +ocf_version_cmp+ takes care of all the necessary
details.
The return codes are
* +0+ -- the first version is smaller (earlier) than the second
* +1+ -- the two versions are equal
* +2+ -- the first version is greater (later) than the second
* +3+ -- one of the arguments is not recognized as a version string
The versions are allowed to contain digits, dots, and dashes.
[source,bash]
--------------------------------------------------------------------------
local v=`gooey --version`
ocf_version_cmp "$v" 12.0.8-1
case $? in
0) ocf_log err "we do not support version $v, it is too old"
exit $OCF_ERR_INSTALLED
;;
[12]) ;; # we can work with versions >= 12.0.8-1
3) ocf_log err "gooey produced version <$v>, too funky for me"
exit $OCF_ERR_INSTALLED
;;
esac
--------------------------------------------------------------------------
=== Pseudo resources: +ha_pseudo_resource+
"Pseudo resources" are those where the resource agent does not
actually start or stop something akin to a runnable process, but
merely executes a single action and then needs some way of tracking
whether that action has been executed or not. The +portblock+ resource
agent is an example of this.
Resource agents for pseudo resources can use a convenience function,
+ha_pseudo_resource+, which makes use of _tracking files_ to keep tabs
on the status of a resource. If +foobar+ was designed to manage a
pseudo resource, then its +start+ action could look like this:
[source,bash]
--------------------------------------------------------------------------
foobar_start() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is already running, bail out early
if foobar_monitor; then
ocf_log info "Resource is already running"
return $OCF_SUCCESS
fi
# start the pseudo resource
ha_pseudo_resource ${OCF_RESOURCE_INSTANCE} start
# After the resource has been started, check whether it started up
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# start up within the defined timeout, the cluster manager will
# consider the start action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not started yet, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
--------------------------------------------------------------------------
== Conventions
This section contains a collection of conventions that have emerged in
the resource agent repositories over the years. Following these
conventions is by no means mandatory for resource agent authors, but
it is a good idea based on the
http://en.wikipedia.org/wiki/Principle_of_least_surprise[Principle of
Least Surprise] -- resource agents following these conventions will be
easier to understand, review, and use than those that do not.
=== Well-known parameter names
Several parameter names are supported by a number of resource
agents. For new resource agents, following these examples is generally
a good idea:
* +binary+ -- the name of a binary that principally manages the
resource, such as a server daemon
* +config+ -- the full path to a configuration file
* +pid+ -- the full path to a file holding a process ID (PID)
* +log+ -- the full path to a log file
* +socket+ -- the full path to a UNIX socket that the resource manages
* +ip+ -- an IP address that a daemon binds to
* +port+ -- a TCP or UDP port that a daemon binds to
Needless to say, resource agents should only implement any of these
parameters if they are sensible to use in the agent's context.
=== Parameter defaults
Defaults for resource agent parameters should be set by initializing
variables with the suffix +_default+:
[source,bash]
--------------------------------------------------------------------------
# Defaults
OCF_RESKEY_superfrobnicate_default=0
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
--------------------------------------------------------------------------
NOTE: The resource agent should make sure that it sets a default for
any parameter not marked as +required+ in the metadata.
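The following sketch shows how the +: ${VAR=default}+ idiom behaves in three situations. The helper variables +when_unset+, +when_set+ and +when_empty+ exist only for this illustration.

```shell
OCF_RESKEY_superfrobnicate_default=0

# Case 1: the user did not set the parameter, so the default applies.
unset OCF_RESKEY_superfrobnicate
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
when_unset="$OCF_RESKEY_superfrobnicate"

# Case 2: the user set the parameter, so the user's value is kept.
OCF_RESKEY_superfrobnicate=1
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
when_set="$OCF_RESKEY_superfrobnicate"

# Case 3: the parameter is set but empty. The "=" form (unlike ":=")
# only assigns when the variable is unset, so it stays empty.
OCF_RESKEY_superfrobnicate=""
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
when_empty="$OCF_RESKEY_superfrobnicate"

echo "unset: '$when_unset' set: '$when_set' empty: '$when_empty'"
```

The third case is worth keeping in mind during validation: with this idiom, an explicitly empty parameter is not replaced by its default.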
=== Honoring +PATH+ for binaries
When a resource agent supports a parameter designed to hold the name
of a binary (such as a daemon, or a client utility for querying
status), then that parameter should honor the +PATH+ environment
variable. Do not supply full paths. Thus, the following approach:
[source,bash]
--------------------------------------------------------------------------
# Good example -- do it this way
OCF_RESKEY_frobnicate_default="frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
--------------------------------------------------------------------------
is much preferred over specifying a full path, as shown here:
[source,bash]
--------------------------------------------------------------------------
# Bad example -- avoid if you can
OCF_RESKEY_frobnicate_default="/usr/local/sbin/frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
--------------------------------------------------------------------------
This rule holds for defaults, as well.
== Special considerations
=== Licensing
Whenever possible, resource agent contributors are _encouraged_ to use
the GNU General Public License (GPL), version 2 or later, for any new
resource agents. The shell functions library does not strictly mandate
this, however, as it is licensed under the GNU Lesser General Public
License (LGPL), version 2.1 or later (so it can be used by non-GPL
agents).
The resource agent _must_ explicitly state its own license in the
agent source code.
=== Locale settings
When sourcing +ocf-shellfuncs+ as explained in <<_initialization>>,
any resource agent automatically sets +LANG+ and +LC_ALL+ to the +C+
locale. Resource agents can thus expect to always operate in the +C+
locale, and need not reset +LANG+ or any of the +LC_+ environment
variables themselves.
=== Testing for running processes
For testing whether a particular process (with a known process ID) is
currently running, a frequently found method is to send it a +0+
signal and catch errors, similar to this example:
[source,bash]
--------------------------------------------------------------------------
if kill -s 0 "$(cat "$daemon_pid_file")"; then
    ocf_log debug "Process is currently running"
else
    ocf_log warn "Process is dead, removing pid file"
    rm -f "$daemon_pid_file"
fi
--------------------------------------------------------------------------
IMPORTANT: An approach far superior to this example is to instead test
the _functionality_ of the daemon by connecting to it with a client
process, as shown in the example in
<<_literal_monitor_literal_action>>.
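
If a PID-based check is used nonetheless, it should at least tolerate
a missing, empty, or corrupt PID file. The following is a minimal
sketch under that assumption; the helper name +foobar_pid_alive+ is
hypothetical. Note that signal +0+ can only be delivered to processes
the caller is permitted to signal, which is normally not a concern for
resource agents running as +root+.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical helper: succeed only if the PID file is readable,
# contains a purely numeric PID, and that process accepts signal 0.
foobar_pid_alive() {
    local pidfile="$1" pid
    [ -r "$pidfile" ] || return 1
    # read fails on a missing trailing newline, but still sets $pid
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -s 0 "$pid" 2>/dev/null
}
--------------------------------------------------------------------------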
=== Specifying a master preference
Stateful (master/slave) resources must set their own _master
preference_ -- they can thus provide hints to the cluster manager
as to which instance is the best one to promote to the +Master+ role.
IMPORTANT: It is acceptable for multiple instances to have identical
positive master preferences. In that case, the cluster resource
manager will automatically select one of these instances to
promote. However, if _all_ instances have the (default) master score
of zero, the cluster manager will not promote any instance at
all. Thus, it is crucial that at least one instance has a positive
master score.
For this purpose, +crm_master+ comes in handy. This convenience
wrapper around +crm_attribute+ sets a node attribute named
+master-<<_literal_ocf_resource_instance_literal,$OCF_RESOURCE_INSTANCE>>+
for the node it is being executed on, and fills this attribute with
the specified value. The cluster manager is then expected to translate
this into a promotion score for the corresponding instance, and base
its promotion preference on that score.
Stateful resource agents typically execute +crm_master+ during the
<<_literal_monitor_literal_action,+monitor+>> and/or
<<_literal_notify_literal_action,+notify+>> action.
The following example assumes that the +foobar+ resource agent can
test the application's status by executing a binary that returns
certain exit codes based on whether
* the resource is either in the master role, or is a slave that is
fully caught up with the master (at any rate, it has current data),
or
* the resource is in the slave role, but through some form of
asynchronous replication has "fallen behind" the master, or
* the resource has gracefully stopped, or
* the resource has unexpectedly failed.
[source,bash]
--------------------------------------------------------------------------
foobar_monitor() {
    local rc
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?
    ocf_run frobnicate --test
    # This example assumes the following exit code convention
    # for frobnicate:
    # 0: running, and fully caught up with master
    # 1: gracefully stopped
    # 2: running, but lagging behind master
    # any other: error
    case "$?" in
        0)
            rc=$OCF_SUCCESS
            ocf_log debug "Resource is running"
            # Set a high master preference. The current master
            # will always get this, plus 1. Any current slaves
            # will get a high preference so that if the master
            # fails, they are next in line to take over.
            crm_master -l reboot -v 100
            ;;
        1)
            rc=$OCF_NOT_RUNNING
            ocf_log debug "Resource is not running"
            # Remove the master preference for this node
            crm_master -l reboot -D
            ;;
        2)
            rc=$OCF_SUCCESS
            ocf_log debug "Resource is lagging behind master"
            # Set a low master preference: if the master fails
            # right now, and there is another slave that does
            # not lag behind the master, its higher master
            # preference will win and that slave will become
            # the new master
            crm_master -l reboot -v 5
            ;;
        *)
            ocf_log err "Resource has failed"
            exit $OCF_ERR_GENERIC
    esac
    return $rc
}
--------------------------------------------------------------------------
== Testing resource agents
This section discusses automated testing for resource agents. Testing
is a vital aspect of development; it is crucial both for creating new
resource agents, and for modifying existing ones.
=== Testing with +ocf-tester+
The resource agents repository (and hence, any installed resource
agents package) contains a utility named +ocf-tester+. This shell
script allows you to conveniently and easily test the functionality of
your resource agent.
+ocf-tester+ is commonly invoked, as +root+, like this:
--------------------------------------------------------------------------
ocf-tester -n <name> [-o <param>=<value> ... ] <resource agent>
--------------------------------------------------------------------------
* +<name>+ is an arbitrary resource name.
* You may set any number of +<param>=<value>+ with the +-o+ option,
corresponding to any resource parameters you wish to set for
testing.
* +<resource agent>+ is the full path to your resource agent.
When invoked, +ocf-tester+ executes all mandatory actions and enforces
action behavior as explained in <<_resource_agent_actions>>.
It also tests for optional actions. Optional actions must behave as
expected when advertised, but do not cause +ocf-tester+ to flag an
error if not implemented.
IMPORTANT: +ocf-tester+ does not initiate "dry runs" of actions, nor
does it create resource dummies of any kind. Instead, it exercises the
actual resource agent as-is, whether that may include opening and
closing databases, mounting file systems, starting or stopping virtual
machines, etc. Use with care.
For example, you could run +ocf-tester+ on the +foobar+ resource agent
as follows:
--------------------------------------------------------------------------
# ocf-tester -n foobartest \
-o superfrobnicate=true \
-o datadir=/tmp \
/home/johndoe/ra-dev/foobar
Beginning tests for /home/johndoe/ra-dev/foobar...
* Your agent does not support the notify action (optional)
* Your agent does not support the reload action (optional)
/home/johndoe/ra-dev/foobar passed all tests
--------------------------------------------------------------------------
If the resource agent exhibits behavior that is difficult to
understand, which is typically the case with newly developed software,
use the +-v+ and +-d+ options to produce more output. If that does not
help, instruct +ocf-tester+ to trace the resource agent with
+-X+ (make sure to redirect output to a file, unless you are a
really fast reader).
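
The redirection can be wrapped in a small helper. The following is
only a sketch: the function name +trace_agent+ and the +OCF_TESTER+
variable are hypothetical, and it assumes that +-X+ may be combined
with the +-n+ and +-o+ options in any order. Both standard output and
standard error are captured, since it is not specified here which
stream carries the trace.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical wrapper: run ocf-tester with tracing (-X) and write all
# output to a log file instead of the terminal.
# Usage: trace_agent <name> <resource agent> <logfile> [<param>=<value> ...]
trace_agent() {
    local name="$1" agent="$2" logfile="$3" n param
    shift 3
    # Turn each remaining <param>=<value> argument into "-o <param>=<value>"
    n=$#
    for param in "$@"; do
        set -- "$@" -o "$param"
    done
    shift "$n"
    # OCF_TESTER (an assumption of this sketch) allows overriding the tool
    "${OCF_TESTER:-ocf-tester}" -X -n "$name" "$@" "$agent" > "$logfile" 2>&1
}
--------------------------------------------------------------------------

For instance, +trace_agent foobartest /home/johndoe/ra-dev/foobar
/tmp/foobar-trace.log datadir=/tmp+ would leave the complete trace in
+/tmp/foobar-trace.log+ for later inspection.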
=== Testing with +ocft+
+ocft+ is a testing tool for resource agents. Its main difference
from +ocf-tester+ is that +ocft+ can automate the creation of complex
testing environments, including package installation and
arbitrary shell scripting.
==== +ocft+ components
+ocft+ consists of the following components:
* A test case generator (+/usr/sbin/ocft+) -- generates shell
scripts from test case configuration files
* Configuration files (+/usr/share/resource-agents/ocft/configs/+) --
a configuration file contains environment setup and test cases
for one resource agent
* The testing scripts are stored in +/var/lib/resource-agents/ocft/cases/+,
but normally there is no need to inspect them
==== Customizing the testing environment
+ocft+ modifies the runtime environment of the resource agent
either by changing environment variables (through the interface
defined by OCF) or by running ad-hoc shell scripts which can for
instance change permissions of a file or unmount a file system.
==== How to test
You need to know the software (resource) you want to test. Draw a
sketch of all interesting scenarios, with all expected and
unexpected conditions and how the resource agent should react to
them. Then you need to encode these conditions and the expected
outcomes as +ocft+ test cases. Running ocft is then simple:
---------------------------------------
# ocft make <RA>
# ocft test <RA>
---------------------------------------
The first subcommand generates the scripts for your test cases
whereas the second runs them and checks the outcome.
==== +ocft+ configuration file syntax
There are four top level options each of which can contain
one or more sub-options.
===== +CONFIG+ (top level option)
This option is global and influences every test case.
** +AgentRoot+ (sub-option)
---------------------------------------
AgentRoot /usr/lib/ocf/resource.d/xxx
---------------------------------------
Normally, we assume that the resource agent lives under the
+heartbeat+ provider. Use +AgentRoot+ to test an agent that is
distributed by another vendor.
** +InstallPackage+ (sub-option)
---------------------------------------
InstallPackage package [package2 [...]]
---------------------------------------
Install packages necessary for testing. The installation is
skipped if the packages have already been installed.
** +HangTimeout+ (sub-option)
---------------------------------------
HangTimeout secs
---------------------------------------
The maximum time allowed for a single RA action. If this timer
expires, the action is considered to have failed.
===== +SETUP-AGENT+ (top level option)
---------------------------------------
SETUP-AGENT
bash commands
---------------------------------------
If the RA needs to be initialized before testing, you can put
bash code here for that purpose. The initialization is done only
once. If you need to reinitialize then delete the
+/tmp/.[AGENT_NAME]_set+ stamp file.
===== +CASE+ (top level option)
---------------------------------------
CASE "description"
---------------------------------------
This is the main building block of the test suite. Each test
case is to be described in one +CASE+ top level option.
One case consists of several suboptions typically followed by the
+RunAgent+ suboption.
** +Var+ (sub-option)
---------------------------------------
Var VARIABLE=value
---------------------------------------
Sets an environment variable for the resource agent. These variables
are usually of the form +OCF_RESKEY_xxx+. Note that there must be no
whitespace on either side of the +=+ sign.
** +Unvar+ (sub-option)
---------------------------------------
Unvar VARIABLE [VARIABLE2 [...]]
---------------------------------------
Removes one or more environment variables.
** +Include+ (sub-option)
---------------------------------------
Include macro_name
---------------------------------------
Includes the statements of the macro +macro_name+. See the description
of +CASE-BLOCK+ below.
** +Bash+ (sub-option)
---------------------------------------
Bash bash_codes
---------------------------------------
This option sets up the OS environment. You can insert arbitrary
bash code here to customize the system. Take care not to cause
unrecoverable changes to the system.
** +BashAtExit+ (sub-option)
---------------------------------------
BashAtExit bash_codes
---------------------------------------
This option restores the OS environment so that the next test case
can run correctly. Of course, you could use the +Bash+ option for
this. However, if an error occurs during the test case, the script
quits immediately and never reaches your recovery code. Code given
with +BashAtExit+, in contrast, restores the system environment
before the script quits.
** +RunAgent+ (sub-option)
---------------------------------------
RunAgent cmd [ret_value]
---------------------------------------
This option runs the resource agent. +cmd+ is the action passed to the
resource agent, such as +start+, +status+, or +stop+. The second
parameter is optional. If it is given, the value actually returned by
the resource agent is compared with this expected value, and a
mismatch is reported as a failure.
It is also possible to execute a suboption on a remote host
instead of locally. The protocol used is ssh and the command is
run in the background. Just add the +@<ipaddr>+ suffix to the
suboption name. For instance:
---------------------------------------
Bash@192.168.1.100 date
---------------------------------------
would run the +date+ program on the host +192.168.1.100+.
NB: Not clear how can ssh be automated as we don't know in
advance the environment. Perhaps use "well-known" host names such
as "node2"? Also, if the command runs in the background, it's not
clear how is the exit code checked. Finally, does Var@node make
sense? Or is the current environment somehow copied over? We
probably need an example here.
Need examples in general.
===== +CASE-BLOCK+ (top level option)
---------------------------------------
CASE-BLOCK macro_name
---------------------------------------
The +CASE-BLOCK+ option defines a macro which can be +Include+d
in any +CASE+. All +CASE+ suboptions are valid in +CASE-BLOCK+.
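
To tie the options together, here is a hypothetical configuration
file for the +foobar+ resource agent. It is an untested sketch that
uses only the option names described in this section. The layout (one
sub-option per line, indented below its top level option), the
+foobar+ package name, the +datadir+ parameter being mandatory, and
the use of symbolic names such as +OCF_SUCCESS+ for +ret_value+ are
all assumptions of this example.

--------------------------------------------------------------------------
CONFIG
    AgentRoot /usr/lib/ocf/resource.d/fortytwo
    InstallPackage foobar
    HangTimeout 20

SETUP-AGENT
    mkdir -p /tmp/foobar-data

CASE-BLOCK required_args
    Var OCF_RESKEY_datadir=/tmp/foobar-data

CASE "start succeeds with a valid configuration"
    Include required_args
    RunAgent start OCF_SUCCESS

CASE "monitor reports a stopped resource"
    Include required_args
    RunAgent stop OCF_SUCCESS
    RunAgent monitor OCF_NOT_RUNNING

CASE "start fails without the datadir parameter"
    Include required_args
    Unvar OCF_RESKEY_datadir
    RunAgent start OCF_ERR_CONFIGURED
--------------------------------------------------------------------------

The +required_args+ macro keeps the common parameters in one place, so
that each +CASE+ only spells out what makes it different.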
== Installing and packaging resource agents
This section discusses what to do with your resource agent once it is
done and tested -- where to install it, and how to include it in either
your own application package or in the Linux-HA resource agents
repository.
=== Installing resource agents
If you choose to include your resource agent in your own project, make
sure it installs into the correct location. Resource agents should
install into the +/usr/lib/ocf/resource.d/<provider>+ directory, where
+<provider>+ is the name of your project or any other name you wish to
identify the resource agent with.
For example, if your +foobar+ resource agent is being packaged as part
of a project named +fortytwo+, then the correct full path to your
resource agent would be
+/usr/lib/ocf/resource.d/fortytwo/foobar+. Make sure your resource
agent installs with +0755+ (+-rwxr-xr-x+) permission bits.
When installed this way, OCF-compliant cluster resource managers will
be able to properly identify, parse, and execute your resource
agent. The Pacemaker cluster manager, for example, would map the
above-mentioned installation path to the +ocf:fortytwo:foobar+
resource type identifier.
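
The installation step and the path-to-identifier mapping described
above can be sketched as follows. Both function names are hypothetical
and exist only for this example; the optional staging root argument is
likewise an assumption, modeled on the +DESTDIR+ convention used by
many build systems.

[source,bash]
--------------------------------------------------------------------------
# Hypothetical helper: install an agent into the OCF hierarchy below an
# optional staging root, with 0755 permission bits.
# Usage: install_agent <agent file> <provider> [<staging root>]
install_agent() {
    local agent="$1" provider="$2" destdir="${3:-}"
    local dir="${destdir}/usr/lib/ocf/resource.d/${provider}"
    install -d -m 0755 "$dir" &&
    install -m 0755 "$agent" "$dir/"
}

# Hypothetical helper: derive the resource type identifier from the
# last two components of an installed agent path.
agent_type_id() {
    local path="$1" provider
    provider=$(basename "$(dirname "$path")")
    echo "ocf:${provider}:$(basename "$path")"
}
--------------------------------------------------------------------------

Calling +agent_type_id /usr/lib/ocf/resource.d/fortytwo/foobar+ prints
+ocf:fortytwo:foobar+.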
=== Packaging resource agents
When you package resource agents as part of your own project, you
should apply the considerations outlined in this section.
NOTE: If you instead prefer to submit your resource agent to the
Linux-HA resource agents repository, see
<<_submitting_resource_agents>> for information on doing so.
==== RPM packaging
It is recommended to put your OCF resource agent(s) in an RPM
sub-package, with the name +<toppackage>-resource-agents+. Ensure that
the package owns its provider directory, and depends on the upstream
+resource-agents+ package which lays out the directory hierarchy and
provides convenience shell functions. An example RPM spec snippet is
given below:
--------------------------------------------------------------------------
%package resource-agents
Summary: OCF resource agent for Foobar
Group: System Environment/Base
Requires: %{name} = %{version}-%{release}, resource-agents
%description resource-agents
This package contains the OCF-compliant resource agents for Foobar.
%files resource-agents
%defattr(755,root,root,-)
%dir %{_prefix}/lib/ocf/resource.d/fortytwo
%{_prefix}/lib/ocf/resource.d/fortytwo/foobar
--------------------------------------------------------------------------
NOTE: If an RPM spec file contains a +%package+ declaration, then RPM
considers this a sub-package which inherits top-level fields such as
+Name+, +Version+, +License+, etc. Sub-packages have the top-level
package name automatically prepended to their own name. Thus the snippet
above would create a sub-package named +foobar-resource-agents+
(presuming the package +Name+ is +foobar+).
==== Debian packaging
For Debian packages, like for <<_rpm_packaging,RPMs>>, it is
recommended to create a separate package holding your resource agents,
which then should depend on the +cluster-agents+ package.
NOTE: This section assumes that you are packaging with +debhelper+.
An example +debian/control+ snippet is given below:
--------------------------------------------------------------------------
Package: foobar-cluster-agents
Priority: extra
Architecture: all
Depends: cluster-agents
Description: OCF-compliant resource agents for Foobar
--------------------------------------------------------------------------
You will also create a separate +.install+ file, named after the
package declared in +debian/control+. Sticking with the example of
installing the +foobar+ resource agent under the +fortytwo+ provider,
the +debian/foobar-cluster-agents.install+ file could
consist of the following content:
--------------------------------------------------------------------------
usr/lib/ocf/resource.d/fortytwo/foobar
--------------------------------------------------------------------------
=== Submitting resource agents
If you choose not to bundle your resource agent with your own package,
but instead wish to submit it to the upstream resource agent
repository hosted on
https://github.com/ClusterLabs/resource-agents[the ClusterLabs
repository on GitHub], please follow the steps outlined in this section.
Create a fork of the
https://github.com/ClusterLabs/resource-agents[upstream repository] and
clone it with the following commands:
--------------------------------------------------------------------------
git clone https://github.com/<your-username>/resource-agents.git
git -C resource-agents remote add upstream git@github.com:ClusterLabs/resource-agents.git
git -C resource-agents checkout -b <new-branch>
--------------------------------------------------------------------------
Then, copy your resource agent into the +heartbeat+ subdirectory:
--------------------------------------------------------------------------
cd resource-agents/heartbeat
cp /path/to/your/local/copy/of/foobar .
chmod 0755 foobar
cd ..
--------------------------------------------------------------------------
Next, modify the +Makefile.am+ file in +resource-agents/heartbeat+ and
add your new resource agent to the +ocf_SCRIPTS+ list. This will make
sure the agent is properly installed.
Lastly, open +Makefile.am+ in +resource-agents/doc/man+ and add
+ocf_heartbeat_<name>.7+ to the +man_MANS+ variable. This will
automatically generate a resource agent manual page from its metadata,
and then install that man page into the correct location.
Now, add your new resource agent, and the two modifications to the
Makefiles, to your changeset:
--------------------------------------------------------------------------
git add heartbeat/foobar
git add heartbeat/Makefile.am
git add doc/man/Makefile.am
git commit
--------------------------------------------------------------------------
In your commit message, be sure to include a meaningful description,
for example:
--------------------------------------------------------------------------
High: foobar: new resource agent
This new resource agent adds functionality to manage a foobar service.
It supports being configured as a primitive or as a master/slave set,
and also optionally supports superfrobnication.
--------------------------------------------------------------------------
Now push the patch set to GitHub:
--------------------------------------------------------------------------
git push -u origin <new-branch>
--------------------------------------------------------------------------
Create a pull request (PR) on GitHub, which will then be reviewed by
the upstream developers.
Once your new resource agent has been accepted for merging, one of the
upstream developers will merge the pull request into the upstream
repository. At that point, you can update your main branch from
upstream, and remove your own branch.
--------------------------------------------------------------------------
git checkout main
git fetch upstream
git merge upstream/main
git branch -D <new-branch>
--------------------------------------------------------------------------
=== Maintaining resource agents
If you maintain a specific resource agent, or you are making repeated
contributions to the codebase, it's usually a good idea to maintain
your own _fork_ of the +ClusterLabs/resource-agents+ repository on
GitHub.
To do so,
* https://github.com/signup[Create a GitHub account] if you do not
have one already.
* http://help.github.com/fork-a-repo/[Fork] the
https://github.com/ClusterLabs/resource-agents[+resource-agents+
repository].
* Clone your personal fork into a local working copy.
As you work on resource agents, *please* commit early, and commit
often. You can always fold commits later with +git rebase -i+.
Once you have made a number of changes that you would like others to
review, push them to your GitHub fork and send a post to the
+linux-ha-dev+ mailing list pointing people to it.
After the review is done, fix up your tree with any requested changes,
and then issue a pull request. There are two ways of doing so:
* You can use the +git request-pull+ utility to get a pre-populated
email skeleton summarizing your changesets. Add any information you
see fit, and send it to the list. It is a good idea to prefix your
email subject with +[GIT PULL]+ so upstream maintainers can pick the
message out easily.
* You can also issue a pull request directly on GitHub. GitHub
automatically notifies upstream maintainers about new pull requests
by email. Please refer to
http://help.github.com/send-pull-requests/[github:help] for details
on initiating pull requests.