1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167
|
.. index::
single: Glossary
.. _glossary:
Glossary
********
.. glossary::
:sorted:
aborted
Is a :term:`node` :term:`status`.
When the :term:`ECF_JOB_CMD` fails or the :term:`job file` sends a :term:`ecflow_client` --abort :term:`child command`, then
the task is placed into a aborted state.
active
Is a :term:`node` :term:`status`.
If :term:`job creation` was successful, and :term:`job file` has started, then the :term:`ecflow_client` --init
:term:`child command` is received by the :term:`ecflow_server` and the :term:`task` is placed into a active state
autocancel
autocancel is a way to automatically delete a :term:`node` which has completed.
The delete may be delayed by an amount of time in hours and minutes or
expressed in days. Any node may have a single autocancel attribute.
If the auto cancelled node is referenced in the :term:`trigger` expression of other nodes
it may leave the node waiting. This can be solved by making sure the :term:`trigger`
expression also checks for the :term:`unknown` state. i.e.:
.. code-block:: shell
trigger node_to_cancel == complete or node_to_cancel == unknown
This guards against the 'node_to_cancel' being undefined or deleted
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Autocancel`, :py:class:`ecflow.Node.add_autocancel`
* - :ref:`grammar`
- :token:`autocancel`
aviso
An aviso is an attribute of a :term:`Node<node>` (typically a :term:`Task<task>`),
and creates a dependency on an external Aviso server.
A :term:`Node<node>` with an aviso attribute is held from executing until a
notification matching the configured listener is received. When a
matching notification is received, the node then allowed to execute
following a behaviour similar to :term:`trigger` or time dependency e.g.
:term:`cron`).
`Only one aviso attribute is allowed per node`, and each attribute is
defined by the following properties:
- :code:`name`, an identifier
- :code:`listener`, the configuration for the Aviso listener
- :code:`url`, the base location of the Aviso server
- :code:`schema`, the location of the Aviso schema used to evaluate the notifications
- :code:`polling`, the value (in seconds) used to periodically contact the Aviso server
- :code:`auth`, the location to the Aviso authentication credentials file
.. note::
The `listener` parameter is expected to be a valid single line JSON string, enclosed in single quotes.
The value of the properties :code:`url`, :code:`schema`, :code:`polling`,
and :code:`auth` can be composed of :term:`Variables<variable>`. When
these properties are not provided, the following default values are used:
- :code:`%ECF_AVISO_URL%`, for :code:`url`
- :code:`%ECF_AVISO_SCHEMA%`, for :code:`schema`
- :code:`%ECF_AVISO_POLLING%`, for :code:`polling`
- :code:`%ECF_AVISO_AUTH%`, for :code:`auth`
.. important::
The variables :code:`ECF_AVISO_*` are not automatically provided at
server level, and must be defined at :term:`Suite<suite>` level by
the user.
Each aviso attribute implies that a background thread is spawned whenever
the associated :term:`node` is (re)queued. This independent background thread,
responsible for polling the Aviso server and periodically processing the latest notifications,
uses the configuriguration available when the associated task is queued.
.. note::
If any variables provinding the configuration are updated, the Aviso configuration
can be reloaded (without unqueuing the Task) by issuing an Alter change command with
the value :code:`reload` to the relevant Aviso attribute.
The authentication credentials file is expected to be in JSON format, following
the `ECMWF Web API <https://www.ecmwf.int/en/computing/software/ecmwf-web-api>`_
(this is conventionally stored in a file located at `$HOME/.ecmwfapirc`):
.. code-block:: json
{
"url" : "https://api.ecmwf.int/v1",
"key" : "<your-api-key>",
"email" : "<your-email>"
}
Only the fields :code:`url`, :code:`key`, and :code:`email` are required; any additional fields are ignored.
.. important::
If :code:`%ECF_AVISO_AUTH%` provides a path to a nonexistent file, or if the provided file is
not a valid JSON, the credentials will be ignored and the Aviso notification retrieval will eventually fail
due to "UNAUTHORIZED" access.
The Aviso schema file is a JSON file that defines the event listener schema. This is used by both Aviso server
and client (thus, by ecFlow) to define the valid event types and request parameters used when polling for
notifications. The schema file path must be provided to the `schema` option (or via the `ECF_AVISO_SCHEMA` variable).
check point
The check point file is like the :term:`suite definition`, but includes all the state information.
It is periodically saved by the :term:`ecflow_server`.
It can be used to recover the state of the node tree should server die, or machine crash.
By default when a :term:`ecflow_server` is started it will look to load the check point file.
The default check point file name is <host>.<port>.ecf.check. This can be overridden by the ECF_CHECK environment variable
The check point file format is the same as the defs file format (from release 4.7.0 onwards). However, the indentation has been removed to preserve space. To view with indentation use:
.. code-block:: shell
ecflow_client --load=<check_point_file> print check_only
child command
Child (or Task) commands are called from within the :term:`ecf script` files. The table also includes the default action (from version 4.0.4) if the child command is part of a zombie. 'block' means the job will be held by the :term:`ecflow_client` command. Until time out, or manual/automatic intervention.
.. list-table::
:header-rows: 1
* - Child (or Task) Command
- Description
- Zombie (default action)
* - :ref:`ecflow_client --init <init_cli>`
- Sets the :term:`task` to the :term:`active` :term:`status`
- block
* - :ref:`ecflow_client --wait <wait_cli>`
- Wait for a expression to evaluate
- block
* - :ref:`ecflow_client --queue <queue_cli>`
- Update :term:`queue` step in server
- block
* - :ref:`ecflow_client --abort <abort_cli>`
- Sets the :term:`task` to the :term:`aborted` :term:`status`
- block
* - :ref:`ecflow_client --complete <complete_cli>`
- Sets the :term:`task` to the :term:`complete` :term:`status`
- block
* - :ref:`ecflow_client --event <event_cli>`
- Set an :term:`event`
- fob
* - :ref:`ecflow_client --meter <meter_cli>`
- Change a :term:`meter`
- fob
* - :ref:`ecflow_client --label <label_cli>`
- Change a :term:`label`
- fob
The following environment variables must be set for the child commands. ECF_HOST, :term:`ECF_NAME` , :term:`ECF_PASS` and ECF_RID. See :term:`ecflow_client`.
clock
A clock is an attribute of a :term:`suite`.
A gain can be specified to offset from the given date.
The hybrid and real clocks always runs in phase with the system clock (UTC in UNIX)
but can have any offset from the system clock.
The clock can be :
* :term:`hybrid clock`
* :term:`real clock`
* :term:`virtual clock`
:term:`time`, :term:`day` and :term:`date` and :term:`cron` :term:`dependencies`
work a little differently under the clocks.
The default clock type is hybrid.
If the :term:`ecflow_server` is :term:`shutdown` or :term:`halted` the job :term:`scheduling` is suspended.
If this suspension is left for period of time, then it can affect task submission under **hybrid** and **real** clocks.
In particular it will affect :term:`task`\ s with :term:`time`, :term:`today` or :term:`cron` :term:`dependencies`.
- :term:`dependencies` with time series, can result in missed time slots:
.. code-block:: shell
time 10:00 20:00 00:15 # If server is suspended > 15 minutes, time slots can be missed
time +00:05 20:00 00:15 # start 5 minutes after the start of the suite, then every 15m until 20:00
- When the server is placed back into :term:`running` state any time :term:`dependencies`
with an expired time slot are submitted straight away. i.e if :term:`ecflow_server` is
:term:`halted` at 10:59 and then placed back into :term:`running` state at 11:20:
.. code-block:: shell
time 11:00
Then any :term:`task` with a expired single time slot dependency will be submitted straight away.
See also:
.. list-table::
:widths: 40 60
* - :ref:`python_api`
- :py:class:`ecflow.Clock`, :py:class:`ecflow.Suite.add_clock`
* - :ref:`grammar`
- :token:`clock`
complete
Is a :term:`node` :term:`status`.
The node can be set to complete:
- By the :term:`complete expression`
- At job end when the :term:`task` receives the :ref:`ecflow_client –complete <complete_cli>` :term:`child command`
- Manually via the command line or GUI. When this happens any time attributes are expired in order.
complete expression
Force a node to be complete **if** the expression evaluates, without running any of the nodes.
This allows the user to have tasks in the suite which run only in case others fail.
In practice the node would need to have a :term:`trigger` also.
.. list-table::
:widths: 40 60
* - :ref:`ecflow_cli`
- :ref:`--complete <complete_cli>`
* - :ref:`python_api`
- :py:class:`ecflow.Expression`, :py:class:`ecflow.Node.add_complete`
* - :ref:`grammar`
- :token:`complete`
cron
A :term:`cron` defines a time dependency for a :term:`node`, similar to :term:`time`,
but one that will be repeated indefinitely.
See also:
.. list-table::
:widths: 40 60
* - Text Definition
- :ref:`cron<text_based_def_cron>`
* - :ref:`python_api`
- :py:class:`ecflow.Cron`, :py:class:`ecflow.Node.add_cron`
* - :ref:`grammar`
- :token:`cron`
date
This defines a date dependency for a node.
There can be multiple date dependencies.
The European format is used for dates, which is: dd.mm.yy as in 31.12.2007.
Any of the three number fields can be expressed with a wildcard `*` to mean any valid value.
Thus, 01.*.* means the first day of every month of every year.
If a :term:`hybrid clock` is defined, any node held by a date dependency will be set to :term:`complete` at the beginning
of the :term:`suite`, without running the corresponding job. Otherwise under a hybrid clock the :term:`suite` would
never :term:`complete`.
.. list-table::
:widths: 40 60
* - :ref:`python_api`
- :py:class:`ecflow.Date`, :py:class:`ecflow.Node.add_date`
* - :ref:`grammar`
- :token:`date`
day
This defines a day dependency for a node.
There can be multiple day dependencies.
If a :term:`hybrid clock` is defined, any node held by a day dependency will be set to :term:`complete` at the beginning
of the :term:`suite`, without running the corresponding job. Otherwise under a hybrid clock the :term:`suite` would
never :term:`complete`.
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Day`, :py:class:`ecflow.Node.add_day`
* - :ref:`grammar`
- :token:`day`
defstatus
Defines the default :term:`status` for a task/family to be assigned to the :term:`node` when the begin command is issued.
By default :term:`node` gets queued when you use begin on a :term:`suite`.
defstatus is useful in preventing suites from running automatically once begun or in setting
tasks complete so they can be run selectively.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.DState`, :py:class:`ecflow.Node.add_defstatus`
* - :ref:`grammar`
- :token:`defstatus`
dependencies
Dependencies are attributes of node, that can suppress/hold a :term:`task` from taking part in :term:`job creation`.
They include :term:`trigger`, :term:`date`, :term:`day`, :term:`time`, :term:`today`, :term:`cron`, :term:`complete expression`, :term:`inlimit` and :term:`limit`.
A :term:`task` that is dependent cannot be started as long as some dependency is holding it or any of its **parent** :term:`node` s.
The :term:`ecflow_server` will check the dependencies every minute, during normal :term:`scheduling` **and** when any
:term:`child command` causes a state change in the :term:`suite definition`.
directives
Directives appear in a ecf script. (i.e. typically .ecf file, but could be .py file).Directives start with a % character. This is referred to as :term:`ECF_MICRO` character.
The directives are used in two main context.
- Preprocessing directives. In this case the directive starts as the **first** character on a line in a :term:`ecf script` file.
See the table below which shows the allowable values. Only one directive is allowed on the line.
- Variable directives. We use two :term:`ECF_MICRO` characters ie %VAR%, in this case they can occur **anywhere** on
the line and in any number.
.. code-block:: shell
%CAR% %TYPE% %WISHLIST%
These directives take part in :term:`variable substitution`.
If the micro characters are not paired (i.e uneven) then :term:`variable substitution` cannot take place
hence an error message is issued.
.. code-block:: shell
port=%ECF_PORT # error issued since '%' micro character are not paired.
However an uneven number of micro character are allowed, **If** the line begins with '#' comment character.
.. code-block:: shell
# This is a comment line with a single micro character % no error issued
# port=%ECF_PORT again no error issued
Directives are expanded during :term:`pre-processing`. Examples include:
====================== ============================================================================
Symbol Meaning
====================== ============================================================================
%include <filename> %ECF_INCLUDE% directory is searched for the :file:`filename` and the contents
included into the job file. If that variable is not defined :term:`ECF_HOME` is used.
If the :term:`ECF_INCLUDE` is defined but the file does not exist, then we look in
:term:`ECF_HOME`. This allows specific files to be placed in ECF_INCLUDE and the
more general/common include files to be placed in ECF_HOME.
This is the recommended format
%include "filename" Include the contents of the file:
%ECF_HOME%/%SUITE%/%FAMILY%/filename into the job.
%include filename Include the contents of the file :file:`filename` into the output. The only form
that can be used safely must start with a slash '/'
%includenopp filename Same as %include, but the file is not interpreted at all.
%comment Starts a comment, which is ended by %end directive.
The section enclosed by %comment - %end is removed during :term:`pre-processing`
%manual Starts a manual, which is ended by %end directive.
The section enclosed by %manual - %end is removed during :term:`pre-processing`.
The manual directive is used to create the :term:`manual page`
show in :term:`ecflow_ui`.
%nopp Stop pre-processing until a line starting with %end is found.
No interpretation of the text will be done (i.e. no variable substitutions)
%end End processing of %comment or %manual or %nopp
%ecfmicro CHAR Change the directive character, to the character given. If set in an
include file the effect is retained for the rest of the job (or until
set again). It should be noted that the ecfmicro directive specified in
the :term:`ecf script` file, does **not** effect the variable substitution
for :term:`ECF_JOB_CMD`, ECF_KILL_CMD or :term:`ECF_STATUS_CMD` variables. They still use
:term:`ECF_MICRO`. If no ecfmicro directive exists, we default to using
:term:`ECF_MICRO` from the :term:`suite definition`
====================== ============================================================================
From ecFlow release 4.4.0, use of %VAR% (variable substitution) can be a part of the include filename. i.e.:
.. code-block:: shell
# %file% must be defined, on the task, or on the parent hierarchy
%include <%file%.h>
# use %INCLUDEFILE% if defined (on the task, or on the parent hierarchy,
# and MUST follow one of formats above: ".filename", "../filename", "filename",
# filename>) otherwise use <file>
%include %INCLUDEFILE:<file>%
Care should be taken to avoid spaces in the variable values.
ecf file location algorithm
:term:`ecflow_server` and job creation checking uses the following algorithm to locate the '.ecf' file corresponding to a :term:`task`.
.. note::
To search for files with a different extension, i.e. to look for python file '.py'. Override the :term:`ECF_EXTN` variable. Its default value is '.ecf'
* :term:`ECF_SCRIPT`: First it uses the generated variable ECF_SCRIPT to locate the script.
This variable is generated from: ECF_HOME/<path to task>.ecf
Hence if the task path is /suite/f1/f2/t1, then ECF_SCRIPT=ECF_HOME/suite/f1/f2/t1.ecf
* :term:`ECF_FETCH` (user variable): File is obtained from running the command after some postfix arguments are added. (Output of popen)
* :term:`ECF_SCRIPT_CMD` (user variable): File is obtained from running the command. (Output of popen)
* **ECF_FILES**: Second it checks for the user defined ECF_FILES variable. If defined the value of this variable must correspond to a directory. This directory is searched in reverse order.
I.e. lets assume we have a :term:`task` /o/12/fc/model and ECF_FILES is defined as /home/ecmwf/emos/def/o/ECFfiles
The ecFlow will use the following search pattern.
#. /home/ecmwf/emos/def/o/ECFfiles/o/12/fc/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/12/fc/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/fc/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/model.ecf
If the directory does not exist, the server will try variable substitution. This allows additional configuration:
.. code-block:: shell
edit ECF_FILES /home/ecmwf/emos/def/o/%FILE_DIR:ECFfiles%
The search can be reversed, by adding a variable **ECF_FILES_LOOKUP**, with a value of "prune_leaf" (from ecFlow 4.12.0). Then ecFlow will use the following search pattern.
#. /home/ecmwf/emos/def/o/ECFfiles/o/12/fc/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/o/12/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/o/model.ecf
#. /home/ecmwf/emos/def/o/ECFfiles/model.ecf
However please be aware this will also affect the search in :term:`ECF_HOME`
* :term:`ECF_HOME`: Thirdly it searches for the script in reverse order using :term:`ECF_HOME` (i.e like ECF_FILES). If this fails, than the :term:`task` is placed into the :term:`aborted` state. We can check that file can be located before loading the suites into the server.
Note: The addition of variable with a name **ECF_FILES_LOOKUP** and value 'prune_leaf', affects the search in BOTH **ECF_FILES** and :term:`ECF_HOME`
See also:
* :ref:`tutorial-checking-job-creation`
* :py:class:`ecflow.Defs.check_job_creation`
ecf script
The ecFlow script refers to an ‘.ecf’ file.
The script file is transformed into the :term:`job file` by the :term:`job creation` process.
The base name of the script file **must** match its corresponding :term:`task`. i.e t1.ecf , corresponds to the task of name 't1'.
The script if placed in the ECF_FILES directory, may be re-used by multiple tasks belonging to different families,
providing the :term:`task` name matches.
The ecFlow script is similar to a UNIX shell script.
The differences, however, includes the addition of “C” like pre-processing :term:`directives` and ecFlow :term:`variable`\ s.
Also the script *must* include calls to the **init** and **complete** :term:`child command`\ s so that
the :term:`ecflow_server` is aware when the job starts (i.e changes state to :term:`active`) and finishes (i.e changes state to :term:`complete`)
ECF_DUMMY_TASK
This is a user variable that can be added to :term:`task` to indicate that there is no
associated :term:`ecf script` file.
If this variable is added to :term:`suite` or :term:`family` then all child tasks are treated as dummy.
This stops the server from reporting an error during :term:`job creation`.
ECF_EXTN
Defines the extension for the script that will be turned into a job file. This has a default value of '.ecf'. But could be any extension.This is used by the server as part of 'ecf file location algorithm'
ECF_FETCH
*Experimental*
This is used to specify a command, whose output can be used as a job script. The ecFlow server will run the command with popen. Hence great care needs to be taken not to doom the server, with command that can hang. As this could severely affect servers ability to schedule jobs.
.. code-block:: shell
edit ECF_FETCH my_custom_cmd.sh
After variable substitution, the server will add the following.
.. code-block:: shell
my_custom_cmd.sh -s <task_name>.<ECF_EXTN> # to extract the script and create the job
my_custom_cmd.sh -i # to extract the includes
my_custom_cmd.sh -m <task_name>.<ECF_EXTN> # to extract the manual, i.e. for display in the info tab
my_custom_cmd.sh -c <task_name>.<ECF_EXTN> # to extract the comments
The output of running these commands (-s) is used to create the job.
ECF_HOME
This is user defined :term:`variable`; it has four functions:
- it is used as a prefix portion of the path of the job files created by ecFlow server; see the description of the :term:`ECF_JOB` generated variable.
- it is a default directory where ecFlow server looks for scripts (with file extension defined by :term:`ECF_EXTN`,default is .ecf); overridden by ECF_FILES user defined variable. See the "ecf file location algorithm" entry for more detail.
- it is a default directory where ecFlow server looks for include files; overridden by :term:`ECF_INCLUDE` user defined variable. See the "directives" entry for more detail.
- it is used as a default prefix portion of the job output path (the :term:`ECF_JOBOUT` generated variable); overridden by **ECF_OUT** user defined variable. See descriptions of :term:`ECF_JOBOUT` and :term:`ECF_OUT` variables for more detail.
ECF_INCLUDE
This is a user defined variable. It is used to specify directory locations, that are used to search for include files.
.. code-block:: shell
edit ECF_INCLUDE /home/fred/course/include # a single directory
edit ECF_INCLUDE /home/fred/course/include:/home/fred/course/include2:/home/fred/course/include_me # set of directories to search
ECF_JOB
This is a generated :term:`variable`. If defines the path name location of the job file.
The variable is composed as::
ECF_HOME/ECF_NAME.job<ECF_TRYNO>
ECF_JOB_CMD
This :term:`variable` should point to a script that can submit the job. (i.e. to the queuing system, via, SLURM,PBS).
The ecFlow server will detect abnormal termination of this command. Hence for errors in the job file, should call 'ecflow_client --abort", then exits cleanly.
Otherwise server detects abnormal job termination, and abort flag is set. Which will prevent job re-queue(due to ECF_TRIES).
If the job also sends an abort, zombies can be created. If ECF_JOB_CMD command fails, and the task is in a submitted state, then the task is set to the aborted state. However if the task was active or complete, then we do NOT abort the task. Instead the zombie flag is set. (since ecFlow 4.17.1)
ECF_JOBOUT
This is a generated :term:`variable`. This variable defines the path name for the job output file. The variable is composed as following.
If :term:`ECF_OUT` is specified::
ECF_OUT/ECF_NAME.ECF_TRYNO
otherwise::
ECF_HOME/ECF_NAME.ECF_TRYNO
ECF_LISTS
This is the server variable. The variable specifies the path to the White list file. This file controls who has read/write access to the server via the :term:`user command`\ s.
The user name can be found using linux, id command and is typically the login name. The file has a very simple format.
The file path specified by ECF_LISTS environment, is read by the server on start up. The contents of the white list can be modified, and reloaded by the server. (However the path to the white-list file can NOT be modified after the server has started).
If ECF_LISTS is not set, the server will look for a file named <host>.<port>.ecf.lists (i.e.my_host.3141.ecf.lists) in same directory where the server was started.
If the file specified by ECF_LISTS or <host>.<port>.ecf.lists, does not exist or exists but is empty, then all users will have read/write access to suites on the server. Special care must be taken, so that user reloading the white list file does not remove write access for the administrator.
.. code-block:: shell
:caption: Re-load white list file
ecflow_client --help=reloadwsfile
ecflow_client --reloadwsfile
.. code-block:: shell
:caption: Read write access for specific users
4.4.14 # this is a comment, the first non-comment line must include a version.
# These users have read and write access to the server
uid1 # user uid1,uid2,cog have read and write access to the server
uid2
cog
# Read only users
-fred # users fred,bill and jake have read only access
-bill
-jake
.. code-block:: shell
:caption: Example where all users have read access
4.4.14 # this is a comment, the first non-comment line must include a version.
# These users have read and write access to the server
uid1 # user uid1,uid2,cog have read and write access to the server
uid2
cog
# User with read access
-* # all users have read access
.. code-block:: shell
:caption: From ecFlow release 4.1.0, users can be restricted via node paths
4.4.5
fred # has read /write access to all suites
-joe # has read access to all suites
* /x /y # all users have read/write access to suites /x /y
-* /w /z # all users have read access to suites /w /z
user1 /a,/b,/c # user1 has read/write access to suite /a /b /c
user2 /a
user2 /b
user2 /c # user2 has read write access to suite /a /b /c
user3 /a /b /c # user3 has read write access to suite /a /b /c
-user4 /a,/b,/c # user4 has read access to suite /a /b /c
-user5 /a
-user5 /b
-user5 /c # user5 has read access to suite /a /b /c
-user6 /a /b /c # user6 has read access to suite /a /b /c
ECF_MICRO
This is a generated :term:`variable`. The default value is %.
This variable is used in :term:`variable substitution` during command invocation and
default directive character during :term:`pre-processing`.
It can be overriden, but must be replaced by a single character.
ECF_NAME
This is a generated :term:`variable`. It defines the path name of the :term:`task`. It will typically be used inside script file, referring to the corresponding task.
.. code-block:: shell
:caption: t1.ecf
%include <head.h>
....
ecflow_client --alter change variable "fred" "bill" %ECF_NAME% # change variable on corresponding task
...
%include <tail.h>
ECF_NO_SCRIPT
This is a user :term:`variable`, that can be added to a :term:`node` (introduced with ecFlow release 4.3.0). It is used to inform the ecflow_server that there is **no SCRIPT** associated with a task. However unlike ECF_DUMMY_TASK, the task can still be submitted provided the :term:`ECF_JOB_CMD` is set up.
This is suitable for very **lightweight** tasks that want to minimize latency. The output can still be seen, if it is redirected to :term:`ECF_JOBOUT`. Care must be taken to ensure the path to ecflow_client is accessible.
.. code-block:: shell
:caption: ECF_NO_SCRIPT examples
family no_script
edit ECF_NO_SCRIPT "1" # the server will not look for .ecf files
edit ECFLOW_CLIENT ecflow_client
edit DIROUT %VERBOSE%
edit SILENT ""
edit VERBOSE " > %ECF_JOBOUT 2>&1"
task non_script_task
edit ECF_JOB_CMD "export ECF_PASS=%ECF_PASS%;export ECF_PORT=%ECF_PORT%;export ECF_HOST=%ECF_HOST%;export ECF_NAME=%ECF_NAME%;export ECF_TRYNO=%ECF_TRYNO%; %ECF_CLIENT% --init=$$; echo 'test test_ecf_no_script' %DIROUT% && %ECF_CLIENT% --complete"
# this command is not expected to fail. hence no error handling.(i.e.. will stay active)
task ecf_no_script
edit ECF_JOB_CMD "ecf_no_script --pass %ECF_PASS% --host %ECF_HOST% --port %ECF_PORT% " # %DIROUT%
# ecf_no_script contains init, complete, call to ecflow_client and trapping to raise abort
# use this approach for robust error handling
task ymd2jul
edit ECF_JOB_CMD "ECF_PASS=%ECF_PASS% ECF_NAME=%ECF_NAME% /usr/local/bin/ymd2jul.sh -p %ECF_PORT% -n %ECF_HOST% -r /%SUITE%/%FAMILY% -y %YMD% > %ECF_JOBOUT% 2>&1 &"
# /usr/local/bin/ymd2jul.sh can be called on command line or as ecflow_client
endfamily
ECF_OUT
This is user/suite variable that specifies a directory PATH. It controls the location of job output (stdout and stderr of the process) on a remote file system. It provides an alternate location for the job and cmd output files. If it exists, it is used as a base for :term:`ECF_JOBOUT`, but it is also used to search for the output by ecFlow, when asked by :term:`ecflow_ui`/:term:`ecflow_client`. If the output is in ECF_OUT/ECF_NAME.ECF_TRYNO it is returned, otherwise ECF_HOME/ECF_NAME.ECF_TRYNO is used.
The user must ensure that all the directories exists, including :term:`suite`/:term:`family`. If this is not done, you may well find task remains stuck in a submitted state. At ECMWF our submission scripts will ensure that directories exists.
ECF_PASS
This is a generated :term:`variable`. During job generation process in the server, a unique password is generated and stored in the task. It then replaces %ECF_PASS% in the scripts(.ecf), with the actual value. When the job runs, ecflow_client reads this, as an environment variable, and passes it to the server. The server then compares this password with the one held on the task. This is used as a part of the authentication for child commands, and is used to detect zombies.
The authentication process can be bypassed, and allow the job to proceed (i.e.. when the user is sure that there is only a single process, trying to communicate with the server), by adding it as a user variable. i.e.:
.. code-block:: shell
ecflow_client --alter add variable ECF_PASS FREE <path to task>
This functionality is also available in the GUI. Select a task, RMB > Special >Free password. However it is important not leave this in place, as it will always bypass the authentication. Just delete the variable.
ECF_PASSWD
This is an environment variable, which points to a password file for both client and server. This enables password based authentication for ecFlow :term:`user command`\ s. The password file is required for the client and server.
.. code-block:: shell
:caption: Example client password file. The same file can be used for multiple servers
4.5.0
# <user> <host> <port> <passwd>
user1 machine1 3141 xxxty
user1 machine2 3142 shhert
.. code-block:: shell
:caption: Example server password file for machine1 and port 3141
4.5.0
user1 machine1 3141 xxxty
user2 machine1 3141 bbsdd7
The server administrator needs to set Unix file permissions, so that this file is only readable by ecFlow server and the administrator.
ECF_SCRIPT
This is a generated :term:`variable`. If defines the path name for the :term:`ecf script`
ECF_SCRIPT_CMD
*Experimental*
This allows the output of running a command to be treated as a script. The command is run after variable substitution. The output is obtained from running the system function popen in the server. Great care should be taken when running this command, to ensure errors in the command do not crash the server. This approach could be used for short lived tasks, where extremely low latency is required. Commands that take more than 20s can interfere with job scheduling and should be avoided. Could possibly be used to checkout a script from a version control system.
If the output contains %include,%manual,%noop they are treated in the same manner as a normal '.ecf' script.
.. code-block:: shell
:caption: Here the output of the 'cat' command is treated as a script
suite test
family family
task check
edit ECF_SCRIPT_CMD "cat /tmp/ECF_SCRIPT_CMD/family/check.ecf"
task t1
trigger check == complete
edit ECF_SCRIPT_CMD "cat /tmp/ECF_SCRIPT_CMD/family/t1.ecf"
endfamily
endsuite
ECF_STATUS_CMD
User defined :term:`variable` defining the :ref:`ecflow_client --status <status_cli>` command. It invokes a user-supplied (shell) command that queries the status of the job.
The command should be written in such a way that the output is written to %ECF_JOB%.stat, and if the script determines that the job is not active, it should abort the task in ecflow. This command can be particularly useful when nodes on the supercomputer go down, and we don't know the true state of the jobs.
The status command can be invoked from the :ref:`ecflow_cli` and :ref:`ecflow_ui`. If applied to a :term:`family` or :term:`suite`, the command will be run hierarchically. In :ref:`ecflow_ui` use the Status tab in the Info panel or use Special > Status from the node context menu to run it and see the output.
The code below allows the output of the status command to be shown by the ``--file`` command on the command line, and automatically via the Status tab in :ref:`ecflow_ui`:
.. code-block:: shell
suite s1
edit ECF_STATUS_CMD /home/ma/emos/bin/ecfstatus %USER% %HOST% %ECF_RID% %ECF_JOB% > %ECF_JOB%.stat 2>&1
....
endsuite
.. code-block:: shell
:caption: Invoking status cmd, from the command line
ecflow_client --status=/s1/f1/t1 # ECF_STATUS_CMD should output to %ECF_JOB%.stat
ecflow_client --file=/s1/f1/t1 stat # Return contents of %ECF_JOB%.stat file"
ECF_TRIES
This is generated variable added at the server level with a default value of 2. It can be overridden by the user and controls the number of times job should re-run should it abort. Provided:
- the task/job has NOT been killed(user action)
- the job process (created from .ecf or .py) exited cleanly and not with exit 1 || sys.exit(1) as process death is captured by the server. Always ensure your script exits cleanly. i.e. exit(0)
- the task has NOT been set to abort by the user(user action)
- job creation has not failed . i.e. task pre-processing(include file expansion,variable - substitution, change of file permission for job file)
- the value of the variable ECF_TRIES must be convertible to an integer.
Please note this allows your scripts to be self-aware of the number times it is being run. i.e.:
.. code-block:: shell
:caption: task.ecf
%include <head.h>
"echo do some work\n";
if [ %ECF_TRYNO% -eq 1 ] ; then
echo "first attempt"
.....
fi
if [ %ECF_TRYNO% -eq 2 ] ; then
echo "first attempt failed, trying a different approach, clean data, etc"
.....
fi
%include <tail.h>
ECF_TRYNO
This is a generated :term:`variable` that is used in file name generation.
It represents the current try number for the :term:`task`.
After **begin** it is set to 1. The number is advanced if the job is re-run.
It is re-set back to 1 after a re-queue.
It is used in output and :term:`job file` numbering.
(i.e It avoids overwriting the :term:`job file` output during multiple re-runs)
ecFlow
Is the ECMWF work flow manager.
A general purpose application designed to schedule a large number of
computer process in a heterogeneous environment.
Helps computer jobs design, submission and monitoring both in
the research and operation departments.
ecflow_client
This executable provides the ecFlow :ref:`ecflow_cli`; it is used for all communication with the :term:`ecflow_server`.
To see the full range of commands that can be sent to the :term:`ecflow_server` type the following in a UNIX shell:
.. code-block:: shell
ecflow_client --help
This functionality is also provided by the :ref:`python_api`.
The following variables affect the execution of ecflow_client.
Since the :term:`ecf script` can call ecflow_client(i.e :term:`child command`) then typically
some are set in an include header.
.. list-table:: Environment variables common for User and Task commands
:header-rows: 1
:widths: 10 50 10 30
* - Variable Name
- Explanation
- Compulsory
- Example
* - ECF_PORT
- Port number of the :term:`ecflow_server`. Must match :term:`ecflow_server`
- Yes/No
- We can use:
.. code-block:: shell
ecflow_client --port 3141
as an alternative to specifying the ECF_PORT.
* - ECF_HOST
- Name of the host running the :term:`ecflow_server`
- Yes/No
- We can use:
.. code-block:: shell
ecflow --host machine1
as an alternative to specifying ECF_HOST
* - NO_ECF
- If set exits ecflow_client immediately with success. This allows the scripts to be tested independent of the server
- No
- .. code-block:: shell
export NO_ECF=1
* - ECF_DENIED
- If server denies client communication and this flag is set, exit with an error. Avoids 24hr hour connection attempt to :term:`ecflow_server`.
- No
- .. code-block:: shell
export ECF_DENIED=1
* - ECF_SSL
- For secure communication between server and client -- requires build with SSL enabled.
- No
- .. code-block:: shell
# To share a certificate amongst multiple servers
export ECF_SSL=1 # or empty value
# To use specific server certificate
export ECF_SSL=<any non-empty value, except '1'>
When `ECF_SSL=1`, ecflow will search for a shared certificate at `$HOME/.ecflowrc/ssl/server.crt`,
and then fallback to the server specific certificate at `$HOME/.ecflowrc/ssl/<host>.<port>.crt`.
Secure communication can also be activated using the :code:`ecflow_client --ssl ...` option.
When using the `--ssl` option, if `ECF_SSL` is not explicitly specified, it is assumed `ECF_SSL=1`.
.. list-table:: Environment variables for Task commands
:header-rows: 1
:widths: 10 50 10 30
* - Variable Name
- Explanation
- Compulsory
- Example
* - :term:`ECF_NAME`
- Path to the task
- Yes
- /suite/family/task
* - :term:`ECF_PASS`
- Jobs password. Generated by the server, will replace %ECF_PASS% in the scripts,during job generation.Used for authenticating child commands.
- Yes
- (generated)
* - ECF_RID
- Remote id. Allow easier job kill, and disambiguate a zombie
- Yes
- (generated)
* - :term:`ECF_TRYNO`
- The number of times the job has run. This is allocated by the server and used in job/output file name generation.
- No
- (generated)
* - ECF_HOSTFILE
- File that lists alternate hosts to try, if connection to main host fails
- No
- $HOME/.echostfile
* - ECF_HOSTFILE_POLICY
- The policy, one of "task" or "all" indicates when to perform retry based on the ECF_HOSTFILE.
The default policy is "task", meaning that the retry will only be performed for task (i.e. commands) commands.
If the policy is "all", the retry will be performed for both task and user commands (including :code:`ping`).
- No
- .. code-block:: shell
export ECF_HOSTFILE_POLICY=all
* - ECF_TIMEOUT
- Maximum time (in seconds) for the client to deliver message
- No
- default value: 24 * 60 * 60 # i.e. 24 hours
.. code-block:: shell
export ECF_TIMEOUT=36024*3600
* - ECF_CONNECT_TIMEOUT
- Maximum time (in seconds) for the client to establish connection
- No
- default value: 0
* - ECF_ZOMBIE_TIMEOUT
- Maximum time (in seconds) for the zombie Task client (performing :code:`init`, :code:`abort`, :code:`complete`, etc) to get a reply from the server.
- No
- 12*3600 (default value):
.. code-block:: shell
export ECF_ZOMBIE_TIMEOUT=36024*3600
.. list-table:: Variables specific to user commands
:header-rows: 1
:widths: 10 50 10 30
* - Variable Name
- Explanation
- Compulsory
- Example
* - :term:`ECF_PASSWD`
- path to the client password file, used for password based authentication
- No
- .. code-block:: shell
export ECF_PASSWD=mymachine.3141.ecf.passwd
* - ECF_USER
- When user need to pose as another user, i.e. when users id on the client machine, doesn't match his id on the remote server. Requires password file.
- No
- .. code-block:: shell
export ECF_USER=my_user_name
To avoid setting environment variable we can use:
.. code-block:: shell
ecflow_client --user my_user_name ......
ecflow_server
This executable is the server.
It is responsible for :term:`scheduling` the jobs and responding to :term:`ecflow_client` requests
Multiple servers can be run on the same machine/host providing they are assigned a unique port number.
The server records all requests in the log file.
The server will periodically (see ECF_CHECKINTERVAL) write out a :term:`check point` file.
The following environment variables control the execution of the server and may be set before the start of the server.
ecflow_server will start happily with out any of these variables being set, since all of them have default values.
.. list-table::
:header-rows: 1
* - Variable Name
- Explanation
- Default value
* - :term:`ECF_HOME`
- Home for all the :term:`ecFlow` files
- Current working directory
* - ECF_PORT
- Server port number. Must be unique
- 3141
* - ECF_LOG
- History or log file
- <host>.<port>.ecf.log
* - ECF_CHECK
- Name of the checkpoint file
- <host>.<port>.ecf.check
* - ECF_CHECKOLD
- Name of the backup checkpoint file
- <host>.<port>.ecf.check.b
* - ECF_CHECKINTERVAL
- Interval in second to save :term:`check point` file
- 120
* - ECF_LISTS
- White list file. Controls read/write access to the server for each user
- <host>.<port>.ecf.lists
* - ECF_TASK_THRESHOLD
- Report in log file all task/job that take longer than given threshold. Used to debug/instrument, those scripts that are very large.
- 4000 (milliseconds). Before release 4.0.6 default was 2000 ms.
* - :term:`ECF_PASSWD`
- path to server password file, used to authenticate :term:`user command`\ s. Use when ALL should be password authenticated
- <host>.<port>.ecf.passwd
* - ECF_CUSTOM_PASSWD
- path to server password file, used to authenticate :term:`user command`\ s. Use when a small number of users need to be password authenticated. Typically client would use:ecflow_client --user=fred ....export ECF_USER=fred; ecflow_client ...
- <host>.<port>.ecf.custom_passwd
* - ECF_PRUNE_NODE_LOG
- When the checkpoint point file is loaded, node log history older than 30 days is automatically pruned. The variable allows this value to be changed.Setting the variable to zero, means there will be no pruning. All history is preserved at the cost increasing server memory, and time taken to write checkpoint file.
- .. code-block:: shell
export ECF_PRUNE_NODE_LOG=40
Prune node log history older than 40 days, upon reload of :term:`check point` file.
* - ECF_SSL
- For secure communication between server and client -- requires build with SSL enabled.
- .. code-block:: shell
# To share a certificate amongst multiple servers
export ECF_SSL=1 # or empty value
# To use specific server certificate
export ECF_SSL=<any non-empty value, except '1'>
When `ECF_SSL=1`, ecflow will search for a shared certificate at `$HOME/.ecflowrc/ssl/server.crt`,
and then fallback to the server specific certificate at `$HOME/.ecflowrc/ssl/<host>.<port>.crt`.
Secure communication can also be activated using the :code:`ecflow_server --ssl ...` option.
When using the `--ssl` option, if `ECF_SSL` is not explicitly specified, it is assumed `ECF_SSL=1`.
Consider using `ecflow_start.sh -s` to start the server with SSL support.
The server can be in several states. The default when first started is :term:`halted`, See :term:`server states`
ecflow_ui
ecflow_ui executable in the new GUI based client. It is used to visualise and monitor the hierarchical structure of the :term:`suite definition`.
event
The purpose of an event is to signal partial completion of a :term:`task` and to be able to
trigger another job which is waiting for this partial completion.
Only tasks can have events and they can be considered as an attribute of a :term:`task`.
There can be many events and they are displayed as nodes.
The event is updated by placing the ``--event`` :term:`child command` in a :term:`ecf script`.
An event has a number and possibly a name. If it is only defined as a number,
its name is the text representation of the number without leading zeroes.
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`event_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Event`, :py:class:`ecflow.Node.add_event`
* - :ref:`grammar`
- :token:`event`
Events can be referenced in :term:`trigger` and :term:`complete expression` s.
extern
This allows an external :term:`node` to be used in a :term:`trigger` expression.
All :term:`node`\ s in :term:`trigger`\ s must be known to :term:`ecflow_server` by the end of the load command.
No cross-suite :term:`dependencies` are allowed unless the names of tasks outside the suite are declared as external.
An external :term:`trigger` reference is considered unknown if it is not defined when the :term:`trigger` is evaluated.
You are strongly advised to avoid cross-suite :term:`dependencies`.
Families and suites that depend on one another should be placed in a single :term:`suite`.
If you think you need cross-suite dependencies, you should consider merging the suites
together and have each as a top-level family in the merged suite.
For grammar see :token:`extern`.
family
A family is an organisational entity that is used to provide hierarchy and grouping.
It consists of a collection of :term:`task`\ s and families.
Typically you place tasks that are related to each other inside the same family, analogous to the way you
create directories to contain related files.
For python see :py:class:`ecflow.Family`. For BNF see :token:`family`
It serves as an intermediate :term:`node` in a :term:`suite definition`.
generic
A generic attribute associates a name to a set of generic string values, and is used to gracefully indicate
the presence of unknown attributes in the suite definition.
This kind of attribute is used to allow the introduction of future attributes without requiring an API change.
When an older version of ecflow encounters a new/unknown attribute, the attribute is automatically converted
into a generic attribute.
.. warning::
The user is strongly advised not to include generic attributes in suite definitions.
halted
Is a :term:`ecflow_server` state. See :term:`server states`.
hybrid clock
A hybrid :term:`clock` is a complex notion: the date and time are not connected.
The date has a fixed value during the complete execution of the :term:`suite`.
This will be mainly used in cases where the suite does not :term:`complete` in less than 24 hours.
This guarantees that all tasks of this suite are using the same :term:`date`.
On the other hand, the time follows the time of the machine.
Hence the :term:`date` never changes unless specifically altered or unless the suite restarts,
either automatically or from a begin command.
Under a hybrid :term:`clock` any :term:`node` held by a :term:`date` or :term:`day` dependency
will be set to complete at the beginning of the suite. (i.e without its job ever running).
Otherwise the :term:`suite` would never :term:`complete`.
inlimit
The inlimit works in conjunction with :term:`limit`/:py:class:`ecflow.Limit` for providing simple load management. inlimit is added to the :term:`node` that needs to be limited.
.. code-block:: shell
:caption: Limiting tasks, only allow 5 tasks to run in parallel
suite suite
limit disk 100
family anon
inlimit /suite:disk 5
task t1
...
task t100
endfamily
endsuite
.. code-block:: shell
:caption: Limiting Families, only two families can run in parallel. The tasks are unconstrained
suite test
limit fam 2
family f1
inlimit -n fam
task t1
....
endfamily
family f2
inlimit -n fam
task t1
....
endfamily
family f3
inlimit -n fam
task t1
....
endfamily
endsuite
.. code-block:: shell
:caption: Limit submission
# Hence we could have more than 2 active jobs, since we are only control the number in the submitted state.
# If we removed the -s then we can only have two active jobs running at one time
suite test_limit_on_submission
limit disk 2
family anon
inlimit -s disk # Inlimit submission
task t1
task t2
....
endfamily
endsuite
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.InLimit`, :py:class:`ecflow.Node.add_inlimit`
* - :ref:`grammar`
- :token:`inlimit`
job creation
Job creation or task invocation can be initiated manually via :term:`ecflow_ui` but also by
the :term:`ecflow_server` during :term:`scheduling` when a :term:`task` (and *all* of its parent :term:`node` s) is free of its :term:`dependencies`.
The process of job creation includes:
- Generating a unique password :term:`ECF_PASS`, which is placed in :term:`ecf script` during :term:`pre-processing`.
- Locating :term:`ecf script` files , corresponding to the :term:`task` in the :term:`suite definition`, See :term:`ecf file location algorithm`
- :term:`pre-processing` the contents of the :term:`ecf script` file
The steps above transforms an :term:`ecf script` to a :term:`job file` that can be submitted by
performing :term:`variable substitution` on the :term:`ECF_JOB_CMD` :term:`variable` and invoking the command.
The running jobs will communicate back to the :term:`ecflow_server` by calling :term:`child command`\ s.
This causes :term:`status` changes on the :term:`node`\ s in the :term:`ecflow_server` and flags can be set
to indicate various events.
If a :term:`task` is to be treated as a dummy task (i.e. is used as a scheduling task) and is not meant to
to be run, then a variable of name :term:`ECF_DUMMY_TASK` can be added:
.. code-block:: shell
task.add_variable("ECF_DUMMY_TASK","")
job file
The job file is created by the :term:`ecflow_server` during :term:`job creation` using the :term:`ECF_TRYNO` :term:`variable`
It is derived from the :term:`ecf script` after expanding the pre-processing :term:`directives`.
It has the form <task name>.job<:term:`ECF_TRYNO`>", i.e. t1.job1.
Note job creation checking will create a job file with an extension with zero. i.e '.job0'. See :py:class:`ecflow.Defs.check_job_creation`
When the job is run the output file has the :term:`ECF_TRYNO` as the extension.
i.e t1.1 where 't1' represents the task name and '1' the :term:`ECF_TRYNO`
label
A label has a name and a value and is a way of **displaying** information in :term:`ecflow_ui`
By placing a label :term:`child command`\ s in the :term:`ecf script` the user can be informed about progress
in :term:`ecflow_ui`.
Labels can be added to family nodes. To change the labels, scripts should use:
.. code-block:: shell
ecflow_client --alter change label <label_name> <new_value> /path/to/family_node/with/label
If the label :term:`child command`\ s results in a :term:`zombie` then the default action if for the server to **fob**, this allows the ecflow_client command to exit normally. (i.e. without any errors). This default can be overridden by using a :term:`zombie` attribute.
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`label_cli`, :ref:`add_cli`, :ref:`alter_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Label`, :py:class:`ecflow.Node.add_label`
* - :ref:`grammar`
- :token:`label`
late
Define a tag for a node to be late. A node can only have **one** late attribute. The late attribute only applies to a task. You can define it on a Suite/Family in which case it will be inherited. Any late defined lower down the hierarchy will override the aspect(submitted,active, complete) defined higher up.
Command options:
* -s submitted: The time node can stay submitted (format ``[+]hh:mm``). submitted is always relative, so + is simple ignored, if present. If the node stays submitted longer than the time specified, the late flag is set
* -a active: The time of day the node must have become active (format ``hh:mm``). If the node is still queued or submitted, the late flag is set
* -c complete: The time node must become complete (format ``{+}hh:mm``). If relative, time is taken from the time the node became active, otherwise node must be complete by the time given.
.. code-block:: shell
suite late
family familyName
task t1
late -s +00:15 -a 20:00 -c +02:00
task t2
late -a 20:00 -c +02:00 -s +00:15
task t3
late -c +02:00 -a 20:00 -s +00:15
task t4
late -s 00:02 -c +00:05
task t5
late -s 00:01 -a 14:30 -c +00:01
endfamily
endsuite
Suites cannot be late, but you can define a late tag
for submitted in a suite, to be inherited by the families and tasks.
When a node is classified as being late, the only action :term:`ecflow_server` takes is to set a flag.
:term:`ecflow_ui` will display these alongside the :term:`node` name as an icon (and optionally pop up a window).
.. code-block:: shell
suite late
late -s +00:15 # report late for all task taking longer than 15 minutes in submitted state
family familyName
late -c +02:00 # all child task that take longer than 2 hours to complete should raise a late flag
task t1
# effective late -s +00:05 -c +02:00
late -s +00:05
task t2
# effective late -s +00:15 -c +02:00
task t5
# effective late -c +03:00 -a 18:00 -s +00:15
late -c +03:00 -a 18:00
endfamily
endsuite
The late attribute can be added/deleted to any suite/family/task.
.. code-block:: shell
ecflow_client --alter add late "-s 00:15" <path-to-node>
ecflow_client --alter change late "-s 00:01 -a 14:30 -c +00:01" <path-to-node>
ecflow_client --alter delete late
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`add_cli`, :ref:`alter_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Late`, :py:class:`ecflow.Node.add_late`
* - :ref:`grammar`
- :token:`late`
limit
Limits provide simple load management by limiting the number of tasks
submitted by a specific :term:`ecflow_server`.
Typically you either define limits on :term:`suite` level or define a separate suite to
hold limits so that they can be used by multiple suites.
Setting limits on a separate suite, has the benefit that by setting the limit value to zero, you can control task submission over a number of suites.
.. code-block:: shell
:caption: Limits
suite suiteName
limit sg1 10
limit mars 10
endsuite
The limits are used in conjunction with :term:`inlimit`.
The limit max value can be changed on the command line:
.. code-block:: shell
ecflow_client --alter change limit_max <limit-name> <new-limit-value> <path-to-limit>
ecflow_client --alter change limit_max limit 2 /suite
It can also be changed in python:
.. code-block:: python
import ecflow
try:
ci = ecflow.Client()
ci.alter("/suite","change","limit_max","limit", "2")
except RuntimeError, e:
print("Failed: " + str(e))
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`add_cli`, :ref:`alter_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Limit`, :py:class:`ecflow.Node.add_limit`
* - :ref:`grammar`
- :token:`limit`
manual page
Manual pages are part of the :term:`ecf script`.
This is to ensure that the manual page is updated when the :term:`ecf script` is updated.
The manual page is a very important operational tool allowing you to view a description of a task,
and possibly describing solutions to common problems.
The :term:`pre-processing` can be used to extract the manual page from the script file and is visible in :term:`ecflow_ui`.
The manual page is the text contained within the %manual and %end :term:`directives`.
They can be seen using the Manual tab in the Info panel in :term:`ecflow_ui`.
The text in the manual page in **not** included in the :term:`job file`.
There can be multiple manual sections in the same :term:`ecf script` file. When viewed they are
simply concatenated. It is good practice to modify the manual pages when the script changes.
The manual page may have the %include :term:`directives`.
Suite and families may also have a manual page. These will also be available in the GUI. Ecflow will look for a file ``<node_name>.man`` (where node_name is the name of suite or family) using a backwards search algorithm first in ECF_FILES directory, then ECF_HOME directory. Note that errors in variable pre-processing are ignored inside of a manual section. It should also be noted that for family and suite manuals, the %manual and %end directives are not strictly necessary, as the whole file is treated as a manual.
If we have family ``/suite/big/f1``, ecFlow will search for "f1.man" in:
.. code-block:: shell
<ECF_FILES>/suite/big/f1.man
<ECF_FILES>/suite/f1.man
<ECF_FILES>/f1.man
<ECF_HOME>/suite/big/f1.man
<ECF_HOME>/suite/f1.man
<ECF_HOME>/f1.man
meter
The purpose of a meter is to signal proportional completion of a task and to
be able to trigger another job which is waiting on this proportional completion.
The meter is updated by placing the --meter :term:`child command` in a :term:`ecf script`. Meters can be added to family nodes. To change the meters, in the scripts should use:
.. code-block:: shell
ecflow_client --alter change meter <meter_name> <new_value> /path/to/family_node/with/meter
If the meter :term:`child command` results in a zombie, then the default action if for the server to **fob**, this allows the ecflow_client command to exit normally (i.e. without any errors). This default can be overridden by using a zombie attribute.
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`meter_cli`, :ref:`add_cli`, :ref:`alter_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Meter`, :py:class:`ecflow.Node.add_meter`
* - :ref:`grammar`
- :token:`meter`
Meters can be referenced in :term:`trigger` and :term:`complete expression` expressions.
mirror
A mirror is an attribute of a local :term:`Node<node>` (typically a :term:`Task<task>`),
and allows to synchronise with a node on a remote ecFlow server. The node synchronisation includes:
- Node status
- Variables (User, Generated and Inherited)
- Meters
- Lavels
- Events
Notice that all synchronized Variables, including generated and inherited, become user variables on the local
ecFlow server.
A :term:`Node<node>` with a mirror attribute will have its status periodically
synchronized with the :term:`status` of a node on a remote ecFlow server.
The synchronised status and attributes can be used to trigger the execution of local nodes.
.. note::
Synchronised tasks don't need to be provided with :code:`.ecf` files
on the local ecFlow server, as the execution of a :term:`Task<task>`
with a mirror attribute does not happen under the responsibility of the
local ecFlow server.
Operations to execute synchronised Tasks have been disabled from the :term:`ecflow_ui`.
`Only one mirror attribute is allowed per node`, and each attribute is
defined by the following properties:
- :code:`name`, an identifier
- :code:`remote_path`, the path of the node on the remote ecFlow server
- :code:`remote_host`, the remote ecFlow server host
- :code:`remote_port`, the remote ecFlow server port
- :code:`ssl`, to connect to the ecFlow server using SSL
- :code:`polling`, the value (in seconds) used to periodically contact the remote ecFlow server
- :code:`auth`, the location to the Mirror authentication credentials file
The value of the properties :code:`remote_host`, :code:`remote_port`, :code:`polling`,
and :code:`auth` can be composed of :term:`Variables<variable>`. When
these properties are not provided, the following default values are used:
- :code:`%ECF_MIRROR_REMOTE_HOST%`, for :code:`remote_host`
- :code:`%ECF_MIRROR_REMOTE_PORT%`, for :code:`remote_port`
- :code:`%ECF_MIRROR_REMOTE_POLLING%`, for :code:`polling`
- :code:`%ECF_MIRROR_REMOTE_AUTH%`, for :code:`auth`
The following fallback values are considered when the default value is
used but the variable is not actually defined:
- in case :code:`%ECF_MIRROR_REMOTE_PORT%` is not defined, the fallback value is :code:`3141`
- in case :code:`%ECF_MIRROR_REMOTE_POLLING%` is not defined, the fallback value is :code:`120` (seconds)
- in case :code:`%ECF_MIRROR_REMOTE_AUTH%` is not defined, the fallback value is :code:`""`
(empty string), which effectively disables Authentication
Each mirror attribute implies that a background thread is spawned whenever
the ecFlow server is :term:`running<server states>` (i.e. when the server is shutdown or halted the
thread is terminated and the mirroring process is completely stopped).
This independent background thread, responsible for polling the remote ecFlow server and periodically
synchronise node status, uses the configuration available when the server is restarted.
.. note::
If any variables provinding the configuration are updated, the Mirror configuration can be
reloaded (without restarting the Server) by issuing an Alter change command with the value
:code:`reload` to the relevant attributes.
The authentication credentials file is expected to be in JSON, according to the following format:
.. code-block:: json
{
"username" : "<your-username>",
"password" : "<your-password>",
}
Only the fields :code:`username`, and :code:`password` are required; any additional fields are ignored.
node
:term:`suite`, :term:`family` and :term:`task` form a hierarchy.
Where a :term:`suite` serves as the root of the hierarchy.
The :term:`family` provides the intermediate nodes, and the :term:`task`
provide the leafs.
Collectively :term:`suite`, :term:`family` and :term:`task` can be referred
to as nodes.
For python see :py:class:`ecflow.Node`.
pre-processing
Pre-processing takes place during :term:`job creation` and acts on :term:`directives` specified in :term:`ecf script` file.
This involves:
- expanding any include file :term:`directives`. i.e similar to 'c' language pre-processing
- removing comments and manual :term:`directives`
- performing :term:`variable substitution`
queue
Queues allows efficiently running jobs that are identical but vary only in the step.
This attribute makes it possible to follow a producer(server)/consumer(tasks) pattern. Note additional task consumers can be added for load balancing.
.. code-block:: shell
suite test_queue
family f1
queue q1 001 002 003 004 005 006 007
task t
endfamily
family f2
queue q2 1 2 3 4 5 6 8 9 10
task a
task b
# notice that queue name is accessible to the trigger
trigger /test_queue/f1:q1 > 5
task c
trigger ../f2/a:q2 > 9
endfamily
endsuite
The :ref:`queue_cli` :term:`child command` will signal when a step is active, complete, or has aborted:
.. code-block:: shell
# Note: because --queue is treated like a child command(init,complete,event,label,meter,abort,wait), the task path ECF_NAME is read from the environment
# The --queue command will search up the node hierarchy for the queue name. If not found it fails.
step=$(ecflow_client --queue queue_name active) # returns first queued/aborted step from the server and makes it active, Return "NULL" for the last step.
ecflow_client --queue queue_name complete $step # Tell the server that step has completed for the given queue
ecflow_client --queue queue_name aborted $step # Tell the server that step has aborted for the given queue
no_of_aborted=$(ecflow_client --queue queue_name no_of_aborted) # returns as a string the number of aborted steps
ecflow_client --queue queue_name reset
The queue values can be strings, however, if they are to be used in :term:`trigger` expressions, they must be convertible to integers:
.. code-block:: shell
suite test_queue
family f1
queue q1 red orange yellow green blue indigo violet
task t
endfamily
endsuite
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`queue_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Queue`, :py:class:`ecflow.Node.add_queue`
* - :ref:`grammar`
- :token:`queue`
queued
Is a :term:`node` :term:`status`.
After the begin command, the task **without** a :term:`defstatus` are placed into the queued state
real clock
A :term:`suite` using a real :term:`clock` will have its :term:`clock` matching the clock of the machine.
Hence the :term:`date` advances by one day at midnight.
repeat
Repeats provide looping functionality. There can only be a single repeat on a :term:`node`.
.. code-block:: shell
repeat day step [ENDDATE] # only for suites
repeat integer VARIABLE start end [step]
repeat enumerated VARIABLE first [second [third ...]]
repeat string VARIABLE str1 [str2 ...]
repeat file VARIABLE filename
repeat date VARIABLE yyyymmdd yyyymmdd [delta]
repeat datetime VARIABLE yyyymmddTHHMMSS yyyymmddTHHMMSS [delta]
repeat datelist VARIABLE yyyymmdd(1) yyyymmdd(2) ...
The repeat variable name is available as a generated variable.
The **repeat date** and **repeat datetime** define several generated variables, prefixed by variable name:
.. code-block:: shell
# Provided for `repeat date` and `repeat datetime`
<variable> # the default, the value is the current date
<variable>_YYYY # the year
<variable>_MM # the month
<variable>_DD # the day of the month
<variable>_DOW # day of the week
<variable>_JULIAN # the julian value for the date
# Provided for `repeat datetime`
<variable>_DATE # the date formatted as yyyymmdd
<variable>_TIME # the time formatted as HHMMSS
<variable>_HOURS # the hours
<variable>_MINUTES # the minutes
<variable>_SECONDS # the seconds
For example:
.. code-block:: shell
:caption: Repeat generated variables, accessible for trigger expressions
repeat date YMD 20090101 20220101
# The following generated variables, are accessible for trigger expressions
# YMD
# YMD_YYYY, YMD_MM, YMD_DD, YMD_DOW, YMD_JULIAN
repeat datetime DT 20090101T000000 20090102T000000 06:00:00
# The following generated variables, are accessible for trigger expressions
# DT
# DT_DATE, DT_YYYY, DT_MM, DT_DD, DT_DOW, DT_JULIAN
# DT_TIME, DT_HOURS, DT_MINUTES, DT_SECONDS
The repeat VARIABLE can be used in :term:`trigger` and :term:`complete expression` expressions.
As the repeat variable changes so do the generated variables. (See the tutorial for an example. Repeat)
.. warning::
If a repeat is added to a family/suite, then the repeat will ONLY loop(and automatically re-queue its children) if all the children are complete. Hence additional care needs to be taken. i.e. if the parent node has a repeat and the child has a cron attribute then the cron will always force a re-queue on the node once it has run, and hence will stop the parent from looping.
If we use relative time attribute. i.e. time +02:00, under a repeat, then the time is relative to the repeat re-queue.
The repeat VARIABLE can be used in :term:`trigger` and :term:`complete expression` expressions. Depending on the kind of repeat the value can vary:
.. code-block:: shell
RepeatDate -> value
RepeatDateList -> value
RepeatString -> index (will always return a index)
RepeatInteger -> value
RepeatEnumerated -> value | index ( return value at index if cast-able to integer, otherwise return index )
RepeatDay -> value
If a "repeat date" VARIABLE is used in a trigger expression then date arithmetic is used,
when the expression uses addition and subtraction. i.e.:
.. code-block:: python
defs = ecflow.Defs()
s1 = defs.add_suite("s1");
t1 = s1.add_task("t1").add_repeat( ecflow.RepeatDate("YMD",20090101,20091231,1) );
t2 = s1.add_task("t2").add_trigger("t1:YMD - 1 eq 20081231");
assert t2.evaluate_trigger(), "Expected trigger to evaluate. 20090101 - 1 == 20081231"
When we use relative time attributes under a Repeat. They are automatically reset when the repeat loops. Take for example:
.. code-block:: shell
suite s1
family hc00
repeat integer HYEAR 1993 2017
time +00:01 # when the repeat loops delay starting task a, for 1 minute
task a
task b
trigger a == complete
endfamily
endsuite
Now when task 'a' and Task 'b' complete, the repeat is incremented, and any relative time attributes are reset. In this case effectively delaying the starting of task 'a' for 1 minute.
See also:
.. list-table::
* - :ref:`ecflow_cli`
- :ref:`add_cli`, :ref:`alter_cli`
* - :ref:`python_api`
- :py:class:`ecflow.Node.add_repeat`, :py:class:`ecflow.Repeat`, :py:class:`ecflow.RepeatDate`, :py:class:`ecflow.RepeatEnumerated`, :py:class:`ecflow.RepeatInteger`, :py:class:`ecflow.RepeatDay`
* - :ref:`grammar`
- :token:`repeat`
running
Is a :term:`ecflow_server` state. See :term:`server states`
scheduling
The :term:`ecflow_server` is responsible for :term:`task` scheduling.
It will check :term:`dependencies` in the :term:`suite definition` every minute.
If these :term:`dependencies` are free, the :term:`ecflow_server` will submit the task.
See :term:`job creation`.
server states
The following tables reflects the :term:`ecflow_server` capabilities in the different states
================ ============ ============ ============== ===================
State User Request Task Request Job Scheduling Auto-Check-pointing
================ ============ ============ ============== ===================
:term:`running` yes yes yes yes
:term:`shutdown` yes yes no yes
:term:`halted` yes no no no
================ ============ ============ ============== ===================
shutdown
Is a :term:`ecflow_server` state. See :term:`server states`
status
Each :term:`node` in :term:`suite definition` has a status.
Status reflects the state of the :term:`node`.
In :term:`ecflow_ui` the background colour of the text reflects the status.
:term:`task` status are: :term:`unknown`, :term:`queued`, :term:`submitted`, :term:`active`, :term:`complete`, :term:`aborted` and :term:`suspended`
:term:`ecflow_server` status are: :term:`shutdown`, :term:`halted`, :term:`running`
this is shown on the root node in :term:`ecflow_ui`
submitted
Is a :term:`node` :term:`status`.
When the :term:`task` :term:`dependencies` are resolved/free the :term:`ecflow_server` places the task into a submitted state.
However if the :term:`ECF_JOB_CMD` fails, the task is placed into the :term:`aborted` state
suite
A suite is an organisational entity. It is serves as the root :term:`node` in a :term:`suite definition`.
It should be used to hold a set of jobs that achieve a common function. It can be used to hold
user :term:`variable`\ s that are common to all of its children.
Only a suite node can have a :term:`clock`.
Suite generated variables:
.. list-table::
:widths: 20 80
* - SUITE
- The name of the suite
* - ECF_TIME
- 23:30 the current suite time
* - TIME
- 2330 time as integer, Can be used in a trigger expression, ideally using <=, <, >=, >
* - YYYY
- The year as an integer
* - DOW
- Day of the week, as an integer. Sunday=0,Monday=1,etc
* - DOY
- Day of the year, as an integer
* - DAY
- The days as a string, i.e. monday
* - DD
- Day of the month as an integer
* - MM
- The month as an integer
* - MONTH
- as a string
* - ECF_DATE
- YYYMMDD year,month,day of the month as 8 digit integer
* - ECF_JULIAN
- The julian value of the current date (added in ecFlow 4.7.0)
* - ECF_CLOCK
- <day>:<month>:<day of week>:<day of year>. i.e. Tuesday:December:2:348
It is a collection of :term:`family`\ s, :term:`variable`\ s, :term:`repeat` and a single
:term:`clock` definition.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Suite`
* - :ref:`grammar`
- :token:`suite`
suite definition
The suite definition is the hierarchical :term:`node` tree. It describes how your :term:`task`\ s run and interact. It can be built up using:
* Ascii text file by following the rules defined in the ecFlow :ref:`grammar`. Hence any language can be used, to generate this format.
* :ref:`python_based_suite_definition`
Once the definition is built, it can be loaded into the :term:`ecflow_server`, and started.
It can be monitored by :term:`ecflow_ui`
suspended
Is a :term:`node` state. A :term:`node` can be placed into the suspended state via a :term:`defstatus` or via :term:`ecflow_ui`
A suspended :term:`node` including any of its children cannot take part in :term:`scheduling` until
the node is resumed.
task
A task represents a job that needs to be carried out.
It serves as a leaf :term:`node` in a :term:`suite definition`
Only tasks can be submitted.
A job inside a task :term:`ecf script` should generally be re-entrant
so that no harm is done by rerunning it, since a task may be automatically
submitted more than once if it aborts.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Task`
* - :ref:`grammar`
- :token:`task`
time
This defines a time dependency for a node.
Time is expressed in the format ``[h]h:mm``. Only numeric values are allowed.
There can be multiple time dependencies for a node,
but overlapping times may cause unexpected results.
.. code-block:: shell
:caption: The task is free to run when the time is 10:00 or 11:00
task t
time 10:00
time 11:00
To define a series of times, specify the start time, end time and a time increment.
If the start time begins with '+', times are relative to the beginning of the suite or,
in repeated families, relative to the beginning of the repeated family.
If the time the job takes to complete is longer than the interval a “slot” is missed,
e.g.:
.. code-block:: shell
time 10:00 20:00 01:00
If the 10:00 run takes more than an hour, the 11:00 run will never occur.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Time`, :py:class:`ecflow.Node.add_time`
* - :ref:`grammar`
- :token:`time`
time dependencies
This includes :term:`time`, :term:`today`, :term:`day`, :term:`date` and :term:`cron`.
When we have multiple time dependencies on the same task, then time dependency of the same type are **or'ed** together, and **and'ed** with the different types.
.. code-block:: shell
:caption: This task will run on the 17th of February 2017 at 10am
task xx
time 10:00
date 17.2.2017
.. code-block:: shell
:caption: Run task xx. at 10am and 8pm, on the 17th and 19th of February 2017, that is four times in all. Notice the task is queued in between and completes only after the last run
task xx
time 10:00
time 20:00
date 17.2.2017
date 19.2.2017
today
Like :term:`time`, but If the suites begin time is **past** the time given for the "today" command the :term:`node` is free
to run (as far as the time dependency is concerned).
For example:
.. code-block:: shell
task x
today 10:00
If we begin or re-queue the :term:`suite` at 9.00 am, then the :term:`task` in held until 10.00 am.
However if we begin or re-queue the suite at 11.00am, the :term:`task` is run immediately.
No lets look at time:
.. code-block:: shell
task x
time 10:00
If we begin or re-queue the :term:`suite` at 9.00am, then the :term:`task` in held until 10.00 am.
If we begin or re-queue the :term:`suite` at 11.00am, the :term:`task` is still held.
If the time the job takes to complete is longer than the interval a “slot” is missed,
e.g.:
.. code-block:: shell
today 10:00 20:00 01:00
If the 10:00 run takes more than an hour, the 11:00 run will never occur.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Today`
* - :ref:`grammar`
- :token:`today`
trigger
Triggers defines a dependency for a :term:`task` or :term:`family`.
There can be only one trigger dependency per :term:`node`,
but that can be a complex boolean expression of the :term:`status` of several nodes.
Triggers should be avoided on suites.
A node with a trigger can only be activated when its trigger has expired.
A trigger holds the node as long as the trigger expression evaluation returns false.
Trigger evaluation occurs when ever the :term:`child command` communicates with the server. i.e whenever
there is a state change in the suite definition.
The keywords in trigger expressions are: :term:`unknown`, :term:`suspended`, :term:`complete`, :term:`queued`, :term:`submitted`, :term:`active`, :term:`aborted`
and **clear** and **set** for :term:`event` status.
Triggers can also reference Node attributes like :term:`event`, :term:`meter`, :term:`variable`, :term:`repeat` and generated variables.
Trigger evaluation for node attributes uses integer arithmetic:
- :term:`event`: has the integer value of 0(clear) and set(1)
- :term:`meter`: values are integers hence they are used as is
- :term:`variable`: value is converted to an integer, otherwise 0 is used. See example below
- :term:`repeat` *string*: use the index values as integers. See example below
- :term:`repeat` *enumerated*: use the index values as integers. See example below
- :term:`repeat` *integer*: use the implicit integer values
- :term:`repeat` *date*: use the date values as integers. Use of plus/minus on repeat date variable uses date arithmetic
- :term:`repeat` *datetime*: use the date+time instant values as integers. Use of plus/minus on repeat datetime variable uses second arithmetic
- :term:`limit`: the limit value is used as an integer. This allows a degree of prioritisation amongst tasks under a limit
- :term:`late`: the value is stored in a flag, and is a simple boolean. Used to signify when a task is late.
Here are some examples:
.. code-block:: shell
:caption: Trigger examples
suite trigger_suite
task a
event EVENT
meter METER 1 100 50
edit VAR_INT 12
edit VAR_STRING "captain scarlett" # This is not convertible to an integer, if referenced will use '0'
family f1
edit SLEEP 2
repeat string NAME a b c d e f # This has values: a(0), b(1), c(3), d(4), e(5), f(6) i.e index
family f2
repeat integer VALUE 5 10 # This has values: 5,6,7,8,9,10
family f3
repeat enumerated red green blue # red(0), green(1), blue(2)
task t1
repeat date DATE 19991230 20000102 # This has values: 19991230,19991231,20000101,20000102
endfamily
endfamily
endfamily
family f2
task event_meter
trigger /suite/a:EVENT == set and /suite/a:METER >= 30
task variable
trigger /suite/a:VAR_INT >= 12 and /suite/a:VAR_STRING == 0
task repeat_string
trigger /suite/f1:NAME >= 4
task repeat_integer
trigger /suite/f1/f2:VALUE >= 7
task repeat_date
trigger /suite/f1/f2/f3/t1:DATE >= 19991231
task repeat_date2
# Using plus/minus on a repeat DATE will use date arithmetic
# Since the starting value of DATE is 19991230, this task will run straight away
trigger /suite/f1/f2/f3/t1:DATE - 1 == 19991229
endfamily
endsuite
What happens when we have multiple node attributes of the same name, referenced in trigger expressions?
.. code-block:: shell
:caption: Trigger priority when name clashes
task foo
event blah
meter blah 0 200 50
edit blah 10
task bar
trigger foo:blah >= 0
In this case ecFlow will use the following precedence:
- :term:`event`
- :term:`meter`
- :term:`variable`
- :term:`repeat`
- generated variables
- :term:`limit`
Hence in the example above expression ``foo:blah >= 0`` will reference the event.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Expression`, :py:class:`ecflow.Node.add_trigger`
* - :ref:`grammar`
- :token:`trigger`
unknown
Is a :term:`node` :term:`status`.
This is the default :term:`node` :term:`status` when a :term:`suite definition` is loaded into the :term:`ecflow_server`
user command
User commands are any client to server requests that are **not** :term:`child command`\ s.
variable
ecFlow makes heavy use of different kinds of variables.There are several kinds of variables:
- Environment variables: which are set in the UNIX shell before the :term:`ecFlow` starts. These control :term:`ecflow_server`, and :term:`ecflow_client` .
- suite definition variables: Also referred to as user variables. These control :term:`ecflow_server`, and :term:`ecflow_client` and are available for use in :term:`job file`.
- Generated variables: These are generated within the :term:`suite definition` node tree during :term:`job creation` and are available for use in the :term:`job file`.
Variables can be referenced in :term:`trigger` and :term:`complete expression`\ s . When used as part of a :term:`trigger` or :term:`complete expression`, the value of the
variable should be convertible to an integer (n.b. floating point values are allowed, but will be truncated).
It is important to notice that, when evaluating a :term:`trigger` or :term:`complete expression`, the default value 0 will be used for any variable with a non-numerical value.
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.Node.add_variable`
* - :ref:`grammar`
- :token:`variable`
variable inheritance
When a :term:`variable` is needed at :term:`job creation` time, it is first sought in the :term:`task` itself.
If it is not found in the :term:`task`, it is sought from the task's parent and so on, up through
the :term:`node` levels until found.
For any :term:`node`, there are two places to look for variables.
Suite definition variables are looked for first, and then any generated variables.
variable substitution
Takes place during :term:`pre-processing` *or* command invocation.(i.e :term:`ECF_JOB_CMD`,ECF_KILL_CMD,etc)
It involves searching each line of :term:`ecf script` file or command, for :term:`ECF_MICRO` character. typically '%'
The text between two % character, defines a variable. i.e %VAR%
This variable is searched for in the :term:`suite definition`.
First the suite definition variables (sometimes referred to as user variables) are searched
and then Repeat variable name, and finally the generated variables. If no variable
is found then the same search pattern is repeated up the node tree.
The value of the :term:`variable` is replaced between the % characters.
If the micro character are not paired and an error message is written to the log file,
and the task is placed into the :term:`aborted` state.
If the variable is not found in the :term:`suite definition` during pre-processing then :term:`job creation` fails,
and an error message is written to the log file, and the task is placed into the :term:`aborted` state.
To avoid this, variables in the :term:`ecf script` can be defined as:
.. code-block:: shell
%VAR:replacement%
This is similar to %VAR% but if VAR is not found in the :term:`suite definition` then 'replacement' is used.
virtual clock
Like :term:`real clock` until the :term:`ecflow_server` is suspended (i.e :term:`shutdown` or :term:`halted`),
the suites :term:`clock` is also suspended.
Hence will honour relative times in :term:`cron`, :term:`today` and :term:`time` dependencies.
It is possible to have a combination of hybrid/real and virtual.
More useful when we want complete adherence to time related dependencies at the expense
being out of sync with system time.
zombie
Zombies are running jobs that fail authentication when communicating with the :term:`ecflow_server`
:term:`child command`\ s like (init, event,meter, label, abort,complete) are placed in the :term:`ecf script`
file and are used to communicate with the :term:`ecflow_server`.
The :term:`ecflow_server` authenticates each connection attempt made by the :term:`child command`.
Authentication can fail for a number of reasons:
- password(ECF_PASS) supplied with the :term:`child command`, does not match the one in the :term:`ecflow_server`
- path name(ECF_NAME) supplied with the :term:`child command`, does not locate a :term:`task` in the :term:`ecflow_server`
- process id(ECF_RID) supplied with :term:`child command`, does not correspond with the one stored in the :term:`ecflow_server`
- :term:`task` is already :term:`active`, but receives another init :term:`child command`
- :term:`task` is already :term:`complete`, but receives another :term:`child command`
- :term:`task` is already :term:`aborted`, but receives another :term:`child command`
When authentication fails the job is considered to be a zombie.
The :term:`ecflow_server` will keep a note of the zombie for a period of time, before it is automatically removed.
However the removed zombie, may well re-appear. (this is because each :term:`child command` will continue
attempting to contact the :term:`ecflow_server` for 24 hours. This is configurable
see ECF_TIMEOUT on :term:`ecflow_client`)
See also:
.. list-table::
* - :ref:`python_api`
- :py:class:`ecflow.ZombieAttr`, :py:class:`ecflow.ZombieUserActionType`
* - :ref:`grammar`
- :token:`zombie`
There are several types of zombies see :term:`zombie type` and :py:class:`ecflow.ZombieType`
zombie attribute
The zombie attribute defines how a :term:`zombie` should be handled in an automated fashion.
Very careful consideration should be taken before this attribute is added as it may hide a genuine problem.
It can be added to any :term:`node`. But is best defined at the :term:`suite` or :term:`family` level.
If there is no zombie attribute the default behaviour is to block the :term:`child command`.
To add a zombie attribute in python, please see: :py:class:`ecflow.ZombieAttr`
zombie type
See :term:`zombie` and class :py:class:`ecflow.ZombieAttr` for further information.
How do zombies arise.
- Server crashed (or terminated and restarted) and the recovered check point file is out of date.
- A task is repeatedly re-run, earlier copies will not be remembered.
- Job sent by another ecflow_server , but which cannot talk to the original ecflow_server
- Network glitches/network down
- errors in script, i.e. multiple calls to init, complete
- errors in job submission i.e. job submitted twice.
There are several types of zombies:
* **path**:
- The task path cannot be found in the server, because node tree was deleted, replaced,reload, server crashed or backup server does not have node tree.
- Jobs could have been created, via server :term:`scheduling` or by :term:`user command`\ s
* **user**: Job is created by :term:`user commands<user command>`\ s like, rerun, re-queue. User zombies are differentiated from server(scheduled) since they are automatically created when the force option is used and we have tasks in an :term:`active` or :term:`submitted` states.
* **ecf**: Jobs are created as part of the normal :term:`scheduling`
- Two init commands or task complete or aborted but receives another :term:`child command`
- Server crashed (or terminated and restarted) and the recovered :term:`check point` file is out of date.
- A :term:`task` is repeatedly re-run, earlier copies will not be remembered.
- Job sent by another :term:`ecflow_server`, but which cannot talk to the original :term:`ecflow_server`
- Network glitches/network down
* **ecf_pid**: pid mismatched, Job scheduled twice. Check submitter.
* **ecf_passwd**: Password mismatch, PID matches, system has re-cycled PID or hacked job file?
* **ecf_pid_passwd**: Both PID and password mismatch. Re-queue & submit of active job?
The type of the zombie is not fixed and may change.
|