Bugfixes:
* Many manager configuration settings that are only applicable to user
manager or system manager can be always set. It would be better to reject
them when parsing config.
* Jun 01 09:43:02 krowka systemd[1]: Unit user@1000.service has alias user@.service.
Jun 01 09:43:02 krowka systemd[1]: Unit user@6.service has alias user@.service.
Jun 01 09:43:02 krowka systemd[1]: Unit user-runtime-dir@6.service has alias user-runtime-dir@.service.
External:
* Fedora: add an rpmlint check that verifies that all unit files in the RPM are listed in %elogind_post macros.
* dbus:
- natively watch for dbus-*.service symlinks (PENDING)
- teach dbus to activate all services it finds in /etc/elogind/services/org-*.service
* fedora: suggest auto-restart on failure, but not on success and not on coredump. also, ask people to think about changing the start limit logic. Also point people to RestartPreventExitStatus=, SuccessExitStatus=
* neither pkexec nor sudo initialize environ[] from the PAM environment?
* fedora: update policy to declare access mode and ownership of unit files to root:root 0644, and add an rpmlint check for it
* register catalog database signature as file magic
* zsh shell completion:
- <command> <verb> -<TAB> should complete options, but currently does not
- systemctl add-wants,add-requires
- systemctl reboot --boot-loader-entry=
after being started.
* write blog stories about:
- hwdb: what belongs into it, lsusb
- enabling dbus services
- how to make changes to sysctl and sysfs attributes
- remote access
- how to pass throw-away units to elogind, or dynamically change properties of existing units
- testing with Harald's awesome test kit
- auto-restart
- how to develop against journal browsing APIs
- the journal HTTP iface
- non-cgroup resource management
- dynamic resource management with cgroups
- refreshed, longer mission statement
- calendar time events
- init=/bin/sh vs. "emergency" mode, vs. "rescue" mode, vs. "multi-user" mode, vs. "graphical" mode, and the debug shell
- how to create your own target
- instantiated apache, dovecot and so on
- hooking a script into various stages of shutdown/early boot
Regularly:
* look for close() vs. close_nointr() vs. close_nointr_nofail()
* check for strerror(r) instead of strerror(-r)
* pahole
* set_put(), hashmap_put() return values check. i.e. == 0 does not free()!
* use secure_getenv() instead of getenv() where appropriate
* link up selected blog stories from man pages and unit files Documentation= fields
Janitorial Clean-ups:
* rework mount.c and swap.c to follow proper state enumeration/deserialization
semantics, like we do for device.c now
* get rid of prefix_roota() and similar, only use chase() and related
calls instead.
* get rid of basename() and replace by path_extract_filename()
* Replace our fstype_is_network() with a call to libmount's mnt_fstype_is_netfs()?
Having two lists is not nice, but maybe it's not worth making a dependency on
libmount for something so trivial.
* drop set_free_free() and switch things over from string_hash_ops to
string_hash_ops_free everywhere, so that destruction is implicit rather than
explicit. Similarly for other special hashmap/set/ordered_hashmap destructors.
* generators sometimes apply C escaping and sometimes specifier escaping to
paths and similar strings they write out. Sometimes both. We should clean
this up, and should probably always apply both, i.e. introduce
unit_file_escape() or so, which applies both.
* xopenat() should pin the parent dir of the inode it creates before doing its
thing, so that it can create, open, label somewhat atomically.
Deprecations and removals:
* Remove any support for booting without /usr pre-mounted in the initrd entirely.
Update INITRD_INTERFACE.md accordingly.
* remove cgroups v1 support EOY 2023. As per
unit around, and always operate on that, instead of cgroup fs paths.
* drop support for kernels that lack ambient capabilities support (i.e. make
4.3 new baseline). Then drop support for "!!" modifier for ExecStart= which
is only supported for such old kernels.
* drop support for kernels lacking memfd_create() (i.e. make 3.17 new
baseline), then drop all pipe() based fallbacks.
* drop support for getrandom()-less kernels. (GRND_INSECURE means once kernel
5.6 becomes our baseline). See
https://github.com/systemd/systemd/pull/24101#issuecomment-1193966468 for
details. Maybe before that: add taint-flags/warn about kernels that lack
getrandom()/environments where it is blocked.
* drop support for LOOP_CONFIGURE-less loopback block devices, once kernel
baseline is 5.8.
* drop fd_is_mount_point() fallback mess once we can rely on
STATX_ATTR_MOUNT_ROOT to exist i.e. kernel baseline 5.8
* Remove /dev/mem ACPI FPDT parsing when /sys/firmware/acpi/fpdt is ubiquitous.
That requires distros to enable CONFIG_ACPI_FPDT, and have kernels v5.12 for
x86 and v6.2 for arm.
* Once baseline is 4.13, remove support for INTERFACE_OLD= checks in "udevadm
trigger"'s waiting logic, since we can then rely on uuid-tagged uevents
* remove remaining tpm1.2 support from sd-stub
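The baseline-dependent removals in this section can be tabulated to see what
becomes droppable at a given kernel baseline. A minimal sketch in Python, not
part of the code base: the table and helper names are hypothetical, and the
versions are only the ones stated in the items above.

```python
# Hypothetical helper: list which fallbacks become removable at a given
# kernel baseline. Versions are the ones named in the items above.
MIN_KERNEL = {
    "pipe() fallbacks for memfd_create()": (3, 17),
    '"!!" modifier for ExecStart= (ambient capabilities)': (4, 3),
    "INTERFACE_OLD= checks in udevadm trigger": (4, 13),
    "getrandom()-less kernels (GRND_INSECURE)": (5, 6),
    "LOOP_CONFIGURE-less loopback block devices": (5, 8),
    "fd_is_mount_point() fallback (STATX_ATTR_MOUNT_ROOT)": (5, 8),
}


def parse_version(text):
    """Turn "5.8" into (5, 8) so versions compare numerically, not as strings."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def removable_at(baseline):
    """Return the fallbacks whose required kernel is <= the given baseline."""
    base = parse_version(baseline)
    return sorted(name for name, need in MIN_KERNEL.items() if need <= base)
```

Comparing tuples rather than strings matters here: as strings "4.13" sorts
before "4.3", while as (4, 13) and (4, 3) the order is the intended one.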
Features:
* add a kernel cmdline switch (and cred?) for marking a system to be
"headless", in which case we never open /dev/console for reading, only for
* extend mime database with mime types for:
- journal files
- credential files
- hwdb files
- catalog files
* cryptsetup: new crypttab option to auto-grow a luks device to its backing
partition size. new crypttab option to reencrypt a luks device with a new
volume key.
* we probably should have some infrastructure to acquire sysexts with
drivers/firmware for local hardware automatically. Idea: reuse the modalias
logic of the kernel for this: make the main OS image install a hwdb file
that matches against local modalias strings, and adds properties to relevant
devices listing names of sysexts needed to support the hw. Then provide some
tool that goes through all devices and tries to acquire/download the
specified images.
* repart + cryptsetup: support file systems that are encrypted and use verity
on top. Usecase: confexts that shall be signed by the admin but also be
confidential. Then, add a new --make-ddi=confext-encrypted for this.
* tmpfiles: add new line type for moving files from some source dir to some
target dir. then use that to move sysexts/confexts and stuff from initrd
tmpfs to /run/, so that host can pick things up.
* tiny varlink service that takes a fd passed in and serves it via http. Then
make use of that in networkd, and expose some EFI binary of choice for
DHCP/HTTP-based EFI boot.
* bootctl: add reboot-to-disk which takes a block device name, and
automatically sets things up so that system reboots into that device next.
* maybe: in PID1, when we detect we run in an initrd, make superblock read-only
early on, but provide opt-out via kernel cmdline.
* elogind-pcrextend:
- support measuring to nvindex with PCR update semantics ("fake PCRs")
- add api for "allocating" such an nvindex
- once we have that start measuring every sysext we apply, every confext,
every RootImage= we apply, every nspawn and so on. All in separate fake
PCRs.
* vmspawn:
- enable hyperv extension by default (https://www.qemu.org/docs/master/system/i386/hyperv.html)
- register with machined
- run in scope unit when invoked from command line, and machined registration is off
- support --directory= via virtiofs
- sd_notify support
- --ephemeral support
- --read-only support
- automatically suspend/resume the VM if the host suspends. Use logind
suspend inhibitor to implement this. request clean suspend by generating
suspend key presses.
- support for "real" networking via "-n" and --network-bridge=
- automatically run service "at the side" for swtpm
- translate SIGTERM to clean ACPI shutdown event
* elogind-pcrmachine should probably also measure the SMBIOS system UUID.
* sd-boot: allow synthesizing additional type1 entries via SMBIOS vendor strings
* storagetm:
- add USB mass storage device logic, so that all local disks are also exposed
as mass storage devices on systems that have a USB controller that can
operate in device mode
- add NVMe authentication
* add support for activating nvme-oF devices at boot automatically via kernel
cmdline, and maybe even support a syntax such as
root=nvme:<trtype>:<traddr>:<trsvcid>:<nqn>:<partition> to boot directly from
nvme-oF
* pcrlock:
- make signed PCR work together with pcrlock
- add kernel-install plugin that automatically creates UKI .pcrlock file when
UKI is installed, and removes it when it is removed again
- automatically install PE measurement of sd-boot on "bootctl install"
- write generated pcrlock signature files to the ESP as credential, one for
each installed OS & pick up generated pcrlock signature file in sd-stub,
pass it via initrd to OS
- pre-calc sysext + kernel cmdline measurements
- pre-calc cryptsetup root key measurement
- Add support for more than 8 branches per PCR OR
- add "elogind-pcrlock lock-kernel-current" or so which synthesizes .pcrlock
policy from currently booted kernel/event log, to close gap for first boot
for pre-built images
* add a new elogind-project@.service that is very similar to user@.service but
uses DynamicUser=1 and no PAMName= to invoke an unprivileged somewhat
light-weight service manager. Use HOME=/var/lib/elogind/projects/%i as home
dir. Similar for $XDG_RUNTIME_DIR. Start project@%i.target. Use LogField= to
add a field identifying the project.
* logind: add a new dbus call Sleep() which automatically redirects to one of
Suspend(), Hibernate(), SuspendThenHibernate() depending on what is
available, and also subject to some local configuration in
logind.conf. Should default to SuspendThenHibernate() if available, and then
fallback to Suspend() and finally Hibernate() if not. Then expose this as
"systemctl sleep", and tell DEs to default to this.
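The fallback order described for Sleep() could look roughly like the following
Python sketch. The function name, the parameters and the way the logind.conf
override is modelled are assumptions for illustration; only the order
SuspendThenHibernate(), Suspend(), Hibernate() comes from the item above.

```python
# Order taken from the item above. Everything else here is hypothetical.
DEFAULT_ORDER = ("SuspendThenHibernate", "Suspend", "Hibernate")


def pick_sleep_operation(available, configured=None):
    """Pick the method a generic Sleep() call would redirect to.

    available:  set of operation names the system supports
    configured: optional override (standing in for a logind.conf setting)
    Returns the chosen name, or None if nothing usable is available.
    """
    if configured is not None:
        # Assumption: local configuration wins, but only if usable.
        return configured if configured in available else None
    for operation in DEFAULT_ORDER:
        if operation in available:
            return operation
    return None
```

Whether an unusable configured value should fail, as in this sketch, or fall
back to the default order is a design question the item leaves open.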
* in sd-boot and sd-stub measure the SMBIOS vendor strings to some PCR (at
least some subset of them that look like elogind stuff), because apparently
some firmware does not, but elogind honours it. avoid duplicate measurement
by sd-boot and sd-stub by adding LoaderFeatures/StubFeatures flag for this,
so that sd-stub can avoid it if sd-boot already did it.
* cryptsetup: a mechanism that allows signing a volume key with some key that
has to be present in the kernel keyring, or similar, to ensure that confext
DDIs can be encrypted against the local SRK but signed with the admin's key
and thus can be authenticated locally before they are decrypted.
* image policy should be extended to allow dictating *how* a disk is unlocked,
i.e. root=encrypted-tpm2+encrypted-fido2 would mean "root fs must be
encrypted and unlocked via fido2 or tpm2, but not otherwise"
* homed: use elogind-storagetm to expose home dirs via nvme-tcp. Then,
same home dir. Similar maybe for nbd, iscsi? this should then first ask for
the local root pw, to authenticate that logging in like this is ok, and would
then be followed by another password prompt asking for the user's own
password. Also, do something similar for CIFS: if you log in via
lennart%cifs-someserver_someshare, then set up the homed dir for it
automatically. The PAM module should update the user name used for login to
the short version once it set up the user. Some care should be taken, so that
the long version can still be resolved via NSS afterwards, to deal with
PAM clients that do not support PAM sessions where PAM_USER changes half-way.
* redefine /var/lib/extensions/ as the dir one can place all three of sysext,
confext, as well as multi-modal DDIs that qualify as both. Then introduce
/var/lib/sysexts/ which can be used to place only DDIs that shall be used as
sysext
* in pid1: move out all cgroup state settings from Unit into a new object
CGroupState or so which is allocated when we realize the unit into a cgroup,
and then remains referenced by it. The new object should also carry an fd to
the realized cgroup, to pin it (and later execute all cgroup operations over,
once we drop cgroupv1 compat).
* add new "elogind-ssh-generator", which allows basic ssh config via
credentials (host key). It generates sshd.socket for IP, but also
sshd-vsock.socket for listening on AF_VSOCK when running in a VM, and
sshd-unix.socket on AF_UNIX when running in a container. It also generates a
matching sshd.service file with a host key passed in on the cmdline via
credentials. Then, add a ssh_config drop-in that matches some suitable
hostname pattern and has a ProxyCommand set that allows connecting to any
local VM/container that way without any networking configured.
* Varlinkification of the following command line tools, to open them up to
other programs via IPC:
- bootctl
- journalctl (allowing journal read access via IPC)
- coredumpctl
- elogind-bless-boot
- elogind-measure
- elogind-dissect
- elogind-sysupdate
- kernel-install
* Varlink: add glue code to allow varlink clients to be authenticated via
Polkit by passing client pidfd over.
* in the service manager, pick up ERRNO= + BUSERROR= + VARLINKERROR= error
identifiers, and store them along with the exit status of a service and report
via "systemctl status".
* enumerate virtiofs devices during boot-up in a generator, and synthesize
mounts for rootfs, /usr/, /home/, /srv/ and some others from it, depending on
the "tag". (waits for: https://gitlab.com/virtio-fs/virtiofsd/-/issues/128)
* automatically mount one virtiofs during early boot phase to /run/host/,
similar to how we do that for nspawn, based on some clear tag.
* add some service that makes an atomic snapshot of PCR state and event log up
to that point available, possibly even with quote by the TPM.
* encode type1 entries in some UKI section to add additional entries to the
menu.
* Add ACL-based access management to .socket units. i.e. add AllowPeerUser= +
AllowPeerGroup= that installs additional user/group ACL entries on AF_UNIX
sockets.
* elogind-tpm2-setup should probably have a factory reset logic, i.e. when some
kernel command line option is set we reset the TPM (equivalent of tpm2_clear
-c owner?).
* elogind-tpm2-setup should support a mode where we refuse booting if the SRK
changed. (Must be opt-in, to not break systems which are supposed to be
migratable between PCs)
* when elogind-sysext learns mutable /usr/ (and elogind-confext mutable /etc/)
then allow them to store the result in a .v/ versioned subdir, for some basic
snapshot logic
* add a new PE binary section ".mokkeys" or so which sd-stub will insert into
Mok keyring, by overriding/extending whatever shim sets in the EFI
var. Benefit: we can extend the kernel module keyring at ukify time,
i.e. without recompiling the kernel, taking an upstream OS' kernel and adding
a local key to it.
* PidRef conversion work:
- cg_pid_get_xyz()
- pid_from_same_root_fs()
- get_ctty_devnr()
- pid1: sd_notify() receiver should use SCM_PIDFD to authenticate client
- actually wait for POLLIN on pidref's pidfd in service logic
- exec_spawn() + safe_fork()
- openpt_allocate_in_namespace()
- sd_bus_creds
- unit_attach_pid_to_cgroup_via_bus()
- cg_attach() – requires new kernel feature
- varlink_get_peer_pid()
* ddi must be listed as block device fstype
* measure some string via pcrphase whenever we end up booting into emergency
mode.
* homed: add a basic form of secrets management to homed, that stores
secrets in $HOME somewhere, and is protected by the account's own authentication
mechanisms. Should implement something PKCS#11-like that can be used to
implement emulated FIDO2 in unpriv userspace on top (which should happen
outside of homed), emulated PKCS11, and libsecrets support. Operate with a
2nd key derived from volume key of the user, with which to wrap all
keys. maintain keys in kernel keyring if possible.
* use sd-event ratelimit feature optionally for journal stream clients that log
too much
* elogind-mount should only consider modern file systems when mounting, similar
to elogind-dissect
* add another PE section ".fname" or so that encodes the intended filename for
PE file, and validate that when loading add-ons and similar before using
it. This is particularly relevant when we load multiple add-ons and want to
sort them to apply them in a defined order. The order should not be under
control of the attacker.
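The validation and ordering idea for add-ons can be sketched as follows, in
Python for brevity. This is an illustration under assumptions: the function
name is hypothetical, and the "intended name" argument stands in for whatever
the proposed PE section would carry.

```python
import os


def order_addons(addons):
    """Validate and order add-ons by their embedded intended filename.

    addons: iterable of (path_on_disk, intended_name) pairs, where
            intended_name stands for the content of the proposed section.
    Returns the paths sorted by intended name. Raises ValueError if a file
    was renamed, so that on-disk names alone cannot change the order.
    """
    checked = []
    for path, intended in addons:
        if os.path.basename(path) != intended:
            raise ValueError(f"{path}: name does not match embedded {intended!r}")
        checked.append((intended, path))
    return [path for _, path in sorted(checked)]
```

Rejecting a mismatch outright, rather than silently sorting by the embedded
name, makes a renamed file visible instead of quietly tolerated.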
* also include packaging metadata (à la
https://systemd.io/ELF_PACKAGE_METADATA/) in our UEFI PE binaries, using the
same JSON format.
* make "bootctl install" + "bootctl update" useful for installing shim too. For
that introduce new dir /usr/lib/elogind/efi/extra/ which we copy mostly 1:1
into the ESP at install time. Then make the logic smart enough so that we
don't overwrite bootx64.efi with our own if the extra tree already contains
one. Also, follow symlinks when copying, so that shim rpm can symlink their
stuff into our dir (which is safe since the target ESP is generally VFAT and
thus does not have symlinks anyway). Later, teach the update logic to look at
the ELF package metadata (which we also should include in all PE files, see
above) for version info in all *.EFI files, and use it to only update if
newer.
* in sd-stub: optionally add support for a new PE section .keyring or so that
contains additional certificates to include in the Mok keyring, extending
what shim might have placed there. why? let's say I use "ukify" to build +
sign my own fedora-based UKIs, and only enroll my personal lennart key via
shim. Then, I want to include the fedora keyring in it, so that kmods work.
But I might not want to enroll the fedora key in shim, because this would
also mean that the key would be in effect whenever I boot an archlinux UKI
built the same way, signed with the same lennart key.
* resolved: take possession of some IPv6 ULA address (let's say
fd00:5353:5353:5353:5353:5353:5353:5353), and listen on port 53 on it for the
local stubs, so that we can make the stub available via ipv6 too.
* introduce a .microcode PE section for sd-stub which we'll pass as first initrd
to the kernel which will then upload it to the CPU. This should be distinct
from .initrd to guarantee right ordering. also, and maybe more importantly
support .microcode in PE add-ons, so that a microcode update can be shipped
independently of any kernel.
* Maybe add SwitchRootEx() as new bus call that takes env vars to set for new
PID 1 as argument. When adding SwitchRootEx() we should maybe also add a
flags param that allows disabling and enabling whether serialization is
requested during switch root.
* introduce a .acpitable section for early ACPI table override
* add proper .osrel matching for PE addons. i.e. refuse applying an addon
intended for a different OS. Take inspiration from how confext/sysext are
matched against OS.
* figure out what to do about credentials sealed to PCRs in kexec + soft-reboot
scenarios. Maybe insist sealing is done additionally against some keypair in
the TPM to which access is updated on each boot, for the next, or so?
* logind: when logging in, always take an fd to the home dir, to keep the dir
busy, so that autofs release can never happen. (this is generally a good
idea, and specifically works around the fact that autofs ignores busy by mount
namespaces)
* mount most file systems with a restrictive uidmap. e.g. mount /usr/ with a
uidmap that blocks out anything outside 0…1000 (i.e. system users) and similar.
* mount the root fs with MS_NOSUID by default, and then mount /usr/ without
both so that suid executables can only be placed there. Do this already in
the initrd. If /usr/ is not split out create a bind mount automatically.
* fix our various hwdb lookup keys to end with ":" again. The original idea was
that hwdb patterns can match arbitrary fields with expressions like
"*:foobar:*", to wildcard match both the start and the end of the string.
This only works safely for later extensions of the string if the strings
always end in a colon. This requires updating our udev rules, as well as
checking if the various hwdb files are fine with that.
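Why the trailing colon matters can be shown with ordinary glob matching. The
sketch below uses Python's fnmatch as a stand-in for hwdb's pattern matching,
and the lookup keys are made up for illustration; only the "*:foobar:*"
pattern comes from the item above.

```python
from fnmatch import fnmatchcase

PATTERN = "*:foobar:*"  # pattern from the item above


def matches(key):
    """Glob-match a lookup key against PATTERN (case-sensitive)."""
    return fnmatchcase(key, PATTERN)


# Hypothetical lookup keys; "foobar" is the last field in both.
without_colon = "usb:v1234:foobar"
with_colon = "usb:v1234:foobar:"

# The same keys after a later extension appends a new field "baz".
extended_without_colon = without_colon + ":baz"
extended_with_colon = with_colon + "baz:"
```

With these values the colon-terminated key matches both before and after the
extension, while the unterminated key only starts matching once the new field
is appended. The result for the unterminated key therefore depends on whether
the string has been extended, which is the instability the item describes.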
* mount /tmp/ and /var/tmp with a uidmap applied that blocks out "nobody" user
among other things such as dynamic uid ranges for containers and so on. That
way no one can create files there with these uids and we enforce they are only
used transiently, never persistently.
* rework loopback support in fstab: when "loop" option is used, then
instantiate a new elogind-loop@.service for the source path, set the
lo_file_name field for it to something recognizable derived from the fstab
line, and then generate a mount unit for it using a udev generated symlink
based on lo_file_name.
* remove tomoyo support, it's obsolete and unmaintained apparently
* In .socket units, add ConnectStream=, ConnectDatagram=,
ConnectSequentialPacket= that create a socket, and then *connect to* rather than
listen on some socket. Then, add a new setting WriteData= that takes some
base64 data that elogind will write into the socket early on. This can then
be used to create connections to arbitrary services and issue requests into
them, as long as the data is static. This can then be combined with the
aforementioned journald subscription varlink service, to enable
activation-by-message id and similar.
* .service with invalid Sockets= starts successfully.
* landlock: lock down RuntimeDirectory= via landlock, so that services lose
ability to write anywhere else below /run/. Similar for
StateDirectory=. Benefit would be clear delegation via unit files: services
get the directories they get, and nothing else even if they wanted to.
* landlock: for unprivileged elogind (i.e. elogind --user), use landlock to
implement ProtectSystem=, ProtectHome= and so on. Landlock does not require
privs, and we can implement pretty similar behaviour. Also, maybe add a mode
where ProtectSystem= combined with an explicit PrivateMounts=no could request
similar behaviour for system services, too.
* Add elogind-mount@.service which is instantiated for a block device and
invokes elogind-mount and exits. This is then useful to use in
ENV{SYSTEMD_WANTS} in udev rules, and a bit prettier than using RUN+=
* udevd: extend memory pressure logic: also kill any idle worker processes
* udevadm: to make symlink querying with udevadm nicer:
- do not enable the pager for queries like 'udevadm info -q -r symlink'
- add mode with newlines instead of spaces (for grep)?
* SIGRTMIN+18 and memory pressure handling should still be added to: hostnamed,
localed, oomd, timedated.
* repart/gpt-auto/DDIs: maybe introduce a concept of "extension" partitions,
that have a new type uuid and can "extend" earlier partitions, to work around
partition that is to be extended would just set a bit in the partition flags
field to indicate that there's another extension partition to look for. The
identifying UUID of the extension partition would be hashed in counter mode
from the uuid of the original partition it extends. Inspiration for this is
the "dynamic partitions" concept of new Android. This would be a minimalistic
concept of a volume manager, with the extents it manages being exposed as GPT
partitions. If a partition is extended multiple times they should probably
grow exponentially in size to ensure O(log(n)) time for finding them on
access.
* Use CLONE_INTO_CGROUP to spawn elogind-executor, once glibc supports it in
posix_spawn().
* Make nspawn a frontend for elogind-executor, so that we have two ways into
the executor: via unit files/dbus/varlink through PID1 and via cmdline/OCI
through nspawn.
* sd-stub: detect if we are running with uefi console output on serial, and if so
automatically add console= to kernel cmdline matching the same port.
* add a utility that can be used with the kernel's
CONFIG_STATIC_USERMODEHELPER_PATH and then handles them within pid1 so that
security, resource management and cgroup settings can be enforced properly
for all umh processes.
* elogind-shutdown: keep sending sd_notify() status updates immediately before
going down, in particular include the "reboot param" string.
* homed: when resizing an fs, don't sync the identity beforehand, as there
might simply not be enough disk space for that. Try to be defensive and sync
only after the resize.
* homed: if for some reason the partition ended up being much smaller than
whole disk, recover from that, and grow it again.
* timesyncd: when saving/restoring clock try to take boot time into account.
Specifically, along with the saved clock, store the current boot ID. When
starting, check if the boot id matches. If so, don't do anything (we are on
the same boot and clock just kept running anyway). If not, then read
CLOCK_BOOTTIME (which started at boot), and add it to the saved clock
timestamp, to compensate for the time we spent booting. If EFI timestamps are
available, also include that in the calculation. With this we'll then only
miss the time spent during shutdown after timesync stopped and before the
system actually reset.
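The clock restore logic described in the timesyncd item above can be sketched
as follows. This is only an illustration in Python: the function name and
parameters are hypothetical, and the inputs are assumed to come from
/proc/sys/kernel/random/boot_id, clock_gettime(CLOCK_BOOTTIME) and, where
available, the EFI timestamps.

```python
def restored_clock(saved_clock_usec, saved_boot_id, current_boot_id,
                   boottime_usec, efi_boot_usec=0):
    """Return the realtime value to restore, or None to leave the clock alone.

    All times are in microseconds. efi_boot_usec is the (optional) time spent
    in firmware and boot loader before the kernel started counting.
    """
    if saved_boot_id == current_boot_id:
        # Same boot: the clock kept running since we saved it, do nothing.
        return None
    # Different boot: compensate for the time spent booting since the
    # timestamp was written during the previous boot.
    return saved_clock_usec + boottime_usec + efi_boot_usec
```

The time between timesync stopping and the actual reset is still lost, since
nothing measures it.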
* elogind-stub: maybe store a "boot counter" in the ESP, and pass it down to
userspace to allow ordering boots (for example in journalctl). The counter
would be monotonically increased on every boot.
into two of the three PAM stacks gdm provides.
See discussion at https://github.com/authselect/authselect/pull/311
* sd-boot: make boot loader spec type #1 accept http urls in "linux"
lines. Then, do the uefi http dance to download kernels and boot them. This
is then useful for network boot, by embedding a cpio with type #1 snippets
in sd-boot, which reference remote kernels.
* maybe prohibit setuid() to the nobody user, to lock things down, via seccomp.
nobody is not a user any code should run under, ever, as that user would
possibly get a lot of access to resources it really shouldn't be getting
access to due to the userns + nfs semantics of the user. Alternatively: use
the seccomp log action, and allow it.
* sd-boot: add a new PE section .bls or so that carries a cpio with additional
boot loader entries (both type1 and type2). Then when initializing, find this
section, iterate through it and populate menu with it. cpio is simple enough
to make a parser for this reasonably robust. use same path structures as in
the ESP. Similarly, add one for signature key drop-ins.
* sd-boot: also allow passing in the cpio as in the previous item via SMBIOS
* add a new EFI tool "sd-fetch" or so. It looks in a PE section ".url" for a
URL, then downloads the file from it using UEFI HTTP APIs, and executes it.
Use case: provide a minimal ESP with sd-boot and a couple of these sd-fetch
binaries in place of UKIs, and download them on-the-fly.
* maybe: elogind-loop-generator that sets up loopback devices if requested via kernel
cmdline. use case: include encrypted/verity root fs in UKI.
* elogind-gpt-auto-generator: add kernel cmdline option to override block
device to dissect. also support dissecting a regular file. use case: include
encrypted/verity root fs in UKI.
* sd-stub: add ".bootcfg" section for kernel bootconfig data (as per
https://docs.kernel.org/admin-guide/bootconfig.html)
* tpm2: add (optional) support for generating a local signing key from PCR 15
state. use private key part to sign PCR 7+14 policies. stash signatures for
expected PCR7+14 policies in EFI var. use public key part in disk encryption.
generate new sigs whenever db/dbx/mok/mokx gets updated. that way we can
securely bind against SecureBoot/shim state, without having to re-enroll
everything on each update (but we still have to generate one sig on each
update, but that should be robust/idempotent). needs rollback protection, as
usual.
* Lennart: big blog story about DDIs
* Lennart: big blog story about building initrds
* Lennart: big blog story about "why elogind-boot"
* bpf: see if we can use BPF to solve the syslog message cgroup source problem:
one idea would be to patch source sockaddr of all AF_UNIX/SOCK_DGRAM to
implicitly contain the source cgroup id. Another idea would be to patch
sendto()/connect()/sendmsg() sockaddr on-the-fly to use a different target
sockaddr.
* bpf: see if we can address opportunistic inode sharing of immutable fs images
with BPF. i.e. if bpf gives us power to hook into openat() and return a
different inode than requested, one which we however know has the same contents,
then we can use that to implement opportunistic inode sharing among DDIs:
make all DDIs ship xattr on all reg files with a SHA256 hash. Then, also
dictate that DDIs should come with a top-level subdir where all reg files are
linked into by their SHA256 sum. Then, whenever an inode is opened with the
xattr set, check bpf table to find dirs with hashes for other prior DDIs and
try to use inode from there.
* extend the verity signature partition to permit multiple signatures for the
same root hash, so that people can sign a single image with multiple keys.
* consider adding a new partition type, just for /opt/ for usage in system
extensions
* gpt-auto-discovery: also use the pkcs7 signature stuff, and pass signature to
kernel. So far we only did this for the various --image= switches, but not
for the root fs or /usr/.
* dissection policy should enforce that unlocking can only take place by
certain means, i.e. only via pw, only via tpm2, or only via fido, or a
combination thereof.
enforce the uuids for partitions created, so that they can calculate PCR 15
ahead of time.
* in the initrd: derive the default machine ID to pass to the host PID 1 via
$machine_id from the same seed credential.
* Add elogind-sysupdate-initrd.service or so that runs elogind-sysupdate in the
initrd to bootstrap the initrd to populate the initial partitions. Some things
to figure out:
- Should it run on firstboot or on every boot?
- If run on every boot, should it use the sysupdate config from the host on
subsequent boots?
* provide an API (probably IPC) to apps to encrypt/decrypt
credentials. use case: allow bluez bluetooth daemon to pass pairings to initrd
that way, without shelling out to our tools.
safe bet, given that it should change only on policy changes, and not
software updates. But that's wrong. Recent fwupd (rightfully) contains code
for updating the dbx denylist. This means even without any active policy
and in cryptsetup simply the empty list? Also, PCR 14 almost certainly should
be included as much as PCR 7 (as it contains shim's policy, which is
certainly as relevant as PCR 7 on many systems)
* To mimic the new tpm2-measure-pcr= crypttab option add the same to veritytab
(measuring the root hash) and integritytab (measuring the HMAC key if one is
used)
* We should start measuring all services, containers, and system extensions we
activate. probably into PCR 13. i.e. add --tpm2-measure-pcr= or so to
verity is used, hash of the root hash).
* bootspec: permit graceful "update" from type #2 to type #1. If both a type #1
and a type #2 entry exist under otherwise the exact same name, then use the
type #1 entry, and ignore the type #2 entry. This way, people can "upgrade"
from the UKI with all parameters baked in to a Type #1 .conf file with manual
parametrization, if needed. This matches our usual rule that admin config
should win over vendor defaults.
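The preference rule from the bootspec item above can be sketched like this.
The sketch is in Python and assumes, purely for illustration, that "the exact
same name" means the file name without its ".conf" (type #1) or ".efi"
(type #2) suffix; select_entries() is a hypothetical helper.

```python
from pathlib import PurePosixPath

def select_entries(filenames):
    """Map entry name -> file to use, preferring type #1 over type #2."""
    chosen = {}
    for filename in filenames:
        path = PurePosixPath(filename)
        if path.suffix not in (".conf", ".efi"):
            continue  # neither a type #1 nor a type #2 entry
        # A type #1 .conf always wins; a type #2 .efi is only used if no
        # entry of the same name has been seen so far.
        if path.suffix == ".conf" or path.stem not in chosen:
            chosen[path.stem] = filename
    return chosen
```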
* write a "search path" spec, that documents the prefixes to search in
(i.e. the usual /etc/, /run/, /usr/lib/ dance, potentially /usr/etc/), how to
sort found entries, how masking works and overriding.
* automatic boot assessment: add one more default success check that just waits
for a bit after boot, and blesses the boot if the system stayed up that long.
* implement concept of "versioned" resources inside a dir, and write a spec for
it. Make all tools in elogind, in particular
RootImage=/RootDirectory=/--image=/--directory= implement this. Idea:
directories ending in ".v/" indicate a directory with versioned resources in
them. Versioned resources inside a .v dir are always named in the pattern
<prefix>_<version>[+<tries-left>[-<tries-done>]].<suffix>
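A minimal sketch of parsing such names, in Python. It assumes the caller
already knows the prefix and suffix it is looking for, since a version such as
"1.2.3" cannot otherwise be told apart from the suffix; parse_versioned() and
VersionedName are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class VersionedName(NamedTuple):
    version: str
    tries_left: Optional[int]
    tries_done: Optional[int]

# <version>[+<tries-left>[-<tries-done>]]
_MIDDLE = re.compile(r"(?P<version>[^+]+)(?:\+(?P<left>\d+)(?:-(?P<done>\d+))?)?")

def parse_versioned(name, prefix, suffix):
    """Parse <prefix>_<version>[+<tries-left>[-<tries-done>]].<suffix>."""
    head, tail = prefix + "_", "." + suffix
    if not (name.startswith(head) and name.endswith(tail)):
        return None
    middle = name[len(head):len(name) - len(tail)]
    m = _MIDDLE.fullmatch(middle)
    if m is None:
        return None
    left, done = m.group("left"), m.group("done")
    return VersionedName(
        m.group("version"),
        int(left) if left is not None else None,
        int(done) if done is not None else None,
    )
```

For example, parse_versioned("foo_1.2.3+3-1.raw", "foo", "raw") yields version
"1.2.3" with 3 tries left and 1 try done.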
* add support for using this .v/ logic on the root fs itself: in the initrd,
after mounting the rootfs, look for root-<arch>.v/ in the root fs, and then
apply the logic, moving the switch root logic there.
partitions marked for it are entirely removed. Use case: remove secondary OS
copy, and redundant partitions entirely, and recreate them anew.
* elogind-boot: maybe add support for collapsing menu entries of the same OS
into one item that can be opened (like in a "tree view" UI element) or
collapsed. If only a single OS is installed, disable this mode, but if
multiple OSes are installed might make sense to default to it, so that user
is not immediately bombarded with a multitude of Linux kernel versions but
only one for each OS.
addition to the existing mechanisms via EFI variables and kernel command
line. Benefit: works also on non-EFI systems, and can be requested on one
boot, for the next.
* elogind-sysupdate: make transport pluggable, so people can plug casync or
similar behind it, instead of http.
* in UKIs: add way to define allowlist of additional words that can be added to
the kernel cmdline even in SecureBoot mode
* we probably need .pcrpkeyrd or so as additional PE section in UKIs,
which contains a separate public key for PCR values that only apply in the
initrd, i.e. in the boot phase "enter-initrd". Then, consumers in userspace
can easily bind resources to just the initrd. Similarly, maybe one more for
"enter-initrd:leave-initrd" for resources that shall be accessible only
before unprivileged user code is allowed. (we only need this for .pcrpkey,
not for .pcrsig, since the latter is a list of signatures anyway). With that,
when you enroll a LUKS volume or similar, pick either the .pcrpkey (for
coverage through all phases of the boot, but excluding shutdown), the
.pcrpkeyrd (for coverage in the initrd only) or .pcrpkeybt (for coverage
until users are allowed to log in).
* Once the root fs LUKS volume key is measured into PCR 15, default to binding
* add support for asymmetric LUKS2 TPM based encryption. i.e. allow preparing
an encrypted image on some host given a public key belonging to a specific
other host, so that only hosts possessing the private key in the TPM2 chip
can decrypt the volume key and activate the volume. Use case: elogind-confext
for a central orchestrator to generate confext images securely that can only
be activated on one specific host (which can be used for installing a bunch
of creds in /etc/credstore/ for example). Extending on this: allow binding
LUKS2 TPM based encryption also to the TPM2 internal clock. Net result:
prepare a confext image that can only be activated on a specific host that
runs a specific software in a specific time window. confext would be
automatically invalidated outside of it.
* maybe add a "elogind-report" tool, that generates a TPM2-backed "report" of
current system state, i.e. a combination of PCR information, local system
time and TPM clock, running services, recent high-priority log
messages/coredumps, system load/PSI, signed by the local TPM chip, to form an
enhanced remote attestation quote. Use case: a simple orchestrator could use
this: have the report tool upload these reports every 3min somewhere. Then
have the orchestrator collect these reports centrally over a 3min time
window, and use them to determine which node should now start/stop what,
and generate a small confext for each node, that uses Uphold= to pin services
on each node. The confext would be encrypted using the asymmetric encryption
proposed above, so that it can only be activated on the specific host, if the
software is in a good state, and within a specific time frame. Then run a
loop on each node that sends report to orchestrator and then sysupdate to
update confext. Orchestrator would be stateless, i.e. operate on desired
config and collected reports in the last 3min time window only, and thus can
be trivially scaled up since all instances of the orchestrator should come to
the same conclusions given the same inputs of reports/desired workload info.
Could also be used to deliver WireGuard secrets and such to clients, thus
permitting zero-trust networking: secrets are rolled over via confext updates,
and via the time window TPM logic invalidated if node doesn't keep itself
updated, or becomes corrupted in some way.
* in the initrd, once the rootfs encryption key has been measured to PCR 15,
derive default machine ID to use from it, and pass it to host PID 1.
* tree-wide: convert as much as possible over to use sd_event_set_signal_exit(), instead
of manually hooking into SIGINT/SIGTERM
* tree-wide: convert as much as possible over to SD_EVENT_SIGNAL_PROCMASK
instead of manual blocking.
* sd-boot: for each installed OS, grey out older entries (i.e. all but the
newest), to indicate they are obsolete
* automatically propagate LUKS password credential into cryptsetup from host
(i.e. SMBIOS type #11, …), so that one can unlock LUKS via VM hypervisor
supplied password.
* add ability to path_is_valid() to classify paths that refer to a dir from
those which may refer to anything, and use that in various places to filter
early. i.e. stuff ending in "/", "/." and "/.." definitely refers to a
directory, and paths ending that way can be refused early in many contexts.
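The classification itself is a purely textual check. A minimal sketch in
Python follows; path_must_be_dir() is a hypothetical name, and treating a bare
"." or ".." the same way is an assumption that goes slightly beyond the item
above.

```python
def path_must_be_dir(path: str) -> bool:
    """Return True if the path can only refer to a directory.

    This looks at the string only and never touches the file system.
    """
    if path.endswith("/"):
        return True
    # The last component is whatever follows the final slash (or the whole
    # string if there is no slash at all).
    last = path.rsplit("/", 1)[-1]
    return last in (".", "..")
```

Callers that expect a regular file or a device node could refuse such paths
right away, before any lookup is attempted.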
* elogind-measure: allow operating with PEM certificates in addition to PEM
public keys when signing PCR values. SecureBoot and our Verity signatures
operate with certificates already, hence I guess we should also just deal for
convenience with certificates for the PCR stuff too.
* elogind-measure: add --pcrpkey-auto as an alternative to --pcrpkey=, where it
would just use the same public key specified with --public-key= (or the one
automatically derived from --private-key=).
* push people to use ".sysext.raw" as suffix for sysext DDIs (DDI =
discoverable disk images, i.e. the new name for gpt disk images following the
discoverable disk spec). [Also: just ".sysext/" for directory-based sysext]
* Add "purpose" flag to partition flags in discoverable partition spec that
indicates if the partition is intended for sysext, for portable service, for
booting and so on. Then, when dissecting DDI allow specifying a purpose to
use as additional search condition. Use case: images that combine a sysext
partition with a portable service partition in one.
* On boot, auto-generate an asymmetric key pair from the TPM,
and use it for validating DDIs and credentials. Maybe upload it to the kernel
keyring, so that the kernel does this validation for us for verity and kernel
modules
* for elogind-confext: add a tool that can generate suitable DDIs with verity +
* lock down acceptable encrypted credentials at boot, via simple allowlist,
maybe on kernel command line:
elogind.import_encrypted_creds=foobar.waldo,tmpfiles.extra to protect locked
down kernels from credentials generated on the host with a weak kernel
* chase(): take inspiration from path_extract_filename() and return
O_DIRECTORY if input path contains trailing slash.
* chase(): refuse resolution if trailing slash is specified on input,
but final node is not a directory
* document in boot loader spec that symlinks in XBOOTLDR/ESP are not OK even if
non-VFAT fs is used.
* measure credentials picked up from SMBIOS to some suitable PCR
* measure GPT and LUKS headers somewhere when we use them (i.e. in
* pick up creds from EFI vars
* Add and pickup tpm2 metadata for creds structure.
* sd-boot: we probably should include all BootXY EFI variable defined boot
entries in our menu, and then suppress ourselves. Benefit: instant
compatibility with all other OSes which register things there, in particular
on other disks. Always boot into them via NextBoot EFI variable, to not
affect PCR values.
* elogind-measure tool:
- pre-calculate PCR 12 (command line) + PCR 13 (sysext) the same way we can precalculate PCR 11
* in sd-boot: load EFI drivers from a new PE section. That way, one can have a
"supercharged" sd-boot binary, that could carry ext4 drivers built-in.
* sd-bus: document that sd_bus_process() only returns messages that none of the
filters/handlers installed on the connection took possession of.
* sd-device: add an API for acquiring list of child devices, given a device
object (i.e. all child dirents that are dirs or symlinks to dirs)
* sd-device: maybe pin the sysfs dir with an fd, during the entire runtime of
an sd_device, then always work based on that.
* maybe add new flags to gpt partition tables for rootfs and usrfs indicating
purpose, i.e. whether something is supposed to be bootable in a VM, on
baremetal, on an nspawn-style container, if it is a portable service image,
or a sysext for initrd, for host os, or for portable container. Then hook
portabled/… up to udev to watch block devices coming up with the flags set, and
use it.
* sd-boot should look for information what to boot in SMBIOS, too, so that VM
managers can tell sd-boot what to boot into and suchlike
* add "elogind-sysext identify" verb, that you can point on any file in /usr/
and that determines from which overlayfs layer it originates, which image, and with
what it was signed.
locally in /var. It then outputs a certificate for the pub part to stdout.
This can then be copied/taken elsewhere, and can be used for encrypting creds
that only the host on its specific hw can decrypt. Then, support a drop-in
dir with certificates that can be used to authenticate credentials. Flow of
operations is then this: build image with owner certificate, then after
the dropped in certs and encrypted with machine pubkey, and pass to machine.
Machine is then able to authenticate you, and confidentiality is guaranteed.
* building on top of the above, the pub/priv key pair generated on the TPM2
should probably also be one you can use to get a remote attestation quote.
* Process credentials in:
• networkd/udevd: add a way to define additional .link, .network, .netdev files
via the credentials logic.
• crypttab-generator: allow defining additional crypttab-like volumes via
credentials (similar: verity-generator, integrity-generator). Use
fstab-generator logic as inspiration.
• run-generator: allow defining additional commands to run via a credential
• resolved: allow defining additional /etc/hosts entries via a credential (it
might make sense to then synthesize a new combined /etc/hosts file in /run
and bind mount it on /etc/hosts for other clients that want to read it.)
• repart: allow defining additional partitions via credential
• timesyncd: pick NTP server info from credential
• portabled: read a credential "portable.extra" or so, that takes a list of
file system paths to enable on start.
• make elogind-fstab-generator look for a system credential encoding root= or
usr=
register if not registered yet. Use case: deploy a system, and add an
account one can directly log into.
• in gpt-auto-generator: check partition uuids against such uuids supplied via
sd-stub credentials. That way, we can support parallel OS installations with
pre-built kernels.
* define a JSON format for units, separating out unit definitions from unit
runtime state. Then, expose it:
1. Add Describe() method to Unit D-Bus object that returns a JSON object
about the unit.
2. Expose this natively via Varlink, in similar style
3. Use it when invoking binaries (i.e. make PID 1 fork off elogind-executor
binary which reads the JSON definition and runs it), to address the cow
trap issue and the fact that NSS is actually forbidden in
forked-but-not-exec'ed children
4. Add varlink API to run transient units based on provided JSON definitions
* Add SUPPORT_END_URL= field to os-release with more *actionable* information
what to do if support ended
* pam_elogind: on interactive logins, maybe show SUPPORT_END information at
login time, à la motd
* sd-boot: instead of unconditionally deriving the ESP to search boot loader
spec entries in from the paths of sd-boot binary, let's optionally allow it
to be configured on sd-boot cmdline + efi var. Use case: embed sd-boot in the
UEFI firmware (for example, ovmf supports that via qemu cmdline option), and
use it to load stuff from the ESP.
* mount /var/ from initrd, so that we can apply sysext and stuff before the
initrd transition. Specifically:
1. There should be a var= kernel cmdline option, matching root= and usr=
2. elogind-gpt-auto-generator should auto-mount /var if it finds it on disk
3. mount.x-initrd mount option in fstab should be implied for /var
* make persistent restarts easier by adding a new setting OpenPersistentFile=
or so, which allows opening one or more files that are "persistent" across
service restarts, hot reboot, cold reboots (depending on configuration): the
files are created empty on first invocation, and on subsequent invocations
the files are reopened. The files would be backed by tmpfs, pmem or /var
depending on desired level of persistency.
* sd-event: add ability to "chain" event sources. Specifically, add a call
sd_event_source_chain(x, y), which will automatically enable event source y
in oneshot mode once x is triggered. Use case: in src/core/mount.c implement
the /proc/self/mountinfo rescan on SIGCHLD with this: whenever a SIGCHLD is
seen, trigger the rescan defer event source automatically, and allow it to be
dispatched *before* the SIGCHLD is handled (based on priorities). Benefit:
dispatch order is strictly controlled by priorities again. (next step: chain
event sources to the ratelimit being over)
* if we fork off a service with StandardOutput=journal, and it forks off a
subprocess that quickly dies, we might not be able to identify the cgroup it
comes from, but we can still derive that from the stdin socket its output
came from. We apparently don't do that right now.
* add ability to set hostname with suffix derived from machine id at boot
* add PR_SET_DUMPABLE service setting
* homed/userdb: maybe define a "companion" dir for home directories where apps
can safely put privileged stuff in. Would not be writable by the user, but
still conceptually belong to the user. Would be included in user's quota if
possible, even if files are not owned by UID of user. Use case: container
images that are owned by arbitrary UIDs, and are owned/managed by the users, but
are not directly belonging to the user's UID. Goal: we shouldn't place more
privileged dirs inside of unprivileged dirs, and thus containers really
should not be placed inside of traditional UNIX home dirs (which are owned by
users themselves) but somewhere else, that is separate, but still close
by. Inform user code about path to this companion dir via env var, so that
container managers find it. the ~/.identity file is also a candidate for a
file to move there, since it is managed by privileged code (i.e. homed) and
not unprivileged code.
* given that /etc/ssh/ssh_config.d/ is a thing now, ship a drop-in for that
that hooks up userdbctl ssh-key stuff.
* maybe add support for binding and connecting AF_UNIX sockets in the file
system outside of the 108ch limit. When connecting, open O_PATH fd to socket
inode first, then connect to /proc/self/fd/XYZ. When binding, create symlink
to target dir in /tmp, and bind through it.
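The connecting half of this trick can be sketched in Python as follows. It is
Linux-only, needs /proc to be mounted, and connect_unix_long_path() is a
hypothetical helper name.

```python
import os
import socket

def connect_unix_long_path(path: str) -> socket.socket:
    """Connect to an AF_UNIX stream socket whose path may exceed sun_path."""
    # Pin the socket inode with an O_PATH fd; open() is only bound by
    # PATH_MAX, not by the much shorter sockaddr_un limit.
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # The /proc/self/fd/ link is short enough for sockaddr_un and
            # resolves to the pinned inode.
            sock.connect(f"/proc/self/fd/{fd}")
        except BaseException:
            sock.close()
            raise
        return sock
    finally:
        os.close(fd)
```

The O_PATH fd is only needed while connect() runs and can be closed right
afterwards.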
* add a proper concept of a "developer" mode, i.e. where cryptographic
protections of the root OS are weakened after interactive confirmation, to
let hackers run their own stuff. Idea: allow entering developer mode
only via explicit choice in boot menu: i.e. add explicit boot menu item for
it. When developer mode is entered, generate a key pair in the TPM2, and add
the public part of it automatically to keychain of valid code signature keys
on subsequent boots. Then provide a tool to sign code with the key in the
TPM2. Ensure that boot menu item is the only way to enter developer mode, by
binding it to locality/PCRs so that keys cannot be generated otherwise.
* services: add support for cryptographically unlocking per-service directories
via TPM2. Specifically, for StateDirectory= (and related dirs) use fscrypt to
set up the directory so that it can only be accessed if host and app are in
order.
* update HACKING.md to suggest developing elogind with the ideas from:
https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html
https://0pointer.net/blog/running-an-container-off-the-host-usr.html
* sd-event: combat wd reuse in inotify code: keep a set of removed watch
descriptors, and clear this set piecemeal when we see the IN_IGNORED event
for it, or when read() returns EAGAIN or on IN_Q_OVERFLOW. Then, whenever we
see an inotify wd event check against this set, and if it is contained ignore
the event. (to be fully correct this would have to count the occurrences, in
case the same wd is reused multiple times before we start processing
IN_IGNORED again)
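The bookkeeping for removed watch descriptors, including the occurrence
counting mentioned in parentheses, could look roughly like this. It is a
Python sketch of the data structure only, with no actual inotify calls;
RemovedWatchFilter and its method names are hypothetical.

```python
from collections import Counter

class RemovedWatchFilter:
    """Tracks removed watch descriptors so that stale events can be dropped."""

    def __init__(self):
        self._removed = Counter()

    def watch_removed(self, wd):
        """Call right after the watch wd was removed by us."""
        self._removed[wd] += 1

    def queue_drained(self):
        """Call when read() returns EAGAIN or on IN_Q_OVERFLOW."""
        self._removed.clear()

    def should_dispatch(self, wd, is_ignored):
        """Return False for events that belong to a removed watch."""
        if self._removed[wd] == 0:
            return True
        if is_ignored:
            # One IN_IGNORED closes one removal of this wd.
            self._removed[wd] -= 1
            if self._removed[wd] == 0:
                del self._removed[wd]
        return False
```

Once the matching IN_IGNORED has been seen, events for a reused wd are
dispatched again.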
* for vendor-built signed initrds:
- kernel-install should be able to install encrypted creds automatically for
machine id, root pw, rootfs uuid, resume partition uuid, and place next to
EFI kernel, for sd-stub to pick them up. These creds should be locked to
the TPM, and bind to the right PCR the kernel is measured to.
- kernel-install should be able to pick up initrd sysexts automatically and
place them next to EFI kernel, for sd-stub to pick them up.
- elogind-fstab-generator should look for rootfs device to mount in creds
- elogind-resume-generator should look for resume partition uuid in creds
- sd-stub: automatically pick up microcode from ESP (/loader/microcode/*)
and synthesize initrd from it, and measure it. Signing is not necessary, as
microcode does that on its own. Pass as first initrd to kernel.
* Maybe extend the service protocol to support handling of some specific SIGRT
signal for setting service log level, that carries the level via the
sigqueue() data parameter. Enable this via unit file setting.
* sd_notify/vsock: maybe support binding to AF_VSOCK in Type=notify services,
then passing $NOTIFY_SOCKET and $NOTIFY_GUESTCID with PID1's cid (typically
fixed to "2", i.e. the official host cid) and the expected guest cid, for the
two sides of the channel. The latter env var could then be used in an
appropriate qemu cmdline. That way qemu payloads could talk sd_notify()
directly to host service manager.
* sd-device has an API to create an sd_device object from a device id, but has
no api to query the device id
* sd-device should return the devnum type (i.e. 'b' or 'c') via some API for an
sd_device object, so that data passed into sd_device_new_from_devnum() can
also be queried.
* sd-event: optionally, if per-event source rate limit is hit, downgrade
priority, but leave enabled, and once ratelimit window is over, upgrade
priority again. That way we can combat event source starvation without
stopping processing events from one source entirely.
* sd-event: similar to existing inotify support add fanotify support (given
that apparently new features in this area are only going to be added to the
latter).
* sd-event: add 1st class event source for clock changes
* sd-event: add 1st class event source for timezone changes
* support uefi/http boots with sd-boot: instead of looking for dropin files in
/loader/entries/ dir, look for a file /loader/entries/SHA256SUMS and use that
as directory manifest. The file would be a standard directory listing as
generated by GNU sha256sum.
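A minimal sketch of reading such a manifest, in Python. It handles the two
common sha256sum line forms (hash, space, then a space or "*", then the file
name) and skips everything else, including the backslash-escaped form;
parse_sha256sums() is a hypothetical helper.

```python
import re

# 64 lowercase hex digits, a space, a mode character (" " or "*"), the name.
_LINE = re.compile(r"([0-9a-f]{64}) [ *](.+)")

def parse_sha256sums(text):
    """Return a dict mapping file name -> hex digest."""
    manifest = {}
    for line in text.splitlines():
        m = _LINE.fullmatch(line)
        if m is None:
            continue  # not a plain checksum line, ignore it
        manifest[m.group(2)] = m.group(1)
    return manifest
```

The keys of the result would then serve as the list of entries to fetch, and
the digests as the values to verify the downloads against.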
* sd-boot: maybe add support for embedding the various auxiliary resources we
look for right in the sd-boot binary. i.e. take inspiration from sd-stub
logic: allow combining sd-boot via ukify with kernels to enumerate, .conf
files, drivers, keys to enroll and so on. Then, add whatever we find that way
to the menu. Use case: allow building a single PE image you can boot into via
UEFI HTTP boot.
* maybe add a new UEFI stub binary "sd-http". It works similar to sd-stub, but
all it does is download a file from a http server, and execute it, after
optionally checking its hash sum. idea would be: combine this "sd-http" stub
binary with some minimal info about a URL + hash sum, plus .osrel data, and
drop it into the unified kernel dir in the ESP. And bam you have something
that is tiny, feels a lot like a unified kernel, but all it does is chainload
the real kernel. benefit: downloading these stubs would be tiny and quick,
hence cheap for enumeration.
* sysext: measure all activated sysext into a TPM PCR
* elogind-dissect: show available versions inside of a disk image, i.e. if
multiple versions are around of the same resource, show which ones. (in other
words: show partition labels).
* maybe add a generator that reads /proc/cmdline, looks for
elogind.pull-raw-portable=, elogind.pull-raw-sysext= and similar switches
that take a URL as parameter. It then generates service units for
elogind-pull calls that download these URLs if not installed yet. Use case:
invoke a VM or nspawn container in a way it automatically deploys/runs these
images as OS payloads. i.e. have a generic OS image you can point to any
payload you like, which is then downloaded, securely verified and run.
* deprecate cgroupsv1 further (print log message at boot)
* elogind-dissect: add --cat switch for dumping files such as /etc/os-release
* per-service sandboxing option: ProtectIds=. If used, will overmount
/etc/machine-id and /proc/sys/kernel/random/boot_id with synthetic files, to
make it harder for the service to identify the host. Depending on the user
setting it should be fully randomized at invocation time, or a hash of the
real thing, keyed by the unit name or so. Of course, there are other ways to
get these IDs (e.g. journal) or similar ids (e.g. MAC addresses, DMI ids, CPU
ids), so this knob would only be useful in combination with other lockdown
options. Particularly useful for portable services, and anything else that
uses RootDirectory= or RootImage=. (Might also over-mount
/sys/class/dmi/id/*{uuid,serial} with /dev/null).
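A minimal sketch of the two ID modes described in the ProtectIds= item above,
assuming HMAC-SHA256 as the keyed hash and 128-bit hex IDs. The function names
are hypothetical, and the sketch does not set any UUID variant/version bits.

```python
import hashlib
import hmac
import os


def synthetic_id(real_id_hex: str, unit_name: str) -> str:
    """Hash of the real ID, keyed by the unit name (stable per unit)."""
    real_id = bytes.fromhex(real_id_hex)
    digest = hmac.new(unit_name.encode("utf-8"), real_id, hashlib.sha256).digest()
    # Truncate to 128 bit so the result has the same shape as a machine ID.
    return digest[:16].hex()


def random_id() -> str:
    """Fully randomized ID, to be generated once per invocation."""
    return os.urandom(16).hex()
```

The keyed variant gives a service the same ID across restarts without exposing
the host's real ID; the random variant changes on every start.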
* doc: prep a document explaining resolved's internal objects, i.e. Query
vs. Question vs. Transaction vs. Stream and so on.
* doc: prep a document explaining PID 1's internal logic, i.e. transactions,
jobs, units
* bootspec: bring UEFI and userspace enumeration of bootspec entries back into
sync, i.e. parse out architecture field in sd-boot (currently only done in
userspace)
* automatically ignore threaded cgroups in cg_xyz().
* add linker script that implicitly adds symbol for build ID and new coredump
json package metadata, and use that when logging
* Enable RestrictFileSystems= for all our long-running services (similar:
RestrictNetworkInterfaces=)
* cryptsetup/homed: implement TOTP authentication backed by TPM2 and its
internal clock.
* man: rework os-release(5), and clearly separate our extension-release.d/ and
initrd-release parts, i.e. list explicitly which fields are about what.
* sysext: before applying a sysext, do a superficial validation run so that
things are not rearranged too wildly. I.e. protect against accidental fuckups,
such as masking out /usr/lib/ or so. We should probably refuse if existing
inodes are replaced by other types of inodes or so.
* userdb: when synthesizing NSS records, pick "best" password from defined
passwords, not just the first. i.e. if there are multiple defined, prefer
unlocked over locked and prefer non-empty over empty.
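A rough sketch of the selection rule from the userdb item above. It assumes the
shadow(5) convention that a hashed password starting with "!" or "*" is locked,
and it assumes "unlocked" outranks "non-empty" when both preferences conflict;
neither assumption is settled by the item itself. Names are hypothetical.

```python
def is_locked(hashed: str) -> bool:
    # Assumption: shadow(5)-style lock markers.
    return hashed.startswith(("!", "*"))


def pick_best_password(hashed_passwords):
    """Return the preferred entry, or None if the list is empty."""
    if not hashed_passwords:
        return None

    def rank(pw):
        # Lower tuples sort first: unlocked before locked, non-empty before empty.
        return (is_locked(pw), pw == "")

    # min() returns the first minimal element, so ties keep the defined order.
    return min(hashed_passwords, key=rank)
```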
* maybe add a tool inspired by the GPT auto discovery spec that runs in the
initrd and rearranges the rootfs hierarchy via bind mounts, if
enabled. Specifically in some top-level dir /@auto/ it will look for
dirs/symlinks/subvolumes that are named after their purpose, and optionally
encode a version as well as assessment counters, and then mount them into the
file system tree to boot into, similar to how we do that for the gpt auto
logic. Maybe then bind mount the original root into /.superior or something
like that (so that update tools can look there). Further discussion in this
thread:
detect a specially marked root fs (i.e introduce a new generic root gpt type
for this, that is arch independent). Then also implement this in the image
dissection logic, so that nspawn/RootImage= and so on grok it. Maybe make
generic enough so that it can also work for ostree arrangements.
* if a path ending in ".auto.d/" is set for RootDirectory=/RootImage= then do a
strverscmp() of everything inside that dir and use that. i.e. implement very
* homed: while a home dir is not activated generate slightly different NSS
records for it, that reports the home dir as "/" and the shell as some binary
provided by us. Then, when an SSH login happens and SSH permits it our binary
is invoked. This binary can then talk to homed and activate the homedir if
it's not around yet, prompting the user for a password. Once that succeeded
we'll switch to the real user record, i.e. home dir and shell, and our tool
exec()s the latter. Net effect: ssh'ing into a homed account will just work:
we'll neatly prompt for the homedir's password if it's needed. –– Building on
this we could take this even further: since this tool will potentially have
access to the client's ssh-agent (if ssh-agent forwarding is enabled) we
could implement SSH unlocking of a homedir with that: when enrolling a new
ssh pubkey in a user record we'd ask the ssh-agent to sign some random value
with the privkey, then use that as luks key to unlock the home dir. Will not
work for ECDSA keys since their signatures contain a random component, but
will work for RSA and Ed25519 keys.
* add tiny service that decrypts encrypted user records passed via initrd
credential logic and drops them into /run where nss-elogind can pick them up,
similar to /run/host/userdb/. Use case: drop a root user JSON record there,
and use it in the initrd to log in as root with locally selected password,
for debugging purposes. Other use case: boot into qemu with regular user
mounted from host. maybe put this in elogind-user-sessions.service?
* drop dependency on libcap, replace by direct syscalls based on
CapabilityQuintet we already have. (This likely allows us to drop libcap
dep in the base OS image)
* userdbd: implement an additional varlink service socket that provides the
host user db in restricted form, then allow this to be bind mounted into
sandboxed environments that want the host database in minimal form. All
records would be stripped of all meta info, except the basic UID/name
info. Then use this in portabled environments that do not use PrivateUsers=1.
* portabled: when extracting unit files and copying to system.attached, if a
.p7s is available in the image, use it to protect the system.attached copy
with fs-verity, so that it cannot be tampered with
* logind: introduce two types of sessions: "heavy" and "light". The former would
be our current sessions. But the latter would be a new type of session that
is mostly the same but does not pull in user@.service or wait for it. Then,
allow configuring which type of session is desired via pam_elogind
parameters, and then make user@.service's session one of these "light" ones.
People could then choose to make FTP sessions and suchlike "light" if they
don't want the service manager to be started for that.
* /etc/veritytab: allow that the roothash column can be specified as fs path
including a path to an AF_UNIX socket, similar to how we do things with the
keys of /etc/crypttab. That way people can store/provide the roothash
externally and provide to us on demand only.
* we probably should extend the root verity hash of the root fs into some PCR
on boot. (i.e. maybe add a veritytab option tpm2-measure=12 or so to measure
it into PCR 12); Similar: we probably should extend the LUKS volume key of
the root fs into some PCR on boot. (i.e. maybe add a crypttab option
tpm2-measure=15 or so to measure it into PCR 15); once both are in place
update gpt-auto-discovery to generate these by default for the partitions it
discovers. Static vendor stuff should probably end up in PCR 12 (i.e. the
verity hash), with local keys in PCR 15 (i.e. the encryption volume
key). That way, we nicely distinguish resources supplied by the OS vendor
(i.e. sysext, root verity) from those inherently local (i.e. encryption key),
which is useful if they shall be signed separately.
* in uefi stub: query firmware regarding which PCR banks are being used, store
that in EFI var. then use this when enrolling TPM2 in cryptsetup to verify
that the selected PCRs actually are used by firmware.
* rework recursive read-only remount to use new mount API
* PAM: pick up authentication token from credentials
* when mounting disk images: if IMAGE_ID/IMAGE_VERSION is set in os-release
data in the image, make sure the image filename actually matches this, so
that images cannot be misused.
* New udev block device symlink names:
/dev/disk/by-parttypelabel/<pttype>-<ptlabel>. Use case: if pt label is used
as partition image version string, this is a safe way to reference a specific
version of a specific partition type, in particular where related partitions
are processed (e.g. verity + rootfs both named "LennartOS_0.7").
* sysupdate:
- add fuzzing to the pattern parser
- support casync as download mechanism
- "elogind-sysupdate update --all" support, that iterates through all components
defined on the host, plus all images installed into /var/lib/machines/,
/var/lib/portable/ and so on.
- figure out what to do about system extensions (i.e. they need to imply an
update component, since otherwise sysupdate.d/ files would override the
host's update files.)
- Allow invocation with a single transfer definition, i.e. with
--definitions= pointing to a file rather than a dir.
- add ability to disable implicit decompression of downloaded artifacts,
i.e. a Compress=no option in the transfer definitions
* in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix)
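A small sketch of what accepting the URN form could look like, written in
Python rather than against the real sd-id128 code. The function name is
hypothetical; it accepts plain 32-digit hex, the dashed UUID form, and either
of those with a "urn:uuid:" prefix.

```python
import string

_URN_PREFIX = "urn:uuid:"


def parse_id128(text: str) -> bytes:
    """Parse a 128-bit ID, chopping off an optional urn:uuid: prefix first."""
    if text[:len(_URN_PREFIX)].lower() == _URN_PREFIX:
        text = text[len(_URN_PREFIX):]
    if len(text) == 36:
        # Dashed form: dashes must sit exactly at the canonical positions.
        if any(text[i] != "-" for i in (8, 13, 18, 23)):
            raise ValueError("misplaced dash in UUID")
        text = text.replace("-", "")
    if len(text) != 32 or any(c not in string.hexdigits for c in text):
        raise ValueError("not a valid 128-bit ID")
    return bytes.fromhex(text)
```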
* DynamicUser= + StateDirectory= → use uid mapping mounts, too, in order to
make dirs appear under right UID.
* elogind-sysext: optionally, run it in initrd already, before transitioning
into host, to open up possibility for services shipped like that.
* introduce /dev/disk/root/* symlinks that allow referencing partitions on the
disk the rootfs is on in a reasonably secure way. (or maybe: add
/dev/gpt-auto-{home,srv,boot,…} similar in style to /dev/gpt-auto-root as we
already have it).
* whenever we receive fds via SCM_RIGHTS make sure none got dropped due to the
reception limit the kernel silently enforces.
* Add service unit setting ConnectStream= which takes IP addresses and connects to them.
* Similar, Load= which takes literal data in text or base64 format, and puts it
into a memfd, and passes that. This enables some fun stuff, such as embedding
bash scripts in unit files, by combining Load= with ExecStart=/bin/bash
/proc/self/fd/3
* add a ConnectSocket= setting to service unit files, that may reference a
socket unit, and which will connect to the socket defined therein, and pass
the resulting fd to the service program via socket activation proto.
* Add a concept of ListenStream=anonymous to socket units: listen on a socket
that is deleted in the fs. Use case would be with ConnectSocket= above.
* importd: support image signature verification with PKCS#7 + OpenBSD signify
logic, as alternative to crummy gpg
invoked on processes forked off PID 1.
* expose MS_NOSYMFOLLOW in various places
* credentials system:
- acquire from EFI variable?
- acquire via ask-password?
- acquire creds via keyring?
- pass creds via keyring?
- pass creds via memfd?
- acquire + decrypt creds from pkcs11?
- make macsec/wireguard code in networkd read key via creds logic
- make gatewayd/remote read key via creds logic
- add sd_notify() command for flushing out creds not needed anymore
- make user manager instances create and use a user-specific key (the one in
* add tpm.target or so which is delayed until TPM2 device showed up in case
firmware indicates there is one.
* TPM2: auto-reenroll in cryptsetup, as fallback for hosed firmware upgrades
and such
* introduce a new group to own TPM devices
* cryptsetup: add option for automatically removing empty password slot on boot
* cryptsetup: optionally, when run during boot-up and password is never
entered, and we are on battery power (or so), power off machine again
* cryptsetup: when waiting for FIDO2/PKCS#11 token, tell plymouth that, and
allow plymouth to abort the waiting and enter pw instead
* make cryptsetup lower --iter-time
* cryptsetup: allow encoding key directly in /etc/crypttab, maybe with a
"base64:" prefix. Useful in particular for pkcs11 mode.
* cryptsetup: reimplement the mkswap/mke2fs in cryptsetup-generator to use
elogind-makefs.service instead.
* cryptsetup:
- cryptsetup-generator: allow specification of passwords in crypttab itself
- support rd.luks.allow-discards= kernel cmdline params in cryptsetup generator
* Add service setting to run a service within the specified VRF. i.e. do the
equivalent of "ip vrf exec".
* special case some calls of chase() to use openat2() internally, so
that the kernel does what we otherwise do.
* add a new flag to chase() that stops chasing once the first missing
component is found and then allows the caller to create the rest.
* make use of new glibc 2.32 APIs sigabbrev_np() and strerrorname_np().
* if /usr/bin/swapoff fails due to OOM, log a friendly explanatory message about it
* pid1: support new clone3() fork-into-cgroup feature
* pid1: also remove PID files of a service when the service starts, not just
when it exits
* make us use dynamically fewer deps for containers in general purpose distros:
o turn into dlopen() deps:
- kmod-libs (only when called from PID 1)
- libblkid (only in RootImage= handling in PID 1, but not elsewhere)
- libpam (only when called from PID 1)
- bzip2, xz, lz4 (always — gzip and zstd should probably stay static deps the way they are,
since they are so basic and our defaults)
* seccomp: maybe use seccomp_merge() to merge our filters per-arch if we can.
Apparently kernel performance is much better with fewer larger seccomp
filters than with more smaller seccomp filters.
* elogind-path: add ESP and XBOOTLDR path. Add "private" runtime/state/cache dir enum,
mapping to $RUNTIME_DIRECTORY, $STATE_DIRECTORY and such
* seccomp: by default mask x32 ABI system wide on x86-64. it's on its way out
* seccomp: don't install filters for ABIs that are masked anyway for the
specific service
* busctl: maybe expose a verb "ping" for pinging a dbus service to see if it
exists and responds.
* socket units: allow creating a udev monitor socket with ListenDevices= or so,
with matches, then activate app through that passing socket over
* unify on openssl:
- kill gnutls support in resolved
- figure out what to do about libmicrohttpd, which has a hard dependency on
gnutls
- port fsprg over to a dlopen lib, then switch it to openssl
* add growvol and makevol options for /etc/crypttab, similar to
x-elogind.growfs and x-elogind.makefs.
* userdb: allow username prefix searches in varlink API, allow realname and
realname substr searches in varlink API
* userdb: allow uid/gid range checks
* userdb: allow existence checks
* pid1: activation by journal search expression
* when switching root from initrd to host, set the machine_id env var so that
if the host has no machine ID set yet we continue to use the random one the
initrd had set.
* sd-event: add native support for P_ALL waitid() watching, then move PID 1 to
it for reaping assigned but unknown children. This needs some special care
to operate somewhat sensibly in light of priorities: P_ALL will return
arbitrary processes, regardless of the priority we want to watch them with,
hence on each event loop iteration check all processes which we shall watch
with higher prio explicitly, and then watch the entire rest with P_ALL.
* tweak sd-event's child watching: keep a prioq of children to watch and use
waitid() only on the children with the highest priority until one is waitable
and ignore all lower-prio ones from that point on
* maybe introduce xattrs that can be set on the root dir of the root fs
partition that declare the volatility mode to use the image in. Previously I
thought marking this via GPT partition flags but that's not ideal since
that's outside of the LUKS encryption/verity verification, and we probably
shouldn't operate in a volatile mode unless we got told so from a trusted
source.
* coredump: maybe when coredumping read a new xattr from /proc/$PID/exe that
may be used to mark a whole binary as non-coredumpable. Would fix:
https://bugs.freedesktop.org/show_bug.cgi?id=69447
* teach parse_timestamp() timezones like the calendar spec already knows it
* We should probably replace /etc/rc.d/README with a symlink to doc
content. After all it is constant vendor data.
* maybe add kernel cmdline params: to force random seed crediting
* introduce a new per-process uuid, similar to the boot id, the machine id, the
invocation id, that is derived from process creds, specifically a hashed
combination of AT_RANDOM + getpid() + the starttime from
/proc/self/stat. Then add these ids implicitly when logging. Deriving this
uuid from these three things has the benefit that it can be derived easily
from /proc/$PID/ in a stable, and unique way that changes on both fork() and
exec().
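A minimal sketch of the derivation from the per-process uuid item above,
assuming SHA-256 as the hash and a fixed little-endian encoding of the
integers. The three inputs are passed in as parameters here; how they are
collected (auxiliary vector, /proc) is left out, and the function name is
hypothetical.

```python
import hashlib
import uuid


def process_uuid(at_random: bytes, pid: int, starttime: int) -> uuid.UUID:
    """Hash AT_RANDOM bytes, PID and process start time into a 128-bit ID."""
    h = hashlib.sha256()
    h.update(at_random)                        # changes on exec()
    h.update(pid.to_bytes(8, "little"))        # changes on fork()
    h.update(starttime.to_bytes(8, "little"))  # guards against PID reuse
    return uuid.UUID(bytes=h.digest()[:16])
```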
* let's not GC a unit while its ratelimits are still pending
* when killing due to service watchdog timeout maybe detect whether target
process is under ptracing and then log loudly and continue instead.
* make rfkill uaccess controllable by default, i.e. steal rule from
gnome-bluetooth and friends
* make MAINPID= message reception checks even stricter: if service uses User=,
then check sending UID and ignore message if it doesn't match the user or
root.
* maybe trigger a uevent "change" on a device if "systemctl reload xyz.device"
is issued.
* when importing an fs tree with machined, optionally apply userns-rec-chown
* when importing an fs tree with machined, complain if image is not an OS
* Maybe introduce a helper safe_exec() or so, which is to execve() what
safe_fork() is to fork(). And then make it revert the RLIMIT_NOFILE soft limit
to 1K implicitly, unless explicitly opted-out.
* rework seccomp/nnp logic so that even if User= is used in combination with
a seccomp option we don't have to set NNP. For that, change uid first while
keeping CAP_SYS_ADMIN, then apply seccomp, then drop cap.
* when no locale is configured, default to UEFI's PlatformLang variable
* add a new syscall group "@esoteric" for more esoteric stuff such as bpf() and
* paranoia: whenever we process passwords, call mlock() on the memory
first. i.e. look for all places we use free_and_erasep() and
augment them with mlock(). Also use MADV_DONTDUMP.
Alternatively (preferably?) use memfd_secret().
* Move RestrictAddressFamily= to the new cgroup create socket
* optionally: turn on cgroup delegation for per-session scope units
* sd-boot: optionally, show boot menu when previous default boot item has
non-zero "tries done" count
* augment CODE_FILE=, CODE_LINE= with something like CODE_BASE= or so which
contains some identifier for the project, which allows us to include
clickable links to source files generating these log messages. The identifier
could be some abbreviated URL prefix or so (taking inspiration from Go
imports). For example, for elogind we could use
CODE_BASE=github.com/systemd/systemd/blob/98b0b1123cc or so which is
sufficient to build a link by prefixing "http://" and suffixing the
CODE_FILE.
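The link construction described in the CODE_BASE= item above could be sketched
like this. The function name is hypothetical, and the optional "#L<n>" line
anchor is an assumption that only holds for hosting sites using that syntax.

```python
def source_link(code_base, code_file, code_line=None):
    """Prefix "http://" to CODE_BASE and suffix CODE_FILE, as the item suggests."""
    url = "http://" + code_base.rstrip("/") + "/" + code_file.lstrip("/")
    if code_line is not None:
        # Assumption: the hosting site understands "#L<n>" fragments.
        url += "#L%d" % code_line
    return url
```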
* Augment MESSAGE_ID with MESSAGE_BASE, in a similar fashion so that we can
make clickable links from log messages carrying a MESSAGE_ID, that lead to
some explanatory text online.
* maybe extend .path units to expose fanotify() per-mount change events
* hibernate/s2h: check if swap is on weird storage and refuse if so
* cgroups: use inotify to get notified when somebody else modifies cgroups
owned by us, then log a friendly warning.
* beef up log.c with support for stripping ANSI sequences from strings, so that
it is OK to include them in log strings. This would be particularly useful so
that our log messages could contain clickable links for example for unit
files and suchlike we operate on.
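A rough sketch of the stripping step from the log.c item above, in Python
rather than C. It assumes that only CSI sequences (colors and the like) and
OSC sequences (which terminals use for clickable links) need to be removed;
other escape types are not handled. The function name is hypothetical.

```python
import re

_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"                 # CSI: ESC [ params intermediates final
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"      # OSC: ESC ] ... terminated by BEL or ESC \
)


def strip_ansi(text: str) -> str:
    """Return text with CSI and OSC escape sequences removed."""
    return _ANSI_RE.sub("", text)
```

With something like this in place, the same string could be written with
sequences to a terminal and without them to plain-text log sinks.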
* importd: add ability to download images for portabled + sysext
* add support for "portablectl attach http://foobar.com/waaa.raw" (i.e. importd integration)
* sync dynamic uids/gids between host+portable service (i.e. if DynamicUser=1 is set for a service, make sure that the
selected user is resolvable in the service even if it ships its own /etc/passwd)
* Fix DECIMAL_STR_MAX or DECIMAL_STR_WIDTH. One includes a trailing NUL, the
other doesn't. What a disaster. Probably best to exclude it.
* Check that users of inotify's IN_DELETE_SELF flag are using it properly, as
usually IN_ATTRIB is the right way to watch deleted files, as the former only
fires when a file is actually removed from disk, i.e. the link count drops to
zero and is not open anymore, while the latter happens when a file is
unlinked from any dir.
* port systemctl, busctl, … over to format-table.[ch]'s table formatters
* pid1: lock image configured with RootDirectory=/RootImage= using the usual nspawn semantics while the unit is up
* add --vacuum-xyz options to coredumpctl, matching those journalctl already has.
* add CopyFile= or so as unit file setting that may be used to copy files or
directory trees from the host to the service's RootImage= and RootDirectory=
environment. Which we can use for /etc/machine-id and in particular
/etc/resolv.conf. Should be smart and do something useful on read-only
images, for example fall back to read-only bind mounting the file instead.
* bypass SIGTERM state in unit files if KillSignal is SIGKILL
* add proper dbus APIs for the various sd_notify() commands, such as MAINPID=1
and so on, which would mean we could report errors and such.
* introduce DefaultSlice= or so in system.conf that allows changing where we
place our units by default, i.e. change system.slice to something
else. Similar, ManagerSlice= should exist so that PID1's own scope unit could
be moved somewhere else too. Finally machined and logind should get similar
options so that it is possible to move user session scopes and machines to a
different slice too by default. Use case: people who want to put resources on
the entire system, with the exception of one specific service. See:
* maybe rework get_user_creds() to query the user database if $SHELL is used
for root, but only then.
* calendarspec: add support for week numbers and day numbers within a
year. This would allow us to define "bi-weekly" triggers safely.
* sd-bus: add vtable flag, that may be used to request client creds implicitly
and asynchronously before dispatching the operation
* sd-bus: parse addresses given in sd_bus_set_addresses immediately and not
only when used. Add unit tests.
* make use of ethtool veth peer info in machined, for automatically finding out
host-side interface pointing to the container.
* add some special mode to LogsDirectory=/StateDirectory=… that allows
declaring these directories without necessarily pulling in deps for them, or
creating them when starting up. That way, we could declare that
* deprecate RootDirectoryStartOnly= in favour of a new ExecStart= prefix char
* support projid-based quota in machinectl for containers
* add a way to lock down cgroup migration: a boolean, which when set for a unit
makes sure the processes in it can never migrate out of it
* blog about fd store and restartable services
* document Environment=SYSTEMD_LOG_LEVEL=debug drop-in in debugging document
* rework ExecOutput and ExecInput enums so that EXEC_OUTPUT_NULL loses its
magic meaning and is no longer upgraded to something else if set explicitly.
* in the long run: permit a system with /etc/machine-id linked to /dev/null, to
make it lose its identity, i.e. be anonymous. For this we'd have to patch
through the whole tree to make all code deal with the case where no machine
ID is available.
* optionally, collect cgroup resource data, and store it in per-unit RRD files,
suitable for processing with rrdtool. Add bus API to access this data, and
possibly implement a CPULoad property based on it.
* beef up pam_elogind to take unit file settings such as cgroups properties as
parameters
* maybe hook up xfs/ext4 quotactl() with services? i.e. automatically manage
the quota of the user indicated in User= via unit file settings, like the
other resource management concepts. Would mix nicely with DynamicUser=1. Or
alternatively, do this with projids, so that we can also cover services
running as root. Quota should probably cover all the special dirs such as
StateDirectory=, LogsDirectory=, CacheDirectory=, as well as RootDirectory= if it
is set, plus the whole disk space any image configured with RootImage=.
* In DynamicUser= mode: before selecting a UID, use disk quota APIs on relevant
disks to see if the UID is already in use.
* Add AddUser= setting to unit files, similar to DynamicUser=1 which however
creates a static, persistent user rather than a dynamic, transient user. We
can leverage code from sysusers.d for this.
* add some optional flag to ReadWritePaths= and friends, that has the effect
that we create the dir in question when the service is started. Example:
ReadWritePaths=:/var/lib/foobar
* Add ExecMonitor= setting. May be used multiple times. Forks off a process in
the service cgroup, which is supposed to monitor the service, and when it
exits the service is considered failed by its monitor.
* track the per-service PAM process properly (i.e. as an additional control
process), so that it may be queried on the bus and everything.
* add a new "debug" job mode, that is propagated to unit_start() and for
services results in two things: we raise SIGSTOP right before invoking
execve() and turn off watchdog support. Then, use that to implement
"elogind-gdb" for attaching to the start-up of any system service in its
natural habitat.
* gpt-auto logic: support encrypted swap, add kernel cmdline option to force
it, and honour a gpt bit about it, plus maybe a configuration file
* add a percentage syntax for TimeoutStopSec=, e.g. TimeoutStopSec=150%, and
then use that for the setting used in user@.service. It should be understood
relative to the configured default value.
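A minimal sketch of the percentage syntax from the TimeoutStopSec= item above.
It only handles a trailing "%" and plain seconds; the real time span syntax
with unit suffixes is not covered, and the function name is hypothetical.

```python
def resolve_timeout(value: str, default_sec: float) -> float:
    """Resolve a timeout setting, treating "N%" as relative to the default."""
    value = value.strip()
    if value.endswith("%"):
        percent = float(value[:-1])
        if percent < 0:
            raise ValueError("negative percentage")
        return default_sec * percent / 100.0
    return float(value)
```

For a configured default of 90s, "150%" would resolve to 135s.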
* enable LimitMEMLOCK= to take a percentage value relative to physical memory
* Permit masking specific netlink APIs with RestrictAddressFamily=
* define gpt header bits to select volatility mode
* ProtectClock= (drops CAP_SYS_TIME, adds seccomp filters for settimeofday, adjtimex), sets DeviceAllow on /dev/rtc
* ProtectTracing= (drops CAP_SYS_PTRACE, blocks ptrace syscall, makes /sys/kernel/tracing go away)
* ProtectMount= (drop mount/umount/pivot_root from seccomp, disallow fuse via DeviceAllow, imply Mountflags=slave)
* ProtectKeyRing= to take keyring calls away
* RemoveKeyRing= to remove all keyring entries of the specified user
* ProtectReboot= that masks reboot() and kexec_load() syscalls, prohibits kill
on PID 1 with the relevant signals, and makes relevant files in /sys and
/proc (such as the sysrq stuff) unavailable
* Support ReadWritePaths/ReadOnlyPaths/InaccessiblePaths in elogind --user instances
via the new unprivileged Landlock LSM (https://landlock.io)
* make sure the ratelimit object can deal with USEC_INFINITY as way to turn off things
* in nss-elogind, if we run inside of RootDirectory= with PrivateUsers= set,
find a way to map the User=/Group= of the service to the right name. This way
a user/group for a service only has to exist on the host for the right
mapping to work.
* add bus API for creating unit files in /etc, reusing the code for transient units
* add bus API to remove unit files from /etc
* add bus API to retrieve current unit file contents (i.e. implement "systemctl cat" on the bus only)
* rework fopen_temporary() to make use of open_tmpfile_linkable() (problem: the
kernel doesn't support linkat() that replaces existing files, currently)
* transient units: don't bother with actually setting unit properties, we
reload the unit file anyway
* optionally, also require WATCHDOG=1 notifications during service start-up and shutdown
* cache sd_event_now() result from before the first iteration...
* PID1: find a way how we can reload unit file configuration for
specific units only, without reloading the whole of elogind
* add an explicit parser for LimitRTPRIO= that verifies
the specified range and generates sane error messages for incorrect
specifications.
* when we detect that there are waiting jobs but no running jobs, do something
* PID 1 should send out sd_notify("WATCHDOG=1") messages (for usage in the --user mode, and when run via nspawn)
* there's probably something wrong with having user mounts below /sys,
as we have for debugfs. for example, src/core/mount.c handles mounts
prefixed with /sys generally special.
* fstab-generator: default to tmpfs-as-root if only usr= is specified on the kernel cmdline
* docs: bring https://www.freedesktop.org/wiki/Software/elogind/MyServiceCantGetRealtime up to date
* add a job mode that will fail if a transaction would mean stopping
running units. Use this in timedated to manage the NTP service
state.
* The udev blkid built-in should expose a property that reflects
whether media was sensed in USB CF/SD card readers. This should then
be used to control SYSTEMD_READY=1/0 so that USB card readers aren't
picked up by elogind unless they contain a medium. This would mirror
the behaviour we already have for CD drives.
* hostnamectl: show root image uuid
* Find a solution for SMACK capabilities stuff:
* synchronize console access with BSD locks:
* as soon as we have sender timestamps, revisit coalescing multiple parallel daemon reloads:
* figure out when we can use the coarse timers
* maybe allow timer units with an empty Units= setting, so that they
can be used for resuming the system but nothing else.
* what to do about udev db binary stability for apps? (raw access is not an option)
* exponential backoff in timesyncd when we cannot reach a server
* timesyncd: add ugly bus calls to set NTP servers per-interface, for usage by NM
* add elogind.abort_on_kill or some other such flag to send SIGABRT instead of SIGKILL
(throughout the codebase, not only PID1)
* drop nss-myhostname in favour of nss-resolve?
* resolved:
- mDNS/DNS-SD
- service registration
- service/domain/types browsing
- avahi compat
- DNS-SD service registration from socket units
- resolved should optionally register additional per-interface LLMNR
names, so that for the container case we can establish the same name
(maybe "host") for referencing the server, everywhere.
- allow clients to request DNSSEC for a single lookup even if DNSSEC is off (?)
- hook up resolved with machined-based address resolution
* refcounting in sd-resolve is borked
* add new gpt type for btrfs volumes
* generator that automatically discovers btrfs subvolumes, identifies their purpose based on some xattr on them.
* a way for container managers to turn off getty starting via $container_headless= or so...
* figure out a nice way how we can let the admin know what child/sibling unit causes cgroup membership for a specific unit
* For timer units: add some mechanisms so that timer units that trigger immediately on boot do not have the services
they run added to the initial transaction and thus confuse Type=idle.
* add bus api to query unit file's X fields.
* gpt-auto-generator:
- Define new partition type for encrypted swap? Support probed LUKS for encrypted swap?
- Make /home automount rather than mount?
* add generator that pulls in elogind-network from containers when
CAP_NET_ADMIN is set, more than the loopback device is defined, even
when it is otherwise off
* MessageQueueMessageSize= (and suchlike) should use parse_iec_size().
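For illustration only, a simplified Python stand-in for the kind of IEC
(base 1024) suffix parsing the item above asks for. It is not the actual
parse_iec_size() helper: it accepts a single integer with an optional one-letter
suffix and nothing else.

```python
_IEC_FACTORS = {
    "B": 1,
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
    "E": 1024 ** 6,
}


def parse_iec_size_sketch(text: str) -> int:
    """Parse strings such as "8K" or "1M" into a byte count (base 1024)."""
    text = text.strip()
    factor = 1
    if text and text[-1] in _IEC_FACTORS:
        factor = _IEC_FACTORS[text[-1]]
        text = text[:-1]
    if not (text.isascii() and text.isdigit()):
        raise ValueError("invalid size")
    return int(text) * factor
```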
* implement Distribute= in socket units to allow running multiple
service instances processing the listening socket, and open this up
for ReusePort=
* cgroups:
- implement per-slice CPUFairScheduling=1 switch
- introduce high-level settings for RT budget, swappiness
- how to reset dynamically changed unit cgroup attributes sanely?
- when reloading configuration, apply new cgroup configuration
- when recursively showing the cgroup hierarchy, optionally also show
the hierarchies of child processes
- add settings for cgroup.max.descendants and cgroup.max.depth,
maybe use them for user@.service
* transient units:
- add field to transient units that indicate whether elogind or somebody else saves/restores its settings, for integration with libvirt
* when we detect low battery and no AC on boot, show pretty splash and refuse boot
* libelogind-journal, libelogind-login, libudev: add calls to easily attach these objects to sd-event event loops
* be more careful what we export on the bus as (usec_t) 0 and (usec_t) -1
* rfkill,backlight: we probably should run the load tools inside of the udev rules so that the state is properly initialized by the time other software sees it
* If we try to find a unit via a dangling symlink, generate a clean
error. Currently, we just ignore it and read the unit from the search
path anyway.
* refuse boot if /usr/lib/os-release is missing or /etc/machine-id cannot be set up
* man: the documentation of Restart= currently is very misleading and suggests the tools from ExecStartPre= might get restarted.
* load .d/*.conf dropins for device units
* There's currently no way to cancel fsck (used to be possible via C-c or c on the console)
* add option to sockets to avoid activation. Instead just drop packets/connections, see http://cyberelk.net/tim/2012/02/15/portreserve-elogind-solution/
* make sure elogind-ask-password-wall does not shutdown elogind-ask-password-console too early
* verify that the AF_UNIX sockets of a service in the fs still exist
when we start a service in order to avoid confusion when a user
assumes starting a service is enough to make it accessible
* Make it possible to set the keymap independently from the font on
the kernel cmdline. Right now setting one resets also the other.
* add a dbus call to generate target from current state
* investigate whether the gnome pty helper should be moved into elogind, to provide cgroup support.
* dot output for --test showing the 'initial transaction'
* be able to specify a forced restart of service A, which service B depends on, in case B
needs to be auto-respawned?
* pid1:
- When logging about multiple units (stopping BoundTo units, conflicts, etc.),
log both units as UNIT=, so that journalctl -u triggers on both.
- generate better errors when people try to set transient properties
that are not supported...
- move PAM code into its own binary
- when we automatically restart a service, ensure we restart its rdeps, too.
- hide PAM options in fragment parser when compile time disabled
- Support --test based on current system state
- If we show an error about a unit (such as not showing up) and it has no Description string, then show a description string generated from the reverse of unit_name_mangle().
- after deserializing sockets in socket.c we should reapply sockopts and things
- drop PID 1 reloading, only do reexecing (difficult: Reload()
currently is properly synchronous, Reexec() is weird, because we
cannot delay the response properly until we are back, so instead of
being properly synchronous we just keep open the fd and close it
when done. That means clients do not get a successful method reply,
but much rather a disconnect on success.)
- when breaking cycles drop sysv services first, then services from /run, then from /etc, then from /usr
- when a bus name of a service disappears from the bus make sure to queue further activation requests
- maybe introduce CoreScheduling=yes/no to optionally set a PR_SCHED_CORE cookie, so that all
processes in a service's cgroup share the same cookie and are guaranteed not to share SMT cores
with other units https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/hw-vuln/core-scheduling.rst
* unit files:
- allow port=0 in .socket units
- maybe introduce ExecRestartPre=
- implement Register= switch in .socket units to enable registration
in Avahi, RPC and other socket registration services.
- allow Type=simple with PIDFile=
https://bugzilla.redhat.com/show_bug.cgi?id=723942
- allow writing multiple conditions in unit files on one line
- introduce Type=pid-file
- add a concept of RemainAfterExit= to scope units
- Allow multiple ExecStart= for all Type= settings, so that we can cover rescue.service nicely
* timer units:
- timer units should get the ability to trigger when DST changes
- Modulate timer frequency based on battery state
* clean up date formatting and parsing so that all absolute/relative timestamps we format can also be parsed
* on shutdown: move utmp, wall, audit logic all into PID 1 (or logind?), get rid of elogind-update-utmp-runlevel
* make repeated alt-ctrl-del presses print a dump
* currently x-elogind.timeout is lost in the initrd, since crypttab is copied into dracut, but fstab is not
* add a pam module that passes the hdd passphrase into the PAM stack and then expires it, for usage by gdm auto-login.
* add a pam module that on password changes updates any LUKS slot where the password matches
* test/:
- add unit tests for config_parse_device_allow()
* seems that when we follow symlinks to units we prefer the symlink
destination path over /etc and /usr. We should not do that. Instead
/etc should always override /run+/usr and also any symlink
destination.
* when isolating, try to figure out a way how we implicitly can order
all units we stop before the isolating unit...
* teach ConditionKernelCommandLine= globs or regexes (in order to match foobar={no,0,off})
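A minimal sketch of the regex variant of the item above, assuming the condition would be checked word by word against the kernel command line. The helper name `cmdline_matches` and the pattern are hypothetical, and the splitting ignores the quoting rules of the real command line parser:

```python
import re


def cmdline_matches(cmdline: str, pattern: str) -> bool:
    """Return True if any whitespace-separated word fully matches the regex.

    Hypothetical helper: real kernel command lines may contain quoted
    values, which this simple split() does not handle.
    """
    rx = re.compile(pattern)
    return any(rx.fullmatch(word) for word in cmdline.split())


# One pattern covers foobar=no, foobar=0 and foobar=off.
PATTERN = r"foobar=(no|0|off)"

if __name__ == "__main__":
    print(cmdline_matches("root=/dev/sda1 foobar=off quiet", PATTERN))
```

Matching whole words rather than substrings keeps an unrelated option such as xfoobar=no from satisfying the condition.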
* Add ConditionDirectoryNotEmpty= handle non-absolute paths as a search path or add
ConditionConfigSearchPathNotEmpty= or different syntax? See the discussion starting at
https://github.com/systemd/systemd/pull/15109#issuecomment-607740136.
* BootLoaderSpec: Define a way how an installer can figure out whether a BLS
compliant boot loader is installed.
* think about requeuing jobs when daemon-reload is issued? use case:
the initrd issues a reload after fstab from the host is accessible
and we might want to requeue the mounts local-fs acquired through
that automatically.
* elogind-inhibit: make taking delay locks useful: support sending SIGINT or SIGTERM on PrepareForSleep()
* remove any syslog support from log.c — we probably cannot do this before split-off udev is gone for good
* shutdown logging: store to EFI var, and store to USB stick?
* merge unit_kill_common() and unit_kill_context()
* add a dependency on standard-conf.xml and other included files to man pages
* MountFlags=shared acts as MountFlags=slave right now.
* properly handle loop back mounts via fstab, especially with regard to fsck/passno
* initialize the hostname from the fs label of /, if /etc/hostname does not exist?
* sd-bus:
- EBADSLT handling
- GetAllProperties() on a non-existing object does not result in a failure currently
- port to sd-resolve for connecting to TCP dbus servers
- see if we can introduce a new sd_bus_get_owner_machine_id() call to retrieve the machine ID of the machine of the bus itself
- see if we can drop more message validation on the sending side
- add API to clone sd_bus_message objects
- longer term: priority inheritance
- dbus spec updates:
- NameLost/NameAcquired obsolete
- path escaping
- update elogind.special(7) to mention that dbus.socket is only about the compatibility socket now
* sd-event
- allow multiple signal handlers per signal?
- document chaining of signal handler for SIGCHLD and child handlers
- define more intervals where we will shift wakeup intervals around in, 1h, 6h, 24h, ...
- maybe support iouring as backend, so that we allow hooking read and write
operations instead of IO ready events into event loops. See considerations
here:
http://blog.vmsplice.net/2020/07/rethinking-event-loop-integration-for.html
* dbus: when a unit failed to load (i.e. is in UNIT_ERROR state), we
should be able to safely try another attempt when the bus call LoadUnit() is invoked.
* document org.freedesktop.MemoryAllocation1
* maybe do not install getty@tty1.service symlink in /etc but in /usr?
* print a nicer explanation if people use variable/specifier expansion in ExecStart= for the first word
* mount: turn dependency information from /proc/self/mountinfo into dependency information between elogind units.
* EFI:
- honor language efi variables for default language selection (if there are any?)
- honor timezone efi variables for default timezone selection (if there are any?)
- change bootctl to be backed by elogind-bootd to control temporary and persistent default boot goal plus efi variables
* bootctl
- recognize the case when not booted on EFI
* bootctl,sd-boot: actually honour the "architecture" key
* bootctl:
- show whether UEFI audit mode is available
- teach it to prepare an ESP wholesale, i.e. with mkfs.vfat invocation
- teach it to copy in unified kernel images and maybe type #1 boot loader spec entries from host
* logind:
- logind: optionally, ignore idle-hint logic for autosuspend, block suspend as long as a session is around
- logind: wakelock/opportunistic suspend support
- Add pretty name for seats in logind
- logind: allow showing logout dialog from system?
- add Suspend() bus calls which take timestamps to fix double suspend issues when somebody hits suspend and closes laptop quickly.
- if pam_elogind is invoked by su from a process that is outside of
any session we should probably just become a NOP, since that's
usually not a real user session but just some system code that just
needs setuid().
- logind: make the Suspend()/Hibernate() bus calls wait for the
job to be completed before returning, so that clients can wait
for "systemctl suspend" to finish to know when the suspending is
complete.
- logind: when the power button is pressed short, just popup a
logout dialog. If it is pressed for 1s, do the usual
shutdown. Inspiration are Macs here.
- expose "Locked" property on logind session objects
- maybe allow configuration of the StopTimeout for session scopes
- rename session scope so that it includes the UID. That way
the session scope can be arranged freely in slices and we don't have
to make assumptions about their slice anymore.
- follow PropertiesChanged state more closely, to deal with quick logouts and
relogins
- (optionally?) spawn seat-manager@$SEAT.service whenever a seat shows up that has CanGraphical set
- expose details of boot entries on the bus. In particular, it should be possible
to query the list of boot entry titles that bootctl / sd-boot would show.
Currently we only expose their identifiers.
* move multiseat vid/pid matches from logind udev rule to hwdb
* logind: rework pam_logind to also do a bus call in case of invocation from
user@.service, which returns the XDG_RUNTIME_DIR value, and make this
behaviour selectable via pam module option.
* delay activation of logind until somebody logs in, or when /dev/tty0 pulls it
in or lingering is on (so that containers don't bother with it until PAM is used). also exit-on-idle
* journal:
- consider introducing implicit _TTY= + _PPID= + _EUID= + _EGID= + _FSUID= + _FSGID= fields
- journald: also get thread ID from client, plus thread name
- journal: when waiting for journal additions in the client always sleep at least 1s or so, in order to minimize wakeups
- add API to close/reopen/get fd for journal client fd in libelogind-journal.
- fall back to /dev/log based logging in libelogind-journal, if we cannot log natively?
- declare the local journal protocol stable in the wiki interface chart
- sd-journal: speed up sd_journal_get_data() with transparent hash table in bg
- journald: when dropping msgs due to ratelimit make sure to write
"dropped %u messages" not only when we are about to print the next
message that works, but already after a short timeout
- check if we can make journalctl by default use --follow mode inside of less if called without args?
- maybe add API to send pairs of iovecs via sd_journal_send
- journal: add a setgid "elogind-journal" utility to invoke from libelogind-journal, which passes fds via STDOUT and does PK access
- journalctl: support negative filtering, i.e. FOOBAR!="waldo",
and !FOOBAR for events without FOOBAR.
- journal: store timestamp of journal_file_set_offline() in the header,
so it is possible to display when the file was last synced.
- journal-send.c, log.c: when the log socket is clogged, and we drop, count this and write a message about this when it gets unclogged again.
- journal: find a way to allow dropping history early, based on priority, other rules
- journal: When used on NFS, check payload hashes
- journald: add kernel cmdline option to disable ratelimiting for debug purposes
- refuse taking lower-case variable names in sd_journal_send() and friends.
- journald: we currently rotate only after MaxUse+MaxFilesize has been reached.
- journal: deal nicely with byte-by-byte copied files, especially regards header
- journal: sanely deal with entries which are larger than the individual file size, but where the components would fit
- Replace utmp, wtmp, btmp, and lastlog completely with journal
- journalctl: instead --after-cursor= maybe have a --cursor=XYZ+1 syntax?
- when a kernel driver logs in a tight loop, we should ratelimit that too.
- journald: optionally, log debug messages to /run but everything else to /var
- journald: when we drop syslog messages because the syslog socket is
full, make sure to write how many messages are lost as first thing
to syslog when it works again.
- journald: allow per-priority and per-service retention times when rotating/vacuuming
- journald: make use of uid-range.h to manage uid ranges to split
journals in.
- journalctl: add the ability to look for the most recent process of a binary. journalctl /usr/bin/X11 --pid=-1 or so...
- improve journalctl performance by loading journal files
lazily. Encode just enough information in the file name, so that we
do not have to open it to know that it is not interesting for us, for
the most common operations.
- man: document that corrupted journal files are nothing to act on
- rework journald sigbus stuff to use mutex
- Set RLIMIT_NPROC for elogind-journal-xyz, and all other of our
services that run under their own user ids, and use User= (but only
in a world where userns is ubiquitous since otherwise we cannot
invoke those daemons on the host AND in a container anymore). Also,
if LimitNPROC= is used without User= we should warn and refuse
operation.
- journalctl --verify: don't show files that are currently being
written to as FAIL, but instead show that they are being written to.
- add journalctl -H that talks via ssh to a remote peer and passes through
binary logs data
- add a version of --merge which also merges /var/log/journal/remote
- journalctl: -m should access container journals directly by enumerating
them via machined, and also watch containers coming and going.
Benefit: nspawn --ephemeral would start working nicely with the journal.
- assign MESSAGE_ID to log messages about failed services
- check if loop in decompress_blob_xz() is necessary
* journald: support RFC3164 fully for the incoming syslog transport, see
https://github.com/elogind/elogind/issues/19251#issuecomment-816601955
* Hook up journald's FSS logic with TPM2: seal the verification disk by
time-based policy, so that the verification key can remain on host and be
validated via TPM.
* rework journalctl -M to be based on a machined method that generates a mount
fd of the relevant journal dirs in the container with uidmapping applied to
allow the host to read it, while making everything read-only.
* journald: add varlink service that allows subscribing to certain log events,
for example matching by message ID or log level, and returns a list of journal
cursors as they happen.
* journald: also collect CLOCK_BOOTTIME timestamps per log entry. Then, derive
"corrected" CLOCK_REALTIME information on display from that and the timestamp
info of the newest entry of the specific boot (as identified by the boot
ID). This way, if a system comes up without a valid clock but acquires a
better clock later, we can "fix" older entry timestamps on display, by
calculating backwards. We cannot use CLOCK_MONOTONIC for this, since it does
not account for suspend phases. This would then also enable us to correct the
kmsg timestamping we consume (where we erroneously assume the clock was in
CLOCK_MONOTONIC, but it actually is CLOCK_BOOTTIME as per kernel).
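The backwards calculation described above can be sketched as follows. This is an illustration under stated assumptions, not journald code: the `Entry` tuple and `corrected_realtime()` are hypothetical names, timestamps are in microseconds, and the newest entry of the boot is assumed to carry a trustworthy CLOCK_REALTIME value.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical per-entry timestamps; not the on-disk journal format."""
    boot_id: str
    boottime_usec: int   # CLOCK_BOOTTIME, keeps counting across suspend
    realtime_usec: int   # CLOCK_REALTIME as recorded, possibly bogus early in boot


def corrected_realtime(entry: Entry, newest: Entry) -> int:
    """Derive the entry's wallclock time from the newest entry of the same boot.

    The elapsed CLOCK_BOOTTIME between the two entries is subtracted from
    the newest entry's CLOCK_REALTIME.
    """
    if entry.boot_id != newest.boot_id:
        raise ValueError("entries are from different boots")
    return newest.realtime_usec - (newest.boottime_usec - entry.boottime_usec)


if __name__ == "__main__":
    early = Entry("b1", 5_000_000, 12_000_000)               # clock not yet set
    newest = Entry("b1", 65_000_000, 1_700_000_065_000_000)  # after a clock fix
    print(corrected_realtime(early, newest))
```

The recorded realtime of the older entry is ignored on purpose; only its CLOCK_BOOTTIME offset within the boot is trusted.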
* in journald, write out a recognizable log record whenever the system clock is
changed ("stepped"), and in timesyncd whenever we acquire an NTP fix
("slewing"). Then, in journalctl for each boot time we come across, find
these records, and use the structured info they include to display
"corrected" wallclock time, as calculated from the monotonic timestamp in the
log record, adjusted by the delta declared in the structured log record.
* in journald: whenever we start a new journal file because the boot ID
changed, let's generate a recognizable log record containing info about old
and new ID. Then, when displaying log stream in journalctl look for these
records, to be able to order them.
* journald: generate recognizable log events whenever we shutdown journald
cleanly, and when we migrate run → var. This way tools can verify that a
previous boot terminated cleanly, because either of these two messages must
be safely written to disk, then.
* hook up journald with TPMs? measure new journal records to the TPM in regular
intervals, validate the journal against current TPM state with that. (taking
inspiration from IMA log)
* sd-journal puts a limit on parallel journal files to view at once. journald
should probably honour that same limit (JOURNAL_FILES_MAX) when vacuuming to
ensure we never generate more files than we can actually view.
* maybe add a tool that displays most recent journal logs as QR code to scan
off screen and run it automatically on boot failures, emergency logs and
such. Use DRM APIs directly, see
https://github.com/dvdhrm/docs/blob/master/drm-howto/modeset.c for an example
for doing that.
* maybe implicitly attach monotonic+realtime timestamps to outgoing messages in
log.c and sd-journal-send
* journalctl/timesyncd: whenever timesyncd acquires a synchronization from NTP,
create a structured log entry that contains boot ID, monotonic clock and
realtime clock (I mean, this requires no special work, as these three fields
are implicit). Then in journalctl when attempting to display the realtime
timestamp of a log entry, first search for the closest later log entry
of this kind that has a matching boot id, and convert the monotonic clock
timestamp of the entry to the realtime clock using this info. This way we can
retroactively correct the wallclock timestamps, in particular for systems
without RTC, i.e. where initially wallclock timestamps carry rubbish, until
an NTP sync is acquired.
* introduce per-unit (i.e. per-slice, per-service) journal log size limits.
* journald: do journal file writing out-of-process, with one writer process per
client UID, so that synthetic hash table collisions can slow down a specific
user's journal stream but not the others.
* tweak journald context caching. In addition to caching per-process attributes
keyed by PID, cache per-cgroup attributes (i.e. the various xattrs we read)
keyed by cgroup path, and guarded by ctime changes. This should provide us
with a nice speed-up on services that have many processes running in the same
cgroup.
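A rough sketch of the cache shape described above, assuming the ctime lookup and the xattr reads are supplied as callables. `CgroupAttrCache`, `get_ctime` and `read_attrs` are hypothetical names used for illustration only, not journald internals:

```python
from typing import Callable, Dict, Tuple


class CgroupAttrCache:
    """Cache per-cgroup attributes keyed by cgroup path, guarded by ctime."""

    def __init__(self,
                 get_ctime: Callable[[str], int],
                 read_attrs: Callable[[str], dict]) -> None:
        self._get_ctime = get_ctime      # e.g. a stat() of the cgroup directory
        self._read_attrs = read_attrs    # the expensive xattr reads
        self._cache: Dict[str, Tuple[int, dict]] = {}

    def lookup(self, cgroup_path: str) -> dict:
        ctime = self._get_ctime(cgroup_path)
        hit = self._cache.get(cgroup_path)
        if hit is not None and hit[0] == ctime:
            return hit[1]                # ctime unchanged: reuse cached attributes
        attrs = self._read_attrs(cgroup_path)
        self._cache[cgroup_path] = (ctime, attrs)
        return attrs
```

With many processes in one cgroup, every lookup after the first costs one ctime check instead of a full set of attribute reads, until the ctime changes.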
* maybe add call sd_journal_set_block_timeout() or so to set SO_SNDTIMEO for
the sd-journal logging socket, and, if the timeout is set to 0, sets
O_NONBLOCK on it. That way people can control if and when to block for
logging.
* journalctl: make sure -f ends when the container indicated by -M terminates
* journald: sigbus API via a signal-handler safe function that people may call
from the SIGBUS handler
* add a test if all entries in the catalog are properly formatted.
(Adding dashes in a catalog entry currently results in the catalog entry
being silently skipped. journalctl --update-catalog must warn about this,
and we should also have a unit test to check that all our message are OK.)
* build short web pages out of each catalog entry, build them along with man
pages, and include hyperlinks to them in the journal output
* homed:
- when user tries to log into record signed by unrecognized key, automatically add key to our chain after polkit auth
- rollback when resize fails mid-operation
- GNOME's side for forget key on suspend (requires rework so that lock screen runs outside of uid)
- update LUKS password on login if we find there's a password that unlocks the JSON record but not the LUKS device.
- create on activate?
- properties: icon url?, preferred session type?, administrator bool (which translates to 'wheel' membership)?, address?, telephone?, vcard?, samba stuff?, parental controls?
- communicate clearly when usb stick is safe to remove. probably involves
beefing up logind to make pam session close hook synchronous and wait until
elogind --user is shut down.
- logind: maybe keep a "busy fd" as long as there's a non-released session around or the user@.service
- maybe make automatic, read-only, time-based reflink-copies of LUKS disk
images (and btrfs snapshots of subvolumes) (think: time machine)
- distinguish destroy / remove (i.e. currently we can unregister a user, unregister+remove their home directory, but not just remove their home directory)
- in elogind's PAMName= logic: query passwords with ssh-askpassword, so that we can make "loginctl set-linger" mode work
- fingerprint authentication, pattern authentication, …
- make sure "classic" user records can also be managed by homed
- make size of $XDG_RUNTIME_DIR configurable in user record
- query password from kernel keyring first
- update even if record is "absent"
- make slice for users configurable (requires logind rework)
- logind: populate auto-login list bus property from PKCS#11 token
- when determining state of a LUKS home directory, check DM suspended sysfs file
- when homed is in use, maybe start the user session manager in a mount namespace with MS_SLAVE,
so that mounts propagate down but not up - eg, user A setting up a backup volume
doesn't mean user B sees it
- use credentials logic/TPM2 logic to store homed signing key
- permit multiple user record signing keys to be used locally, and pick
the right one for signing records automatically depending on a pre-existing
signature
- add a way to "adopt" a home directory, i.e. strip foreign signatures
and insert a local signature instead.
- as an extension to the directory+subvolume backend: if located on
especially marked fs, then sync down password into LUKS header of that fs,
and always verify passwords against it too. Bootstrapping is a problem
though: if no one is logged in (or no other user even exists yet), how do you
unlock the volume in order to create the first user and add the first pw.
- support new FS_IOC_ADD_ENCRYPTION_KEY ioctl for setting up fscrypt
- maybe pre-create ~/.cache as subvol so that it can have separate quota
easily?
- add a switch to homectl (maybe called --first-boot) where it will check if
any non-system users exist, and if not prompts interactively for basic user
- store PKCS#11 + FIDO2 token info in LUKS2 header, compatible with
- on login, if we can't fallocate initially, but rebalance is on, then allow
login in discard mode, then immediately rebalance, then turn off discard
- extend user records with optional "bulk" data. Specifically, a user
avatar/photo or so. This data should be stored along with the user record,
but probably shouldn't be part of the record itself, since it might be
large.
- add "homectl unbind" command to remove local user record of an inactive
home dir
partition on disk, but only if it is marked for growing and not read-only.
or so. (this is useful to factory reset an image, then putting it into
another machine, ensuring that luks key is generated on new machine, not old)
something goes wrong on the way.
end), in order to maximize dd'ability. Requires libfdisk work, see
https://github.com/karelzak/util-linux/issues/907
MBR case. Idea: accept syntax "Type=gpt:home mbr:0x83" for setting the types
for the two partition types explicitly. And provide an internal mapping so
that "Type=linux-generic" maps to the right types for both partition tables
automatically.
is useful to implement ESP vs. XBOOTLDR schemes in installers: have one set
of repart files for the case where ESP is large enough and one where it isn't
and XBOOTLDR is added in instead. Then apply the former first, and if it
fails to apply use the latter.
Also add option to disable operation via kernel command line.
during boot.
* document:
- document that deps in [Unit] sections ignore Alias= fields in
[Install] units of other units, unless those units are disabled
- man: clarify that time-sync.target is not only sysv compat but also useful otherwise. Same for similar targets
- document that service reload may be implemented as service reexec
- add a man page containing packaging guidelines and recommending usage of things like Documentation=, PrivateTmp=, PrivateNetwork= and ReadOnlyDirectories=/etc /usr.
- document elogind-journal-flush.service properly
- documentation: recommend to connect the timer units of a service to the service via Also= in [Install]
- man: document the very specific env the shutdown drop-in tools live in
- man: add more examples to man pages,
- in particular an example how to do the equivalent of switching runlevels
- man: maybe sort directives in man pages, and take sections from --help and apply them to man too
- document root=gpt-auto properly
* systemctl:
- add systemctl switch to dump transaction without executing it
- Add a verbose mode to "systemctl start" and friends that explains what is being done or not done
- "systemctl disable" on a static unit prints no message and does
nothing. "systemctl enable" does nothing, and gives a bad message
about it. Should fix both to print nice actionable messages.
- print nice message from systemctl --failed if there are no entries shown, and hook that into ExecStartPre of rescue.service/emergency.service
- add new command to systemctl: "systemctl system-reexec" which reexecs as many daemons as virtually possible
- systemctl enable: fail if target to alias into does not exist? maybe show how many units are enabled afterwards?
- systemctl: "Journal has been rotated since unit was started." message is misleading
- systemctl status output should include list of triggering units and their status
* introduce an option (or replacement) for "systemctl show" that outputs all
properties as JSON, similar to busctl's new JSON output. In contrast to that
it should skip the variant type string though.
* Add a "systemctl list-units --by-slice" mode or so, which rearranges the
output of "systemctl list-units" slightly by showing the tree structure of
the slices, and the units attached to them.
* add "systemctl wait" or so, which does what "elogind-run --wait" does, but
for all units. It should be both a way to pin units into memory as well as a
way to retrieve their exit data.
* show whether a service has out-of-date configuration in "systemctl status" by
using mtime data of ConfigurationDirectory=.
* "systemctl preset-all" should probably order the unit files it
operates on lexicographically before starting to work, in order to
ensure deterministic behaviour if two unit files conflict (like DMs
do, for example)
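To illustrate the ordering idea above: sorting the unit file names byte-wise (similar in spirit to strcmp()) before processing makes the result independent of directory enumeration order. The function name `preset_order` and the display manager unit names are only examples, not actual systemctl code:

```python
from typing import Iterable, List


def preset_order(unit_files: Iterable[str]) -> List[str]:
    """Return unit file names in a stable, locale-independent order."""
    # Compare the UTF-8 bytes so the order does not depend on the locale.
    return sorted(unit_files, key=lambda name: name.encode("utf-8"))


if __name__ == "__main__":
    # Two different enumeration orders yield the same processing order.
    print(preset_order(["sddm.service", "gdm.service", "lightdm.service"]))
    print(preset_order(["lightdm.service", "sddm.service", "gdm.service"]))
```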
* add "systemctl start -v foobar.service" that shows logs of a service
while the start command runs. This is non-trivial to do without
races though, since we should flush out all journal messages before
returning from the "systemctl stop".
* systemctl: if some operation fails, show log output?
* Add a new verb "systemctl top"
* unit install:
- "systemctl mask" should find all names by which a unit is accessible
(i.e. by scanning for symlinks to it) and link them all to /dev/null
* nspawn:
- emulate /dev/kmsg using CUSE and turn off the syslog syscall
with seccomp. That should provide us with a useful log buffer that
elogind can log to during early boot, and disconnect container logs
from the kernel's logs.
- as soon as networkd has a bus interface, hook up --network-interface=,
--network-bridge= with networkd, to trigger netdev creation should an
interface be missing
- a nice way to boot up without machine id set, so that it is set at boot
automatically for supporting --ephemeral. Maybe hash the host machine id
together with the machine name to generate the machine id for the container
- fix logic to always print a final newline on output.
https://github.com/systemd/systemd/pull/272#issuecomment-113153176
- should optionally support receiving WATCHDOG=1 messages from its payload
PID 1...
- optionally automatically add FORWARD rules to iptables whenever nspawn is
running, remove them when shut down.
- add support for sysext extensions, too. i.e. a new --extension= switch that
takes one or more arguments, and applies the extensions already during
startup.
- when main nspawn supervisor process gets suspended due to SIGSTOP/SIGTTOU
or so, freeze the payload too.
- support time namespaces
- on cgroupsv1 issue cgroup empty handler process based on host events, so
that we make cgroup agent logic safe
- add API to invoke binary in container, then use that as fallback in
"machinectl shell"
- make nspawn suitable for shell pipelines: instead of triggering a hangup
when input is finished, send ^D, which synthesizes an EOF. Then wait for
hangup or ^D before passing on the EOF.
- greater control over selinux label?
- support that /proc, /sys/, /dev are pre-mounted
- maybe allow TPM passthrough, backed by swtpm, and measure --image= hash
into its PCR 11, so that nspawn instances can be TPM enabled, and partake
in measurements/remote attestation and such. swtpm would run outside of
control of container, and ideally would itself bind its encryption keys to
host TPM.
- make boot assessment do something sensible in a container. i.e send an
sd_notify() from payload to container manager once boot-up is completed
successfully, and use that in nspawn for dealing with boot counting,
implemented in the partition table labels and directory names.
- optionally set up nftables/iptables routes that forward UDP/TCP traffic on
port 53 to resolved stub 127.0.0.54
- maybe optionally insert .nspawn file as GPT partition into images, so that
such container images are entirely stand-alone and can be updated as one.
- The subreaper logic we currently have seems overly complex. We should
investigate whether creating the inner child with CLONE_PARENT isn't better.
- Reduce the number of sockets that are currently in use and just rely on one
or two sockets.
- Support running nspawn as an unprivileged user.
* machined: add API to acquire UID range. add API to mount/dissect loopback
file. Both protected by PK. Then make nspawn use these APIs to run
unprivileged containers. i.e. push the truly privileged bits into machined,
so that the client side can remain entirely unprivileged, without SUID or
anything like that.
* machined:
- add an API so that libvirt-lxc can inform us about network interfaces being
removed or added to an existing machine
- "machinectl migrate" or similar to copy a container from or to a
different host, via ssh
- "machinectl status" should also show internal logs of the container in
question
- "machinectl history"
- "machinectl diff"
- "machinectl commit" that takes a writable snapshot of a tree, invokes a
shell in it, and marks it read-only after use
* udev:
- move to LGPL
- kill scsi_id
- add trigger --subsystem-match=usb/usb_device device
- reimport udev db after MOVE events for devices without dev_t
- re-enable ProtectClock= once only cgroupsv2 is supported.
See f562abe2963bad241d34e0b308e48cf114672c84.
* coredump:
- save coredump in Windows/Mozilla minidump format
- when truncating coredumps, also log the full size that the process had, and make a metadata field so we can report truncated coredumps
- add examples for other distros in ELF_PACKAGE_METADATA
* support crash reporting operation modes (https://live.gnome.org/GnomeOS/Design/Whiteboards/ProblemReporting)
* tmpfiles:
- apply "x" on "D" too (see patch from William Douglas)
- allow time-based cleanup in r and R too
- instead of ignoring unknown fields, reject them.
- creating new directories/subvolumes/fifos/device nodes
should not follow symlinks. None of the other adjustment or creation
calls follow symlinks.
- add --test mode
- teach tmpfiles.d q/Q logic something sensible in the context of XFS/ext4
project quota
- teach tmpfiles.d m/M to move / atomic move + symlink old -> new
- add new line type for setting btrfs subvolume attributes (i.e. rw/ro)
- tmpfiles: add new line type for setting fcaps
* udev-link-config:
- Make sure ID_PATH is always exported and complete for
network devices where possible, so we can safely rely
on Path= matching
* sd-rtnl:
- add support for more attribute types
- inbuilt piping support (essentially degenerate async)? see loopback-setup.c and other places
* networkd:
- add more keys to [Route] and [Address] sections
- add support for more DHCPv4 options (and, longer term, other kinds of dynamic config)
- add reduced [Link] support to .network files
- properly handle routerless dhcp leases
- work with non-Ethernet devices
- dhcp: do we allow configuring dhcp routes on interfaces that are not the one we got the dhcp info from?
- the DHCP lease data (such as NTP/DNS) is still made available when
a carrier is lost on a link. It should be removed instantly.
- expose in the API the following bits:
- option 15, domain name
- option 12, hostname and/or option 81, fqdn
- option 123, 144, geolocation
- option 252, configure http proxy (PAC/wpad)
- provide a way to define a per-network interface default metric value
for all routes to it. possibly a second default for DHCP routes.
- allow Name= to be specified repeatedly in the [Match] section. Maybe also
support Name=foo*|bar*|baz ?
- whenever uplink info changes, make DHCP server send out FORCERENEW
* in networkd, when matching device types, fix up DEVTYPE rubbish the kernel passes to us
* Figure out how to do unittests of networkd's state serialization
* dhcp:
- figure out how much we can increase Maximum Message Size
* dhcp6:
- add functions to set previously stored IPv6 addresses on startup and get
them at shutdown; store them in client->ia_na
- write more test cases
- implement reconfigure support, see 5.3., 15.11. and 22.20.
- implement support for temporary addresses (IA_TA)
- implement dhcpv6 authentication
- investigate the usefulness of Confirm messages; i.e. are there any
situations where the link changes without any loss in carrier detection
or interface down
- some servers don't do rapid commit without a filled in IA_NA, verify
this behavior
- RouteTable= ?
* shared/wall: Once more programs are taught to prefer sd-login over utmp,
switch the default wall implementation to wall_logind
(https://github.com/systemd/systemd/pull/29051#issuecomment-1704917074)