File: libcpuset.leo

libcpuset 1.0-1
<?xml version="1.0" encoding="UTF-8"?>
<leo_file>
<leo_header file_format="2" tnodes="0" max_tnode_index="34" clone_windows="0"/>
<globals body_outline_ratio="0.393236714976">
	<global_window_position top="10" left="467" height="1035" width="999"/>
	<global_log_window_position top="0" left="0" height="0" width="0"/>
</globals>
<preferences/>
<find_panel_settings/>
<vnodes>
<v t="PaulJackson.20040305164814" a="E"><vh>libcpuset document</vh>
<v t="PaulJackson.20040312144603" a="ETV"><vh>@rst libcpuset.tex</vh>
<v t="PaulJackson.20040305191413"><vh>Why Cpusets?</vh></v>
<v t="PaulJackson.20040822234350" a="E"><vh>Linux Cpuset Kernel Support</vh>
<v t="PaulJackson.20060430183511"><vh>The Cpuset File System</vh></v>
<v t="PaulJackson.20060430164839"><vh>Exclusive Cpusets</vh></v>
<v t="PaulJackson.20060430161249"><vh>Notify On Release</vh></v>
<v t="PaulJackson.20060430161323"><vh>Memory Pressure</vh></v>
<v t="PaulJackson.20060430161438.1"><vh>Memory Spread</vh></v>
<v t="PaulJackson.20060430161438"><vh>Memory Migration</vh></v>
<v t="PaulJackson.20060430171303.1"><vh>Mask Format</vh></v>
<v t="PaulJackson.20060430161249.1"><vh>List Format</vh></v>
</v>
<v t="PaulJackson.20040824000303"><vh>Using Cpusets at the Shell Prompt</vh></v>
<v t="PaulJackson.20040305164814.4" a="E"><vh>Cpuset Programming Model</vh>
<v t="PaulJackson.20050114221828"><vh>Permission Model</vh></v>
<v t="PaulJackson.20060516001414"><vh>Other Placement Mechanisms</vh></v>
<v t="PaulJackson.20060516113818" a="E"><vh>Cpuset Aware Thread Pinning</vh></v>
<v t="PaulJackson.20060516121002"><vh>Safe Job Migration</vh></v>
</v>
<v t="PaulJackson.20040315131148"><vh>CPUs and Memory Nodes</vh></v>
<v t="PaulJackson.20040315151432" a="E"><vh>Extensible API</vh></v>
<v t="PaulJackson.20041015222637"><vh>Cpuset Text Format</vh></v>
<v t="PaulJackson.20041104233632" a="E"><vh>Basic Cpuset Library Functions</vh>
<v t="PaulJackson.20041105003017"><vh>cpuset_pin</vh></v>
<v t="PaulJackson.20041105003017.1"><vh>cpuset_size</vh></v>
<v t="PaulJackson.20041105003017.2"><vh>cpuset_where</vh></v>
<v t="PaulJackson.20041105003017.3"><vh>cpuset_unpin</vh></v>
</v>
<v t="PaulJackson.20050922011133"><vh>Using Cpusets with Hyper-Threads</vh></v>
<v t="PaulJackson.20040305164814.5" a="E"><vh>Advanced Cpuset Library Functions</vh>
<v t="PaulJackson.20061116185722"><vh>cpuset_version</vh></v>
<v t="PaulJackson.20040305164814.6"><vh>cpuset_alloc</vh></v>
<v t="PaulJackson.20040305164814.7" a="E"><vh>cpuset_free</vh></v>
<v t="PaulJackson.20040825131747"><vh>cpuset_cpus_nbits</vh></v>
<v t="PaulJackson.20040825131747.1" a="E"><vh>cpuset_mems_nbits</vh></v>
<v t="PaulJackson.20040305164814.11"><vh>cpuset_setcpus</vh></v>
<v t="PaulJackson.20040305164814.12"><vh>cpuset_setmems</vh></v>
<v t="PaulJackson.20040305164814.14" a="E"><vh>cpuset_set_iopt</vh></v>
<v t="PaulJackson.20040823181111"><vh>cpuset_set_sopt</vh></v>
<v t="PaulJackson.20040305164814.17"><vh>cpuset_getcpus</vh></v>
<v t="PaulJackson.20040305164814.18"><vh>cpuset_getmems</vh></v>
<v t="PaulJackson.20040928122625"><vh>cpuset_cpus_weight</vh></v>
<v t="PaulJackson.20040928122625.1"><vh>cpuset_mems_weight</vh></v>
<v t="PaulJackson.20040305164814.20"><vh>cpuset_get_iopt</vh></v>
<v t="PaulJackson.20040823181313"><vh>cpuset_get_sopt</vh></v>
<v t="PaulJackson.20040316171901"><vh>cpuset_localcpus</vh></v>
<v t="PaulJackson.20040316171901.1"><vh>cpuset_localmems</vh></v>
<v t="PaulJackson.20041007004400"><vh>cpuset_cpumemdist</vh></v>
<v t="PaulJackson.20041107213838" a="E"><vh>cpuset_cpu2node</vh></v>
<v t="PaulJackson.20050324125230"><vh>cpuset_addr2node</vh></v>
<v t="PaulJackson.20040305164814.23"><vh>cpuset_create</vh></v>
<v t="PaulJackson.20040312213402.4"><vh>cpuset_delete</vh></v>
<v t="PaulJackson.20040305164814.22"><vh>cpuset_query</vh></v>
<v t="PaulJackson.20040312213402.5"><vh>cpuset_modify</vh></v>
<v t="PaulJackson.20040823012055"><vh>cpuset_getcpusetpath</vh></v>
<v t="PaulJackson.20040924152444" a="E"><vh>cpuset_cpusetofpid</vh></v>
<v t="PaulJackson.20050405065349"><vh>cpuset_mountpoint</vh></v>
<v t="PaulJackson.20060504163450"><vh>cpuset_collides_exclusive</vh></v>
<v t="PaulJackson.20061113170659"><vh>cpuset_nuke</vh></v>
<v t="PaulJackson.20040315173325.3"><vh>cpuset_init_pidlist</vh></v>
<v t="PaulJackson.20040823225452"><vh>cpuset_pidlist_length</vh></v>
<v t="PaulJackson.20040315173325.4"><vh>cpuset_get_pidlist</vh></v>
<v t="PaulJackson.20040315173325.5" a="E"><vh>cpuset_freepidlist</vh></v>
<v t="PaulJackson.20040305164814.28"><vh>cpuset_move</vh></v>
<v t="PaulJackson.20040305164814.32" a="E"><vh>cpuset_move_all</vh></v>
<v t="PaulJackson.20061116000754"><vh>cpuset_move_cpuset_tasks</vh></v>
<v t="PaulJackson.20040312234233"><vh>cpuset_migrate</vh></v>
<v t="PaulJackson.20040317031228"><vh>cpuset_migrate_all</vh></v>
<v t="PaulJackson.20040924014236"><vh>cpuset_reattach</vh></v>
<v t="PaulJackson.20060504163259" a="E"><vh>cpuset_open_memory_pressure</vh></v>
<v t="PaulJackson.20060504163259.1"><vh>cpuset_read_memory_pressure</vh></v>
<v t="PaulJackson.20060504163312" a="E"><vh>cpuset_close_memory_pressure</vh></v>
<v t="PaulJackson.20040923180054"><vh>cpuset_c_rel_to_sys_cpu</vh></v>
<v t="PaulJackson.20040923180054.1"><vh>cpuset_c_sys_to_rel_cpu</vh></v>
<v t="PaulJackson.20040923180054.2"><vh>cpuset_c_rel_to_sys_mem</vh></v>
<v t="PaulJackson.20040923180054.3"><vh>cpuset_c_sys_to_rel_mem</vh></v>
<v t="PaulJackson.20041013094450"><vh>cpuset_p_rel_to_sys_cpu</vh></v>
<v t="PaulJackson.20041013094458" a="E"><vh>cpuset_p_sys_to_rel_cpu</vh></v>
<v t="PaulJackson.20041013094507"><vh>cpuset_p_rel_to_sys_mem</vh></v>
<v t="PaulJackson.20041013095901"><vh>cpuset_p_sys_to_rel_mem</vh></v>
<v t="PaulJackson.20060504163613"><vh>cpuset_get_placement</vh></v>
<v t="PaulJackson.20060504163613.1"><vh>cpuset_equal_placement</vh></v>
<v t="PaulJackson.20060504163613.2"><vh>cpuset_free_placement</vh></v>
<v t="PaulJackson.20061115164635" a="E"><vh>cpuset_fts_open</vh></v>
<v t="PaulJackson.20061115164646" a="E"><vh>cpuset_fts_read</vh></v>
<v t="PaulJackson.20061114031653"><vh>cpuset_fts_reverse</vh></v>
<v t="PaulJackson.20061114034532"><vh>cpuset_fts_rewind</vh></v>
<v t="PaulJackson.20061115164725"><vh>cpuset_fts_get_path</vh></v>
<v t="PaulJackson.20061115164734"><vh>cpuset_fts_get_stat</vh></v>
<v t="PaulJackson.20061115164747"><vh>cpuset_fts_get_cpuset</vh></v>
<v t="PaulJackson.20061115164741"><vh>cpuset_fts_get_errno</vh></v>
<v t="PaulJackson.20061115164753"><vh>cpuset_fts_get_info</vh></v>
<v t="PaulJackson.20061115164906"><vh>cpuset_fts_close</vh></v>
<v t="PaulJackson.20040901220706"><vh>cpuset_cpubind</vh></v>
<v t="PaulJackson.20040930225850"><vh>cpuset_latestcpu</vh></v>
<v t="PaulJackson.20040901220706.1"><vh>cpuset_membind</vh></v>
<v t="PaulJackson.20041015125831"><vh>cpuset_export</vh></v>
<v t="PaulJackson.20041015130034"><vh>cpuset_import</vh></v>
<v t="PaulJackson.20040312213402.7" a="E"><vh>cpuset_function</vh></v>
</v>
<v t="PaulJackson.20050404224742"><vh>System Error Numbers</vh></v>
<v t="PaulJackson.20041107213920"><vh>Change History</vh></v>
</v>
</v>
</vnodes>
<tnodes>
<t tx="PaulJackson.20040305164814">@nocolor
</t>
<t tx="PaulJackson.20040305164814.4">The **libcpuset** programming model for cpusets provides a hierarchical
cpuset name space that integrates smoothly with Linux 2.6 kernel support for
simple processor and memory placement.  As systems become larger, with more
complex memory, processor and bus architectures, the hierarchical cpuset
model for managing processor and memory resources will become increasingly
important.

The cpuset name space remains visible to all tasks on a system. Once created, a
cpuset remains in existence until it is deleted or until the system is rebooted,
even if no tasks are currently running in that cpuset.

The key properties of a cpuset are its pathname, the list of CPUs and
memory nodes it contains, and whether the cpuset has exclusive rights to these
resources.

Every task (process) in the system is attached to (running inside) a cpuset.
Tasks inherit their parent's cpuset attachment when forked. This binding of a task
to a cpuset can subsequently be changed, either by the task itself, or
externally from another task, given sufficient authority.

Tasks have their CPU and memory placement constrained to whatever their
containing cpuset allows. A cpuset may have exclusive rights to its CPUs and
memory, which provides certain guarantees that other cpusets will not overlap.

At system boot, a top level root cpuset is created, which includes all CPUs and
memory nodes on the system. The usual mount point of the cpuset file system, and
hence the usual file system path to this root cpuset, is ``/dev/cpuset``.

Changing the cpuset binding of a task does not by default move the memory the
task might have currently allocated, even if that memory is on memory nodes no
longer allowed in the task's cpuset. On kernels that support such memory
migration, use the **[optional]** *cpuset_migrate* function to move allocated
memory as well.

To create a cpuset from 'C' code, one obtains a handle to a new ``struct
cpuset``, sets the desired attributes via that handle, and issues a
``cpuset_create()`` to actually create the desired cpuset and bind it to the
specified name. One can also list by name what cpusets exist, query and modify
their properties, move tasks between cpusets, list what tasks are currently
attached to a cpuset, and delete cpusets.

The ``cpuset_alloc()`` call applies a hidden *undefined* mark to each attribute
of the allocated struct cpuset. Calls to the various ``cpuset_set*()`` routines
mark the attribute being set as *defined*. Calls to ``cpuset_create()`` and
``cpuset_modify()`` only set the attributes of the cpuset marked *defined*. This
is primarily noticeable when creating a cpuset. Code in the kernel sets some
attributes of new cpusets, such as ``memory_spread_page``,
``memory_spread_slab`` and ``notify_on_release``, by default to the value
inherited from their parent. Unless the application using ``libcpuset``
explicitly overrides the setting of these attributes in the struct cpuset,
between the calls to ``cpuset_alloc()`` and ``cpuset_create()``, the kernel
default settings will prevail. These hidden marks have no noticeable effect when
modifying an existing cpuset using the sequence of calls ``cpuset_alloc()``,
``cpuset_query()``, and ``cpuset_modify()``, because the ``cpuset_query()`` call
sets all attributes and marks them *defined*, while reading the attributes from
the cpuset.
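
The effect of these hidden marks can be illustrated with a small,
self-contained model. The sketch below is not the real ``struct cpuset``:
the ``toy_*`` names are hypothetical and only two attributes are modeled.
It assumes the kernel has already filled in a new cpuset with the values
inherited from its parent, and shows that only attributes marked *defined*
overwrite them.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT the real struct cpuset: two integer attributes, each
 * paired with a hidden "defined" mark. All toy_* names are hypothetical. */
struct toy_cpuset {
    int notify_on_release;
    bool notify_on_release_defined;
    int memory_spread_page;
    bool memory_spread_page_defined;
};

/* Plays the role of cpuset_alloc(): every attribute starts out undefined. */
void toy_init(struct toy_cpuset *cp)
{
    assert(cp);
    cp->notify_on_release = 0;
    cp->notify_on_release_defined = false;
    cp->memory_spread_page = 0;
    cp->memory_spread_page_defined = false;
}

/* Plays the role of a cpuset_set*() call: store the value, mark it defined. */
void toy_set_notify_on_release(struct toy_cpuset *cp, int val)
{
    assert(cp);
    cp->notify_on_release = val;
    cp->notify_on_release_defined = true;
}

/* Plays the role of cpuset_create(): 'kernel' already holds the values
 * inherited from the parent; only defined attributes overwrite them. */
void toy_apply(const struct toy_cpuset *cp, struct toy_cpuset *kernel)
{
    assert(cp && kernel);
    if (cp->notify_on_release_defined)
        kernel->notify_on_release = cp->notify_on_release;
    if (cp->memory_spread_page_defined)
        kernel->memory_spread_page = cp->memory_spread_page;
}
```
]]>

In this model, a caller that sets only ``notify_on_release`` before applying
the struct leaves the inherited ``memory_spread_page`` value untouched.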

Cpuset names in this 'C' library are relative to the root cpuset mount
point (typically ``/dev/cpuset``) if the name starts with a slash '/'
character; otherwise they are relative to the current task's cpuset.
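
As a minimal sketch of this naming rule, the following hypothetical helper
(it is not part of ``libcpuset``) expands a cpuset name into a file system
path. The mount point and the current task's cpuset path are passed in as
plain strings, an assumption made to keep the example self-contained.

<![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a libcpuset function: expand a cpuset name
 * into a file system path.
 *   mnt - cpuset mount point, typically "/dev/cpuset"
 *   cur - cpuset of the current task, relative to mnt (e.g. "/batch")
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int cpuset_name_to_path(const char *name, const char *mnt,
                        const char *cur, char *buf, size_t buflen)
{
    int n;

    assert(name && mnt && cur && buf);

    if (name[0] == '/')
        /* Leading slash: relative to the root cpuset mount point. */
        n = snprintf(buf, buflen, "%s%s", mnt, name);
    else if (strcmp(cur, "/") == 0)
        /* Current task is in the root cpuset: avoid a double slash. */
        n = snprintf(buf, buflen, "%s/%s", mnt, name);
    else
        /* Otherwise: relative to the current task's cpuset. */
        n = snprintf(buf, buflen, "%s%s/%s", mnt, cur, name);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```
]]>

With a mount point of ``/dev/cpuset`` and a current cpuset of ``/batch``,
the name ``job1`` expands to ``/dev/cpuset/batch/job1``, while ``/job1``
expands to ``/dev/cpuset/job1``.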

Cpusets can be renamed using the ``rename(2)`` system call.  The per-cpuset
files within each cpuset directory cannot be renamed, and the ``rename(2)``
system call cannot be used to modify the cpuset hierarchy. You cannot change the
parent cpuset of a cpuset using ``rename(2)``.

Despite its name, the ``pid`` parameter to various ``libcpuset`` routines is actually
a thread id, and each thread in a threaded group can be attached to a different
cpuset.  The value returned from a call to ``gettid(2)`` can be passed in the
argument ``pid``.</t>
<t tx="PaulJackson.20040305164814.5">The advanced cpuset API provides functions usable from 'C' for managing
cpusets on a system-wide basis.

These functions primarily deal with the three entities (1) ``struct cpuset *``,
(2) system cpusets and (3) tasks.

    The ``struct cpuset *`` provides a transient in-memory structure
    used to build up a description of an existing or desired cpuset.
    These structs can be allocated, freed, queried and modified.

    Actual kernel cpusets are created under ``/dev/cpuset``, which is the usual
    mount point of the kernel's virtual cpuset filesystem. These cpusets are
    visible to all tasks (with sufficient authority) in the system, and persist
    until the system is rebooted or until the cpuset is explicitly deleted.
    These cpusets can be created, deleted, queried, modified, listed and
    examined.

    Every task (also known as a process) is bound to exactly one
    cpuset at a time. You can list which tasks are bound to a given
    cpuset, and to which cpuset a given task is bound. You can change
    to which cpuset a task is bound.

The primary attributes of a cpuset are its lists of CPUs and memory
nodes. The scheduling affinity for each task, whether set by default
or explicitly by the ``sched_setaffinity()`` system call, is constrained
to those CPUs that are available in that task's cpuset. The NUMA memory
placement for each task, whether set by default or explicitly by the
``mbind()`` system call, is constrained to those memory nodes that are
available in that task's cpuset.  This is the essential purpose of
cpusets: to constrain the CPU and memory usage of tasks to specified
subsets of the system.

The other essential attribute of a cpuset is its pathname beneath
``/dev/cpuset``. All tasks bound to the same cpuset pathname can be
managed as a unit, and this hierarchical name space describes the
nested resource management and hierarchical permission space supported
by cpusets. Also, this hierarchy is used to enforce strict exclusion,
using the following rules:

  * A cpuset may only be marked strictly exclusive for CPU or memory
    if its parent is also.
  * A cpuset may not make any CPUs or memory nodes available that are
    not also available in its parent.
  * If a cpuset is exclusive for CPU or memory, then it may not overlap
    CPUs or memory with any of its siblings.

The combination of these rules enables checking for strict exclusion just by
making various checks on the parent, siblings and existing child cpusets of the
cpuset being changed, without having to check all cpusets in the system.
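
The three rules above can be sketched as a single local check. This is an
illustrative model, not kernel or ``libcpuset`` code: the hypothetical
``struct res`` stands for one resource type (CPUs or memory nodes) of a
cpuset, limited to 64 resources, and the check on existing child cpusets is
omitted. The third rule is applied in both directions, on the assumption
that an exclusive sibling equally forbids overlap.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one resource type of a cpuset:
 * a bitmask of up to 64 CPUs (or memory nodes) plus its exclusive flag. */
struct res {
    unsigned long long mask;
    bool exclusive;
};

/* Return true if the proposed setting 'self' satisfies the three rules,
 * looking only at the parent and the siblings. */
bool placement_allowed(const struct res *self, const struct res *parent,
                       const struct res *siblings, size_t nsiblings)
{
    size_t i;

    assert(self && parent);

    /* Rule 1: may be exclusive only if the parent is exclusive too. */
    if (self->exclusive && !parent->exclusive)
        return false;

    /* Rule 2: may not contain anything the parent does not contain. */
    if ((self->mask & ~parent->mask) != 0)
        return false;

    /* Rule 3: no overlap with a sibling if either side is exclusive. */
    for (i = 0; i < nsiblings; i++) {
        if ((self->exclusive || siblings[i].exclusive) &&
            (self->mask & siblings[i].mask) != 0)
            return false;
    }
    return true;
}
```
]]>

Under a parent that is exclusive for CPUs 0-7 and has an exclusive sibling
on CPUs 0-3, a new exclusive cpuset on CPUs 4-7 is accepted, while one that
includes CPU 3 is rejected.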

On error, some of these routines return -1 or NULL and set ``errno``. If one of the
routines below that requires cpuset kernel support is invoked on an operating
system kernel that does not support cpusets, then that routine returns failure
and ``errno`` is set to ``ENOSYS``. If invoked on a system that supports
cpusets, but when the cpuset file system is not currently mounted at
``/dev/cpuset``, then it returns failure and ``errno`` is set to ``ENODEV``.
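
A caller can use these two ``errno`` values to tell the cases apart. The
helper below is a hypothetical sketch, not a ``libcpuset`` function; its
name and message strings are assumptions.

<![CDATA[
```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: describe why a libcpuset routine failed, given the
 * errno value saved right after the failing call. */
const char *cpuset_failure_hint(int err)
{
    switch (err) {
    case ENOSYS:
        return "kernel does not support cpusets";
    case ENODEV:
        return "cpuset file system is not mounted at /dev/cpuset";
    default:
        /* Any other error: fall back to the standard description. */
        return strerror(err);
    }
}
```
]]>

It would typically be called with the value of ``errno`` saved immediately
after a routine such as ``cpuset_create()`` reports failure.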

The following inclusion and linkage provides access to the cpuset API
from 'C' code:

  ::

   #include &lt;bitmask.h&gt;
   #include &lt;cpuset.h&gt;
   /* link with -lcpuset */

The following functions are supported in the advanced cpuset 'C' API:

- Cpuset library (libcpuset) version

  * cpuset_version_ - **[optional]** Version (simple integer) of the library

- Allocate and free ``struct cpuset *``

  * cpuset_alloc_ - Return handle to newly allocated ``struct cpuset *``
  * cpuset_free_ - Discard no longer needed ``struct cpuset *``

- Lengths of CPUs and memory nodes bitmasks - needed to allocate bitmasks

  * cpuset_cpus_nbits_ - Number of bits needed for a CPU bitmask on current system
  * cpuset_mems_nbits_ - Number of bits needed for a memory bitmask on current system

- Set various attributes of a ``struct cpuset *``

  * cpuset_setcpus_ - Specify CPUs in cpuset
  * cpuset_setmems_ - Specify memory nodes in cpuset
  * cpuset_set_iopt_ - Specify an integer value option of cpuset
  * cpuset_set_sopt_ - **[optional]** Specify a string value option of cpuset

- Query various attributes of a ``struct cpuset *``

  * cpuset_getcpus_ - Query CPUs in cpuset
  * cpuset_getmems_ - Query memory nodes in cpuset
  * cpuset_cpus_weight_ - Number of CPUs in a cpuset
  * cpuset_mems_weight_ - Number of memory nodes in a cpuset
  * cpuset_get_iopt_ - Query an integer value option of cpuset
  * cpuset_get_sopt_ - **[optional]** Query a string value option of cpuset

- Local CPUs and memory nodes

  * cpuset_localcpus_ - Query the CPUs local to specified memory nodes
  * cpuset_localmems_ - Query the memory nodes local to specified CPUs
  * cpuset_cpumemdist_ - **[optional]** Hardware distance from CPU to memory node
  * cpuset_cpu2node_ - Return system number of memory node closest to specified CPU.
  * cpuset_addr2node_ - Return system number of memory node holding page at specified address.

- Create, delete, query, modify, list and examine cpusets.

  * cpuset_create_ - Create named cpuset as specified by struct cpuset *
  * cpuset_delete_ - Delete the specified cpuset (if empty)
  * cpuset_query_ - Set struct cpuset to settings of specified cpuset
  * cpuset_modify_ - Modify a cpuset's settings to those specified in a struct cpuset
  * cpuset_getcpusetpath_ - Get path of a task's (0 for current) cpuset.
  * cpuset_cpusetofpid_ - Set struct cpuset to settings of cpuset of specified task
  * cpuset_mountpoint_ - Return path at which cpuset filesystem is mounted
  * cpuset_collides_exclusive_ - **[optional]** True if would collide exclusive
  * cpuset_nuke_ - **[optional]** Remove cpuset any way possible

- List tasks currently attached to a cpuset

  * cpuset_init_pidlist_ - Initialize a list of tasks attached to a cpuset
  * cpuset_pidlist_length_ - Return the number of tasks in such a list
  * cpuset_get_pidlist_ - Return a specific task from such a list
  * cpuset_freepidlist_ - Deallocate such a list

- Attach tasks to cpusets.

  * cpuset_move_ - Move task (0 for current) to a cpuset
  * cpuset_move_all_ - Move all tasks in a list of pids to a cpuset
  * cpuset_move_cpuset_tasks_ - **[optional]** Move all tasks in a cpuset to another cpuset
  * cpuset_migrate_ - **[optional]** Move a task and its memory to a cpuset
  * cpuset_migrate_all_ - **[optional]** Move all tasks with memory in a list of pids to a cpuset
  * cpuset_reattach_ - Rebind ``cpus_allowed`` of each task in a cpuset after changing its cpus

- Determine memory pressure

  * cpuset_open_memory_pressure_ - **[optional]** Open handle to read memory_pressure
  * cpuset_read_memory_pressure_ - **[optional]** Read cpuset current memory_pressure
  * cpuset_close_memory_pressure_ - **[optional]** Close handle to read memory pressure

- Map between cpuset relative and system-wide CPU and memory node numbers

  * cpuset_c_rel_to_sys_cpu_ - Map cpuset relative CPU number to system wide number
  * cpuset_c_sys_to_rel_cpu_ - Map system wide CPU number to cpuset relative number
  * cpuset_c_rel_to_sys_mem_ - Map cpuset relative memory node number to system wide number
  * cpuset_c_sys_to_rel_mem_ - Map system wide memory node number to cpuset relative number
  * cpuset_p_rel_to_sys_cpu_ - Map task cpuset relative CPU number to system wide number
  * cpuset_p_sys_to_rel_cpu_ - Map system wide CPU number to task cpuset relative number
  * cpuset_p_rel_to_sys_mem_ - Map task cpuset relative memory node number to system wide number
  * cpuset_p_sys_to_rel_mem_ - Map system wide memory node number to task cpuset relative number

- Placement operations - for detecting cpuset migration

  * cpuset_get_placement_ - **[optional]** Return current placement of task pid
  * cpuset_equal_placement_ - **[optional]** True if two placements equal
  * cpuset_free_placement_ - **[optional]** Free placement

- Traverse a cpuset hierarchy.

  * cpuset_fts_open_ - **[optional]** Open cpuset hierarchy
  * cpuset_fts_read_ - **[optional]** Next entry in hierarchy
  * cpuset_fts_reverse_ - **[optional]** Reverse order of cpusets
  * cpuset_fts_rewind_ - **[optional]** Rewind to first cpuset in list
  * cpuset_fts_get_path_ - **[optional]** Get entry's cpuset path
  * cpuset_fts_get_stat_ - **[optional]** Get entry's stat(2) pointer
  * cpuset_fts_get_cpuset_ - **[optional]** Get entry's cpuset pointer
  * cpuset_fts_get_errno_ - **[optional]** Get entry's errno
  * cpuset_fts_get_info_ - **[optional]** Get operation causing error
  * cpuset_fts_close_ - **[optional]** Close cpuset hierarchy

- Bind to a CPU or memory node within the current cpuset

  * cpuset_cpubind_ - Bind to a single CPU within a cpuset (uses sched_setaffinity(2))
  * cpuset_latestcpu_ - Most recent CPU on which a task has executed
  * cpuset_membind_ - Bind to a single memory node within a cpuset (uses set_mempolicy(2))

- Export cpuset settings to a regular file, and import them from a regular file

  * cpuset_export_ - Export cpuset settings to a text file
  * cpuset_import_ - Import cpuset settings from a text file

- Support calls to **[optional]** cpuset_* API routines

  * cpuset_function_ - Return pointer to a libcpuset.so function, or ``NULL``


A typical calling sequence would use the above functions in the following
order to create a new cpuset named "xyz" and attach itself to it.

 ::

    struct cpuset *cp = cpuset_alloc();
    /* ... various cpuset_set*(cp, ...) calls ... */
    cpuset_create("xyz", cp);
    cpuset_free(cp);
    cpuset_move(0, "xyz");

Some functions above are marked **[optional]**.  See the `Extensible API` section, above,
for an explanation of this marking and how to invoke such functions in a portable manner.
</t>
<t tx="PaulJackson.20040305164814.6">``struct cpuset *cpuset_alloc();``
----------------------------------

    Creates, initializes and returns a handle to a struct cpuset, which is an
    opaque data structure used to describe a cpuset.

    After obtaining a struct cpuset handle with this call, one can use the
    various ``cpuset_set*()`` methods to specify which CPUs and memory nodes are
    in the cpuset and other attributes. Then one can create such a cpuset with
    the ``cpuset_create()`` call and free cpuset handles with the ``cpuset_free()``
    call.

    The `cpuset_alloc`_ function returns a zero pointer (NULL) and sets
    `errno` in the event that ``malloc(3)`` fails. See the ``malloc(3)``
    man page for possible values of `errno` (``ENOMEM`` being the most
    likely).
</t>
<t tx="PaulJackson.20040305164814.7">``void cpuset_free(struct cpuset *cp);``
----------------------------------------

    Frees the memory associated with a struct cpuset handle, which must have
    been returned by a previous ``cpuset_alloc()`` call.  If cp is NULL, no
    operation is performed.
    </t>
<t tx="PaulJackson.20040305164814.11">``int cpuset_setcpus(struct cpuset *cp, const struct bitmask *cpus);``
-----------------------------------------------------------------------

    Given a bitmask of CPUs, the cpuset_setcpus() call sets the specified cpuset
    ``cp`` to include exactly those CPUs.

    Returns 0 on success, else -1 on error, setting ``errno``. This routine can
    fail if malloc(3) fails.  See the ``malloc(3)`` man page for possible values
    of `errno` (``ENOMEM`` being the most likely).

	</t>
<t tx="PaulJackson.20040305164814.12">``int cpuset_setmems(struct cpuset *cp, const struct bitmask *mems);``
-----------------------------------------------------------------------

    Given a bitmask of memory nodes, the ``cpuset_setmems()`` call sets the
    specified cpuset ``cp`` to include exactly those memory nodes.

    Returns 0 on success, else -1 on error, setting ``errno``. This routine can
    fail if malloc(3) fails.  See the ``malloc(3)`` man page for possible values
    of `errno` (``ENOMEM`` being the most likely).
</t>
<t tx="PaulJackson.20040305164814.14">``int cpuset_set_iopt(struct cpuset *cp, const char *optionname, int value);``
------------------------------------------------------------------------------

    Sets cpuset integer valued option ``optionname`` to specified integer value.
    Returns 0 if ``optionname`` is recognized and value is an allowed value for
    that option. Returns -1 if ``optionname`` is recognized, but value is not
    allowed. Returns -2 if ``optionname`` is not recognized.  Boolean options
    accept any non-zero value as equivalent to a value of one (1).
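
    A caller might handle the three documented return values as sketched
    below. The names ``iopt_status``, ``classify_iopt_result()`` and
    ``report_iopt_result()`` are hypothetical helpers, not part of libcpuset,
    and the message wording is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Outcome of a cpuset_set_iopt() call, per the return values documented above. */
enum iopt_status {
    IOPT_OK,            /*  0: option recognized, value accepted */
    IOPT_BAD_VALUE,     /* -1: option recognized, value not allowed */
    IOPT_UNKNOWN_NAME,  /* -2: option name not recognized */
    IOPT_UNEXPECTED     /* anything else: not documented above */
};

/* Hypothetical helper: map a cpuset_set_iopt() return value to a status. */
enum iopt_status classify_iopt_result(int rc)
{
    switch (rc) {
    case 0:  return IOPT_OK;
    case -1: return IOPT_BAD_VALUE;
    case -2: return IOPT_UNKNOWN_NAME;
    default: return IOPT_UNEXPECTED;
    }
}

/*
 * Hypothetical helper: print a diagnostic for a failed call to the given
 * stream and return the status so the caller can branch on it.
 */
enum iopt_status report_iopt_result(FILE *out, const char *optionname, int rc)
{
    enum iopt_status st = classify_iopt_result(rc);

    assert(out != NULL && optionname != NULL);

    switch (st) {
    case IOPT_OK:
        break;
    case IOPT_BAD_VALUE:
        fprintf(out, "%s: value not allowed\n", optionname);
        break;
    case IOPT_UNKNOWN_NAME:
        fprintf(out, "%s: option not recognized\n", optionname);
        break;
    case IOPT_UNEXPECTED:
        fprintf(out, "%s: unexpected return value %d\n", optionname, rc);
        break;
    }
    return st;
}
```

    With the library linked in, a call could look like
    ``report_iopt_result(stderr, "cpu_exclusive", cpuset_set_iopt(cp, "cpu_exclusive", 1))``.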

    The following ``optionname``'s are recognized:

    * ``cpu_exclusive`` - sibling cpusets not allowed to overlap cpus
      (see section `Exclusive Cpusets`_, above)
    * ``mem_exclusive`` - sibling cpusets not allowed to overlap mems
      (see section `Exclusive Cpusets`_, above)
    * ``notify_on_release`` - invoke ``/sbin/cpuset_release_agent`` when
      cpuset released (see section `Notify On Release`_, above)
    * ``memory_migrate`` - causes memory pages to migrate to new mems
      (see section `Memory Migration`_, above)
    * ``memory_spread_page`` - causes kernel buffer (page) cache to
      spread over cpuset (see section `Memory Spread`_, above)
    * ``memory_spread_slab`` - causes kernel file i/o data (directory and
      inode slab caches) to spread over cpuset (see section `Memory Spread`_, above)
</t>
<t tx="PaulJackson.20040305164814.17">``int cpuset_getcpus(const struct cpuset *cp, struct bitmask *cpus);``
-----------------------------------------------------------------------

    Query CPUs in cpuset ``cp``, by writing them to the bitmask ``cpus``.
    Pass ``cp`` == NULL to query the current task's cpuset.

    If the CPUs have not been set in cpuset ``cp``, then no operation
    is performed, -1 is returned, and ``errno`` is set to ``EINVAL``.

    Returns 0 on success, else -1 on error, setting ``errno``. This routine can
    fail if malloc(3) fails.  See the ``malloc(3)`` man page for possible values
    of `errno` (``ENOMEM`` being the most likely).
</t>
<t tx="PaulJackson.20040305164814.18">``int cpuset_getmems(const struct cpuset *cp, struct bitmask *mems);``
-----------------------------------------------------------------------

    Query memory nodes in cpuset ``cp``, by writing them to the bitmask ``mems``.
    Pass ``cp`` == NULL to query the current task's cpuset.

    If the memory nodes have not been set in cpuset ``cp``, then no operation
    is performed, -1 is returned, and ``errno`` is set to ``EINVAL``.

    Returns 0 on success, else -1 on error, setting ``errno``. This routine can
    fail if malloc(3) fails.  See the ``malloc(3)`` man page for possible values
    of `errno` (``ENOMEM`` being the most likely).
    </t>
<t tx="PaulJackson.20040305164814.20">``int cpuset_get_iopt(const struct cpuset *cp, const char *optionname);``
-------------------------------------------------------------------------

    Query value of integer option ``optionname`` in cpuset ``cp``. Returns value of
    ``optionname`` if it is recognized, else returns -1.  Integer values in an uninitialized
    cpuset have value 0.

    The following ``optionname``'s are recognized:

    * ``cpu_exclusive`` - sibling cpusets not allowed to overlap cpus
      (see section `Exclusive Cpusets`_, above)
    * ``mem_exclusive`` - sibling cpusets not allowed to overlap mems
      (see section `Exclusive Cpusets`_, above)
    * ``notify_on_release`` - invoke ``/sbin/cpuset_release_agent`` when
      cpuset released (see section `Notify On Release`_, above)
    * ``memory_migrate`` - causes memory pages to migrate to new mems
      (see section `Memory Migration`_, above)
    * ``memory_spread_page`` - causes kernel buffer (page) cache to
      spread over cpuset (see section `Memory Spread`_, above)
    * ``memory_spread_slab`` - causes kernel file i/o data (directory and
      inode slab caches) to spread over cpuset (see section `Memory Spread`_, above)
 </t>
<t tx="PaulJackson.20040305164814.22">``int cpuset_query(struct cpuset *cp, const char *cpusetpath);``
----------------------------------------------------------------

    Set struct cpuset to settings of cpuset at specified path ``cpusetpath``.
    The ``struct cpuset *cp`` must have been returned by a previous ``cpuset_alloc()``
    call. Any previous settings of ``cp`` are lost.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, or -1 on error, setting ``errno``. Errors include
    ``cpusetpath`` not referencing a valid cpuset path relative to
    ``/dev/cpuset``, or the current task lacking permission to query that cpuset.
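
    The path rule above can be sketched in C as follows. This is only an
    illustration, not the library's implementation: ``resolve_cpusetpath()``
    and its ``current`` argument (the current task's cpuset path, relative to
    the mount point) are hypothetical names, and ``/dev/cpuset`` is assumed to
    be the mount point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CPUSET_MOUNT "/dev/cpuset"   /* mount point named in this document */

/*
 * Hypothetical helper (not part of libcpuset): build the file system path
 * that a cpusetpath argument refers to.
 *
 *   cpusetpath  - path as passed to cpuset_query(), cpuset_create(), ...
 *   current     - cpuset path of the current task, relative to the mount
 *                 point and starting with '/' (for example "/batch")
 *   buf, buflen - output buffer
 *
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int resolve_cpusetpath(const char *cpusetpath, const char *current,
                       char *buf, size_t buflen)
{
    int n;

    assert(cpusetpath != NULL && current != NULL && buf != NULL);

    if (cpusetpath[0] == '/') {
        /* Leading slash: relative to the cpuset mount point. */
        n = snprintf(buf, buflen, "%s%s", CPUSET_MOUNT, cpusetpath);
    } else if (strcmp(current, "/") == 0) {
        /* Current task is in the root cpuset; avoid a double slash. */
        n = snprintf(buf, buflen, "%s/%s", CPUSET_MOUNT, cpusetpath);
    } else {
        /* No leading slash: relative to the current task's cpuset. */
        n = snprintf(buf, buflen, "%s%s/%s", CPUSET_MOUNT, current, cpusetpath);
    }
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

    For a task running in cpuset ``/batch``, this sketch maps ``xyz`` to
    ``/dev/cpuset/batch/xyz`` and ``/xyz`` to ``/dev/cpuset/xyz``.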
    </t>
<t tx="PaulJackson.20040305164814.23">``int cpuset_create(const char *cpusetpath, const struct cpuset *cp);``
------------------------------------------------------------------------

    Create a cpuset at the specified ``cpusetpath``, as described in the provided
    ``struct cpuset *cp``. The parent cpuset of that pathname must already exist.

    The parameter ``cp`` refers to a handle obtained from a ``cpuset_alloc()``
    call.  If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``. This routine can
    fail if malloc(3) fails.  See the ``malloc(3)`` man page for possible values
    of `errno` (``ENOMEM`` being the most likely).

  </t>
<t tx="PaulJackson.20040305164814.28">``int cpuset_move(pid_t p, const char *cpusetpath);``
-----------------------------------------------------

    Move task whose process id is ``p`` to cpuset ``cpusetpath``.
    If ``p`` is zero, then the current task is moved.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.

</t>
<t tx="PaulJackson.20040305164814.32">``int cpuset_move_all(struct cpuset_pidlist *pl, const char *cpusetpath);``
----------------------------------------------------------------------------

    Move all tasks in list ``pl`` to cpuset ``cpusetpath``.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.

</t>
<t tx="PaulJackson.20040305191413">The essential purpose of cpusets is to provide CPU and memory containers or
"soft partitions" within which to run sets of related tasks.

On an SMP (multiple CPU) system without some means of CPU placement, any task
can run on any CPU. On a NUMA (multiple memory node) system, any memory page can
be allocated on any node. This can cause both poor cache locality and poor
memory access times, substantially reducing performance and run-time
repeatability. By restraining all other jobs from using any of the CPUs or
memory nodes assigned to critical jobs, interference with critical jobs can be
minimized.

For example, some multi-threaded high performance computing (HPC) jobs consist
of a number of threads that communicate via message passing interfaces (MPI) and
other such jobs rely on multi-platform shared-memory parallel programming
(OpenMP) that can tightly couple parallel computation threads using special
language directives. It is common that threads in such jobs need to be executing
at the same time to make optimum progress. In such cases, if a single thread
loses a CPU, all threads stop making forward progress and spin at a barrier.
Cpusets can eliminate the need for a gang scheduler, provide isolation of
one such job from other tasks on a system, and facilitate providing equal
resources to each thread in such a job.  This results in both optimum and
repeatable performance.

This document focuses on the 'C' API provided by the user level **libcpuset**
library. This library depends on the following Linux 2.6 kernel facilities:

   * sched_setaffinity (for CPU binding),
   * mbind and set_mempolicy (for memory node binding), and
   * kernel cpusets support.

The sched_setaffinity, mbind and set_mempolicy calls enable specifying
the CPU and memory placement for individual tasks.  On smaller or limited
use systems, these calls may be sufficient.

The kernel cpuset facility provides additional support for system wide
management of CPU and memory resources, by related sets of tasks.
It provides a hierarchical structure to the resources, with file system
like name-space and permissions, and support for guaranteed exclusive use
of resources.

The Linux kernel provides the following support for cpusets:

  * Each task has a link to a cpuset structure that specifies the
    CPUs and memory nodes available for its use.
  * A hook in the sched_setaffinity and mbind system calls ensures that
    any requested CPU or memory node is available in that task's cpuset.
  * Tasks sharing the same placement constraints reference the same cpuset.
  * These kernel cpusets are arranged in a hierarchical virtual file system,
    reflecting the possible nesting of "soft partitions".
  * The kernel task scheduler is constrained to only schedule a task on
    the CPUs in that task's cpuset.
  * The kernel memory allocator is constrained to only allocate physical
    memory to a task from the memory nodes in that task's cpuset.
  * The kernel memory allocator provides an economical per-cpuset metric
    of the aggregate memory pressure (frequency of requests for a free
    memory page not easily satisfied by an available free page) of the tasks
    in a cpuset (see the per-cpuset 'memory_pressure' file.)
  * The kernel memory allocator provides the option to request that memory
    pages used for file I/O (the kernel page cache) and associated kernel
    data structures for file inodes and directories be evenly spread across
    all the memory nodes in a cpuset, rather than preferentially allocated on
    the memory node where the task that first accessed the page was running
    (see the per-cpuset 'memory_spread_page' and 'memory_spread_slab' files).
  * The memory migration facility in the kernel can be controlled using
    per-cpuset files, so that when the memory nodes allowed to a task by
    cpusets change, any pages it had on no longer allowed nodes are migrated
    to nodes now allowed.

Cpusets constrain the jobs (sets of related tasks) running in them to a subset of
the system's memory and CPUs. They enable administrators and system service
software to:

  * Create and delete named cpusets.
  * Decide which CPUs and memory nodes are available to a cpuset.
  * Attach a task to a particular cpuset.
  * Identify all tasks sharing the same cpuset.
  * Exclude any other cpuset from overlapping a given cpuset,
    giving the tasks running in that cpuset exclusive use
    of those CPUs and memory nodes.
  * Perform bulk operations on all tasks associated with a cpuset,
    such as varying the resources available to that cpuset, or
    hibernating those tasks in temporary favor of some other job.
  * Perform sub-partitioning with hierarchical permissions and
    resource management.

Cpusets are exposed by the kernel to user space by mounting the ``cpuset``
virtual file system (VFS) at ``/dev/cpuset``, rather than by additional
system calls. Such a VFS is a natural way to represent nested resource
allocations and the associated hierarchical permission model.

Within a single cpuset, other facilities such as ``dplace``,
first-touch memory placement, pthreads, ``sched_setaffinity`` and ``mbind``
can be used to manage processor and memory placement to a more fine-grained level.

There is a single set of kernel mechanisms that supports all these
facilities and provides a consistent processor and memory model
regardless of what mix of utilities and APIs you use to manage it.
This provides a consistent execution model for all users.
</t>
<t tx="PaulJackson.20040312144603">.. raw:: LaTeX

    \pagenumbering{roman}\large\parindent0pt\parskip10pt

=============================================================
Cpuset Library and Linux Kernel Support
=============================================================


This document describes the 'C' library **libcpuset** interface to Linux cpusets.

Cpusets provide system-wide control of the CPUs on which tasks may execute, and
the memory nodes on which they allocate memory. Each cpuset defines a list of
allowed CPUs and memory nodes, and each process in the system is attached to a
cpuset. Cpusets are represented in a hierarchical virtual file system. Cpusets
can be nested and they have file-like permissions.

The efficient administration of large multi-processor systems depends on
dynamically allocating portions of the system's CPU and memory resources to
different users and purposes. The optimum performance of NUMA systems depends
on optimizing CPU and memory placement of critical applications, and minimizing
interference between applications. Cpusets provide a convenient means to
control such CPU and memory placement and usage.

:Author:    Paul Jackson
:Address:   pj@sgi.com
:Date:      14 November 2006
:Copyright: Copyright (c) 2006-2007 SGI.  All Rights Reserved.

.. raw:: LaTeX

  \newpage\small \vspace*{1cm}

Permission is hereby granted, free of charge, to any person obtaining a copy of
this Documentation, to deal in the Documentation without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Documentation, and to permit persons to
whom the Documentation is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Documentation.

THE DOCUMENTATION IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL SILICON
GRAPHICS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE DOCUMENTATION OR THE USE OR OTHER DEALINGS IN THE DOCUMENTATION.

Except as contained in this notice, the names of Silicon Graphics and SGI shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in this Documentation without prior written authorization from SGI.

----

This document is written using the outline processor Leo_, and version controlled
using CSSC_. It is rendered using `Python Docutils`_ on reStructuredText_
extracted from Leo_, directly into both html_ and LaTeX_.  The LaTeX_ is
converted into pdf_ using the pdflatex_ utility.  The html_ is converted into
plain text using the lynx_ utility.

.. _Leo:  http://webpages.charter.net/edreamleo/front.html
.. _CSSC: http://cssc.sourceforge.net/index.shtml
.. _Python Docutils: http://docutils.sourceforge.net
.. _reStructuredText: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html
.. _pdf: http://www.adobe.com/products/acrobat
.. _html: http://www.w3.org/MarkUp
.. _pdflatex: http://www.tug.org/applications/pdftex
.. _LaTeX: http://www.latex-project.org
.. _lynx: http://lynx.isc.org

----

Silicon Graphics and SGI are registered trademarks of Silicon Graphics, Inc., in
the United States and other countries worldwide. Linux is a registered
trademark of Linus Torvalds in several countries. Novell is a registered
trademark, and SUSE is a trademark of Novell, Inc. in the United States and
other countries. All other trademarks mentioned herein are the property of their
respective owners.

.. raw:: LaTeX

   \newpage\large\vspace*{1cm}

.. contents:: Table of Contents
   :depth: 1


.. sectnum::
   :depth: 2

.. raw:: LaTeX

   \newpage \pagenumbering{arabic} \normalsize</t>
<t tx="PaulJackson.20040312213402.4">``int cpuset_delete(const char *cpusetpath);``
----------------------------------------------

    Delete a cpuset at the specified ``cpusetpath``. The cpuset of that pathname must
    already exist, be empty (no child cpusets) and be unused (no using tasks).

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.
    </t>
<t tx="PaulJackson.20040312213402.5">``int cpuset_modify(const char *cpusetpath, const struct cpuset *cp);``
------------------------------------------------------------------------

    Modify the cpuset at the specified ``cpusetpath``, as described in the provided
    ``struct cpuset *cp``. The cpuset at that pathname must already exist.

    The parameter ``cp`` refers to a handle obtained from a ``cpuset_alloc()`` call.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.
    </t>
<t tx="PaulJackson.20040312213402.7">``void *cpuset_function(const char *function_name);``
------------------------------------------------------

    Return pointer to the named ``libcpuset.so`` function, or ``NULL``. For base
    functions that are in all implementations of libcpuset, there is no
    particular value in using ``cpuset_function()`` to obtain a pointer to the
    function dynamically. But for **[optional]** cpuset functions, the use of
    ``cpuset_function()`` enables dynamically adapting to run-time environments that
    may or may not support that function.

    For more information, see the `Extensible API` section, above.</t>
<t tx="PaulJackson.20040312234233">``int cpuset_migrate(pid_t pid, const char *cpusetpath);``
----------------------------------------------------------

    Migrate task whose process id is ``pid`` to cpuset ``cpusetpath``,
    moving its currently allocated memory to nodes in that cpuset,
    if not already there.  If ``pid`` is zero, then the current task
    is migrated.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.

    For more information, see the `Memory Migration`_ section, above.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20040315131148">As of this writing, the Linux kernel NUMA support presumes that there
are some memory nodes and some CPUs, and that for each CPU, there is
exactly one preferred or local memory node. Frequently, multiple (two
or four perhaps) CPUs will be local to the same memory node. Some
memory nodes have no local CPUs - these are called headless nodes.

However, this is not the only possible architecture, and architectures
are constantly changing, usually toward the more complex. Driven by
nonstop increases in logic density for a half century now, the bus,
cache, CPU, memory and storage hierarchy of large systems continues to
evolve.

This cpuset interface should remain stable over a long period of time,
and be usable over a variety of system architectures. So this
interface *avoids* presuming that there is exactly one memory node
local to each CPU. Rather it just presumes that there are some
CPUs and memory nodes, that some zero or more memory nodes are local
to each CPU, and that some zero or more CPUs are local to each memory
node.

This interface provides the functions cpuset_localmems_ to identify, for a CPU,
which memory node(s) are local to that CPU, and cpuset_localcpus_ to
identify for a memory node, which CPU(s) are local. Applications using this
interface can explicitly specify just one of CPU or memory placement, and
then use these functions to determine the corresponding memory or CPU placement.

In some situations, applications will want to explicitly place both
CPUs and memory nodes, not necessarily according to the default relation
between a CPU and its local memory node.  Such situations include
the following.

*	You can control specific CPU and memory node placement when precise
	placement control is required, such as when optimizing performance
	for a specific important application.
	
*	A higher level system service such as a batch scheduler can control
	specific CPU and memory node placement.
	
*	On systems having some memory on headless nodes (memory nodes with no
	associated local CPU), you can explicitly use a particular headless
	node with a particular CPU.
	
*	On systems supporting Hyper-Threading, one could effectively disable
	Hyper-Threading, by using just one out of the two or more execution
	engines (CPUs) available on a processor die.
</t>
<t tx="PaulJackson.20040315151432">In order to provide for the convenient and robust extensibility of
this cpuset API over time, the following function enables dynamically
obtaining pointers for optional functions by name, at run-time:

  void \*cpuset_function(const char \* function_name) - returns function pointer,
  or NULL if function name unrecognized

For maximum portability, you should not reference any optional cpuset function
by explicit name.

However, if you are willing to presume that an optional function will
always be available on the target systems of interest, you might decide to
explicitly reference it by name, in order to improve the clarity and simplicity
of the software in question.

Also to support robust extensibility, flags and integer option values have names
dynamically resolved at run-time, not via preprocessor macros.

All functions use only the primitive types of int, char \*, pid_t (for process id
of a task), size_t (for buffer sizes), pointers to opaque structures, functions
whose signatures use these types, and pointers to such functions. They use no
structure members, special types or magic define constants.

Some functions in `Advanced Cpuset Library Functions`_ are marked **[optional]**. They
are not available in all implementations of libcpuset. Additional **[optional]**
cpuset_* functions may also be added in the future. Functions that
are *not* marked **[optional]** are available on all implementations of
libcpuset.so, and can be called directly without using cpuset_function().
However, any of them *can* also be called indirectly via cpuset_function().

To safely invoke an optional function, such as for example ``cpuset_migrate()``,
use the following call sequence::


	/* fp points to a function of the type of cpuset_migrate() */
	int (*fp)(pid_t pid, const char *cpusetpath);
	fp = cpuset_function("cpuset_migrate");
	if (fp) {
		fp( ... );
	} else {
		puts("cpuset migration not supported");
	}

If you invoke an **[optional]** function directly, then your resulting program
will not be able to link with any version of libcpuset.so that does not define that
particular function.
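
The lookup pattern above can also be wrapped so that callers see an ordinary
error return when the optional function is missing. The following is a
sketch only: ``migrate_if_supported()``, the two typedefs and the use of
``ENOSYS`` are assumptions made for illustration, and the ``lookup_*`` and
``fake_migrate`` functions are stand-ins so that the example runs without
libcpuset installed.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Signature of cpuset_function(), as given above. */
typedef void *(*lookup_fn)(const char *function_name);

/* Signature of cpuset_migrate(), as documented in its own section. */
typedef int (*migrate_fn)(pid_t pid, const char *cpusetpath);

/*
 * Hypothetical wrapper (not part of libcpuset): call cpuset_migrate() if
 * the lookup function can find it, else fail with errno set to ENOSYS.
 */
int migrate_if_supported(lookup_fn lookup, pid_t pid, const char *cpusetpath)
{
    migrate_fn fp;

    assert(lookup != NULL && cpusetpath != NULL);

    /* Object-to-function pointer conversion: the same idiom as dlsym(3). */
    fp = (migrate_fn)lookup("cpuset_migrate");
    if (fp == NULL) {
        errno = ENOSYS;
        return -1;
    }
    return fp(pid, cpusetpath);
}

/* ---- Stand-ins so the sketch runs without libcpuset installed ---- */

/* Pretends that only the cpuset "/xyz" exists. */
static int fake_migrate(pid_t pid, const char *cpusetpath)
{
    (void)pid;
    return strcmp(cpusetpath, "/xyz") == 0 ? 0 : -1;
}

/* Behaves like a libcpuset.so that provides cpuset_migrate(). */
void *lookup_with_migrate(const char *function_name)
{
    if (strcmp(function_name, "cpuset_migrate") == 0)
        return (void *)fake_migrate;
    return NULL;
}

/* Behaves like a libcpuset.so that lacks every optional function. */
void *lookup_without_migrate(const char *function_name)
{
    (void)function_name;
    return NULL;
}
```

With the real library, the first argument would be ``cpuset_function``
itself, for example ``migrate_if_supported(cpuset_function, 0, "/xyz")``.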
</t>
<t tx="PaulJackson.20040315173325.3">``struct cpuset_pidlist *cpuset_init_pidlist(const char *cpusetpath, int recursiveflag);``
------------------------------------------------------------------------------------------

    Initialize and return a list of tasks (PIDs) attached to cpuset ``cpusetpath``.
    If ``recursiveflag`` is zero, include only the tasks directly in that cpuset,
    otherwise include all tasks in that cpuset or any descendant thereof.

    Beware that tasks may come and go from a cpuset, after this call is made.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    On error, return ``NULL`` and set ``errno``.
    </t>
<t tx="PaulJackson.20040315173325.4">``pid_t cpuset_get_pidlist(const struct cpuset_pidlist *pl, int i);``
---------------------------------------------------------------------

    Return the i'th element of a ``cpuset_pidlist``. The elements of a
    ``cpuset_pidlist`` of length ``N`` are numbered ``0`` through ``N-1``.
    Return ``(pid_t)-1`` for any other index ``i``.
</t>
<t tx="PaulJackson.20040315173325.5">``void cpuset_freepidlist(struct cpuset_pidlist *pl);``
-------------------------------------------------------

    Deallocate a list of attached pids.
</t>
<t tx="PaulJackson.20040316171901">``int cpuset_localcpus(const struct bitmask *mems, struct bitmask *cpus);``
----------------------------------------------------------------------------

  Query the CPUs local to specified memory nodes ``mems``, by writing them to
  the bitmask ``cpus``.  Return 0 on success, -1 on error, setting ``errno``.
</t>
<t tx="PaulJackson.20040316171901.1">``int cpuset_localmems(const struct bitmask *cpus, struct bitmask *mems);``
----------------------------------------------------------------------------

  Query the memory nodes local to specified CPUs ``cpus``, by writing them to
  the bitmask ``mems``.  Return 0 on success, -1 on error, setting ``errno``.
</t>
<t tx="PaulJackson.20040317031228">``int cpuset_migrate_all(struct cpuset_pidlist *pl, const char *cpusetpath);``
-------------------------------------------------------------------------------

    Move all tasks in list ``pl`` to cpuset ``cpusetpath``, moving their
    currently allocated memory to nodes in that cpuset, if not already there.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``; otherwise it is relative to the
    current task's cpuset.

    For more information, see the `Memory Migration`_ section, above.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20040822234350">Several developers in the Open Source community provided the Linux 2.6
kernel support for CPU and memory placement, including the following:

  - Robert Love - ``Tech9, Novell`` (USA) - Scheduler attributes such as CPU affinity
  - Simon Derr - ``Bull`` (France) - CPU placement, core architecture
    and file system interface to cpusets
  - Andi Kleen - ``SUSE`` (Germany) - NUMA memory placement, mbind and mempolicy
  - Paul Jackson - ``SGI`` (USA) - kernel bitmask improvements, kernel cpuset
    integration, **libbitmask**, **libcpuset**

At least a couple of command line utilities have been developed that use these
affinity calls to allow placing a process on a specific CPU. Robert Love has a
package *schedutils* with a command ``taskset``. The ``numactl`` command in Andi
Kleen's work has options to run a specified command on specified CPUs and memory
nodes.

Andi Kleen led a session at the 2003 Kernel Summit in Ottawa on NUMA
memory management, and has been developing a NUMA library and kernel
support for memory placement. Minutes from that session are available
at: http://lwn.net/Articles/40626/.

The cpuset kernel facility and file system for Linux 2.6 kernels is based on the
work of Simon Derr, with integration and refinements by Paul Jackson. In
addition to this documentation of the **libcpuset** user library, the kernel cpuset
facility is documented in the kernel source file Documentation/cpusets.txt, and
Simon Derr has provided a document of cpusets at LINEBREAK
http://www.bullopensource.org/cpuset/.

The user level bitmask library supports convenient manipulation of multi-word
bitmasks useful for CPUs and memory nodes. This bitmask library is required by
and designed to work with the cpuset library. The design notes of **libbitmask**
are available in a separate document, Bitmask_Library.html or
Bitmask_Library.pdf.

Unlike ``sched_setaffinity()`` and ``mbind()``, which are implemented as
additional kernel system calls, the primary kernel interface for accessing the
cpuset facility is `The Cpuset File System`_, usually mounted at ``/dev/cpuset``.
The cpuset library **libcpuset** provides convenient access to these facilities
from 'C' programs.

Cpusets extend the usefulness of the Linux 2.6 kernel mechanisms
``sched_setaffinity()`` for CPU placement, and ``mbind()`` and
``set_mempolicy()`` for memory placement. On smaller or dedicated use systems,
these other mechanisms are often sufficient. The **libcpuset** library provides
a convenient API to these other mechanisms that has the added advantage of
adapting robustly to memory migration.

On larger NUMA systems running more than one performance-critical
job, it is necessary to be able to manage jobs in their entirety.
This includes providing a job with exclusive CPU and memory that no
other job can use, and being able to list all tasks currently in a
cpuset.

You can use both these other placement mechanisms and cpusets together, using
the `Advanced Cpuset Library Functions`_ to manage overall job placement, and using
the other mechanisms, perhaps via the `Basic Cpuset Library Functions`_, within each
given job to manage the details of thread and memory page placement.
</t>
<t tx="PaulJackson.20040823012055">``char *cpuset_getcpusetpath(pid_t pid, char *buf, size_t size);``
------------------------------------------------------------------

    The ``cpuset_getcpusetpath()`` function copies an absolute pathname of the cpuset to
    which task of process id ``pid`` is attached, to the array pointed to by
    ``buf``, which is of length ``size``.  Use pid == 0 for the current process.

    The provided path is relative to the cpuset file system mount point.

    If the cpuset path name would require a buffer longer than ``size`` elements,
    ``NULL`` is returned, and ``errno`` is set to ``ERANGE``. An application should check
    for this error, and allocate a larger buffer if necessary.

    Returns ``NULL`` on failure with ``errno`` set accordingly, and ``buf`` on success.
    The contents of ``buf`` are undefined on error.

    ERRORS

       ``EACCES`` Permission to read or search a component of the file name was denied.

       ``EFAULT`` buf points to a bad address.

       ``ESRCH``  The pid does not exist.

       ``E2BIG``  Larger buffer needed.

       ``ENOSYS`` Kernel does not support cpusets.
       </t>
<t tx="PaulJackson.20040823181111">``int cpuset_set_sopt(struct cpuset *cp, const char *optionname, const char *value);``
--------------------------------------------------------------------------------------

    Sets cpuset string valued option ``optionname`` to specified string value.
    Returns 0 if ``optionname`` is recognized and value is an allowed value for
    that option. Returns -1 if ``optionname`` is recognized, but value is not
    allowed. Returns -2 if ``optionname`` is not recognized.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20040823181313">``const char *cpuset_get_sopt(const struct cpuset *cp, const char *optionname);``
---------------------------------------------------------------------------------

    Query value of string option ``optionname`` in cpuset ``cp``. Returns
    pointer to value of ``optionname`` if it is recognized, else returns ``NULL``.
    String values in an uninitialized cpuset have value ``NULL``.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20040823225452">``int cpuset_pidlist_length(const struct cpuset_pidlist *pl);``
---------------------------------------------------------------

    Return the number of elements (PIDs) in cpuset_pidlist ``pl``.

</t>
<t tx="PaulJackson.20040824000303">There are multiple ways to use cpusets, including:

  * They can be queried and changed from a shell prompt,
    using such command line utilities as ``echo``, ``cat``,
    ``mkdir`` and ``ls``.

  * They can be queried and changed via the **libcpuset**
    'C' programming API.  The primary emphasis of this document
    is on the 'C' API.

This section describes the use of cpusets using shell commands.

One convenient way to learn how cpusets work is to experiment
with them at the shell prompt, before doing extensive 'C'
coding.

Note that there is one significant difference between these two interfaces.

Modifying the CPUs in a cpuset at the shell prompt requires an additional step, due
to intentional limitations in the kernel support for cpusets. The
cpuset_reattach_ routine can be used to perform this step when using
**libcpuset**.  The extra step consists of writing the pid of each task attached
to that cpuset back into the cpuset's ``tasks`` file:

    In order to minimize the impact of cpusets on critical kernel
    code, such as the scheduler, and due to the fact that the kernel
    does not support one task updating the memory placement of another
    task directly, the impact on a task of changing its cpuset CPU
    or memory node placement, or of changing to which cpuset a task
    is attached, is subtle.

    If a cpuset has its memory nodes modified, then for each task attached
    to that cpuset, the next time that the kernel attempts to allocate
    a page of memory for that task, the kernel will notice the change
    in the task's cpuset, and update its per-task memory placement to
    remain within the new cpuset's memory placement.  If the task was using
    mempolicy MPOL_BIND, and the nodes to which it was bound overlap with
    its new cpuset, then the task will continue to use whatever subset
    of MPOL_BIND nodes are still allowed in the new cpuset.  If the task
    was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
    in the new cpuset, then the task will be essentially treated as if it
    was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
    as queried by get_mempolicy(), doesn't change).  If a task is moved
    from one cpuset to another, then the kernel will adjust the task's
    memory placement, as above, the next time that the kernel attempts
    to allocate a page of memory for that task.

    If a cpuset has its CPUs modified, then each task using that cpuset does
    _not_ change its behavior automatically. In order to minimize the impact on
    the critical scheduling code in the kernel, tasks will continue to use their
    prior CPU placement until they are rebound to their cpuset, by rewriting
    their pid to the 'tasks' file of their cpuset. If a task had been bound to
    some subset of its cpuset using the sched_setaffinity() call, the effect of
    this is lost on the rebinding. The rebound task's ``cpus_allowed`` is set to
    include all ``cpus`` in the task's new cpuset. If a task is moved from one
    cpuset to another, its CPU placement is updated in the same way as if the
    task's pid is rewritten to the 'tasks' file of its current cpuset.

    In summary, the memory placement of a task whose cpuset is changed is
    automatically updated by the kernel, on the next allocation of a page for
    that task, but the processor placement is not updated, until that task's pid
    is rewritten to the 'tasks' file of its cpuset. The delay in rebinding a
    task's memory placement is necessary because the kernel does not support one
    task changing another task's memory placement. The added user level step in
    rebinding a task's CPU placement is necessary to avoid impacting the
    scheduler code in the kernel with a check for changes in a task's processor
    placement.

To create a new cpuset and attach the current command shell to it, the steps are:

 1) mkdir /dev/cpuset (if not already done)
 2) mount -t cpuset cpuset /dev/cpuset (if not already done)
 3) Create the new cpuset using ``mkdir(1)``.
 4) Assign CPUs and memory nodes to the new cpuset.
 5) Attach the shell to the new cpuset.

For example, the following sequence of commands will set up a cpuset
named "Charlie", containing just CPUs 2 and 3, and memory node 1,
and then attach the current shell to that cpuset:

 ::

  mkdir /dev/cpuset
  mount -t cpuset cpuset /dev/cpuset
  cd /dev/cpuset
  mkdir Charlie
  cd Charlie
  /bin/echo 2-3 &gt; cpus
  /bin/echo 1 &gt; mems
  /bin/echo $$ &gt; tasks
  # The current shell is now running in cpuset Charlie
  # The next line should display '/Charlie'
  cat /proc/self/cpuset


Migrating a job (the set of tasks attached to a cpuset) to different CPUs and
memory nodes in the system, including moving the memory pages currently
allocated to that job, can be done as follows.  Let's say you want to move the job
in cpuset *alpha* (CPUs 4-7 and memory nodes 2-3) to a new cpuset *beta* (CPUs
16-19 and memory nodes 8-9).

  1) First create the new cpuset *beta*.
  2) Then allow CPUs 16-19 and memory nodes 8-9 in *beta*.
  3) Then enable ``memory_migrate`` in *beta*.
  4) Then move each task from *alpha* to *beta*.

The following sequence of commands accomplishes this::

      cd /dev/cpuset
      mkdir beta
      cd beta
      /bin/echo 16-19 &gt; cpus
      /bin/echo 8-9 &gt; mems
      /bin/echo 1 &gt; memory_migrate
      while read i; do /bin/echo $i; done &lt; ../alpha/tasks &gt; tasks

The above should move any tasks in *alpha* to *beta*, and any memory held by these
tasks on memory nodes 2-3 to memory nodes 8-9, respectively.

Notice that the last step of the above sequence did **not** do::

      cp ../alpha/tasks tasks     # Doesn't work (ignores all but first task)

The **while** loop, rather than the seemingly easier use of the ``cp(1)``
command, is necessary because only one task PID at a time may be written to the
``tasks`` file.

The same effect (writing one pid at a time) as the **while** loop can be
accomplished more efficiently, in fewer keystrokes and in syntax that works in
any shell, but alas more obscurely, by using the **sed -u [unbuffered]** option::

      sed -un p &lt; ../alpha/tasks &gt; tasks
</t>
<t tx="PaulJackson.20040825131747">``int cpuset_cpus_nbits();``
---------------------------------------------------------------

    Return the number of bits in a CPU bitmask on current system.
    Useful when using ``bitmask_alloc()`` to allocate a CPU mask.
    Some other routines below return ``cpuset_cpus_nbits()`` as
    an out-of-bounds indicator.

</t>
<t tx="PaulJackson.20040825131747.1">``int cpuset_mems_nbits();``
---------------------------------------------------------------

    Return the number of bits in a memory node bitmask on current system.
    Useful when using ``bitmask_alloc()`` to allocate a memory node mask.
    Some other routines below return ``cpuset_mems_nbits()`` as
    an out-of-bounds indicator.
</t>
<t tx="PaulJackson.20040901220706">``int cpuset_cpubind(int cpu);``
--------------------------------

  Bind the scheduling of the current task to CPU ``cpu``, using the
  sched_setaffinity(2) system call.

  Fails with a return of -1, and ``errno`` set to ``EINVAL``, if ``cpu`` is not allowed in
  the current task's cpuset.

  The following code will bind the scheduling of a thread to the n-th CPU of the
  current cpuset:

    ::

        /*
         * Confine current task to only run on the n-th CPU
         * of its current cpuset. In a cpuset of N CPUs,
         * valid values for n are 0 .. N-1.
         */
        cpuset_cpubind(cpuset_p_rel_to_sys_cpu(0, n));
</t>
<t tx="PaulJackson.20040901220706.1">``int cpuset_membind(int mem);``
--------------------------------

  Bind the memory allocation of the current task to memory node ``mem``, using
  the set_mempolicy(2) system call with a policy of MPOL_BIND.

  Fails with a return of -1, and ``errno`` set to ``EINVAL``, if ``mem`` is not allowed in
  the current task's cpuset.

  The following code will bind the memory allocation of a thread to the n-th
  memory node of the current cpuset:

    ::

        /*
         * Confine current task to only allocate memory on
         * n-th node of its current cpuset.  In a cpuset of
         * N memory nodes, valid values for n are 0 .. N-1.
         */
        cpuset_membind(cpuset_p_rel_to_sys_mem(0, n));
    </t>
<t tx="PaulJackson.20040923180054">``int cpuset_c_rel_to_sys_cpu(const struct cpuset *cp, int cpu);``
------------------------------------------------------------------

  Return the system-wide CPU number that is used by the ``cpu``-th CPU
  of the specified cpuset ``cp``.  Returns result of ``cpuset_cpus_nbits()``
  if ``cpu`` is not in the range [``0``, ``bitmask_weight(cpuset_getcpus(cp))``).
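
  The relative-to-system mapping can be pictured with a plain array in place
  of a ``struct bitmask``. The sketch below is only an illustration under that
  assumption: the function ``rel_to_sys()`` and its ``inset`` array are
  hypothetical names, not part of **libcpuset**. It mimics the behavior
  described above, including returning the mask size as the out-of-bounds
  indicator.

```c
/*
 * Hypothetical stand-in for a CPU bitmask: inset[i] is nonzero if
 * system-wide CPU i belongs to the cpuset, for i in 0 .. nbits-1.
 *
 * Returns the system-wide number of the rel-th CPU of the cpuset
 * (counting from 0), or nbits if rel is out of range.
 */
int rel_to_sys(const unsigned char *inset, int nbits, int rel)
{
    int sys;
    int seen = 0;   /* cpuset members found so far */

    if (0 > rel)
        return nbits;
    for (sys = 0; nbits > sys; sys++) {
        if (inset[sys]) {
            if (seen == rel)
                return sys;
            seen++;
        }
    }
    return nbits;   /* the cpuset has rel or fewer CPUs */
}
```

  For a cpuset holding system-wide CPUs 2, 3 and 6 on an 8 CPU system,
  relative CPU 2 maps to system-wide CPU 6, and relative CPU 3 yields 8.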

</t>
<t tx="PaulJackson.20040923180054.1">``int cpuset_c_sys_to_rel_cpu(const struct cpuset *cp, int cpu);``
------------------------------------------------------------------

  Return the cpuset-relative CPU number, within the specified cpuset ``cp``,
  that corresponds to the system-wide CPU number ``cpu``.  Returns result of
  ``cpuset_cpus_nbits()`` if ``bitmask_isbitset(cpuset_getcpus(cp), cpu)`` is false.
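
  The system-to-relative mapping can be pictured with a plain array in place
  of a ``struct bitmask``. The sketch below is only an illustration under that
  assumption: the function ``sys_to_rel()`` and its ``inset`` array are
  hypothetical names, not part of **libcpuset**. The relative number of a CPU
  is taken to be the count of cpuset members with a lower system-wide number.

```c
/*
 * Hypothetical stand-in for a CPU bitmask: inset[i] is nonzero if
 * system-wide CPU i belongs to the cpuset, for i in 0 .. nbits-1.
 *
 * Returns the cpuset-relative number of system-wide CPU sys, or
 * nbits if sys is not a member of the cpuset.
 */
int sys_to_rel(const unsigned char *inset, int nbits, int sys)
{
    int i;
    int rel = 0;    /* members numbered below sys */

    if (0 > sys || sys >= nbits)
        return nbits;
    if (!inset[sys])
        return nbits;
    for (i = 0; sys > i; i++) {
        if (inset[i])
            rel++;
    }
    return rel;
}
```

  For a cpuset holding system-wide CPUs 2, 3 and 6 on an 8 CPU system,
  system-wide CPU 6 maps to relative CPU 2, and system-wide CPU 4 yields 8.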

</t>
<t tx="PaulJackson.20040923180054.2">``int cpuset_c_rel_to_sys_mem(const struct cpuset *cp, int mem);``
------------------------------------------------------------------

  Return the system-wide memory node number that is used by the ``mem``-th
  memory node of the specified cpuset ``cp``.  Returns result of ``cpuset_mems_nbits()``
  if ``mem`` is not in the range [``0``, ``bitmask_weight(cpuset_getmems(cp))``).
</t>
<t tx="PaulJackson.20040923180054.3">``int cpuset_c_sys_to_rel_mem(const struct cpuset *cp, int mem);``
------------------------------------------------------------------

  Return the cpuset-relative memory node number, within the specified cpuset
  ``cp``, that corresponds to the system-wide memory node number ``mem``.
  Returns result of ``cpuset_mems_nbits()``
  if ``bitmask_isbitset(cpuset_getmems(cp), mem)`` is false.

</t>
<t tx="PaulJackson.20040924014236">``int cpuset_reattach(const char *cpusetpath);``
-----------------------------------------------------

    Reattach all tasks in cpuset ``cpusetpath`` to itself. This additional step
    is necessary anytime that the ``cpus`` of a cpuset have been changed, in
    order to rebind the ``cpus_allowed`` of each task in the cpuset to the new
    value. This routine writes the pid of each task currently attached to the
    named cpuset to the ``tasks`` file of that cpuset. If additional tasks are
    being spawned too rapidly into the cpuset at the same time, there is an
    unavoidable race condition, and some tasks may be missed.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``, otherwise it is relative to the
    current task's cpuset.

    Returns 0 on success, else -1 on error, setting ``errno``.

</t>
<t tx="PaulJackson.20040924152444">``int cpuset_cpusetofpid(struct cpuset *cp, pid_t pid);``
---------------------------------------------------------

    Set the struct cpuset ``cp`` to the settings of the cpuset to which the
    specified task ``pid`` is attached.
    The ``struct cpuset *cp`` must have been returned by a previous ``cpuset_alloc()``
    call. Any previous settings of ``cp`` are lost.

    Returns 0 on success, or -1 on error, setting ``errno``.

    ERRORS

       ``EACCES`` Permission to read or search a component of the file name was denied.

       ``EFAULT`` buf points to a bad address.

       ``ESRCH``  The pid does not exist.

       ``ERANGE`` Larger buffer needed.

       ``ENOSYS`` Kernel does not support cpusets.</t>
<t tx="PaulJackson.20040928122625">``int cpuset_cpus_weight(const struct cpuset *cp);``
----------------------------------------------------

    Query number of CPUs in cpuset ``cp``. Pass ``cp`` == NULL to query the
    current task's cpuset.

    If the CPUs have not been set in cpuset ``cp``, then zero (0) is
    returned.
</t>
<t tx="PaulJackson.20040928122625.1">``int cpuset_mems_weight(const struct cpuset *cp);``
----------------------------------------------------

    Query number of memory nodes in cpuset ``cp``.
    Pass ``cp`` == NULL to query the current task's cpuset.

    If the memory nodes have not been set in cpuset ``cp``, then
    zero (0) is returned.
</t>
<t tx="PaulJackson.20040930225850">``int cpuset_latestcpu(pid_t pid);``
------------------------------------

    Return the most recent CPU on which the specified task ``pid`` executed.
    If pid is 0, examine current task.

    The ``cpuset_latestcpu()`` call returns the number of the CPU on which the
    specified task ``pid`` most recently executed. If a process can be scheduled
    on two or more CPUs, then the results of ``cpuset_latestcpu()`` may become
    invalid even before the query returns to the invoking user code.

    The last used CPU is visible for a given pid as field #39 (starting with #1)
    in the file /proc/pid/stat. Currently this file has 41 fields, so it's the
    third from last field.
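
    As a hedged illustration of reading that field, the sketch below extracts
    field #39 from the text of one ``/proc/pid/stat`` line. The helper
    ``stat_last_cpu()`` is a hypothetical name and is not presented as the way
    **libcpuset** does it. It assumes that field #2, the command name, is
    enclosed in parentheses and may itself contain spaces, so it counts fields
    from the last closing parenthesis rather than from the start of the line.

```c
/* Nonzero if c is a decimal digit. */
static int is_digit(int c)
{
    if ('0' > c || c > '9')
        return 0;
    return 1;
}

/*
 * Hypothetical helper, not part of libcpuset: given the text of one
 * /proc/pid/stat line, return field #39 (the last used CPU), or -1
 * if the line is malformed or has fewer than 39 fields.
 */
int stat_last_cpu(const char *line)
{
    const char *p;
    const char *rparen = 0;  /* last ')' in the line, if any */
    int field = 2;           /* number of the field that ends at rparen */
    int cpu = 0;

    for (p = line; *p != '\0'; p++) {
        if (*p == ')')
            rparen = p;
    }
    if (rparen == 0)
        return -1;

    /* Walk the space separated fields that follow the command name. */
    p = rparen + 1;
    for (;;) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1;       /* ran out before field #39 */
        field++;
        if (field == 39)
            break;
        while (*p != ' ') {
            if (*p == '\0')
                return -1;
            p++;
        }
    }

    if (!is_digit(*p))
        return -1;
    while (is_digit(*p)) {
        cpu = cpu * 10 + (*p - '0');
        p++;
    }
    return cpu;
}
```

    Counting from the last closing parenthesis keeps the result correct when
    the command name contains a space character.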
</t>
<t tx="PaulJackson.20041007004400">``unsigned int cpuset_cpumemdist(int cpu, int mem);``
-----------------------------------------------------

    Distance between hardware CPU ``cpu`` and memory node ``mem``, on a scale
    which has the closest distance of a CPU to its local memory valued at ten
    (10), and other distances more or less proportional.  Distance is a rough
    metric of the bandwidth and delay combined, where a higher distance means
    lower bandwidth and longer delays.

    If either ``cpu`` or ``mem`` is not known to the current system, or if any
    internal error occurs while evaluating this distance, or if a node has no
    CPUs nor memory (I/O only), then the distance returned is UCHAR_MAX (from
    limits.h).

    These distances are obtained from the system's ACPI SLIT table, and should
    conform to:

      `System Locality Information Table Interface Specification`_ LINEBREAK
      Version 1.0, July 25, 2003

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.

.. _System Locality Information Table Interface Specification: http://h21007.www2.hp.com/dspp/files/unprotected/devresource/Docs/TechPapers/IA64/slit.pdf
</t>
<t tx="PaulJackson.20041013094450">``int cpuset_p_rel_to_sys_cpu(pid_t pid, int cpu);``
----------------------------------------------------

  Return the system-wide CPU number that is used by the ``cpu``-th CPU
  of the cpuset to which task ``pid`` is attached.  Returns result of ``cpuset_cpus_nbits()``
  if that cpuset doesn't encompass that relative cpu number.

</t>
<t tx="PaulJackson.20041013094458">``int cpuset_p_sys_to_rel_cpu(pid_t pid, int cpu);``
----------------------------------------------------

  Return the cpuset-relative CPU number, within the cpuset to which task ``pid``
  is attached, that corresponds to the system-wide CPU number ``cpu``.  Returns
  result of ``cpuset_cpus_nbits()``
  if that cpuset doesn't encompass that system-wide cpu number.

</t>
<t tx="PaulJackson.20041013094507">``int cpuset_p_rel_to_sys_mem(pid_t pid, int mem);``
----------------------------------------------------

  Return the system-wide memory node number that is used by the ``mem``-th
  memory node of the cpuset to which task ``pid`` is attached.  Returns result
  of ``cpuset_mems_nbits()`` if that cpuset doesn't encompass that relative
  memory node number.
</t>
<t tx="PaulJackson.20041013095901">``int cpuset_p_sys_to_rel_mem(pid_t pid, int mem);``
----------------------------------------------------

  Return the cpuset-relative memory node number, within the cpuset to which task
  ``pid`` is attached, that corresponds to the system-wide memory node number
  ``mem``.  Returns result of ``cpuset_mems_nbits()``
  if that cpuset doesn't encompass that system-wide memory node.

</t>
<t tx="PaulJackson.20041015125831">``int cpuset_export(const struct cpuset *cp, char *buf, int buflen);``
----------------------------------------------------------------------

    Write the settings of cpuset ``cp`` to the character buffer ``buf``,
    which is of length ``buflen``.

    Returns -1 and sets ``errno`` on error. Upon successful return, returns the
    number of characters printed (not including the trailing '\0' used to end
    output to strings). The function ``cpuset_export()`` does not write more than
    ``buflen`` bytes (including the trailing '\0'). If the output was truncated due
    to this limit then the return value is the number of characters (not including
    the trailing '\0') which would have been written to the final string if enough
    space had been available. Thus, a return value of ``buflen`` or more means that
    the output was truncated.

    See `Cpuset Text Format`_ for details of the format of an exported
    cpuset file.
    </t>
<t tx="PaulJackson.20041015130034">``int cpuset_import(struct cpuset *cp, const char *file, int *errlinenum_ptr, char *errmsg_bufptr, int errmsg_buflen);``
--------------------------------------------------------------------------------------------------------------------------

    Read the settings of cpuset ``cp`` from the file at the path specified
    by ``file``.

    The ``struct cpuset *cp`` must have been returned by a previous ``cpuset_alloc()``
    call. Any previous settings of ``cp`` are lost.

    Returns 0 on success, or -1 on error, setting ``errno``. Errors include
    ``file`` not referencing a readable file.

    If parsing errors are encountered reading the file, and if errlinenum_ptr
    is not NULL, then the number of the first line (numbers start with one)
    with an error is written to \*errlinenum_ptr.  If an error occurs on the open,
    and errlinenum_ptr is not NULL, then zero is written to \*errlinenum_ptr.

    If parsing errors are encountered reading the file and if errmsg_bufptr
    is not NULL, then it is presumed to point to a character buffer of at
    least errmsg_buflen characters, and a nul-terminated, human-readable
    message explaining the error in more detail is written to
    \*errmsg_bufptr.  As of this writing, the possible error messages are:

        * "Token 'CPU' requires list"
        * "Token 'MEM' requires list"
        * "Invalid list format: %s"
        * "Unrecognized token: %s"
        * "Insufficient memory"

    See `Cpuset Text Format`_ for details of the format required for an imported
    cpuset file.
</t>
<t tx="PaulJackson.20041015222637">Cpuset settings may be exported to, and imported from, text files using
a text format representation of cpusets.

The permissions of files holding these text
representations have no special significance to the implementation of cpusets.
Rather, the permissions of the special cpuset files in the cpuset file system,
normally mounted at ``/dev/cpuset``, control reading and writing of and attaching to
cpusets.

The text representation of cpusets is not essential to the use of
cpusets. One can directly manipulate the special files in the cpuset file
system.  This text representation provides an alternative that may be
convenient for some uses.

The cpuset text format supports one directive per line. Comments begin with the
'#' character and extend to the end of line.

After stripping comments, the first white space separated token on each
remaining line selects from the following possible directives:

 cpus
    Specify which CPUs are in this cpuset.  The second token
    on the line must be a comma-separated list of CPU numbers
    and ranges of numbers, optionally modified by a stride operator
    (see below).

 mems
    Specify which memory nodes are in this cpuset.  The second
    token on the line must be a comma-separated list of memory
    node numbers and ranges of numbers, optionally modified by a
    stride operator (see below).

 cpu_exclusive
    The cpu_exclusive flag is set.

 mem_exclusive
    The mem_exclusive flag is set.

 notify_on_release
    The notify_on_release flag is set.

Additional unnecessary tokens on a line are quietly ignored. Lines containing
only comments and white space are ignored.

The token "cpu" is allowed for "cpus", and "mem" for "mems". Matching is case
insensitive.
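
The directive rules above can be sketched as a small classifier. This is only
an illustration written for this section, not the parser used by
**libcpuset**: the names ``enum directive``, ``classify_line()`` and the
``DIR_*`` constants are hypothetical, and the sketch assumes that tokens are
separated by spaces or tabs.

```c
/* Hypothetical result codes for the first token on a line. */
enum directive {
    DIR_NONE,               /* only white space or a comment */
    DIR_CPUS,
    DIR_MEMS,
    DIR_CPU_EXCLUSIVE,
    DIR_MEM_EXCLUSIVE,
    DIR_NOTIFY_ON_RELEASE,
    DIR_UNKNOWN
};

/* Fold an ASCII upper case letter to lower case. */
static int lower(int c)
{
    if ('A' > c || c > 'Z')
        return c;
    return c - 'A' + 'a';
}

/* Nonzero if c ends a token: white space, comment start or end of text. */
static int ends_token(int c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == '#';
}

/* Nonzero if tok[0..len-1] equals the lower case name, ignoring case. */
static int token_is(const char *tok, int len, const char *name)
{
    int i;

    for (i = 0; len > i; i++) {
        if (name[i] == '\0' || lower(tok[i]) != name[i])
            return 0;
    }
    return name[len] == '\0';
}

/* Classify one line of cpuset text format by its first token. */
enum directive classify_line(const char *line)
{
    int len = 0;

    while (*line == ' ' || *line == '\t')
        line++;
    while (!ends_token(line[len]))
        len++;
    if (len == 0)
        return DIR_NONE;
    if (token_is(line, len, "cpus") || token_is(line, len, "cpu"))
        return DIR_CPUS;
    if (token_is(line, len, "mems") || token_is(line, len, "mem"))
        return DIR_MEMS;
    if (token_is(line, len, "cpu_exclusive"))
        return DIR_CPU_EXCLUSIVE;
    if (token_is(line, len, "mem_exclusive"))
        return DIR_MEM_EXCLUSIVE;
    if (token_is(line, len, "notify_on_release"))
        return DIR_NOTIFY_ON_RELEASE;
    return DIR_UNKNOWN;
}
```

With this sketch, the lines ``cpus 0-3`` and ``CPU 4-7:2  # comment`` both
classify as ``DIR_CPUS``, and a line holding only a comment classifies as
``DIR_NONE``.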

The `stride operator` for "cpus" and "mems" values is used to designate every
N-th CPU or memory node in a range of numbers.  It is written as a colon
**":"** followed by the number N, with no spaces on either side of the colon.
For example, the range ``0-31:2`` designates the 16 even numbers 0, 2, 4, ... 30.
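
The arithmetic behind the stride operator can be sketched as follows. The two
functions ``stride_range_count()`` and ``stride_range_member()`` are
hypothetical helpers written for this illustration and are not part of
**libcpuset**; they take the already separated numbers of a range written as
``lo-hi:stride``.

```c
/*
 * Hypothetical helper: how many numbers the range "lo-hi:stride"
 * designates, or -1 if the range is malformed (negative lo, hi
 * below lo, or stride below 1).
 */
int stride_range_count(int lo, int hi, int stride)
{
    if (0 > lo || lo > hi || 1 > stride)
        return -1;
    return (hi - lo) / stride + 1;
}

/*
 * Hypothetical helper: the i-th number (counting from 0) that the
 * range "lo-hi:stride" designates, or -1 if there is no such number.
 */
int stride_range_member(int lo, int hi, int stride, int i)
{
    int count = stride_range_count(lo, hi, stride);

    if (0 > i || i >= count)
        return -1;
    return lo + i * stride;
}
```

For the range ``0-31:2``, ``stride_range_count(0, 31, 2)`` is 16, and the
members run from ``stride_range_member(0, 31, 2, 0)``, which is 0, up to
``stride_range_member(0, 31, 2, 15)``, which is 30.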

The **libcpuset** routines cpuset_import_ and cpuset_export_ handle converting
the internal 'struct cpuset' representation of cpusets to (export) and from (import)
this text representation.</t>
<t tx="PaulJackson.20041104233632">The basic cpuset API provides functions usable from 'C' for processor and
memory placement within a cpuset.

These functions enable an application to place various threads of its execution
on specific CPUs within its current cpuset, and perform related functions such
as asking how large the current cpuset is, and on which CPU within the current
cpuset a thread is currently executing.

These functions do not provide the full power of the advanced cpuset API, but
they are easier to use, and meet some common needs of multi-threaded
applications.

Unlike the rest of this document, the functions described in this section use
cpuset relative numbering. In a cpuset of N CPUs, the relative CPU numbers range
from zero to N - 1.

Unlike the underlying system calls ``sched_setaffinity``, ``mbind`` or
``set_mempolicy``, these basic cpuset API functions are robust in the
presence of cpuset migration.  If you pin a thread in your job to one of
the CPUs in your job's cpuset, it will stay properly pinned even if your
job's cpuset is migrated later to another set of CPUs and memory nodes,
or even if the migration is occurring at the same time as your calls.

Memory placement is done automatically, preferring the node local to the
requested CPU. Threads may only be placed on a single CPU. This avoids the need
to allocate and free the bitmasks required to specify a set of several CPUs.
These functions do not support creating or removing cpusets, only the placement
of threads within an existing cpuset. This avoids the need to explicitly
allocate and free cpuset structures. Operations only apply to the current
thread, avoiding the need to pass the process id of the thread to be affected.

If more powerful capabilities are needed, use the `Advanced Cpuset Library
Functions`_. These basic functions do not provide any essential new capability.
They are implemented using the advanced functions, and are fully interoperable
with them.

On error, these routines return -1 and set ``errno``. If invoked on an operating
system kernel that does not support cpusets, ``errno`` is set to ``ENOSYS``. If invoked
on a system that supports cpusets, but when the cpuset file system is not
currently mounted at ``/dev/cpuset``, then ``errno`` is set to ``ENODEV``.

The following inclusion and linkage provides access to the cpuset API
from 'C' code:

  ::

   #include &lt;cpuset.h&gt;
   /* link with -lcpuset */

The following functions are supported in the basic cpuset 'C' API:

  * cpuset_pin_ - Pin the current thread to a CPU, preferring local memory
  * cpuset_size_ - Return the number of CPUs in the current task's cpuset
  * cpuset_where_ - On which CPU in the current task's cpuset did the task most recently execute
  * cpuset_unpin_ - Remove effect of `cpuset_pin`_, let task have run of its entire cpuset
</t>
<t tx="PaulJackson.20041105003017">``int cpuset_pin(int relcpu);``
-------------------------------

    Pin the current task to execute only on the CPU ``relcpu``, which is a
    relative CPU number within the current cpuset of that task. Also,
    automatically pin the memory allowed to be used by the current task to
    prefer the memory on that same node (as determined by the cpuset_cpu2node_
    function), but to allow any memory in the cpuset if no free memory is readily
    available on the same node.

    Return 0 on success, -1 on error. Errors include ``relcpu`` being too large
    (greater than cpuset_size() - 1). They also include running on a system that
    doesn't support cpusets (``ENOSYS``) and running when the cpuset file system is
    not mounted at ``/dev/cpuset`` (``ENODEV``).

</t>
<t tx="PaulJackson.20041105003017.1">``int cpuset_size();``
----------------------

    Return the number of CPUs in the current task's cpuset. The relative CPU
    numbers that are passed to the cpuset_pin_ function and that are returned by
    the cpuset_where_ function, must be between 0 and N - 1 inclusive, where N
    is the value returned by cpuset_size_.

    Returns -1 and sets ``errno`` on error. Errors include running on a system that
    doesn't support cpusets (``ENOSYS``) and running when the cpuset file system is
    not mounted at ``/dev/cpuset`` (``ENODEV``).
</t>
<t tx="PaulJackson.20041105003017.2">``int cpuset_where();``
-----------------------

    Return the CPU number, relative to the current task's cpuset, of the CPU on
    which the current task most recently executed. If a task is allowed to
    execute on more than one CPU, then there is no guarantee that the task is
    still executing on the CPU returned by cpuset_where_, by the time that the
    user code obtains the return value.

    Returns -1 and sets ``errno`` on error. Errors include running on a system that
    doesn't support cpusets (``ENOSYS``) and running when the cpuset file system is
    not mounted at ``/dev/cpuset`` (``ENODEV``).

</t>
<t tx="PaulJackson.20041105003017.3">``int cpuset_unpin();``
------------------------

    Remove the CPU and memory pinning effects of any previous cpuset_pin_ call,
    allowing the current task to execute on any CPU in its current cpuset and to
    allocate memory on any memory node in its current cpuset. Return 0 on
    success.

    Returns -1 and sets ``errno`` on error. Errors include running on a system that
    doesn't support cpusets (``ENOSYS``) and running when the cpuset file system is
    not mounted at ``/dev/cpuset`` (``ENODEV``).
</t>
<t tx="PaulJackson.20041107213838">``int cpuset_cpu2node(int cpu);``
---------------------------------

    Return system wide number of memory node closest to CPU ``cpu``.
    For NUMA architectures common as of this writing, this
    would be the number of the node on which ``cpu`` is
    located.  If an architecture did not have memory on
    the same node as a CPU, then it would be the node number
    of the memory node closest to or preferred by that ``cpu``.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
<t tx="PaulJackson.20041107213920">Here is the history of changes to this document and associated software.

- 2006-04-26  Paul Jackson  &lt;pj@sgi.com&gt;

    * First published.

- 2006-11-14  Paul Jackson  &lt;pj@sgi.com&gt; Version 1 (`cpuset_version`_\ ())

    * Added `cpuset_nuke`_ routine.
    * Added various ``cpuset_fts_*`` routines.
    * Added various ``cpuset_*_affinity`` routines.
    * Added `cpuset_move_cpuset_tasks`_ routine.
    * Converted from using ``ftw(3)`` to ``fts(3)``.
    * Various minor documentation errors fixed.
    * Added `cpuset_version`_ routine.
    * Improved error checking by `cpuset_move_all`_.
    * Correct system call numbering for sched_setaffinity on i386 arch.
    * Correct `cpuset_latestcpu`_ result if command basename has space character.
    * Remove 256 byte limit on `cpuset_export`_ output.

- 2007-01-14  Paul Jackson &lt;pj@sgi.com&gt; Version 2 (`cpuset_version`_\ ())

    * Fix `cpuset_create`_, `cpuset_modify`_ to not zero undefined attributes
      such as ``memory_spread_page``, ``memory_spread_slab`` and
      ``notify_on_release``. See further the explanation of an "*undefined*
      mark", in `Cpuset Programming Model`_. `cpuset_create`_ now respects
      default kernel cpuset creation settings, and `cpuset_modify`_ now respects
      the existing settings of the cpuset, unless explicitly set.

- 2007-04-01  Paul Jackson &lt;pj@sgi.com&gt; Version 3 (`cpuset_version`_\ ())

    * Fix `cpuset_setcpus`_\ () and `cpuset_setmems`_\ () to mark the
      ``cpus`` and ``mems`` fields as defined, so that setting them
      before doing a `cpuset_create`_\ () has an effect.
</t>
<t tx="PaulJackson.20050114221828">Cpusets have a permission structure which determines which users have rights to
query, modify and attach to any given cpuset. The permissions of a cpuset are
determined by the permissions of the special files and directories in the cpuset
file system.  The cpuset filesystem is normally mounted at ``/dev/cpuset``.

The directory representing each cpuset, and the special per-cpuset files in each
such directory, both have traditional ``Unix`` hierarchical file system
permissions for read, write and execute (or search), for each of the owning
user, the owning group, and all others.

For instance, a task can put itself in some other cpuset (than its current one)
if it can write the tasks file for that cpuset (requires execute permission on
the encompassing directories and write permission on that tasks file).

Because the cpuset file system is not persistent, changes in permissions, and
even the existence of cpusets other than the root cpuset, are not preserved
across system reboots.

An additional constraint is applied to requests to place some other task in a
cpuset. One task may not attach another to a cpuset unless it would have
permission to send that task a signal.

A task may create a child cpuset if it can access and write the parent cpuset
directory. It can modify the CPUs or memory nodes in a cpuset if it can access
that cpuset's directory (execute permissions on the encompassing directories) and
write the corresponding ``cpus`` or ``mems`` file.

It should be noted, however, that changes to the CPUs of a cpuset do not apply
to any task in that cpuset until the task is reattached to that cpuset. If a
task can write the cpus file, it should also be able to write the tasks file and
might be expected to have permission to reattach the tasks therein (equivalent
to permission to send them a signal).

Some utilities and libraries read the cpuset special files of the cpuset in
which the calling task is placed, and may be hindered if a task is placed in a
cpuset that is not owned by the task's user-id, unless special care is taken to
allow that task continued read access to its own cpuset special files (and
search access to the directory containing them).  For example, the Message
Passing Toolkit's MPI library's placement functionality activated through the
use of MPI_DSM_DISTRIBUTE will fail to place ranks correctly if it cannot read
the cpuset special files through insufficient permissions on the cpuset
directory or the special files within it.

There is one minor difference between the manner in which cpuset path
permissions are evaluated by ``libcpuset`` and the manner in which file system
operation permissions are evaluated by direct system calls.  System calls that
operate on file pathnames, such as the ``open(2)`` system call, rely on direct
kernel support for a task's current directory. Therefore such calls can
successfully operate on files in or below a task's current directory even if the
task lacks search permission on some ancestor directory. Calls in ``libcpuset``
that operate on cpuset pathnames, such as the ``cpuset_query()`` call, rely on
``libcpuset`` internal conversion of all cpuset pathnames to full, root-based
paths, so they cannot successfully operate on a cpuset unless the task has search
permission on all ancestor directories, starting with the usual cpuset mount
point (``/dev/cpuset``).</t>
<t tx="PaulJackson.20050324125230">``int cpuset_addr2node(void *addr);``
-------------------------------------

    Return the system-wide number of the memory node on which the
    physical page of memory at virtual address ``addr`` of
    the current task's address space is located.  Returns -1 if ``addr``
    is not a valid address in the address space of the current
    process, with ``errno`` set to ``EFAULT``.  If the referenced
    physical page was not allocated (faulted in) by the kernel
    prior to this call, it will be during the call.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
<t tx="PaulJackson.20050404224742">The Linux kernel implementation of cpusets sets ``errno`` to specify the reason
for a failed system call affecting cpusets. These ``errno`` values are available
when a cpuset library call fails. Most of these values can also be displayed by
shell commands used to directly manipulate files below ``/dev/cpuset``.

The possible ``errno`` settings and their meaning when set on a failed cpuset call
are as listed below.

  ENOSYS
    Invoked on an operating system kernel that does not support cpusets.

  ENODEV
    Invoked on a system that supports cpusets, but when the cpuset file
    system is not currently mounted at ``/dev/cpuset``.

  ENOMEM
    Insufficient memory is available.

  EBUSY
    Attempted
    ``cpuset_delete()``
    on a cpuset with attached tasks.

  EBUSY
    Attempted
    ``cpuset_delete()``
    on a cpuset with child cpusets.

  ENOENT
    Attempted
    ``cpuset_create()``
    in a parent cpuset that doesn't exist.

  EEXIST
    Attempted
    ``cpuset_create()``
    for a cpuset that already exists.

  EEXIST
    Attempted to ``rename(2)`` a cpuset to a name that already exists.

  ENOTDIR
    Attempted to ``rename(2)`` a non-existent cpuset.


  E2BIG
    Attempted a ``write(2)`` system call on a special cpuset file
    with a length larger than some kernel-determined upper
    limit on the length of such writes.

  ESRCH
    Attempted to
    ``cpuset_move()``
    a non-existent task.

  EACCES
    Attempted to
    ``cpuset_move()``
    a task which one lacks permission to move.

  EACCES
    Attempted to ``write(2)`` a ``memory_pressure`` file.

  ENOSPC
    Attempted to ``cpuset_move()`` a task to
    an empty cpuset.

  EINVAL
    The
    ``relcpu``
    argument to
    ``cpuset_pin()``
    is out of range (not between
    "zero" and "cpuset_size() - 1").

  EINVAL
    Attempted to change a cpuset in a way that would
    violate a cpu_exclusive or mem_exclusive attribute
    of that cpuset or any of its siblings.

  EINVAL
    Attempted to write an empty cpus or mems bitmask
    to the kernel.  The kernel creates new cpusets
    (via mkdir) with empty cpus and mems, and the
    user level cpuset and bitmask code works with
    empty masks.  But the kernel will not allow an
    empty bitmask (no bits set) to be written to the
    special cpus or mems files of a cpuset.

  EIO
    Attempted to
    ``write(2)``
    a string to a cpuset tasks file that
    does not begin with an ASCII decimal
    integer.

  EIO
    Attempted to ``rename(2)`` a cpuset outside of its current directory.

  ENOSPC
    Attempted to ``write(2)`` a list to a cpus
    file that did not include any online cpus.

  ENOSPC
    Attempted to ``write(2)`` a list to a mems
    file that did not include any online memory
    nodes.

  EACCES
    Attempted to add a cpu or mem to a cpuset that is
    not already in its parent.

  EACCES
    Attempted to set cpu_exclusive or mem_exclusive
    on a cpuset whose parent lacks the same setting.

  ENODEV
    The cpuset was removed by another task at the same
    time as a ``write(2)`` was attempted on one of the
    special files in the cpuset directory.

  EBUSY
    Attempted to remove a cpu or mem from a cpuset
    that is also in a child of that cpuset.

  EFAULT
    Attempted to read or write a cpuset file using
    a buffer that was outside your accessible address space.

  ENAMETOOLONG
    Attempted to read a ``/proc/&lt;pid&gt;/cpuset`` file for
    a cpuset path that was longer than the kernel page
    size.

  ENAMETOOLONG
    Attempted to create a cpuset whose base directory
    name was longer than 255 characters.

  ENAMETOOLONG
    Attempted to create a cpuset whose full pathname
    including the "/dev/cpuset" is longer than 4095
    characters.

  ENXIO
    Attempted to create a ``cpu_exclusive`` cpuset whose
    ``cpus`` covered just part of one or more physical
    processor packages, such as including just one of the
    two Cores on a package.  For Linux kernel version 2.6.16
    on i386 and x86_64, this operation is rejected with this
    error to avoid a fatal kernel bug.  Otherwise, this is a
    normal and supported operation.

  EINVAL
    Specified a cpus or mems list to the kernel which
    included a range with the second number smaller than
    the first number.

  EINVAL
    Specified a cpus or mems list to the kernel which
    included an invalid character in the string.

  ERANGE
    Specified a cpus or mems list to the kernel which
    included a number too large for the kernel to set
    in its bitmasks.

  ETIME
    Time limit for `cpuset_nuke`_ operation reached without successful
    completion of operation.

  ENOTEMPTY
    Tasks remain after multiple attempts by `cpuset_move_cpuset_tasks`_
    to move them to a different cpuset.

  EPERM
    Lacked permission to kill (send a signal to) a task.

  EPERM
    Lacked permission to read a cpuset or its files.

  EPERM
    Attempted to unlink a per-cpuset file.  Such files can not be
    unlinked.  They can only be removed by removing (rmdir) the
    directory representing the cpuset that contains these files.
</t>
<t tx="PaulJackson.20050405065349">``const char *cpuset_mountpoint();``
------------------------------------

    Return the filesystem path at which the cpuset file system is
    mounted.  The current implementation of this routine returns
    ``/dev/cpuset``, or the string ``[cpuset filesystem not mounted]``
    if the cpuset file system is not mounted, or the string
    ``[cpuset filesystem not supported]`` if the system does not
    support cpusets.

    In general, if the first character of the return string is a
    slash (``/``) then the result is the mount point of the cpuset
    file system, otherwise the result is an error message string.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
<t tx="PaulJackson.20050922011133">`Threading` in a software application splits instructions into multiple
streams so that multiple processors can act on them.

Hyper-Threading (HT) Technology, developed by Intel Corporation,
provides thread-level parallelism on each processor, resulting in more
efficient use of processor resources, higher processing throughput,
and improved performance. One physical CPU can appear as two logical CPUs
by having additional registers to overlap two instruction streams, or
a single processor can have dual cores executing instructions in parallel.

In addition to their traditional use to control the placement of
jobs on the CPUs and memory nodes of a system, cpusets also provide
a convenient mechanism to control the use of Hyper-Threading.

Some jobs achieve better performance using both of the Hyper-Thread
sides, A and B, of a processor core, and some run better using just
one of the sides, allowing the other side to idle.

Since each logical (Hyper-Threaded) processor in a core has a distinct
CPU number, one can easily specify a cpuset that contains both sides, or
contains just one side from each of the processor cores in the cpuset.

Cpusets can be configured to include any combination of the logical
CPUs in a system.

For example, the following cpuset configuration file called ``cpuset.cfg``
includes the A sides of an HT-enabled system, along with all the memory,
on the first 32 nodes (assuming 2 cores per node). The colon (':') introduces
the stride.  The stride of '2' in this example means use every other logical
CPU.

    ::

        cpus 0-127:2    # even numbered CPUs 0, 2, 4, ... 126
        mems 0-31       # memory nodes 0, 1, 2, ... 31


To create a cpuset called ``foo`` and run a command called ``bar`` in
that cpuset, defined by the cpuset configuration file ``cpuset.cfg``
shown above, use the following commands:

    ::

        cpuset -c /foo &lt; cpuset.cfg
        cpuset -i /foo -I bar

To specify both sides of the first 64 cores, use the following entry
in your cpuset configuration file:

    ::

        cpus 0-127

To specify just the B sides, use the following entry in your cpuset
configuration file:

    ::

        cpus 1-127:2

The examples above assume that CPUs are uniformly numbered, with the even
numbers for the A side and odd numbers for the B side. This is usually the case,
but not guaranteed. One could still place a job on a system that was not
uniformly numbered, but currently it would involve a longer argument list to the
``cpus`` option, explicitly listing the desired CPUs.

Typically, the logical numbering of CPUs puts the even numbered CPUs on the A
sides, and the odd numbered CPUs on the B side. The stride operator (":2", above)
makes it easy to specify that only every other side will be used. If the CPU
number range starts with an even number, this will be the A sides, and if the
range starts with an odd number, this will be the B sides.
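
The effect of the stride can be illustrated with a short Python sketch. The
helper name ``expand_cpus`` is hypothetical and is not part of ``libcpuset``
or the ``cpuset`` command; it only mimics how a list with an optional
``:stride`` suffix expands into CPU numbers, assuming the uniform A/B
numbering described above.

```python
def expand_cpus(spec):
    """Expand a list such as "0-127:2" or "0-3,8" into CPU numbers."""
    cpus = []
    for item in spec.split(","):
        # Split off an optional ":stride" suffix; the default stride is 1.
        body, _, stride = item.partition(":")
        first, _, last = body.partition("-")
        start = int(first)
        end = int(last) if last else start
        cpus.extend(range(start, end + 1, int(stride) if stride else 1))
    return cpus


# Assumes even numbers are A sides and odd numbers are B sides.
a_sides = expand_cpus("0-127:2")
b_sides = expand_cpus("1-127:2")
print(len(a_sides), a_sides[:4], a_sides[-1])
print(len(b_sides), b_sides[:4], b_sides[-1])
```

On a system that is not uniformly numbered, the same helper would still
accept an explicit comma separated list of the desired CPUs.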

A ``ps(1)`` or ``top(1)`` invocation will show a handful of threads on unused
CPUs, but these are kernel threads assigned to every CPU in support of user
applications running on those CPUs, to handle tasks such as asynchronous file
system writes and task migration between CPUs. If no application is actually
using a CPU, then the kernel threads on that CPU will be almost always idle
and will consume very little memory.


</t>
<t tx="PaulJackson.20060430161249">If the ``notify_on_release`` flag is enabled (1) in a cpuset, then whenever the last
task in the cpuset leaves (exits or attaches to some other cpuset) and the last
child cpuset of that cpuset is removed, the kernel runs the command
``/sbin/cpuset_release_agent``, supplying the pathname (relative to the mount point
of the cpuset file system) of the abandoned cpuset. This enables automatic
removal of abandoned cpusets.

The default value of ``notify_on_release`` in the root cpuset at system boot is
disabled (0). The default value of other cpusets at creation is the current
value of their parent's ``notify_on_release`` setting.

The command ``/sbin/cpuset_release_agent`` is invoked with the name (``/dev/cpuset``
relative path) of that cpuset in ``argv[1]``. This supports automatic cleanup of
abandoned cpusets.

The usual contents of the command ``/sbin/cpuset_release_agent`` are simply the shell
script::

              #!/bin/sh
              rmdir /dev/cpuset/$1

By default ``notify_on_release`` is off (0). Newly created cpusets inherit their
``notify_on_release`` setting from their parent cpuset.

As with other flag values below, this flag can be changed by writing an ASCII
number 0 or 1 (with optional trailing newline) into the file, to clear or set
the flag, respectively.
</t>
<t tx="PaulJackson.20060430161249.1">The `List Format`_ is used to represent CPU and memory node
bitmasks (sets of CPU and memory node numbers) in the
``/dev/cpuset`` file system.

It is a comma separated list of CPU or memory
node numbers and ranges of numbers, in ASCII decimal.

Examples of the `List Format`_:

 ::

    0-4,9           # bits 0, 1, 2, 3, 4, and 9 set
    0-3,7,12-15     # bits 0, 1, 2, 3, 7, 12, 13, 14, and 15 set
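
As a minimal sketch of the format, the following Python helper renders a set
of bit numbers as a List Format string. The name ``format_list`` is
hypothetical and is not part of ``libcpuset``; the helper only illustrates how
runs of consecutive numbers collapse into ranges.

```python
def format_list(bits):
    """Render an iterable of non-negative ints in cpuset List Format."""
    ranges = []
    for n in sorted(set(bits)):
        # Extend the current range if n directly follows its last element.
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return ",".join(
        str(lo) if lo == hi else "%d-%d" % (lo, hi) for lo, hi in ranges
    )


print(format_list([0, 1, 2, 3, 4, 9]))
print(format_list([0, 1, 2, 3, 7, 12, 13, 14, 15]))
```

The two calls reproduce the two example lists shown above.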


</t>
<t tx="PaulJackson.20060430161323">The ``memory_pressure`` of a cpuset provides a simple per-cpuset metric of the rate
at which the tasks in a cpuset are attempting to free up in-use memory on the nodes
of the cpuset to satisfy additional memory requests.

This enables batch schedulers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure a job is causing.

This is useful both on tightly managed systems running a wide mix of submitted
jobs, which may choose to terminate or re-prioritize jobs that are trying to use
more memory than allowed on the nodes assigned them, and with tightly coupled,
long running, massively parallel scientific computing jobs that will
dramatically fail to meet required performance goals if they start to use more
memory than allowed to them.

This mechanism provides a very economical way for the batch scheduler to monitor a
cpuset for signs of memory pressure.  It's up to the batch scheduler or other user
code to decide what to do about it and take action.

If the ``memory_pressure_enabled`` flag in the top cpuset is not set (0), then
the kernel does not compute this filter, and the per-cpuset files
``memory_pressure`` always contain the value zero (0).

If the ``memory_pressure_enabled`` flag in the top cpuset is set (1), then the
kernel computes this filter for each cpuset in the system, and the
memory_pressure file for each cpuset reflects the recent rate of
such low memory page allocation attempts by tasks in said cpuset.

Reading the ``memory_pressure`` file of a cpuset is very efficient.
The expectation is that batch schedulers can poll these files and
detect jobs that are causing memory stress, so that action can be
taken to avoid impacting the rest of the system with a job that
is trying to aggressively exceed its allowed memory.

*Note well*: unless enabled by setting ``memory_pressure_enabled`` in the top cpuset,
``memory_pressure`` is not computed for any cpuset, and always reads a value of
zero.

Why a per-cpuset, running average:

      Because this meter is per-cpuset, rather than per-task or memory region, the system
      load imposed by a batch scheduler monitoring this metric is sharply
      reduced on large systems, because a scan of the system-wide tasklist can
      be avoided on each set of queries.

      Because this meter is a running average, instead of an accumulating
      counter, a batch scheduler can detect memory pressure with a single read,
      instead of having to read and accumulate results for a period of time.

      Because this meter is per-cpuset rather than per-task or memory region, the batch
      scheduler can obtain the key information, memory pressure in a cpuset,
      with a single read, rather than having to query and accumulate results
      over all the (dynamically changing) set of tasks in the cpuset.

A per-cpuset simple digital filter is kept within the kernel, and updated by any
task attached to that cpuset, if it enters the synchronous (direct) page reclaim
code.

The per-cpuset ``memory_pressure`` file provides an integer number representing
the recent (half-life of 10 seconds) rate of direct page reclaims caused by the
tasks in the cpuset, in units of reclaims attempted per second, times 1000.

The kernel computes this value using a single-pole low-pass recursive (IIR)
digital filter coded with 32 bit integer arithmetic.  The value decays
at an exponential rate.

Given the simple 32 bit integer arithmetic used in the kernel to compute this value,
this meter works best for reporting page reclaim rates between one per millisecond (msec)
and one per 32 (approx) seconds.  At constant rates faster than one per msec it maxes
out at values just under 1,000,000.  At constant rates between one per msec, and
one per second it will stabilize to a value N*1000, where N is the rate of events
per second.  At constant rates between one per second and one per 32 seconds,
it will be choppy, moving up on the seconds that have an event, and then decaying
until the next event.  At rates slower than about one in 32 seconds, it decays all
the way back to zero between each event.
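
The behavior of such a filter can be illustrated with a small floating point
simulation in Python. This is only a sketch under stated assumptions: the
kernel uses 32 bit integer arithmetic rather than floating point, and the one
second update step and the names ``DECAY`` and ``step`` are chosen here for
illustration and are not taken from the kernel source.

```python
# Per-second decay factor that yields a 10 second half-life (assumed step).
DECAY = 0.5 ** (1.0 / 10)


def step(value, events):
    """Advance the filter one second, given the events seen in that second."""
    # The reading is scaled by 1000, as in the memory_pressure file.
    return value * DECAY + events * (1.0 - DECAY) * 1000.0


value = 0.0
for _ in range(300):          # 300 seconds at 5 reclaims per second
    value = step(value, 5)
print(round(value))           # settles near 5 * 1000

for _ in range(10):           # 10 idle seconds, that is one half-life
    value = step(value, 0)
print(round(value))           # roughly half of the previous reading
```

At a constant rate of N events per second the simulated reading approaches
N*1000, and it halves for every 10 seconds without events.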

</t>
<t tx="PaulJackson.20060430161438">Normally, under the default setting (disabled) of ``memory_migrate``, once a
page is allocated (given a physical page of main memory), that page stays on
whatever node it was allocated on, as long as it remains allocated.  If the
cpuset memory placement policy ``mems`` subsequently changes, currently
allocated pages are not moved.  If pages are swapped out to disk and back,
then on return to main memory, they may be allocated on different nodes,
depending on the cpuset ``mems`` setting in effect at the time the page
is swapped back in.

When memory migration is enabled in a cpuset, if the ``mems`` setting of the
cpuset is changed, then any memory page in use by any task in the cpuset that is
on a memory node no longer allowed will be migrated to a memory node that is
allowed.

Also if a task is moved into a cpuset with ``memory_migrate`` enabled, any
memory pages it uses that were on memory nodes allowed in its previous cpuset,
but which are not allowed in its new cpuset, will be migrated to a memory node
allowed in the new cpuset.

The relative placement of a migrated page within the cpuset is preserved during
these migration operations if possible. For example, if the page was on the
second valid node of the prior cpuset, the page will be placed on the second
valid node of the new cpuset, if possible.

In order to maintain the cpuset relative position of pages, even pages on memory
nodes allowed in both the old and new cpusets may be migrated.  For example, if
``memory_migrate`` is enabled in a cpuset, and that cpuset's ``mems`` file is
written, changing it from, say, memory nodes "4-7" to memory nodes "5-8", then
the following page migrations will be done, in order, for all pages in the
address space of tasks in that cpuset::

    First, migrate pages on node 7 to node 8
    Second, migrate pages on node 6 to node 7
    Third, migrate pages on node 5 to node 6
    Fourth, migrate pages on node 4 to node 5
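
The ordering of these steps can be reproduced with a short Python sketch. The
function name ``migration_order`` is hypothetical, and the sketch assumes that
the old and new ``mems`` lists contain the same number of nodes; it is an
illustration of the ordering rule, not the kernel's implementation.

```python
def migration_order(old_mems, new_mems):
    """Pair the nth old node with the nth new node and order the moves so
    that each node is emptied before pages are migrated onto it."""
    pairs = zip(sorted(old_mems), sorted(new_mems))
    pending = [(src, dst) for src, dst in pairs if src != dst]
    ordered = []
    while pending:
        sources = {src for src, _ in pending}
        # Choose a move whose destination has no pages still waiting to leave.
        move = next(m for m in pending if m[1] not in sources)
        pending.remove(move)
        ordered.append(move)
    return ordered


print(migration_order([4, 5, 6, 7], [5, 6, 7, 8]))
```

For the change from nodes 4-7 to nodes 5-8 this yields the moves 7 to 8,
6 to 7, 5 to 6 and 4 to 5, in that order.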

In this example, pages on any memory node other than "4-7" will not be migrated.
The order in which nodes are handled in a migration is intentionally chosen so as
to avoid migrating memory `to` a node until any migrations `from` that node have
first been accomplished.</t>
<t tx="PaulJackson.20060430161438.1">There are two Boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in-kernel data
structures. They are called ``memory_spread_page`` and ``memory_spread_slab``.

If the per-cpuset Boolean flag file ``memory_spread_page`` is set, the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.

If the per-cpuset Boolean flag file ``memory_spread_slab`` is set, the
kernel will spread some file system related slab caches, such as for inodes and
directory entries, evenly over all the nodes that the faulting task is allowed to
use, instead of preferring to put those pages on the node where the task is
running.

The setting of these flags does not affect anonymous data segment or stack
segment pages of a task.

By default, both kinds of memory spreading are off, and memory pages are
allocated on the node local to where the task is running, except perhaps as
modified by the task's NUMA mempolicy or cpuset configuration, as long as
sufficient free memory pages are available.

When new cpusets are created, they inherit the memory spread settings of their
parent.

Setting memory spreading causes allocations for the affected page or slab caches
to ignore the task's NUMA mempolicy and be spread instead. Tasks using mbind()
or set_mempolicy() calls to set NUMA mempolicies will not notice any change in
these calls as a result of their containing task's memory spread settings. If
memory spreading is turned off, the currently specified NUMA mempolicy once
again applies to memory page allocations.

Both ``memory_spread_page`` and ``memory_spread_slab`` are Boolean flag files.
By default they contain "0", meaning that the feature is off for that cpuset. If
a "1" is written to that file,  that turns the named feature on.

This memory placement policy is also known (in other contexts) as round-robin or
interleave.

This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large file
system data sets that must be spread across the several nodes in the job's
cpuset in order to fit. Without this policy, especially for jobs that might
have one thread reading in the data set, the memory allocation across the nodes
in the job's cpuset can become very uneven.</t>
<t tx="PaulJackson.20060430164839">If a cpuset is marked ``cpu_exclusive`` or ``mem_exclusive``, no other cpuset, other
than a direct ancestor or descendant, may share any of the same CPUs or memory
nodes.

A cpuset that is ``cpu_exclusive`` has a scheduler (sched) domain associated
with it. The sched domain consists of all CPUs in the current cpuset that are
not part of any exclusive child cpusets. This ensures that the scheduler load
balancing code only balances against the CPUs that are in the sched domain as
defined above and not all of the CPUs in the system. This removes any overhead
due to load balancing code trying to pull tasks outside of the ``cpu_exclusive``
cpuset only to be prevented by the ``Cpus_allowed`` mask of the task.

A cpuset that is ``mem_exclusive`` restricts kernel allocations for page, buffer, and
other data commonly shared by the kernel across multiple users. All cpusets,
whether ``mem_exclusive`` or not, restrict allocations of memory for user space.
This enables configuring a system so that several independent jobs can share
common kernel data, such as file system pages, while isolating each job's user
allocation in its own cpuset. To do this, construct a large ``mem_exclusive``
cpuset to hold all the jobs, and construct child, non-mem_exclusive cpusets for
each individual job. Only a small amount of typical kernel memory, such as
requests from interrupt handlers, is allowed to be taken outside even a
``mem_exclusive`` cpuset.
</t>
<t tx="PaulJackson.20060430171303.1">The `Mask Format`_ is used to represent CPU and memory node bitmasks in the
**/proc/&lt;pid&gt;/status** file.

It is hexadecimal, using ASCII characters "0" - "9" and "a" - "f". This format
displays each 32-bit word in hex (zero filled), and for masks longer than one
word, uses a comma separator between words. Words are displayed in big-endian
order, most significant first, and hex digits within a word are also in
big-endian order.

The number of 32-bit words displayed is the minimum number needed to display all
bits of the bitmask, based on the size of the bitmask.

Examples of the `Mask Format`_:

 ::

    00000001                        # just bit 0 set
    80000000,00000000,00000000      # just bit 95 set
    00000001,00000000,00000000      # just bit 64 set
    000000ff,00000000               # bits 32-39 set
    00000000,000e3862               # bits 1,5,6,11-13,17-19 set

A mask with bits 0, 1, 2, 4, 8, 16, 32 and 64 set displays as
"00000001,00000001,00010117".  The first "1" is for bit 64, the second
for bit 32, the third for bit 16, the fourth for bit 8, the fifth for
bit 4, and the "7" is for bits 2, 1 and 0.
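
As a minimal sketch, a Mask Format string can be decoded in Python as shown
here. The name ``parse_mask`` is hypothetical and is not part of
``libcpuset``; the sketch relies only on the word and digit ordering described
above.

```python
def parse_mask(text):
    """Return the sorted bit numbers set in a Mask Format string."""
    # Dropping the commas leaves one big-endian hexadecimal number.
    value = int(text.replace(",", ""), 16)
    # Test each bit by integer division and remainder.
    return [b for b in range(value.bit_length()) if (value // 2 ** b) % 2]


print(parse_mask("80000000,00000000,00000000"))
print(parse_mask("00000001,00000001,00010117"))
```

Because every word is zero filled to eight hex digits, removing the comma
separators does not change the position of any bit.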
</t>
<t tx="PaulJackson.20060430183511">Cpusets are named, nested sets of CPUs and memory nodes. Each cpuset is
represented by a directory in the cpuset virtual file system, normally mounted
at ``/dev/cpuset``.

New cpusets are created using the ``mkdir`` system call or shell command. The
properties of a cpuset, such as its flags, allowed CPUs and memory nodes, and
attached tasks, are queried and modified by reading or writing to the
appropriate file in that cpuset's directory.

The state of each cpuset is represented by small text files in that cpuset's
directory. These files may be read and written using traditional shell utilities
such as ``cat(1)`` and ``echo(1)``, or using ordinary file access routines from
programming languages, such as ``open(2)``, ``read(2)``, ``write(2)`` and
``close(2)`` from the 'C' library.

These per-cpuset files represent internal kernel state and do not have any
persistent image on disk. These files are automatically created when the cpuset
is created, as a result of the ``mkdir`` invocation. Files cannot be added to or
removed from a cpuset directory.

Each of these per-cpuset files is listed and described below:

   tasks:
          List of the process IDs (PIDs) of the tasks in that cpuset. The list
          is formatted as a series of ASCII decimal numbers, each followed by a
          newline. A task may be added to a cpuset (removing it from the cpuset
          previously containing it) by writing its PID to that cpuset's ``tasks``
          file (with or without a trailing newline).

          Beware that only one PID may be written to the ``tasks`` file at a
          time. If a string is written that contains more than one PID, all but
          the first will be ignored.

   notify_on_release:
          Flag (0 or 1). If set (1), then that cpuset will receive special
          handling whenever its last using task and last child cpuset go away.
          For more information, see the `Notify On Release` section, below.

   cpus:
          List of CPUs on which tasks in that cpuset are allowed to execute. See
          `List Format`_ below for a description of the format of ``cpus``.

          The CPUs allowed to a cpuset may be changed by writing a new list to its
          ``cpus`` file. Note, however, that such a change does not take effect until the
          PIDs of the tasks in the cpuset are rewritten to the cpuset's ``tasks`` file.

   cpu_exclusive:
          Flag (0 or 1). If set (1), then the cpuset has exclusive use of its CPUs
          (no sibling or cousin cpuset may overlap CPUs). By default this is off
          (0). Newly created cpusets also initially default this to off (0).

   mems:
          List of memory nodes on which tasks in that cpuset are allowed to
          allocate memory. See `List Format`_ below for a description of the format
          of ``mems``.

   mem_exclusive:
          Flag (0 or 1). If set (1), then the cpuset has exclusive use of its memory
          nodes (no sibling or cousin may overlap). By default this is off
          (0). Newly created cpusets also initially default this to off (0).

   memory_migrate:
          Flag  (0  or  1).  If set (1), then memory migration is enabled.  For more information, see the `Memory Migration` section, below.

   memory_pressure:
          A measure of how much memory pressure the tasks in this cpuset are
          causing. For more information, see the `Memory Pressure` section, below. Always has value zero
          (0) unless ``memory_pressure_enabled`` is enabled in the top cpuset.
          This file is read-only.

   memory_pressure_enabled:
          Flag  (0  or  1).  This file is only present in the root cpuset, normally
          ``/dev/cpuset``.  If set (1), then ``memory_pressure`` calculations  are  enabled
          for all cpusets in the system.  For more information, see the `Memory Pressure` section, below.

   memory_spread_page:
          Flag (0 or 1). If set (1), then the kernel page cache (file system
          buffers) is uniformly spread across the cpuset.  For more information, see the `Memory Spread` section, below.

   memory_spread_slab:
          Flag  (0  or  1).   If  set (1), then the kernel slab caches for file i/o
          (directory and inode structures) are uniformly spread across the  cpuset.
          For more information, see the `Memory Spread` section, below.

In addition, one new file type is added to the ``/proc`` file system:

  /proc/&lt;pid&gt;/cpuset:
      For each task (pid), list its cpuset path, relative to the
      root of the cpuset file system.  This file is read-only.

Finally, the two control fields actually used by the kernel scheduler
and memory allocator to constrain scheduling and allocation to the
allowed CPUs are exposed as two more fields in the ``status`` file of
each task:

  **/proc/&lt;pid&gt;/status**:

      ``Cpus_allowed``:
            bit vector of CPUs on which this task may be scheduled.

      ``Mems_allowed``:
            bit vector of memory nodes on which this task may obtain memory.
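
As a rough illustration of reading these two fields, the sketch below tests
whether a given CPU or memory node number is set in the value of a
``Cpus_allowed`` or ``Mems_allowed`` line. It assumes the value is printed as
comma-separated groups of hexadecimal digits, most significant group first.
The helper name ``mask_has_bit`` is hypothetical and is not part of
**libcpuset**.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if bit number 'bit' is set in 'mask',
 * 0 if it is clear, and -1 if an unexpected character is found.
 * 'mask' is a string such as "00000000,0000000f". */
int mask_has_bit(const char *mask, unsigned int bit)
{
    size_t i = strlen(mask);
    unsigned int digit = 0;        /* hex digits consumed so far, from the right */
    unsigned int want = bit / 4;   /* index of the hex digit that holds 'bit' */

    while (i > 0) {
        char c = mask[--i];
        int v;

        if (c == ',' || c == '\n' || c == ' ' || c == '\t')
            continue;              /* separators and whitespace carry no bits */
        if (c >= '0' && c <= '9')
            v = c - '0';
        else if (c >= 'a' && c <= 'f')
            v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            v = c - 'A' + 10;
        else
            return -1;
        if (digit == want)
            return (v >> (bit % 4)) & 1;
        digit++;
    }
    return 0;                      /* bit lies beyond the printed mask */
}
```

Scanning from the right keeps the helper independent of how many groups the
kernel prints, which varies with the configured number of CPUs or nodes.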

There are several reasons why a task's ``Cpus_allowed`` and ``Mems_allowed`` values
may differ from the ``cpus`` and ``mems`` that are allowed in its current cpuset, as follows:

  A. A task might use ``sched_setaffinity``, ``mbind`` or ``set_mempolicy``
     to restrain its placement to less than its cpuset.
  B. Various temporary changes to ``Cpus_allowed`` are done by
     kernel internal code.
  C. Attaching a task to a cpuset doesn't change its ``Mems_allowed``
     until the next time that task needs kernel memory.
  D. Changing a cpuset's ``cpus`` doesn't change the ``Cpus_allowed`` of
     the tasks attached to it until those tasks are reattached
     to that cpuset (to avoid a hook in the hotpath scheduler
     code in the kernel).
  E. If hotplug is used to remove all the CPUs, or all the memory nodes,
     in a cpuset, then the tasks attached to that cpuset will have their
     ``Cpus_allowed`` or ``Mems_allowed`` altered to the CPUs or memory
     nodes of the closest ancestor to that cpuset that is not empty.

Beware of items D and E, above. Due to item D, user space action is required to
update a task's ``Cpus_allowed`` after changing its cpuset. Use the routine
cpuset_reattach_ to perform this update after changing the ``cpus`` allowed to
a cpuset.

Due to item E, the confines of a cpuset can be **violated** after a hotplug
removal that empties a cpuset. To avoid having a cpuset without CPU or memory
resources, update your system's cpuset configuration to reflect the new hardware
configuration. The kernel prefers misplacing a task, over starving a task of
essential compute resources.

There is one other condition under which the confines of a cpuset may be
violated. A few critical kernel-internal memory allocation requests, marked
``GFP_ATOMIC``, must be satisfied immediately. The kernel may drop some request
or malfunction if one of these allocations fails. If such a request cannot be
satisfied within the current task's cpuset, then the kernel relaxes the cpuset,
and looks for memory anywhere it can find it. It's better to violate the cpuset
than stress the kernel.

New cpusets are created using ``mkdir`` (at the shell or in C).
Old ones are removed using ``rmdir``.  The above files are accessed
using ``read(2)`` and ``write(2)`` system calls, or shell commands such
as ``cat(1)`` and ``echo(1)``.
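
A minimal sketch of this file-based interface in C follows. It attaches
several tasks to a cpuset by writing each PID to the ``tasks`` file in a
separate ``write(2)`` call, since only the first PID of any one write is
honored. The helper name ``attach_pids`` is hypothetical, and opening the
file in append mode (as ``echo pid >> tasks`` would at the shell) is an
assumption of this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: attach each of 'n' PIDs to the cpuset whose
 * "tasks" file is at 'tasks_path'.  One PID is sent per write(2).
 * Returns 0 on success, or -1 with errno set. */
int attach_pids(const char *tasks_path, const pid_t *pids, size_t n)
{
    int fd = open(tasks_path, O_WRONLY | O_APPEND);
    size_t i;

    if (fd < 0)
        return -1;
    for (i = 0; i < n; i++) {
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%ld\n", (long)pids[i]);
        ssize_t w = write(fd, buf, (size_t)len);

        if (w != (ssize_t)len) {
            int saved = (w < 0) ? errno : EIO;   /* treat short write as I/O error */

            close(fd);
            errno = saved;
            return -1;
        }
    }
    return close(fd);
}
```

With the cpuset file system mounted at ``/dev/cpuset``, a caller might pass
a path such as ``/dev/cpuset/my_set/tasks``, where ``my_set`` is a
placeholder name.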

The CPUs and memory nodes in a given cpuset are always a subset
of its parent.  The root cpuset has all possible CPUs and memory
nodes in the system.  A cpuset may be exclusive (cpu or memory)
only if its parent is similarly exclusive.

Each task has a pointer to a cpuset.  Multiple tasks may reference
the same cpuset.  Requests by a task, using the ``sched_setaffinity(2)``
system call to include CPUs in its CPU affinity mask, and using the
``mbind(2)`` and ``set_mempolicy(2)`` system calls to include memory nodes
in its memory policy, are both filtered through that task's cpuset,
filtering out any CPUs or memory nodes not in that cpuset.  The
scheduler will not schedule a task on a CPU that is not allowed in
its cpus_allowed vector, and the kernel page allocator will not
allocate a page on a node that is not allowed in the requesting task's
mems_allowed vector.

If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct
ancestor or descendant, may share any of the same CPUs or memory nodes.

User level code may create and destroy cpusets by name in the cpuset
virtual file system, manage the attributes and permissions of these
cpusets and which CPUs and memory nodes are assigned to each cpuset,
specify and query to which cpuset a task is assigned, and list the
task pids assigned to a cpuset.

Cpuset names are limited in length by the kernel's VFS implementation.
No single component of a cpuset name may exceed 255 characters, and
the full pathname of a cpuset including the ``/dev/cpuset`` mount
point may not exceed 4095 characters in length.</t>
<t tx="PaulJackson.20060504163259">``int cpuset_open_memory_pressure(const char *cpusetpath);``
------------------------------------------------------------

    Open a file descriptor from which to read the ``memory_pressure`` of
    the cpuset ``cpusetpath``.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character,
    then this is a path relative to ``/dev/cpuset``, otherwise it is relative to the
    current task's cpuset.

    By default, computation by the kernel of ``memory_pressure`` is disabled.
    Set the ``memory_pressure_enabled`` flag in the top cpuset to enable it.

    On error, return ``-1`` and set ``errno``.

    For more information, see the `Memory Pressure` section, above.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
<t tx="PaulJackson.20060504163259.1">``int cpuset_read_memory_pressure(int fd);``
--------------------------------------------

    Read and return the current memory_pressure of the cpuset
    for which file descriptor ``fd`` was opened using ``cpuset_open_memory_pressure``.

    Uses the system call ``pread(2)``.  On success, returns a non-negative number,
    as described in section `Memory Pressure`_.  On failure, returns ``-1`` and sets
    ``errno``.

    By default, computation by the kernel of ``memory_pressure`` is disabled.
    Set the ``memory_pressure_enabled`` flag in the top cpuset to enable it.

    For more information, see the `Memory Pressure` section, above.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
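
The open, read and close pattern described above can be approximated with
plain system calls. The sketch below is a stand-in under stated assumptions,
not the **libcpuset** implementation: it opens a ``memory_pressure`` file by
its full path and reads the value with ``pread(2)`` at offset zero, so the
same descriptor can be polled repeatedly. The names ``open_pressure_file``
and ``read_pressure_fd`` are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: open a memory_pressure file for reading. */
int open_pressure_file(const char *path)
{
    return open(path, O_RDONLY);
}

/* Hypothetical helper: read the decimal number at the start of the
 * file open on 'fd', without moving the file offset.
 * Returns the value, or -1 with errno set. */
int read_pressure_fd(int fd)
{
    char buf[32];
    char *end;
    long val;
    ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);

    if (n < 0)
        return -1;
    buf[n] = '\0';
    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno != 0 || val < 0 || val > INT_MAX) {
        errno = EINVAL;            /* empty, malformed or out of range */
        return -1;
    }
    return (int)val;
}
```

Because ``pread(2)`` takes an explicit offset, no ``lseek(2)`` is needed
between successive samples.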

</t>
<t tx="PaulJackson.20060504163312">``void cpuset_close_memory_pressure(int fd);``
--------------------------------------------------

    Close the file descriptor ``fd`` which was opened using ``cpuset_open_memory_pressure``.

    If ``fd`` is not a valid open file descriptor, then this call does nothing.
    No error is returned in any case.

    By default, computation by the kernel of ``memory_pressure`` is disabled.
    Set the ``memory_pressure_enabled`` flag in the top cpuset to enable it.

    For more information, see the `Memory Pressure` section, above.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.

</t>
<t tx="PaulJackson.20060504163450">``int cpuset_collides_exclusive(const char *cpusetpath, const struct cpuset *cp);``
-----------------------------------------------------------------------------------

    Return true (1) if cpuset ``cp`` would collide with any sibling of the
    cpuset at ``cpusetpath`` due to overlap of ``cpu_exclusive`` ``cpus`` or
    ``mem_exclusive`` ``mems``.  Return false (0) if no collision, or for
    any error.

    `cpuset_create`_ fails with ``errno`` == ``EINVAL`` if the requested
    cpuset would overlap with any sibling, where either one is ``cpu_exclusive``
    or ``mem_exclusive``. This is a common and non-obvious error.
    ``cpuset_collides_exclusive()`` checks for this particular
    case, so that code creating cpusets can better identify the situation,
    perhaps to issue a more informative error message.

    Can also be used to diagnose `cpuset_modify`_ failures.  This
    routine ignores any existing cpuset with the same path as the
    given ``cpusetpath``, and only looks for exclusive collisions with
    sibling cpusets of that path.

    In case of any error, returns (0) -- does not collide.  Presumably
    any actual attempt to create or modify a cpuset will encounter the
    same error, and report it usefully.

    This routine is not particularly efficient; most likely code creating or
    modifying a cpuset will want to try the operation first, and then if that
    fails with ``errno EINVAL``, perhaps call this routine to determine if an
    exclusive cpuset collision caused the error.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
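
To make the collision rule concrete, here is a simplified model of the check
for a single pair of siblings. It is an illustration of the rule as described
above, not the library's code: the ``struct mini_cpuset`` type and the
``siblings_collide`` function are hypothetical, and CPUs and memory nodes are
reduced to 64-bit masks.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a cpuset: up to 64 CPUs and 64
 * memory nodes as plain bit masks, plus the two exclusive flags. */
struct mini_cpuset {
    uint64_t cpus;
    uint64_t mems;
    int cpu_exclusive;
    int mem_exclusive;
};

/* Return 1 if 'a' and 'b', taken as siblings, would collide: they share
 * a CPU while either is cpu_exclusive, or share a memory node while
 * either is mem_exclusive.  Return 0 otherwise. */
int siblings_collide(const struct mini_cpuset *a, const struct mini_cpuset *b)
{
    if ((a->cpu_exclusive || b->cpu_exclusive) && (a->cpus & b->cpus))
        return 1;
    if ((a->mem_exclusive || b->mem_exclusive) && (a->mems & b->mems))
        return 1;
    return 0;
}
```

A full check would apply this test against every sibling of the target path,
skipping any existing cpuset at that same path.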
</t>
<t tx="PaulJackson.20060504163613">cpuset_get_placement(pid) - [optional] Return current placement of task pid


    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.

    This function returns an opaque ``struct placement *`` pointer. The results
    of calling `cpuset_get_placement`_\ () twice at different points in a program
    can be compared by calling `cpuset_equal_placement`_\ () to determine if the
    specified task has had its cpuset CPU and memory placement modified between
    those two `cpuset_get_placement`_\ () calls.

    When finished with a ``struct placement *`` pointer, free it by calling
    `cpuset_free_placement`_\ ().

    </t>
<t tx="PaulJackson.20060504163613.1">cpuset_equal_placement(plc1, plc2) - [optional] True if two placements equal


    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.

    This function compares two ``struct placement *`` pointers, returned by two
    separate calls to `cpuset_get_placement`_\ (). This is done to determine if the
    specified task has had its cpuset CPU and memory placement modified between
    those two `cpuset_get_placement`_\ () calls.

    When finished with a ``struct placement *`` pointer, free it by calling
    `cpuset_free_placement`_\ ().

    Two ``struct placement *`` pointers will compare equal if they have the same
    CPU placement ``cpus``, the same memory placement ``mems``, and the same
    cpuset path.
    </t>
<t tx="PaulJackson.20060504163613.2">cpuset_free_placement(plc) - [optional] Free placement


    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.

    Use this routine to free a ``struct placement *`` pointer returned by a
    previous call to `cpuset_get_placement`_\ ().
</t>
<t tx="PaulJackson.20060516001414">The Linux 2.6 kernel supports additional processor and memory placement mechanisms.

* The ``sched_setaffinity(2)`` and ``sched_getaffinity(2)`` system calls set and get
  a process's CPU affinity mask, which determines the set of CPUs on which it is
  eligible to run. The ``taskset(1)`` command provides a command line utility
  for manipulating a process's CPU affinity mask using these system calls. Only
  CPUs within a task's cpuset are allowed, but CPUs are numbered using system
  wide CPU numbers, not cpuset relative numbers.

* The ``set_mempolicy(2)`` and ``get_mempolicy(2)`` system calls set and get
  the NUMA memory policy of a process and its children, which defines on
  which nodes memory is allocated for the process. The ``numactl(8)`` command
  provides a command line utility and the ``libnuma`` library provides a 'C'
  API for manipulating a process's NUMA memory policy using these system calls.
  Only memory nodes within a task's cpuset are allowed, but nodes are numbered
  using system wide node numbers, not cpuset relative numbers.

* The ``mbind(2)`` system call sets the NUMA memory policy for the pages in a specific
  range of a task's virtual address space.

Cpusets are designed to interact cleanly with these other mechanisms, to support
for example having a batch scheduler use cpusets to control the CPU and memory
placement of various jobs, while within each job, these other mechanisms are
used to manage placement in more detail. It is possible for a batch scheduler to
change a job's cpuset placement while preserving the internal CPU affinity and
NUMA memory placement policy.

The CPU and memory node placement constraints imposed by cpusets always
constrain those of these other mechanisms. You can use these other mechanisms to
reduce further the set of CPUs or memory nodes allowed to a task by cpusets, but
you cannot use these other mechanisms to escape a task's cpuset confinement.

Calls to ``sched_setaffinity(2)`` to modify a task's CPU affinity automatically
mask off CPUs that are not allowed by the affected task's cpuset. If that results
in all the requested CPUs being masked off, then the call fails with errno set
to ``EINVAL``. If some of the requested CPUs are allowed by the task's cpuset,
then the call proceeds as if only the allowed CPUs were requested, silently
ignoring the other, unallowed CPUs. If a task is moved to a different cpuset, or
if the ``cpus`` of a cpuset are changed, then the CPU affinity of the affected
task or tasks is lost. If a batch scheduler needs to preserve the CPU affinity of
the tasks in a job being moved, it should use the ``sched_getaffinity(2)`` and
``sched_setaffinity(2)`` calls to save and restore each affected task's CPU
affinity across the move, relative to the cpuset. The **cpu_set_t** mask data
type supported by the 'C' library for use with the CPU affinity calls is
different from the **libbitmask** bitmasks used by **libcpuset**, so some coding
is required to convert between the two, in order to calculate and preserve
cpuset relative CPU affinity.
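
The arithmetic behind preserving cpuset relative affinity can be sketched as
follows. Plain ``int`` arrays stand in for both **cpu_set_t** and
**libbitmask** masks, and the function names are hypothetical; this is an
illustration of the idea, not the conversion code a real scheduler would
ship.

```c
#include <stddef.h>

/* 'cpus' lists a cpuset's system wide CPU numbers in ascending order,
 * so the index into it is the cpuset relative CPU number. */

/* Map system wide CPU 'sys' to its cpuset relative number, or -1. */
int sys_to_rel(const int *cpus, size_t ncpus, int sys)
{
    size_t i;

    for (i = 0; i < ncpus; i++)
        if (cpus[i] == sys)
            return (int)i;
    return -1;
}

/* Map cpuset relative CPU 'rel' to its system wide number, or -1. */
int rel_to_sys(const int *cpus, size_t ncpus, int rel)
{
    if (rel < 0 || (size_t)rel >= ncpus)
        return -1;
    return cpus[rel];
}

/* Carry one saved CPU binding from the old cpuset to the new one,
 * keeping its cpuset relative position.  Returns -1 if the old CPU was
 * not in the old cpuset or the new cpuset is too small. */
int carry_affinity(const int *old_cpus, size_t nold,
                   const int *new_cpus, size_t nnew, int old_sys)
{
    return rel_to_sys(new_cpus, nnew, sys_to_rel(old_cpus, nold, old_sys));
}
```

For example, a thread bound to CPU 5 in a cpuset holding CPUs 4-7 sits at
relative position 1, so after a move to a cpuset holding CPUs 8-11 it would
be rebound to CPU 9.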

Similar to CPU affinity, calls to modify a task's NUMA memory policy silently
mask off requested memory nodes outside the task's allowed cpuset, and will fail
if that results in requesting an empty set of memory nodes. Unlike CPU affinity,
the NUMA memory policy system calls do not support one task querying or
modifying another task's policy. So the kernel automatically handles preserving
cpuset relative NUMA memory policy when either a task is attached to a different
cpuset, or a cpuset's ``mems`` setting is changed. If the old and new ``mems``
sets have the same size, then the cpuset relative offset of affected NUMA memory
policies is preserved. If the new ``mems`` is smaller, then the old ``mems``
relative offsets are folded onto the new ``mems``, modulo the size of the new
``mems``. If the new ``mems`` is larger, then just the first ``N`` nodes are used,
where ``N`` is the size of the old ``mems``.
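
The three cases above reduce to one expression, as the following model
shows. It is a sketch of the rule as stated here, not kernel code, and the
name ``remap_node`` is hypothetical. A relative offset taken from the old
``mems`` is always smaller than the old size, so taking it modulo the new
size leaves it unchanged whenever the new ``mems`` is the same size or
larger, and folds it only when the new ``mems`` is smaller.

```c
#include <stddef.h>

/* Hypothetical model: 'new_mems' lists the new cpuset's system wide
 * node numbers in ascending order, 'nnew' is how many there are, and
 * 'old_rel' is a node's position within the old mems.
 * Returns the system wide node it lands on, or -1 if there are none. */
int remap_node(const int *new_mems, size_t nnew, size_t old_rel)
{
    if (nnew == 0)
        return -1;
    return new_mems[old_rel % nnew];
}
```

With old ``mems`` of four nodes and new ``mems`` of nodes 8 and 9, relative
offsets 0, 1, 2 and 3 land on nodes 8, 9, 8 and 9 respectively.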
</t>
<t tx="PaulJackson.20060516113818">If a job intends to use the `Other Placement Mechanisms`_ described above, then
that job cannot be *guaranteed* safe operation under the control of a batch
scheduler if that job might be migrated to different CPUs or memory nodes. This is
because these `Other Placement Mechanisms`_ use system wide numbering of CPUs
and memory nodes, not cpuset relative numbering, and the job might be migrated
without its knowledge while it is trying to adjust its placement.

That is, between the point where such an application computes the CPU or memory
node on which it wants to place a thread, and the point where it issues the
``sched_setaffinity(2)``, ``mbind(2)`` or ``set_mempolicy(2)`` call to direct
such a placement, the thread might be migrated to a different cpuset, or its
cpuset changed to different CPUs or memory nodes, invalidating the CPU or memory
node number it just computed.

This potential race condition is not a significant problem for applications that
only use these `Other Placement Mechanisms`_ early in the job run for initial
placement setup, if the job is only migrated by a batch scheduler after it has been
running for a while.

This **libcpuset** library provides the following mechanisms to support cpuset
relative thread placement that is robust even if the job is being concurrently
migrated such as by a batch scheduler.

If a job needs to pin a thread to a single CPU, then it can use the
convenient cpuset_pin_ function. This is the most common case.

If a job needs to implement some other variation of placement, such as to
specific memory nodes, or to more than one CPU, then it can use the following
functions to safely guard such code from placement changes caused by job
migration:

  * cpuset_get_placement_
  * cpuset_equal_placement_
  * cpuset_free_placement_

The **libcpuset** functions cpuset_c_rel_to_sys_cpu_ and variations provide a
convenient means to convert between system wide and cpuset relative CPU and
memory node numbering.
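
The guard pattern these three placement functions support can be sketched
as a retry loop: record the placement, do the placement work, record the
placement again, and repeat if the two records differ. The code below is a
hypothetical model of that pattern. The ``struct snapshot`` type stands in
for the library's opaque ``struct placement``, and the two callbacks stand
in for application code; none of these names come from **libcpuset**.

```c
#include <string.h>

/* Hypothetical snapshot of a task's placement, limited to 64 CPUs
 * and 64 memory nodes for brevity. */
struct snapshot {
    unsigned long long cpus;
    unsigned long long mems;
    char path[256];
};

/* Equal when CPUs, memory nodes and cpuset path all match. */
int snapshot_equal(const struct snapshot *a, const struct snapshot *b)
{
    return a->cpus == b->cpus && a->mems == b->mems &&
           strcmp(a->path, b->path) == 0;
}

/* Call 'place' until the placement seen before and after it is the
 * same, so the system wide numbers it used were still valid when it
 * finished.  Gives up after 'max_tries' attempts.
 * Returns 0 on success, -1 on callback failure or too many retries. */
int place_guarded(int (*snap)(struct snapshot *, void *),
                  int (*place)(const struct snapshot *, void *),
                  void *arg, int max_tries)
{
    struct snapshot before, after;
    int i;

    for (i = 0; i < max_tries; i++) {
        if (snap(&before, arg) != 0)
            return -1;
        if (place(&before, arg) != 0)
            return -1;
        if (snap(&after, arg) != 0)
            return -1;
        if (snapshot_equal(&before, &after))
            return 0;              /* no migration raced with us */
    }
    return -1;
}
```

Bounding the number of retries is a design choice of this sketch; it keeps a
job that is being migrated continuously from spinning forever.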

</t>
<t tx="PaulJackson.20060516121002">Jobs that make proper use of `Cpuset Aware Thread Pinning`_, rather than the
unsafe `Other Placement Mechanisms`_, can be safely migrated to a different
cpuset, or have their cpuset's CPUs or memory nodes safely changed, without
destroying the per-thread placement done within the job.

A batch scheduler can safely migrate jobs while preserving per-thread placement
of a job that is concurrently using `Cpuset Aware Thread Pinning`_.

A batch scheduler can accomplish this with the following steps:

1. Suspend the tasks in the job, perhaps by sending their process group a SIGSTOP.
2. Use the cpuset_init_pidlist_ and related pidlist functions to determine
   the list of tasks in the job.
3. Use ``sched_getaffinity(2)`` to query the CPU affinity of each task in the job.
4. Create a new cpuset, under a temporary name, with the new desired CPU and
   memory placement.
5. Invoke cpuset_migrate_all_ to move the job's tasks from the old cpuset to the new.
6. Use cpuset_delete_ to delete the old cpuset.
7. Use ``rename(2)`` on the ``/dev/cpuset`` based path of the new temporary cpuset to rename
   that cpuset to the old cpuset name.
8. Convert the results of the previous ``sched_getaffinity(2)`` calls to the new
   cpuset placement, preserving cpuset relative offset by using the cpuset_c_rel_to_sys_cpu_
   and related functions.
9. Use ``sched_setaffinity(2)`` to reestablish the per-task CPU binding of each
   thread in the job.
10. Resume the tasks in the job, perhaps by sending their process group a SIGCONT.

The ``sched_getaffinity(2)`` and ``sched_setaffinity(2)`` 'C' library
calls are limited by 'C' library internals to systems with 1024 CPUs or fewer.
To write code that will work on larger systems, one should use the
``syscall(2)`` indirect system call wrapper to directly invoke the underlying
system call, bypassing the 'C' library API for these calls. Perhaps in the
future the **libcpuset** library will provide functions that make it easier for
a batch scheduler to obtain, migrate, and set a task's CPU affinity.
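
One way to invoke the underlying system call directly is sketched below. It
assumes a Linux system where ``SYS_sched_getaffinity`` is defined in
``sys/syscall.h``, and that the raw call fails with ``EINVAL`` when the
supplied buffer is smaller than the kernel's CPU mask. The helper name
``get_affinity_any_size`` and the growth policy are choices of this
example, not part of **libcpuset**.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the CPU affinity mask of 'pid' (0 means
 * the caller) into a freshly allocated array of unsigned long words,
 * doubling the buffer until the kernel accepts it.  On success returns
 * 0, stores the buffer in *maskp (free(3) it when done) and the number
 * of valid bytes in *lenp.  On failure returns -1 with errno set. */
int get_affinity_any_size(pid_t pid, unsigned long **maskp, size_t *lenp)
{
    size_t nbytes = 128;           /* room for 1024 CPUs to start with */

    while (nbytes <= ((size_t)1 << 20)) {
        unsigned long *mask = calloc(1, nbytes);
        long ret;
        int saved;

        if (mask == NULL)
            return -1;
        ret = syscall(SYS_sched_getaffinity, pid, nbytes, mask);
        if (ret >= 0) {
            *maskp = mask;
            *lenp = (size_t)ret;   /* bytes the kernel filled in */
            return 0;
        }
        saved = errno;
        free(mask);
        if (saved != EINVAL) {
            errno = saved;
            return -1;
        }
        nbytes *= 2;               /* buffer too small for this system */
    }
    errno = EINVAL;
    return -1;
}
```

The matching ``SYS_sched_setaffinity`` call takes the same three arguments,
so the returned length can be reused when restoring the mask.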

The suspend and resume are required in order to keep tasks in the job from
changing their per-thread CPU placement between step 3 and step 9. The kernel
automatically migrates the per-thread memory node placement during step 5, which
it has to do, as there is no way for one task to modify the NUMA memory placement
policy of another task. But the kernel does not automatically migrate the
per-thread CPU placement, as that can be handled by a user level process such
as a batch scheduler doing the migration, as above.

Migrating a job from a larger cpuset (more CPUs or nodes) to a smaller cpuset
will lose placement information, and subsequently moving that job back to a
larger cpuset will not recover that information. The loss of information about
the job's CPU affinity can be avoided as described above, using
``sched_getaffinity(2)`` and ``sched_setaffinity(2)`` to save and restore the
placement (affinity) across such a pair of moves. The loss of information about
the job's NUMA memory placement cannot be avoided, because one task (the one
doing the migration) can neither save nor restore the NUMA memory placement
policy of another. So if a batch scheduler wants to migrate jobs without causing
them to lose their ``mbind(2)`` or ``set_mempolicy(2)`` placement, it should
only migrate to cpusets with at least as many memory nodes as the original
cpuset.
</t>
<t tx="PaulJackson.20061113170659">``int cpuset_nuke(const char *cpusetpath, unsigned int seconds);``
------------------------------------------------------------------

    Remove a cpuset, including killing tasks in it, and
    removing any descendant cpusets and killing their tasks.

    Tasks can take a long time (minutes on some configurations)
    to exit.  Loop up to ``seconds`` seconds, trying to kill them.

    The following steps are taken to remove a cpuset:

        1. First, kill all the pids, looping until there are
           no more pids in this cpuset or below, or until the
           'seconds' timeout limit is exceeded.
        2. Then depth first recursively rmdir the cpuset directories.
        3. If by this point the original cpuset is gone, return success.

    If the timeout is exceeded, and tasks still exist, fail with
    errno == ETIME.

    This routine sleeps a variable amount of time.  After the first attempt to
    kill all the tasks in the cpuset or its descendants, it sleeps one
    second, the next time two seconds, increasing one second each loop
    up to a max of ten seconds.  If more loops past ten are required
    to kill all the tasks, it sleeps ten seconds each subsequent loop.
    In any case, before the last loop, it sleeps however many seconds
    remain of the original timeout ``seconds`` requested.  The total
    time of all sleeps will be no more than the requested ``seconds``.

    If the cpuset started out empty of any tasks, or if the passed in
    ``seconds`` was zero, then this routine will return quickly, having
    not slept at all.  Otherwise, this routine will at a minimum send
    a ``SIGKILL`` to all the tasks in this cpuset subtree, then sleep one
    second, before looking to see if any tasks remain.  If tasks remain
    in the cpuset subtree, and a longer ``seconds`` timeout was requested
    (more than one), it will continue to kill remaining tasks and sleep,
    in a loop, for as long as time and tasks remain.

    The signal sent for the kill is hardcoded to ``SIGKILL``.  If some
    other signal should be sent first, use a separate code loop,
    perhaps based on cpuset_init_pidlist and cpuset_get_pidlist, to
    scan the task pids in a cpuset.  If ``SIGKILL`` should -not- be sent,
    this ``cpuset_nuke()`` routine can still be called to recursively
    remove a cpuset subtree, by specifying a timeout of zero ``seconds``.

    On success, returns 0 with ``errno`` == 0.

    On failure, returns -1, setting ``errno``.

    ERRORS

     ``EACCES``  search permission denied on intervening directory

     ``ETIME``   timed out - tasks remain after 'seconds' timeout

     ``EMFILE``  too many open files

     ``ENODEV``  /dev/cpuset not mounted

     ``ENOENT``  component of cpuset path doesn't exist

     ``ENOMEM``  out of memory

     ``ENOSYS``  kernel doesn't support cpusets

     ``ENOTDIR`` component of cpuset path is not a directory

     ``EPERM``   lacked permission to kill a task

     ``EPERM``   lacked permission to read cpusets or files therein

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
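
The sleep schedule described for ``cpuset_nuke()`` can be modeled with two
small functions. This is a reading of the prose above, not the library
source, and the names ``nuke_sleep`` and ``nuke_total_sleep`` are
hypothetical.

```c
/* 'loop' counts kill attempts starting from 1; 'remaining' is what is
 * left of the 'seconds' budget.  Returns how long to sleep after that
 * attempt: 1, 2, ... up to 10 seconds, but never more than remains. */
unsigned int nuke_sleep(unsigned int loop, unsigned int remaining)
{
    unsigned int want = loop < 10 ? loop : 10;

    return want < remaining ? want : remaining;
}

/* Total time slept if tasks never exit within a budget of 'seconds'. */
unsigned int nuke_total_sleep(unsigned int seconds)
{
    unsigned int loop, slept = 0;

    for (loop = 1; slept < seconds; loop++)
        slept += nuke_sleep(loop, seconds - slept);
    return slept;
}
```

With a budget of four seconds the model sleeps 1, 2 and then 1 second, so
the total never exceeds the requested timeout.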
</t>
<t tx="PaulJackson.20061114031653">``void cpuset_fts_reverse(struct cpuset_fts_tree *cs_tree);``
-------------------------------------------------------------

    Reverse order of ``cs_entry``'s in the ``cpuset_fts_tree`` ``cs_tree``
    obtained from a `cpuset_fts_open`_\ () call.

    An open ``cpuset_fts_tree`` stores a list of ``cs_entry`` cpuset entries, in
    pre-order, meaning that a series of `cpuset_fts_read`_\ () calls will always
    return a parent cpuset before any of its child cpusets.  Following a
    `cpuset_fts_reverse`_\ () call, the order of cpuset entries is reversed, putting
    it in post-order, so that a series of `cpuset_fts_read`_\ () calls will always
    return any children cpusets before their parent cpuset.  A second
    `cpuset_fts_reverse`_\ () call would put the list back in pre-order again.

    To avoid exposing confusing inner details of the implementation across the
    API, a `cpuset_fts_rewind`_\ () call is always automatically performed on a
    ``cpuset_fts_tree`` whenever `cpuset_fts_reverse`_\ () is called on it.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
<t tx="PaulJackson.20061114034532">``void cpuset_fts_rewind(struct cpuset_fts_tree *cs_tree);``
-------------------------------------------------------------

    Rewind a cpuset tree ``cs_tree`` obtained from a `cpuset_fts_open`_\ ()
    call, so that subsequent `cpuset_fts_read`_\ () calls start from the
    beginning again.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164635">``struct cpuset_fts_tree *cpuset_fts_open(const char *cpusetpath);``
--------------------------------------------------------------------

    Open a cpuset hierarchy. Returns a pointer to a ``cpuset_fts_tree``
    structure, which can be used to traverse all cpusets below the specified
    cpuset ``cpusetpath``.

    If the parameter ``cpusetpath`` starts with a slash (``/``) character, then
    this path is relative to ``/dev/cpuset``, otherwise it is relative to the
    current task's cpuset.

    The `cpuset_fts_open`_ routine is implemented internally using the
    ``fts(3)`` library routines for traversing a file hierarchy. The entire
    cpuset subtree below ``cpusetpath`` is traversed as part of the
    `cpuset_fts_open`_\ () call, and all cpuset state and directory stat
    information is captured at that time. The other ``cpuset_fts_*`` routines
    just access this captured state. Any changes to the traversed cpusets made
    after the return of the `cpuset_fts_open`_\ () call will not be visible via
    the returned ``cpuset_fts_tree`` structure.

    Internally, the ``fts(3)`` options ``FTS_NOCHDIR`` and ``FTS_XDEV`` are
    used, to avoid changing the invoking task's current directory, and to avoid
    descending into any other file systems mounted below ``/dev/cpuset``. The
    order in which cpusets will be returned by the `cpuset_fts_read`_ routine
    corresponds to the fts ``pre-order`` (FTS_D) visitation order. The internal
    fts scan by `cpuset_fts_open`_ ignores the ``post-order`` (FTS_DP)
    results.

    Because the `cpuset_fts_open`_\ () call collects all the information at once
    from an entire cpuset subtree, a simple error return would not provide
    sufficient information as to what failed, and on what cpuset in the subtree.
    So, except for ``malloc(3)`` failures, errors are captured in the list of
    entries.

    See `cpuset_fts_get_info`_ for details of the ``info`` field.
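
For readers unfamiliar with ``fts(3)``, the following sketch walks a
directory tree with the options named above and visits only the pre-order
(``FTS_D``) directory entries. It is an illustration, not the library
source: the function name ``count_dirs_preorder`` is hypothetical, and the
``FTS_PHYSICAL`` flag is an assumption added because ``fts_open`` requires
either it or ``FTS_LOGICAL``.

```c
#include <fts.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch: visit every directory at or below 'root' in
 * pre-order, printing each path to 'out' when 'out' is not NULL.
 * Returns the number of directories seen, or -1 if fts_open fails. */
int count_dirs_preorder(const char *root, FILE *out)
{
    char *paths[2];
    FTS *fts;
    FTSENT *ent;
    int n = 0;

    paths[0] = (char *)root;       /* fts_open does not modify the string */
    paths[1] = NULL;
    fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR | FTS_XDEV, NULL);
    if (fts == NULL)
        return -1;
    while ((ent = fts_read(fts)) != NULL) {
        if (ent->fts_info != FTS_D)
            continue;              /* skip files and post-order (FTS_DP) visits */
        if (out != NULL)
            fprintf(out, "%s\n", ent->fts_path);
        n++;
    }
    fts_close(fts);
    return n;
}
```

Calling it on ``/dev/cpuset`` with ``stdout`` would list each cpuset
directory with parents before children, assuming the cpuset file system is
mounted there.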

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164646">``const struct cpuset_fts_entry *cpuset_fts_read(struct cpuset_fts_tree *cs_tree);``
------------------------------------------------------------------------------------

    Returns the next ``cs_entry`` in ``cpuset_fts_tree`` ``cs_tree`` obtained from a
    `cpuset_fts_open`_\ () call. One ``cs_entry`` is returned for each cpuset
    directory that was found in the subtree scanned by the `cpuset_fts_open`_\ ()
    call.  Use the ``info`` field obtained from a `cpuset_fts_get_info`_\ ()
    call to determine which fields of a particular ``cs_entry`` are valid,
    and which fields contain error information or are not valid.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164725">``const char *cpuset_fts_get_path(const struct cpuset_fts_entry *cs_entry);``
-----------------------------------------------------------------------------

    Return the cpuset path, relative to ``/dev/cpuset``, as a nul-terminated
    string, of a ``cs_entry`` obtained from a `cpuset_fts_read`_\ () call.

    The results of this call are valid for all ``cs_entry``'s returned from
    `cpuset_fts_read`_\ () calls, regardless of the value returned by `cpuset_fts_get_info`_\ ()
    for that ``cs_entry``.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164734">``const struct stat *cpuset_fts_get_stat(const struct cpuset_fts_entry *cs_entry);``
------------------------------------------------------------------------------------

    Return pointer to ``stat(2)`` information about the cpuset directory of a
    ``cs_entry`` obtained from a `cpuset_fts_read`_\ () call.

    The results of this call are valid for all ``cs_entry``'s returned from
    `cpuset_fts_read`_\ () calls, regardless of the value returned by `cpuset_fts_get_info`_\ ()
    for that ``cs_entry``, **except** in the cases that:

      * the ``info`` field returned by `cpuset_fts_get_info`_ contains
        ``CPUSET_FTS_ERR_DNR``, in which case, a directory in the path to the
        cpuset could not be read and this call will return a ``NULL`` pointer, or

      * the ``info`` field returned by `cpuset_fts_get_info`_ contains
        ``CPUSET_FTS_ERR_STAT``, in which case a ``stat(2)`` failed on this cpuset
        directory and this call will return a pointer to a ``struct stat``
        containing all zeros.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164741">``int cpuset_fts_get_errno(const struct cpuset_fts_entry *cs_entry);``
----------------------------------------------------------------------


    Return the ``err`` field of a ``cs_entry`` obtained from a
    `cpuset_fts_read`_\ () call.

    If an entry (obtained from `cpuset_fts_read`_) has one of the
    ``CPUSET_FTS_ERR_*`` values in the ``info`` field (as described in
    `cpuset_fts_get_info`_), then this ``err`` field captures the failing
    ``errno`` value for that operation. If an entry has the value
    ``CPUSET_FTS_CPUSET`` in its ``info`` field, then this ``err`` field will
    have the value ``0``.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164747">``const struct cpuset *cpuset_fts_get_cpuset(const struct cpuset_fts_entry *cs_entry);``
----------------------------------------------------------------------------------------

    Return the ``struct cpuset`` pointer of a ``cs_entry`` obtained
    from a `cpuset_fts_read`_\ () call.  The ``struct cpuset`` so referenced
    describes the cpuset represented by one directory in the cpuset
    hierarchy, and can be used with various other calls in this library.

    The result of this call is valid only if `cpuset_fts_get_info`_\ ()
    returns ``CPUSET_FTS_CPUSET`` for that ``cs_entry``.  If the ``info``
    field contains ``CPUSET_FTS_ERR_CPUSET``, then `cpuset_fts_get_cpuset`_
    returns a pointer to a ``struct cpuset`` that is all zeros.  If the
    ``info`` field contains any other ``CPUSET_FTS_ERR_*`` value, then
    `cpuset_fts_get_cpuset`_ returns a ``NULL`` pointer.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164753">``int cpuset_fts_get_info(const struct cpuset_fts_entry *cs_entry);``
---------------------------------------------------------------------

    Return the ``info`` field of a ``cs_entry`` obtained from a
    `cpuset_fts_read`_\ () call.

    If this ``info`` field has one of the following ``CPUSET_FTS_ERR_*`` values,
    then it indicates which operation failed, the ``err`` field (returned by
    `cpuset_fts_get_errno`_) captures the failing ``errno`` value for that
    operation, the ``path`` field (returned by `cpuset_fts_get_path`_) indicates
    which cpuset failed, and some of the other entry fields may not be valid,
    depending on the value. If an entry has the value ``CPUSET_FTS_CPUSET`` for its
    ``info`` field, then the ``err`` field will have the value ``0``, and the
    other fields will contain valid information about that cpuset.

    ``info`` field values:

        ``CPUSET_FTS_CPUSET = 0``:
            Valid cpuset
        ``CPUSET_FTS_ERR_DNR = 1``:
            Error - couldn't read directory
        ``CPUSET_FTS_ERR_STAT = 2``:
            Error - couldn't stat directory
        ``CPUSET_FTS_ERR_CPUSET = 3``:
            Error - `cpuset_query`_ failed

    The above ``info`` field values are defined using an anonymous *enum* in the
    ``cpuset.h`` header file. If it is necessary to maintain source code
    compatibility with earlier versions of the ``cpuset.h`` header file lacking
    the above ``CPUSET_FTS_*`` values, one can conditionally check that the ``C``
    preprocessor symbol ``CPUSET_FTS_INFO_VALUES_DEFINED`` is not defined
    and provide alternative coding for that case.
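
    As a minimal sketch of that conditional fallback, the fragment below
    supplies the values itself when ``CPUSET_FTS_INFO_VALUES_DEFINED`` is not
    defined, then maps an ``info`` value to a short description.  The helper
    name ``fts_info_name`` is hypothetical and not part of ``libcpuset``; in a
    real build the ``cpuset.h`` header would be included before the guard.

```c
/* Fallback for builds against an older cpuset.h that lacks the
 * CPUSET_FTS_* values; the numbers mirror the list above. */
#ifndef CPUSET_FTS_INFO_VALUES_DEFINED
enum {
    CPUSET_FTS_CPUSET = 0,      /* valid cpuset */
    CPUSET_FTS_ERR_DNR = 1,     /* couldn't read directory */
    CPUSET_FTS_ERR_STAT = 2,    /* couldn't stat directory */
    CPUSET_FTS_ERR_CPUSET = 3   /* cpuset_query failed */
};
#endif

/* Hypothetical helper (not part of libcpuset): describe an info value
 * such as the one returned by cpuset_fts_get_info(). */
static const char *fts_info_name(int info)
{
    switch (info) {
    case CPUSET_FTS_CPUSET:
        return "valid cpuset";
    case CPUSET_FTS_ERR_DNR:
        return "couldn't read directory";
    case CPUSET_FTS_ERR_STAT:
        return "couldn't stat directory";
    case CPUSET_FTS_ERR_CPUSET:
        return "cpuset_query failed";
    default:
        return "unknown info value";
    }
}
```

    A traversal loop could then log
    ``fts_info_name(cpuset_fts_get_info(cs_entry))`` alongside the path of any
    entry whose ``info`` value is not ``CPUSET_FTS_CPUSET``.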

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061115164906">``void cpuset_fts_close(struct cpuset_fts_tree *cs_tree);``
-----------------------------------------------------------

    Close a ``cs_tree`` obtained from a `cpuset_fts_open`_\ () call, freeing
    any internally allocated memory for that ``cs_tree``.

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.</t>
<t tx="PaulJackson.20061116000754">``int cpuset_move_cpuset_tasks(const char *fromrelpath, const char *torelpath);``
-----------------------------------------------------------------------------------------------------

    Move all tasks in cpuset ``fromrelpath`` to cpuset ``torelpath``. This may
    race with tasks being added to or forking into ``fromrelpath``. Loop
    repeatedly, reading the tasks file of cpuset ``fromrelpath`` and writing any
    task pids found there to the tasks file of cpuset ``torelpath``, up to ten
    attempts, or until the ``tasks`` file of cpuset ``fromrelpath`` is empty, or
    until the cpuset ``fromrelpath`` is no longer present.

    Returns 0 with errno == 0 if able to empty the tasks file of cpuset
    ``fromrelpath``. Of course it is still possible that some independent task
    could add another task to cpuset ``fromrelpath`` at the same time that such
    a successful result is being returned, so there can be no guarantee that a
    successful return means that ``fromrelpath`` is still empty of tasks.

    The cpuset ``fromrelpath`` might disappear during this operation, perhaps
    because it has ``notify_on_release`` set and was automatically removed as
    soon as its last task was detached from it. Consider a missing
    ``fromrelpath`` to be a successful move.
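
    The control flow described above can be modelled roughly as follows.  This
    is only an illustration of the retry policy, not the library's source: the
    callback type ``move_pass_fn``, the helper ``move_with_retries`` and the
    stand-in ``fake_pass`` are all hypothetical names, and the real function
    additionally sets ``errno``.

```c
/* Hypothetical callback: perform one read-and-rewrite pass and report
 * how many tasks are still in the source cpuset afterwards, or -1 if
 * the source cpuset no longer exists. */
typedef int (*move_pass_fn)(void *ctx);

/* Returns 0 once the source is empty or gone, -1 if tasks remain after
 * max_attempts passes (the case the library reports as ENOTEMPTY). */
static int move_with_retries(move_pass_fn pass, void *ctx, int max_attempts)
{
    int attempts_left;

    for (attempts_left = max_attempts; attempts_left > 0; attempts_left--) {
        int remaining = pass(ctx);

        if (remaining == -1)
            return 0;   /* source vanished: treated as a successful move */
        if (remaining == 0)
            return 0;   /* tasks file is empty */
    }
    return -1;
}

/* Stand-in pass for experimentation: ctx points at an int task count,
 * and each pass manages to move at most one task. */
static int fake_pass(void *ctx)
{
    int *tasks = ctx;

    if (*tasks > 0)
        *tasks -= 1;
    return *tasks;
}
```

    With ``max_attempts`` set to ten, a source holding three tasks drains
    within the limit, while one holding fifty still has tasks left after the
    last pass and the helper reports failure.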

    If called with ``fromrelpath`` and ``torelpath`` pathnames that evaluate to
    the same cpuset, then treat that as if `cpuset_reattach`_\ () was called,
    rebinding each task in this cpuset one time, and return success or failure
    depending on the return of that `cpuset_reattach`_\ () call.

    On failure, returns -1, setting ``errno``.

    ERRORS

     ``EACCES``     search permission denied on intervening directory

     ``ENOTEMPTY``  tasks remain after multiple attempts to move them

     ``EMFILE``     too many open files

     ``ENODEV``     /dev/cpuset not mounted

     ``ENOENT``     component of cpuset path doesn't exist

     ``ENOMEM``     out of memory

     ``ENOSYS``     kernel doesn't support cpusets

     ``ENOTDIR``    component of cpuset path is not a directory

     ``EPERM``      lacked permission to read cpusets or files therein

     ``EACCES``     lacked permission to write a cpuset ``tasks`` file

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.


    </t>
<t tx="PaulJackson.20061116185722">``int cpuset_version();``
-------------------------

    Version (simple integer) of the cpuset library (``libcpuset``). The version
    number returned by `cpuset_version`_\ () is incremented any time that
    changes or additions are made to its API or behaviour.  Other mechanisms are
    provided to maintain full upward compatibility with this library's API.  This
    `cpuset_version`_\ () call is intended to provide a fallback mechanism in case
    an application needs to distinguish between two previous versions of this
    library.
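
    Because the function is optional, a caller would typically look it up by
    name before calling it.  The sketch below shows that pattern with the
    lookup passed in as a parameter, so that it can be tried without the
    library.  It assumes `cpuset_function`_ takes a function name and returns
    a ``void *`` that is ``NULL`` when the function is missing; the helper
    ``optional_cpuset_version`` and the ``fake_*`` and ``missing_lookup``
    stand-ins are hypothetical.

```c
/* Shape assumed for cpuset_function(): look up a function by name and
 * return a null pointer when the library does not provide it. */
typedef void *(*lookup_fn)(const char *name);

/* Hypothetical helper: return the library version, or -1 when
 * cpuset_version() is not available through the lookup. */
static int optional_cpuset_version(lookup_fn lookup)
{
    int (*version)(void);
    void *sym = lookup("cpuset_version");

    if (sym == 0)
        return -1;
    version = (int (*)(void))sym;   /* same cast idiom as with dlsym(3) */
    return version();
}

/* Stand-ins so the helper can be exercised without libcpuset; the
 * version number 3 is arbitrary. */
static int fake_version(void)
{
    return 3;
}

static void *fake_lookup(const char *name)
{
    (void)name;
    return (void *)fake_version;
}

static void *missing_lookup(const char *name)
{
    (void)name;
    return 0;
}
```

    With the real library the call would presumably be
    ``optional_cpuset_version(cpuset_function)``, and a result of ``-1`` would
    indicate a library too old to provide ``cpuset_version`` at all.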

    This is an **[optional]** function.   Use `cpuset_function`_ to invoke it.
</t>
</tnodes>
</leo_file>