File: detail-placement-rankfiles.rst

.. -*- rst -*-

   Copyright (c) 2022-2023 Nanook Consulting.  All rights reserved.
   Copyright (c) 2023      Jeffrey M. Squyres.  All rights reserved.

   $COPYRIGHT$

   Additional copyrights may follow

   $HEADER$

.. The following line is included so that Sphinx won't complain
   about this file not being directly included in some toctree

Rankfiles
=========

Another way to specify arbitrary mappings is with a rankfile, which
gives you detailed control over process binding as well.

Rankfiles are text files that specify detailed information about how
individual processes should be mapped to nodes, and to which
processor(s) they should be bound. Each line of a rankfile specifies
the location of one process. The general form of each line in the
rankfile is:

.. code::

   rank <N>=<hostname> slot=<slot list>

For example:

.. code::

   $ cat myrankfile
   rank 0=aa slot=10-12
   rank 1=bb slot=0,1,4
   rank 2=cc slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

This means that:

* Rank 0 runs on node aa, bound to logical cores 10-12.
* Rank 1 runs on node bb, bound to logical cores 0, 1, and 4.
* Rank 2 runs on node cc, bound to logical cores 1 and 2.

Similarly:

.. code::

   $ cat myrankfile
   rank 0=aa slot=1:0-2
   rank 1=bb slot=0:0,1,4
   rank 2=cc slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

This means that:

* Rank 0 runs on node aa, bound to cores 0-2 of logical package 1 (the
  0th through 2nd cores on that package; these would be node-wide
  logical cores 10-12 if each package had 10 cores).
* Rank 1 runs on node bb, bound to logical package 0, cores 0, 1,
  and 4.
* Rank 2 runs on node cc, bound to logical cores 1 and 2.
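
The two slot forms shown above can be illustrated with a small Python
sketch that splits a rankfile line into its rank, hostname, optional
package, and core list. This is only an illustration of the syntax
used in these examples, not the parser that ``prun`` uses; the names
``LINE_RE``, ``expand_cores``, and ``parse_rank_line`` are
hypothetical, and the real rankfile syntax may accept forms that this
sketch rejects.

```python
import re

# Hypothetical pattern for "rank <N>=<hostname> slot=<slot list>".
# It covers only the forms shown in the examples above.
LINE_RE = re.compile(r"^rank\s+(\d+)=(\S+)\s+slot=(\S+)$")


def expand_cores(spec):
    """Expand a core list such as "0,1,4" or "10-12" into a list of ints."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            # An inclusive range, e.g. "10-12" -> 10, 11, 12.
            first, last = part.split("-", 1)
            cores.extend(range(int(first), int(last) + 1))
        else:
            cores.append(int(part))
    return cores


def parse_rank_line(line):
    """Return (rank, host, package, cores); package is None if not given."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a rankfile line: {line!r}")
    rank, host, slot = match.groups()
    package = None
    if ":" in slot:
        # "<package>:<cores>" form, e.g. "1:0-2".
        package_text, slot = slot.split(":", 1)
        package = int(package_text)
    return int(rank), host, package, expand_cores(slot)


if __name__ == "__main__":
    for text in (
        "rank 0=aa slot=1:0-2",
        "rank 1=bb slot=0:0,1,4",
        "rank 2=cc slot=1-2",
    ):
        print(parse_rank_line(text))
```

In this sketch, a line without a package prefix yields ``None`` for
the package, mirroring the third line of the example, where only
logical cores are given.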

The hostnames listed above are "absolute," meaning that actual
resolvable hostnames are specified. However, hostnames can also be
specified as "relative," meaning that they are specified in relation
to an externally-specified list of hostnames (e.g., by ``prun``'s
``--host`` argument, a hostfile, or a job scheduler).

The "relative" specification is of the form "``+n<X>``", where ``X``
is an integer specifying the Xth hostname in the set of all available
hostnames, indexed from 0. For example:

.. code::

   $ cat myrankfile
   rank 0=+n0 slot=10-12
   rank 1=+n1 slot=0,1,4
   rank 2=+n2 slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out
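
As a rough illustration of how a relative name maps to a real one,
the following Python sketch resolves ``+n<X>`` against a list of
hostnames. It assumes that the available hostnames are ordered exactly
as given to ``--host`` in the example above; the actual ordering is
determined by the runtime, and ``RELATIVE_RE`` and ``resolve_host``
are hypothetical names, not part of any Open MPI API.

```python
import re

# Hypothetical matcher for the relative form "+n<X>".
RELATIVE_RE = re.compile(r"^\+n(\d+)$")


def resolve_host(name, available_hosts):
    """Map "+n<X>" to the Xth available host (0-indexed).

    Names that do not use the relative form are treated as absolute
    hostnames and returned unchanged.
    """
    match = RELATIVE_RE.match(name)
    if match is None:
        return name
    index = int(match.group(1))
    if index >= len(available_hosts):
        raise ValueError(
            f"{name} is out of range for {len(available_hosts)} hosts"
        )
    return available_hosts[index]


if __name__ == "__main__":
    # Assumption: same order as "--host aa,bb,cc,dd" in the example.
    hosts = ["aa", "bb", "cc", "dd"]
    for name in ("+n0", "+n1", "+n2"):
        print(name, "->", resolve_host(name, hosts))
```

Under that ordering assumption, the rankfile above places ranks 0, 1,
and 2 on nodes aa, bb, and cc, matching the first example.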

All package/core slot locations are specified as *logical*
indexes. You can use tools such as HWLOC's ``lstopo`` to find the
logical indexes of packages and cores.