.. -*- rst -*-

   Copyright (c) 2022-2023 Nanook Consulting.  All rights reserved.
   Copyright (c) 2023      Jeffrey M. Squyres.  All rights reserved.

   $COPYRIGHT$

   Additional copyrights may follow

   $HEADER$

.. The following line is included so that Sphinx won't complain
   about this file not being directly included in some toctree

Diagnostics
===========

PRRTE provides various diagnostic reports that aid the user in
verifying and tuning the mapping/ranking/binding for a specific job.

The ``:REPORT`` qualifier to the ``--bind-to`` command line option can
be used to report process bindings.

As an example, consider a node with:

* 2 processor packages,
* 10 cores per package, and
* 8 hardware threads per core.

In each of the examples below, the binding is reported in a
human-readable format.

.. code::

   $ prun --np 4 --map-by core --bind-to core:REPORT ./a.out
   [node01:103137] MCW rank 0 bound to package[0][core:0]
   [node01:103137] MCW rank 1 bound to package[0][core:1]
   [node01:103137] MCW rank 2 bound to package[0][core:2]
   [node01:103137] MCW rank 3 bound to package[0][core:3]

In the example above, processes are bound to successive cores on the
first package.
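
Because the report lines have a regular shape, they can be checked
mechanically. The following is a minimal sketch, assuming the line
format shown in the examples on this page (it is inferred from that
sample output, not from a documented stable format). The helper name
``parse_binding`` is hypothetical and not part of PRRTE.

.. code:: python

   import re

   # Matches lines such as:
   #   [node01:103137] MCW rank 0 bound to package[0][core:0]
   # The pattern is inferred from the sample output on this page.
   BOUND_RE = re.compile(
       r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] MCW rank (?P<rank>\d+) "
       r"bound to package\[(?P<package>\d+)\]"
       r"\[(?P<unit>core|hwt):(?P<first>\d+)(?:-(?P<last>\d+))?\]"
   )


   def parse_binding(line):
       """Return a dict describing one report line, or None if it does not match."""
       m = BOUND_RE.search(line)
       if m is None:
           return None
       first = int(m.group("first"))
       # A single id such as "core:2" has no "-last" part.
       last = int(m.group("last")) if m.group("last") is not None else first
       return {
           "host": m.group("host"),
           "rank": int(m.group("rank")),
           "package": int(m.group("package")),
           "unit": m.group("unit"),
           "ids": list(range(first, last + 1)),
       }


   if __name__ == "__main__":
       sample = "[node01:103137] MCW rank 2 bound to package[0][core:2]"
       print(parse_binding(sample))

Lines that report an unbound process do not match the pattern, so
``parse_binding`` returns ``None`` for them.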

.. code::

   $ prun --np 4 --map-by package --bind-to package:REPORT ./a.out
   [node01:103115] MCW rank 0 bound to package[0][core:0-9]
   [node01:103115] MCW rank 1 bound to package[1][core:10-19]
   [node01:103115] MCW rank 2 bound to package[0][core:0-9]
   [node01:103115] MCW rank 3 bound to package[1][core:10-19]

In the example above, each process is bound to all cores of one
package, and processes are assigned to packages in round-robin order.

.. code::

   $ prun --np 4 --map-by package:PE=2 --bind-to core:REPORT ./a.out
   [node01:103328] MCW rank 0 bound to package[0][core:0-1]
   [node01:103328] MCW rank 1 bound to package[1][core:10-11]
   [node01:103328] MCW rank 2 bound to package[0][core:2-3]
   [node01:103328] MCW rank 3 bound to package[1][core:12-13]

The example above shows that each process is bound to 2 cores.
The ``:PE=2`` qualifier requests 2 CPUs beneath the package for each
process; in this case those CPUs are cores.
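
The two package-mapped reports above can be reproduced with a small
arithmetic model. This is only a sketch of the pattern visible in the
output, not PRRTE's mapping code. It assumes the topology implied by
the reported core ranges (cores 0-9 on package 0, cores 10-19 on
package 1), and ``cores_for_rank`` is a hypothetical helper.

.. code:: python

   # Assumed topology, inferred from the core ranges in the reports above.
   NUM_PACKAGES = 2
   CORES_PER_PACKAGE = 10


   def cores_for_rank(rank, pe=None):
       """Model round-robin mapping by package.

       With pe=None the process is bound to every core of its package.
       Otherwise it receives the next `pe` consecutive cores on its package.
       """
       package = rank % NUM_PACKAGES
       base = package * CORES_PER_PACKAGE
       if pe is None:
           return package, list(range(base, base + CORES_PER_PACKAGE))
       # Number of lower ranks that were already placed on this package.
       slot = rank // NUM_PACKAGES
       start = base + slot * pe
       if start + pe > base + CORES_PER_PACKAGE:
           raise ValueError("not enough cores left on the package for this rank")
       return package, list(range(start, start + pe))


   if __name__ == "__main__":
       for rank in range(4):
           print(rank, cores_for_rank(rank, pe=2))

Calling ``cores_for_rank(rank)`` without ``pe`` models the
``--bind-to package`` report, and ``pe=2`` models the
``--map-by package:PE=2`` report.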

.. code::

   $ prun --np 4 --map-by core:PE=2:HWTCPUS --bind-to :REPORT hostname
   [node01:103506] MCW rank 0 bound to package[0][hwt:0-1]
   [node01:103506] MCW rank 1 bound to package[0][hwt:8-9]
   [node01:103506] MCW rank 2 bound to package[0][hwt:16-17]
   [node01:103506] MCW rank 3 bound to package[0][hwt:24-25]

The example above shows that each process is bound to 2 hardware
threads.  Here ``prun`` directs the DVM to map by hardware threads
because the ``:HWTCPUS`` qualifier was given. Without that qualifier
the command would return an error, since by default the DVM does not
map to resources smaller than a core.  The ``:PE=2`` qualifier
requests 2 processing elements beneath the core (hardware threads in
this case) for each process.
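
The hardware-thread numbering in this report follows from the node
description: with 8 hardware threads per core, core ``c`` owns threads
``8*c`` through ``8*c + 7``. A minimal sketch of that arithmetic,
assuming one core per rank as the output suggests
(``hwthreads_for_rank`` is a hypothetical helper, not a PRRTE
function):

.. code:: python

   # Taken from the node description: 8 hardware threads per core.
   HWT_PER_CORE = 8


   def hwthreads_for_rank(rank, pe=2):
       """Model the report above: rank r sits on core r and is bound to
       the first `pe` hardware threads of that core."""
       if not 1 <= pe <= HWT_PER_CORE:
           raise ValueError("pe must be between 1 and HWT_PER_CORE")
       first = rank * HWT_PER_CORE
       return list(range(first, first + pe))


   if __name__ == "__main__":
       for rank in range(4):
           print(rank, hwthreads_for_rank(rank))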

.. code::

   $ prun --np 4 --bind-to none:REPORT hostname
   [node01:107126] MCW rank 0 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 1 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 2 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 3 is not bound (or bound to all available processors)

In the example above, binding is turned off, and the report says so
for every rank.
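
A launched process can also inspect its own affinity mask to
cross-check the report. The sketch below is an illustration, not part
of PRRTE. It assumes a platform where Python provides
``os.sched_getaffinity`` (Linux, for example), and it compares the mask
against ``os.cpu_count()``, which may overstate the usable CPUs inside
a container or cgroup. ``is_effectively_unbound`` is a hypothetical
helper.

.. code:: python

   import os


   def is_effectively_unbound(affinity, all_cpus):
       """Return True when the affinity mask covers every CPU in all_cpus."""
       return set(affinity) >= set(all_cpus)


   if __name__ == "__main__":
       # os.sched_getaffinity is not available on every platform.
       if hasattr(os, "sched_getaffinity"):
           mask = os.sched_getaffinity(0)  # 0 means the calling process
           cpus = range(os.cpu_count() or 1)
           if is_effectively_unbound(mask, cpus):
               print("not bound (or bound to all available processors)")
           else:
               print("bound to CPUs", sorted(mask))
       else:
           print("affinity query not supported on this platform")

Running this script in place of ``hostname`` in the commands above
would print one line per rank, which can be compared with the
``:REPORT`` output.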