File: slurm.rst

Launching with Slurm
====================

Open MPI supports two modes of launching parallel MPI jobs under
Slurm:

#. Using Open MPI's full-featured ``mpirun`` launcher.
#. Using Slurm's "direct launch" capability.

Unless there is a strong reason to use ``srun`` for direct launch, the
Open MPI team recommends using ``mpirun`` to launch jobs under Slurm.

.. note:: In versions of Open MPI prior to 5.0.x, using ``srun`` for
   direct launch could be faster than using ``mpirun``.  **This is no
   longer true.**

Using ``mpirun``
----------------

When ``mpirun`` is launched in a Slurm job, ``mpirun`` will
automatically utilize the Slurm infrastructure for launching and
controlling the individual MPI processes.
Hence, it is unnecessary to specify the ``--hostfile``,
``--host``, or ``-n`` options to ``mpirun``.

.. note:: Using ``mpirun`` is the recommended method for launching Open
   MPI applications in Slurm jobs.

   ``mpirun``'s Slurm support should always be available, regardless
   of how Open MPI or Slurm was installed.

For example:

.. code-block:: sh

   # Allocate a Slurm job with 4 slots
   shell$ salloc -n 4
   salloc: Granted job allocation 1234

   # Now run an Open MPI job on all the slots allocated by Slurm
   shell$ mpirun mpi-hello-world

This will run the 4 MPI processes on the node(s) that were allocated
by Slurm.
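
Within the same allocation, ``mpirun``'s ``-n`` option can still be
given explicitly if you want fewer processes than the number of
allocated slots; a minimal sketch under the 4-slot allocation above:

.. code-block:: sh

   # Use only 2 of the 4 allocated slots
   shell$ mpirun -n 2 mpi-hello-world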

Or, if submitting a script:

.. code-block:: sh

   shell$ cat my_script.sh
   #!/bin/sh
   mpirun mpi-hello-world
   shell$ sbatch -n 4 my_script.sh
   Submitted batch job 1235
   shell$

Similar to the ``salloc`` case, no command line options specifying
number of MPI processes were necessary, since Open MPI will obtain
that information directly from Slurm at run time.
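
Instead of passing ``-n`` on the ``sbatch`` command line, the resource
request can also be embedded in the script itself via ``#SBATCH``
directives; a minimal sketch (the job name and output file name are
arbitrary placeholders):

.. code-block:: sh

   shell$ cat my_batch_script.sh
   #!/bin/sh
   #SBATCH -n 4                   # same as "sbatch -n 4"
   #SBATCH --job-name=mpi-hello   # arbitrary job name
   #SBATCH --output=hello-%j.out  # %j expands to the Slurm job ID
   mpirun mpi-hello-world
   shell$ sbatch my_batch_script.sh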

Using Slurm's "direct launch" functionality
-------------------------------------------

Assuming that Slurm was configured with its PMIx plugin, you can use
``srun`` to "direct launch" Open MPI applications without the use of
Open MPI's ``mpirun`` command.

First, you must ensure that Slurm was built and installed with PMIx
support.  This can be determined as shown below:

.. code-block:: sh

   shell$ srun --mpi=list
   MPI plugin types are...
       none
       pmi2
       pmix
   specific pmix plugin versions available: pmix_v4

The output from ``srun`` may vary somewhat depending on the version of
Slurm installed.  If PMIx is not present in the output, then you will
not be able to use ``srun`` to launch Open MPI applications.

.. note:: PMI-2 is not supported in Open MPI 5.0.0 and later releases.
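
The same check can be scripted if you want a setup or build script to
fail fast when the plugin is missing; a small sketch that relies only
on the ``srun --mpi=list`` output shown above:

.. code-block:: sh

   # Report whether srun lists a pmix plugin
   shell$ srun --mpi=list 2>&1 | grep -q pmix \
       && echo "PMIx plugin found" \
       || echo "PMIx plugin NOT found"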

Provided the Slurm installation includes the PMIx plugin, Open MPI
applications can then be launched directly via the ``srun`` command.
For example:

.. code-block:: sh

   shell$ srun -N 4 --mpi=pmix mpi-hello-world

Or you can use ``sbatch`` with a script:

.. code-block:: sh

   shell$ cat my_script.sh
   #!/bin/sh
   srun --mpi=pmix mpi-hello-world
   shell$ sbatch -N 4 my_script.sh
   Submitted batch job 1235
   shell$

Similar to using ``mpirun`` inside an ``sbatch`` batch script, no
``srun`` command line options specifying the number of processes were
necessary, because ``sbatch`` set all the relevant Slurm-level
parameters about the number of processes, cores, partition, etc.
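
If the cluster's ``slurm.conf`` sets ``MpiDefault=pmix``, or if the
``SLURM_MPI_TYPE`` environment variable (the environment-variable
equivalent of ``--mpi``) is exported, the ``--mpi=pmix`` option can
typically be omitted from the ``srun`` command line; a hedged sketch,
assuming your Slurm installation honors that variable:

.. code-block:: sh

   # Assumes SLURM_MPI_TYPE is honored (equivalent to passing --mpi=pmix)
   shell$ export SLURM_MPI_TYPE=pmix
   shell$ srun -N 4 mpi-hello-world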

Slurm 20.11
-----------

Some changes in Slurm behavior were introduced in Slurm 20.11.0 and
subsequently reverted in Slurm 20.11.3.

SchedMD (the makers of Slurm) strongly suggest that all Open MPI users
avoid using Slurm versions 20.11.0 through 20.11.2.

Indeed, you will likely run into problems using just about any version
of Open MPI with these problematic Slurm releases.
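
If you are unsure which Slurm version a cluster is running, you can ask
one of the Slurm commands directly; a small sketch (the version string
shown is just an example):

.. code-block:: sh

   shell$ srun --version
   slurm 20.11.3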

.. important:: Please either downgrade to an older version or upgrade
               to a newer version of Slurm.