File: job_container.conf.5

package info (click to toggle)
slurm-wlm-contrib 24.11.5-4
  • links: PTS, VCS
  • area: contrib
  • in suites: forky, sid
  • size: 50,600 kB
  • sloc: ansic: 529,598; exp: 64,795; python: 17,051; sh: 9,411; javascript: 6,528; makefile: 4,030; perl: 3,762; pascal: 131
file content (216 lines) | stat: -rw-r--r-- 8,041 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
.TH "job_container.conf" "5" "Slurm Configuration File" "April 2025" "Slurm Configuration File"

.SH "NAME"
job_container.conf \- Slurm configuration file for job_container/tmpfs plugin

.SH "DESCRIPTION"

\fBjob_container.conf\fP is an ASCII file which defines parameters used by
Slurm's job_container/tmpfs plugin. The plugin reads the
job_container.conf file to find out the configuration settings. Based on them it
constructs a private (or optionally shared) filesystem namespace for the job and
mounts a list of directories (defaults to /tmp and /dev/shm) inside it. This
gives the job a private view of these directories. These paths are mounted
inside the location specified by 'BasePath' in the \fBjob_container.conf\fR
file. When the job completes, the private namespace is unmounted and all
files therein are automatically removed.
To make use of this plugin, 'PrologFlags=Contain' must also be present in
your \fBslurm.conf\fR file, as shown:

.nf
JobContainerType=job_container/tmpfs
PrologFlags=Contain
.fi

The file will always be located in the same directory as the \fBslurm.conf\fR.

.LP
If using the \fBjob_container.conf\fR file to define a namespace available to
nodes the first parameter on the line should be \fBNodeName\fR. If configuring a
namespace without specifying nodes, the first parameter on the line
should be \fBBasePath\fR.

.LP
Parameter names are case insensitive.
Any text following a "#" in the configuration file is treated
as a comment through the end of that line.
Changes to the configuration file take effect upon restart of Slurm daemons.

.LP
The following job_container.conf parameters are defined to control the behavior
of the job_container/tmpfs plugin.

.TP
\fBAutoBasePath\fR
This determines if plugin should create the BasePath directory or not. Set it
to 'true' if directory is not pre\-created before slurm startup. If set to true,
the directory is created with permission 0755. Directory is not deleted during
slurm shutdown. If set to 'false' or not specified, plugin would expect
directory to exist. This option can be used on a global or per\-line basis.
This parameter is optional.
.IP

.TP
\fBBasePath\fR
Specify the \fIPATH\fR that the tmpfs plugin should use as a base to mount the
private directories. This path must be readable and writable by the plugin. The
plugin constructs a directory for each job inside this path, which is then used
for mounting. The \fBBasePath\fR gets mounted as 'private' during slurmd start
and remains mounted until shutdown. The first "%h" within the name is replaced
with the hostname on which the \fBslurmd\fR is running. The first "%n" within
the name is replaced with the Slurm node name on which the \fBslurmd\fR is
running. Set \fIPATH\fR to 'none' to disable the tmpfs plugin on node subsets
when there is a global setting.

\fBNOTE\fR: The \fBBasePath\fR must be unique to each node. If BasePath is on a
shared filesystem, you can use "%h" or "%n" to create node-unique directories.

\fBNOTE\fR: The \fBBasePath\fR parameter cannot be set to any of
the paths specified by \fBDirs\fR. Using these directories will cause conflicts
when trying to mount and unmount the private directories for the job.
.IP

.TP
\fBCloneNSScript\fR
Specify fully qualified pathname of an optional initialization script. This
script is run fter the namespace construction of a job. This script will be
provided the SLURM_NS environment variable to allow the script to join the
newly created namespace and do further setup work. This parameter is optional.
.IP

.TP
\fBCloneNSScript_Wait\fR
The number of seconds to wait for the CloneNSScript to complete before
considering the script failed. The default value is 10 seconds.
.IP

.TP
\fBCloneNSEpilog\fR
Specify fully qualified pathname of an optional epilog script. This script runs
just before the namespace is torn down. This script will be provided the
SLURM_NS environment variable to allow the script to join the soon to be
removed namespace and do any cleanup work. This parameter is optional.
.IP

.TP
\fBCloneNSEpilog_Wait\fR
The number of seconds to wait for the CloneNSEpilog to complete before
considering the script failed. The default value is 10 seconds.
.IP

.TP
\fBDirs\fR
A list of mount points separated with commas to create private mounts for.
This parameter is optional and if not specified it defaults to "/tmp,/dev/shm".

\fBNOTE\fR: /dev/shm has special handling, and instead of a bind mount is always
a fresh tmpfs filesystem.
.IP

.TP
\fBEntireStepInNS\fR
Specifying EntireStepInNS=true will pivot all slurmstepd processes (excluding
the external step, which is tasked with creating the namespace) into the
constructed namespace. This will cause issues if certain paths such as
SlurmdSpoolDir are inaccessible. This parameter is optional.
.IP

.TP
\fBInitScript\fR
Specify fully qualified pathname of an optional initialization script. This
script is run before the namespace construction of a job. It can be used to
make the job join additional namespaces prior to the construction of /tmp
namespace or it can be used for any site\-specific setup. This parameter is
optional.
.IP

.TP
\fBNodeName\fR
A NodeName specification can be used to permit one job_container.conf
file to be used for all compute nodes in a cluster by specifying the node(s)
that each line should apply to.
The NodeName specification can use a Slurm hostlist specification as shown in
the example below. This parameter is optional.
.IP

.TP
\fBShared\fR
Specifying Shared=true will propagate new mounts between the job specific
filesystem namespace and the root filesystem namespace, enable using autofs on
the node. This parameter is optional.
.IP

.SH "NOTES"
.LP
If any parameters in job_container.conf are changed while slurm is running, then
slurmd on the respective nodes will need to be
restarted for changes to take effect (scontrol reconfigure is not sufficient).
Additionally this can be disruptive to
jobs already running on the node. So care must be taken to make sure no jobs
are running if any changes to job_container.conf are deployed.

Restarting slurmd is safe and non\-disruptive to running jobs, as long as
job_container.conf is not changed between restarts in which case above point
applies.

.SH "EXAMPLE"
.TP
\fB/etc/slurm/slurm.conf\fR:
These are the entries required in \fBslurm.conf\fR to activate the
job_container/tmpfs plugin.
.IP
.nf
JobContainerType=job_container/tmpfs
PrologFlags=Contain
.fi

.TP
\fB/etc/slurm/job_container.conf\fR:
The first sample file will define 1 basepath for all nodes and it will be
automatically created.
.nf
AutoBasePath=true
BasePath=/var/nvme/storage
.fi

The second sample file will define 2 basepaths.
The first will only be on largemem[1\-2] and it will be automatically created.
The second will only be on gpu[1\-10], will be expected to exist and will run
an initscript before each job.
.nf
NodeName=largemem[1\-2] AutoBasePath=true BasePath=/var/nvme/storage_a
NodeName=gpu[1\-10] BasePath=/var/nvme/storage_b InitScript=/etc/slurm/init.sh
.fi

The third sample file will Define 1 basepath that will be on all nodes,
automatically created, with /tmp and /var/tmp as private mounts.
.nf
AutoBasePath=true
BasePath=/var/nvme/storage Dirs=/tmp,/var/tmp
.fi
.IP

.SH "COPYING"
Copyright (C) 2021 Regents of the University of California
Produced at Lawrence Berkeley National Laboratory
.br
Copyright (C) 2021\-2022 SchedMD LLC.

.LP
This file is part of Slurm, a resource management program.
For details, see <https://slurm.schedmd.com/>.
.LP
Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
.LP
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.

.SH "SEE ALSO"
.LP

\fBslurm.conf\fR(5)