File: pam_slurm_adopt.shtml

package info (click to toggle)
slurm-wlm-contrib 24.11.5-4
  • links: PTS, VCS
  • area: contrib
  • in suites: forky, sid
  • size: 50,600 kB
  • sloc: ansic: 529,598; exp: 64,795; python: 17,051; sh: 9,411; javascript: 6,528; makefile: 4,030; perl: 3,762; pascal: 131
file content (591 lines) | stat: -rw-r--r-- 19,992 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
<!--#include virtual="header.txt"-->

<h1>pam_slurm_adopt</h1>

<p>The purpose of this module is to prevent users from sshing into nodes that
they do not have a running job on, and to track the ssh connection and any
other spawned processes for accounting and to ensure complete job cleanup when
the job is completed. This module does this by determining the job which
originated the ssh connection. The user's connection is "adopted" into the
"external" step of the job. When access is denied, the user will receive a
relevant error message.</p>

<h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2>
<ul>
<li><a href="#INSTALLATION">Installation</a></li>
<li><a href="#SLURM_CONFIG">Slurm Configuration</a></li>
<li><a href="#ssh_config">SSH Configuration</a></li>
<li><a href="#PAM_CONFIG">PAM Configuration</a></li>
<li><a href="#admin_access">Administrative Access Configuration</a></li>
<li><a href="#OPTIONS">pam_slurm_adopt Module Options</a></li>
<li><a href="#firewall">Firewalls, IP Addresses, etc.</a></li>
<li><a href="#selinux">SELinux</a></li>
<li><a href="#LIMITATIONS">Limitations</a></li>
</ul>

<h2 id="INSTALLATION">Installation
<a class="slurm_link" href="#INSTALLATION"></a>
</h2>

<h3 id="source">Source:<a class="slurm_link" href="#source"></a></h3>

<p>In your Slurm source directory, navigate to ./contribs/pam_slurm_adopt/
and run</p>

<pre>make &amp;&amp; make install</pre>

<p>as root. This will place pam_slurm_adopt.a, pam_slurm_adopt.la,
and pam_slurm_adopt.so in /lib/security/ (on Debian systems) or
/lib64/security/ (on RedHat/SuSE systems).</p>

<h3 id="rpm">RPM:<a class="slurm_link" href="#rpm"></a></h3>

<p>The included slurm.spec will build a slurm-pam_slurm RPM which will install
pam_slurm_adopt. Refer to the
<a href="https://slurm.schedmd.com/quickstart_admin.html">Quick Start
Administrator Guide</a> for instructions on managing an RPM-based install.</p>

<h3 id="deb">DEB:<a class="slurm_link" href="#deb"></a></h3>

<p>The included debian packaging scripts will build the
slurm-smd-libpam-slurm-adopt package which will install pam_slurm_adopt.
<a href="https://slurm.schedmd.com/quickstart_admin.html">Quick Start
Administrator Guide</a> for instructions on managing an DEB-based install.</p>

<h2 id="SLURM_CONFIG">Slurm Configuration
<a class="slurm_link" href="#SLURM_CONFIG"></a>
</h2>

<p><b>PrologFlags=contain</b> must be set in the slurm.conf. This sets up the
"extern" step into which ssh-launched processes will be adopted. You must also
enable the task/cgroup plugin in slurm.conf. See the
<a href="https://slurm.schedmd.com/cgroups.html">Slurm cgroups guide.</a>
<b>CAUTION</b> This option must be in place <i>before</i> using this module.
The module bases its checks on local steps that have already been launched. Jobs
launched without this option do not have an extern step, so pam_slurm_adopt will
not have access to those jobs.</p>

<p><b>LaunchParameters=ulimit_pam_adopt</b> will set RLIMIT_RSS in processes
adopted by the external step, similar to tasks running in regular steps.</p>

<p>The <b>UsePAM</b> option in slurm.conf is not related to pam_slurm_adopt.</p>

<h2 id="ssh_config">SSH Configuration
<a class="slurm_link" href="#ssh_config"></a>
</h2>

<p>Verify that <b>UsePAM</b> is set to <b>On</b> in /etc/ssh/sshd_config (it
should be on by default).</p>

<p>Ensure that only supported <b>AuthenticationMethods</b> are enabled in
sshd_config (on the compute nodes). At this time, only <b>publickey</b> and
<b>password</b> are supported. In particular <b>keyboard-interactive</b> is
explicitly unsupported and must be removed from <b>AuthenticationMethods</b>.
If this step is not observed process adoption will be broken
and SSH sessions will persist even after the job ends. See
<a href="#LIMITATIONS"> Limitations</a> for more information.</p>

<h2 id="PAM_CONFIG">PAM Configuration
<a class="slurm_link" href="#PAM_CONFIG"></a>
</h2>

<p>Add the following line to the appropriate file in /etc/pam.d, such as
system-auth or sshd (you may use either the "required" or "sufficient" PAM
control flag):</p>

<pre>
account    required      pam_slurm_adopt.so
</pre>

<p> The order of plugins is very important. pam_slurm_adopt.so should be the
last PAM module in the account stack. Included files such as common-account
should normally be included before pam_slurm_adopt.

You might have the following account stack in sshd:</p>

<pre>
account    required      pam_nologin.so
account    include       password-auth
...
-account    required      pam_slurm_adopt.so
</pre>

<p>Note the "-" before the account entry for pam_slurm_adopt. It allows
PAM to fail gracefully if the pam_slurm_adopt.so file is not found. If Slurm
is on a shared filesystem, such as NFS, then this is suggested to avoid being
locked out of a node while the shared filesystem is mounting or down.</p>

<p>pam_slurm_adopt must be used with the task/cgroup task plugin and the
proctrack/cgroup proctrack plugin.
The pam_systemd module will conflict with pam_slurm_adopt, so you need to
disable it in all files that are included in sshd or system-auth (e.g.
password-auth, common-session, etc.).</p>

<p>If you need the user management features from pam_systemd, such as
handling user runtime directory /run/user/$UID, you can have the prolog script
run 'loginctl enable-linger $SLURM_JOB_USER' and the epilog script disable
it again (after making sure there are no other jobs from this user on the node)
by running 'loginctl disable-linger $SLURM_JOB_USER'. You will also need to
export the XDG_* environment variables if your software requires them.
You can see an example of prolog and epilog scripts here:</p>

Prolog:
<pre>
loginctl enable-linger $SLURM_JOB_USER
exit 0
</pre>
TaskProlog:
<pre>
echo "export XDG_RUNTIME_DIR=/run/user/$SLURM_JOB_UID"
echo "export XDG_SESSION_ID=$(&lt;/proc/self/sessionid)"
echo "export XDG_SESSION_TYPE=tty"
echo "export XDG_SESSION_CLASS=user"
</pre>
Epilog:
<pre>
#Only disable linger if this is the last job running for this user.
O_P=0
for pid in $(scontrol listpids | awk -v jid=$SLURM_JOB_ID 'NR!=1 { if ($2 != jid && $1 != "-1"){print $1} }'); do
        ps --noheader -o euser p $pid | grep -q $SLURM_JOB_USER && O_P=1
done
if [ $O_P -eq 0 ]; then
        loginctl disable-linger $SLURM_JOB_USER
fi
exit 0
</pre>

<p>You must also make sure a different PAM
module isn't short-circuiting the account stack before it gets to
pam_slurm_adopt.so. From the example above, the following two lines have been
commented out in the included password-auth file:</p>

<pre>
#account    sufficient    pam_localuser.so
#-session   optional      pam_systemd.so
</pre>

<p>Note: This may involve editing a file that is auto-generated.
Do not run the config script that generates the file or your
changes will be erased.</p>

<h2 id="admin_access">Administrative Access Configuration
<a class="slurm_link" href="#admin_access"></a>
</h2>

<p>
pam_slurm_adopt will always allow the root user access, and will do so even
before checking for slurm configuration. If you wish to also allow other admins
to the system with their own user accounts, this can be accomplished by stacking
other modules along with pam_slurm_adopt.
<p>

<p>
Stacking pam_access with pam_slurm_adopt is one way to permit administrative
access, with two possible implementations with subtle differences in behavior.
Both will require editing the pam_access configuration file
(/etc/security/access.conf). In the following example, the access.conf file will
allow members of the group "wheel" to log in.
</p>

<pre>
+:(wheel):ALL
-:ALL:ALL
</pre>

<p>
Then you will need to stack the modules in the /etc/pam.d/sshd file.  The order
here matters, and each ordering has different implications. In the example
below, pam_slurm adopt is listed first as "sufficient", followed by pam_access.
In this configuration when the admin has a job running, their ssh session
will be adopted into the job. If they do not, access will be permitted by
pam_access, but note that pam_slurm_adopt will still emit the "access denied"
message.
</p>

<pre>
account    sufficient    pam_slurm_adopt.so
account    required      pam_access.so
</pre>

<p>
Flipping this order, with pam_access(sufficient) before
pam_slurm_adopt(required), members of the administrative group will bypass
pam_slurm_adopt entirely.
</p>

<pre>
account    sufficient    pam_access.so
account    required      pam_slurm_adopt.so
</pre>

<p>
The pam_listfile module is another module that can be stacked with
pam_slurm_adopt and achieve similar results.  In the following example, it will
allow all users in the specified file to log in, skipping the pam_slurm_adopt
module.  This can also be flipped similar to pam_access, with the same
implications.
</p>

<pre>
account    sufficient    pam_listfile.so item=user sense=allow onerr=fail file=/path/to/allowed_users_file
account    required      pam_slurm_adopt.so
</pre>

<p>
The pam_listfile module can also be configured to look for membership of a
group. In this example, instead of checking the user, the plugin checks that
the user belongs to one of the groups specified in the 'groupfile'.
If pam_systemd is needed for those users, you can link it to the
pam_listfile.so module in the session phase, as shown below.
If the pam_listfile module succeeds, the evaluation continues (success=ignore).
Otherwise, the next module (pam_systemd) is ignored (default=1 skips the next
module). The pam_systemd module will only be used for admin users in this
example.
</p>
<pre>
account    sufficient                    pam_listfile.so item=group sense=allow onerr=fail file=/path/to/allowed_users_file
-account   required                      pam_slurm_adopt.so

session    [default=1 success=ignore]    pam_listfile.so item=group sense=allow onerr=fail file=/etc/groupfile
-session   optional                      pam_systemd.so
</pre>

<p>
More information about the capabilities and configuration options for pam_access
and pam_listfile can be found in their respective man pages.
</p>

<h2 id="OPTIONS">pam_slurm_adopt Module Options
<a class="slurm_link" href="#OPTIONS"></a>
</h2>

<p>
This module is configurable. Add these options to the end of the pam_slurm_adopt
line in the appropriate file in /etc/pam.d/ (e.g., sshd or system-auth):
</p>

<pre>
account sufficient pam_slurm_adopt.so optionname=optionvalue
</pre>

<p>This module has the following options:</p>

<dl>

<dt>
  <b id="action_no_jobs">action_no_jobs</b>
  <a class="slurm_link" href="#action_no_jobs"></a>
</dt>
<dd>
The action to perform if the user has no jobs on the node. Configurable
values are:

<dl>
<dt></dt>
<dd>
<dl>
  <dt>
    <b id="action_no_jobs_ignore">ignore</b>
    <a class="slurm_link" href="#action_no_jobs_ignore"></a>
  </dt>
<dd>Do nothing. Fall through to the next pam module.</dd>
<dt>
  <b id="action_no_jobs_deny">deny</b> (default)
  <a class="slurm_link" href="#action_no_jobs_deny"></a>
</dt>
<dd>Deny the connection.</dd>
</dl>
</dd>
</dl>

</dd>

<dt>
  <b id="action_unknown">action_unknown</b>
  <a class="slurm_link" href="#action_unknown"></a>
</dt>
<dd>
The action to perform when the user has multiple jobs on the node <b>and</b>
the RPC does not locate the source job. If the RPC mechanism works properly in
your environment, this option will likely be relevant <b>only</b> when
connecting from a login node. Configurable values are:

<dl>
<dt></dt>
<dd>
<dl>
<dt>
  <b id="action_unknown_newest">newest</b> (default)
  <a class="slurm_link" href="#action_unknown_newest"></a>
</dt>
<dd>On systems with <i>cgroup/v1</i> pick the newest job on the node.
The "newest" job is chosen based on the mtime of the job's step_extern cgroup;
asking Slurm would require an RPC to the controller. Thus, the memory cgroup
must be in use so that the code can check mtimes of cgroup directories. The user
can ssh in but may be adopted into a job that exits earlier than the
job they intended to check on. The ssh connection will at least be
subject to appropriate limits and the user can be informed of better
ways to accomplish their objectives if this becomes a problem.
<b>NOTE</b>: If the module fails to retrieve the cgroup mtime, then the picked
job may not be the newest one.
On systems with <i>cgroup/v2</i> the newest is just the job with the greatest
id, and thus this does not ensure that it is really the newest job.
</dd>
<dt>
  <b id="action_unknown_allow">allow</b>
  <a class="slurm_link" href="#action_unknown_allow"></a>
</dt>
<dd>Let the connection through without adoption.</dd>
<dt>
  <b id="action_unknown_deny">deny</b>
  <a class="slurm_link" href="#action_unknown_deny"></a>
</dt>
<dd>Deny the connection.</dd>
</dl>
</dd>
</dl>

</dd>

<dt>
  <b id="action_adopt_failure">action_adopt_failure</b>
  <a class="slurm_link" href="#action_adopt_failure"></a>
</dt>
<dd>The action to perform if the process is unable to be adopted into any
job for whatever reason. If the process cannot be adopted into the job
identified by the callerid RPC, it will fall through to the action_unknown
code and try to adopt there. A failure at that point or if there is only
one job will result in this action being taken. Configurable values are:

<dl>
<dt></dt>
<dd>
<dl>
<dt>
  <b id="action_adopt_failure_allow">allow</b> (default)
  <a class="slurm_link" href="#action_adopt_failure_allow"></a>
</dt>
<dd>Let the connection through without adoption. <b>WARNING: This value is
insecure and is recommended for testing purposes only. We recommend using
"deny."</b></dd>
<dt>
  <b id="action_adopt_failure_deny">deny</b>
  <a class="slurm_link" href="#action_adopt_failure_deny"></a>
</dt>
<dd>Deny the connection.</dd>
</dl>
</dd>
</dl>

</dd>

<dt>
  <b id="action_generic_failure">action_generic_failure</b>
  <a class="slurm_link" href="#action_generic_failure"></a>
</dt>
<dd>The action to perform if there are certain failures such as the
inability to talk to the local <i>slurmd</i> or if the kernel doesn't offer
the correct facilities. Configurable values are:

<dl>
<dt></dt>
<dd>
<dl>
<dt>
  <b id="action_generic_failure_ignore">ignore</b> (default)
  <a class="slurm_link" href="#action_generic_failure_ignore"></a>
</dt>
<dd>Do nothing. Fall through to the next pam module. <b>WARNING: This value is
insecure and is recommended for testing purposes only. We recommend using
"deny."</b></dd>
<dt>
  <b id="action_generic_failure_allow">allow</b>
  <a class="slurm_link" href="#action_generic_failure_allow"></a>
</dt>
<dd>Let the connection through without adoption.</dd>
<dt>
  <b id="action_generic_failure_deny">deny</b>
  <a class="slurm_link" href="#action_generic_failure_deny"></a>
</dt>
<dd>Deny the connection.</dd>
</dl>
</dd>
</dl>

</dd>

<dt>
  <b id="disable_x11">disable_x11</b>
  <a class="slurm_link" href="#disable_x11"></a>
</dt>
<dd>Turn off Slurm built-in X11 forwarding support. Configurable values are:

<dl>
<dt></dt>
<dd>
<dl>
<dt>
  <b id="disable_x11_0">0</b> (default)
  <a class="slurm_link" href="#disable_x11_0"></a>
</dt>
<dd>If the job the connection is adopted into has Slurm's X11 forwarding
enabled, the DISPLAY variable will be overwritten with the X11 tunnel
endpoint details.</dd>
<dt>
  <b id="disable_x11_1">1</b>
  <a class="slurm_link" href="#disable_x11_1"></a>
</dt>
<dd>Do not check for Slurm's X11 forwarding support, and do not alter the
DISPLAY variable.</dd>
</dl>
</dd>
</dl>

</dd>

<dt>
  <b id="join_container">join_container</b>
  <a class="slurm_link" href="#join_container"></a>
</dt>
<dd>Control the interaction with the job_container/tmpfs plugin.
Configurable values are:
<dl>
<dt></dt>
<dd>
<dl>
<dt>
  <b id="join_container_true">true</b> (default)
  <a class="slurm_link" href="#join_container_true"></a>
</dt>
<dd>Attempt to join a container created by the job_container/tmpfs plugin.</dd>
<dt>
  <b id="join_container_false">false</b>
  <a class="slurm_link" href="#join_container_false"></a>
</dt>
<dd>Do not attempt to join a container.</dd>
</dl>
</dd>
</dl>
</dd>

<dt>
  <b id="log_level">log_level</b>
  <a class="slurm_link" href="#log_level"></a>
</dt>
<dd>See <a href="https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdDebug">
SlurmdDebug</a> in slurm.conf for available options.
The default log_level is <b>info</b>.
</dd>

<dt>
  <b id="nodename">nodename</b>
  <a class="slurm_link" href="#nodename"></a>
</dt>
<dd>If the NodeName defined in <b>slurm.conf</b> is different than this node's
hostname (as reported by <b>hostname -s</b>), then this must be set to the
NodeName in <b>slurm.conf</b> that this host operates as.
</dd>

<dt>
  <b id="service">service</b>
  <a class="slurm_link" href="#service"></a>
</dt>
<dd>The pam service name for which this module should run. By default
it only runs for sshd for which it was designed for. A
different service name can be specified like "login" or "*" to
allow the module to in any service context. For local pam logins
this module could cause unexpected behavior or even security
issues. Therefore if the service name does not match then this
module will not perform the adoption logic and returns
PAM_IGNORE immediately.
</dd>

</dl>

<h3 id="firewall">Firewalls, IP Addresses, etc.
<a class="slurm_link" href="#firewall"></a>
</h3>

<p><i>slurmd</i> should be accessible on any IP address from which a user might
launch ssh. The RPC to determine the source job must be able to reach the
<i>slurmd</i> port on that particular IP address. If there is no <i>slurmd</i>
on the source node, such as on a <a href="quickstart_admin.html#login">login
node</a>, it is better to have the RPC be
rejected rather than silently dropped. This will allow better responsiveness to
the RPC initiator.</p>

<h3 id="selinux">SELinux<a class="slurm_link" href="#selinux"></a></h3>

<p>SELinux may conflict with pam_slurm_adopt, but it is generally possible for
them to work side by side. This is an example type enforcement file that was
used on a fairly stock Debian system. It is provided to give some direction
and to show what is required to get this working but may require additional
modification.</p>

<pre>
module pam_slurm_adopt 1.0;

require {
	type sshd_t;
	type var_spool_t;
	type unconfined_t;
	type initrc_var_run_t;
	class sock_file write;
	class dir { read search };
	class unix_stream_socket connectto;
}

#============= sshd_t ==============
allow sshd_t initrc_var_run_t:dir search;
allow sshd_t initrc_var_run_t:sock_file write;
allow sshd_t unconfined_t:unix_stream_socket connectto;
allow sshd_t var_spool_t:dir read;
allow sshd_t var_spool_t:sock_file write;
</pre>

<p>It is possible for some plugins to require more permissions than this.
Notably, job_container/tmpfs will require something more like this:</p>

<pre>
module pam_slurm_adopt 1.0;

require {
	type nsfs_t;
	type var_spool_t;
	type initrc_var_run_t;
	type unconfined_t;
	type sshd_t;
	class sock_file write;
	class dir { read search };
	class unix_stream_socket connectto;
	class fd use;
	class file read;
	class capability sys_admin;
}

#============= sshd_t ==============
allow sshd_t initrc_var_run_t:dir search;
allow sshd_t initrc_var_run_t:sock_file write;
allow sshd_t nsfs_t:file read;
allow sshd_t unconfined_t:fd use;
allow sshd_t unconfined_t:unix_stream_socket connectto;
allow sshd_t var_spool_t:dir read;
allow sshd_t var_spool_t:sock_file write;
allow sshd_t self:capability sys_admin;
</pre>

<h2 id="LIMITATIONS">Limitations
<a class="slurm_link" href="#LIMITATIONS"></a>
</h2>

<p>Internally, some AuthenticationMethods cause sshd to fork an extra process
during the login flow, which sshd partially offloads the authentication
dialogue to. This can confuse PAM modules, and may break process adoption
with pam_slurm_adopt.</p>

<p>When using SELinux support in Slurm, the session started via pam_slurm_adopt
won't necessarily be in the same context as the job it is associated with.</p>

<p style="text-align:center;">Last modified 26 March 2025</p>

<!--#include virtual="footer.txt"-->