File: reservations.shtml

<!--#include virtual="header.txt"-->

<h1>Advanced Resource Reservation Guide</h1>

<p>Slurm has the ability to reserve resources for jobs
being executed by select users and/or select bank accounts.
A resource reservation identifies the resources in that reservation
and a time period during which the reservation is available.
The resources which can be reserved include cores, nodes, licenses and/or
burst buffers.
A reservation that contains nodes or cores is associated with one partition,
and can't span resources over multiple partitions.
The only exception to this is when
the reservation is created with explicitly requested nodes.
Note that resource reservations are not compatible with Slurm's
gang scheduler plugin since the termination time of running jobs
cannot be accurately predicted.</p>

<p>Note that reserved burst buffers and licenses are treated somewhat
differently than reserved cores or nodes.
When cores or nodes are reserved, jobs using that reservation can use only
those resources (this behavior can be changed using the FLEX flag), and no other jobs can use those resources.
Reserved burst buffers and licenses can only be used by jobs associated with
that reservation, but licenses not explicitly reserved are available to any job.
This eliminates the need to explicitly put licenses into every advanced
reservation created.</p>

<p>Reservations can be created, updated, or destroyed only by user root
or the configured <i>SlurmUser</i> using the <i>scontrol</i> command.
The <i>scontrol</i> and <i>sview</i> commands can be used
to view reservations. Additionally, root and the configured <i>SlurmUser</i>
have access to all reservations, even if they would normally not have access.
The man pages for the various commands contain details.</p>

<h2 id="creation">Reservation Creation
<a class="slurm_link"href="#creation"></a>
</h2>

<p>One common mode of operation for a reservation would be to reserve
an entire computer at a particular time for a system down time.
The example below shows the creation of a full-system reservation
at 16:00 hours on 6 February and lasting for 120 minutes.
The "maint" flag is used to identify the reservation for accounting
purposes as system maintenance.
The "ignore_jobs" flag is used to indicate that we can ignore currently
running jobs when creating this reservation.
By default, only resources which are not expected to have a running job
at the start time can be reserved (the time limit of all running
jobs will have been reached).
In this case we can manually cancel the running jobs as needed
to perform system maintenance.
As the reservation time approaches,
only jobs that can complete by the reservation time will be initiated.</p>
<pre>
$ scontrol create reservation starttime=2009-02-06T16:00:00 \
   duration=120 user=root flags=maint,ignore_jobs nodes=ALL
Reservation created: root_3

$ scontrol show reservation
ReservationName=root_3 StartTime=2009-02-06T16:00:00
   EndTime=2009-02-06T18:00:00 Duration=120
   Nodes=ALL NodeCnt=20
   Features=(null) PartitionName=(null)
   Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=(null)
   BurstBuffers=(null)
   Users=root Accounts=(null)
</pre>

<p>A variation of this would be to configure licenses to represent system
resources, such as a global file system.
The system resource may not require an actual license for use, but
Slurm licenses can be used to prevent jobs needing the resource from being
started when that resource is unavailable.
One could create a reservation for all of those licenses in order to perform
maintenance on that resource.
In the example below, we create a reservation for 1000 licenses with the name
of "lustre". 
If there are a total of 1000 lustre licenses configured in this cluster,
this reservation will prevent any job specifying the need for a lustre
license from being scheduled on this cluster during this reservation.</p>
<pre>
$ scontrol create reservation starttime=2009-04-06T16:00:00 \
   duration=120 user=root flags=license_only \
   licenses=lustre:1000
Reservation created: root_4

$ scontrol show reservation
ReservationName=root_4 StartTime=2009-04-06T16:00:00
   EndTime=2009-04-06T18:00:00 Duration=120
   Nodes= NodeCnt=0
   Features=(null) PartitionName=(null)
   Flags=LICENSE_ONLY Licenses=lustre*1000
   BurstBuffers=(null)
   Users=root Accounts=(null)
</pre>

<p>Another mode of operation would be to reserve specific nodes
for an indefinite period in order to study problems on those
nodes. This could also be accomplished using a Slurm partition
specifically for this purpose, but that would fail to capture
the maintenance nature of their use.</p>
<pre>
$ scontrol create reservation user=root starttime=now \
   duration=infinite flags=maint nodes=sun000
Reservation created: root_5

$ scontrol show res
ReservationName=root_5 StartTime=2009-02-04T16:22:57
   EndTime=2009-02-04T16:21:57 Duration=4294967295
   Nodes=sun000 NodeCnt=1
   Features=(null) PartitionName=(null)
   Flags=MAINT,SPEC_NODES Licenses=(null)
   BurstBuffers=(null)
   Users=root Accounts=(null)
</pre>

<p>Our next example is to reserve ten nodes in the default
Slurm partition starting at noon and with a duration of 60
minutes occurring daily. The reservation will be available
only to users "alan" and "brenda".</p>
<pre>
$ scontrol create reservation user=alan,brenda \
   starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6

$ scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
   EndTime=2009-02-05T13:00:00 Duration=60
   Nodes=sun[000-003,007,010-013,017] NodeCnt=10
   Features=(null) PartitionName=pdebug
   Flags=DAILY Licenses=(null) BurstBuffers=(null)
   Users=alan,brenda Accounts=(null)
</pre>

<p>Our next example is to reserve 100GB of burst buffer space
starting at noon today and with a duration of 60 minutes.
The reservation will be available only to users "alan" and "brenda".</p>
<pre>
$ scontrol create reservation user=alan,brenda \
   starttime=noon duration=60 flags=any_nodes burstbuffer=100GB
Reservation created: alan_7

$ scontrol show res
ReservationName=alan_7 StartTime=2009-02-05T12:00:00
   EndTime=2009-02-05T13:00:00 Duration=60
   Nodes= NodeCnt=0
   Features=(null) PartitionName=(null)
   Flags=ANY_NODES Licenses=(null) BurstBuffer=100GB
   Users=alan,brenda Accounts=(null)
</pre>

<p>Note that specific nodes to be associated with the reservation are
identified immediately after creation of the reservation. This permits
users to stage files to the nodes in preparation for use during the
reservation. Note that the reservation creation request can also
identify the partition from which to select the nodes or <i>one</i>
feature that every selected node must contain.</p>

<p>On a smaller system, one might want to reserve cores rather than
whole nodes.
This capability permits the administrator to identify the core count to be
reserved on each node as shown in the examples below.<br>
<b>NOTE</b>: Core reservations are not available when the system is configured
to use the select/linear plugin.</p>
<pre>
# Create a two core reservation for user alan
$ scontrol create reservation StartTime=now Duration=60 \
  NodeCnt=1 CoreCnt=2 User=alan

# Create a reservation for user brenda with two cores on
# node tux8 and 4 cores on node tux9
$ scontrol create reservation StartTime=now Duration=60 \
  Nodes=tux8,tux9 CoreCnt=2,4 User=brenda
</pre>

<p>Reservations can not only be created for the use of specific accounts and
users, but specific accounts and/or users can be prevented from using them.
In the following example, a reservation is created for account "foo", but user
"alan" is prevented from using the reservation even when using the account
"foo".</p>

<pre>
$ scontrol create reservation account=foo \
   user=-alan partition=pdebug \
   starttime=noon duration=60 nodecnt=2k,2k
Reservation created: alan_9

$ scontrol show res
ReservationName=alan_9 StartTime=2011-12-05T13:00:00
   EndTime=2011-12-05T14:00:00 Duration=60
   Nodes=bgp[000x011,210x311] NodeCnt=4096
   Features=(null) PartitionName=pdebug
   Flags= Licenses=(null) BurstBuffers=(null)
   Users=-alan Accounts=foo
</pre>

<p>When creating a reservation, you can request that Slurm include all the
nodes in a partition by specifying the <b>PartitionName</b> option.
If you only want a certain number of nodes or CPUs from that partition
you can combine <b>PartitionName</b> with the <b>CoreCnt</b>, <b>NodeCnt</b>
or <b>TRES</b> options to specify how many of a resource you want.
In the following example, a reservation is created in the 'gpu' partition
that uses the <b>TRES</b> option to limit the reservation to 24 processors,
divided among 4 nodes.</p>

<pre>
$ scontrol create reservationname=test start=now duration=1 \
   user=user1 partition=gpu tres=cpu=24,node=4
Reservation created: test

$ scontrol show res
ReservationName=test StartTime=2020-08-28T11:07:09
   EndTime=2020-08-28T11:08:09 Duration=00:01:00
   Nodes=node[01-04] NodeCnt=4 CoreCnt=24
   Features=(null) PartitionName=gpu
     NodeName=node01 CoreIDs=0-5
     NodeName=node02 CoreIDs=0-5
     NodeName=node03 CoreIDs=0-5
     NodeName=node04 CoreIDs=0-5
   TRES=cpu=24
   Users=user1 Accounts=(null) Licenses=(null)
   State=ACTIVE BurstBuffer=(null)
   MaxStartDelay=(null)
</pre>

<h2 id="use">Reservation Use<a class="slurm_link" href="#use"></a></h2>

<p>The reservation create response includes the reservation's name.
This name is automatically generated by Slurm based upon the first
user or account name and a numeric suffix. In order to use the
reservation, the job submit request must explicitly specify that
reservation name. The job must be contained completely within the
named reservation. The job will be canceled after the reservation
reaches its EndTime. To allow a job to continue running after the
reservation's EndTime, the <i>ResvOverRun</i> configuration option
in slurm.conf can be set to control how long the job can continue execution.</p>
<pre>
$ sbatch --reservation=alan_6 -N4 my.script
sbatch: Submitted batch job 65540
</pre>
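<p>If jobs should be permitted to keep running past a reservation's EndTime,
<i>ResvOverRun</i> can be set in slurm.conf. A minimal sketch (the value is in
minutes; the keyword UNLIMITED is also accepted):</p>
<pre>
# slurm.conf: allow jobs to run up to 30 minutes past a reservation's EndTime
ResvOverRun=30
</pre>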

<p>Note that use of a reservation does not alter a job's priority value, but it
does act as an enhancement to the job's scheduling priority:
any job with a reservation is considered for scheduling
before any other job in the same Slurm partition (queue) not associated
with a reservation.</p>

<h2 id="modification">Reservation Modification
<a class="slurm_link" href="#modification"></a>
</h2>

<p>Reservations can be modified by user root as desired.
For example, their duration could be altered or the set of users
granted access could be changed, as shown below:</p>
<pre>
$ scontrol update ReservationName=root_3 \
   duration=150 users=admin
Reservation updated.

$ scontrol show ReservationName=root_3
ReservationName=root_3 StartTime=2009-02-06T16:00:00
   EndTime=2009-02-06T18:30:00 Duration=150
   Nodes=ALL NodeCnt=20 Features=(null)
   PartitionName=(null) Flags=MAINT,SPEC_NODES
   Licenses=(null) BurstBuffers=(null)
   Users=admin Accounts=(null)
</pre>

<h2 id="deletion">Reservation Deletion
<a class="slurm_link" href="#deletion"></a>
</h2>

<p>Reservations are automatically purged after their end time.
They may also be manually deleted as shown below.
Note that a reservation cannot be deleted while there are
jobs running in it.</p>
<pre>
$ scontrol delete ReservationName=alan_6
</pre>
<p><b>NOTE</b>: By default, when a reservation ends, the reservation request is
removed from any pending jobs that were submitted to it and those jobs are
placed in a held state. Use the NO_HOLD_JOBS_AFTER_END reservation flag to let
such jobs run outside of the reservation after it has ended.</p>
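<p>A minimal sketch of creating a reservation with this flag (the user and
node count are illustrative):</p>
<pre>
$ scontrol create reservation user=alan starttime=noon \
   duration=60 nodecnt=2 flags=no_hold_jobs_after_end
</pre>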

<h2 id="overlap">Overlapping Reservations
<a class="slurm_link" href="#overlap"></a>
</h2>

<p>By default, reservations must not overlap. They must either include
different nodes or operate at different times. If specific nodes
are not specified when a reservation is created, Slurm will
automatically select nodes to avoid overlap and ensure that
the selected nodes are available when the reservation begins.</p>

<p>There is very limited support for overlapping reservations
with two specific modes of operation available.
For ease of system maintenance, you can create a reservation
with the "maint" flag that overlaps existing reservations.
This permits an administrator to easily create a maintenance
reservation for an entire cluster without needing to remove
or reschedule pre-existing reservations. Users requesting access
to one of these pre-existing reservations will be prevented from
using resources that are also in this maintenance reservation.
For example, users alan and brenda might have a reservation for
some nodes daily from noon until 1PM. If there is a maintenance
reservation for all nodes starting at 12:30PM, any jobs they
start in their reservation must be able to complete by 12:30PM,
when the maintenance reservation begins.</p>

<p>The second exception operates in the same manner as a maintenance
reservation except that it is not logged in the accounting system as nodes
reserved for maintenance.
It requires the use of the "overlap" flag when creating the second
reservation.
This might be used to ensure availability of resources for a specific
user within a group having a reservation.
Using the previous example of alan and brenda having a 10 node reservation
for 60 minutes, we might want to reserve 4 nodes of that for brenda
during the first 30 minutes of the time period.
In this case, creating one overlapping reservation (for a total of
two reservations) may be simpler than creating the three separate reservations
listed below, partly because each job must explicitly specify the name of the
reservation it uses (a sketch of the overlapping approach follows the list).
<ol>
<li>A six node reservation for both alan and brenda that lasts the full
60 minutes</li>
<li>A four node reservation for brenda for the first 30 minutes</li>
<li>A four node reservation for both alan and brenda that lasts for the
final 30 minutes</li>
</ol></p>
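<p>A minimal sketch of the overlapping approach, assuming the existing
reservation is the ten node <i>alan_6</i> reservation shown earlier and that
brenda's nodes come from the same partition:</p>
<pre>
$ scontrol create reservation user=brenda starttime=noon \
   duration=30 nodecnt=4 partition=pdebug flags=overlap
</pre>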

<p>If the "maint" or "overlap" flag is used when creating reservations,
one could create a reservation within a reservation within a third
reservation.
Note that a reservation having a "maint" or "overlap" flag will not have
resources removed from it by a subsequent reservation also having a
"maint" or "overlap" flag, so nesting of reservations only works to a
depth of two.</p>

<h2 id="float">Reservations Floating Through Time
<a class="slurm_link" href="#float"></a>
</h2>

<p>Slurm can be used to create an advanced reservation with a start time that
remains a fixed period of time in the future.
These reservations are not intended to run jobs, but to prevent long-running
jobs from being initiated on specific nodes.
Such a node might be placed in a DRAINING state to prevent <b>any</b> new jobs
from being started there.
Alternately, an advanced reservation might be placed on the node to prevent
jobs exceeding some specific time limit from being started.
Attempts by users to make use of a reservation with a floating start time will
be rejected.
When ready to perform the maintenance, place the node in DRAINING state and
delete the previously created advanced reservation.</p>

<p>Create the reservation by using the flag value of <b>TIME_FLOAT</b> and a
start time that is relative to the current time (use the keyword <b>now</b>).
The reservation duration should generally be a value which is large relative
to typical job run times in order to not adversely impact backfill scheduling
decisions.
Alternately the reservation can have a specific end time, in which case the
reservation's start time will increase through time until the reservation's
end time is reached.
When the current time passes the reservation end time then the reservation will
be purged.
In the example below, node tux8 is prevented from starting any jobs exceeding
a 60 minute time limit.
The duration of this reservation is 100 (minutes).</p>
<pre>
$ scontrol create reservation user=operator nodes=tux8 \
  starttime=now+60minutes duration=100 flags=time_float
</pre>

<h2 id="replace">Reservations that Replace Allocated Resources
<a class="slurm_link" href="#replace"></a>
</h2>

<p>By default, nodes in a reservation that are DOWN or DRAINED will be replaced,
but not nodes that are allocated to jobs. This behavior may be explicitly
requested with the <b>REPLACE_DOWN</b> flag.</p>
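<p>A minimal sketch of requesting this default behavior explicitly (the user
and node count are illustrative):</p>
<pre>
$ scontrol create reservation starttime=now duration=60 \
  users=foo nodecnt=2 flags=replace_down
</pre>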

<p>However, you may instruct Slurm to also replace nodes which are allocated to
jobs with new idle nodes. This is done using the <b>REPLACE</b> flag as shown in
the example below.
The effect of this is to always maintain a constant size pool of resources.
This option is not supported for reservations that reserve cores (rather than
full nodes) spanning more than one node. (E.g. a 1-core reservation on
node "tux1" will be moved if node "tux1" goes down, but a reservation
containing 2 cores on node "tux1" and 3 cores on "tux2" will not be moved if
"tux1" goes down.)</p>
<pre>
$ scontrol create reservation starttime=now duration=60 \
  users=foo nodecnt=2 flags=replace
Reservation created: foo_82

$ scontrol show res
ReservationName=foo_82 StartTime=2014-11-20T16:21:11
   EndTime=2014-11-20T17:21:11 Duration=01:00:00
   Nodes=tux[0-1] NodeCnt=2 CoreCnt=12 Features=(null)
   PartitionName=debug Flags=REPLACE
   Users=foo Accounts=(null) Licenses=(null) State=ACTIVE

$ sbatch -n4 --reservation=foo_82 tmp
Submitted batch job 97

$ scontrol show res
ReservationName=foo_82 StartTime=2014-11-20T16:21:11
   EndTime=2014-11-20T17:21:11 Duration=01:00:00
   Nodes=tux[1-2] NodeCnt=2 CoreCnt=12 Features=(null)
   PartitionName=debug Flags=REPLACE
   Users=foo Accounts=(null) Licenses=(null) State=ACTIVE

$ sbatch -n4 --reservation=foo_82 tmp
Submitted batch job 98

$ scontrol show res
ReservationName=foo_82 StartTime=2014-11-20T16:21:11
   EndTime=2014-11-20T17:21:11 Duration=01:00:00
   Nodes=tux[2-3] NodeCnt=2 CoreCnt=12 Features=(null)
   PartitionName=debug Flags=REPLACE
   Users=foo Accounts=(null) Licenses=(null) State=ACTIVE

$ squeue
JOBID PARTITION  NAME  USER ST  TIME  NODES NODELIST(REASON)
   97     debug   tmp   foo  R  0:09      1 tux0
   98     debug   tmp   foo  R  0:07      1 tux1
</pre>

<h2 id="flex">FLEX Reservations<a class="slurm_link" href="#flex"></a></h2>

<p>By default, jobs that run in reservations must fit within the time and
size constraints of the reserved resources. With the <b>FLEX</b> flag jobs
are able to start before the reservation begins or continue after it ends.
They are also able to use the reserved node(s) along with additional nodes if
required and available.</p>

<p>The default behavior for jobs that request a reservation is that they must
be able to run within the confines (time and space) of that reservation.
The following example shows that the <b>FLEX</b> flag allows the job to run
before the reservation starts, after it ends, and on a node outside
of the reservation.</p>
<pre>
$ scontrol create reservation user=user1 nodes=node01 starttime=now+10minutes duration=10 flags=flex
Reservation created: user1_831

$ sbatch -wnode0[1-2] -t30:00 --reservation=user1_831 test.job
Submitted batch job 57996

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             57996     debug sleepjob    user1  R       0:08      2 node[01-02]
</pre>

<h2 id="magnetic">Magnetic Reservations
<a class="slurm_link" href="#magnetic"></a>
</h2>

<p>The default behavior for reservations is that jobs must request a
reservation in order to run in it. The <b>MAGNETIC</b> flag allows you to
create a reservation that will allow jobs to run in it without requiring that
they specify the name of the reservation. The reservation will only "attract"
jobs that meet the access control requirements.</p>

<p><b>NOTE</b>: Magnetic reservations cannot "attract" heterogeneous jobs -
heterogeneous jobs will only run in magnetic reservations if they explicitly
request the reservation.</p>

<p>The following example shows a reservation created on node05. The user
specified as being able to access the reservation then submits a job and
the job starts on the reserved node.</p>
<pre>
$ scontrol create reservation user=user1 nodes=node05 starttime=now duration=10 flags=magnetic
Reservation created: user1_850

$ scontrol show res
ReservationName=user1_850 StartTime=2020-07-29T13:44:13 EndTime=2020-07-29T13:54:13 Duration=00:10:00
   Nodes=node05 NodeCnt=1 CoreCnt=12 Features=(null) PartitionName=(null) Flags=SPEC_NODES,MAGNETIC
   TRES=cpu=12
   Users=user1 Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null)
   MaxStartDelay=(null)

$ sbatch -N1 -t5:00 test.job
Submitted batch job 62297

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             62297     debug sleepjob    user1  R       0:04      1 node05
</pre>

<h2 id="purge">Reservation Purging After Last Job
<a class="slurm_link" href="#purge"></a>
</h2>

<p>A reservation may be automatically purged after the last associated job
completes. This is accomplished by using a "purge_comp" flag.
Once the reservation has been created, it must be populated within 5 minutes
of its start time or it will be purged before any jobs have been run.</p>
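<p>A minimal sketch of creating such a reservation (the user and node count
are illustrative):</p>
<pre>
$ scontrol create reservation user=alan starttime=now \
   duration=60 nodecnt=1 flags=purge_comp
</pre>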

<h2 id="account">Reservation Accounting
<a class="slurm_link" href="#account"></a>
</h2>

<p>Jobs executed within a reservation are accounted for using the appropriate
user and bank account. If resources within a reservation are not used, those
resources will be accounted for as being used by all users or bank accounts
associated with the reservation on an equal basis (e.g. if two users are
eligible to use a reservation and neither does, each user will be reported
to have used half of the reserved resources).</p>
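<p>One way to review this usage after the fact is sreport's reservation
utilization report; a sketch (the date range is illustrative):</p>
<pre>
$ sreport reservation utilization start=2024-08-01 end=2024-08-02
</pre>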

<h2 id="pro_epi">Prolog and Epilog
<a class="slurm_link" href="#pro_epi"></a>
</h2>

<p>Slurm supports both a reservation prolog and epilog.
They may be configured using the <b>ResvProlog</b> and <b>ResvEpilog</b>
configuration parameters in the slurm.conf file.
These scripts can be used to cancel jobs, modify partition configuration,
etc.</p>
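<p>A minimal sketch of enabling them in slurm.conf (the script paths are
illustrative):</p>
<pre>
# slurm.conf: run scripts when a reservation begins and ends
ResvProlog=/etc/slurm/resv_prolog.sh
ResvEpilog=/etc/slurm/resv_epilog.sh
</pre>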

<h2 id="future">Future Work<a class="slurm_link" href="#future"></a></h2>

<p>Reservations made within a partition that uses gang scheduling assume
the highest possible level of time-slicing, rather than the actual level, when
considering the initiation of jobs.
This will prevent the initiation of some jobs that would have completed
execution before the reservation begins if given fewer jobs to time-slice with.</p>

<p style="text-align: center;">Last modified 02 August 2024</p>

<!--#include virtual="footer.txt"-->