<!--#include virtual="header.txt"-->
<h1>Core Specialization</h1>
<p>Core specialization is a feature designed to isolate system overhead
(system interrupts, etc.) to designated cores on a compute node.
This can reduce context switching in applications to improve completion time.
The job will be charged for all allocated cores, but will not be able to
directly use the specialized cores.</p>
<h2 id="command">Command Options<a class="slurm_link" href="#command"></a></h2>
<p>All job allocation commands (<i>salloc</i>, <i>sbatch</i> and <i>srun</i>)
accept the <i>-S</i> or <i>--core-spec</i> option with a core count value
argument (e.g. "-S 1" or "--core-spec=2").
The count identifies the number of cores to be reserved for system overhead on
each allocated compute node.
Each job's core specialization count can be viewed using the <i>scontrol</i>,
<i>sview</i> or <i>squeue</i> command.
Specification of a core specialization count for a job step is ignored
(i.e. for the <i>srun</i> command within a job allocation created using the
<i>salloc</i> or <i>sbatch</i> command).
Use the <i>squeue</i> command with the "%X" format option to see the count
(it is not reported in the default output format).
The <i>scontrol</i> and <i>sview</i> commands can also be used to modify
the count for pending jobs.</p>
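<p>For example, a job's core specialization count might be requested at
submission time, displayed with <i>squeue</i>, and changed while the job is
still pending with <i>scontrol</i> (the job ID and script name below are only
illustrative):</p>
<pre>
$ sbatch --core-spec=2 my.batch
$ squeue -j 1234 -o "%i %X"
$ scontrol update JobId=1234 CoreSpec=1
</pre>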
<p>Explicitly setting a job's specialized core value implicitly sets its
<i>--exclusive</i> option, reserving entire nodes for the job.
The job will be charged for all non-specialized CPUs on the allocated nodes,
and the job's NumCPUs value reported by the <i>scontrol</i>, <i>sview</i> and
<i>squeue</i> commands, as well as the job's accounting, will reflect all
non-specialized CPUs on all allocated nodes.</p>
<p>If AllowSpecResourcesUsage=yes and the explicitly requested specialized
core/thread count is lower than the number of cores in the CoreSpecCount or in
the CpuSpecList (leaving what would otherwise be specialized cores available
for use), then the step will have access to all of the normal cores as well as
the extra unused specialized cores. In <i>sacct</i>, the step's allocated CPUs
will include the specialized cores or threads that it has access to. However,
the job's allocated CPU count never includes specialized cores or threads to
ensure that utilization reports are accurate.</p>
<p>Here is an example configuration, setting cores 0 and 1 as
specialized:</p>
<pre>
AllowSpecResourcesUsage=yes
Nodename=n0 Port=10100 CoresPerSocket=16 ThreadsPerCore=1 CpuSpecList=0-1
</pre>
<p>Submit a job requesting a core spec count of 1 (freeing up core
number 1 for job use).</p>
<pre>
$ salloc --core-spec=1
salloc: Granted job allocation 4152
$ srun bash -c 'cat /proc/self/status |grep Cpus_'
Cpus_allowed: fffe
Cpus_allowed_list: 1-15
</pre>
<p>Notice the difference between the job CPU count and the step CPU count:
the job's allocated CPU count (14) excludes all of the configured specialized
cores, while the step's count (15) includes the otherwise unused specialized
core it was granted access to.</p>
<pre>
$ sacct -j 4152 -ojobid%20,alloccpus
JobID AllocCPUS
-------------------- ----------
4152 14
4152.interactive 15
4152.0 15
</pre>
<h2 id="core">Core Selection<a class="slurm_link" href="#core"></a></h2>
<p>The specific resources to be used for specialization may be identified using
the <i>CPUSpecList</i> configuration parameter associated with each node in
the <i>slurm.conf</i> file.
Note that the <i>core_spec/cray_aries</i> plugin does not currently support identification of
specific cores, so that plugin should not be used in conjunction with the
<i>CPUSpecList</i> configuration parameter, even on Cray systems.
If <i>CoreSpecCount</i> is configured, but not <i>CPUSpecList</i>, the cores
selected for specialization will follow the assignment algorithm
described below.
The first core selected will be the highest numbered core on the highest
numbered socket.
Subsequent cores selected will be the highest numbered core on lower
numbered sockets.
If additional cores are required, they will come from the next highest numbered
cores on each socket.
By way of example, consider a node with two sockets, each with four cores.
The specialized cores will be selected in the following order:</p>
<ol>
<li>socket: 1 core: 3</li>
<li>socket: 0 core: 3</li>
<li>socket: 1 core: 2</li>
<li>socket: 0 core: 2</li>
<li>socket: 1 core: 1</li>
<li>socket: 0 core: 1</li>
<li>socket: 1 core: 0</li>
<li>socket: 0 core: 0</li>
</ol>
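<p>For instance, a node definition such as the following (the node name is a
placeholder) would reserve the first two cores in that order, i.e. core 3 on
socket 1 and core 3 on socket 0:</p>
<pre>
NodeName=n1 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 CoreSpecCount=2
</pre>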
<p>Slurm can be configured to specialize the first, rather than the last cores
by configuring SchedulerParameters=spec_cores_first. In that case,
the first core selected will be the lowest numbered core on the lowest
numbered socket.
Subsequent cores selected will be the lowest numbered core on higher
numbered sockets.
If additional cores are required, they will come from the next lowest numbered
cores on each socket.</p>
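<p>For example, with the following slurm.conf setting, the two-socket,
four-core node described above would have its specialized cores selected
starting with socket 0, core 0:</p>
<pre>
SchedulerParameters=spec_cores_first
</pre>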
<p>Note that core specialization reservation may impact the use of some
job allocation request options, especially --cores-per-socket.</p>
<h2 id="system">System Configuration
<a class="slurm_link" href="#system"></a>
</h2>
<p>There are two fundamentally different mechanisms for core specialization;
one for Cray systems and a different model for other systems.</p>
<p>For Cray systems, configure <i>SelectType=select/cray_aries</i> and
<i>CoreSpecPlugin=core_spec/cray_aries</i>.
By default, no resources will be reserved for system use.
The user must explicitly set a specialized core count as described above.</p>
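<p>A minimal slurm.conf sketch for a Cray system might therefore include:</p>
<pre>
SelectType=select/cray_aries
CoreSpecPlugin=core_spec/cray_aries
</pre>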
<p>For all other systems, configure SelectType to <i>cons_res</i> or
<i>cons_tres</i>, configure CoreSpecPlugin to <i>core_spec/none</i> (the
default), and enable the <i>task/cgroup</i> TaskPlugin.
In addition, specialized resources should be configured in slurm.conf on the
node specification line using the <i>CoreSpecCount</i> or <i>CPUSpecList</i>
options to identify the CPUs to reserve.
The <i>MemSpecLimit</i> option can be used to reserve memory.
These resources will be reserved using Linux cgroups.
Users wanting a different number of specialized cores should use the
<i>--core-spec</i> option as described above.</p>
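<p>As an illustration (the node definition, core counts and memory limit are
placeholders), such a configuration might look like the following, reserving
two cores and 2048 MB of memory on node n2 for system use:</p>
<pre>
SelectType=select/cons_tres
CoreSpecPlugin=core_spec/none
TaskPlugin=task/cgroup
NodeName=n2 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 CoreSpecCount=2 MemSpecLimit=2048
</pre>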
<p>A job's core specialization option will be silently cleared on other
configurations.
In addition, each compute node's core count must be configured, or its CPUs
count must be set to the node's core count.
If the core count is not configured and the CPUs value is set to the count of
hyperthreads, then hyperthreads rather than cores will be reserved for
system use.</p>
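<p>For example, on a node with two 8-core sockets and two threads per core,
either of the following illustrative definitions allows whole cores to be
reserved; configuring only CPUs=32 (the hyperthread count) would instead cause
hyperthreads to be reserved:</p>
<pre>
# Core count explicitly configured:
NodeName=n3 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CoreSpecCount=2
# Alternatively, CPUs set to the node's core count:
NodeName=n3 CPUs=16 CoreSpecCount=2
</pre>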
<p>If users are to be granted the right to control the number of specialized
cores for their job, the configuration parameter <i>AllowSpecResourcesUsage</i>
must be set to <i>yes</i>.</p>
<p style="text-align:center;">Last modified 21 October 2022</p>
<!--#include virtual="footer.txt"-->