<!--#include virtual="header.txt"-->
<h1><a name="top">Dynamic Nodes</a></h1>
<h2 id="overview">Overview<a class="slurm_link" href="#overview"></a></h2>
<p>Starting in Slurm 22.05, nodes can be dynamically added to and removed from
Slurm.
</p>
<h2 id="communications">Dynamic Node Communications
<a class="slurm_link" href="#communications"></a>
</h2>
<p>
For regular, non-dynamically created nodes, Slurm knows how to communicate with
nodes by reading in the slurm.conf. This is why it is important for a
non-dynamic setup that the slurm.conf is synchronized across the cluster. For
dynamically created nodes, the controller automatically grabs the node's
<b>NodeAddr</b> and <b>NodeHostname</b> for dynamic slurmd registrations. The
controller then passes the node addresses to the clients so that they can
communicate with, and even fan out to, other nodes.
</p>
<h2 id="config">Slurm Configuration
<a class="slurm_link" href="#config"></a>
</h2>
<p>
<dl>
<dt><b>MaxNodeCount=#</b>
<dd>
Set to the number of possible nodes that can be active in a system at a time.
See the slurm.conf <a href="slurm.conf.html#OPT_MaxNodeCount">man</a> page for
more details.
<dt><b>SelectType=select/cons_tres</b>
<dd>Dynamic nodes are only supported with cons_tres.
</dl>
</p>
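<p>
As a minimal sketch, the two settings above might appear in slurm.conf as
follows. The value 1024 is an illustrative assumption, not a recommendation;
size it to the largest number of nodes expected to be active at one time.
<pre>
# Illustrative value only
MaxNodeCount=1024
SelectType=select/cons_tres
</pre>
</p>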
<h3 id="partitions">Partition Assignment
<a class="slurm_link" href="#partitions"></a>
</h3>
<p>
Dynamic nodes can be automatically assigned to partitions at creation by using
the partition's nodes <a href="slurm.conf.html#OPT_Nodes_1">ALL</a> keyword or
<a href="slurm.conf.html#SECTION_NODESET-CONFIGURATION">NodeSets</a> and
specifying a feature on the nodes.
</p>
<p>
e.g.
<pre>
Nodeset=ns1 Feature=f1
Nodeset=ns2 Feature=f2
PartitionName=all Nodes=ALL Default=yes
PartitionName=dyn1 Nodes=ns1
PartitionName=dyn2 Nodes=ns2
PartitionName=dyn3 Nodes=ns1,ns2
</pre>
</p>
<h2 id="create">Creating Nodes
<a class="slurm_link" href="#create"></a>
</h2>
<p>
Nodes can be created in two ways:
<ol>
<li>
<dl>
<dt><b>Dynamic slurmd registration</b>
<dd>
<p>Using the slurmd <a href="slurmd.html#OPT_-Z">-Z</a> and
<a href="slurmd.html#OPT_conf-<node-parameters>">--conf</a> options, a slurmd
will register with the controller and will automatically be added to the system.
</p>
<p>
e.g.
<pre>
slurmd -Z --conf "RealMemory=80000 Gres=gpu:2 Feature=f1"
</pre>
</p>
</dl>
</li>
<li>
<dl>
<dt><b>scontrol create NodeName= ...</b>
<dd>
<p>Create nodes using scontrol by specifying the same <b>NodeName</b>
line that you would define in the slurm.conf. See the slurm.conf
<a href="slurm.conf.html#SECTION_NODE-CONFIGURATION">man</a> page for node
options. Only <b>State=CLOUD</b> and <b>State=FUTURE</b> are supported. The
node configuration should match what the slurmd will register with
(e.g. slurmd -C) plus any additional attributes.
</p>
<p>
e.g.
<pre>
scontrol create NodeName=d[1-100] CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2 Feature=f1 State=cloud
</pre>
</p>
</dl>
</li>
</ol>
</p>
<h2 id="delete">Deleting Nodes
<a class="slurm_link" href="#delete"></a>
</h2>
<p>
Nodes can be deleted using <b>scontrol delete nodename=&lt;nodelist&gt;</b>.
Only dynamic nodes that have no running jobs and that are not part of a
reservation can be deleted.
</p>
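<p>
e.g. assuming the nodes d[1-100] from the scontrol create example above exist,
have no running jobs, and are not part of a reservation:
<pre>
scontrol delete nodename=d[1-100]
</pre>
</p>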
<h2 id="limitations">Limitations
<a class="slurm_link" href="#limitations"></a>
</h2>
<p>
<ol>
<li>
Dynamic nodes are incompatible with <b>TopologyParam=RouteTree</b>.
</li>
<li>
When non-default topology options are in use, extra steps should be taken
to include dynamic nodes in the topology:
<ol>
<li>Dynamic nodes must be added to <b>topology.conf</b></li>
<li>The <b>slurmctld</b> must be restarted or reconfigured</li>
</ol>
</li>
<li>
Dynamic nodes are not sorted internally, so when added to Slurm they may be
alphabetically out of order. This can lead to suboptimal job allocations if
node names represent the topology of the nodes.
</li>
</ol>
</p>
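<p>
As a rough sketch of the topology steps above, assuming the topology/tree
plugin is in use and a hypothetical leaf switch named s1, the dynamic nodes
d[1-100] could be listed in topology.conf and the controller then reconfigured.
The switch name and node range are illustrative only.
<pre>
# topology.conf (switch name s1 is hypothetical)
SwitchName=s1 Nodes=d[1-100]
</pre>
<pre>
scontrol reconfigure
</pre>
</p>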
<p style="text-align:center;">Last modified 31 December 2024</p>
<!--#include virtual="footer.txt"-->