Starting in Slurm 22.05, nodes can be dynamically added and removed from Slurm.
For regular, non-dynamically created nodes, Slurm knows how to communicate with nodes by reading slurm.conf. This is why a non-dynamic setup must keep slurm.conf synchronized across the cluster. For dynamically created nodes, the controller automatically records the node's NodeAddr and NodeHostname when a dynamic slurmd registers. The controller then passes the node addresses to the clients so that they can communicate with, and even fan out to, other nodes.
Dynamic nodes can be automatically assigned to partitions at creation, either by using the ALL keyword in the partition's Nodes option or by using NodeSets and specifying a feature on the nodes.
e.g.
Nodeset=ns1 Feature=f1
Nodeset=ns2 Feature=f2
PartitionName=all Nodes=ALL Default=yes
PartitionName=dyn1 Nodes=ns1
PartitionName=dyn2 Nodes=ns2
PartitionName=dyn3 Nodes=ns1,ns2
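To make the effect of the configuration above concrete, the following Python sketch models which partitions a node would join based on its features. It is an illustrative model only, not how the controller is implemented; the function name resolve_partitions and the two dictionaries are hypothetical and simply mirror the example configuration.

```python
# Illustrative model of feature-based partition assignment.
# This is NOT Slurm code; names and data structures are assumptions.

def resolve_partitions(node_features, nodesets, partitions):
    """Return the sorted partition names a node with the given features joins.

    node_features: set of feature names on the node, e.g. {"f1"}
    nodesets:      nodeset name -> feature that selects its members
    partitions:    partition name -> list of entries ("ALL" or nodeset names)
    """
    # Nodesets the node belongs to because it carries the matching feature.
    member_of = {name for name, feature in nodesets.items()
                 if feature in node_features}
    joined = []
    for part, entries in partitions.items():
        # "ALL" matches every node; otherwise a shared nodeset is required.
        if "ALL" in entries or member_of.intersection(entries):
            joined.append(part)
    return sorted(joined)


# Mirrors the Nodeset/PartitionName lines of the example configuration.
NODESETS = {"ns1": "f1", "ns2": "f2"}
PARTITIONS = {
    "all": ["ALL"],
    "dyn1": ["ns1"],
    "dyn2": ["ns2"],
    "dyn3": ["ns1", "ns2"],
}

print(resolve_partitions({"f1"}, NODESETS, PARTITIONS))
```

Under this model, a node created with Feature=f1 lands in the partitions all, dyn1 and dyn3, while a node with no matching feature lands only in all.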
Nodes can be created two ways: through a dynamic slurmd registration, as mentioned above, or with scontrol:
Create nodes using scontrol by specifying the same NodeName line that you would define in the slurm.conf. See slurm.conf man page for node options. Only State=CLOUD and State=FUTURE are supported. The node configuration should match what the slurmd will register with (e.g. slurmd -C) plus any additional attributes.
e.g.
scontrol create NodeName=d[1-100] CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2 Feature=f1 State=cloud
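When nodes are created from a provisioning script, it can help to assemble the scontrol create command programmatically and reject unsupported states before calling Slurm. The sketch below is one possible approach; the helper build_create_command is hypothetical and only builds the argument list, it does not contact the controller.

```python
import shlex

# Hypothetical helper, not part of Slurm: builds the argument list for
# `scontrol create` and enforces the documented State restriction.
SUPPORTED_STATES = {"CLOUD", "FUTURE"}


def build_create_command(node_name, state="cloud", **attrs):
    """Return the argv list for creating the given node(s) with scontrol."""
    if state.upper() not in SUPPORTED_STATES:
        raise ValueError("only State=CLOUD and State=FUTURE are supported")
    argv = ["scontrol", "create", f"NodeName={node_name}"]
    # Keyword arguments keep their order, so the options appear as written.
    argv += [f"{key}={value}" for key, value in attrs.items()]
    argv.append(f"State={state}")
    return argv


argv = build_create_command(
    "d[1-100]",
    CPUs=16, Boards=1, SocketsPerBoard=1, CoresPerSocket=8,
    ThreadsPerCore=2, RealMemory=31848, Gres="gpu:2", Feature="f1",
)
# shlex.join quotes arguments such as the bracketed node range for the shell.
print(shlex.join(argv))
```

On a host with Slurm administrator access, the resulting list could be passed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids shell globbing of the bracketed node range.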
Nodes can be deleted using scontrol delete nodename=<nodelist>. Only dynamic nodes that have no running jobs and that are not part of a reservation can be deleted.
Last modified 31 December 2024