Slurm Workload Manager
Slurm is an open-source workload manager designed for Linux clusters of
all sizes.
It provides three key functions.
First, it allocates exclusive and/or non-exclusive access to resources
(compute nodes) to users for some duration of time so they can perform work.
Second, it provides a framework for starting, executing, and monitoring work
(typically a parallel job) on a set of allocated nodes.
Finally, it arbitrates contention for resources by managing a queue of
pending work.
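
As a rough illustration of these three functions from a user's point of view,
the sketch below submits a small batch job and then checks the queue using
Slurm's standard sbatch and squeue commands, driven here from Python. It
assumes a working Slurm installation; the partition name "debug" and the
resource sizes are placeholders, not part of any default configuration.

import subprocess

# A minimal batch script: the #SBATCH directives request resources and the
# srun line launches a job step on the allocated nodes.
batch_script = """#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --time=00:05:00
srun hostname
"""

# sbatch queues the script; Slurm allocates nodes once they are available.
submit = subprocess.run(["sbatch"], input=batch_script,
                        capture_output=True, text=True, check=True)
print(submit.stdout.strip())          # e.g. "Submitted batch job 12345"
job_id = submit.stdout.split()[-1]

# squeue reports the job among the pending and running work in the queue.
subprocess.run(["squeue", "-j", job_id])

The same operations are normally performed directly from the shell; Python is
used here only to keep the example self-contained.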
Slurm's design is very modular with dozens of optional plugins.
In its simplest configuration, it can be installed and configured in a
couple of minutes (see
Caos NSA and Perceus: All-in-one Cluster Software Stack
by Jeffrey B. Layton).
More complex configurations can satisfy the job scheduling needs of
world-class computer centers and rely upon a
MySQL database for archiving
accounting records, managing
resource limits by user or bank account,
or supporting sophisticated
job prioritization algorithms.
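
As a hedged sketch of how that accounting layer is typically exercised, the
example below pulls completed-job records with sacct and sets a per-account
CPU limit with sacctmgr, both of which front the accounting database. The
user name "alice", the account name "physics", and the limit value are
placeholders, and the exact fields and limits available depend on the site's
accounting configuration.

import subprocess

# Completed-job records for one user since a given date, selected fields only.
subprocess.run([
    "sacct", "--user", "alice", "--starttime", "2013-11-01",
    "--format", "JobID,JobName,Partition,Elapsed,State",
])

# Cap the total CPUs that jobs under the "physics" account may use at once.
subprocess.run([
    "sacctmgr", "-i", "modify", "account", "name=physics",
    "set", "GrpCPUs=512",
])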
While other workload managers do exist, Slurm is unique in several
respects:
- Scalability: It is designed to operate in a heterogeneous cluster
with up to tens of millions of processors.
- Performance: It can accept 1,000 job submissions per second and
fully execute 500 simple jobs per second (depending upon hardware and system
configuration).
- Free and Open Source: Its source code is freely available under the
GNU General Public License.
- Portability: Written in C with a GNU autoconf configuration engine.
While initially written for Linux, Slurm has been ported to a diverse assortment
of systems.
- Power Management: Jobs can specify their desired CPU frequency, and
power use by each job is recorded. Idle resources can be powered down until needed.
- Fault Tolerant: It is highly tolerant of system failures, including
failure of the node executing its control functions.
- Flexibility: A plugin mechanism exists to support various
interconnects, authentication mechanisms, schedulers, etc. These plugins are
documented and simple enough for the motivated end user to understand the
source and add functionality.
- Resizable Jobs: Jobs can grow and shrink on demand. Job submissions
can specify size and time limit ranges.
- Status Jobs: Report the status of running jobs at the level of individual
tasks to help identify load imbalances and other anomalies (see the sketch
after this list).
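
The sketch below illustrates the last two points under stated assumptions: it
submits a job with a node-count range and a minimum acceptable time limit,
then queries per-task statistics for the running job with sstat. The script
name "work.sh" and the field list are illustrative, and sstat only reports
task-level data when a job accounting (jobacct_gather) plugin is enabled.

import subprocess

# Ask for between 2 and 8 nodes, and accept a time limit as short as 30
# minutes if that allows the job to start sooner.
submit = subprocess.run(
    ["sbatch", "--nodes=2-8", "--time=02:00:00", "--time-min=00:30:00",
     "work.sh"],
    capture_output=True, text=True, check=True)
job_id = submit.stdout.split()[-1]    # "Submitted batch job <id>"

# Per-task statistics for the running job's steps, e.g. to spot a task
# using far more memory or CPU time than its peers.
subprocess.run([
    "sstat", "--jobs", job_id, "--allsteps",
    "--format", "JobID,MaxRSS,MaxRSSTask,AveCPU,MinCPU,MinCPUTask",
])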
Slurm provides workload management on many of the most powerful computers in
the world. On the November 2013 Top500 list,
five of the top ten systems use Slurm, including the number one system.
These five systems alone contain over 5.7 million cores.
A few of the systems using Slurm are listed below:
- Tianhe-2, designed by
the National University of Defense Technology (NUDT)
in China, has 16,000 nodes, each with two Intel Xeon IvyBridge processors and
three Xeon Phi processors for a total of 3.1 million cores and a peak
performance of 33.86 Petaflops.
- Sequoia,
an IBM BlueGene/Q system at
Lawrence Livermore National Laboratory
with 1.6 petabytes of memory, 96 racks, 98,304 compute nodes, and 1.6
million cores, with a peak performance of over 17.17 Petaflops.
- Piz Daint,
a Cray XC30 system at the
Swiss National Supercomputing Centre,
with 28 racks and 5,272 hybrid compute nodes, each with an
Intel Xeon E5-2670 CPU
plus an NVIDIA Tesla K20X GPU,
for a total of 115,984 compute cores and
a peak performance of 6.27 Petaflops.
- Stampede at the
Texas Advanced Computing Center/University of Texas
is a Dell system with over
80,000 Intel Xeon cores,
Intel Phi co-processors, plus
128 NVIDIA GPUs
delivering 5.17 Petaflops.
- TGCC Curie,
owned by GENCI and operated in the TGCC by
CEA, offers three different fractions
of x86-64 computing resources for addressing a wide range of scientific
challenges, with an aggregate peak performance of 2 Petaflops.
- Tera 100 at CEA
with 140,000 Intel Xeon 7500 processing cores, 300 TB of
central memory, and a theoretical computing power of 1.25 Petaflops.
- Lomonosov, a
T-Platforms system at
Moscow State University Research Computing Center
with 52,168 Intel Xeon processing cores and 8,840 NVIDIA GPUs.
- LOEWE-CSC,
a combined CPU-GPU Linux cluster at
The Center for Scientific Computing (CSC)
of the Goethe University Frankfurt, Germany,
with 20,928 AMD Magny-Cours CPU cores (176 Teraflops peak
performance) plus 778 ATI Radeon 5870 GPUs (2.1 Petaflops peak
performance single precision and 599 Teraflops double precision) and
QDR Infiniband interconnect.
Last modified 24 November 2013