High Throughput Computing Administration Guide

This document contains Slurm administrator information specifically for high throughput computing, namely the execution of many short jobs. Getting optimal performance for high throughput computing does require some tuning and this document should help you off to a good start. A working knowledge of Slurm should be considered a prerequisite for this material.

Performance Results

Slurm has also been validated to execute 500 simple batch jobs per second on a sustained basis with short bursts of activity at a much higher level. Actual performance depends upon the jobs to be executed plus the hardware and configuration used.

System configuration

Several system configuration parameters may require modification to support a large number of open files and TCP connections with large bursts of messages. Changes can be made using the /etc/rc.d/rc.local or /etc/sysctl.conf script to preserve changes after reboot. In either case, you can write values directly into these files (e.g. "echo 32832 > /proc/sys/fs/file-max").

The transmit queue length (txqueuelen) may also need to be modified using the ifconfig command. A value of 4096 has been found to work well for one site with a very large cluster (e.g. "ifconfig txqueuelen 4096").

Munge configuration

By default the Munge daemon runs with two threads, but a higher thread count can improve its throughput. We suggest starting the Munge daemon with ten threads for high throughput support (e.g. "munged --num-threads 10").

User limits

The ulimit values in effect for the slurmctld daemon should be set quite high for memory size, open file count and stack size.

Slurm Configuration

Several Slurm configuration parameters should be adjusted to reflect the needs of high throughput computing. The changes described below will not be possible in all environments, but these are the configuration options that you may want to consider for higher throughput.

SlurmDBD Configuration

Turning accounting off provides a minimal improvement in performance. If using SlurmDBD increased speedup can be achieved by setting the CommitDelay option in the slurmdbd.conf to introduce a delay between the time slurmdbd receives a connection from slurmctld and when it commits the information to the database. This allows multiple requests to be accumulated and reduces the number of commit requests to the database.

You might also consider setting the 'Purge*' options in your slurmdbd.conf to clear out old Data. A Typical configuration would look like this...

Last modified 13 March 2025