1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199
|
.TH slurmctld "8" "Slurm Daemon" "April 2025" "Slurm Daemon"
.SH "NAME"
slurmctld \- The central management daemon of Slurm.
.SH "SYNOPSIS"
\fBslurmctld\fR [\fIOPTIONS\fR...]
.SH "DESCRIPTION"
\fBslurmctld\fR is the central management daemon of Slurm. It monitors
all other Slurm daemons and resources, accepts work (jobs), and allocates
resources to those jobs. Given the critical functionality of \fBslurmctld\fR,
there may be a backup server to assume these functions in the event that
the primary server fails.
.SH "OPTIONS"
.TP
\fB\-c\fR
Clear all previous \fBslurmctld\fR state from its last checkpoint.
With this option, all jobs, including both running and queued, and all
node states, will be deleted. Without this option, previously running
jobs will be preserved along with node \fIState\fR of DOWN, DRAINED
and DRAINING nodes and the associated \fIReason\fR field for those nodes.
\fBNOTE\fR: It is rare you would ever want to use this in production as all
jobs will be killed.
.IP
.TP
\fB\-D\fR
Run \fBslurmctld\fR in the foreground with logging copied to stdout.
This limits the resilience of 'scontrol reconfigure' and should be
avoided in production.
.IP
.TP
\fB\-f <file>\fR
Read configuration from the specified file. See \fBNOTES\fR below.
.IP
.TP
\fB\-h\fR
Help; print a brief summary of command options.
.IP
.TP
\fB\-i\fR
Ignore errors found while reading in state files on startup.
Warning: Use of this option will mean losing the data that wasn't recovered
from the state files.
.IP
.TP
\fB\-L <file>\fR
Write log messages to the specified file.
.IP
.TP
\fB\-n <value>\fR
Set the daemon's nice value to the specified value, typically a negative number.
.IP
.TP
\fB\-r\fR
Recover partial state from last checkpoint: jobs and node DOWN/DRAIN
state and reason information state. No partition state is recovered.
This is the default action.
.IP
.TP
\fB\-R\fR
Recover full state from last checkpoint: jobs, node, partition state, and power
save settings.
Without this option, previously running jobs will be preserved along
with node \fIState\fR of DOWN, DRAINED and DRAINING nodes and the associated
\fIReason\fR field for those nodes. No other node or partition state will
be preserved.
.IP
.TP
\fB\-s\fR
Change working directory of slurmctld to SlurmctldLogFile path if possible, or
to Slurm's StateSaveLocation otherwise. If both of them fail it will fallback to
/var/tmp.
.IP
.TP
\fB\-\-systemd\fR
Use when starting the daemon with systemd. This will allow slurmctld to notify
systemd of the new PID when using 'scontrol reconfigure'.
\fBNOTE\fR: The User and Group options in the slurmctld's systemd unit file need
to both specify the SlurmUser.
.IP
.TP
\fB\-v\fR
Verbose operation. Multiple \fBv\fR's can be specified, with each '\fBv\fR'
beyond the first increasing verbosity, up to 6 times (i.e. \-vvvvvv).
.IP
.TP
\fB\-V\fR
Print version information and exit.
.IP
.SH "ENVIRONMENT VARIABLES"
The following environment variables can be used to override settings
compiled into slurmctld.
.TP 20
\fBABORT_ON_FATAL\fR
When a fatal error is detected, use abort() instead of exit() to terminate the
process. This allows backtraces to be captured without recompiling Slurm.
.IP
.TP
\fBSLURM_CONF\fR
The location of the Slurm configuration file. This is overridden by
explicitly naming a configuration file on the command line.
.IP
.TP
\fBSLURM_DEBUG_FLAGS\fR
Specify debug flags for the scheduler to use. See DebugFlags in the
\fBslurm.conf\fR(5) man page for a full list of flags. The environment
variable takes precedence over the setting in the slurm.conf.
.IP
.SH "CORE FILE LOCATION"
If slurmctld is started with the \fB\-D\fR option then the core file will be
written to the current working directory.
Otherwise if \fBSlurmctldLogFile\fR is a fully qualified path name (starting
with a slash), the core file will be written to the same directory as the
log file, provided SlurmUser has write permission on the directory.
Otherwise the core file will be written to the \fBStateSaveLocation\fR,
or "/var/tmp/" as a last resort. If none of the above directories have
write permission for SlurmUser, no core file will be produced.
.SH "SIGNALS"
.TP
\fBSIGTERM SIGINT SIGQUIT\fR
\fBslurmctld\fR will shutdown cleanly, saving its current state to the state
save directory.
.IP
.TP
\fBSIGABRT\fR
\fBslurmctld\fR will shutdown cleanly, saving its current state, and perform a
core dump.
.IP
.TP
\fBSIGHUP\fR
Reloads the slurm configuration files, similar to 'scontrol reconfigure'.
.IP
.TP
\fBSIGUSR2\fR
Reread the log level from the configs, and then reopen the log file. This
should be used when setting up \fBlogrotate\fR(8).
.IP
.TP
\fBSIGCHLD SIGUSR1 SIGTSTP SIGXCPU SIGPIPE SIGALRM\fR
These signals are explicitly ignored.
.IP
.SH "NOTES"
It may be useful to experiment with different \fBslurmctld\fR specific
configuration parameters using a distinct configuration file
(e.g. timeouts). However, this special configuration file will not be
used by the \fBslurmd\fR daemon or the Slurm programs, unless you
specifically tell each of them to use it. If you desire changing
communication ports, the location of the temporary file system, or
other parameters used by other Slurm components, change the common
configuration file, \fBslurm.conf\fR.
.SH "COPYING"
Copyright (C) 2002\-2007 The Regents of the University of California.
Copyright (C) 2008\-2010 Lawrence Livermore National Security.
Copyright (C) 2010\-2022 SchedMD LLC.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
CODE\-OCEC\-09\-009. All rights reserved.
.LP
This file is part of Slurm, a resource management program.
For details, see <https://slurm.schedmd.com/>.
.LP
Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
.LP
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
.SH "SEE ALSO"
\fBslurm.conf\fR(5), \fBslurmd\fR(8)
|