.TH queue 1 "@MANDATE@" "GNU Queue Version @VERSION@ www.gnuqueue.org" "GNU Queue" 
.SH NAME
queue, qsh \- farm out and batch-process jobs on the local network
.SH SYNOPSIS
.B queue
[\-h hostname|\-H hostname] [\-i|\-q] [\-d spooldir] 
[\-o|\-p|\-n] [\-w|\-r] \-\- command command.options
.PP
.B qsh
[\-l ignored] [\-d spooldir] [\-o|\-p|\-n]
[\-w|\-r] hostname command command.options
.SH DESCRIPTION
This documentation is no longer being maintained and may be inaccurate
or incomplete.  The Info documentation is now the authoritative source.
.PP
This manual page documents the GNU
.B Queue
load-balancing/batch-processing system and local \fR\&\f(CWrsh\fR replacement.
.PP
.B queue
invoked with only a "\-\-" followed by a command defaults to
immediate execution (\-i), waiting for output (\-w), and full-pty emulation
(\-p).
.PP
The defaults for \fR\&\f(CWqsh\fR are slightly different: no-pty emulation is
the default, and a hostname argument is required. A plus sign
(\fR\&\f(CW+\fR) is the wildcard hostname; specifying \fR\&\f(CW+\fR in place
of a valid hostname is the same as not using an \fR\&\f(CW-h\fR or
\fR\&\f(CW-H\fR option with \fR\&\f(CWqueue\fR. \fR\&\f(CWqsh\fR is envisioned
as an \fR\&\f(CWrsh\fR compatibility mode for use with software that expects an
\fR\&\f(CWrsh\fR-like syntax.
This is useful with some MPI implementations; see the MPI section in the Info
file.
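.PP
For example, the following \fR\&\f(CWrsh\fR-style invocations run a command
on a named host, or let queue choose the host via the \fR\&\f(CW+\fR
wildcard (the hostname \fR\&\f(CWfast_host\fR is illustrative):
.PP
.ID
\&\fR\&\f(CW
# run on a specific host, rsh-style
qsh fast_host uname -a
# let queue pick the best host
qsh + uname -a
.DE
\&\fR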
.PP
The options are:
.IP \fR\&\f(CW-h\ hostname\ \fR\ 
.IP \fR\&\f(CW--host\ hostname\fR\ 
Force the job to run on hostname.
.IP \fR\&\f(CW-H\ hostname\ \fR\ 
.IP \fR\&\f(CW--robust-host\ hostname\fR\ 
Run job on hostname if it is up.
.IP \fR\&\f(CW-i|-q\fR\ 
.IP \fR\&\f(CW--immediate|--queue\fR\ 
Shorthand for immediate execution (the \fR\&\f(CWnow\fR spooldir) and queued
execution (the \fR\&\f(CWwait\fR spooldir), respectively.
.IP \fR\&\f(CW[-d\ spooldir]\ \fR\ 
.IP \fR\&\f(CW[--spooldir\ spooldir]\fR\ 
With the \fR\&\f(CW-q\fR option, specifies the name of the batch-processing directory, e.g., \fR\&\f(CWmlab\fR.
.IP \fR\&\f(CW-o|-p|-n\ \fR\ 
.IP \fR\&\f(CW--half-pty|--full-pty|--no-pty\fR\ 
Select half-pty emulation, full-pty emulation (the default), or the more efficient no-pty emulation.
.IP \fR\&\f(CW-w|-r\fR\ 
.IP \fR\&\f(CW--wait|--batch\fR\ 
Select wait mode (stub daemon; the default) or return mode (results returned by mail).
.IP \fR\&\f(CW-v\fR\ 
.IP \fR\&\f(CW--version\fR\ 
Print the version number.
.IP \fR\&\f(CW--help\fR\ 
Print the list of options.
.PP
GNU Queue is a UNIX process network
load-balancing system that features an innovative 'stub daemon'
mechanism which allows users to control their remote jobs in a nearly
seamless and transparent fashion. When an interactive remote job is
launched, such as EMACS interfacing with Allegro Lisp, a
stub daemon runs on the remote end. By sending signals to the remote
stub - including hitting the suspend key - the process on the remote
end may be controlled. Resuming the stub resumes the remote job. The
user's environment is almost completely replicated: not only
environment variables, but also nice values, rlimits, and terminal
settings are replicated on the remote end. Together with
\fR\&\f(CWMIT_MAGIC_COOKIE_1\fR (or \fR\&\f(CWxhost +\fR) the system is
X-windows transparent as well, provided the user's local
\fR\&\f(CWDISPLAY\fR variable is set to the fully qualified hostname of
the local machine.
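.PP
For example, a minimal sketch of an X-transparent invocation, assuming
\&\fR\&\f(CWxhost\fR-based access control (the display setup is illustrative;
here \fR\&\f(CWhostname\fR is assumed to return the fully qualified name):
.PP
.ID
\&\fR\&\f(CW
# DISPLAY must name the local machine with a fully qualified hostname
DISPLAY=`hostname`:0.0; export DISPLAY
# allow remote hosts to open windows here (coarse; see xhost(1))
xhost +
# run an X client on the best available host
queue -i -w -p -- xterm
.DE
\&\fR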
.PP
One of the most appealing features of the stub system, even to
experienced users, is that asynchronous job control of remote jobs by
the shell is possible and intuitive. One simply runs the stub in the
background under the local shell; the shell notifies the user when the
remote job has a change in status by monitoring the stub daemon. 
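.PP
For example, one can background the stub under the local shell and use
ordinary job control on it (\fR\&\f(CWlong_job\fR is a hypothetical command):
.PP
.ID
\&\fR\&\f(CW
queue -i -w -n -- long_job &
jobs     # the stub appears as an ordinary shell job
fg %1    # foregrounding the stub foregrounds the remote job
.DE
\&\fR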
.PP
When the remote process has terminated, the stub returns the exit
value to the shell; otherwise, the stub simulates death by the same
signal as the one that terminated or suspended the remote job. In this
way, control of the remote process is intuitive even to novice users,
as it is just like controlling a local job from the shell. Many of my
original users had to be reminded that their jobs were, in fact,
running remotely.
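.PP
A quick way to see the exit value propagate, using the standard
\&\fR\&\f(CWfalse\fR utility:
.PP
.ID
\&\fR\&\f(CW
queue -i -w -n -- false
echo $?    # prints 1, the remote job's exit status
.DE
\&\fR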
.PP
In addition, Queue also features a more traditional distributed batch
processing environment, with results returned to the user via
email. Traditional batch-processing limitations may also be
placed on jobs running in either environment (stub or email
mechanism), such as suspension of jobs if the system exceeds a certain
load average, limits on CPU time, disk free requirements, limits on
the times in which jobs may run, and so on. (These are documented in the
sample \fR\&\f(CWprofile\fR file included.)
.PP
In order to use queue to farm out jobs onto the network, the
queue daemon (\fR\&\f(CWqueued\fR) must be running on every host in your cluster, as
defined in the host Access Control File
(default: /usr/local/share/qhostsfile).
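.PP
A sketch of a host Access Control File, assuming the simple layout of
one hostname per line (both hostnames are illustrative):
.PP
.ID
\&\fR\&\f(CW
fast_host.example.com
slow_host.example.com
.DE
\&\fR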
.PP
Once queued is running, jobs may normally be farmed out to other
hosts within the homogeneous cluster.
For example, try something like  \fR\&\f(CWqueue -i
-w -p  -- emacs -nw\fR. You should be able to background and foreground the
remote EMACS process from the local shell just as if it were running
as a local copy. 
.PP
Another example command is \fR\&\f(CWqueue -i -w -n -- hostname\fR, which should
return the name of the best host on which to run a job, as determined by the
options in the profile file (see below).
.PP
The options to queue deserve some explanation:
.PP
\&\fR\&\f(CW-i\fR specifies immediate execution mode, placing the job in the \fR\&\f(CWnow\fR
spool. This is the default. Alternatively, you may specify either the \fR\&\f(CW-q\fR option, 
which is shorthand for the \fR\&\f(CWwait\fR spool, or use the \fR\&\f(CW-d
spooldir\fR option to place the job under the control of the \fR\&\f(CWprofile\fR file
in the \fR\&\f(CWspooldir\fR subdirectory of the spool directory, which must previously
have been created by the Queue administrator.
.PP
In any case, execution of the job will wait until it satisfies the conditions
of the profile file for that particular spool directory, which may
include waiting for a slot to become free. This method of batch processing
is completely compatible with the stub mechanism, although using it this
way may disorient users, as they may unknowingly be forced to wait until
a slot on a remote machine becomes available.
.PP
\&\fR\&\f(CW-w\fR activates the stub mechanism, which is the default.
The queue stub process will
terminate when the remote process terminates; you may send signals and
suspend/resume the remote process by doing the same to the stub
process. Standard input/output will be that of the 'queue' stub
process. \fR\&\f(CW-r\fR deactivates the stub process; standard input/output will
be via email back to the user; the \fR\&\f(CWqueue\fR process will return
immediately.
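.PP
For example, signals sent to the stub are forwarded to the remote job
(\fR\&\f(CWlong_job\fR is a hypothetical command):
.PP
.ID
\&\fR\&\f(CW
queue -i -w -n -- long_job &
STUB=$!
kill -TSTP $STUB   # suspends the remote job
kill -CONT $STUB   # resumes it
.DE
\&\fR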
.PP
\&\fR\&\f(CW-p\fR or \fR\&\f(CW-n\fR specifies whether or not a virtual tty should be
allocated at the remote end, or whether the system should merely use
the more efficient socket mechanism. Many interactive processes, such
as \fR\&\f(CWEMACS\fR or \fR\&\f(CWMatlab\fR, require a virtual tty to be present, so the \fR\&\f(CW-p\fR
option is required for these. Other processes, such as a simple
\&\fR\&\f(CWhostname\fR do not require a \fR\&\f(CWtty\fR and so may be run without the
default \fR\&\f(CW-p\fR. Note that \fR\&\f(CWqueue\fR is intelligent and will override
the \fR\&\f(CW-p\fR option if it detects that both \fR\&\f(CWstdin\fR/\fR\&\f(CWstdout\fR have been re-directed
to a non-terminal; this feature is useful in facilitating system
administration scripts that allow users to execute jobs. [At some
point we may wish to change the default to \fR\&\f(CW-p\fR as the system
automatically detects when \fR\&\f(CW-n\fR will suffice.] Simple, non-interactive
jobs such as \fR\&\f(CWhostname\fR do not need the less efficient pty/tty
mechanism and so should be run with the \fR\&\f(CW-n\fR option. The \fR\&\f(CW-n\fR option
is the default when \fR\&\f(CWqueue\fR is invoked in \fR\&\f(CWrsh\fR compatibility mode
with \fR\&\f(CWqsh\fR.
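.PP
For example, because of this automatic override, a fully redirected
invocation needs no explicit \fR\&\f(CW-n\fR:
.PP
.ID
\&\fR\&\f(CW
# -p is overridden: stdin and stdout are not terminals
queue -i -w -p -- hostname < /dev/null > best_host.txt
.DE
\&\fR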
.PP
The \fR\&\f(CW--\fR with \fR\&\f(CWqueue\fR specifies `end of queue options'; everything beyond this
point is interpreted as the command, or arguments to be given to the
command. Consequently, user options (e.g., when invoking queue through
a script front end) may be placed here:
.PP
.ID
\&\fR\&\f(CW
#!/bin/sh
exec queue -i -w -p -- big_job "$@"
.DE
\&\fR
.PP
or 
.PP
.ID
\&\fR\&\f(CW
#!/bin/sh
exec queue -q -w -p -d big_job_queue -- big_job "$@"
.DE
\&\fR
.PP
for example. This places queue in immediate mode following
instructions in the \fR\&\f(CWnow\fR spool subdirectory (first example) or in
batch-processing mode in the \fR\&\f(CWbig_job_queue\fR spool subdirectory, provided it
has been created by the administrator. In both cases, stubs are being
used, which will not terminate until the big_job process terminates on the
remote end.
.PP
In both cases, \fR\&\f(CWpty\fR/\fR\&\f(CWttys\fR will be allocated, unless the user redirects
both the standard input and standard output of the simple invoking
scripts. Invoking queue through these scripts has the additional
advantage that the process name will be that of the script, clarifying
what the process is. For example, the script might be called \fR\&\f(CWbig_job\fR or
\&\fR\&\f(CWbig_job.remote\fR, causing \fR\&\f(CWqueue\fR to appear this way in the user's process
list.
.PP
\&\fR\&\f(CWqueue\fR can be used for batch processing by using the \fR\&\f(CW-q -r -n\fR
options, e.g.,
.PP
.ID
\&\fR\&\f(CW
#!/bin/sh
exec queue -q -r -n -d big_job -- big_job "$@"
.DE
\&\fR
.PP
would run \fR\&\f(CWbig_job\fR in batch mode. The \fR\&\f(CW-q\fR and \fR\&\f(CW-d big_job\fR options force Queue to
follow instructions in the \fR\&\f(CWbig_job/profile\fR file under Queue's spool
directory and wait for the next available job slot. \fR\&\f(CW-r\fR activates
batch-processing mode, causing Queue to exit immediately and return
results (including stdout and stderr output) via email. 
.PP
The final option, \fR\&\f(CW-n\fR, disables allocation of a pty on the
remote end; it is unnecessary in this case (as batch mode disables
ptys anyway) but is shown here to demonstrate how it might be used in a
\&\fR\&\f(CW-i -w -n\fR or \fR\&\f(CW-q -w -n\fR invocation.
.PP
Under \fR\&\f(CW/usr/spool/queue\fR you may create several directories
for batch jobs, each identified with the class of the
batch job (e.g., \fR\&\f(CWbig_job\fR or \fR\&\f(CWsmall_job\fR). You may then place
restrictions on that class, such as the maximum number of
jobs running or total CPU time, by placing a \fR\&\f(CWprofile\fR
file, like the one described below, in that directory.
.PP
However, the \fR\&\f(CWnow\fR queue is mandatory; it is the
directory used by the \fR\&\f(CW-i\fR (immediate mode)
option of queue to launch jobs over the network
immediately rather than as batch jobs.
.PP
Specify that this queue is turned on:
.PP
.ID
\&\fR\&\f(CW
exec on
.DE
\&\fR
.PP
The next two lines in \fR\&\f(CWprofile\fR may be set to an email address
rather than a file; the leading \fR\&\f(CW/\fR identifies
them as file logs. File names beginning with \fR\&\f(CWcf\fR, \fR\&\f(CWof\fR, or \fR\&\f(CWef\fR are ignored
by the queue daemon:
.PP
.ID
\&\fR\&\f(CW
mail /usr/local/com/queue/now/mail_log
supervisor /usr/local/com/queue/now/mail_log2
.DE
\&\fR
.PP
Note that \fR\&\f(CW/usr/local/com/queue\fR is our spool directory, and \fR\&\f(CWnow\fR is
the job batch directory for the special \fR\&\f(CWnow\fR queue (run via the \fR\&\f(CW-i\fR
or immediate-mode flag to the queue executable), so these files
may reside in the job batch directories.
.PP
The \fR\&\f(CWpfactor\fR command is used to control the likelihood
of a job being executed on a given machine. Typically, this is done
in conjunction with the \fR\&\f(CWhost\fR command, which specifies that the option
on the rest of the line be honored on that host only.
.PP
In this example, \fR\&\f(CWpfactor\fR is set to the relative MIPS of each
machine, for example:
.PP
.ID
\&\fR\&\f(CW
host fast_host pfactor 100
host slow_host pfactor  50
.DE
\&\fR
.PP
Where \fR\&\f(CWfast_host\fR and \fR\&\f(CWslow_host\fR are the hostnames of the respective machines.
.PP
This is useful for controlling load balancing. Each
queue on each machine reports back an `apparent load average'
calculated as follows:
.PP
1-min load average / ((max(0, vmaxexec - number of jobs running) + 1) * pfactor)
.PP
The machine with the lowest apparent load average for that queue
is the one most likely to get the job.
.PP
Consequently, a more powerful \fR\&\f(CWpfactor\fR proportionally reduces the load average
that is reported back for this queue, indicating a more 
powerful system. 
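.PP
For example (illustrative numbers), if both hosts are already running
\&\fR\&\f(CWvmaxexec\fR jobs in this queue, the max() term is zero and the
apparent load reduces to the load average divided by \fR\&\f(CWpfactor\fR:
.PP
.ID
\&\fR\&\f(CW
fast_host: 1.2 / ((0 + 1) * 100) = 0.012
slow_host: 0.4 / ((0 + 1) *  50) = 0.008   (most likely to get the job)
.DE
\&\fR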
.PP
Vmaxexec is the ``apparent maximum'' number of jobs allowed to execute in
this queue, or simply equal to maxexec if it was not set.
The default value of these variables is a large value treated
by the system as infinity.
.PP
.ID
\&\fR\&\f(CW
host fast_host vmaxexec 2
host slow_host vmaxexec 1
maxexec 3
.DE
\&\fR
.PP
The purpose of \fR\&\f(CWvmaxexec\fR is to make the system appear fully loaded
at some point before the maximum number of jobs are already
running, so that the likelihood of the machine being used
tapers off sharply after \fR\&\f(CWvmaxexec\fR slots are filled.
.PP
Below \fR\&\f(CWvmaxexec\fR jobs, the system aggressively discriminates against
hosts already running jobs in this Queue.
.PP
In job queues running above \fR\&\f(CWvmaxexec\fR jobs, hosts appear more equal
to the system, and only the load average and \fR\&\f(CWpfactor\fR are used to
assign jobs. The theory here is that above \fR\&\f(CWvmaxexec\fR jobs the hosts
are fully saturated, and the load average is a better indicator than the
simple number of jobs running in a job queue of where to send the next job.
.PP
Thus, under lightly-loaded situations, the system routes jobs around hosts 
already running jobs in this job queue. In more heavily loaded situations,
load-averages and \fR\&\f(CWpfactor\fRs are used in determining where to run jobs. 
.PP
Additional options in \fR\&\f(CWprofile\fR are:
.PP
.IP \fR\&\f(CWexec\fR\ 
on, off, or drain. Drain allows running jobs to finish while preventing new ones from starting.
.IP
.IP \fR\&\f(CWminfree\fR\ 
Disk space on the specified device must be at least this free.
.IP
.IP \fR\&\f(CWmaxexec\fR\ 
Maximum number of jobs allowed to run in this queue.
.IP
.IP \fR\&\f(CWloadsched\fR\ 
1 minute load average must be below this value to launch new jobs.
.IP
.IP \fR\&\f(CWloadstop\fR\ 
if 1 minute load average exceeds this, jobs in this queue are suspended until it drops again.
.IP
.IP \fR\&\f(CWtimesched\fR\ 
Jobs are only scheduled during these times.
.IP
.IP \fR\&\f(CWtimestop\fR\ 
Running jobs will be suspended outside of these times.
.IP
.IP \fR\&\f(CWnice\fR\ 
Running jobs run at least at this nice value.
.IP
.IP \fR\&\f(CWrlimitcpu\fR\ 
Maximum CPU time for a job in this queue.
.IP
.IP \fR\&\f(CWrlimitdata\fR\ 
Maximum data segment size for a job.
.IP
.IP \fR\&\f(CWrlimitstack\fR\ 
Maximum stack size.
.IP
.IP \fR\&\f(CWrlimitfsize\fR\ 
Maximum file size.
.IP
.IP \fR\&\f(CWrlimitrss\fR\ 
Maximum resident set size.
.IP
.IP \fR\&\f(CWrlimitcore\fR\ 
Maximum size of a core dump.
.IP
.PP
These options, if present, will only override the
user's values (via queue) for these limits if they are lower
than what the user has set (or larger in the case of \fR\&\f(CWnice\fR).
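.PP
A sketch of a \fR\&\f(CWprofile\fR fragment combining these directives
(values are illustrative; see the sample \fR\&\f(CWprofile\fR for units):
.PP
.ID
\&\fR\&\f(CW
exec on
loadsched 2.5
loadstop 5.0
nice 10
rlimitcpu 3600
.DE
\&\fR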
.SH FILES
These are the default file paths. PREFIX is typically '/usr/local'.
.PP
.nf
PREFIX/share/qhostsfile		Host Access Control List file
PREFIX/com/queue			spool directory
PREFIX/com/queue/now		spool directory for immediate execution
PREFIX/com/queue/wait		spool directory for the '-q' shorthand
SPOOLDIR/profile			control file for the SPOOLDIR job queue
PREFIX/com/queue/now/profile	control file for immediate jobs
PREFIX/var/queue_pid_hostname	temporary file
.fi
.SH COPYING
Copyright
.if t \(co
1998-2000 W. G. Krebs \<wkrebs@gnu.org\>
.PP
Permission is granted to make and distribute verbatim copies of
this manpage provided the copyright notice and this permission notice
are preserved on all copies.
.SH BUGS
Send bug reports to \<bug-queue@gnu.org\>.
.SH AUTHORS
W. G. Krebs \<wkrebs@gnu.org\> is the primary author of GNU Queue.
.PP
See the Acknowledgements file for a complete list of contributors.