File: sge_pe.5

package info (click to toggle)
gridengine 6.2-4
  • links: PTS, VCS
  • area: main
  • in suites: lenny
  • size: 51,532 kB
  • ctags: 51,172
  • sloc: ansic: 418,155; java: 37,080; sh: 22,593; jsp: 7,699; makefile: 5,292; csh: 4,244; xml: 2,901; cpp: 2,086; perl: 1,895; tcl: 1,188; lisp: 669; ruby: 642; yacc: 393; lex: 266
file content (345 lines) | stat: -rw-r--r-- 12,122 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
'\" t
.\"___INFO__MARK_BEGIN__
.\"
.\" Copyright: 2004 by Sun Microsystems, Inc.
.\"
.\"___INFO__MARK_END__
.\" $RCSfile$     Last Update: $Date$     Revision: $Revision$
.\"
.\"
.\" Some handy macro definitions [from Tom Christensen's man(1) manual page].
.\"
.de SB		\" small and bold
.if !"\\$1"" \\s-2\\fB\&\\$1\\s0\\fR\\$2 \\$3 \\$4 \\$5
..
.\"
.de T		\" switch to typewriter font
.ft CW		\" probably want CW if you don't have TA font
..
.\"
.de TY		\" put $1 in typewriter font
.if t .T
.if n ``\c
\\$1\c
.if t .ft P
.if n \&''\c
\\$2
..
.\"
.de M		\" man page reference
\\fI\\$1\\fR\\|(\\$2)\\$3
..
.TH xxQS_NAME_Sxx_PE 5 "$Date$" "xxRELxx" "xxQS_NAMExx File Formats"
.\"
.SH NAME
xxqs_name_sxx_pe \- xxQS_NAMExx parallel environment configuration file format
.\"
.\"
.SH DESCRIPTION
Parallel environments are parallel programming and runtime environments
allowing for the execution of shared memory or distributed memory
parallelized applications. Parallel environments usually require some
kind of setup to be operational before starting parallel applications.
Examples for common parallel environments are shared memory parallel
operating systems and the distributed memory environments Parallel Virtual
Machine (PVM) or Message Passing Interface (MPI).
.PP
.I xxqs_name_sxx_pe
allows for the definition of interfaces to arbitrary parallel environments.
Once a parallel environment is defined or modified with the \fB\-ap\fP or
\fB\-mp\fP options to
.M qconf 1
and linked with one or more queues via \fIpe_list\fP in 
.M queue_conf 5
the environment can be requested for a job via the \fB\-pe\fP switch
to
.M qsub 1
together with a request of a range for the number of parallel process
to be allocated by the job. Additional \fB\-l\fP options may be used
to specify the job requirement to further detail.
.PP
Note, xxQS_NAMExx allows backslashes (\\) be used to escape newline
(\\newline) characters. The backslash and the newline are replaced with a
space (" ") character before any interpretation.
.\"
.\"
.SH FORMAT
The format of a
.I xxqs_name_sxx_pe
file is defined as follows:
.\"
.\"
.SS "\fBpe_name\fP"
The name of the parallel environment as defined for \fIpe_name\fP in
.M sge_types 1 . 
To be used in the
.M qsub 1
\fB\-pe\fP switch.
.\"
.\"
.SS "\fBslots\fP"
The number of parallel processes being allowed to run in total under the
parallel environment concurrently.
Type is number, valid values are 0 to 9999999.
.\"
.\"
.SS "\fBuser_lists\fP"
A comma separated list of user access list names (see
.M access_list 5 ).
Each user contained in at least one of the enlisted access lists has
access to the parallel environment. If the \fBuser_lists\fP parameter is set to
NONE (the default) any user has access being not explicitly excluded
via the \fBxuser_lists\fP parameter described below.
If a user is contained both in an access list enlisted in \fBxuser_lists\fP
and \fBuser_lists\fP the user is denied access to the parallel environment.
.\"
.\"
.SS "\fBxuser_lists\fP"
The \fBxuser_lists\fP parameter contains a comma separated list of so called
user access lists as described in
.M access_list 5 .
Each user contained in at least one of the enlisted access lists is not
allowed to access the parallel environment. If the \fBxuser_lists\fP
parameter is set to NONE (the default) any user has access. If a user
is contained both in an access list enlisted in \fBxuser_lists\fP and
\fBuser_lists\fP the user is denied access to the parallel environment.
.\"
.\"
.SS "\fBstart_proc_args\fP"
The invocation command line of a start-up procedure for the parallel
environment. The start-up procedure is invoked by
.M xxqs_name_sxx_shepherd 8
prior to executing the job script. Its purpose is to setup the
parallel environment correspondingly to its needs.
An optional prefix "user@" specifies the user under which this 
procedure is to be started.
The standard output of the start-up
procedure is redirected to the file \fIREQNAME\fP.po\fIJID\fP in the
job's working 
directory (see
.M qsub 1 ),
with \fIREQNAME\fP being the name of the job as 
displayed by
.M qstat 1
and \fIJID\fP being the job's identification number.
Likewise, 
the standard error output is redirected to \fIREQNAME\fP.pe\fIJID\fP
.br
The following special
variables being expanded at runtime can be used (besides any other
strings which have to be interpreted by the start and stop procedures)
to constitute a command line:
.IP "\fI$pe_hostfile\fP"
The pathname of a file containing
a detailed description of the layout of the parallel environment to be
setup by the start-up procedure. Each line of the file refers to a host
on which parallel processes are to be run. The first entry of each line
denotes the hostname, the second entry the number of parallel processes
to be run on the host, the third entry the name of the queue, and the
fourth entry a processor range to be used in case of a multiprocessor
machine.
.IP "\fI$host\fP"
The name of the host on which the start-up or stop procedures are
started.
.IP "\fI$job_owner\fP"
The user name of the job owner.
.IP "\fI$job_id\fP"
xxQS_NAMExx's unique job identification number.
.IP "\fI$job_name\fP"
The name of the job.
.IP "\fI$pe\fP"
The name of the parallel environment in use.
.IP "\fI$pe_slots\fP"
Number of slots granted for the job.
.IP "\fI$processors\fP"
The \fBprocessors\fP string as contained in the queue configuration
(see
.M queue_conf 5 )
of the master queue (the queue in which the start-up and stop procedures
are started).
.IP "\fI$queue\fP"
The cluster queue of the master queue instance.
.\"
.\"
.SS "\fBstop_proc_args\fP"
The invocation command line of a shutdown procedure for the parallel
environment. The shutdown procedure is invoked by
.M xxqs_name_sxx_shepherd 8
after the job script has finished. Its purpose is to stop the
parallel environment and to remove it from all participating
systems.
An optional prefix "user@" specifies the user under which this 
procedure is to be started.
The standard output of the stop
procedure is also redirected to the file \fIREQNAME\fP.po\fIJID\fP in the
job's working 
directory (see
.M qsub 1 ),
with \fIREQNAME\fP being the name of the job as 
displayed by
.M qstat 1
and \fIJID\fP being the job's identification number.
Likewise, 
the standard error output is redirected to \fIREQNAME\fP.pe\fIJID\fP
.br
The same special variables as for \fBstart_proc_args\fP
can be used to constitute a command line.
.\"
.\"
.SS "\fBallocation_rule\fP"
The allocation rule is interpreted by the scheduler thread
and helps the scheduler to decide how to distribute parallel
processes among the available machines. If, for instance,
a parallel environment is built for shared memory applications
only, all parallel processes have to be assigned to a single
machine, no matter how much suitable machines are available.
If, however, the parallel environment follows the
distributed memory paradigm, an even distribution of processes
among machines may be favorable.
.br
The current version of the scheduler only understands the
following allocation rules:
.IP "\fB<int>\fP:" 1i
An integer number fixing the number of processes per
host. If the number is 1, all processes have to reside
on different hosts. If the special denominator
.B $pe_slots
is used, the full range of processes as specified with the
.M qsub 1
\fB\-pe\fP switch has to be allocated on a single host
(no matter which value belonging to the range is finally
chosen for the job to be allocated).
.IP "\fB$fill_up\fP:" 1i
Starting from the best suitable host/queue, all available slots are 
allocated. Further hosts and queues are "filled up" as long as a job still 
requires slots for parallel tasks.
.IP "\fB$round_robin\fP:" 1i
From all suitable hosts a single slot is allocated until all tasks 
requested by the parallel job are dispatched. If more tasks are requested 
than suitable hosts are found, allocation starts again from the first host. 
The allocation scheme walks through suitable hosts in a best-suitable-first 
order.
.\"
.\"
.SS "\fBcontrol_slaves\fP"
This parameter can be set to TRUE or FALSE (the default). It indicates 
whether xxQS_NAMExx is the creator of the slave tasks of a parallel application
via 
.M xxqs_name_sxx_execd 8
and
.M xxqs_name_sxx_shepherd 8
and thus has full control over all 
processes in a parallel application, which enables capabilities such as 
resource limitation and correct accounting. However, to gain control over
the 
slave tasks of a parallel application, a sophisticated PE interface is
required, 
which works closely together with xxQS_NAMExx facilities. Such PE interfaces are 
available through your local xxQS_NAMExx support office.
.sp 1
Please set the control_slaves parameter to false for all other PE
interfaces.
.\"
.\"
.SS "\fBjob_is_first_task\fP"
This parameter is only checked if
.B control_slaves
(see above) is set to TRUE 
and thus xxQS_NAMExx is the creator of the slave tasks of a parallel 
application via
.M xxqs_name_sxx_execd 8
and
.M xxqs_name_sxx_shepherd 8 .
In this case, a 
sophisticated PE interface is required closely coupling the parallel 
environment and xxQS_NAMExx. The documentation accompanying such 
PE interfaces will recommend the setting for \fBjob_is_first_task\fP.
.PP
The
.B job_is_first_task
parameter can be set to TRUE or FALSE. A value of 
TRUE indicates that the xxQS_NAMExx job script already contains one of 
the tasks of the parallel application, while a value of FALSE indicates
that the 
job script (and its child processes) is not part of the parallel program.
.\"
.\"

.SS "\fBurgency_slots\fP"
For pending jobs with a slot range PE request the number of slots 
is not determined. This setting specifies the method to be used by 
xxQS_NAMExx to assess the number of slots such jobs might finally
get.
.PP
The assumed slot allocation has a meaning when determining the 
resource-request-based priority contribution for numeric resources
as described in
.M sge_priority 5 
and is displayed when 
.M qstat 1 
is run without \fB\-g t\fP option.
.PP
The following methods are supported:
.IP "\fB<int>\fP:" 1i
The specified integer number is directly used as prospective slot amount.
.IP "\fBmin\fP:" 1i
The slot range minimum is used as prospective slot amount. If no 
lower bound is specified with the range 1 is assumed.
.IP "\fBmax\fP:" 1i
The of the slot range maximum is used as prospective slot amount. 
If no upper bound is specified with the range the absolute maximum 
possible due to the PE's \fBslots\fP setting is assumed.
.IP "\fBavg\fP:" 1i
The average of all numbers occurring within the job's PE range 
request is assumed.
.\"
.\"
.SS "\fBaccounting_summary\fP"
This parameter is only checked if
.B control_slaves
(see above) is set to TRUE 
and thus xxQS_NAMExx is the creator of the slave tasks of a parallel 
application via
.M xxqs_name_sxx_execd 8
and
.M xxqs_name_sxx_shepherd 8 .
In this case, accounting information is available for every single
slave task started by xxQS_NAMExx.
.PP
The
.B accounting_summary
parameter can be set to TRUE or FALSE. A value of 
TRUE indicates that only a single accounting record is written to the
.M accounting 5
file, containing the accounting summary of the whole job including all slave tasks,
while a value of FALSE indicates an individual
.M accounting 5
record is written for every slave task, as well as for the master task.
.\"
.\"
.SH RESTRICTIONS
\fBNote\fP, that the functionality of the start-up, shutdown
and signaling procedures remains the full responsibility
of the administrator configuring the parallel environment.
xxQS_NAMExx will just invoke these procedures and evaluate their
exit status. If the procedures do not perform their tasks
properly or if the parallel environment or the parallel
application behave unexpectedly, xxQS_NAMExx has no means to detect
this.
.\"
.\"
.SH "SEE ALSO"
.M xxqs_name_sxx_intro 1 ,
.M xxqs_name_sxx__types 1 ,
.M qconf 1 ,
.M qdel 1 ,
.M qmod 1 ,
.M qsub 1 ,
.M access_list 5 ,
.M xxqs_name_sxx_qmaster 8 ,
.M xxqs_name_sxx_shepherd 8 .
.\"
.SH "COPYRIGHT"
See
.M xxqs_name_sxx_intro 1
for a full statement of rights and permissions.