File: sge_pe.html

<HTML>
<BODY BGCOLOR=white>
<PRE>
<!-- Manpage converted by man2html 3.0.1 -->
NAME
     sge_pe - Sun Grid Engine parallel environment  configuration
     file format

DESCRIPTION
     Parallel environments are parallel programming  and  runtime
     environments  allowing for the execution of shared memory or
     distributed  memory  parallelized   applications.   Parallel
     environments usually require some kind of setup to be
     operational before starting parallel applications. Examples
     of common parallel environments are shared memory parallel
     operating systems and the distributed memory environments
     Parallel Virtual Machine (PVM) or Message Passing Interface
     (MPI).

     <I>sge</I>_<I>pe</I> allows for the definition of interfaces to arbitrary
     parallel environments. Once a parallel environment is
     defined or modified with the -ap or -mp options to <B><A HREF="../htmlman1/qconf.html?pathrev=V62u5_TAG">qconf(1)</A></B>
     and linked with one or more queues via <I>pe</I>_<I>list</I> in
     <B><A HREF="../htmlman5/queue_conf.html?pathrev=V62u5_TAG">queue_conf(5)</A></B>, the environment can be requested for a job via
     the -pe switch to <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>, together with a range for the
     number of parallel processes to be allocated for the job.
     Additional -l options may be used to specify the job
     requirements in further detail.

     Note that Sun Grid Engine allows a backslash (\) to be used
     to escape a newline character (\newline). The backslash and
     the newline are replaced with a space (" ") character before
     any interpretation.
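
     The replacement rule can be sketched in a few lines of Python.
     This is an illustration of the rule as stated above, not Sun
     Grid Engine's own parser; the function name join_continuations
     and the sample text are assumptions made for this example.

```python
def join_continuations(text):
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# Hypothetical fragment in which one entry continues on the next line.
sample = "user_lists  deptA,\\\ndeptB\nslots  16\n"
print(join_continuations(sample))
```

     The joined text can then be interpreted one logical line at a
     time.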

FORMAT
     The format of a <I>sge</I>_<I>pe</I> file is defined as follows:

  pe_name
     The name of the parallel environment as defined for  <I>pe</I>_<I>name</I>
     in <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>.  To be used in the <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> -pe switch.

  slots
     The number of parallel processes being  allowed  to  run  in
     total  under the parallel environment concurrently.  Type is
     number, valid values are 0 to 9999999.

  user_lists
     A comma-separated list of user access list names (see
     <B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>). Each user contained in at least one of the
     listed access lists has access to the parallel environment.
     If the user_lists parameter is set to NONE (the default),
     any user not explicitly excluded via the xuser_lists
     parameter described below has access. If a user is contained
     both in an access list named in xuser_lists and in one named
     in user_lists, the user is denied access to the parallel
     environment.

  xuser_lists
     The xuser_lists parameter contains a comma-separated list of
     user access lists as described in <B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>.
     Each user contained in at least one of the listed access
     lists is not allowed to access the parallel environment. If
     the xuser_lists parameter is set to NONE (the default), no
     user is explicitly excluded. If a user is contained both in
     an access list named in xuser_lists and in one named in
     user_lists, the user is denied access to the parallel
     environment.
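
     The combined effect of user_lists and xuser_lists can be
     sketched as follows. This is a simplified model of the rules
     stated above, not the Sun Grid Engine implementation: the
     function name has_pe_access and the acls mapping are
     hypothetical, an empty list stands for NONE, and access lists
     are treated as plain sets of user names.

```python
def has_pe_access(user, user_lists, xuser_lists, acls):
    """Return True if `user` may use the parallel environment.

    user_lists and xuser_lists are lists of access list names; an
    empty list stands for the value NONE. `acls` maps an access list
    name to the set of user names it contains.
    """
    def listed(names):
        return any(user in acls.get(name, set()) for name in names)

    if listed(xuser_lists):   # exclusion always wins
        return False
    if not user_lists:        # NONE: every user not excluded has access
        return True
    return listed(user_lists)


# Hypothetical access lists.
acls = {"devs": {"alice", "bob"}, "blocked": {"bob"}}
print(has_pe_access("alice", ["devs"], ["blocked"], acls))
print(has_pe_access("bob", ["devs"], ["blocked"], acls))
```

     Checking xuser_lists first reflects the rule that a user named
     in both parameters is denied access.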

  start_proc_args
     The invocation command line of a start-up procedure for the
     parallel environment. The start-up procedure is invoked by
     <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B> prior to executing the job script. Its
     purpose is to set up the parallel environment according to
     its needs. An optional prefix "user@" specifies the user
     under which this procedure is to be started. The standard
     output of the start-up procedure is redirected to the file
     <I>REQNAME</I>.po<I>JID</I> in the job's working directory (see <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>),
     with <I>REQNAME</I> being the name of the job as displayed by
     <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> and <I>JID</I> being the job's identification number.
     Likewise, the standard error output is redirected to
     <I>REQNAME</I>.pe<I>JID</I>.
     The following special variables, which are expanded at
     runtime, can be used (besides any other strings which have to
     be interpreted by the start and stop procedures) to
     constitute a command line:

     $<I>pe</I>_<I>hostfile</I>
          The pathname of a file containing a detailed
          description of the layout of the parallel environment
          to be set up by the start-up procedure. Each line of
          the file refers to a host on which parallel processes
          are to be run. The first entry of each line denotes the
          hostname, the second entry the number of parallel
          processes to be run on the host, the third entry the
          name of the queue, and the fourth entry a processor
          range to be used in case of a multiprocessor machine.

     $<I>host</I>
          The name of the host on which the start-up or stop
          procedures are started.

     $<I>job</I>_<I>owner</I>
          The user name of the job owner.

     $<I>job</I>_<I>id</I>
          Sun Grid Engine's unique job identification number.

     $<I>job</I>_<I>name</I>
          The name of the job.

     $<I>pe</I>  The name of the parallel environment in use.

     $<I>pe</I>_<I>slots</I>
          Number of slots granted for the job.

     $<I>processors</I>
          The processors string as contained in the queue
          configuration (see <B><A HREF="../htmlman5/queue_conf.html?pathrev=V62u5_TAG">queue_conf(5)</A></B>) of the master queue (the
          queue in which the start-up and stop procedures are
          started).

     $<I>queue</I>
          The cluster queue of the master queue instance.
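
     The four-column layout of the $pe_hostfile file described
     above can be read with a short Python helper. This is a sketch
     under assumptions: the names HostEntry and parse_pe_hostfile,
     the whitespace-separated columns, and the sample host, queue
     and processor values are invented for illustration.

```python
from collections import namedtuple

HostEntry = namedtuple("HostEntry", "hostname slots queue processors")


def parse_pe_hostfile(text):
    """Parse pe_hostfile content into a list of HostEntry tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        hostname, slots, queue, processors = fields[:4]
        entries.append(HostEntry(hostname, int(slots), queue, processors))
    return entries


# Hypothetical file content: hostname, process count, queue, processors.
sample = "node01 4 all.q@node01 UNDEFINED\nnode02 2 all.q@node02 UNDEFINED\n"
hosts = parse_pe_hostfile(sample)
print(sum(entry.slots for entry in hosts))
```

     A start-up procedure could use such a list to build the
     machine file expected by the parallel runtime it launches.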

  stop_proc_args
     The invocation command line of a shutdown procedure for the
     parallel environment. The shutdown procedure is invoked by
     <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B> after the job script has finished. Its
     purpose is to stop the parallel environment and to remove it
     from all participating systems. An optional prefix "user@"
     specifies the user under which this procedure is to be
     started. The standard output of the stop procedure is also
     redirected to the file <I>REQNAME</I>.po<I>JID</I> in the job's working
     directory (see <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>), with <I>REQNAME</I> being the name of the
     job as displayed by <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> and <I>JID</I> being the job's
     identification number. Likewise, the standard error output
     is redirected to <I>REQNAME</I>.pe<I>JID</I>.
     The same special variables as for start_proc_args can be
     used to constitute a command line.
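
     How a start_proc_args or stop_proc_args value breaks down into
     an optional user and an expanded command line can be sketched
     in Python. This models the description above and is not the
     code used by sge_shepherd(8); the function name
     expand_proc_args, the script path and the sample values are
     hypothetical, and user names are assumed to consist of word
     characters only.

```python
import re

PE_VARIABLES = ("pe_hostfile", "host", "job_owner", "job_id", "job_name",
                "pe", "pe_slots", "processors", "queue")


def expand_proc_args(proc_args, values):
    """Split off an optional "user@" prefix and expand $variables.

    Returns (user, command); user is None when no prefix is present.
    Names that are not special variables are left untouched.
    """
    user = None
    prefix = re.match(r"(\w+)@(.*)", proc_args, re.DOTALL)
    if prefix:
        user, proc_args = prefix.group(1), prefix.group(2)

    def substitute(found):
        name = found.group(1)
        if name in PE_VARIABLES and name in values:
            return str(values[name])
        return found.group(0)

    return user, re.sub(r"\$(\w+)", substitute, proc_args)


# Hypothetical runtime values and procedure path.
values = {"pe_hostfile": "/tmp/pe_hostfile", "job_id": 42, "pe": "mpi"}
print(expand_proc_args("sgeadmin@/opt/pe/start.sh $pe_hostfile $job_id", values))
```

     Matching whole variable names keeps $pe from being expanded
     inside $pe_hostfile or $pe_slots.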

  allocation_rule
     The allocation rule is interpreted by the scheduler thread
     and helps the scheduler decide how to distribute parallel
     processes among the available machines. If, for instance, a
     parallel environment is built for shared memory applications
     only, all parallel processes have to be assigned to a single
     machine, no matter how many suitable machines are available.
     If, however, the parallel environment follows the
     distributed memory paradigm, an even distribution of
     processes among machines may be favorable.
     The current version of the scheduler only understands the
     following allocation rules:

     &lt;int&gt;:    An integer number fixing the number of processes
               per host. If the number is 1, all processes have
               to reside on different hosts. If the special
               value $pe_slots is used instead, the full range of
               processes as specified with the <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B> -pe switch
               has to be allocated on a single host (no matter
               which value belonging to the range is finally
               chosen for the job to be allocated).

     $fill_up: Starting from the best  suitable  host/queue,  all
               available  slots  are allocated. Further hosts and
               queues are "filled up" as  long  as  a  job  still
               requires slots for parallel tasks.

     $round_robin:
               From all suitable hosts a single slot is allocated
               until all tasks requested by the parallel job are
               dispatched. If more tasks are requested than
               suitable hosts are found, allocation starts again
               from the first host. The allocation scheme walks
               through suitable hosts in a best-suitable-first
               order.
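
     The differences between the allocation rules can be compared
     with a small Python model. It is a deliberately simplified
     sketch and not the scheduler's algorithm: it assumes the hosts
     are already sorted best-suitable-first, ignores queues and all
     other resource requests, and the function name allocate and
     the host names are invented for this example.

```python
def allocate(rule, requested, hosts):
    """Distribute `requested` slots over `hosts`.

    `hosts` is a list of (hostname, free_slots) pairs, already sorted
    best-suitable-first. Returns {hostname: slots}, or None if the
    request cannot be satisfied under the given rule.
    """
    granted = {}
    remaining = requested
    if rule == "$pe_slots":           # everything on a single host
        for name, free in hosts:
            if free >= requested:
                return {name: requested}
        return None
    if rule == "$fill_up":            # exhaust each host in turn
        for name, free in hosts:
            take = min(free, remaining)
            if take:
                granted[name] = take
                remaining -= take
    elif rule == "$round_robin":      # one slot per host per pass
        free_left = dict(hosts)
        while remaining and any(free_left.values()):
            for name, _ in hosts:
                if remaining and free_left[name]:
                    granted[name] = granted.get(name, 0) + 1
                    free_left[name] -= 1
                    remaining -= 1
    else:                             # fixed number of processes per host
        per_host = int(rule)
        for name, free in hosts:
            if remaining >= per_host and free >= per_host:
                granted[name] = per_host
                remaining -= per_host
    return granted if remaining == 0 else None


# Hypothetical cluster: three hosts with four free slots each.
hosts = [("node01", 4), ("node02", 4), ("node03", 4)]
for rule in ("2", "$pe_slots", "$fill_up", "$round_robin"):
    print(rule, allocate(rule, 4, hosts))
```

     For a request of four slots, this model places all of them on
     node01 under $pe_slots and $fill_up, two each on node01 and
     node02 under the rule 2, and spreads them over all three hosts
     under $round_robin.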

  control_slaves
     This parameter can be set to TRUE or FALSE (the default). It
     indicates  whether  Sun  Grid  Engine  is the creator of the
     slave tasks of a parallel application via  <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>  and
     <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B> and thus has full control over all processes
     in a parallel application, which enables  capabilities  such
     as  resource  limitation and correct accounting. However, to
     gain control over the slave tasks of a parallel application,
     a  sophisticated  PE  interface  is  required,  which  works
     closely together with Sun Grid Engine  facilities.  Such  PE
     interfaces  are available through your local Sun Grid Engine
     support office.

     Set the control_slaves parameter to FALSE for all other PE
     interfaces.

  job_is_first_task
     The job_is_first_task parameter can be set to TRUE or FALSE.
     A  value  of  TRUE  indicates  that  the Sun Grid Engine job
     script already contains one of the  tasks  of  the  parallel
     application (the number of slots reserved for the job is the
     number of slots requested with  the  -pe  switch),  while  a
     value  of FALSE indicates that the job script (and its child
     processes) is not part of the parallel program  (the  number
     of  slots  reserved  for  the  job  is  the  number of slots
     requested with the -pe switch + 1).

     If wallclock accounting is used (execd_params
     ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to
     TRUE) and control_slaves is set to FALSE, the
     job_is_first_task parameter influences the accounting for
     the job. A value of TRUE means that accounting for cpu and
     requested memory is multiplied by the number of slots
     requested with the -pe switch. If job_is_first_task is set
     to FALSE, the accounting information is multiplied by the
     number of slots + 1.
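
     The multiplier described above amounts to a one-line rule,
     shown here as a Python sketch. The function name
     accounting_multiplier and the sample slot count are
     assumptions for illustration only.

```python
def accounting_multiplier(pe_slots, job_is_first_task):
    """Factor applied to cpu and requested memory accounting."""
    return pe_slots if job_is_first_task else pe_slots + 1


# Hypothetical job submitted with "-pe mpi 8".
for flag in (True, False):
    print(flag, accounting_multiplier(8, flag))
```

     With eight slots requested, the factor is 8 when
     job_is_first_task is TRUE and 9 when it is FALSE.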


  urgency_slots
     For pending jobs with a slot range PE request, the number of
     slots is not yet determined. This setting specifies the
     method Sun Grid Engine uses to estimate the number of slots
     such jobs might finally get.

     The assumed slot allocation is used when determining the
     resource-request-based priority contribution for numeric
     resources as described in <B><A HREF="../htmlman5/sge_priority.html?pathrev=V62u5_TAG">sge_priority(5)</A></B> and is displayed
     when <B><A HREF="../htmlman1/qstat.html?pathrev=V62u5_TAG">qstat(1)</A></B> is run without the -g t option.

     The following methods are supported:

     &lt;int&gt;:    The specified integer number is used directly as
               the prospective slot amount.

     min:      The slot range minimum is used as the prospective
               slot amount. If no lower bound is specified with
               the range, 1 is assumed.

     max:      The slot range maximum is used as the prospective
               slot amount. If no upper bound is specified with
               the range, the absolute maximum possible due to
               the PE's slots setting is assumed.

     avg:      The average of all numbers occurring within the
               job's PE range request is assumed.
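
     The four methods can be written out as a short Python sketch.
     It follows the descriptions above but is not taken from the
     Sun Grid Engine sources: the function name prospective_slots
     and the representation of a range request as (low, high)
     pairs are assumptions, and whether the real avg result is
     rounded is not stated here, so the sketch returns it unrounded.

```python
def prospective_slots(method, ranges, pe_slots):
    """Estimate the slot amount assumed for a pending job.

    `ranges` is a list of (low, high) pairs from the -pe request; low
    or high is None when that bound was omitted. `pe_slots` is the
    slots setting of the parallel environment.
    """
    bounded = [(1 if low is None else low,
                pe_slots if high is None else high)
               for low, high in ranges]
    if method == "min":
        return min(low for low, _ in bounded)
    if method == "max":
        return max(high for _, high in bounded)
    if method == "avg":
        numbers = {n for low, high in bounded for n in range(low, high + 1)}
        return sum(numbers) / len(numbers)
    return int(method)  # a fixed integer setting


# Hypothetical request "-pe mpi 2-8" against a PE with 64 slots.
for method in ("4", "min", "max", "avg"):
    print(method, prospective_slots(method, [(2, 8)], 64))
```

     For an open-ended request, passing None as the upper bound
     makes max fall back to the PE's slots value.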

  accounting_summary
     This parameter is only checked if control_slaves (see above)
     is  set  to  TRUE and thus Sun Grid Engine is the creator of
     the slave tasks of a parallel application  via  <B><A HREF="../htmlman8/sge_execd.html?pathrev=V62u5_TAG">sge_execd(8)</A></B>
     and  <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B>.   In this case, accounting information
     is available for every single slave task started by Sun Grid
     Engine.

     The accounting_summary parameter can be set to TRUE or
     FALSE. A value of TRUE indicates that only a single
     accounting record is written to the <B><A HREF="../htmlman5/accounting.html?pathrev=V62u5_TAG">accounting(5)</A></B> file,
     containing the accounting summary of the whole job including
     all slave tasks, while a value of FALSE indicates that an
     individual <B><A HREF="../htmlman5/accounting.html?pathrev=V62u5_TAG">accounting(5)</A></B> record is written for every slave
     task, as well as for the master task.
     Note: When running tightly integrated jobs with
     <I>SHARETREE</I>_<I>RESERVED</I>_<I>USAGE</I> set and with
     <I>accounting</I>_<I>summary</I> enabled in the parallel environment,
     reserved usage will only be reported by the master task of
     the parallel job. No per parallel task usage records will
     be sent from execd to qmaster, which can significantly
     reduce load on qmaster when running large tightly integrated
     parallel jobs.

RESTRICTIONS
     Note that the functionality of the start-up, shutdown and
     signaling procedures remains the full responsibility of the
     administrator configuring the parallel environment. Sun
     Grid Engine will just invoke these procedures and evaluate
     their exit status. If the procedures do not perform their
     tasks properly or if the parallel environment or the
     parallel application behaves unexpectedly, Sun Grid Engine
     has no means to detect this.

SEE ALSO
     <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B>,  <B><A HREF="../htmlman1/sge_types.html?pathrev=V62u5_TAG">sge_types(1)</A></B>,  <B><A HREF="../htmlman1/qconf.html?pathrev=V62u5_TAG">qconf(1)</A></B>,  <B><A HREF="../htmlman1/qdel.html?pathrev=V62u5_TAG">qdel(1)</A></B>,  <B><A HREF="../htmlman1/qmod.html?pathrev=V62u5_TAG">qmod(1)</A></B>,
     <B><A HREF="../htmlman1/qsub.html?pathrev=V62u5_TAG">qsub(1)</A></B>, <B><A HREF="../htmlman5/access_list.html?pathrev=V62u5_TAG">access_list(5)</A></B>, <B><A HREF="../htmlman8/sge_qmaster.html?pathrev=V62u5_TAG">sge_qmaster(8)</A></B>, <B><A HREF="../htmlman8/sge_shepherd.html?pathrev=V62u5_TAG">sge_shepherd(8)</A></B>.

COPYRIGHT
     See <B><A HREF="../htmlman1/sge_intro.html?pathrev=V62u5_TAG">sge_intro(1)</A></B> for a full statement of rights and
     permissions.

</PRE>
<HR>
<ADDRESS>
Man(1) output converted with
<a href="http://www.oac.uci.edu/indiv/ehood/man2html.html">man2html</a>
</ADDRESS>
</BODY>
</HTML>