File: gridengine.html

<!DOCTYPE html>
<html class="writer-html5" lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>10.10. Launching with Grid Engine &mdash; Open MPI 5.0.8 documentation</title>
      <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
      <link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />

  
  <!--[if lt IE 9]>
    <script src="../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
        <script src="../_static/jquery.js"></script>
        <script src="../_static/underscore.js"></script>
        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
        <script src="../_static/doctools.js"></script>
        <script src="../_static/sphinx_highlight.js"></script>
    <script src="../_static/js/theme.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="10.11. Unusual jobs" href="unusual.html" />
    <link rel="prev" title="10.9. Launching with PBS / Torque" href="tm.html" /> 
</head>

<body class="wy-body-for-nav"> 
  <div class="wy-grid-for-nav">
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >

          
          
          <a href="../index.html" class="icon icon-home">
            Open MPI
          </a>
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>
        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
              <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../quickstart.html">1. Quick start</a></li>
<li class="toctree-l1"><a class="reference internal" href="../getting-help.html">2. Getting help</a></li>
<li class="toctree-l1"><a class="reference internal" href="../release-notes/index.html">3. Release notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="../installing-open-mpi/index.html">4. Building and installing Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../features/index.html">5. Open MPI-specific features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../validate.html">6. Validating your installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../version-numbering.html">7. Version numbers and compatibility</a></li>
<li class="toctree-l1"><a class="reference internal" href="../mca.html">8. The Modular Component Architecture (MCA)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../building-apps/index.html">9. Building MPI applications</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">10. Launching MPI applications</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="quickstart.html">10.1. Quick start: Launching MPI applications</a></li>
<li class="toctree-l2"><a class="reference internal" href="prerequisites.html">10.2. Prerequisites</a></li>
<li class="toctree-l2"><a class="reference internal" href="pmix-and-prrte.html">10.3. The role of PMIx and PRRTE</a></li>
<li class="toctree-l2"><a class="reference internal" href="scheduling.html">10.4. Scheduling processes across hosts</a></li>
<li class="toctree-l2"><a class="reference internal" href="localhost.html">10.5. Launching only on the local node</a></li>
<li class="toctree-l2"><a class="reference internal" href="ssh.html">10.6. Launching with SSH</a></li>
<li class="toctree-l2"><a class="reference internal" href="slurm.html">10.7. Launching with Slurm</a></li>
<li class="toctree-l2"><a class="reference internal" href="lsf.html">10.8. Launching with LSF</a></li>
<li class="toctree-l2"><a class="reference internal" href="tm.html">10.9. Launching with PBS / Torque</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">10.10. Launching with Grid Engine</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#verify-grid-engine-support">10.10.1. Verify Grid Engine support</a></li>
<li class="toctree-l3"><a class="reference internal" href="#launching">10.10.2. Launching</a></li>
<li class="toctree-l3"><a class="reference internal" href="#grid-engine-tight-integration-support-of-the-qsub-notify-flag">10.10.3. Grid Engine tight integration support of the <code class="docutils literal notranslate"><span class="pre">qsub</span> <span class="pre">-notify</span></code> flag</a></li>
<li class="toctree-l3"><a class="reference internal" href="#grid-engine-job-suspend-resume-support">10.10.4. Grid Engine job suspend / resume support</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="unusual.html">10.11. Unusual jobs</a></li>
<li class="toctree-l2"><a class="reference internal" href="troubleshooting.html">10.12. Troubleshooting</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../tuning-apps/index.html">11. Run-time operation and tuning MPI applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../app-debug/index.html">12. Debugging Open MPI Parallel Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/index.html">13. Developer’s guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../contributing.html">14. Contributing to Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../license/index.html">15. License</a></li>
<li class="toctree-l1"><a class="reference internal" href="../history.html">16. History of Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openmpi/index.html">17. Open MPI manual pages</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openshmem/index.html">18. OpenSHMEM manual pages</a></li>
</ul>

        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="../index.html">Open MPI</a>
      </nav>

      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="Page navigation">
  <ul class="wy-breadcrumbs">
      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
          <li class="breadcrumb-item"><a href="index.html"><span class="section-number">10. </span>Launching MPI applications</a></li>
      <li class="breadcrumb-item active"><span class="section-number">10.10. </span>Launching with Grid Engine</li>
      <li class="wy-breadcrumbs-aside">
            <a href="../_sources/launching-apps/gridengine.rst.txt" rel="nofollow"> View page source</a>
      </li>
  </ul>
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
             
  <style>
.wy-table-responsive table td,.wy-table-responsive table th{white-space:normal}
</style><div class="section" id="launching-with-grid-engine">
<h1><span class="section-number">10.10. </span>Launching with Grid Engine<a class="headerlink" href="#launching-with-grid-engine" title="Permalink to this heading"></a></h1>
<p>Open MPI supports the family of run-time schedulers including the Sun
Grid Engine (SGE), Oracle Grid Engine (OGE), Grid Engine (GE), Son of
Grid Engine, and others.</p>
<p>This documentation will collectively refer to all of them as “Grid
Engine”, unless referring to a specific flavor of the Grid Engine
family.</p>
<div class="section" id="verify-grid-engine-support">
<h2><span class="section-number">10.10.1. </span>Verify Grid Engine support<a class="headerlink" href="#verify-grid-engine-support" title="Permalink to this heading"></a></h2>
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>To build Grid Engine support in Open MPI, you will need
to explicitly request the SGE support with the <code class="docutils literal notranslate"><span class="pre">--with-sge</span></code>
command line switch to Open MPI’s <code class="docutils literal notranslate"><span class="pre">configure</span></code> script.</p>
</div>
<p>To verify if support for Grid Engine is configured into your Open MPI
installation, run <code class="docutils literal notranslate"><span class="pre">prte_info</span></code> as shown below and look for
<code class="docutils literal notranslate"><span class="pre">gridengine</span></code>.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ prte_info | grep gridengine
              MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>PRRTE is the software layer that provides run-time
environment support to Open MPI.  Open MPI typically hides most
PMIx and PRRTE details from the end user, but this is one place
where Open MPI cannot hide the fact that PRRTE, not Open MPI, provides
this functionality.  Hence, users need to use the
<code class="docutils literal notranslate"><span class="pre">prte_info</span></code> command to check for Grid Engine support (not
<code class="docutils literal notranslate"><span class="pre">ompi_info</span></code>).</p>
</div>
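<p>The check above can also be scripted.  The following is a minimal
sketch, not part of Open MPI: the helper name <code class="docutils literal notranslate"><span class="pre">has_gridengine</span></code> is
hypothetical, and it assumes that the <code class="docutils literal notranslate"><span class="pre">prte_info</span></code> output contains a
line of the form shown in the example above.</p>

```shell
# Hypothetical helper (not part of Open MPI or PRRTE).
# Reads prte_info-style text on stdin and succeeds if a line lists
# the "gridengine" ras component, as in the sample output above.
has_gridengine() {
    grep -Eq 'MCA[[:space:]]+ras:[[:space:]]+gridengine'
}

# Usage (assumes prte_info is on your PATH):
#   if prte_info | has_gridengine; then
#       echo "Grid Engine support found"
#   fi
```

<p>The function only inspects text, so it can be tried against saved
<code class="docutils literal notranslate"><span class="pre">prte_info</span></code> output as well as a live installation.</p>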
</div>
<div class="section" id="launching">
<h2><span class="section-number">10.10.2. </span>Launching<a class="headerlink" href="#launching" title="Permalink to this heading"></a></h2>
<p>When Grid Engine support is included, Open MPI will automatically
detect when it is running inside SGE and will just “do the Right
Thing.”</p>
<p>Specifically, if you execute an <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command in a Grid Engine
job, it will automatically use the Grid Engine mechanisms to launch
and kill processes.  There is no need to specify what nodes to run on
— Open MPI will obtain this information directly from Grid
Engine and default to a number of processes equal to the slot count
specified.  For example, this will run 4 MPI processes on the nodes
that were allocated by Grid Engine:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="c1"># Get the environment variables for Grid Engine</span>

<span class="c1"># (Assuming Grid Engine is installed at /opt/sge and $SGE_CELL</span>
<span class="c1"># is &#39;default&#39; in your environment)</span>
shell$<span class="w"> </span>.<span class="w"> </span>/opt/sge/default/common/settings.sh

<span class="c1"># Allocate a Grid Engine interactive job with 4 slots from a</span>
<span class="c1"># parallel environment (PE) named &#39;ompi&#39; and run a 4-process Open</span>
<span class="c1"># MPI job</span>
shell$<span class="w"> </span>qrsh<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>-b<span class="w"> </span>y<span class="w"> </span>mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">4</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<p>There are also other ways to submit jobs under Grid Engine:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="c1"># Submit a batch job with the &#39;mpirun&#39; command embedded in a script</span>
shell$<span class="w"> </span>qsub<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>my_mpirun_job.csh

<span class="c1"># Submit a Grid Engine job that runs mpirun, all in one command line</span>
shell$<span class="w"> </span>qrsh<span class="w"> </span>-V<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>mpirun<span class="w"> </span>hostname

<span class="c1"># Use qstat(1) to show the status of Grid Engine jobs and queues</span>
shell$<span class="w"> </span>qstat<span class="w"> </span>-f
</pre></div>
</div>
<p>When setting up Grid Engine, be sure that a Parallel Environment
(PE) is defined for submitting parallel jobs. You don’t have to name your
PE “ompi”; that is simply the name used in these examples.  A suitable
PE definition looks like this:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ qconf -sp ompi
   pe_name            ompi
   slots              99999
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    NONE
   stop_proc_args     NONE
   allocation_rule    $fill_up
   control_slaves     TRUE
   job_is_first_task  FALSE
   urgency_slots      min
   accounting_summary FALSE
   qsort_args         NONE
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p><code class="docutils literal notranslate"><span class="pre">qsort_args</span></code> is necessary with the Son of Grid Engine
distribution, version 8.1.1 and later, and probably only applicable
to it.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>For very old versions of Sun Grid Engine, omit
<code class="docutils literal notranslate"><span class="pre">accounting_summary</span></code> too.</p>
</div>
<p>You may want to alter other parameters, but the important one is
<code class="docutils literal notranslate"><span class="pre">control_slaves</span></code>, specifying that the environment has “tight
integration”.  Note also the lack of a start or stop procedure.  The
tight integration means that mpirun automatically picks up the slot
count to use as a default in place of the <code class="docutils literal notranslate"><span class="pre">-n</span></code> argument, picks up a
host file, spawns remote processes via <code class="docutils literal notranslate"><span class="pre">qrsh</span></code> so that Grid Engine
can control and monitor them, and creates and destroys a per-job
temporary directory (<code class="docutils literal notranslate"><span class="pre">$TMPDIR</span></code>), in which Open MPI’s directory will
be created (by default).</p>
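<p>To see the slot count that <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> would pick up, you can
inspect the host file that Grid Engine provides to the job.  The
following is a minimal sketch under stated assumptions: the helper name
<code class="docutils literal notranslate"><span class="pre">count_pe_slots</span></code> is hypothetical, and it assumes that Grid Engine
exports <code class="docutils literal notranslate"><span class="pre">$PE_HOSTFILE</span></code> inside the job and that each line of that file
has the host name in the first column and the slot count in the
second.  Check the <code class="docutils literal notranslate"><span class="pre">sge_pe</span></code> man page for your Grid Engine flavor.</p>

```shell
# Hypothetical helper (not part of Open MPI or Grid Engine).
# Sums the second column (slots per host) of a PE host file.
# Assumed line format: "hostname slots queue processor_range".
# With no argument, it falls back to $PE_HOSTFILE.
count_pe_slots() {
    awk '{ total += $2 } END { print total + 0 }' "${1:-$PE_HOSTFILE}"
}

# Usage inside a Grid Engine parallel job:
#   echo "Slots granted: $(count_pe_slots)"
```

<p>With tight integration you normally do not need this value, since
<code class="docutils literal notranslate"><span class="pre">mpirun</span></code> obtains it on its own; the helper is only meant for
checking what was allocated.</p>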
<p>Be sure the queue will make use of the PE that you specified:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ qconf -sq all.q
[...snipped...]
pe_list               make cre ompi
[...snipped...]
</pre></div>
</div>
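<p>This check can be automated.  The following is a minimal sketch: the
helper name <code class="docutils literal notranslate"><span class="pre">queue_has_pe</span></code> is hypothetical, and it assumes that the
<code class="docutils literal notranslate"><span class="pre">pe_list</span></code> entry fits on a single line, as in the output above.
Continuation lines are not handled.</p>

```shell
# Hypothetical helper (not part of Open MPI or Grid Engine).
# Reads "qconf -sq QUEUE" style text on stdin and succeeds if the
# named PE appears in the pe_list entry.
# Assumption: pe_list is on one line (no backslash continuation).
queue_has_pe() {
    pe_wanted=$1
    while read -r key values; do
        [ "$key" = "pe_list" ] || continue
        for pe in $values; do
            [ "$pe" = "$pe_wanted" ] || continue
            return 0
        done
    done
    return 1
}

# Usage (assumes qconf is on your PATH):
#   if qconf -sq all.q | queue_has_pe ompi; then
#       echo "all.q offers the ompi PE"
#   fi
```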
<p>To determine whether the Grid Engine parallel job was successfully
launched on the remote nodes, you can pass the MCA parameter
<code class="docutils literal notranslate"><span class="pre">--mca</span> <span class="pre">plm_base_verbose</span> <span class="pre">1</span></code> to <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>.</p>
<p>This adds a <code class="docutils literal notranslate"><span class="pre">-verbose</span></code> flag to the <code class="docutils literal notranslate"><span class="pre">qrsh</span> <span class="pre">-inherit</span></code> command
that is used to send parallel tasks to the remote Grid Engine
execution hosts, which shows whether the connections to the remote
hosts were established successfully.</p>
<div class="admonition error">
<p class="admonition-title">Error</p>
<p>TODO is this site still live?  Doesn’t look like it..  Jeff
emailed Dave Love on 31 Dec 2021 to ask if this is still the
correct URL.</p>
<p>Update March 2022: it doesn’t look like this web site is good any
more.  Perhaps use <a class="reference external" href="https://github.com/grisu48/gridengine">https://github.com/grisu48/gridengine</a> instead…?</p>
</div>
<p>Various Grid Engine documentation with pointers to more is available
at <a class="reference external" href="http://arc.liv.ac.uk/sge/">the Son of GridEngine site</a>, and
configuration instructions can be found at <a class="reference external" href="http://arc.liv.ac.uk/SGE/howto/sge-configs.html">the Son of GridEngine
configuration how-to site</a>.</p>
</div>
<div class="section" id="grid-engine-tight-integration-support-of-the-qsub-notify-flag">
<h2><span class="section-number">10.10.3. </span>Grid Engine tight integration support of the <code class="docutils literal notranslate"><span class="pre">qsub</span> <span class="pre">-notify</span></code> flag<a class="headerlink" href="#grid-engine-tight-integration-support-of-the-qsub-notify-flag" title="Permalink to this heading"></a></h2>
<p>If you are running SGE 6.2 Update 3 or later, then the <code class="docutils literal notranslate"><span class="pre">-notify</span></code>
flag is supported.  If you are running earlier versions, then the
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag will not work and using it will cause the job to be
killed.</p>
<p>To use <code class="docutils literal notranslate"><span class="pre">-notify</span></code>, one has to be careful.  First, let us review what
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> does.  Here is an excerpt from the qsub man page for the
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag.</p>
<blockquote>
<div><p>The <code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag, when set causes Sun Grid Engine to send
warning signals to a running job prior to sending the signals
themselves. If a SIGSTOP is pending, the job will receive a SIGUSR1
several seconds before the SIGSTOP.  If a SIGKILL is pending, the
job will receive a SIGUSR2 several seconds before the SIGKILL.  The
amount of time delay is controlled by the notify parameter in each
queue configuration.</p>
</div></blockquote>
<p>Let us assume the reason you want to use the <code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag is to
get the SIGUSR1 signal prior to getting the SIGTSTP signal.  Something
like this batch script can be used:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>
mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>-mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<div class="admonition error">
<p class="admonition-title">Error</p>
<p>Ralph: Does <code class="docutils literal notranslate"><span class="pre">orte_forward_job_control</span></code> still exist?</p>
</div>
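<p>However, one of two changes must be made to this script for things
to work properly.  By default, a SIGUSR1 signal will kill a shell
script, so we have to make sure that does not happen.  One way is to
<code class="docutils literal notranslate"><span class="pre">exec</span></code> the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command so that it replaces the shell process,
leaving no shell for the signal to kill:</p>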
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>
<span class="nb">exec</span><span class="w"> </span>mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>-mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<p>Alternatively, one can catch the signals in the script instead of doing
an exec on the mpirun:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>

<span class="k">function</span><span class="w"> </span>sigusr1handler<span class="o">()</span>
<span class="o">{</span>
<span class="w">    </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SIGUSR1 caught by shell script&quot;</span><span class="w"> </span><span class="m">1</span>&gt;<span class="p">&amp;</span><span class="m">2</span>
<span class="o">}</span>

<span class="k">function</span><span class="w"> </span>sigusr2handler<span class="o">()</span>
<span class="o">{</span>
<span class="w">    </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SIGUSR2 caught by shell script&quot;</span><span class="w"> </span><span class="m">1</span>&gt;<span class="p">&amp;</span><span class="m">2</span>
<span class="o">}</span>

<span class="nb">trap</span><span class="w"> </span>sigusr1handler<span class="w"> </span>SIGUSR1
<span class="nb">trap</span><span class="w"> </span>sigusr2handler<span class="w"> </span>SIGUSR2

mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>-mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
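<p>The difference that the <code class="docutils literal notranslate"><span class="pre">trap</span></code> makes can be seen without Grid
Engine or Open MPI.  The following is a minimal sketch, assuming
<code class="docutils literal notranslate"><span class="pre">bash</span></code> is on your PATH; the function names are hypothetical, and
each function starts a child shell that sends SIGUSR1 to itself.</p>

```shell
# Hypothetical demo functions (not part of Open MPI or Grid Engine).

# No handler installed: the default action for SIGUSR1 terminates
# the child shell, so the echo is never reached.
dies_without_trap() {
    bash -c '
        kill -USR1 $$
        echo "not reached"
    '
}

# Handler installed: the child shell runs the trap and then carries
# on with the next command.
survives_sigusr1() {
    bash -c '
        trap "echo caught" USR1
        kill -USR1 $$
        echo "still running"
    '
}
```

<p>Calling <code class="docutils literal notranslate"><span class="pre">survives_sigusr1</span></code> prints both messages, while
<code class="docutils literal notranslate"><span class="pre">dies_without_trap</span></code> prints nothing on standard output and returns
a non-zero exit status.</p>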
</div>
<div class="section" id="grid-engine-job-suspend-resume-support">
<h2><span class="section-number">10.10.4. </span>Grid Engine job suspend / resume support<a class="headerlink" href="#grid-engine-job-suspend-resume-support" title="Permalink to this heading"></a></h2>
<p>To suspend the job, you send a SIGTSTP (not SIGSTOP) signal to
<code class="docutils literal notranslate"><span class="pre">mpirun</span></code>.  <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> will catch this signal and forward it to the
<code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes as a SIGSTOP signal.  To resume the job, you send
a SIGCONT signal to <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>, which will be caught and forwarded to
the <code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes.</p>
<p>By default, this feature is not enabled.  This means that both the
SIGTSTP and SIGCONT signals will simply be consumed by the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>
process.  To have them forwarded, you have to run the job with <code class="docutils literal notranslate"><span class="pre">--mca</span>
<span class="pre">orte_forward_job_control</span> <span class="pre">1</span></code>.  Here is an example on Solaris:</p>
<div class="admonition error">
<p class="admonition-title">Error</p>
<p>TODO Ralph: does <code class="docutils literal notranslate"><span class="pre">orte_forward_job_control</span></code> still exist?</p>
</div>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>mpirun<span class="w"> </span>-mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>-n<span class="w"> </span><span class="m">2</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<p>In another window, we suspend and continue the job:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w">   </span>PID<span class="w"> </span>USERNAME<span class="w">  </span>SIZE<span class="w">   </span>RSS<span class="w"> </span>STATE<span class="w">  </span>PRI<span class="w"> </span>NICE<span class="w">      </span>TIME<span class="w">  </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>cpu1<span class="w">     </span><span class="m">0</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:00:21<span class="w"> </span><span class="m">5</span>.9%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>cpu2<span class="w">     </span><span class="m">0</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:00:21<span class="w"> </span><span class="m">5</span>.9%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w">    </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w">   </span><span class="m">59</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1

shell$<span class="w"> </span><span class="nb">kill</span><span class="w"> </span>-TSTP<span class="w"> </span><span class="m">15301</span>
shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w">   </span>PID<span class="w"> </span>USERNAME<span class="w">  </span>SIZE<span class="w">   </span>RSS<span class="w"> </span>STATE<span class="w">  </span>PRI<span class="w"> </span>NICE<span class="w">      </span>TIME<span class="w">  </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>stop<span class="w">    </span><span class="m">30</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:01:44<span class="w">  </span><span class="m">21</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>stop<span class="w">    </span><span class="m">20</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:01:44<span class="w">  </span><span class="m">21</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w">    </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w">   </span><span class="m">59</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1

shell$<span class="w"> </span><span class="nb">kill</span><span class="w"> </span>-CONT<span class="w"> </span><span class="m">15301</span>
shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w">   </span>PID<span class="w"> </span>USERNAME<span class="w">  </span>SIZE<span class="w">   </span>RSS<span class="w"> </span>STATE<span class="w">  </span>PRI<span class="w"> </span>NICE<span class="w">      </span>TIME<span class="w">  </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>cpu1<span class="w">     </span><span class="m">0</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:02:06<span class="w">  </span><span class="m">17</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w">     </span>158M<span class="w">   </span>22M<span class="w"> </span>cpu3<span class="w">     </span><span class="m">0</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:02:06<span class="w">  </span><span class="m">17</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w">    </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w">   </span><span class="m">59</span><span class="w">    </span><span class="m">0</span><span class="w">   </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1
</pre></div>
</div>
<p>Note that all this does is stop the <code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes.  It
does not, for example, free any pinned memory when the job is in the
suspended state.</p>
<p>To get this to work under the Grid Engine environment, you have to
change the <code class="docutils literal notranslate"><span class="pre">suspend_method</span></code> entry in the queue.  It has to be set to
SIGTSTP.  Here is an example of what a queue should look like.</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>qconf<span class="w"> </span>-sq<span class="w"> </span>all.q
qname<span class="w">                 </span>all.q
<span class="o">[</span>...snipped...<span class="o">]</span>
starter_method<span class="w">        </span>NONE
suspend_method<span class="w">        </span>SIGTSTP
resume_method<span class="w">         </span>NONE
</pre></div>
</div>
<p>Note that if you need to suspend other types of jobs with SIGSTOP
(instead of SIGTSTP) in this queue then you need to provide a script
that can implement the correct signals for each job type.</p>
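<p>One possible shape for such a script is sketched below.  Everything
in it is an assumption made for illustration: the function name
<code class="docutils literal notranslate"><span class="pre">pick_suspend_signal</span></code> and the <code class="docutils literal notranslate"><span class="pre">mpi_</span></code> job-name prefix are
hypothetical, and the way the job PID and job name reach the script
depends on how <code class="docutils literal notranslate"><span class="pre">suspend_method</span></code> is configured in your Grid Engine
flavor (see its <code class="docutils literal notranslate"><span class="pre">queue_conf</span></code> man page).</p>

```shell
# Hypothetical policy helper (not part of Open MPI or Grid Engine).
# Prints the signal name to use when suspending a job, based on a
# site-specific naming convention that is assumed here:
#   job names starting with "mpi_" are Open MPI jobs and get TSTP,
#   every other job gets STOP.
pick_suspend_signal() {
    case "$1" in
        mpi_*) echo TSTP ;;
        *)     echo STOP ;;
    esac
}

# Possible use in a suspend script, assuming the queue is configured
# to pass the job PID as $1 and the job name as $2:
#   kill -"$(pick_suspend_signal "$2")" "$1"
```

<p>Keeping the policy in a small function makes it easy to test the
mapping from job type to signal before pointing a production queue at
the script.</p>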
</div>
</div>


           </div>
          </div>
          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
        <a href="tm.html" class="btn btn-neutral float-left" title="10.9. Launching with PBS / Torque" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
        <a href="unusual.html" class="btn btn-neutral float-right" title="10.11. Unusual jobs" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2003-2025, The Open MPI Community.
      <span class="lastupdated">Last updated on 2025-05-30 16:41:43 UTC.
      </span></p>
  </div>

  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
    provided by <a href="https://readthedocs.org">Read the Docs</a>.
   

</footer>
        </div>
      </div>
    </section>
  </div>
  <script>
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script> 

</body>
</html>