<!DOCTYPE html>
<html class="writer-html5" lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>10.10. Launching with Grid Engine — Open MPI 5.0.8 documentation</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>
<script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="10.11. Unusual jobs" href="unusual.html" />
<link rel="prev" title="10.9. Launching with PBS / Torque" href="tm.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home">
Open MPI
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../quickstart.html">1. Quick start</a></li>
<li class="toctree-l1"><a class="reference internal" href="../getting-help.html">2. Getting help</a></li>
<li class="toctree-l1"><a class="reference internal" href="../release-notes/index.html">3. Release notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="../installing-open-mpi/index.html">4. Building and installing Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../features/index.html">5. Open MPI-specific features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../validate.html">6. Validating your installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../version-numbering.html">7. Version numbers and compatibility</a></li>
<li class="toctree-l1"><a class="reference internal" href="../mca.html">8. The Modular Component Architecture (MCA)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../building-apps/index.html">9. Building MPI applications</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">10. Launching MPI applications</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="quickstart.html">10.1. Quick start: Launching MPI applications</a></li>
<li class="toctree-l2"><a class="reference internal" href="prerequisites.html">10.2. Prerequisites</a></li>
<li class="toctree-l2"><a class="reference internal" href="pmix-and-prrte.html">10.3. The role of PMIx and PRRTE</a></li>
<li class="toctree-l2"><a class="reference internal" href="scheduling.html">10.4. Scheduling processes across hosts</a></li>
<li class="toctree-l2"><a class="reference internal" href="localhost.html">10.5. Launching only on the local node</a></li>
<li class="toctree-l2"><a class="reference internal" href="ssh.html">10.6. Launching with SSH</a></li>
<li class="toctree-l2"><a class="reference internal" href="slurm.html">10.7. Launching with Slurm</a></li>
<li class="toctree-l2"><a class="reference internal" href="lsf.html">10.8. Launching with LSF</a></li>
<li class="toctree-l2"><a class="reference internal" href="tm.html">10.9. Launching with PBS / Torque</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">10.10. Launching with Grid Engine</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#verify-grid-engine-support">10.10.1. Verify Grid Engine support</a></li>
<li class="toctree-l3"><a class="reference internal" href="#launching">10.10.2. Launching</a></li>
<li class="toctree-l3"><a class="reference internal" href="#grid-engine-tight-integration-support-of-the-qsub-notify-flag">10.10.3. Grid Engine tight integration support of the <code class="docutils literal notranslate"><span class="pre">qsub</span> <span class="pre">-notify</span></code> flag</a></li>
<li class="toctree-l3"><a class="reference internal" href="#grid-engine-job-suspend-resume-support">10.10.4. Grid Engine job suspend / resume support</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="unusual.html">10.11. Unusual jobs</a></li>
<li class="toctree-l2"><a class="reference internal" href="troubleshooting.html">10.12. Troubleshooting</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../tuning-apps/index.html">11. Run-time operation and tuning MPI applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../app-debug/index.html">12. Debugging Open MPI Parallel Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/index.html">13. Developer’s guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../contributing.html">14. Contributing to Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../license/index.html">15. License</a></li>
<li class="toctree-l1"><a class="reference internal" href="../history.html">16. History of Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openmpi/index.html">17. Open MPI manual pages</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openshmem/index.html">18. OpenSHMEM manual pages</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Open MPI</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="index.html"><span class="section-number">10. </span>Launching MPI applications</a></li>
<li class="breadcrumb-item active"><span class="section-number">10.10. </span>Launching with Grid Engine</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<style>
.wy-table-responsive table td,.wy-table-responsive table th{white-space:normal}
</style><div class="section" id="launching-with-grid-engine">
<h1><span class="section-number">10.10. </span>Launching with Grid Engine<a class="headerlink" href="#launching-with-grid-engine" title="Permalink to this heading"></a></h1>
<p>Open MPI supports the Grid Engine family of run-time schedulers,
including Sun Grid Engine (SGE), Oracle Grid Engine (OGE), Grid Engine
(GE), Son of Grid Engine, and others.</p>
<p>This documentation refers to all of them collectively as “Grid
Engine”, unless referring to a specific flavor of the Grid Engine
family.</p>
<div class="section" id="verify-grid-engine-support">
<h2><span class="section-number">10.10.1. </span>Verify Grid Engine support<a class="headerlink" href="#verify-grid-engine-support" title="Permalink to this heading"></a></h2>
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>To build Grid Engine support into Open MPI, you must
explicitly request it with the <code class="docutils literal notranslate"><span class="pre">--with-sge</span></code>
command line switch to Open MPI’s <code class="docutils literal notranslate"><span class="pre">configure</span></code> script.</p>
</div>
<p>To verify whether Grid Engine support is configured into your Open MPI
installation, run <code class="docutils literal notranslate"><span class="pre">prte_info</span></code> as shown below and look for
<code class="docutils literal notranslate"><span class="pre">gridengine</span></code>.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ prte_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>PRRTE is the software layer that provides run-time
environment support to Open MPI. Open MPI typically hides most
PMIx and PRRTE details from the end user, but this is one place
where Open MPI cannot hide the fact that PRRTE, not Open MPI,
provides this functionality. Hence, users need to use the
<code class="docutils literal notranslate"><span class="pre">prte_info</span></code> command to check for Grid Engine support (not
<code class="docutils literal notranslate"><span class="pre">ompi_info</span></code>).</p>
</div>
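<p>The check above can be wrapped in a small script. The following is a minimal sketch, not part of Open MPI: the helper name <code class="docutils literal notranslate"><span class="pre">has_gridengine</span></code> is hypothetical, and the pattern assumes that <code class="docutils literal notranslate"><span class="pre">prte_info</span></code> prints component lines in the form shown in the listing above.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds if prte_info-style text on stdin lists a
# gridengine component. The pattern tolerates variable spacing.
has_gridengine() {
    grep -Eq 'MCA[[:space:]]+[a-z]+:[[:space:]]+gridengine'
}

# Only query the real tool when it is actually installed.
if command -v prte_info >/dev/null; then
    if prte_info | has_gridengine; then
        echo "Grid Engine support found"
    else
        echo "Grid Engine support not found (was --with-sge used?)"
    fi
fi
```

<p>Keeping the match in a function that reads standard input makes it possible to try the pattern on saved output without a Grid Engine installation.</p>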
</div>
<div class="section" id="launching">
<h2><span class="section-number">10.10.2. </span>Launching<a class="headerlink" href="#launching" title="Permalink to this heading"></a></h2>
<p>When Grid Engine support is included, Open MPI will automatically
detect when it is running inside a Grid Engine job and will just “do
the Right Thing.”</p>
<p>Specifically, if you execute an <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command in a Grid Engine
job, it will automatically use the Grid Engine mechanisms to launch
and kill processes. There is no need to specify what nodes to run on
— Open MPI will obtain this information directly from Grid
Engine and default to a number of processes equal to the slot count
specified. For example, this will run 4 MPI processes on the nodes
that were allocated by Grid Engine:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="c1"># Get the environment variables for Grid Engine</span>
<span class="c1"># (Assuming Grid Engine is installed at /opt/sge and $SGE_CELL</span>
<span class="c1"># is 'default' in your environment)</span>
shell$<span class="w"> </span>.<span class="w"> </span>/opt/sge/default/common/settings.sh
<span class="c1"># Allocate a Grid Engine interactive job with 4 slots from a</span>
<span class="c1"># parallel environment (PE) named 'ompi' and run a 4-process Open</span>
<span class="c1"># MPI job</span>
shell$<span class="w"> </span>qrsh<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>-b<span class="w"> </span>y<span class="w"> </span>mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">4</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
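<p>The slot count that <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> uses as its default can be inspected from inside a job. The following is a minimal sketch, assuming that Grid Engine exports the allocation in the file named by the <code class="docutils literal notranslate"><span class="pre">PE_HOSTFILE</span></code> environment variable, with one line per host and the slot count in the second column. Neither that variable nor the hypothetical helper <code class="docutils literal notranslate"><span class="pre">count_slots</span></code> is mentioned elsewhere on this page.</p>

```shell
#!/bin/sh
# Hypothetical helper: sum the second column (slots per host) of a
# PE_HOSTFILE-style file whose path is given as the first argument.
count_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Inside a Grid Engine job, report the allocation; do nothing elsewhere.
if [ -r "${PE_HOSTFILE:-}" ]; then
    echo "Allocated slots: $(count_slots "$PE_HOSTFILE")"
fi
```

<p>Printing this value at the top of a job script is one way to confirm that the number of launched processes matches the allocation.</p>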
<p>There are also other ways to submit jobs under Grid Engine:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="c1"># Submit a batch job with the 'mpirun' command embedded in a script</span>
shell$<span class="w"> </span>qsub<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>my_mpirun_job.csh
<span class="c1"># Submit a Grid Engine job and run mpirun in one line</span>
shell$<span class="w"> </span>qrsh<span class="w"> </span>-V<span class="w"> </span>-pe<span class="w"> </span>ompi<span class="w"> </span><span class="m">4</span><span class="w"> </span>mpirun<span class="w"> </span>hostname
<span class="c1"># Use qstat(1) to show the status of Grid Engine jobs and queues</span>
shell$<span class="w"> </span>qstat<span class="w"> </span>-f
</pre></div>
</div>
<p>Be sure that a Parallel Environment (PE) is defined for submitting
parallel jobs. The PE does not have to be named “ompi”; that name is
only used in these examples. A suitable PE definition looks like
this:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ qconf -sp ompi
pe_name ompi
slots 99999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
qsort_args NONE
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p><code class="docutils literal notranslate"><span class="pre">qsort_args</span></code> is necessary with the Son of Grid Engine
distribution, version 8.1.1 and later, and probably only applicable
to it.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>For very old versions of Sun Grid Engine, omit
<code class="docutils literal notranslate"><span class="pre">accounting_summary</span></code> too.</p>
</div>
<p>You may want to alter other parameters, but the important one is
<code class="docutils literal notranslate"><span class="pre">control_slaves</span></code>, which is set to <code class="docutils literal notranslate"><span class="pre">TRUE</span></code> so that the environment has
“tight integration”. Note also the lack of a start or stop procedure.
Tight integration means that <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> automatically picks up the slot
count to use as a default in place of the <code class="docutils literal notranslate"><span class="pre">-n</span></code> argument, picks up a
host file, spawns remote processes via <code class="docutils literal notranslate"><span class="pre">qrsh</span></code> so that Grid Engine
can control and monitor them, and creates and destroys a per-job
temporary directory (<code class="docutils literal notranslate"><span class="pre">$TMPDIR</span></code>), in which Open MPI’s directory will
be created (by default).</p>
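<p>The <code class="docutils literal notranslate"><span class="pre">control_slaves</span></code> setting can be checked from a script. This is a minimal sketch under the assumption that <code class="docutils literal notranslate"><span class="pre">qconf</span> <span class="pre">-sp</span></code> prints one attribute per line with the name in the first column and the value in the second, as in the listing above. The helper names <code class="docutils literal notranslate"><span class="pre">pe_attr</span></code> and <code class="docutils literal notranslate"><span class="pre">pe_is_tight</span></code> are hypothetical.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from "qconf -sp"
# style text on stdin (attribute name in column 1, value in column 2).
pe_attr() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Succeeds when the PE text on stdin has control_slaves set to TRUE.
pe_is_tight() {
    [ "$(pe_attr control_slaves)" = "TRUE" ]
}

# Only query the real tool when it is actually installed.
if command -v qconf >/dev/null; then
    if qconf -sp ompi | pe_is_tight; then
        echo "PE 'ompi' has control_slaves TRUE"
    else
        echo "PE 'ompi' is missing or does not have control_slaves TRUE"
    fi
fi
```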
<p>Be sure the queue will make use of the PE that you specified:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shell$ qconf -sq all.q
[...snipped...]
pe_list make cre ompi
[...snipped...]
</pre></div>
</div>
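<p>The same kind of check works for the queue. The sketch below assumes that the <code class="docutils literal notranslate"><span class="pre">pe_list</span></code> entry fits on a single line and is separated by whitespace, as in the listing above; the helper name <code class="docutils literal notranslate"><span class="pre">queue_has_pe</span></code> is hypothetical.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the pe_list line of "qconf -sq" style
# text on stdin contains the PE name given as the first argument.
queue_has_pe() {
    grep -E '^pe_list[[:space:]]' | tr -s '[:space:]' '\n' | grep -Fqx -- "$1"
}

# Only query the real tool when it is actually installed.
if command -v qconf >/dev/null; then
    if qconf -sq all.q | queue_has_pe ompi; then
        echo "all.q offers PE 'ompi'"
    else
        echo "all.q does not list PE 'ompi'"
    fi
fi
```

<p>Matching whole words with <code class="docutils literal notranslate"><span class="pre">grep</span> <span class="pre">-x</span></code> avoids a false positive when one PE name is a prefix of another.</p>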
<p>To determine whether the Grid Engine parallel job is successfully
launched to the remote nodes, you can pass the MCA parameter
<code class="docutils literal notranslate"><span class="pre">--mca</span> <span class="pre">plm_base_verbose</span> <span class="pre">1</span></code> to <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>.</p>
<p>This adds a <code class="docutils literal notranslate"><span class="pre">-verbose</span></code> flag to the <code class="docutils literal notranslate"><span class="pre">qrsh</span> <span class="pre">-inherit</span></code> command
that is used to send parallel tasks to the remote Grid Engine
execution hosts. It will show whether the connections to the remote
hosts are established successfully or not.</p>
<p>Various Grid Engine documentation with pointers to more has been
available at <a class="reference external" href="http://arc.liv.ac.uk/sge/">the Son of GridEngine site</a>, and
configuration instructions at <a class="reference external" href="http://arc.liv.ac.uk/SGE/howto/sge-configs.html">the Son of GridEngine
configuration how-to site</a>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>As of March 2022, the Son of GridEngine web site no longer appears
to be live. <a class="reference external" href="https://github.com/grisu48/gridengine">https://github.com/grisu48/gridengine</a> may be a usable
alternative.</p>
</div>
</div>
<div class="section" id="grid-engine-tight-integration-support-of-the-qsub-notify-flag">
<h2><span class="section-number">10.10.3. </span>Grid Engine tight integration support of the <code class="docutils literal notranslate"><span class="pre">qsub</span> <span class="pre">-notify</span></code> flag<a class="headerlink" href="#grid-engine-tight-integration-support-of-the-qsub-notify-flag" title="Permalink to this heading"></a></h2>
<p>If you are running SGE 6.2 Update 3 or later, then the <code class="docutils literal notranslate"><span class="pre">-notify</span></code>
flag is supported. If you are running earlier versions, then the
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag will not work and using it will cause the job to be
killed.</p>
<p>Using <code class="docutils literal notranslate"><span class="pre">-notify</span></code> requires some care. First, let us review what
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> does. Here is an excerpt from the <code class="docutils literal notranslate"><span class="pre">qsub</span></code> man page for the
<code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag.</p>
<blockquote>
<div><p>The <code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag, when set causes Sun Grid Engine to send
warning signals to a running job prior to sending the signals
themselves. If a SIGSTOP is pending, the job will receive a SIGUSR1
several seconds before the SIGSTOP. If a SIGKILL is pending, the
job will receive a SIGUSR2 several seconds before the SIGKILL. The
amount of time delay is controlled by the notify parameter in each
queue configuration.</p>
</div></blockquote>
<p>Let us assume the reason you want to use the <code class="docutils literal notranslate"><span class="pre">-notify</span></code> flag is to
get the SIGUSR1 signal prior to getting the SIGTSTP signal. Something
like this batch script can be used:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>
mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>--mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>It is unclear whether the <code class="docutils literal notranslate"><span class="pre">orte_forward_job_control</span></code> MCA
parameter still exists in this release. Verify it against your
installation before relying on it.</p>
</div>
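<p>However, one of two changes must be made to this script for things
to work properly. By default, a SIGUSR1 signal will kill a shell
script, so the script has to guard against that. One way is to
<code class="docutils literal notranslate"><span class="pre">exec</span></code> the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> command so that it replaces the shell:</p>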
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>
<span class="nb">exec</span><span class="w"> </span>mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>--mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<p>Alternatively, one can catch the signals in the script instead of doing
an exec on the mpirun:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="ch">#! /bin/bash</span>
<span class="c1">#$ -S /bin/bash</span>
<span class="c1">#$ -V</span>
<span class="c1">#$ -cwd</span>
<span class="c1">#$ -N Job1</span>
<span class="c1">#$ -pe ompi 16</span>
<span class="c1">#$ -j y</span>
<span class="c1">#$ -l h_rt=00:20:00</span>
<span class="k">function</span><span class="w"> </span>sigusr1handler<span class="o">()</span>
<span class="o">{</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"SIGUSR1 caught by shell script"</span><span class="w"> </span><span class="m">1</span>><span class="p">&</span><span class="m">2</span>
<span class="o">}</span>
<span class="k">function</span><span class="w"> </span>sigusr2handler<span class="o">()</span>
<span class="o">{</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"SIGUSR2 caught by shell script"</span><span class="w"> </span><span class="m">1</span>><span class="p">&</span><span class="m">2</span>
<span class="o">}</span>
<span class="nb">trap</span><span class="w"> </span>sigusr1handler<span class="w"> </span>SIGUSR1
<span class="nb">trap</span><span class="w"> </span>sigusr2handler<span class="w"> </span>SIGUSR2
mpirun<span class="w"> </span>-n<span class="w"> </span><span class="m">16</span><span class="w"> </span>--mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
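<p>The effect of trapping SIGUSR1 can be tried without Grid Engine or <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>. The following stand-alone sketch starts two child shells that each send SIGUSR1 to themselves: one installs a trap first and keeps running, the other does not and is terminated by the default action. The variable names are illustrative only.</p>

```shell
#!/bin/sh
# Child shell that traps SIGUSR1 and then signals itself. The handler
# runs, and the shell carries on to the final echo.
with_trap=$(sh -c 'trap "echo caught" USR1; kill -USR1 $$; echo "still running"')
echo "with trap:    $with_trap"

# Same child without the trap. The default action for SIGUSR1 terminates
# it before the echo, so nothing is captured.
without_trap=$(sh -c 'kill -USR1 $$; echo "still running"') || true
echo "without trap: ${without_trap:-(no output, shell was killed)}"
```

<p>The second case is what would happen to an unmodified job script when Grid Engine delivers the warning signal.</p>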
</div>
<div class="section" id="grid-engine-job-suspend-resume-support">
<h2><span class="section-number">10.10.4. </span>Grid Engine job suspend / resume support<a class="headerlink" href="#grid-engine-job-suspend-resume-support" title="Permalink to this heading"></a></h2>
<p>To suspend the job, send a SIGTSTP (not SIGSTOP) signal to
<code class="docutils literal notranslate"><span class="pre">mpirun</span></code>. <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> will catch this signal and forward it to the
<code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes as a SIGSTOP signal. To resume the job, send
a SIGCONT signal to <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>, which will catch it and forward it to
the <code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes.</p>
<p>By default, this feature is not enabled. This means that both the
SIGTSTP and SIGCONT signals will simply be consumed by the <code class="docutils literal notranslate"><span class="pre">mpirun</span></code>
process. To have them forwarded, you have to run the job with <code class="docutils literal notranslate"><span class="pre">--mca</span>
<span class="pre">orte_forward_job_control</span> <span class="pre">1</span></code>. Here is an example on Solaris:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>mpirun<span class="w"> </span>-mca<span class="w"> </span>orte_forward_job_control<span class="w"> </span><span class="m">1</span><span class="w"> </span>-n<span class="w"> </span><span class="m">2</span><span class="w"> </span>mpi-hello-world
</pre></div>
</div>
<p>In another window, we suspend and continue the job:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w"> </span>PID<span class="w"> </span>USERNAME<span class="w"> </span>SIZE<span class="w"> </span>RSS<span class="w"> </span>STATE<span class="w"> </span>PRI<span class="w"> </span>NICE<span class="w"> </span>TIME<span class="w"> </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>cpu1<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:00:21<span class="w"> </span><span class="m">5</span>.9%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>cpu2<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:00:21<span class="w"> </span><span class="m">5</span>.9%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w"> </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w"> </span><span class="m">59</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1
shell$<span class="w"> </span><span class="nb">kill</span><span class="w"> </span>-TSTP<span class="w"> </span><span class="m">15301</span>
shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w"> </span>PID<span class="w"> </span>USERNAME<span class="w"> </span>SIZE<span class="w"> </span>RSS<span class="w"> </span>STATE<span class="w"> </span>PRI<span class="w"> </span>NICE<span class="w"> </span>TIME<span class="w"> </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>stop<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:01:44<span class="w"> </span><span class="m">21</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>stop<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:01:44<span class="w"> </span><span class="m">21</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w"> </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w"> </span><span class="m">59</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1
shell$<span class="w"> </span><span class="nb">kill</span><span class="w"> </span>-CONT<span class="w"> </span><span class="m">15301</span>
shell$<span class="w"> </span>prstat<span class="w"> </span>-p<span class="w"> </span><span class="m">15301</span>,15303,15305
<span class="w"> </span>PID<span class="w"> </span>USERNAME<span class="w"> </span>SIZE<span class="w"> </span>RSS<span class="w"> </span>STATE<span class="w"> </span>PRI<span class="w"> </span>NICE<span class="w"> </span>TIME<span class="w"> </span>CPU<span class="w"> </span>PROCESS/NLWP
<span class="w"> </span><span class="m">15305</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>cpu1<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:02:06<span class="w"> </span><span class="m">17</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15303</span><span class="w"> </span>rolfv<span class="w"> </span>158M<span class="w"> </span>22M<span class="w"> </span>cpu3<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:02:06<span class="w"> </span><span class="m">17</span>%<span class="w"> </span>mpi-hello-world/1
<span class="w"> </span><span class="m">15301</span><span class="w"> </span>rolfv<span class="w"> </span>8128K<span class="w"> </span>5144K<span class="w"> </span>sleep<span class="w"> </span><span class="m">59</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>:00:00<span class="w"> </span><span class="m">0</span>.0%<span class="w"> </span>mpirun/1
</pre></div>
</div>
<p>Note that all this does is stop the <code class="docutils literal notranslate"><span class="pre">mpi-hello-world</span></code> processes. It
does not, for example, free any pinned memory when the job is in the
suspended state.</p>
<p>To get this to work under the Grid Engine environment, you have to
set the <code class="docutils literal notranslate"><span class="pre">suspend_method</span></code> entry of the queue to
SIGTSTP. Here is an example of what the queue should look like.</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>shell$<span class="w"> </span>qconf<span class="w"> </span>-sq<span class="w"> </span>all.q
qname<span class="w"> </span>all.q
<span class="o">[</span>...snipped...<span class="o">]</span>
starter_method<span class="w"> </span>NONE
suspend_method<span class="w"> </span>SIGTSTP
resume_method<span class="w"> </span>NONE
</pre></div>
</div>
<p>Note that if you need to suspend other types of jobs with SIGSTOP
(instead of SIGTSTP) in this queue, then you need to provide a script
that sends the correct signal for each job type.</p>
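<p>Such a script could be structured as follows. This is only a sketch under stated assumptions: the rule that Open MPI jobs have names starting with <code class="docutils literal notranslate"><span class="pre">mpi</span></code> is hypothetical, the function names are made up for illustration, and it is assumed that the queue configuration passes the job name and process ID to the script as its two arguments. Check the queue configuration documentation of your Grid Engine flavor for how these values are actually supplied.</p>

```shell
#!/bin/sh
# Hypothetical rule: jobs whose name starts with "mpi" are treated as
# Open MPI jobs and get SIGTSTP; everything else gets SIGSTOP.
pick_suspend_signal() {
    case "$1" in
        mpi*) echo TSTP ;;
        *)    echo STOP ;;
    esac
}

# Usage: suspend_job JOB_NAME JOB_PID
# Sends the signal chosen for this job name to the given process ID.
suspend_job() {
    kill -s "$(pick_suspend_signal "$1")" "$2"
}
```

<p>A real script would end by calling <code class="docutils literal notranslate"><span class="pre">suspend_job</span></code> with its own arguments. Keeping the decision in a separate function makes the selection rule easy to test and to replace.</p>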
</div>
</div>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="tm.html" class="btn btn-neutral float-left" title="10.9. Launching with PBS / Torque" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="unusual.html" class="btn btn-neutral float-right" title="10.11. Unusual jobs" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2003-2025, The Open MPI Community.
<span class="lastupdated">Last updated on 2025-05-30 16:41:43 UTC.
</span></p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>