<!DOCTYPE html>
<html class="writer-html5" lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>11.11. Benchmarking Open MPI applications &mdash; Open MPI 5.0.8 documentation</title>
      <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
      <link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />

  
  <!--[if lt IE 9]>
    <script src="../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
        <script src="../_static/jquery.js"></script>
        <script src="../_static/underscore.js"></script>
        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
        <script src="../_static/doctools.js"></script>
        <script src="../_static/sphinx_highlight.js"></script>
    <script src="../_static/js/theme.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="11.12. Building heterogeneous MPI applications" href="heterogeneity.html" />
    <link rel="prev" title="11.10. Tuning Collectives" href="coll-tuned.html" /> 
</head>

<body class="wy-body-for-nav"> 
  <div class="wy-grid-for-nav">
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >

          
          
          <a href="../index.html" class="icon icon-home">
            Open MPI
          </a>
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>
        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
              <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../quickstart.html">1. Quick start</a></li>
<li class="toctree-l1"><a class="reference internal" href="../getting-help.html">2. Getting help</a></li>
<li class="toctree-l1"><a class="reference internal" href="../release-notes/index.html">3. Release notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="../installing-open-mpi/index.html">4. Building and installing Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../features/index.html">5. Open MPI-specific features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../validate.html">6. Validating your installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../version-numbering.html">7. Version numbers and compatibility</a></li>
<li class="toctree-l1"><a class="reference internal" href="../mca.html">8. The Modular Component Architecture (MCA)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../building-apps/index.html">9. Building MPI applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../launching-apps/index.html">10. Launching MPI applications</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">11. Run-time operation and tuning MPI applications</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="environment-var.html">11.1. Environment variables set for MPI applications</a></li>
<li class="toctree-l2"><a class="reference internal" href="networking/index.html">11.2. Networking support</a></li>
<li class="toctree-l2"><a class="reference internal" href="multithreaded.html">11.3. Running multi-threaded MPI applications</a></li>
<li class="toctree-l2"><a class="reference internal" href="dynamic-loading.html">11.4. Dynamically loading <code class="docutils literal notranslate"><span class="pre">libmpi</span></code> at runtime</a></li>
<li class="toctree-l2"><a class="reference internal" href="fork-system-popen.html">11.5. Calling fork(), system(), or popen() in MPI processes</a></li>
<li class="toctree-l2"><a class="reference internal" href="fault-tolerance/index.html">11.6. Fault tolerance</a></li>
<li class="toctree-l2"><a class="reference internal" href="large-clusters/index.html">11.7. Large Clusters</a></li>
<li class="toctree-l2"><a class="reference internal" href="affinity.html">11.8. Processor and memory affinity</a></li>
<li class="toctree-l2"><a class="reference internal" href="mpi-io/index.html">11.9. MPI-IO tuning options</a></li>
<li class="toctree-l2"><a class="reference internal" href="coll-tuned.html">11.10. Tuning Collectives</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">11.11. Benchmarking Open MPI applications</a></li>
<li class="toctree-l2"><a class="reference internal" href="heterogeneity.html">11.12. Building heterogeneous MPI applications</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../app-debug/index.html">12. Debugging Open MPI Parallel Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/index.html">13. Developer’s guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../contributing.html">14. Contributing to Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../license/index.html">15. License</a></li>
<li class="toctree-l1"><a class="reference internal" href="../history.html">16. History of Open MPI</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openmpi/index.html">17. Open MPI manual pages</a></li>
<li class="toctree-l1"><a class="reference internal" href="../man-openshmem/index.html">18. OpenSHMEM manual pages</a></li>
</ul>

        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="../index.html">Open MPI</a>
      </nav>

      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="Page navigation">
  <ul class="wy-breadcrumbs">
      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
          <li class="breadcrumb-item"><a href="index.html"><span class="section-number">11. </span>Run-time operation and tuning MPI applications</a></li>
      <li class="breadcrumb-item active"><span class="section-number">11.11. </span>Benchmarking Open MPI applications</li>
      <li class="wy-breadcrumbs-aside">
            <a href="../_sources/tuning-apps/benchmarking.rst.txt" rel="nofollow"> View page source</a>
      </li>
  </ul>
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
             
  <style>
.wy-table-responsive table td,.wy-table-responsive table th{white-space:normal}
</style><div class="section" id="benchmarking-open-mpi-applications">
<h1><span class="section-number">11.11. </span>Benchmarking Open MPI applications<a class="headerlink" href="#benchmarking-open-mpi-applications" title="Permalink to this heading"></a></h1>
<p>Running benchmarks is an extremely difficult task to do correctly.
There are many, many factors to take into account; it is <em>not</em> as
simple as just compiling and running a stock benchmark application.
This documentation is by no means a definitive guide, but it does try
to offer some suggestions for generating accurate, meaningful
benchmarks.</p>
<ol class="arabic">
<li><p>Decide <em>exactly</em> what you are benchmarking and set up your system
accordingly.  For example, if you are trying to benchmark maximum
performance, then many of the suggestions listed below are
extremely relevant (be the only user on the systems and network in
question, be the only software running, use processor affinity,
etc.).  If you’re trying to benchmark average performance, some of
the suggestions below may be less relevant.  Regardless, it is
critical to <em>know</em> exactly what you’re trying to benchmark, and
<em>know</em> (not guess) both your system and the benchmark application
itself well enough to understand what the results mean.</p>
<p>To be specific, it is often not well understood exactly what a given
benchmark application is testing.  There have been many cases where
users run a benchmark application and wrongly conclude that their
system’s performance is bad, based solely on a single benchmark that
they did not understand.  Read the benchmark’s documentation
carefully, and possibly even look into its code to see exactly what
it is testing.</p>
<p>Case in point: not all ping-pong benchmarks are created equal.
Most users assume that a ping-pong benchmark is a ping-pong
benchmark is a ping-pong benchmark.  But this is not true; the
common ping-pong benchmarks (e.g., NetPIPE, TCP bench, IMB, OSU,
etc.) tend to test subtly different things.  <em>Make sure you
understand what your benchmark is actually testing.</em></p>
</li>
<li><p>Make sure that you are the <em>only</em> user on the systems where you are
running the benchmark to eliminate contention from other
processes.</p></li>
<li><p>Make sure that you are the <em>only</em> user on the entire network /
interconnect to eliminate network traffic contention from other
processes.  This is usually somewhat difficult to do, especially in
larger, shared systems.  But your most accurate, repeatable results
will be achieved when you are the only user on the entire network.</p></li>
<li><p>Disable all services and daemons that are not being used.  Even
“harmless” daemons consume system resources (such as RAM) and cause
“jitter” by occasionally waking up, consuming CPU cycles, reading
or writing to disk, etc.  The optimum benchmark system has an
absolute minimum number of system services running.</p></li>
<li><p>Ensure that processor and memory affinity are properly used to
prevent the operating system from moving MPI processes between
processors (and causing unnecessary cache thrashing, for example).</p>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>On NUMA architectures, having a process bumped from
one socket to another is expensive mainly in terms of
cache locality (with all of the cache coherency
overhead that losing it entails), more so than in
terms of memory transfer routing (see below).</p>
</div>
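<p>As a quick sanity check, you can have each process report where it
is actually bound.  The following is only a sketch, not part of Open
MPI: it assumes a Linux system (it uses the Linux-specific
<code class="docutils literal notranslate"><span class="pre">sched_getaffinity()</span></code> call), and other
platforms will need a different mechanism.</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span>/* Sketch: each MPI rank prints the set of cores it is bound to.
 * Linux-specific; sched_getaffinity() requires _GNU_SOURCE. */
#define _GNU_SOURCE
#include &lt;sched.h&gt;
#include &lt;stdio.h&gt;
#include &lt;mpi.h&gt;

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    if (sched_getaffinity(0, sizeof(mask), &amp;mask) == 0) {
        printf("Rank %d is bound to cores:", rank);
        for (int c = 0; c &lt; CPU_SETSIZE; ++c) {
            if (CPU_ISSET(c, &amp;mask)) {
                printf(" %d", c);
            }
        }
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}
</pre></div></div>
<p>If the reported core sets are wider than you expect (e.g., every core
on the node), affinity is probably not being enforced the way you
intended.  Open MPI’s <code class="docutils literal notranslate"><span class="pre">mpirun</span></code> can also report
process bindings at launch time; see its manual page.</p>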
</li>
<li><p>Be sure to understand your system’s architecture, particularly with
respect to the memory, disk, and network characteristics, and test
accordingly.  For example, on NUMA architectures, memory accesses
may be routed through a memory interconnect; remote device and/or
memory accesses will be noticeably slower than local device and/or
memory accesses.</p></li>
<li><p>Compile your benchmark with the appropriate compiler optimization
flags.  With some MPI implementations, the compiler wrappers (like
<code class="docutils literal notranslate"><span class="pre">mpicc</span></code>, <code class="docutils literal notranslate"><span class="pre">mpifort</span></code>, etc.) add optimization flags
automatically.  Open MPI does not.  Add <code class="docutils literal notranslate"><span class="pre">-O</span></code> or other flags
explicitly.</p></li>
<li><p>Make sure your benchmark runs for a sufficient amount of time.
Short-running benchmarks are generally less accurate because they
gather fewer samples; longer-running jobs yield more samples and
therefore more trustworthy results.</p></li>
<li><p>If your benchmark is trying to measure extremely short events
(such as the time required for a single ping-pong of messages):</p>
<ul class="simple">
<li><p>Perform some “warmup” events first.  Many MPI implementations
(including Open MPI), as well as other subsystems that the MPI
implementation relies upon, may use “lazy” semantics to set up and
maintain streams of communication.  Hence, the first event (or
first few events) may well take significantly longer than
subsequent events.</p></li>
<li><p>Use a high-resolution timer if possible, such as
<code class="docutils literal notranslate"><span class="pre">MPI_Wtime()</span></code>.
<code class="docutils literal notranslate"><span class="pre">gettimeofday()</span></code> reports values in microseconds, but on
some platforms its actual precision is far coarser (on the order of
milliseconds).</p></li>
<li><p>Run the event many, many times (hundreds or thousands, depending
on the event and the time it takes).  Not only does this provide
more samples, it may also be necessary, especially when the timer
you’re using may be several orders of magnitude less precise than
the event you’re trying to benchmark.</p></li>
</ul>
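<p>Putting these points together, a minimal two-process ping-pong
timing loop might look like the following sketch.  It is illustrative
only: the <code class="docutils literal notranslate"><span class="pre">ping_pong()</span></code> helper, the message size,
and the iteration counts are arbitrary choices, not Open MPI
recommendations.  It performs warmup round trips first, uses the
high-resolution <code class="docutils literal notranslate"><span class="pre">MPI_Wtime()</span></code> timer, and times a
large batch of round trips together so that the measured interval is
much larger than the timer’s precision.</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span>/* Sketch: two-process ping-pong with warmup and batched timing.
 * Run with exactly two MPI processes. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

#define WARMUP    1000
#define TRIALS  100000
#define MSG_SIZE     8

/* One round trip: rank 0 sends to rank 1 and waits for the reply. */
static void ping_pong(int rank, char *buf)
{
    if (rank == 0) {
        MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    char buf[MSG_SIZE] = {0};
    int rank;

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    /* Warmup: let any lazy connection setup happen before timing. */
    for (int i = 0; i &lt; WARMUP; ++i) {
        ping_pong(rank, buf);
    }

    /* Time many round trips in one batch and report the average. */
    double start = MPI_Wtime();
    for (int i = 0; i &lt; TRIALS; ++i) {
        ping_pong(rank, buf);
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        printf("average round trip: %g microseconds\n",
               (elapsed / TRIALS) * 1e6);
    }

    MPI_Finalize();
    return 0;
}
</pre></div></div>
<p>Timing the whole batch (rather than each individual round trip) is a
deliberate choice here: it keeps the measured interval several orders
of magnitude larger than the timer’s precision, at the cost of only
being able to report an average rather than a per-iteration minimum
or maximum.</p>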
</li>
<li><p>Decide whether you are reporting minimum, average, or maximum
numbers, and have good reasons why.</p></li>
<li><p>Accurately label and report all results.  Reproducibility is a
major goal of benchmarking; benchmark results are effectively
useless if they are not precisely labeled as to exactly what they
are reporting.  Keep a log and detailed notes about the <em>exact</em>
system configuration that you are benchmarking.  Note, for example,
all hardware and software characteristics (including hardware,
firmware, and software versions as appropriate).</p></li>
</ol>
</div>


           </div>
          </div>
          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
        <a href="coll-tuned.html" class="btn btn-neutral float-left" title="11.10. Tuning Collectives" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
        <a href="heterogeneity.html" class="btn btn-neutral float-right" title="11.12. Building heterogeneous MPI applications" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2003-2025, The Open MPI Community.
      <span class="lastupdated">Last updated on 2025-05-30 16:41:43 UTC.
      </span></p>
  </div>

  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
    provided by <a href="https://readthedocs.org">Read the Docs</a>.
   

</footer>
        </div>
      </div>
    </section>
  </div>
  <script>
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script> 

</body>
</html>