File: packagers.rst

.. _label-install-packagers:

Advice for packagers
====================

.. _label-install-packagers-do-not-use-internal:

Do not use Open MPI's internal dependent libraries
--------------------------------------------------

The Open MPI community **strongly** suggests that binary Open MPI
packages should *not* include Hwloc, Libevent, PMIx, or PRRTE.
:ref:`Although several of these libraries are required by Open MPI
<label-install-required-support-libraries>` (and are therefore bundled
in the Open MPI source code distribution for end-user convenience),
binary Open MPI packages should limit themselves solely to Open MPI
artifacts.  Specifically: be sure to configure and build Open MPI
against external installations of these required packages.

Packagers may therefore wish to configure Open MPI with something like
the following:

.. code-block:: sh

   # Install Sphinx so that Open MPI can re-build its docs with the
   # installed PRRTE's docs

   virtualenv venv
   . ./venv/bin/activate
   pip install -r docs/requirements.txt

   ./configure --with-libevent=external --with-hwloc=external \
       --with-pmix=external --with-prrte=external ...

.. important:: Note the installation of the Sphinx tool so that Open
               MPI can re-build its documentation with the external
               PRRTE's documentation.

               Failure to do this will mean Open MPI's documentation
               will be correct for the version of PRRTE that is
               bundled in the Open MPI distribution, but may not be
               entirely correct for the version of PRRTE that you are
               building against.

The ``external`` keywords will force Open MPI's ``configure`` to
ignore all the bundled libraries and only look for external versions
of these support libraries.  This also has the benefit of causing
``configure`` to fail if it cannot find the required support libraries
outside of the Open MPI source tree |mdash| a good sanity check to
ensure that your package is correctly relying on the
independently-built and installed versions.
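
After installation, packagers may also want a quick sanity check that
the resulting libraries really do link against the external support
libraries.  The following is a minimal sketch, assuming a Linux system
with ``ldd`` and an Open MPI installation under ``$prefix``:

.. code-block:: sh

   # With the "external" keywords, any hwloc / Libevent / PMIx
   # dependencies shown here should resolve to system-wide (external)
   # library paths, not to copies installed by Open MPI itself.
   shell$ ldd $prefix/lib/libopen-pal.so | egrep 'hwloc|event|pmix'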

:ref:`See this section
<label-building-ompi-cli-options-required-support-libraries>` for more
information about the required support library ``--with-FOO`` command
line options.

Have Sphinx installed
---------------------

Since you should be (and, as a packager, will be) installing Open MPI
against an external PRRTE and PMIx, you should have `Sphinx
<https://www.sphinx-doc.org/>`_ installed before running Open MPI's
``configure`` script.

This will allow Open MPI to (re-)build its documentation according to
the PMIx and PRRTE that you are building against.

To be clear: the Open MPI distribution tarball comes with pre-built
documentation |mdash| rendered in HTML and nroff |mdash| that is
suitable for the versions of PRRTE and PMIx that are bundled in that
tarball.

However, if you are building Open MPI against not-bundled versions of
PRRTE / PMIx (as all packagers should be), Open MPI needs to re-build
its documentation with specific information from those external PRRTE
/ PMIx installs.  For that, you need to have Sphinx installed before
running Open MPI's ``configure`` script.
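
Any mechanism that puts ``sphinx-build`` in your ``PATH`` before
``configure`` runs should work.  For example, a minimal sketch using
``pip`` and the requirements file shipped in the Open MPI source tree:

.. code-block:: sh

   # Install Sphinx (and the other documentation tools that Open MPI's
   # docs expect), then verify that sphinx-build is found.
   shell$ pip install -r docs/requirements.txt
   shell$ sphinx-build --version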


.. _label-install-packagers-dso-or-not:

Components ("plugins"): static or DSO?
--------------------------------------

Open MPI contains a large number of components (sometimes called
"plugins") that provide different types of functionality in MPI.  For
example, some components implement Open MPI's networking
functionality: they may link against specialized libraries to provide
highly-optimized network access.

Open MPI can build its components as Dynamic Shared Objects (DSOs) or
include them statically in its core libraries (regardless of whether
those libraries are built as shared or static libraries).

.. note:: As of Open MPI |ompi_ver|, ``configure``'s global default is
          to build all components as static (i.e., part of the Open
          MPI core libraries, not as DSOs).  Prior to Open MPI v5.0.0,
          the global default behavior was to build most components as
          DSOs.

Why build components as DSOs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are advantages to building components as DSOs:

* Open MPI's core libraries |mdash| and therefore MPI applications
  |mdash| will have very few dependencies.  For example, if you build
  Open MPI with support for a specific network stack, the libraries in
  that network stack will be dependencies of the DSOs, not Open MPI's
  core libraries (or MPI applications).

* Removing Open MPI functionality that you do not want is as simple as
  removing a DSO from ``$libdir/openmpi`` (see the sketch below).
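
A minimal sketch, assuming a default Linux installation under
``$prefix`` and that at least some components were built as DSOs:

.. code-block:: sh

   # List the components that were built as DSOs.
   shell$ ls $prefix/lib/openmpi/mca_*.so

   # Removing a DSO removes the corresponding functionality from the
   # installation (the TCP BTL is used here purely as an example).
   shell$ rm $prefix/lib/openmpi/mca_btl_tcp.so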

Why build components as part of Open MPI's core libraries?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The biggest advantage of building the components as part of Open MPI's
core libraries appears when running at (very) large scale with Open
MPI installed on a network filesystem (vs. being installed on a local
filesystem).

For example, consider launching a single MPI process on each of 1,000
nodes.  In this scenario, the following is accessed from the network
filesystem:

#. The MPI application
#. The core Open MPI libraries and their dependencies (e.g.,
   ``libmpi``)

   * Depending on your configuration, this is probably on the order of
     10-20 library files.

#. All DSO component files and their dependencies

   * Depending on your configuration, this can be 200+ component
     files.

If all components are physically located in the libraries, then the
third step loads zero DSO component files.  When using a networked
filesystem while launching at scale, this can translate to large
performance savings.

.. note:: If not using a networked filesystem, or if not launching at
          scale, loading a large number of DSO files may not consume a
          noticeable amount of time during MPI process launch.  Put
          simply: loading DSOs as individual files generally only
          matters when using a networked filesystem while launching at
          scale.

Direct controls for building components as DSOs or not
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Open MPI |ompi_ver| has two ``configure``-time defaults regarding the
treatment of components that may be of interest to packagers:

#. Open MPI's libraries default to building as shared libraries
   (vs. static libraries).  For example, on Linux, Open MPI will
   default to building ``libmpi.so`` (vs. ``libmpi.a``).

   .. note:: See the descriptions of ``--disable-shared`` and
             ``--enable-static`` :ref:`in this section
             <label-building-installation-cli-options>` for more
             details about how to change this default.

             Also be sure to :ref:`see this warning about building
             static apps <label-building-fully-static-apps>`.

#. Open MPI will default to including its components in its libraries
   (as opposed to being compiled as dynamic shared objects, or DSOs).
   For example, ``libmpi.so`` on Linux systems will contain the UCX
   PML component, instead of the UCX PML being compiled into
   ``mca_pml_ucx.so`` and dynamically opened at run time via
   ``dlopen(3)``.

   .. note:: See the descriptions of ``--enable-mca-dso`` and
             ``--enable-mca-static`` :ref:`in this section
             <label-building-installation-cli-options>` for more
             details about how to change these defaults.

A side effect of these two defaults is that all the components
included in the Open MPI libraries will bring their dependencies with
them.  For example (on Linux), if the XYZ PML component in the MPI
layer requires ``libXYZ.so``, then these defaults mean that
``libmpi.so`` will depend on ``libXYZ.so``.  This dependency will
likely be propagated into the Open MPI binary package that includes
``libmpi.so``.

Conversely, if the XYZ PML component was built as a DSO, then |mdash|
assuming no other parts of Open MPI require ``libXYZ.so`` |mdash|
``libmpi.so`` would *not* be dependent on ``libXYZ.so``.  Instead, the
``mca_pml_xyz.so`` DSO would have the dependency upon ``libXYZ.so``.
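
A minimal sketch of how this difference shows up on an installed tree
(using ``ldd`` on Linux; the XYZ names are the same hypothetical
component used above):

.. code-block:: sh

   # XYZ PML built into the core library: the dependency appears here.
   shell$ ldd $prefix/lib/libmpi.so | grep libXYZ

   # XYZ PML built as a DSO: the dependency moves to the component file.
   shell$ ldd $prefix/lib/openmpi/mca_pml_xyz.so | grep libXYZ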

Packagers can use these facts to create multiple binary Open MPI
packages, each with different dependencies: for example, by using
``--enable-mca-dso`` to selectively build some components as DSOs and
leaving the others included in their respective Open MPI libraries.

:ref:`See the section on building accelerator support
<label-install-packagers-building-accelerator-support-as-dsos>` for a
practical example where this can be useful.

.. _label-install-packagers-gnu-libtool-dependency-flattening:

GNU Libtool dependency flattening
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When compiling Open MPI's components statically as part of Open MPI's
core libraries, `GNU Libtool <https://www.gnu.org/software/libtool/>`_
|mdash| which is used as part of Open MPI's build system |mdash| will
attempt to "flatten" dependencies.

For example, the :ref:`ompi_info(1) <man1-ompi_info>` command links
against the Open MPI core library ``libopen-pal``.  This library will
have dependencies on various HPC-class network stack libraries. For
simplicity, the discussion below assumes that Open MPI was built with
support for `Libfabric <https://libfabric.org/>`_ and `UCX
<https://openucx.org/>`_, and therefore ``libopen-pal`` has direct
dependencies on ``libfabric`` and ``libucx``.

In this scenario, GNU Libtool will automatically attempt to "flatten"
these dependencies by linking :ref:`ompi_info(1) <man1-ompi_info>`
directly to ``libfabric`` and ``libucx`` (vs. letting ``libopen-pal``
pull the dependencies in at run time).

* In some environments (e.g., Ubuntu 22.04), the compiler and/or
  linker will automatically utilize the linker CLI flag
  ``-Wl,--as-needed``, which will effectively cause these dependencies
  to *not* be flattened: :ref:`ompi_info(1) <man1-ompi_info>` will
  *not* have direct dependencies on either ``libfabric`` or
  ``libucx``.

* In other environments (e.g., Fedora 38), the compiler and linker
  will *not* utilize the ``-Wl,--as-needed`` linker CLI flag.  As
  such, :ref:`ompi_info(1) <man1-ompi_info>` will show direct
  dependencies on ``libfabric`` and ``libucx``.
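
Whether flattening occurred on a given system can be checked after
installation.  A minimal sketch, assuming a Linux system with ``ldd``
and that Libfabric / UCX support was built into the core libraries:

.. code-block:: sh

   # If these libraries appear in the output, the dependencies were
   # flattened onto the ompi_info executable; if not, they will show
   # up in the output of "ldd libopen-pal.so" instead.
   shell$ ldd $(which ompi_info) | egrep 'libfabric|libucx'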

**Just to be clear:** these flattened dependencies *are not a
problem*.  Open MPI will function correctly with or without the
flattened dependencies.  There is no performance impact associated
with having |mdash| or not having |mdash| the flattened dependencies.
We mention this situation here in the documentation simply because it
surprised some Open MPI downstream packagers to see that
:ref:`ompi_info(1) <man1-ompi_info>` in Open MPI |ompi_ver| had more
shared library dependencies than it did in prior Open MPI releases.

If packagers want :ref:`ompi_info(1) <man1-ompi_info>` to not have
these flattened dependencies, use either of the following mechanisms:

#. Use ``--enable-mca-dso`` to force all components to be built as
   DSOs (this was actually the default behavior before Open MPI v5.0.0).

#. Add ``LDFLAGS=-Wl,--as-needed`` to the ``configure`` command line
   when building Open MPI (see the sketch after this list).

   .. note:: The Open MPI community specifically chose not to
             automatically utilize this linker flag for the following
             reasons:

             #. Having the flattened dependencies does not cause any
                correctness or performance problems.
             #. There are multiple mechanisms (see above) for users or
                packagers to change this behavior, if desired.
             #. Certain environments have chosen to have |mdash| or
                not have |mdash| this flattened dependency behavior.
                It is not Open MPI's place to override these choices.
             #. In general, Open MPI's ``configure`` script only
                utilizes compiler and linker flags if they are
                *needed*.  All other flags should be the user's /
                packager's choice.
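
As an illustration of the second mechanism, a minimal sketch (the
``external`` arguments are copied from the earlier example; adjust to
your own configuration):

.. code-block:: sh

   shell$ ./configure LDFLAGS=-Wl,--as-needed \
       --with-libevent=external --with-hwloc=external \
       --with-pmix=external --with-prrte=external ...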

.. _label-install-packagers-building-accelerator-support-as-dsos:

Building accelerator support as DSOs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are building a package that includes support for one or more
accelerators, it may be desirable to build accelerator-related
components as DSOs (see the :ref:`static or DSO?
<label-install-packagers-dso-or-not>` section for details).

.. admonition:: Rationale
   :class: tip

   Accelerator hardware is expensive, and may only be present on some
   compute nodes in an HPC cluster.  Specifically: there may not be
   any accelerator hardware on "head" or compile nodes in an HPC
   cluster.  As such, invoking Open MPI commands on a "head" node with
   an MPI that was built with static accelerator support but no
   accelerator hardware may fail to launch because of run-time linker
   issues (because the accelerator hardware support libraries are
   likely not present).

   Building Open MPI's accelerator-related components as DSOs allows
   Open MPI to *try* opening the accelerator components, but proceed
   if those DSOs fail to open due to the lack of support libraries.

Using the ``--enable-mca-dso`` command line parameter to Open MPI's
``configure`` command allows packagers to build all
accelerator-related components as DSOs.  For example:

.. code:: sh

   # Build all the accelerator-related components as DSOs (all other
   # components will default to being built in their respective
   # libraries)
   shell$ ./configure --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator

Per the example above, this allows packaging ``$libdir`` as part of
the "main" Open MPI binary package, but then packaging
``$libdir/openmpi/mca_accelerator_*.so`` and the other named
components as sub-packages.  These sub-packages may inherit
dependencies on the CUDA and/or ROCm packages, for example.  The
"main" package can be installed on all nodes, and the
accelerator-specific sub-package can be installed on only the nodes
with accelerator hardware and support libraries.
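
A minimal sketch of how a packager might verify which components ended
up as DSOs after such a build (assuming a default Linux installation
under ``$prefix``; the exact set of files depends on your
configuration):

.. code-block:: sh

   # The accelerator-related components named on the configure command
   # line above should appear here as individual DSO files.
   shell$ ls $prefix/lib/openmpi/ | egrep 'accelerator|smcuda|gpusm'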