.. _label-install-packagers:
Advice for packagers
====================
.. _label-install-packagers-do-not-use-internal:
Do not use Open MPI's internal dependent libraries
--------------------------------------------------
The Open MPI community **strongly** suggests that binary Open MPI
packages should *not* include Hwloc, Libevent, PMIx, or PRRTE.
:ref:`Although several of these libraries are required by Open MPI
<label-install-required-support-libraries>` (and are therefore bundled
in the Open MPI source code distribution for end-user convenience),
binary Open MPI packages should limit themselves solely to Open MPI
artifacts. Specifically: be sure to configure and build Open MPI
against external installations of these required packages.
Packagers may therefore wish to configure Open MPI with something like
the following:
.. code-block:: sh
# Install Sphinx so that Open MPI can re-build its docs with the
# installed PRRTE's docs
virtualenv venv
. ./venv/bin/activate
pip install -r docs/requirements.txt
./configure --with-libevent=external --with-hwloc=external \
--with-pmix=external --with-prrte=external ...
.. important:: Note the installation of the Sphinx tool so that Open
MPI can re-build its documentation with the external
PRRTE's documentation.
Failure to do this will mean Open MPI's documentation
will be correct for the version of PRRTE that is
bundled in the Open MPI distribution, but may not be
entirely correct for the version of PRRTE that you are
building against.
The ``external`` keywords will force Open MPI's ``configure`` to
ignore all the bundled libraries and only look for external versions
of these support libraries. This also has the benefit of causing
``configure`` to fail if it cannot find the required support libraries
outside of the Open MPI source tree |mdash| a good sanity check to
ensure that your package is correctly relying on the
independently-built and installed versions.
:ref:`See this section
<label-building-ompi-cli-options-required-support-libraries>` for more
information about the required support library ``--with-FOO`` command
line options.
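If the external support libraries are installed in non-default
prefixes, these same ``--with-FOO`` options can also be given an
installation directory.  A minimal sketch, assuming (purely for
illustration) that the external libraries were installed under
``/opt``:

.. code-block:: sh

   # The paths below are illustrative only; substitute wherever your
   # external Libevent / Hwloc / PMIx / PRRTE are actually installed.
   ./configure --with-libevent=/opt/libevent \
       --with-hwloc=/opt/hwloc \
       --with-pmix=/opt/pmix \
       --with-prrte=/opt/prrte ...
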
Have Sphinx installed
---------------------
Since you should be installing Open MPI against an external
PRRTE and PMIx, you should have `Sphinx
<https://www.sphinx-doc.org/>`_ installed before running Open MPI's
``configure`` script.
This will allow Open MPI to (re-)build its documentation according to
the PMIx and PRRTE that you are building against.
To be clear: the Open MPI distribution tarball comes with pre-built
documentation |mdash| rendered in HTML and nroff |mdash| that is
suitable for the versions of PRRTE and PMIx that are bundled in that
tarball.
However, if you are building Open MPI against not-bundled versions of
PRRTE / PMIx (as all packagers should be), Open MPI needs to re-build
its documentation with specific information from those external PRRTE
/ PMIx installs. For that, you need to have Sphinx installed before
running Open MPI's ``configure`` script.
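One quick sanity check before running ``configure`` is to confirm
that the ``sphinx-build`` executable is actually in your ``PATH``
(how Sphinx gets installed |mdash| distro package, ``pip``, virtual
environment, etc. |mdash| is up to the packager):

.. code-block:: sh

   # Confirm that Sphinx is available before running configure
   command -v sphinx-build && sphinx-build --version
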
.. _label-install-packagers-dso-or-not:
Components ("plugins"): static or DSO?
--------------------------------------
Open MPI contains a large number of components (sometimes called
"plugins") to effect different types of functionality in MPI. For
example, some components effect Open MPI's networking functionality:
they may link against specialized libraries to provide
highly-optimized network access.
Open MPI can build its components as Dynamic Shared Objects (DSOs) or
include them statically in its core libraries (regardless of whether
those libraries are built as shared or static libraries).
.. note:: As of Open MPI |ompi_ver|, ``configure``'s global default is
to build all components as static (i.e., part of the Open
MPI core libraries, not as DSOs). Prior to Open MPI v5.0.0,
the global default behavior was to build most components as
DSOs.
Why build components as DSOs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are advantages to building components as DSOs:
* Open MPI's core libraries |mdash| and therefore MPI applications
|mdash| will have very few dependencies. For example, if you build
Open MPI with support for a specific network stack, the libraries in
that network stack will be dependencies of the DSOs, not Open MPI's
core libraries (or MPI applications).
* Removing Open MPI functionality that you do not want is as simple as
removing a DSO from ``$libdir/openmpi``.
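For example, a packager or system administrator could prune an
unwanted component from a DSO-style installation after the fact.  A
minimal sketch, assuming a hypothetical installation prefix of
``/opt/openmpi`` and a hypothetical ``mca_pml_xyz.so`` component used
only for illustration:

.. code-block:: sh

   # List the component DSOs in the installation (prefix is illustrative)
   ls /opt/openmpi/lib/openmpi/mca_*.so

   # Remove a single component ("mca_pml_xyz.so" is a hypothetical
   # component name used only for illustration)
   rm /opt/openmpi/lib/openmpi/mca_pml_xyz.so
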
Why build components as part of Open MPI's core libraries?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The biggest advantage of building the components as part of Open MPI's
core libraries appears when running at (very) large scale with Open
MPI installed on a network filesystem (vs. being installed on a local
filesystem).
For example, consider launching a single MPI process on each of 1,000
nodes. In this scenario, the following is accessed from the network
filesystem:
#. The MPI application
#. The core Open MPI libraries and their dependencies (e.g.,
``libmpi``)
* Depending on your configuration, this is probably on the order of
10-20 library files.
#. All DSO component files and their dependencies
* Depending on your configuration, this can be 200+ component
files.
If all components are physically located in the libraries, then the
third step loads zero DSO component files. When using a networked
filesystem while launching at scale, this can translate to large
performance savings.
.. note:: If not using a networked filesystem, or if not launching at
scale, loading a large number of DSO files may not consume a
noticeable amount of time during MPI process launch. Put
simply: loading DSOs as individual files generally only
matters when using a networked filesystem while launching at
scale.
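As a rough way to see how many files are involved, a packager can
count the component DSOs in a DSO-style build and inspect what one of
them pulls in.  A minimal sketch (the installation prefix below is
illustrative only):

.. code-block:: sh

   # Count the component DSO files in a DSO-style installation
   # (the prefix is illustrative only)
   ls /opt/openmpi/lib/openmpi/mca_*.so | wc -l

   # Inspect the run-time dependencies of one component (Linux-specific)
   ldd /opt/openmpi/lib/openmpi/mca_pml_ucx.so
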
Direct controls for building components as DSOs or not
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Open MPI |ompi_ver| has two ``configure``-time defaults regarding the
treatment of components that may be of interest to packagers:
#. Open MPI's libraries default to building as shared libraries
(vs. static libraries). For example, on Linux, Open MPI will
default to building ``libmpi.so`` (vs. ``libmpi.a``).
.. note:: See the descriptions of ``--disable-shared`` and
``--enable-static`` :ref:`in this section
<label-building-installation-cli-options>` for more
details about how to change this default.
Also be sure to :ref:`see this warning about building
static apps <label-building-fully-static-apps>`.
#. Open MPI will default to including its components in its libraries
(as opposed to being compiled as dynamic shared objects, or DSOs).
For example, ``libmpi.so`` on Linux systems will contain the UCX
PML component, instead of the UCX PML being compiled into
``mca_pml_ucx.so`` and dynamically opened at run time via
``dlopen(3)``.
.. note:: See the descriptions of ``--enable-mca-dso`` and
``--enable-mca-static`` :ref:`in this section
<label-building-installation-cli-options>` for more
details about how to change these defaults.  A brief
example sketch follows this list.
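For illustration, a packager who wants static core libraries, or who
wants some components built as DSOs, might invoke ``configure`` along
these lines (the component list below is only an example):

.. code-block:: sh

   # Build static core libraries instead of shared ones
   ./configure --disable-shared --enable-static ...

   # Keep shared core libraries, but build the listed components as
   # DSOs instead of including them in the core libraries
   ./configure --enable-mca-dso=btl-smcuda,accelerator ...
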
A side effect of these two defaults is that all the components
included in the Open MPI libraries will bring their dependencies with
them. For example (on Linux), if the XYZ PML component in the MPI
layer requires ``libXYZ.so``, then these defaults mean that
``libmpi.so`` will depend on ``libXYZ.so``. This dependency will
likely be telegraphed into the Open MPI binary package that includes
``libmpi.so``.
Conversely, if the XYZ PML component was built as a DSO, then |mdash|
assuming no other parts of Open MPI require ``libXYZ.so`` |mdash|
``libmpi.so`` would *not* be dependent on ``libXYZ.so``. Instead, the
``mca_pml_xyz.so`` DSO would have the dependency upon ``libXYZ.so``.
Packagers can use these facts to potentially create multiple binary
Open MPI packages, each with different dependencies by, for example,
using ``--enable-mca-dso`` to selectively build some components as
DSOs and leave the others included in their respective Open MPI
libraries.
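One way to see where such a dependency lands is to inspect the
installed libraries directly.  A minimal sketch on Linux, re-using the
hypothetical XYZ PML component and ``libXYZ.so`` from the paragraphs
above:

.. code-block:: sh

   # "$libdir" below stands for the actual installation libdir.
   # With the XYZ PML built into the core libraries, the dependency
   # appears on libmpi itself:
   ldd $libdir/libmpi.so | grep XYZ

   # With the XYZ PML built as a DSO, the dependency moves to the
   # component instead:
   ldd $libdir/openmpi/mca_pml_xyz.so | grep XYZ
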
:ref:`See the section on building accelerator support
<label-install-packagers-building-accelerator-support-as-dsos>` for a
practical example where this can be useful.
.. _label-install-packagers-gnu-libtool-dependency-flattening:
GNU Libtool dependency flattening
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When compiling Open MPI's components statically as part of Open MPI's
core libraries, `GNU Libtool <https://www.gnu.org/software/libtool/>`_
|mdash| which is used as part of Open MPI's build system |mdash| will
attempt to "flatten" dependencies.
For example, the :ref:`ompi_info(1) <man1-ompi_info>` command links
against the Open MPI core library ``libopen-pal``. This library will
have dependencies on various HPC-class network stack libraries. For
simplicity, the discussion below assumes that Open MPI was built with
support for `Libfabric <https://libfabric.org/>`_ and `UCX
<https://openucx.org/>`_, and therefore ``libopen-pal`` has direct
dependencies on ``libfabric`` and ``libucx``.
In this scenario, GNU Libtool will automatically attempt to "flatten"
these dependencies by linking :ref:`ompi_info(1) <man1-ompi_info>`
directly to ``libfabric`` and ``libucx`` (vs. letting ``libopen-pal``
pull the dependencies in at run time).
* In some environments (e.g., Ubuntu 22.04), the compiler and/or
linker will automatically utilize the linker CLI flag
``-Wl,--as-needed``, which will effectively cause these dependencies
to *not* be flattened: :ref:`ompi_info(1) <man1-ompi_info>` will
*not* have direct dependencies on either ``libfabric`` or
``libucx``.
* In other environments (e.g., Fedora 38), the compiler and linker
will *not* utilize the ``-Wl,--as-needed`` linker CLI flag. As
such, :ref:`ompi_info(1) <man1-ompi_info>` will show direct
dependencies on ``libfabric`` and ``libucx``.
**Just to be clear:** these flattened dependencies *are not a
problem*. Open MPI will function correctly with or without the
flattened dependencies. There is no performance impact associated
with having |mdash| or not having |mdash| the flattened dependencies.
We mention this situation here in the documentation simply because it
surprised some Open MPI downstream package managers to see that
:ref:`ompi_info(1) <man1-ompi_info>` in Open MPI |ompi_ver| had more
shared library dependencies than it did in prior Open MPI releases.
If packagers want :ref:`ompi_info(1) <man1-ompi_info>` to not have
these flattened dependencies, use either of the following mechanisms:
#. Use ``--enable-mca-dso`` to force all components to be built as
DSOs (this was actually the default behavior before Open MPI v5.0.0).
#. Add ``LDFLAGS=-Wl,--as-needed`` to the ``configure`` command line
when building Open MPI (see the sketch below).
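A minimal sketch of the second mechanism, plus a quick way to verify
the result on Linux (the installation path used with ``ldd`` is
illustrative only):

.. code-block:: sh

   # Ask the linker to drop shared-library dependencies that are not
   # actually needed by each executable / library
   ./configure LDFLAGS=-Wl,--as-needed \
       --with-libevent=external --with-hwloc=external \
       --with-pmix=external --with-prrte=external ...

   # After "make install", check ompi_info's direct dependencies
   ldd /opt/openmpi/bin/ompi_info
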
.. note:: The Open MPI community specifically chose not to
automatically utilize this linker flag for the following
reasons:
#. Having the flattened dependencies does not cause any
correctness or performance problems.
#. There are multiple mechanisms (see above) for users or
packagers to change this behavior, if desired.
#. Certain environments have chosen to have |mdash| or
not have |mdash| this flattened dependency behavior.
It is not Open MPI's place to override these choices.
#. In general, Open MPI's ``configure`` script only
utilizes compiler and linker flags if they are
*needed*. All other flags should be the user's /
packager's choice.
.. _label-install-packagers-building-accelerator-support-as-dsos:
Building accelerator support as DSOs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you are building a package that includes support for one or more
accelerators, it may be desirable to build accelerator-related
components as DSOs (see the :ref:`static or DSO?
<label-install-packagers-dso-or-not>` section for details).
.. admonition:: Rationale
:class: tip
Accelerator hardware is expensive, and may only be present on some
compute nodes in an HPC cluster. Specifically: there may not be
any accelerator hardware on "head" or compile nodes in an HPC
cluster. As such, invoking Open MPI commands on a "head" node (which
has no accelerator hardware) with an Open MPI that was built with
static accelerator support may fail to launch because of run-time
linker issues (the accelerator hardware support libraries are likely
not present on that node).
Building Open MPI's accelerator-related components as DSOs allows
Open MPI to *try* opening the accelerator components, but proceed
if those DSOs fail to open due to the lack of support libraries.
Using the ``--enable-mca-dso`` command line parameter to Open MPI's
``configure`` command allows packagers to build all
accelerator-related components as DSOs. For example:
.. code:: sh
# Build all the accelerator-related components as DSOs (all other
# components will default to being built in their respective
# libraries)
shell$ ./configure --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator
Per the example above, this allows packaging ``$libdir`` as part of
the "main" Open MPI binary package, but then packaging
``$libdir/openmpi/mca_accelerator_*.so`` and the other named
components as sub-packages. These sub-packages may inherit
dependencies on the CUDA and/or ROCm packages, for example. The
"main" package can be installed on all nodes, and the
accelerator-specific subpackage can be installed on only the nodes
with accelerator hardware and support libraries.
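For example, the file split between the "main" package and an
accelerator sub-package might look something like the following
sketch (the file lists are illustrative only and depend on your
configuration and packaging system):

.. code-block:: sh

   # "$libdir" below stands for the actual installation libdir.
   # Files for the "main" Open MPI package: the core libraries and all
   # components that were built into them
   ls $libdir/lib*.so*

   # Files for an accelerator sub-package: only the accelerator-related
   # DSOs, which carry the CUDA / ROCm dependencies
   ls $libdir/openmpi/mca_accelerator_*.so \
      $libdir/openmpi/mca_btl_smcuda.so \
      $libdir/openmpi/mca_rcache_*gpusm.so
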