.. _building-open-mpi-installation-location-label:
Installation location
=====================
A common environment to run Open MPI is in a "Beowulf"-class or
similar cluster (e.g., a bunch of 1U servers in a bunch of racks).
Simply stated, Open MPI can run on a group of servers or workstations
connected by a network. As mentioned in the
:ref:`prerequisites section <running-prerequisites-label>` there are
several caveats, however (for example, you typically must have an
account on all the machines, you must be able to ``ssh`` between the
nodes without using a password, etc.).
Regardless of whether Open MPI is installed on a shared / networked
filesystem or independently on each node, it is usually easiest if
Open MPI is available in the same filesystem path on every node.
For example, if you install Open MPI to ``/opt/openmpi-|ompi_ver|`` on
one node, ensure that it is available in ``/opt/openmpi-|ompi_ver|``
on *all* nodes.
.. important:: For simplicity, the Open MPI team *strongly* recommends
that you install Open MPI at the same path location on all nodes in
your cluster. This *greatly* simplifies the user experience of
running MPI jobs across multiple nodes in your cluster.
It is *possible* to install Open MPI in unique path locations in
the different nodes in your cluster, but it is not *advisable*.
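With a uniform installation path, every user's shell startup script can
contain the same snippet on every node. A minimal sketch (the
``OMPI_PREFIX`` variable name and the ``/opt/openmpi`` prefix are
hypothetical; substitute your cluster's actual installation path):

```shell
# Same snippet works on every node because the install path is uniform.
# /opt/openmpi is a hypothetical prefix; use your actual installation path.
OMPI_PREFIX=/opt/openmpi
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```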
This raises the question for Open MPI system administrators: where to
install the Open MPI binaries, header files, etc.? This discussion
mainly addresses this question for homogeneous clusters (i.e., where
all nodes and operating systems are the same), although elements of
this discussion apply to heterogeneous clusters as well.
Filesystem types
----------------
There are two common approaches.
Network filesystem
^^^^^^^^^^^^^^^^^^
Have a common filesystem, such as NFS, between all the machines to be
used. Install Open MPI such that the installation directory is the
*same value* on each node. This will *greatly* simplify users' shell
startup scripts (e.g., ``.bashrc``, ``.cshrc``, ``.profile``, etc.)
|mdash| the ``PATH`` can be set without checking which machine the
user is on. It also simplifies the system administrator's job; when
the time comes to patch or otherwise upgrade Open MPI, only one copy
needs to be modified.
For example, consider a cluster of four machines: ``inky``,
``blinky``, ``pinky``, and ``clyde``.
* Install Open MPI on ``inky``'s local hard drive in the directory
``/opt/openmpi-VERSION``. The system administrator then mounts
``inky:/opt/openmpi-VERSION`` on the remaining three machines, such
that ``/opt/openmpi-VERSION`` on all machines is effectively "the
same". That is, the following directories all contain the Open MPI
installation:
.. code-block::
inky:/opt/openmpi-VERSION
blinky:/opt/openmpi-VERSION
pinky:/opt/openmpi-VERSION
clyde:/opt/openmpi-VERSION
* Install Open MPI on ``inky``'s local hard drive in the directory
``/usr/local/openmpi-VERSION``. The system administrator then
mounts ``inky:/usr/local/openmpi-VERSION`` on *all four* machines in
some other common location, such as ``/opt/openmpi-VERSION`` (a
symbolic link can be installed on ``inky`` instead of a mount point
  for efficiency). This strategy is typically used in environments
  where one tree is NFS-exported, but a different tree is used as the
  actual installation location. For example, the following
directories all contain the Open MPI installation:
.. code-block::
inky:/opt/openmpi-VERSION
blinky:/opt/openmpi-VERSION
pinky:/opt/openmpi-VERSION
clyde:/opt/openmpi-VERSION
Notice that there are the same four directories as the previous
example, but on ``inky``, the directory is *actually* located in
``/usr/local/openmpi-VERSION``.
There is a disadvantage to this approach: each of the remote nodes has
to incur NFS (or whatever filesystem is used) delays to access the
Open MPI directory tree. However, the administrative ease and
relatively low cost of using a networked filesystem usually greatly
outweigh this cost. Indeed, once an MPI
application is past MPI initialization, it doesn't use the Open MPI
binaries very much.
Local filesystem
^^^^^^^^^^^^^^^^
If you are concerned with networked filesystem costs of accessing the
Open MPI binaries, you can install Open MPI on the local hard drive of
each node in your system. Again, it is *highly* advisable to install
Open MPI in the *same* directory on each node so that each user's
``PATH`` can be set to the same value, regardless of the node that a
user has logged on to.
This approach will save some network latency of accessing the Open MPI
binaries, but is typically only used where users are very concerned
about squeezing every single cycle out of their machines, or are
running at extreme scale where a networked filesystem may get
overwhelmed by filesystem requests for Open MPI binaries when running
very large parallel jobs.
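Replicating a locally built tree to every node can be scripted. A
sketch, using the hypothetical host names from the earlier example and
assuming password-less ``ssh`` plus ``rsync`` on each node; the leading
``echo`` makes this a dry run:

```shell
# Dry run: print the copy command for each node. Remove the leading
# "echo" to actually push /opt/openmpi-VERSION to each node's local disk.
NODES="blinky pinky clyde"
for node in $NODES; do
    echo rsync -a /opt/openmpi-VERSION/ "$node":/opt/openmpi-VERSION/
done
```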
.. _building-open-mpi-install-overwrite-label:
Installing over a prior Open MPI installation
---------------------------------------------
.. warning:: The Open MPI team does not recommend installing a new
version of Open MPI over an existing / older installation of Open
MPI.
In its default configuration, an Open MPI installation consists of
several shared libraries, header files, executables, and plugins
(dynamic shared objects |mdash| DSOs). These installation files act
together as a single entity. The specific filenames and
contents of these files are subject to change between different
versions of Open MPI.
.. important:: Installing one version of Open MPI does *not* uninstall
another version.
If you install a new version of Open MPI over an older version, this
may not overwrite all the files from the older version. Hence, you
may end up with an incompatible muddle of files from two different
installations |mdash| which can cause problems.
See :ref:`updating Open MPI <building-open-mpi-updating-label>` for more
information about updating or upgrading an installation of Open MPI.
Relocating an Open MPI installation
-----------------------------------
It can be desirable to initially install Open MPI to one location
(e.g., ``/path/to/openmpi``) and then later move it to another
location (e.g., ``/opt/myproduct/bundled-openmpi-a.b.c``).
.. note:: Open MPI hard-codes some directory paths in its executables
based on installation paths specified by the ``configure``
script. For example, if you configure with an installation
prefix of ``/opt/openmpi/``, Open MPI encodes in its
executables that it should be able to find its help files in
``/opt/openmpi/share/openmpi``.
The "installdirs" functionality in Open MPI lets you change any of
these hard-coded directory paths at run time (*assuming* that you have
already adjusted your ``PATH`` and/or ``LD_LIBRARY_PATH`` environment
variables to the new location where Open MPI now resides).
There are three methods.
.. _install-location-opal-prefix:
Move an existing Open MPI installation to a new prefix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Set the ``OPAL_PREFIX`` environment variable before launching Open
MPI. For example, if Open MPI had initially been installed to
``/opt/openmpi`` and the entire ``openmpi`` tree was later moved to
``/home/openmpi``, setting ``OPAL_PREFIX`` to ``/home/openmpi`` will
enable Open MPI to function properly.
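Concretely, using the example paths above, the adjustments might look
like the following sketch (the ``PATH`` / ``LD_LIBRARY_PATH`` updates
are the environment changes mentioned earlier):

```shell
# The tree was built with --prefix=/opt/openmpi but now lives in
# /home/openmpi (example paths from above).
export OPAL_PREFIX=/home/openmpi

# Also point the shell at the relocated binaries and libraries:
export PATH="/home/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="/home/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Subsequent mpirun / ompi_info invocations now find their support
# files under /home/openmpi instead of /opt/openmpi.
```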
.. note:: The ``OPAL_PREFIX`` mechanism relies on all installation
directories being specified as relative to the ``prefix``
directory specified during ``configure``.
For example, if Open MPI is configured the following way:
.. code-block::
$ ./configure --prefix=/opt/openmpi --libdir=/usr/lib ...
Then setting ``OPAL_PREFIX`` will not affect the run-time
implications of ``libdir``, since ``/usr/lib`` is not
specified as relative to ``/opt/openmpi``.
Instead of specifying absolute directories, you can make
them relative to other ``configure``-recognized directories.
For example:
.. code-block::
$ ./configure --prefix=/opt/openmpi --libdir='${exec_prefix}/x86_64/lib' ...
Note the additional shell quoting that is likely necessary
to prevent shell variable expansion, and the additional
``${}`` around ``exec_prefix`` that is necessary for Open MPI
to recognize that it is a special name that needs to be
expanded.
The directory names recognized by Open MPI are listed in the
:ref:`Overriding individual directories
<install-location-overriding-individual-directories>`
section (below), without the ``OPAL_`` prefix, and in lower
case. For example, the ``OPAL_SYSCONFDIR`` environment
variable corresponds to ``${sysconfdir}``.
"Stage" an Open MPI installation in a temporary location
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When *creating* self-contained installation packages, systems such as
RPM install Open MPI into temporary locations. The package system
then bundles up everything under the temporary location into a package
that can be installed into its real location later. For example, when
*creating* an RPM that will be installed to ``/opt/openmpi``, the RPM
system will transparently prepend a "destination directory" (or
"destdir") to the installation directory. As such, Open MPI will
think that it is installed in ``/opt/openmpi``, but it is actually
temporarily installed in (for example)
``/var/rpm/build.1234/opt/openmpi``. If it is necessary to *use* Open
MPI while it is installed in this staging area, the ``OPAL_DESTDIR``
environment variable can be used; setting ``OPAL_DESTDIR`` to
``/var/rpm/build.1234`` will automatically prefix every directory such
that Open MPI can function properly.
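Using the example paths above, a sketch of running from the staging
area (note that the shell still needs the *physical* locations of the
staged files, while ``OPAL_DESTDIR`` handles Open MPI's internal
paths):

```shell
# Open MPI was configured with --prefix=/opt/openmpi but staged under
# /var/rpm/build.1234 (example destdir from above).
export OPAL_DESTDIR=/var/rpm/build.1234

# The shell needs the physical paths of the staged binaries/libraries:
export PATH="/var/rpm/build.1234/opt/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="/var/rpm/build.1234/opt/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```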
.. _install-location-overriding-individual-directories:
Overriding individual directories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Open MPI uses the GNU-specified directories (per Autoconf/Automake).
These directories can be overridden at run time by setting environment
variables that correspond to their common names. The environment
variables that can be used are:
* ``OPAL_PREFIX``
* ``OPAL_EXEC_PREFIX``
* ``OPAL_BINDIR``
* ``OPAL_SBINDIR``
* ``OPAL_LIBEXECDIR``
* ``OPAL_DATAROOTDIR``
* ``OPAL_DATADIR``
* ``OPAL_SYSCONFDIR``
* ``OPAL_SHAREDSTATEDIR``
* ``OPAL_LOCALSTATEDIR``
* ``OPAL_LIBDIR``
* ``OPAL_INCLUDEDIR``
* ``OPAL_INFODIR``
* ``OPAL_MANDIR``
* ``OPAL_PKGDATADIR``
* ``OPAL_PKGLIBDIR``
* ``OPAL_PKGINCLUDEDIR``
Note that not all of the directories listed above are used by Open
MPI; they are listed here in their entirety for completeness.
Also note that several directories listed above are defined in terms
of other directories. For example, ``$bindir`` is defined by
default as ``$prefix/bin``. Hence, overriding ``$prefix`` (via
``OPAL_PREFIX``) will automatically change the first part of the
``$bindir`` (which is how method 1 described above works).
Alternatively, ``OPAL_BINDIR`` can be set to an absolute value that
ignores ``$prefix`` altogether.
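For example, a relocated prefix can be combined with a single absolute
override (both paths below are hypothetical):

```shell
# Rewrites every $prefix-relative directory (bindir, libdir, etc.):
export OPAL_PREFIX=/opt/openmpi

# OPAL_BINDIR is absolute, so it overrides $prefix/bin for binaries only:
export OPAL_BINDIR=/opt/ompi-bin
```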
.. _building-open-mpi-installation-location-multiple-copies-label:
Installing Multiple Copies of Open MPI
--------------------------------------
Open MPI can handle a variety of different run-time environments
(e.g., ssh, Slurm, PBS, etc.) and a variety of different
interconnection networks (e.g., Ethernet, InfiniBand, etc.)
in a single installation. Specifically: because Open MPI is
fundamentally powered by a component architecture, plug-ins for all
these different run-time systems and interconnect networks can be
installed in a single installation tree. The relevant plug-ins will
only be used in the environments where they make sense.
Hence, there is no need to have one MPI installation for InfiniBand, one
MPI installation for Ethernet, one MPI installation for PBS, one MPI
installation for ``ssh``, etc. Open MPI can handle all of these in a
single installation.
However, there are some issues that Open MPI cannot solve. Binary
compatibility between different compilers is such an issue and may require
installation of multiple versions of Open MPI.
Let's examine this on a per-language basis (be sure to see the big
caveat at the end):
* *C:* Most C compilers are fairly compatible, such that if you compile
  Open MPI with one C compiler and link it to an application that was
  compiled with a different C compiler, everything should "just work."
As such, a single installation of Open MPI should work for most C MPI
applications.
* *C++:* The same is not necessarily true for C++. While Open MPI does not
currently contain any C++ code (the MPI C++ bindings were removed in a prior
release), and C++ compilers *should* produce ABI-equivalent code for C
  symbols, obscure problems can sometimes arise when mixing compilers from
  different suites. For example, if you compile Open MPI with the XYZ C/C++
  compiler, you may need to have the XYZ C++ run-time libraries
installed everywhere you want to run.
* *Fortran:* There are multiple issues with Fortran.
#. Fortran compilers do something called "symbol mangling," meaning that the
back-end symbols may have slightly different names than their corresponding
global variables, subroutines, and functions. There are 4 common name
mangling schemes in use by Fortran compilers. On many systems (e.g.,
Linux), Open MPI will automatically support all 4 schemes. As such, a
single Open MPI installation *should* just work with multiple different
Fortran compilers. However, on some systems, this is not possible (e.g.,
   macOS), and Open MPI will only support the name mangling scheme of the
Fortran compiler that was identified during ``configure``.
#. That being said, there are two notable exceptions that do *not* work
across Fortran compilers that are "different enough":
#. The C constants ``MPI_F_STATUS_IGNORE`` and ``MPI_F_STATUSES_IGNORE``
will only compare properly to Fortran applications that were
      created with Fortran compilers that use the same
name-mangling scheme as the Fortran compiler with which Open MPI was
configured.
#. Fortran compilers may have different values for the logical
``.TRUE.`` constant. As such, any MPI function that uses the
Fortran ``LOGICAL`` type may only get ``.TRUE.`` values back that
      correspond to the ``.TRUE.`` value of the Fortran compiler with which
Open MPI was configured.
#. Similar to C++, linking object files that use Fortran language features
   such as modules and/or polymorphism, but that were compiled by different
   Fortran compilers, is not likely to work. The ``mpi`` and ``mpi_f08`` modules that
Open MPI creates will likely only work with the Fortran compiler
that was identified during ``configure`` (and used to build Open MPI).
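A quick way to observe name mangling in practice is to compile a tiny
subroutine and inspect its back-end symbol. This is an illustrative
probe, not part of Open MPI; it assumes ``gfortran`` and ``nm`` are
installed, and the ``conftest`` file names are arbitrary:

```shell
# Write a trivial Fortran subroutine whose symbol we can inspect.
cat > conftest.f90 <<'EOF'
subroutine my_sub
end subroutine my_sub
EOF

# Skip gracefully on systems without a Fortran compiler.
command -v gfortran >/dev/null || { echo "gfortran not found; skipping"; exit 0; }

gfortran -c conftest.f90 -o conftest.o
# The common mangling schemes produce: my_sub, my_sub_, my_sub__, or MY_SUB
nm conftest.o | grep -i my_sub
```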
The big caveat to all of this is that Open MPI will only work with
different compilers *if all the datatype sizes are the same.* For
example, even though Open MPI supports all 4 name mangling schemes,
the size of the Fortran ``LOGICAL`` type may be 1 byte in some compilers
and 4 bytes in others. This will likely cause Open MPI to perform
unpredictably.
The bottom line is that Open MPI can support all manner of run-time
systems and interconnects in a single installation, but supporting
multiple compilers "sort of" works (i.e., is subject to trial and
error) in some cases, and definitely does not work in other cases.
There's unfortunately little that we can do about this |mdash| it's a
compiler compatibility issue, and one that compiler authors have
little incentive to resolve.