% -*- latex -*-
%
% Copyright (c) 2001-2004 The Trustees of Indiana University.
% All rights reserved.
% Copyright (c) 1998-2001 University of Notre Dame.
% All rights reserved.
% Copyright (c) 1994-1998 The Ohio State University.
% All rights reserved.
%
% This file is part of the LAM/MPI software package. For license
% information, see the LICENSE file in the top level directory of the
% LAM/MPI source distribution.
%
% $Id: misc.tex,v 1.20 2003/08/12 01:10:28 jsquyres Exp $
%
\chapter{Miscellaneous}
\label{sec:misc}
This chapter covers a variety of topics that don't conveniently fit
into other chapters.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Singleton MPI Processes}
It is possible to run an MPI process without the \cmd{mpirun} or
\cmd{mpiexec} commands -- simply run the program as one would normally
launch a serial program:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
Doing so will create an \mpiconst{MPI\_\-COMM\_\-WORLD} containing a
single process.  This process can run by itself, or it can spawn or
connect to other MPI processes and become part of a larger MPI job
using the MPI-2 dynamic process functions.  A LAM RTE must be running
on the local node, just as with jobs started by \cmd{mpirun}.
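For example, a singleton process can later grow into a larger job
with \mpifunc{MPI\_\-COMM\_\-SPAWN}.  The following is a minimal C
sketch; the worker program name \file{my\_worker} is purely
illustrative:
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Comm children;
  int size;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);  /* size == 1 for a singleton */

  /* Spawn three more processes; they join this job through the
     resulting intercommunicator */
  MPI_Comm_spawn("my_worker", MPI_ARGV_NULL, 3, MPI_INFO_NULL, 0,
                 MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

  /* ... communicate over the "children" intercommunicator ... */

  MPI_Comm_disconnect(&children);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}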
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI-2 I/O Support}
\index{ROMIO}
\index{MPI-2 I/O support|see {ROMIO}}
\index{I/O support|see {ROMIO}}
MPI-2 I/O support is provided through the ROMIO
package~\cite{thak99a,thak99b}.  Since this support comes from a
third-party package, its integration with LAM/MPI is not
``complete'': wherever the MPI-2 standard specifies an argument of
type \mpitype{MPI\_\-Request}, ROMIO's functions instead expect an
argument of type \mpitype{MPIO\_\-Request}.
Note, too, that the \mpitype{MPIO\_\-Request} types cannot be used
with LAM's standard \mpifunc{MPI\_\-TEST} and \mpifunc{MPI\_\-WAIT}
functions -- ROMIO's \mpifunc{MPIO\_\-TEST} and \mpifunc{MPIO\_\-WAIT}
functions must be used instead. There are no array versions of these
functions (e.g., \mpifunc{MPIO\_\-TESTANY}, \mpifunc{MPIO\_\-WAITANY},
etc., do not exist).
C MPI applications wanting to use MPI-2 I/O functionality can simply
include \file{mpi.h}. Fortran MPI applications, however, must include
both \file{mpif.h} and \file{mpiof.h}.
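For example, a non-blocking write in C might look like the following
minimal sketch (the file name is arbitrary); note the use of ROMIO's
request type and completion function:
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  int buf[100] = {0};
  MPI_File fh;
  MPIO_Request request;   /* MPIO_Request, not MPI_Request */
  MPI_Status status;

  MPI_Init(&argc, &argv);

  MPI_File_open(MPI_COMM_WORLD, "datafile",
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
  MPI_File_iwrite(fh, buf, 100, MPI_INT, &request);
  /* ... other useful work ... */
  MPIO_Wait(&request, &status);   /* MPIO_WAIT, not MPI_WAIT */
  MPI_File_close(&fh);

  MPI_Finalize();
  return 0;
}
\end{lstlisting}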
Finally, ROMIO includes its own documentation and listings of known
issues and limitations. See the \file{README} file in the ROMIO
directory in the LAM distribution.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Fortran Process Names}
\index{fortran process names}
\cmdindex{mpitask}{fortran process names}
Since Fortran does not portably provide the executable name of the
process (in the way that C programs receive it through {\tt argv}),
the \icmd{mpitask} command lists the name ``LAM MPI Fortran program''
by default for MPI programs that use the Fortran binding for
\mpifunc{MPI\_\-INIT} or \mpifunc{MPI\_\-INIT\_\-THREAD}.
The environment variable \ienvvar{LAM\_\-MPI\_\-PROCESS\_\-NAME} can
be used to override this behavior.
%
Setting this environment variable before invoking \icmd{mpirun} will
cause \cmd{mpitask} to list that name instead of the default title.
%
This environment variable only works for processes that invoke the
Fortran binding for \mpifunc{MPI\_\-INIT} or
\mpifunc{MPI\_\-INIT\_\-THREAD}.
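For example, with a Bourne-like shell (the name and program shown are
arbitrary):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export LAM_MPI_PROCESS_NAME=ocean_model
shell$ mpirun C ocean_model
\end{lstlisting}
% Stupid emacs mode: $
Assuming the program uses the Fortran MPI bindings, \cmd{mpitask}
will then list ``ocean\_model'' instead of the default title.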
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI Thread Support}
\label{sec:misc-threads}
\index{threads and MPI}
\index{MPI and threads|see {threads and MPI}}
\def\mtsingle{\mpiconst{MPI\_\-THREAD\_\-SINGLE}}
\def\mtfunneled{\mpiconst{MPI\_\-THREAD\_\-FUNNELED}}
\def\mtserial{\mpiconst{MPI\_\-THREAD\_\-SERIALIZED}}
\def\mtmultiple{\mpiconst{MPI\_\-THREAD\_\-MULTIPLE}}
\def\mpiinit{\mpifunc{MPI\_\-INIT}}
\def\mpiinitthread{\mpifunc{MPI\_\-INIT\_\-THREAD}}
LAM currently implements support for \mtsingle, \mtfunneled, and
\mtserial. The constant \mtmultiple\ is provided, although LAM will
never return \mtmultiple\ in the \funcarg{provided} argument to
\mpiinitthread.
LAM makes no distinction between \mtsingle\ and \mtfunneled.  When
\mtserial\ is used, a global lock ensures that only one thread is
inside any MPI function at any time.
\subsection{Thread Level}
Selecting the thread level for an MPI job is best described in terms
of the two parameters passed to \mpiinitthread: \funcarg{requested}
and \funcarg{provided}. \funcarg{requested} is the thread level that
the user application requests, while \funcarg{provided} is the thread
level that LAM will run the application with.
\begin{itemize}
\item If \mpiinit\ is used to initialize the job, \funcarg{requested}
will implicitly be \mtsingle. However, if the
\ienvvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL} environment variable is set
to one of the values in Table~\ref{tbl:mpi-env-thread-level}, the
corresponding thread level will be used for \funcarg{requested}.
\item If \mpiinitthread\ is used to initialize the job, the
\funcarg{requested} thread level is the first thread level that the
job will attempt to use. There is currently no way to specify lower
or upper bounds to the thread level that LAM will use.
The resulting thread level is largely determined by the SSI modules
that will be used in an MPI job; each module must be able to support
the target thread level. A complex algorithm is used to attempt to
find a thread level that is acceptable to all SSI modules.
Generally, the algorithm starts at \funcarg{requested} and works
backwards towards \mpiconst{MPI\_\-THREAD\_\-SINGLE} looking for an
acceptable level. However, any module may {\em increase} the thread
level under test if it requires it. At the end of this process, if
an acceptable thread level is not found, the MPI job will abort.
\end{itemize}
\begin{table}[htbp]
\centering
\begin{tabular}{|c|l|}
\hline
Value & \multicolumn{1}{|c|}{Meaning} \\
\hline
\hline
undefined & \mtsingle \\
0 & \mtsingle \\
1 & \mtfunneled \\
2 & \mtserial \\
3 & \mtmultiple \\
\hline
\end{tabular}
\caption{Valid values for the \envvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL}
environment variable.}
\label{tbl:mpi-env-thread-level}
\end{table}
Also note that certain SSI modules require higher thread support
levels than others. For example, any checkpoint/restart SSI module
will require a minimum of \mtserial, and will attempt to adjust the
thread level upwards as necessary (if that CR module will be used
during the job).
Hence, using \mpiinit\ to initialize an MPI job does not imply that
the provided thread level will be \mtsingle.
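As a concrete example, an application that wants \mtserial\ can
request it explicitly with \mpiinitthread\ and then check what was
actually provided (a minimal C sketch):
\begin{lstlisting}[language=C]
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int provided;

  /* Request MPI_THREAD_SERIALIZED; LAM reports the level actually
     granted in "provided" (never MPI_THREAD_MULTIPLE) */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);

  if (provided < MPI_THREAD_SERIALIZED) {
    printf("Warning: got thread level %d\n", provided);
  }

  /* ... rest of the application ... */

  MPI_Finalize();
  return 0;
}
\end{lstlisting}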
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI-2 Name Publishing}
\index{published names}
\index{dynamic name publishing|see {published names}}
\index{name publishing|see {published names}}
LAM supports the MPI-2 functions \mpifunc{MPI\_\-PUBLISH\_\-NAME} and
\mpifunc{MPI\_\-UNPUBLISH\_\-NAME} for publishing and unpublishing
names, respectively. Published names are stored within the LAM
daemons, and are therefore persistent, even when the MPI process that
published them dies.
As such, it is important for correct MPI programs to unpublish their
names before they terminate. However, if stale names are left in the
LAM universe when an MPI process terminates, the \icmd{lamclean}
command can be used to clean {\em all} names from the LAM RTE.
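For example, a server process might publish a port name under a
well-known service name and unpublish it before exiting (a minimal C
sketch; the service name is arbitrary):
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  char port[MPI_MAX_PORT_NAME];

  MPI_Init(&argc, &argv);

  /* Open a port and publish it under a well-known service name */
  MPI_Open_port(MPI_INFO_NULL, port);
  MPI_Publish_name("my-service", MPI_INFO_NULL, port);

  /* ... accept connections from clients, do work ... */

  /* Unpublish before exiting so that no stale name is left in the
     LAM daemons */
  MPI_Unpublish_name("my-service", MPI_INFO_NULL, port);
  MPI_Close_port(port);

  MPI_Finalize();
  return 0;
}
\end{lstlisting}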
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Interoperable MPI (IMPI) Support}
\index{IMPI}
\index{Interoperable MPI|see {IMPI}}
The IMPI extensions are still considered experimental, and are
disabled by default in LAM.  They must be enabled when LAM is
configured and built (see the Installation Guide for details).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Purpose of IMPI}
The Interoperable Message Passing Interface (IMPI) is a standardized
protocol that enables different MPI implementations to communicate
with each other.  This allows a single job to span different hardware
while still using the vendor-tuned MPI implementation on each
machine.  Such a setup can be helpful when a job is too large to fit
on one system, or when different portions of the code are better
suited to different MPI implementations.
IMPI defines only the protocols necessary between MPI implementations;
vendors may still use their own high-performance protocols within
their own implementations.
Terms that are used throughout the LAM/IMPI documentation include:
IMPI clients, IMPI hosts, IMPI processes, and the IMPI server.  See
the IMPI section of the LAM FAQ on the LAM web site for definitions
of these terms.\footnote{\url{http://www.lam-mpi.org/faq/}}
For more information about IMPI and the IMPI Standard, see the main
IMPI web site.\footnote{\url{http://impi.nist.gov/}}
Note that the IMPI standard only applies to MPI-1 functionality.
Using non-local MPI-2 functions on communicators with ranks that live
on another MPI implementation will result in undefined behavior (read:
kaboom). For example, \mpifunc{MPI\_\-COMM\_\-SPAWN} will certainly
fail, but \mpifunc{MPI\_\-COMM\_\-SET\_\-NAME} works fine (because it
is a local action).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Current IMPI functionality}
\index{IMPI!supported functionality}
LAM currently implements a subset of the IMPI functionality:
\begin{itemize}
\item Startup and shutdown
\item All MPI-1 point-to-point functionality
\item Some of the data-passing collectives:
\mpifunc{MPI\_\-ALLREDUCE}, \mpifunc{MPI\_\-BARRIER},
\mpifunc{MPI\_\-BCAST}, \mpifunc{MPI\_\-REDUCE}
\end{itemize}
LAM does not implement the following on communicators with ranks that
reside on another MPI implementation:
\begin{itemize}
\item \mpifunc{MPI\_\-PROBE} and \mpifunc{MPI\_\-IPROBE}
\item \mpifunc{MPI\_\-CANCEL}
\item All data-passing collectives that are not listed above
\item All communicator constructor/destructor collectives (e.g.,
\mpifunc{MPI\_\-COMM\_\-SPLIT}, etc.)
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Running an IMPI Job}
\index{IMPI!running jobs}
Running an IMPI job requires the use of an IMPI
server.\index{IMPI!server} A freely available, open-source server
implementation exists.\footnote{\url{http://www.osl.iu.edu/research/impi/}}
As described in the IMPI standard, the first step is to launch the
IMPI server with the number of expected clients.  The open-source
server mentioned above requires at least one authentication mechanism
to be specified (``none'' or ``key'').  For simplicity, these
instructions assume that the ``none'' mechanism will be used.  Only
one IMPI server needs to be launched per IMPI job, regardless of how
many clients will connect.
%
For this example, assume that there will be 2 IMPI clients; client 0
will be run in LAM/MPI, and client 1 will be run elsewhere.
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export IMPI_AUTH_NONE=
shell$ impi_server -server 2 -auth 0
10.0.0.32:9283
\end{lstlisting}
% Stupid emacs mode: $
The IMPI server must be left running for the duration of the IMPI job.
%
The string that the IMPI server gives as output (``10.0.0.32:9283'',
in this case) must be given to \cmd{mpirun} when starting the LAM
process that will run in IMPI:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -client 0 10.0.0.32:9283 C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
This will run the MPI program in the local LAM universe and connect it
to the IMPI server. From there, the IMPI protocols will take over and
join this program to all other IMPI clients.
Note that LAM will launch an auxiliary ``helper'' MPI program named
\cmd{impid} that will last for the duration of the IMPI job. It acts
as a proxy to the other IMPI processes, and should not be manually
killed.  It will die of its own accord when the IMPI job is complete.
If something goes wrong, it can be killed with the \cmd{lamclean}
command, just like any other MPI process.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Complex Network Setups}
In some complex network configurations -- particularly those that span
multiple private networking domains -- it may be necessary to override
the hostname that IMPI uses for connectivity (i.e., to use something
other than what is returned by the \cmd{hostname} command).  In this
case, the \ienvvar{IMPI\_\-HOST\_\-NAME} environment variable can be
used.  If set, this variable is expected to contain a resolvable name
(or IP address) that should be used.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Batch Queuing System Support}
\label{sec:misc-batch}
\index{batch queue systems}
\index{Portable Batch System|see {batch queue systems}}
\index{PBS|see {batch queue systems}}
\index{PBS Pro|see {batch queue systems}}
\index{OpenPBS|see {batch queue systems}}
\index{Load Sharing Facility|see {batch queue systems}}
\index{LSF|see {batch queue systems}}
\index{Clubmask|see {batch queue systems}}
LAM is now aware of some batch queuing systems.  Support is currently
included for PBS, LSF, and Clubmask-based systems.  There is also
generic functionality that allows users of other batch queue systems
to take advantage of these features.
\begin{itemize}
\item When running under a supported batch queue system, LAM will take
precautions to isolate itself from other instances of LAM in
  concurrent batch jobs.  That is, multiple LAM instances from the
  same user can coexist on the same machine when executing in batch.
This allows a user to submit as many LAM jobs as necessary, and even
if they end up running on the same nodes, a \cmd{lamclean} in one
job will not kill MPI applications in another job.
\item This behavior is {\em only} exhibited under a supported batch
  environment; manually setting the environment variable
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} on the node where
  \icmd{lamboot} is run achieves the same end.  Other batch systems
  can easily be supported -- let the LAM Team know if you'd like to
  see support for others included.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Location of LAM's Session Directory}
\label{sec:misc-session-directory}
\index{session directory}
By default, LAM will create a temporary per-user session directory
named as follows:
\centerline{\file{<tmpdir>/lam-<username>@<hostname>[-<session\_suffix>]}}
\noindent Each of the components is described below:
\begin{description}
\item[\file{<tmpdir>}]: LAM will set the prefix used for the session
directory based on the following search order:
\begin{enumerate}
\item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX}
environment variable
\item The value of the \ienvvar{TMPDIR} environment variable
\item \file{/tmp/}
\end{enumerate}
  It is important to note that (unlike
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}) the environment
  variables for determining \file{<tmpdir>} must be set on each node,
  although they need not have the same value on every node.
\file{<tmpdir>} must exist before \icmd{lamboot} is run, or
\icmd{lamboot} will fail.
\item[\file{<username>}]: The user's name on that host.
\item[\file{<hostname>}]: The hostname.
\item[\file{<session\_suffix>}]: LAM will set the suffix (if any) used
for the session directory based on the following search order:
\begin{enumerate}
\item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}
environment variable.
\item If running under a supported batch system, a unique session
ID (based on information from the batch system) will be used.
\end{enumerate}
\end{description}
\ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} and the batch information
only need to be available on the node from which \icmd{lamboot} is
run. \icmd{lamboot} will propagate the information to the other
nodes.
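For example, with a Bourne-like shell (the suffix and boot schema
file name shown are arbitrary):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export LAM_MPI_SESSION_SUFFIX=my_project
shell$ lamboot my_hostfile
\end{lstlisting}
% Stupid emacs mode: $
If \envvar{TMPDIR} and \envvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX} are
not set, this results in a session directory such as
\file{/tmp/lam-jdoe@node1.example.com-my\_project} on each node
(where \file{jdoe} and \file{node1.example.com} stand in for the
actual user name and hostname).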
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Signal Catching}
\index{signals}
LAM/MPI now catches the signals SIGSEGV, SIGBUS, SIGFPE, and SIGILL.
The signal handler terminates the application.  This is useful in
batch jobs to help ensure that \icmd{mpirun} returns if an
application process dies.  To disable the catching of signals, use
the \cmdarg{-nsigs} option to \icmd{mpirun}.
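For example, to launch a job with LAM's signal catching disabled:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -nsigs C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $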
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI Attributes}
\begin{discuss}
Need to have discussion of built-in attributes here, such as
MPI\_\-UNIVERSE\_\-SIZE, etc. Should specifically mention that
MPI\_\-UNIVERSE\_\-SIZE is fixed at \mpifunc{MPI\_\-INIT} time (at
least it is as of this writing -- who knows what it will be when we
release 7.1? :-).
This whole section is for 7.1.
\end{discuss}