% -*- latex -*-
%
% Copyright (c) 2001-2004 The Trustees of Indiana University.
% All rights reserved.
% Copyright (c) 1998-2001 University of Notre Dame.
% All rights reserved.
% Copyright (c) 1994-1998 The Ohio State University.
% All rights reserved.
%
% This file is part of the LAM/MPI software package. For license
% information, see the LICENSE file in the top level directory of the
% LAM/MPI source distribution.
%
% $Id: misc.tex,v 1.20 2003/08/12 01:10:28 jsquyres Exp $
%
\chapter{Miscellaneous}
\label{sec:misc}
This chapter covers a variety of topics that don't conveniently fit
into other chapters.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Singleton MPI Processes}
It is possible to run an MPI process without the \cmd{mpirun} or
\cmd{mpiexec} commands -- simply run the program as one would normally
launch a serial program:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
Doing so will create an \mpiconst{MPI\_\-COMM\_\-WORLD} containing a
single process.  This process can run by itself, or it can spawn or
connect to other MPI processes and become part of a larger MPI job
using the MPI-2 dynamic process functions.  A LAM RTE must be running
on the local node, just as with jobs started by \cmd{mpirun}.
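For example, a singleton process can later grow into a larger job
with \mpifunc{MPI\_\-COMM\_\-SPAWN}.  The following is a minimal C
sketch; the worker program name \file{my\_worker} is purely
illustrative:
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Comm children;
  int size;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);  /* size == 1 for a singleton */

  /* Spawn three more processes; they join this job through the
     resulting intercommunicator */
  MPI_Comm_spawn("my_worker", MPI_ARGV_NULL, 3, MPI_INFO_NULL, 0,
                 MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

  /* ... communicate over the "children" intercommunicator ... */

  MPI_Comm_disconnect(&children);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}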
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI-2 I/O Support}
\index{ROMIO}
\index{MPI-2 I/O support|see {ROMIO}}
\index{I/O support|see {ROMIO}}
MPI-2 I/O support is provided through the ROMIO
package~\cite{thak99a,thak99b}.  Since this support comes from a
third-party package, its integration with LAM/MPI is not
``complete'': wherever the MPI-2 standard specifies an argument of
type \mpitype{MPI\_\-Request}, ROMIO's functions instead expect an
argument of type \mpitype{MPIO\_\-Request}.
Note, too, that the \mpitype{MPIO\_\-Request} types cannot be used
with LAM's standard \mpifunc{MPI\_\-TEST} and \mpifunc{MPI\_\-WAIT}
functions -- ROMIO's \mpifunc{MPIO\_\-TEST} and \mpifunc{MPIO\_\-WAIT}
functions must be used instead. There are no array versions of these
functions (e.g., \mpifunc{MPIO\_\-TESTANY}, \mpifunc{MPIO\_\-WAITANY},
etc., do not exist).
C MPI applications wanting to use MPI-2 I/O functionality can simply
include \file{mpi.h}. Fortran MPI applications, however, must include
both \file{mpif.h} and \file{mpiof.h}.
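For example, a non-blocking write in C might look like the following
minimal sketch (the file name is arbitrary); note the use of ROMIO's
request type and completion function:
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  int buf[100] = {0};
  MPI_File fh;
  MPIO_Request request;   /* MPIO_Request, not MPI_Request */
  MPI_Status status;

  MPI_Init(&argc, &argv);

  MPI_File_open(MPI_COMM_WORLD, "datafile",
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
  MPI_File_iwrite(fh, buf, 100, MPI_INT, &request);
  /* ... other useful work ... */
  MPIO_Wait(&request, &status);   /* MPIO_WAIT, not MPI_WAIT */
  MPI_File_close(&fh);

  MPI_Finalize();
  return 0;
}
\end{lstlisting}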
Finally, ROMIO includes its own documentation and listings of known
issues and limitations. See the \file{README} file in the ROMIO
directory in the LAM distribution.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Fortran Process Names}
\index{fortran process names}
\cmdindex{mpitask}{fortran process names}
Since Fortran does not portably provide the executable name of the
process (in the way that C programs receive it through {\tt argv}),
the \icmd{mpitask} command lists the name ``LAM MPI Fortran program''
by default for MPI programs that use the Fortran binding for
\mpifunc{MPI\_\-INIT} or \mpifunc{MPI\_\-INIT\_\-THREAD}.
The environment variable \ienvvar{LAM\_\-MPI\_\-PROCESS\_\-NAME} can
be used to override this behavior.
%
Setting this environment variable before invoking \icmd{mpirun} will
cause \cmd{mpitask} to list that name instead of the default title.
%
This environment variable only works for processes that invoke the
Fortran binding for \mpifunc{MPI\_\-INIT} or
\mpifunc{MPI\_\-INIT\_\-THREAD}.
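For example, with a Bourne-like shell (the name and program shown are
arbitrary):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export LAM_MPI_PROCESS_NAME=ocean_model
shell$ mpirun C ocean_model
\end{lstlisting}
% Stupid emacs mode: $
Assuming the program uses the Fortran MPI bindings, \cmd{mpitask}
will then list ``ocean\_model'' instead of the default title.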
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI Thread Support}
\label{sec:misc-threads}
\index{threads and MPI}
\index{MPI and threads|see {threads and MPI}}
\def\mtsingle{\mpiconst{MPI\_\-THREAD\_\-SINGLE}}
\def\mtfunneled{\mpiconst{MPI\_\-THREAD\_\-FUNNELED}}
\def\mtserial{\mpiconst{MPI\_\-THREAD\_\-SERIALIZED}}
\def\mtmultiple{\mpiconst{MPI\_\-THREAD\_\-MULTIPLE}}
\def\mpiinit{\mpifunc{MPI\_\-INIT}}
\def\mpiinitthread{\mpifunc{MPI\_\-INIT\_\-THREAD}}
LAM currently implements support for \mtsingle, \mtfunneled, and
\mtserial. The constant \mtmultiple\ is provided, although LAM will
never return \mtmultiple\ in the \funcarg{provided} argument to
\mpiinitthread.
LAM makes no distinction between \mtsingle\ and \mtfunneled.  When
\mtserial\ is used, a global lock ensures that only one thread is
inside any MPI function at any time.
\subsection{Thread Level}
Selecting the thread level for an MPI job is best described in terms
of the two parameters passed to \mpiinitthread: \funcarg{requested}
and \funcarg{provided}. \funcarg{requested} is the thread level that
the user application requests, while \funcarg{provided} is the thread
level that LAM will run the application with.
\begin{itemize}
\item If \mpiinit\ is used to initialize the job, \funcarg{requested}
will implicitly be \mtsingle. However, if the
\ienvvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL} environment variable is set
to one of the values in Table~\ref{tbl:mpi-env-thread-level}, the
corresponding thread level will be used for \funcarg{requested}.
\item If \mpiinitthread\ is used to initialize the job, the
\funcarg{requested} thread level is the first thread level that the
job will attempt to use. There is currently no way to specify lower
or upper bounds to the thread level that LAM will use.
The resulting thread level is largely determined by the SSI modules
that will be used in an MPI job; each module must be able to support
the target thread level. A complex algorithm is used to attempt to
find a thread level that is acceptable to all SSI modules.
Generally, the algorithm starts at \funcarg{requested} and works
backwards towards \mpiconst{MPI\_\-THREAD\_\-SINGLE} looking for an
acceptable level. However, any module may {\em increase} the thread
level under test if it requires it. At the end of this process, if
an acceptable thread level is not found, the MPI job will abort.
\end{itemize}
\begin{table}[htbp]
\centering
\begin{tabular}{|c|l|}
\hline
Value & \multicolumn{1}{|c|}{Meaning} \\
\hline
\hline
undefined & \mtsingle \\
0 & \mtsingle \\
1 & \mtfunneled \\
2 & \mtserial \\
3 & \mtmultiple \\
\hline
\end{tabular}
\caption{Valid values for the \envvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL}
environment variable.}
\label{tbl:mpi-env-thread-level}
\end{table}
Also note that certain SSI modules require higher thread support
levels than others. For example, any checkpoint/restart SSI module
will require a minimum of \mtserial, and will attempt to adjust the
thread level upwards as necessary (if that CR module will be used
during the job).
Hence, using \mpiinit\ to initialize an MPI job does not imply that
the provided thread level will be \mtsingle.
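As a concrete example, an application that wants \mtserial\ can
request it explicitly with \mpiinitthread\ and then check what was
actually provided (a minimal C sketch):
\begin{lstlisting}[language=C]
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int provided;

  /* Request MPI_THREAD_SERIALIZED; LAM reports the level actually
     granted in "provided" (never MPI_THREAD_MULTIPLE) */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);

  if (provided < MPI_THREAD_SERIALIZED) {
    printf("Warning: got thread level %d\n", provided);
  }

  /* ... rest of the application ... */

  MPI_Finalize();
  return 0;
}
\end{lstlisting}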
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI-2 Name Publishing}
\index{published names}
\index{dynamic name publishing|see {published names}}
\index{name publishing|see {published names}}
LAM supports the MPI-2 functions \mpifunc{MPI\_\-PUBLISH\_\-NAME} and
\mpifunc{MPI\_\-UNPUBLISH\_\-NAME} for publishing and unpublishing
names, respectively. Published names are stored within the LAM
daemons, and are therefore persistent, even when the MPI process that
published them dies.
As such, it is important for correct MPI programs to unpublish their
names before they terminate. However, if stale names are left in the
LAM universe when an MPI process terminates, the \icmd{lamclean}
command can be used to clean {\em all} names from the LAM RTE.
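For example, a server process might publish a port name under a
well-known service name and unpublish it before exiting (a minimal C
sketch; the service name is arbitrary):
\begin{lstlisting}[language=C]
#include <mpi.h>

int main(int argc, char *argv[])
{
  char port[MPI_MAX_PORT_NAME];

  MPI_Init(&argc, &argv);

  /* Open a port and publish it under a well-known service name */
  MPI_Open_port(MPI_INFO_NULL, port);
  MPI_Publish_name("my-service", MPI_INFO_NULL, port);

  /* ... accept connections from clients, do work ... */

  /* Unpublish before exiting so that no stale name is left in the
     LAM daemons */
  MPI_Unpublish_name("my-service", MPI_INFO_NULL, port);
  MPI_Close_port(port);

  MPI_Finalize();
  return 0;
}
\end{lstlisting}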
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Interoperable MPI (IMPI) Support}
\index{IMPI}
\index{Interoperable MPI|see {IMPI}}
The IMPI extensions are still considered experimental, and are
disabled by default in LAM.  They must be enabled when LAM is
configured and built (see the Installation Guide for details).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Purpose of IMPI}
The Interoperable Message Passing Interface (IMPI) is a standardized
protocol that enables different MPI implementations to communicate
with each other.  This allows a single job to span different hardware
while still using the vendor-tuned MPI implementation on each
machine.  Such a setup can be helpful when a job is too large to fit
on one system, or when different portions of the code are better
suited to different MPI implementations.
IMPI defines only the protocols necessary between MPI implementations;
vendors may still use their own high-performance protocols within
their own implementations.
Terms that are used throughout the LAM/IMPI documentation include:
IMPI clients, IMPI hosts, IMPI processes, and the IMPI server.  See
the IMPI section of the LAM FAQ on the LAM web site for definitions
of these terms.\footnote{\url{http://www.lam-mpi.org/faq/}}
For more information about IMPI and the IMPI Standard, see the main
IMPI web site.\footnote{\url{http://impi.nist.gov/}}
Note that the IMPI standard only applies to MPI-1 functionality.
Using non-local MPI-2 functions on communicators with ranks that live
on another MPI implementation will result in undefined behavior (read:
kaboom). For example, \mpifunc{MPI\_\-COMM\_\-SPAWN} will certainly
fail, but \mpifunc{MPI\_\-COMM\_\-SET\_\-NAME} works fine (because it
is a local action).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Current IMPI functionality}
\index{IMPI!supported functionality}
LAM currently implements a subset of the IMPI functionality:
\begin{itemize}
\item Startup and shutdown
\item All MPI-1 point-to-point functionality
\item Some of the data-passing collectives:
\mpifunc{MPI\_\-ALLREDUCE}, \mpifunc{MPI\_\-BARRIER},
\mpifunc{MPI\_\-BCAST}, \mpifunc{MPI\_\-REDUCE}
\end{itemize}
LAM does not implement the following on communicators with ranks that
reside on another MPI implementation:
\begin{itemize}
\item \mpifunc{MPI\_\-PROBE} and \mpifunc{MPI\_\-IPROBE}
\item \mpifunc{MPI\_\-CANCEL}
\item All data-passing collectives that are not listed above
\item All communicator constructor/destructor collectives (e.g.,
\mpifunc{MPI\_\-COMM\_\-SPLIT}, etc.)
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Running an IMPI Job}
\index{IMPI!running jobs}
Running an IMPI job requires the use of an IMPI
server.\index{IMPI!server} A freely available, open-source server
implementation exists.\footnote{\url{http://www.osl.iu.edu/research/impi/}}
As described in the IMPI standard, the first step is to launch the
IMPI server with the number of expected clients.  The open-source
server mentioned above requires at least one authentication mechanism
to be specified (``none'' or ``key'').  For simplicity, these
instructions assume that the ``none'' mechanism will be used.  Only
one IMPI server needs to be launched per IMPI job, regardless of how
many clients will connect.
%
For this example, assume that there will be 2 IMPI clients; client 0
will be run in LAM/MPI, and client 1 will be run elsewhere.
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export IMPI_AUTH_NONE=
shell$ impi_server -server 2 -auth 0
10.0.0.32:9283
\end{lstlisting}
% Stupid emacs mode: $
The IMPI server must be left running for the duration of the IMPI job.
%
The string that the IMPI server gives as output (``10.0.0.32:9283'',
in this case) must be given to \cmd{mpirun} when starting the LAM
process that will run in IMPI:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -client 0 10.0.0.32:9283 C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
This will run the MPI program in the local LAM universe and connect it
to the IMPI server. From there, the IMPI protocols will take over and
join this program to all other IMPI clients.
Note that LAM will launch an auxiliary ``helper'' MPI program named
\cmd{impid} that will last for the duration of the IMPI job. It acts
as a proxy to the other IMPI processes, and should not be manually
killed.  It will die of its own accord when the IMPI job is complete.
If something goes wrong, it can be killed with the \cmd{lamclean}
command, just like any other MPI process.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Complex Network Setups}
In some complex network configurations -- particularly those that span
multiple private networking domains -- it may be necessary to override
the hostname that IMPI uses for connectivity (i.e., to use something
other than what is returned by the \cmd{hostname} command).  In this
case, the \ienvvar{IMPI\_\-HOST\_\-NAME} environment variable can be
used.  If set, this variable is expected to contain a resolvable name
(or IP address) that should be used.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Batch Queuing System Support}
\label{sec:misc-batch}
\index{batch queue systems}
\index{Portable Batch System|see {batch queue systems}}
\index{PBS|see {batch queue systems}}
\index{PBS Pro|see {batch queue systems}}
\index{OpenPBS|see {batch queue systems}}
\index{Load Sharing Facility|see {batch queue systems}}
\index{LSF|see {batch queue systems}}
\index{Clubmask|see {batch queue systems}}
LAM is now aware of some batch queuing systems.  Support is currently
included for PBS, LSF, and Clubmask-based systems.  There is also
generic functionality that allows users of other batch queue systems
to take advantage of these features.
\begin{itemize}
\item When running under a supported batch queue system, LAM will take
precautions to isolate itself from other instances of LAM in
  concurrent batch jobs.  That is, multiple LAM instances from the
  same user can coexist on the same machine when executing in batch.
This allows a user to submit as many LAM jobs as necessary, and even
if they end up running on the same nodes, a \cmd{lamclean} in one
job will not kill MPI applications in another job.
\item This behavior is {\em only} exhibited under a supported batch
  environment; manually setting the environment variable
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} on the node where
  \icmd{lamboot} is run achieves the same end.  Other batch systems
  can easily be supported -- let the LAM Team know if you'd like to
  see support for others included.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Location of LAM's Session Directory}
\label{sec:misc-session-directory}
\index{session directory}
By default, LAM will create a temporary per-user session directory
named as follows:
\centerline{\file{<tmpdir>/lam-<username>@<hostname>[-<session\_suffix>]}}
\noindent Each of the components is described below:
\begin{description}
\item[\file{<tmpdir>}]: LAM will set the prefix used for the session
directory based on the following search order:
\begin{enumerate}
\item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX}
environment variable
\item The value of the \ienvvar{TMPDIR} environment variable
\item \file{/tmp/}
\end{enumerate}
  It is important to note that (unlike
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}) the environment
  variables for determining \file{<tmpdir>} must be set on each node,
  although they need not have the same value on every node.
\file{<tmpdir>} must exist before \icmd{lamboot} is run, or
\icmd{lamboot} will fail.
\item[\file{<username>}]: The user's name on that host.
\item[\file{<hostname>}]: The hostname.
\item[\file{<session\_suffix>}]: LAM will set the suffix (if any) used
for the session directory based on the following search order:
\begin{enumerate}
\item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}
environment variable.
\item If running under a supported batch system, a unique session
ID (based on information from the batch system) will be used.
\end{enumerate}
\end{description}
\ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} and the batch information
only need to be available on the node from which \icmd{lamboot} is
run. \icmd{lamboot} will propagate the information to the other
nodes.
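For example, with a Bourne-like shell (the suffix and boot schema
file name shown are arbitrary):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ export LAM_MPI_SESSION_SUFFIX=my_project
shell$ lamboot my_hostfile
\end{lstlisting}
% Stupid emacs mode: $
If \envvar{TMPDIR} and \envvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX} are
not set, this results in a session directory such as
\file{/tmp/lam-jdoe@node1.example.com-my\_project} on each node
(where \file{jdoe} and \file{node1.example.com} stand in for the
actual user name and hostname).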
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Signal Catching}
\index{signals}
LAM/MPI now catches the signals SIGSEGV, SIGBUS, SIGFPE, and SIGILL.
The signal handler terminates the application.  This is useful in
batch jobs to help ensure that \icmd{mpirun} returns if an
application process dies.  To disable the catching of signals, use
the \cmdarg{-nsigs} option to \icmd{mpirun}.
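For example, to launch a job with LAM's signal catching disabled:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -nsigs C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $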
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI Attributes}
\begin{discuss}
Need to have discussion of built-in attributes here, such as
MPI\_\-UNIVERSE\_\-SIZE, etc. Should specifically mention that
MPI\_\-UNIVERSE\_\-SIZE is fixed at \mpifunc{MPI\_\-INIT} time (at
least it is as of this writing -- who knows what it will be when we
release 7.1? :-).
This whole section is for 7.1.
\end{discuss}