% -*- latex -*-
%
% Copyright (c) 2001-2003 The Trustees of Indiana University.
% All rights reserved.
% Copyright (c) 1998-2001 University of Notre Dame.
% All rights reserved.
% Copyright (c) 1994-1998 The Ohio State University.
% All rights reserved.
%
% This file is part of the LAM/MPI software package. For license
% information, see the LICENSE file in the top level directory of the
% LAM/MPI source distribution.
%
% $Id: getting-started.tex,v 1.19 2003/10/11 14:02:48 jsquyres Exp $
%
\chapter{Getting Started with LAM/MPI}
\label{sec:getting-started}
This chapter provides a summary tutorial describing some of the high
points of using LAM/MPI. It is not intended as a comprehensive guide;
the finer details of some situations will not be explained. However,
it is a good step-by-step guide for users who are new to MPI and/or
LAM/MPI.
Using LAM/MPI is conceptually simple:
\begin{itemize}
\item Launch the LAM run-time environment (RTE)
\item Repeat as necessary:
\begin{itemize}
\item Compile MPI program(s)
\item Run MPI program(s)
\end{itemize}
\item Shut down the LAM run-time environment
\end{itemize}
The tutorial below will describe each of these steps.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{One-Time Setup}
This section describes actions that usually only need to be performed
once per user in order to set up LAM to function properly.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Setting the Path}
\label{sec:getting-started-path}
One of the main requirements for LAM/MPI to function properly is for
the LAM executables to be in your path. This step may vary from site
to site; for example, the LAM executables may already be in your path
-- consult your local administrator to see if this is the case.
{\bf NOTE:} If the LAM executables are already in your path, you can
skip this step and proceed to
Section~\ref{sec:getting-started-ssi}.
In many cases, if your system does not already provide the LAM
executables in your path, you can add them by editing your ``dot''
files that are executed automatically by the shell upon login (both
interactive and non-interactive logins). Each shell has a different
file to edit and corresponding syntax, so you'll need to know which
shell you are using.
Tables~\ref{tbl:getting-started-shells-interactive}
and~\ref{tbl:getting-started-shells-noninteractive} list several
common shells and the associated files that are typically used.
Consult the documentation for your shell for more information.
\begin{table}[htbp]
\centering
\begin{tabular}{|p{1in}|p{4in}|}
\hline
\multicolumn{1}{|c|}{Shell name} &
\multicolumn{1}{|c|}{Interactive login startup file} \\
%
\hline
\cmd{sh} (or Bash named ``\cmd{sh}'') & \ifile{.profile} \\
%
\hline
\cmd{csh} & \ifile{.cshrc} followed by \ifile{.login} \\
%
\hline
\cmd{tcsh} & \ifile{.tcshrc} if it exists, \ifile{.cshrc} if it
does not, followed by \ifile{.login} \\
%
\hline
\cmd{bash} & \ifile{.bash\_\-profile} if it exists, or
\ifile{.bash\_\-login} if it exists, or \ifile{.profile} if it
exists (in that order). Note that some Linux distributions
automatically come with \ifile{.bash\_\-profile} scripts for users
that automatically execute \ifile{.bashrc} as well. Consult the
\cmd{bash} manual page for more information. \\
\hline
\end{tabular}
\caption[List of common shells and the corresponding environment
setup files for interactive shells.]{List of common shells and
the corresponding environmental setup files commonly used with
each for interactive startups (e.g., normal login). All files
listed are assumed to be in the \file{\$HOME} directory.}
\label{tbl:getting-started-shells-interactive}
\end{table}
\begin{table}[htbp]
\centering
\begin{tabular}{|p{1in}|p{4in}|}
\hline
\multicolumn{1}{|c|}{Shell name} &
\multicolumn{1}{|c|}{Non-interactive login startup file} \\
%
\hline
\cmd{sh} (or Bash named ``\cmd{sh}'') & This shell does not
execute any file automatically, so LAM will execute the
\file{.profile} script before invoking LAM executables on remote
nodes \\
%
\hline
\cmd{csh} & \ifile{.cshrc} \\
%
\hline
\cmd{tcsh} & \ifile{.tcshrc} if it exists, \ifile{.cshrc} if it
does not \\
%
\hline
\cmd{bash} & \ifile{.bashrc} if it exists \\
\hline
\end{tabular}
\caption[List of common shells and the corresponding environment
setup files for non-interactive shells.]{List of common shells and
the corresponding environmental setup files commonly used with
each for non-interactive startups (e.g., when LAM invokes commands on
remote nodes).  All files
listed are assumed to be in the \file{\$HOME} directory.}
\label{tbl:getting-started-shells-noninteractive}
\end{table}
You'll also need to know the directory where LAM was installed. For
the purposes of this tutorial, we'll assume that LAM is installed in
\file{/usr/local/lam}. And to re-emphasize a critical point: these
are only guidelines -- the specifics may vary depending on your local
setup. Consult your local system or network administrator for more
details.
Once you have determined all three pieces of information (which shell
you are using, which directory LAM was installed to, and which
``dot'' file to edit), open the ``dot'' file in a text
editor and follow the general directions listed below:
\begin{itemize}
\index{shell setup!Bash/Bourne shells}
\item For the Bash, Bourne, and Bourne-related shells, add the
following lines:
\lstset{style=lam-bourne}
\begin{lstlisting}
PATH=/usr/local/lam/bin:$PATH
export PATH
\end{lstlisting}
% Stupid emacs mode: $
\index{shell setup!C shell (and related)}
\item For the C shell and related shells (such as \cmd{tcsh}), add the
following line:
\lstset{style=lam-shell}
\begin{lstlisting}
set path = (/usr/local/lam/bin $path)
\end{lstlisting}
% Stupid emacs mode: $
\end{itemize}
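After editing the appropriate ``dot'' file, start a new login shell
(or re-read the ``dot'' file in your current shell) and verify that
the LAM executables can be found.  For example, a minimal check,
assuming that LAM is installed in \file{/usr/local/lam} as above:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ which lamboot
/usr/local/lam/bin/lamboot
\end{lstlisting}
% Stupid emacs mode: $

If \cmd{which} cannot find \cmd{lamboot}, re-check the ``dot'' file or
consult your local administrator.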
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Finding the LAM Manual Pages}
\index{manual pages}
LAM includes manual pages for all supported MPI functions as well as
all of the LAM executables. While this step {\em is not necessary}
for correct MPI functionality, it can be helpful when looking for MPI
or LAM-specific information.
Using Tables~\ref{tbl:getting-started-shells-interactive}
and~\ref{tbl:getting-started-shells-noninteractive}, find the right
``dot'' file to edit. Assuming again that LAM was installed to
\file{/usr/local/lam}, open the appropriate ``dot'' file in a text
editor and follow the general directions listed below:
\begin{itemize}
\index{shell setup!Bash/Bourne shells}
\item For the Bash, Bourne, and Bourne-related shells, add the
following lines:
\lstset{style=lam-bourne}
\begin{lstlisting}
MANPATH=/usr/local/lam/man:$MANPATH
export MANPATH
\end{lstlisting}
% Stupid emacs mode: $
\index{shell setup!C shell (and related)}
\item For the C shell and related shells (such as \cmd{tcsh}), add the
following lines:
\lstset{style=lam-shell}
\begin{lstlisting}
if ($?MANPATH == 0) then
    setenv MANPATH /usr/local/lam/man
else
    setenv MANPATH /usr/local/lam/man:$MANPATH
endif
\end{lstlisting}
% Stupid emacs mode: $
\end{itemize}
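Once the ``dot'' file has been re-read (e.g., at your next login), the
LAM and MPI manual pages should be visible to the \cmd{man} command.
For example, a quick check:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ man lamboot
shell$ man MPI_Init
\end{lstlisting}
% Stupid emacs mode: $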
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{System Services Interface (SSI)}
\label{sec:getting-started-ssi}
LAM/MPI is built around a core of System Services Interface (SSI)
plugin modules. SSI allows run-time selection of different underlying
services within the LAM/MPI run-time environment, including tunable
parameters that can affect the performance of MPI programs.
While this tutorial won't go into much detail about SSI, just be aware
that you'll see mention of ``SSI'' in the text below. In a few
places, the tutorial passes parameters to various SSI modules through
environment variables and/or the \cmdarg{-ssi} command line
parameter to several LAM commands.
See other sections in this manual for a more complete description of
SSI (Chapter~\ref{sec:ssi}, page~\pageref{sec:ssi}), how it works, and
what run-time parameters are available (Chapters~\ref{sec:lam-ssi}
and~\ref{sec:mpi-ssi}, pages~\pageref{sec:lam-ssi}
and~\pageref{sec:mpi-ssi}, respectively). Also, the
\manpage{lamssi(7)}, \manpage{lamssi\_\-boot(7)},
\manpage{lamssi\_\-coll(7)}, \manpage{lamssi\_\-cr(7)}, and
\manpage{lamssi\_\-rpi(7)} manual pages each provide additional
information on LAM's SSI mechanisms.
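As a brief illustration of what such parameter passing looks like, an
SSI parameter can typically be given either with the \cmdarg{-ssi}
command line option or through an environment variable whose name is
prefixed with \envvar{LAM\_\-MPI\_\-SSI\_\-}.  The following sketch
shows both forms selecting the \cmd{rsh}/\cmd{ssh} boot module (this
example appears again in Section~\ref{sec:getting-started-lamboot});
treat it only as an illustration of the syntax:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -ssi boot rsh hostfile
shell$ export LAM_MPI_SSI_boot=rsh
shell$ lamboot hostfile
\end{lstlisting}
% Stupid emacs mode: $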
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{What Does Your LAM/MPI Installation Support?}
LAM/MPI can be installed with a large number of configuration options;
which features are available depends on the choices that your
system/network administrator made when configuring and installing
LAM/MPI.  The \icmd{laminfo} command shows the end user what the
installed LAM/MPI supports.  Running ``\cmd{laminfo}'' (with no
arguments) prints a list of LAM's capabilities, including all of its
SSI modules.
Among other things, this shows what language bindings the installed
LAM/MPI supports, what underlying network transports it supports, and
what directory LAM was installed to. The \cmdarg{-parsable} option
prints out all the same information, but in a conveniently
machine-parsable format (suitable for use with scripts).
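For example, the following two commands print the human-readable and
machine-parsable forms, respectively (the actual output depends
entirely on how LAM/MPI was configured and installed at your site, so
it is not shown here):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ laminfo
shell$ laminfo -parsable
\end{lstlisting}
% Stupid emacs mode: $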
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Booting the LAM Run-Time Environment}
\label{sec:getting-started-booting}
\index{booting the LAM run-time environment}
Before any MPI programs can be executed, the LAM run-time environment
must be launched. This is typically called ``booting LAM.'' A
successful boot process creates an instance of the LAM run-time
environment commonly referred to as the ``LAM universe.''
LAM's run-time environment can be executed in many different
environments. For example, it can be run interactively on a cluster
of workstations (even on a single workstation, perhaps to simulate
parallel execution for debugging and/or development). Or LAM can be
run in production batch scheduled systems.
This example will focus on a traditional \cmd{rsh} / \cmd{ssh}-style
workstation cluster (i.e., not under batch systems), where \cmd{rsh}
or \cmd{ssh} is used to launch executables on remote workstations.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The Boot Schema File (a.k.a, ``Hostfile'', ``Machinefile'')}
\label{sec:getting-started-hostfile}
When using \cmd{rsh} or \cmd{ssh} to boot LAM, you will need a text
file listing the hosts on which to launch the LAM run-time
environment. This file is typically referred to as a ``boot schema'',
``hostfile'', or ``machinefile.'' For example:
\lstset{style=lam-shell}
\begin{lstlisting}
# My boot schema
node1.cluster.example.com
node2.cluster.example.com
node3.cluster.example.com cpu=2
node4.cluster.example.com cpu=2
\end{lstlisting}
Four nodes are specified in the above example by listing their IP
hostnames. Note also the ``{\tt cpu=2}'' that follows the last two
entries. This tells LAM that these machines each have two CPUs
available for running MPI programs (e.g., \host{node3} and
\host{node4} are two-way SMPs). It is important to note that the
number of CPUs specified here has {\em no} correlation to the
physical number of CPUs in the machine.  It is simply a convenience
mechanism telling LAM how many MPI processes we will typically launch
on that node. The ramifications of the {\tt cpu} key will be discussed
later.
The location of this text file is irrelevant; for the purposes of this
example, we'll assume that it is named \file{hostfile} and is located
in the current working directory.
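If your username is not the same on every machine, a boot schema entry
can also carry a {\tt user} key in addition to the {\tt cpu} key.  For
example, a sketch (the hostname and username below are placeholders;
see the \cmd{lamboot} documentation for the full boot schema syntax):

\lstset{style=lam-shell}
\begin{lstlisting}
# Use a different username when connecting to this node
node5.cluster.example.com cpu=2 user=jsmith
\end{lstlisting}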
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{lamboot} Command}
\label{sec:getting-started-lamboot}
The \cmd{lamboot} command is used to launch the LAM run-time
environment. For each machine listed in the boot schema, the
following conditions must be met for LAM's run-time environment to be
booted correctly:
\cmdindex{lamboot}{conditions for success}
\begin{itemize}
\item The machine must be reachable and operational.
\item The user must be able to non-interactively execute arbitrary
commands on the machine (e.g., without being prompted for a
password).
\item The LAM executables must be locatable on that machine, using the
user's shell search path.
\item The user must be able to write to the LAM session directory
(usually somewhere under \file{/tmp}).
\item The shell's start-up scripts must not print anything on standard
error.
\item All machines must be able to resolve the fully-qualified domain
  name (FQDN) of all the machines being booted (including themselves).
\end{itemize}
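A simple way to spot-check several of these conditions at once is to
run a trivial command on each remote node with \cmd{rsh} or \cmd{ssh}
and verify that nothing other than the expected output appears.  For
example, using one of the hostnames from the example boot schema:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ ssh node2.cluster.example.com hostname
node2.cluster.example.com
\end{lstlisting}
% Stupid emacs mode: $

If you are prompted for a password, or if any additional text (such as
a shell start-up warning) is printed, \cmd{lamboot} will likely fail
until the underlying issue is resolved.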
Once all of these conditions are met, the \cmd{lamboot} command is
used to launch the LAM run-time environment. For example:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -v -ssi boot rsh hostfile
LAM 7.0/MPI 2 C++/ROMIO - Indiana University
n0<1234> ssi:boot:base:linear: booting n0 (node1.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n1 (node2.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n2 (node3.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n3 (node4.cluster.example.com)
n0<1234> ssi:boot:base:linear: finished
\end{lstlisting}
% Stupid emacs mode: $
The parameters passed to \cmd{lamboot} in the example above are as
follows:
\begin{itemize}
\item \cmdarg{-v}: Make \cmd{lamboot} slightly verbose.
\item \cmdarg{-ssi boot rsh}: Ensure that LAM uses the
  \cmd{rsh}/\cmd{ssh} boot module to boot the LAM universe.
  Typically, LAM chooses the right boot module automatically (and
  therefore this parameter is not usually necessary), but we use it
  here to guarantee that this tutorial does exactly what we want:
  boot the universe with \cmd{rsh} or \cmd{ssh}.
\item \file{hostfile}: Name of the boot schema file.
\end{itemize}
Common causes of failure with the \cmd{lamboot} command include (but
are not limited to):
\cmdindex{lamboot}{common problems and solutions}
\begin{itemize}
\item User does not have permission to execute on the remote node.
  This typically involves setting up a \file{\$HOME/.rhosts} file (if
  using \cmd{rsh}), or properly configured SSH keys (if using
  \cmd{ssh}).
Setting up \file{.rhosts} and/or SSH keys for password-less remote
logins are beyond the scope of this tutorial; consult local
documentation for \cmd{rsh} and \cmd{ssh}, and/or internet tutorials
on setting up SSH keys.\footnote{As of this writing, a Google search
for ``ssh keys'' turned up several decent tutorials; including any
one of them here would significantly increase the length of this
already-tremendously-long manual.}
\item The first time a user uses \cmd{ssh} to execute on a remote
node, \cmd{ssh} typically prints a warning to the standard error.
LAM will interpret this as a failure. If this happens,
\cmd{lamboot} will complain that something unexpectedly appeared on
\file{stderr}, and abort.
%
\changebegin{7.1}
%
One solution is to manually \cmd{ssh} to each node in the boot
schema once in order to eliminate the \file{stderr} warning, and
then try \cmd{lamboot} again. Another is to use the
\ssiparam{boot\_\-rsh\_\-ignore\_\-stderr} SSI parameter. We
haven't discussed SSI parameters yet, so it is probably easiest at
this point to manually \cmd{ssh} to a small number of nodes to get
the warning out of the way.
%
\changeend{7.1}
\end{itemize}
If you are having problems with \cmd{lamboot}, try using the
\cmdarg{-d} option to \cmd{lamboot}, which prints enormous amounts
of debugging output that can be helpful for determining what the
problem is.  Additionally, check the \file{lamboot(1)} man page as
well as the LAM FAQ on the main LAM web
site\footnote{\url{http://www.lam-mpi.org/faq/}} under the section
``Booting LAM'' for more information.
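For example, a debugging run might look like the following (the
output, which is quite verbose, is not shown here):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -d -ssi boot rsh hostfile
\end{lstlisting}
% Stupid emacs mode: $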
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{lamnodes} Command}
An easy way to see how many nodes and CPUs are in the current LAM
universe is with the \cmd{lamnodes} command. For example, with the
LAM universe that was created from the boot schema in
Section~\ref{sec:getting-started-hostfile}, running the \cmd{lamnodes}
command would result in the following output:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamnodes
n0 node1.cluster.example.com:1:origin,this_node
n1 node2.cluster.example.com:1:
n2 node3.cluster.example.com:2:
n3 node4.cluster.example.com:2:
\end{lstlisting}
% Stupid emacs mode: $
The ``{\tt n}'' number on the far left is the LAM node number. For
example, ``{\tt n3}'' uniquely refers to \host{node4}. Also note the
third column, which indicates how many CPUs are available for running
processes on that node. In this example, there are a total of 6 CPUs
available for running processes. This information is from the ``{\tt
cpu}'' key that was used in the hostfile, and is helpful for running
parallel processes (see below).
Finally, the ``{\tt origin}'' notation indicates which node
\cmd{lamboot} was executed from. ``{\tt this\_\-node}'' obviously
indicates which node \cmd{lamnodes} is running on.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Compiling MPI Programs}
\label{sec:getting-started-compiling}
\index{compiling MPI programs}
Note that it is {\em not} necessary to have LAM booted to compile MPI
programs.
Compiling MPI programs can be a complicated process:
\begin{itemize}
\item The same compilers should be used to compile/link user MPI
programs as were used to compile LAM itself.
\item Depending on the specific installation configuration of LAM, a
variety of \cmdarg{-I}, \cmdarg{-L}, and \cmdarg{-l} flags (and
possibly others) may be necessary to compile and/or link a user MPI
program.
\end{itemize}
LAM/MPI provides ``wrapper'' compilers to hide all of this complexity.
These wrapper compilers simply add the correct compiler/linker flags
and then invoke the underlying compiler to actually perform the
compilation/link. As such, LAM's wrapper compilers can be used just
like ``real'' compilers.
The wrapper compilers are named \icmd{mpicc} (for C programs),
\icmd{mpiCC} and \icmd{mpic++} (for C++ programs), and \icmd{mpif77}
(for Fortran programs). For example:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc -g -c foo.c
shell$ mpicc -g -c bar.c
shell$ mpicc -g foo.o bar.o -o my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
Note that no additional compiler and linker flags are required for
correct MPI compilation or linking. The resulting
\cmd{my\_\-mpi\_\-program} is ready to run in the LAM run-time
environment. Similarly, the other two wrapper compilers can be used
to compile MPI programs for their respective languages:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC -O c++_program.cc -o my_c++_mpi_program
shell$ mpif77 -O f77_program.f -o my_f77_mpi_program
\end{lstlisting}
% Stupid emacs mode: $
Note, too, that any other compiler/linker flags can be passed through
the wrapper compilers (such as \cmdarg{-g} and \cmdarg{-O}); they will
simply be passed to the back-end compiler.
Finally, note that giving the \cmdarg{-showme} option to any of the
wrapper compilers will show both the name of the back-end compiler
that will be invoked, and also all the command line options that would
have been passed for a given compile command. For example (line
breaks added to fit in the documentation):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC -O c++_program.cc -o my_c++_program -showme
g++ -I/usr/local/lam/include -pthread -O c++_program.cc -o \
my_c++_program -L/usr/local/lam/lib -llammpio -llammpi++ -lpmpi \
-llamf77mpi -lmpi -llam -lutil -pthread
\end{lstlisting}
% Stupid emacs mode: $
\changebegin{7.1}
Note that the wrapper compilers only add all the LAM/MPI-specific
flags when a command-line argument that does not begin with a dash
(``-'') is present. For example:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc
gcc: no input files
shell$ mpicc --version
gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
\end{lstlisting}
\changeend{7.1}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Sample MPI Program in C}
\index{sample MPI program!C}
The following is a simple ``hello world'' C program.
\lstset{style=lam-c}
\begin{lstlisting}
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Hello, world! I am %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}
This program can be saved in a text file and compiled with the
\icmd{mpicc} wrapper compiler.
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc hello.c -o hello
\end{lstlisting}
% Stupid emacs mode: $
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Sample MPI Program in C++}
\index{sample MPI program!C++}
The following is a simple ``hello world'' C++ program.
\lstset{style=lam-cxx}
\begin{lstlisting}
#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char *argv[]) {
  int rank, size;

  MPI::Init(argc, argv);
  rank = MPI::COMM_WORLD.Get_rank();
  size = MPI::COMM_WORLD.Get_size();
  cout << "Hello, world! I am " << rank << " of " << size << endl;
  MPI::Finalize();
  return 0;
}
\end{lstlisting}
This program can be saved in a text file and compiled with the
\icmd{mpiCC} wrapper compiler (or \cmd{mpic++} if on case-insensitive
filesystems, such as Mac OS X's HFS+).
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC hello.cc -o hello
\end{lstlisting}
% Stupid emacs mode: $
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Sample MPI Program in Fortran}
\index{sample MPI program!Fortran}
The following is a simple ``hello world'' Fortran program.
\lstset{style=lam-fortran}
\begin{lstlisting}
      program hello
      include 'mpif.h'
      integer rank, size, ierr

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, "Hello, world! I am ", rank, " of ", size
      call MPI_FINALIZE(ierr)
      stop
      end
\end{lstlisting}
This program can be saved in a text file and compiled with the
\icmd{mpif77} wrapper compiler.
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpif77 hello.f -o hello
\end{lstlisting}
% Stupid emacs mode: $
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Running MPI Programs}
\index{running MPI programs}
Once you have successfully established a LAM universe and compiled an
MPI program, you can run MPI programs in parallel.
In this section, we will show how to run a Single Program, Multiple
Data (SPMD) program. Specifically, we will run the \cmd{hello}
program (from the previous section) in parallel. The \cmd{mpirun} and
\cmd{mpiexec} commands are used for launching parallel MPI programs,
and the \cmd{mpitask} command can be used to provide crude debugging
support. The \cmd{lamclean} command can be used to completely clean
up a failed MPI program (e.g., if an error occurs).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{mpirun} Command}
The \cmd{mpirun} command has many different options that can be used
to control the execution of a program in parallel. We'll explain only
a few of them here.
The simplest way to launch the \cmd{hello} program across all CPUs
listed in the boot schema is:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun C hello
\end{lstlisting}
% stupid emacs mode: $
The \cmdarg{C} option means ``launch one copy of \cmd{hello} on
every CPU that was listed in the boot schema.'' The \cmdarg{C}
notation is therefore convenient shorthand notation for launching a
set of processes across a group of SMPs.
Another method for running in parallel is:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun N hello
\end{lstlisting}
% stupid emacs mode: $
The \cmdarg{N} option has a different meaning than \cmdarg{C} -- it
means ``launch one copy of \cmd{hello} on every node in the LAM
universe.'' Hence, \cmdarg{N} disregards the CPU count. This can be
useful for multi-threaded MPI programs.
Finally, to run an absolute number of processes (regardless of how
many CPUs or nodes are in the LAM universe):
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -np 4 hello
\end{lstlisting}
% stupid emacs mode: $
This runs 4 copies of \cmd{hello}.  LAM ``schedules'' the copies of
\cmd{hello} in a round-robin fashion among the nodes, according to how
many CPUs were listed for each node in the boot schema
file.\footnote{Note that the use of the word ``schedule'' does not
imply that LAM has ties with the operating system for scheduling
  purposes (it doesn't).  LAM ``schedules'' on a per-node basis; so
selecting a process to run means that it has been assigned and
launched on that node. The operating system is solely responsible
for all process and kernel scheduling.} For example, on the LAM
universe that we have previously shown in this tutorial, the following
would be launched:
\begin{itemize}
\item 1 \cmd{hello} would be launched on {\tt n0} (named
\host{node1})
\item 1 \cmd{hello} would be launched on {\tt n1} (named
\host{node2})
\item 2 \cmd{hello}s would be launched on {\tt n2} (named
\host{node3})
\end{itemize}
Note that any number can be used -- if the number is greater than the
number of CPUs in the LAM universe, LAM will ``wrap around'' and
continue scheduling from the first node again.  For example, using
\cmdarg{-np 10} would result in the following schedule:
\begin{itemize}
\item 2 \cmd{hello}s on {\tt n0} (1 from the first pass, and then a
second from the ``wrap around'')
\item 2 \cmd{hello}s on {\tt n1} (1 from the first pass, and then a
second from the ``wrap around'')
\item 4 \cmd{hello}s on {\tt n2} (2 from the first pass, and then 2
more from the ``wrap around'')
\item 2 \cmd{hello}s on {\tt n3}
\end{itemize}
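It is also possible to restrict where processes are launched by naming
specific nodes or CPUs on the \cmd{mpirun} command line using LAM's
{\tt nX} and {\tt cX} notation.  For example, the following sketch
launches one copy of \cmd{hello} on each of the first two nodes only:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun n0-1 hello
\end{lstlisting}
% Stupid emacs mode: $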
The \file{mpirun(1)} man page contains much more information about
\cmd{mpirun} and the options available.  For example, \cmd{mpirun}
also supports Multiple Program, Multiple Data (MPMD) programs,
although it is not discussed here. Also see
Section~\ref{sec:commands-mpirun} (page~\pageref{sec:commands-mpirun})
in this document.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{mpiexec} Command}
The MPI-2 standard recommends the use of \cmd{mpiexec} for portable
MPI process startup. In LAM/MPI, \cmd{mpiexec} is functionally similar
to \cmd{mpirun}. Some options that are available to \cmd{mpirun} are
not available to \cmd{mpiexec}, and vice-versa. The end result is
typically the same, however -- both will launch parallel MPI programs;
which you should use is likely simply a personal choice.
That being said, \cmd{mpiexec} offers more convenient access in three
cases:
\begin{itemize}
\item Running MPMD programs
\item Running heterogeneous programs
\item Running ``one-shot'' MPI programs (i.e., boot LAM, run the
program, then halt LAM)
\end{itemize}
The general syntax for \cmd{mpiexec} is:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec <global_options> <cmd1> : <cmd2> : ...
\end{lstlisting}
% stupid emacs mode: $
%%%%%
\subsubsection{Running MPMD Programs}
For example, to run a manager/worker parallel program, where two
different executables need to be launched (i.e., \cmd{manager} and
\cmd{worker}), the following can be used:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -n 1 manager : worker
\end{lstlisting}
% stupid emacs mode: $
This runs one copy of \cmd{manager} and one copy of \cmd{worker} for
every CPU in the LAM universe.
%%%%%
\subsubsection{Running Heterogeneous Programs}
Since LAM is a heterogeneous MPI implementation, it supports running
heterogeneous MPI programs. For example, this allows running a
parallel job that spans a Sun SPARC machine and an IA-32 Linux machine
(even though they are opposite endian machines). Although this can be
somewhat complicated to set up (remember that you will first need to
\cmd{lamboot} successfully, which essentially means that LAM must be
correctly installed on both architectures), the \cmd{mpiexec} command
can be helpful in actually running the resulting MPI job.
Note that you will need to have two MPI executables -- one compiled
for Solaris (e.g., \cmd{hello.solaris}) and one compiled for Linux
(e.g., \cmd{hello.linux}). Assuming that these executables both
reside in the same directory, and that directory is available on both
nodes (or the executables can be found in the \envvar{PATH} on their
respective machines), the following command can be used:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -arch solaris hello.solaris : -arch linux hello.linux
\end{lstlisting}
% stupid emacs mode: $
This runs the \cmd{hello.solaris} command on all nodes in the LAM
universe that have the string ``solaris'' anywhere in their
architecture string, and \cmd{hello.linux} on all nodes that have
``linux'' in their architecture string. The architecture string of a
given LAM installation can be found by running the \cmd{laminfo}
command.
%%%%%
\subsubsection{``One-Shot'' MPI Programs}
In some cases, it seems like extra work to boot a LAM universe, run
a single MPI job, and then shut down the universe. Batch jobs are
good examples of this -- since only one job is going to be run, why
does it take three commands? \cmd{mpiexec} provides a convenient way
to run ``one-shot'' MPI jobs.
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -machinefile hostfile hello
\end{lstlisting}
% stupid emacs mode: $
This will invoke \cmd{lamboot} with the boot schema named
``\file{hostfile}'', run the MPI program \cmd{hello} on all available
CPUs in the resulting universe, and then shut down the universe with
the \cmd{lamhalt} command (which we'll discuss in
Section~\ref{sec:getting-started-lamhalt}, below).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{mpitask} Command}
The \cmd{mpitask} command is analogous to the sequential Unix command
\cmd{ps}. It shows the current status of the MPI program(s) being
executed in the LAM universe, and displays primitive information about
what MPI function each process is currently executing (if any). Note
that in normal practice, the \cmd{mpimsg} command only gives a
snapshot of what messages are flowing between MPI processes, and
therefore is usually only accurate at that single point in time. To
really debug message passing traffic, use a tool such as a message
passing analyzer (e.g., XMPI), or a parallel debugger (e.g.,
TotalView).
\cmd{mpitask} can be run from any node in the LAM universe.
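For example, while an MPI program is running (perhaps started from
another window), its status can be checked with:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpitask
\end{lstlisting}
% Stupid emacs mode: $

The output (not shown here) lists each MPI process in the universe
along with the MPI function, if any, that it is currently executing.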
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \icmd{lamclean} Command}
The \cmd{lamclean} command completely removes all running programs
from the LAM universe. This can be useful if a parallel job crashes
and/or leaves state in the LAM run-time environment (e.g., MPI-2
published names). It is usually run with no parameters:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamclean
\end{lstlisting}
% stupid emacs mode: $
\cmd{lamclean} is typically only necessary when developing / debugging
MPI applications -- i.e., programs that hang, messages that are left
around, etc. Correct MPI programs should terminate properly, clean up
all their messages, unpublish MPI-2 names, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Shutting Down the LAM Universe}
\label{sec:getting-started-lamhalt}
When finished with the LAM universe, it should be shut down with the
\icmd{lamhalt} command:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamhalt
\end{lstlisting}
% Stupid emacs mode: $
In most cases, this is sufficient to kill all running MPI processes
and shut down the LAM universe.
However, in some rare conditions, \cmd{lamhalt} may fail. For
example, if any of the nodes in the LAM universe crashed before
running \cmd{lamhalt}, \cmd{lamhalt} will likely time out and
potentially not kill the entire LAM universe. In this case, you will
need to use the \icmd{lamwipe} command to guarantee that the LAM
universe has shut down properly:
\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamwipe -v hostfile
\end{lstlisting}
% Stupid emacs mode: $
\noindent where \file{hostfile} is the same boot schema that was used to
boot LAM (i.e., all the same nodes are listed). \cmd{lamwipe} will
forcibly kill all LAM/MPI processes and terminate the LAM universe.
This is a slower process than \cmd{lamhalt}, and is typically not
necessary.
|