File: getting-started.tex

% -*- latex -*-
%
% Copyright (c) 2001-2003 The Trustees of Indiana University.  
%                         All rights reserved.
% Copyright (c) 1998-2001 University of Notre Dame. 
%                         All rights reserved.
% Copyright (c) 1994-1998 The Ohio State University.  
%                         All rights reserved.
% 
% This file is part of the LAM/MPI software package.  For license
% information, see the LICENSE file in the top level directory of the
% LAM/MPI source distribution.
%
% $Id: getting-started.tex,v 1.19 2003/10/11 14:02:48 jsquyres Exp $
%

\chapter{Getting Started with LAM/MPI}
\label{sec:getting-started}

This chapter provides a summary tutorial describing some of the high
points of using LAM/MPI.  It is not intended as a comprehensive guide;
the finer details of some situations will not be explained.  However,
it is a good step-by-step guide for users who are new to MPI and/or
LAM/MPI.

Using LAM/MPI is conceptually simple:

\begin{itemize}
\item Launch the LAM run-time environment (RTE)
\item Repeat as necessary:
  \begin{itemize}
  \item Compile MPI program(s)
  \item Run MPI program(s)
  \end{itemize}
\item Shut down the LAM run-time environment
\end{itemize}

The tutorial below will describe each of these steps.  
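
In terms of the commands introduced throughout this tutorial, a
complete session therefore looks roughly like the following (the
\file{hostfile} boot schema and the \cmd{hello} program are explained
in later sections):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot hostfile
shell$ mpicc hello.c -o hello
shell$ mpirun C hello
shell$ lamhalt
\end{lstlisting}
% Stupid emacs mode: $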

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{One-Time Setup}

This section describes actions that usually only need to be performed
once per user in order to set up LAM to function properly.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Setting the Path}
\label{sec:getting-started-path}

One of the main requirements for LAM/MPI to function properly is for
the LAM executables to be in your path.  This step may vary from site
to site; for example, the LAM executables may already be in your path
-- consult your local administrator to see if this is the case.

{\bf NOTE:} If the LAM executables are already in your path, you can
skip this step and proceed to
Section~\ref{sec:getting-started-ssi}.

In many cases, if your system does not already provide the LAM
executables in your path, you can add them by editing your ``dot''
files that are executed automatically by the shell upon login (both
interactive and non-interactive logins).  Each shell has a different
file to edit and corresponding syntax, so you'll need to know which
shell you are using.
Tables~\ref{tbl:getting-started-shells-interactive}
and~\ref{tbl:getting-started-shells-noninteractive} list several
common shells and the associated files that are typically used.
Consult the documentation for your shell for more information.

\begin{table}[htbp]
  \centering
  \begin{tabular}{|p{1in}|p{4in}|}
    \hline
    \multicolumn{1}{|c|}{Shell name} &
    \multicolumn{1}{|c|}{Interactive login startup file} \\
%
    \hline
    \cmd{sh} (or Bash named ``\cmd{sh}'') & \ifile{.profile} \\
%
    \hline
    \cmd{csh} & \ifile{.cshrc} followed by \ifile{.login} \\
%
    \hline
    \cmd{tcsh} & \ifile{.tcshrc} if it exists, \ifile{.cshrc} if it
    does not, followed by \ifile{.login} \\
%
    \hline
    \cmd{bash} & \ifile{.bash\_\-profile} if it exists, or
    \ifile{.bash\_\-login} if it exists, or \ifile{.profile} if it
    exists (in that order).  Note that some Linux distributions ship
    default \ifile{.bash\_\-profile} scripts for users that also
    execute \ifile{.bashrc}.  Consult the
    \cmd{bash} manual page for more information. \\
    \hline
  \end{tabular}
  \caption[List of common shells and the corresponding environment
    setup files for interactive shells.]{List of common shells and
    the corresponding environmental setup files commonly used with
    each for interactive startups (e.g., normal login).  All files
    listed are assumed to be in the \file{\$HOME} directory.}
  \label{tbl:getting-started-shells-interactive}
\end{table}

\begin{table}[htbp]
  \centering
  \begin{tabular}{|p{1in}|p{4in}|}
    \hline
    \multicolumn{1}{|c|}{Shell name} &
    \multicolumn{1}{|c|}{Non-interactive login startup file} \\
%
    \hline
    \cmd{sh} (or Bash named ``\cmd{sh}'') & This shell does not
    execute any file automatically, so LAM will execute the
    \file{.profile} script before invoking LAM executables on remote
    nodes \\
%
    \hline
    \cmd{csh} & \ifile{.cshrc} \\
%
    \hline
    \cmd{tcsh} & \ifile{.tcshrc} if it exists, \ifile{.cshrc} if it
    does not \\
%
    \hline
    \cmd{bash} & \ifile{.bashrc} if it exists \\
    \hline
  \end{tabular}
  \caption[List of common shells and the corresponding environment
    setup files for non-interactive shells.]{List of common shells and
    the corresponding environmental setup files commonly used with
    each for non-interactive startups (e.g., when commands are
    executed on remote nodes).  All files
    listed are assumed to be in the \file{\$HOME} directory.}
  \label{tbl:getting-started-shells-noninteractive}
\end{table}

You'll also need to know the directory where LAM was installed.  For
the purposes of this tutorial, we'll assume that LAM is installed in
\file{/usr/local/lam}.  And to re-emphasize a critical point: these
are only guidelines -- the specifics may vary depending on your local
setup.  Consult your local system or network administrator for more
details.

Once you have determined all three pieces of information (what shell
you are using, what directory LAM was installed to, and which
``dot'' file to edit), open the ``dot'' file in a text
editor and follow the general directions listed below:

\begin{itemize}
\index{shell setup!Bash/Bourne shells}
\item For the Bash, Bourne, and Bourne-related shells, add the
  following lines:

  \lstset{style=lam-bourne}
  \begin{lstlisting}
PATH=/usr/local/lam/bin:$PATH
export PATH
  \end{lstlisting}
% Stupid emacs mode: $
  
\index{shell setup!C shell (and related)}
\item For the C shell and related shells (such as \cmd{tcsh}), add the
  following line:

  \lstset{style=lam-shell}
  \begin{lstlisting}
set path = (/usr/local/lam/bin $path)
  \end{lstlisting}
% Stupid emacs mode: $
  
\end{itemize}
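
After editing the appropriate ``dot'' file, log out and log back in
(or otherwise re-execute the file) so that the new path takes effect.
A quick sanity check is to ask the shell where it finds a LAM
executable; for example, assuming the \file{/usr/local/lam}
installation prefix used in this tutorial:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ which lamboot
/usr/local/lam/bin/lamboot
\end{lstlisting}
% Stupid emacs mode: $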

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Finding the LAM Manual Pages}
\index{manual pages}

LAM includes manual pages for all supported MPI functions as well as
all of the LAM executables.  While this step {\em is not necessary}
for correct MPI functionality, it can be helpful when looking for MPI
or LAM-specific information.

Using Tables~\ref{tbl:getting-started-shells-interactive}
and~\ref{tbl:getting-started-shells-noninteractive}, find the right
``dot'' file to edit.  Assuming again that LAM was installed to
\file{/usr/local/lam}, open the appropriate ``dot'' file in a text
editor and follow the general directions listed below:

\begin{itemize}
\index{shell setup!Bash/Bourne shells}
\item For the Bash, Bourne, and Bourne-related shells, add the
  following lines:

  \lstset{style=lam-bourne}
  \begin{lstlisting}
MANPATH=/usr/local/lam/man:$MANPATH
export MANPATH
  \end{lstlisting}
% Stupid emacs mode: $
  
\index{shell setup!C shell (and related)}
\item For the C shell and related shells (such as \cmd{tcsh}), add the
  following lines:

  \lstset{style=lam-shell}
  \begin{lstlisting}
if ($?MANPATH == 0) then
  setenv MANPATH /usr/local/lam/man
else
  setenv MANPATH /usr/local/lam/man:$MANPATH
endif
  \end{lstlisting}
% Stupid emacs mode: $
  
\end{itemize}
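
As with the path setting above, the new \envvar{MANPATH} takes effect
at the next login (or after re-executing the ``dot'' file).  For
example, the following should then display the \cmd{lamboot} manual
page:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ man lamboot
\end{lstlisting}
% Stupid emacs mode: $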

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{System Services Interface (SSI)}
\label{sec:getting-started-ssi}

LAM/MPI is built around a core of System Services Interface (SSI)
plugin modules.  SSI allows run-time selection of different underlying
services within the LAM/MPI run-time environment, including tunable
parameters that can affect the performance of MPI programs.

While this tutorial won't go into much detail about SSI, just be aware
that you'll see mention of ``SSI'' in the text below.  In a few
places, the tutorial passes parameters to various SSI modules, either
through environment variables or through the \cmdarg{-ssi} command
line parameter to several LAM commands.

See other sections in this manual for a more complete description of
SSI (Chapter~\ref{sec:ssi}, page~\pageref{sec:ssi}), how it works, and
what run-time parameters are available (Chapters~\ref{sec:lam-ssi}
and~\ref{sec:mpi-ssi}, pages~\pageref{sec:lam-ssi}
and~\pageref{sec:mpi-ssi}, respectively).  Also, the
\manpage{lamssi(7)}, \manpage{lamssi\_\-boot(7)},
\manpage{lamssi\_\-coll(7)}, \manpage{lamssi\_\-cr(7)}, and
\manpage{lamssi\_\-rpi(7)} manual pages each provide additional
information on LAM's SSI mechanisms.
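
As a small preview of the syntax used later in this chapter, an SSI
parameter is passed on the command line as a name/value pair after
the \cmdarg{-ssi} flag.  For example, the following (hypothetical)
invocation tells \cmd{mpirun} to select the {\tt tcp} RPI module when
launching a program (here, a placeholder named
\cmd{my\_\-mpi\_\-program}), assuming that module is available in
your installation:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -ssi rpi tcp C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $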
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{What Does Your LAM/MPI Installation Support?}

LAM/MPI can be installed with a large number of configuration
options; which features are available depends on the choices your
system/network administrator made when configuring and installing
LAM/MPI.  The \icmd{laminfo} command is provided to show end users
what the installed LAM/MPI supports.  Running ``\cmd{laminfo}'' (with
no arguments) prints a list of LAM's capabilities, including all of
its SSI modules.

Among other things, this shows what language bindings the installed
LAM/MPI supports, what underlying network transports it supports, and
what directory LAM was installed to.  The \cmdarg{-parsable} option
prints out all the same information, but in a conveniently
machine-parsable format (suitable for using with scripts).
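
For example, the following two commands print the same capability
summary in human-readable and machine-parsable form, respectively
(the output is not reproduced here because it varies from
installation to installation):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ laminfo
shell$ laminfo -parsable
\end{lstlisting}
% Stupid emacs mode: $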

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Booting the LAM Run-Time Environment}
\label{sec:getting-started-booting}
\index{booting the LAM run-time environment}

Before any MPI programs can be executed, the LAM run-time environment
must be launched.  This is typically called ``booting LAM.''  A
successful boot process creates an instance of the LAM run-time
environment commonly referred to as the ``LAM universe.''

LAM's run-time environment can be used in many different settings.
For example, it can be run interactively on a cluster of workstations
(even on a single workstation, perhaps to simulate parallel execution
for debugging and/or development).  LAM can also be run in production
batch-scheduled systems.

This example will focus on a traditional \cmd{rsh} / \cmd{ssh}-style
workstation cluster (i.e., not under batch systems), where \cmd{rsh}
or \cmd{ssh} is used to launch executables on remote workstations.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The Boot Schema File (a.k.a., ``Hostfile'', ``Machinefile'')}
\label{sec:getting-started-hostfile}

When using \cmd{rsh} or \cmd{ssh} to boot LAM, you will need a text
file listing the hosts on which to launch the LAM run-time
environment.  This file is typically referred to as a ``boot schema'',
``hostfile'', or ``machinefile.''  For example:

\lstset{style=lam-shell}
\begin{lstlisting}
# My boot schema
node1.cluster.example.com
node2.cluster.example.com
node3.cluster.example.com cpu=2
node4.cluster.example.com cpu=2
\end{lstlisting}

Four nodes are specified in the above example by listing their IP
hostnames.  Note also the ``{\tt cpu=2}'' that follows the last two
entries.  This tells LAM that these machines each have two CPUs
available for running MPI programs (e.g., \host{node3} and
\host{node4} are two-way SMPs).  It is important to note that the
number of CPUs specified here has {\em no} correlation to the
physical number of CPUs in the machine.  It is simply a convenience
mechanism telling LAM how many MPI processes we will typically launch
on that node.  The ramifications of the {\tt cpu} key will be discussed
later.

The location of this text file is irrelevant; for the purposes of this
example, we'll assume that it is named \file{hostfile} and is located
in the current working directory.
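
For development or debugging on a single workstation, the boot schema
can be as small as one line.  For example, a hypothetical
single-machine schema that advertises four CPUs might look like:

\lstset{style=lam-shell}
\begin{lstlisting}
# Single-workstation boot schema
localhost cpu=4
\end{lstlisting}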

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{lamboot} Command}
\label{sec:getting-started-lamboot}

The \cmd{lamboot} command is used to launch the LAM run-time
environment.  For each machine listed in the boot schema, the
following conditions must be met for LAM's run-time environment to be
booted correctly:

\cmdindex{lamboot}{conditions for success}
\begin{itemize}
\item The machine must be reachable and operational.
  
\item The user must be able to non-interactively execute arbitrary
  commands on the machine (e.g., without being prompted for a
  password).
  
\item The LAM executables must be locatable on that machine, using the
  user's shell search path.
  
\item The user must be able to write to the LAM session directory
  (usually somewhere under \file{/tmp}).
  
\item The shell's start-up scripts must not print anything on standard
  error.
  
\item All machines must be able to resolve the fully-qualified domain
  name (FQDN) of all the machines being booted (including themselves).
\end{itemize}
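
A quick way to check several of these conditions at once is to run a
trivial command on a remote node non-interactively.  For example,
using \cmd{ssh} and one of the hypothetical nodes from the boot
schema above: if the following prints the remote hostname without
prompting for a password and without printing anything else, the
second and fifth conditions are likely satisfied:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ ssh node2.cluster.example.com hostname
node2.cluster.example.com
\end{lstlisting}
% Stupid emacs mode: $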

Once all of these conditions are met, the \cmd{lamboot} command is
used to launch the LAM run-time environment.  For example:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -v -ssi boot rsh hostfile

LAM 7.0/MPI 2 C++/ROMIO - Indiana University

n0<1234> ssi:boot:base:linear: booting n0 (node1.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n1 (node2.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n2 (node3.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n3 (node4.cluster.example.com)
n0<1234> ssi:boot:base:linear: finished
\end{lstlisting}
% Stupid emacs mode: $

The parameters passed to \cmd{lamboot} in the example above are as
follows:

\begin{itemize}
\item \cmdarg{-v}: Make \cmd{lamboot} be slightly verbose.
  
\item \cmdarg{-ssi boot rsh}: Ensure that LAM uses the
  \cmd{rsh}/\cmd{ssh} boot module to boot the LAM universe.
  Typically, LAM chooses the right boot module automatically (and
  therefore this parameter is not usually necessary), but to make sure
  that this tutorial does exactly what we want it to do, we use this
  parameter to absolutely ensure that LAM uses \cmd{rsh} or \cmd{ssh}
  to boot the universe.

\item \file{hostfile}: Name of the boot schema file.
\end{itemize}

Common causes of failure with the \cmd{lamboot} command include (but
are not limited to):

\cmdindex{lamboot}{common problems and solutions}
\begin{itemize}
\item User does not have permission to execute on the remote node.
  This typically involves setting up a \file{\$HOME/.rhosts} file (if
  using \cmd{rsh}), or properly configured SSH keys (if using
  \cmd{ssh}).
  
  Setting up \file{.rhosts} and/or SSH keys for password-less remote
  logins is beyond the scope of this tutorial; consult local
  documentation for \cmd{rsh} and \cmd{ssh}, and/or internet tutorials
  on setting up SSH keys.\footnote{As of this writing, a Google search
    for ``ssh keys'' turned up several decent tutorials; including any
    one of them here would significantly increase the length of this
    already-tremendously-long manual.}
  
\item The first time a user uses \cmd{ssh} to execute on a remote
  node, \cmd{ssh} typically prints a warning to the standard error.
  LAM will interpret this as a failure.  If this happens,
  \cmd{lamboot} will complain that something unexpectedly appeared on
  \file{stderr}, and abort.  
%
  \changebegin{7.1}
%
  One solution is to manually \cmd{ssh} to each node in the boot
  schema once in order to eliminate the \file{stderr} warning, and
  then try \cmd{lamboot} again.  Another is to use the
  \ssiparam{boot\_\-rsh\_\-ignore\_\-stderr} SSI parameter.  We
  haven't discussed SSI parameters yet, so it is probably easiest at
  this point to manually \cmd{ssh} to a small number of nodes to get
  the warning out of the way.
%
  \changeend{7.1}
\end{itemize}

If you are having problems with \cmd{lamboot}, try using the
\cmdarg{-d} option to \cmd{lamboot}, which prints enormous amounts
of debugging output that can be helpful for determining what the
problem is.  Additionally, check the \file{lamboot(1)} man page as
well as the LAM FAQ on the main LAM web
site\footnote{\url{http://www.lam-mpi.org/faq/}} under the section
``Booting LAM'' for more information.
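
For example, a debugging run of the same boot shown above might look
like the following (only the command is shown; the \cmdarg{-d} output
is far too long to reproduce here):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -d -ssi boot rsh hostfile
\end{lstlisting}
% Stupid emacs mode: $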

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{lamnodes} Command}

An easy way to see how many nodes and CPUs are in the current LAM
universe is with the \cmd{lamnodes} command.  For example, with the
LAM universe that was created from the boot schema in
Section~\ref{sec:getting-started-hostfile}, running the \cmd{lamnodes}
command would result in the following output:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamnodes
n0      node1.cluster.example.com:1:origin,this_node
n1      node2.cluster.example.com:1:
n2      node3.cluster.example.com:2:
n3      node4.cluster.example.com:2:
\end{lstlisting}
% Stupid emacs mode: $

The ``{\tt n}'' number on the far left is the LAM node number.  For
example, ``{\tt n3}'' uniquely refers to \host{node4}.  Also note the
third column, which indicates how many CPUs are available for running
processes on that node.  In this example, there are a total of 6 CPUs
available for running processes.  This information is from the ``{\tt
  cpu}'' key that was used in the hostfile, and is helpful for running
parallel processes (see below).

Finally, the ``{\tt origin}'' notation indicates which node
\cmd{lamboot} was executed from.  ``{\tt this\_\-node}'' obviously
indicates which node \cmd{lamnodes} is running on.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Compiling MPI Programs}
\label{sec:getting-started-compiling}
\index{compiling MPI programs}

Note that it is {\em not} necessary to have LAM booted to compile MPI
programs.

Compiling MPI programs can be a complicated process:

\begin{itemize}
\item The same compilers should be used to compile/link user MPI
  programs as were used to compile LAM itself.
  
\item Depending on the specific installation configuration of LAM, a
  variety of \cmdarg{-I}, \cmdarg{-L}, and \cmdarg{-l} flags (and
  possibly others) may be necessary to compile and/or link a user MPI
  program.
\end{itemize}

LAM/MPI provides ``wrapper'' compilers to hide all of this complexity.
These wrapper compilers simply add the correct compiler/linker flags
and then invoke the underlying compiler to actually perform the
compilation/link.  As such, LAM's wrapper compilers can be used just
like ``real'' compilers.

The wrapper compilers are named \icmd{mpicc} (for C programs),
\icmd{mpiCC} and \icmd{mpic++} (for C++ programs), and \icmd{mpif77}
(for Fortran programs).  For example:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc -g -c foo.c
shell$ mpicc -g -c bar.c
shell$ mpicc -g foo.o bar.o -o my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $

Note that no additional compiler and linker flags are required for
correct MPI compilation or linking.  The resulting
\cmd{my\_\-mpi\_\-program} is ready to run in the LAM run-time
environment.  Similarly, the other wrapper compilers can be used
to compile MPI programs for their respective languages:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC -O c++_program.cc -o my_c++_mpi_program
shell$ mpif77 -O f77_program.f -o my_f77_mpi_program
\end{lstlisting}
% Stupid emacs mode: $

Note, too, that any other compiler/linker flags can be passed through
the wrapper compilers (such as \cmdarg{-g} and \cmdarg{-O}); they will
simply be passed to the back-end compiler.

Finally, note that giving the \cmdarg{-showme} option to any of the
wrapper compilers will show the name of the back-end compiler that
will be invoked, as well as all the command line options that would
have been passed for a given compile command.  For example (line
breaks added to fit in the documentation):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC -O c++_program.cc -o my_c++_program -showme
g++ -I/usr/local/lam/include -pthread -O c++_program.cc -o \
my_c++_program -L/usr/local/lam/lib -llammpio -llammpi++ -lpmpi \
-llamf77mpi -lmpi -llam -lutil -pthread
\end{lstlisting}
% Stupid emacs mode: $

\changebegin{7.1}

Note that the wrapper compilers only add all the LAM/MPI-specific
flags when a command-line argument that does not begin with a dash
(``-'') is present.  For example:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc
gcc: no input files
shell$ mpicc --version
gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
\end{lstlisting}

\changeend{7.1}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Sample MPI Program in C}
\index{sample MPI program!C}

The following is a simple ``hello world'' C program.

\lstset{style=lam-c}

\begin{lstlisting}
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   

  printf("Hello, world!  I am %d of %d\n", rank, size);

  MPI_Finalize();
  return 0;
}
\end{lstlisting}

This program can be saved in a text file and compiled with the
\icmd{mpicc} wrapper compiler.

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpicc hello.c -o hello
\end{lstlisting}
% Stupid emacs mode: $

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Sample MPI Program in C++}
\index{sample MPI program!C++}

The following is a simple ``hello world'' C++ program.

\lstset{style=lam-cxx}

\begin{lstlisting}
#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char *argv[]) {
  int rank, size;

  MPI::Init(argc, argv);
  rank = MPI::COMM_WORLD.Get_rank();
  size = MPI::COMM_WORLD.Get_size();

  cout << "Hello, world!  I am " << rank << " of " << size << endl;

  MPI::Finalize();
  return 0;
}
\end{lstlisting}

This program can be saved in a text file and compiled with the
\icmd{mpiCC} wrapper compiler (or \cmd{mpic++} if on case-insensitive
filesystems, such as Mac OS X's HFS+).

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiCC hello.cc -o hello
\end{lstlisting}
% Stupid emacs mode: $

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Sample MPI Program in Fortran}
\index{sample MPI program!Fortran}

The following is a simple ``hello world'' Fortran program.

\lstset{style=lam-fortran}

\begin{lstlisting}
program hello
include 'mpif.h'
integer rank, size, ierr

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)

print *, "Hello, world!  I am  ", rank, " of ", size

call MPI_FINALIZE(ierr)
stop
end
\end{lstlisting}

This program can be saved in a text file and compiled with the
\icmd{mpif77} wrapper compiler.

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpif77 hello.f -o hello
\end{lstlisting}
% Stupid emacs mode: $

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Running MPI Programs}
\index{running MPI programs}

Once you have successfully established a LAM universe and compiled an
MPI program, you can run MPI programs in parallel.

In this section, we will show how to run a Single Program, Multiple
Data (SPMD) program.  Specifically, we will run the \cmd{hello}
program (from the previous section) in parallel.  The \cmd{mpirun} and
\cmd{mpiexec} commands are used for launching parallel MPI programs,
and the \cmd{mpitask} command can be used to provide crude debugging
support.  The \cmd{lamclean} command can be used to completely clean
up a failed MPI program (e.g., if an error occurs).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{mpirun} Command}

The \cmd{mpirun} command has many different options that can be used
to control the execution of a program in parallel.  We'll explain only
a few of them here.

The simplest way to launch the \cmd{hello} program across all CPUs
listed in the boot schema is:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun C hello
\end{lstlisting}
% stupid emacs mode: $

The \cmdarg{C} option means ``launch one copy of \cmd{hello} on
every CPU that was listed in the boot schema.''  The \cmdarg{C}
notation is therefore convenient shorthand notation for launching a
set of processes across a group of SMPs.
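
With the six-CPU universe booted earlier in this tutorial, this
command launches six processes, and the output resembles the
following (the ordering of the lines is not deterministic, since the
processes' output is interleaved arbitrarily):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun C hello
Hello, world!  I am 0 of 6
Hello, world!  I am 3 of 6
Hello, world!  I am 1 of 6
Hello, world!  I am 4 of 6
Hello, world!  I am 2 of 6
Hello, world!  I am 5 of 6
\end{lstlisting}
% stupid emacs mode: $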

Another method for running in parallel is:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun N hello
\end{lstlisting}
% stupid emacs mode: $

The \cmdarg{N} option has a different meaning than \cmdarg{C} -- it
means ``launch one copy of \cmd{hello} on every node in the LAM
universe.''  Hence, \cmdarg{N} disregards the CPU count.  This can be
useful for multi-threaded MPI programs.

Finally, to run an absolute number of processes (regardless of how
many CPUs or nodes are in the LAM universe):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -np 4 hello
\end{lstlisting}
% stupid emacs mode: $

This runs 4 copies of \cmd{hello}.  LAM ``schedules'' the copies of
\cmd{hello} in a round-robin fashion across the nodes, according to
how many CPUs were listed for each node in the boot schema
file.\footnote{Note that the use of the word ``schedule'' does not
  imply that LAM has ties with the operating system for scheduling
  purposes (it doesn't).  LAM ``schedules'' on a per-node basis, so
  selecting a process to run means that it has been assigned and
  launched on that node.  The operating system is solely responsible
  for all process and kernel scheduling.}  For example, on the LAM
universe that we have previously shown in this tutorial, the following
would be launched:

\begin{itemize}
\item 1 \cmd{hello} would be launched on {\tt n0} (named
  \host{node1})
\item 1 \cmd{hello} would be launched on {\tt n1} (named
  \host{node2})
\item 2 \cmd{hello}s would be launched on {\tt n2} (named
  \host{node3})
\end{itemize}

Note that any number can be used -- if the number is greater than how
many CPUs are in the LAM universe, LAM will ``wrap around'' and
continue scheduling from the first node again.  For example, using
\cmdarg{-np 10} would result in the following schedule:

\begin{itemize}
\item 2 \cmd{hello}s on {\tt n0} (1 from the first pass, and then a
  second from the ``wrap around'')
\item 2 \cmd{hello}s on {\tt n1} (1 from the first pass, and then a
  second from the ``wrap around'')
\item 4 \cmd{hello}s on {\tt n2} (2 from the first pass, and then 2
  more from the ``wrap around'')
\item 2 \cmd{hello}s on {\tt n3}
\end{itemize}
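
Specific nodes can also be named explicitly instead of using the
\cmdarg{C} or \cmdarg{N} shorthand; the \file{mpirun(1)} man page
describes the full nomenclature.  For example, the following
(hypothetical) invocation launches one copy of \cmd{hello} on each of
{\tt n2} and {\tt n3} only:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun n2-3 hello
\end{lstlisting}
% stupid emacs mode: $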

The \file{mpirun(1)} man page contains much more information about
\cmd{mpirun} and the options available.  For example, \cmd{mpirun}
also supports Multiple Program, Multiple Data (MPMD) programs,
although it is not discussed here.  Also see
Section~\ref{sec:commands-mpirun} (page~\pageref{sec:commands-mpirun})
in this document.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{mpiexec} Command}

The MPI-2 standard recommends the use of \cmd{mpiexec} for portable
MPI process startup.  In LAM/MPI, \cmd{mpiexec} is functionally similar
to \cmd{mpirun}.  Some options that are available to \cmd{mpirun} are
not available to \cmd{mpiexec}, and vice-versa.  The end result is
typically the same, however -- both will launch parallel MPI programs;
which you should use is likely simply a personal choice.

That being said, \cmd{mpiexec} offers more convenient access in three
cases:

\begin{itemize}
\item Running MPMD programs
\item Running heterogeneous programs
\item Running ``one-shot'' MPI programs (i.e., boot LAM, run the
  program, then halt LAM)
\end{itemize}

The general syntax for \cmd{mpiexec} is:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec <global_options> <cmd1> : <cmd2> : ...
\end{lstlisting}
% stupid emacs mode: $

%%%%%

\subsubsection{Running MPMD Programs}

For example, to run a manager/worker parallel program, where two
different executables need to be launched (i.e., \cmd{manager} and
\cmd{worker}), the following can be used:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -n 1 manager : worker
\end{lstlisting}
% stupid emacs mode: $

This runs one copy of \cmd{manager}, and one copy of \cmd{worker} on
every CPU in the LAM universe.

%%%%%

\subsubsection{Running Heterogeneous Programs}

Since LAM is a heterogeneous MPI implementation, it supports running
heterogeneous MPI programs.  For example, this allows running a
parallel job that spans a Sun SPARC machine and an IA-32 Linux machine
(even though they are opposite endian machines).  Although this can be
somewhat complicated to set up (remember that you will first need to
\cmd{lamboot} successfully, which essentially means that LAM must be
correctly installed on both architectures), the \cmd{mpiexec} command
can be helpful in actually running the resulting MPI job.

Note that you will need to have two MPI executables -- one compiled
for Solaris (e.g., \cmd{hello.solaris}) and one compiled for Linux
(e.g., \cmd{hello.linux}).  Assuming that these executables both
reside in the same directory, and that directory is available on both
nodes (or the executables can be found in the \envvar{PATH} on their
respective machines), the following command can be used:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -arch solaris hello.solaris : -arch linux hello.linux
\end{lstlisting}
% stupid emacs mode: $

This runs the \cmd{hello.solaris} command on all nodes in the LAM
universe that have the string ``solaris'' anywhere in their
architecture string, and \cmd{hello.linux} on all nodes that have
``linux'' in their architecture string.  The architecture string of a
given LAM installation can be found by running the \cmd{laminfo}
command.

%%%%%

\subsubsection{``One-Shot'' MPI Programs}

In some cases, it seems like extra work to boot a LAM universe, run
a single MPI job, and then shut down the universe.  Batch jobs are
good examples of this -- since only one job is going to be run, why
does it take three commands?  \cmd{mpiexec} provides a convenient way
to run ``one-shot'' MPI jobs.

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpiexec -machinefile hostfile hello
\end{lstlisting}
% stupid emacs mode: $

This will invoke \cmd{lamboot} with the boot schema named
``\file{hostfile}'', run the MPI program \cmd{hello} on all available
CPUs in the resulting universe, and then shut down the universe with
the \cmd{lamhalt} command (which we'll discuss in
Section~\ref{sec:getting-started-lamhalt}, below).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{mpitask} Command}

The \cmd{mpitask} command is analogous to the sequential Unix command
\cmd{ps}.  It shows the current status of the MPI program(s) being
executed in the LAM universe, and displays primitive information about
what MPI function each process is currently executing (if any).  Note
that in normal practice, the \cmd{mpimsg} command only gives a
snapshot of what messages are flowing between MPI processes, and
therefore is usually only accurate at that single point in time.  To
really debug message passing traffic, use a tool such as a message
passing analyzer (e.g., XMPI) or a parallel debugger (e.g.,
TotalView).

\cmd{mpitask} can be run from any node in the LAM universe.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \icmd{lamclean} Command}

The \cmd{lamclean} command completely removes all running programs
from the LAM universe.  This can be useful if a parallel job crashes
and/or leaves state in the LAM run-time environment (e.g., MPI-2
published names).  It is usually run with no parameters:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamclean
\end{lstlisting}
% stupid emacs mode: $

\cmd{lamclean} is typically only necessary when developing / debugging
MPI applications -- i.e., programs that hang, messages that are left
around, etc.  Correct MPI programs should terminate properly, clean up
all their messages, unpublish MPI-2 names, etc.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Shutting Down the LAM Universe}
\label{sec:getting-started-lamhalt}

When finished with the LAM universe, it should be shut down with the
\icmd{lamhalt} command:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamhalt
\end{lstlisting}
% Stupid emacs mode: $

In most cases, this is sufficient to kill all running MPI processes
and shut down the LAM universe.

However, in some rare conditions, \cmd{lamhalt} may fail.  For
example, if any of the nodes in the LAM universe crashed before
running \cmd{lamhalt}, \cmd{lamhalt} will likely time out and
potentially not kill the entire LAM universe.  In this case, you will
need to use the \icmd{lamwipe} command to guarantee that the LAM
universe has shut down properly:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamwipe -v hostfile
\end{lstlisting}
% Stupid emacs mode: $

\noindent where \file{hostfile} is the same boot schema that was used to
boot LAM (i.e., all the same nodes are listed).  \cmd{lamwipe} will
forcibly kill all LAM/MPI processes and terminate the LAM universe.
This is a slower process than \cmd{lamhalt}, and is typically not
necessary.