		      Portable MPI Model Implementation

        		  Version 1.1.2, February 1999

				  Bill Gropp
				  Rusty Lusk
				Debbie Swider
				Rajeev Thakur
			 Argonne National Laboratory


Major Changes

This release is primarily a bug fix release, particularly for the various
scripts (mpicc, configure, etc.).  There are now separate configure scripts
for mpe and examples/test (in fact there have been for some time, since these
parts can be used with any MPI implementation, but MPICH itself now uses
them).

This version includes support for viewing the message queues with debuggers
such as Totalview that know how to access the generic queue routines.  These
are dynamically linked routines, and require that the MPICH configure figure
out how to build a shared library for your system.  If MPICH does NOT figure
this out on your system, please send mail to mpi-bugs@mcs.anl.gov telling us
what commands are needed (see mpich/util/makesharedlib for an example).  In
addition, if you intend to install MPICH for others to use (with make
install), you need to configure with the PREFIX value that you will be using:
    configure ... -prefix=PREFIX
in order for MPICH to be able to tell the debugger where to find the message
queue DLL (dynamically linked library).

A list of known problems and issues is in the file KnownBugs.

A change log for all changes from the previous release is available from
http://www.mcs.anl.gov/mpi/mpich/r1_1_2changes.html .

Miscellaneous issues:

Using NFS with MPI-IO:

NFS requires special care and configuration in order to get correct behavior
when multiple hosts access or try to lock the same file.  Without this special
care, NFS fails silently (for example, a file-lock system call will succeed,
but the actual file lock will not be correctly handled; this is considered a
feature, not a bug, of NFS.  Go figure).  If you need to use NFS, then you
should do the following:
    Make sure that you are using version 3 of NFS
    Make sure that attribute caching is turned off
This will have various negative consequences (automounts won't work, some
other operations will be slower).  The up side is that file operations will be
correctly implemented.  This is an instance of "do you want it fast or
correct; you can't have both".
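
For example, on many Unix systems the corresponding mount options look
something like the following (the exact option names vary between systems,
and the server name and paths here are hypothetical; check your mount man
page):

    mount -o vers=3,noac myserver:/export/home /home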

Viewing the manuals with Pageview (Solaris)

pageview has trouble with the PostScript files for the installation and user
guides.  To work around this, choose the "Ignore PostScript Structuring
Comments" option under "Properties" in pageview.

Also, some programs apparently don't do a good job of displaying these files.
If you are having this problem, you should try ghostscript instead.

Connection failures on LINUX

We've had a number of problem reports about programs running on LINUX suddenly
failing with lost TCP connections.  We believe that this is a problem in how
the LINUX TCP implementation decides that a connection has failed (even when
it hasn't), and reflects a general weakness in TCP (more precisely, TCP's
definition of "reliable" is not the same as the one we and most users expect,
and that difference causes trouble).  This is on our list of things to fix in a
future release.

What's next:

We continue to enhance MPICH with a goal of full MPI-2 capability.  In
addition, we continue to explore ways to improve the performance of MPICH.  In
the queue are improved datatype handling and a new sockets implementation.  In
the longer term, a new ADI with better support for multithreaded applications
and new network approaches will be introduced.  One area where we'd like
contributions is better implementations of the collective routines.  Please
contact us if you'd like to contribute (send mail to gropp@mcs.anl.gov).

        		  Version 1.1.1, July 1998

Major Changes

The biggest change is the inclusion of MPI-1.2 conformant C++ bindings (these
are different from the experimental bindings provided in previous versions
of MPICH) and a major part of the MPI-2 I/O functions.  The implementation of
the C++ bindings was provided by the University of Notre Dame; the MPI-2 I/O
functions (ROMIO) were provided by Rajeev Thakur of ANL.  Limitations of this
version of ROMIO can be found in the file romio/README and in 
romio/doc/users-guide.ps.gz.  Each of these packages may be used with MPI
implementations other than MPICH; both are included in the MPICH distribution
to simplify building a state-of-the-art MPI environment.
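
As a taste of the MPI-2 I/O interface, here is a minimal C sketch using the
ROMIO routines (our illustration, not code from the distribution; the file
name "testfile" is arbitrary):

    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        MPI_File   fh;
        MPI_Status status;
        int        buf[100], i, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < 100; i++) buf[i] = rank;
        /* Open collectively; then each process writes 100 ints at
           its own offset in the shared file. */
        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, rank * 100 * sizeof(int), buf, 100,
                          MPI_INT, &status);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }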

Note that both ROMIO and the C++ interface are quite demanding of the system.
ROMIO in particular requires a correct file system; we have had many problems
with incorrect behavior of NFS version 2 (often seen as a hang inside of
fcntl); NFS version 3 seems to work.  The C++ interface requires a fairly
complete C++ implementation, including system header files.  LINUX and IRIX 5,
for example, do not seem to have all the header files that are required.
You can turn off building ROMIO and C++ with -noromio and -noc++ respectively.

MPI library name:  We have changed the default name of the MPICH MPI library
from libmpi.a to libmpich.a .  As more and more vendors provide their own
platform-specific MPIs, we have found that many users encounter problems
when linkers find libmpi.a in system directories and preferentially choose it
over the -Ldirectory_path -lmpi version of MPICH.  Using the mpich name
avoids this confusion.  Users who use either the mpicc/mpif77 scripts
or the mpireconfig scripts will not need to change their source Makefiles
or scripts.

Minor Changes

There is now partial support for Fortran 90.  Your Fortran 77 and Fortran 90
compilers must produce compatible code (for example, the external names as
seen by the linker must be the same), and you will need to specify several
features of your Fortran 90 compiler on the configure line. Only the most
basic MPI module is defined (MPICH does not yet support the "extended Fortran
support" as defined in the MPI standard).

Datatypes are aligned to the largest "natural" unit.  This is a common choice
for many compilers, but isn't universal.  We continue to recommend the use
of MPI_UB to ensure that the datatype's extent is as large as you think the
structure is.
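
For example, the recommended pattern looks like this (a sketch; the struct
and function names are ours, not part of MPICH):

    #include "mpi.h"

    struct pair { int a; char b; };

    /* Build an MPI datatype for struct pair, using MPI_UB to pin the
       extent to sizeof(struct pair), whatever alignment MPI assumes. */
    MPI_Datatype make_pair_type(void)
    {
        struct pair  sample;
        int          blens[3] = {1, 1, 1};
        MPI_Datatype types[3] = {MPI_INT, MPI_CHAR, MPI_UB};
        MPI_Aint     displs[3], base;
        MPI_Datatype pairtype;

        MPI_Address(&sample, &base);
        MPI_Address(&sample.a, &displs[0]);
        MPI_Address(&sample.b, &displs[1]);
        displs[0] -= base;
        displs[1] -= base;
        displs[2] = sizeof(struct pair);   /* the MPI_UB marker */
        MPI_Type_struct(3, blens, displs, types, &pairtype);
        MPI_Type_commit(&pairtype);
        return pairtype;
    }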

Important note about Upshot/Nupshot

A new version, Jumpshot, has been written in Java.  We have found that tcl/tk
is far too unstable to base software on, and we can no longer afford to make 
the constant changes needed to keep upshot and nupshot compatible with every 
change in tcl/tk.  The Jumpshot version has new features and should be
significantly easier to maintain.

                        Version 1.1, April 1997

Major Changes

The biggest change is an update of the internals of MPICH.  A major part of
this is a change to a new "abstract device (ADI-2)" that MPICH uses to achieve
portability.  This change has allowed us to fix some long-standing bugs and 
to offer a higher-performing interface.  However, some devices have not been
updated to the new interface, and are not available.  In many cases, this
is because we have no access to the machines (e.g., Ncube, CM5) or the
previous devices were produced by vendors who have not updated them (Intel
NX, Meiko).  Other devices, such as the t3d or ch_nexus devices, are not yet
ready.  Rather than wait for all of the devices to be updated, we are
releasing MPICH 1.1 now, and providing "unbundled" devices later.  See the
MPICH web page (http://www.mcs.anl.gov/mpich/download.html) for details and
for what is currently available.

All of the MPI constants (except MPI_BOTTOM) are now constants in the
language, not just unchanging between MPI_Init and MPI_Finalize as in MPI 1.0.
This required a great many changes to the internals of MPICH; unfortunately
there was no way to avoid this and comply with the most recent (1.1) version
of the MPI standard.  All MPI programs must be recompiled.

Minor Changes

On IBM SPx systems, the program spxcp is provided to copy the executable
using the high performance switch.  This can significantly shorten startup
time when using large numbers of processes.  It may also eliminate "Cannot 
configure system calls" errors that users quite properly are perplexed about
(these come from the IBM runtime system, not MPICH).  Currently, only
mpirun.anlspx makes use of this feature.  

There are many bug fixes and enhancements to the code.  

    Handling of MPI_UB and MPI_LB types has been corrected
 
    Code is much cleaner - few warnings are generated even at paranoid
    warning settings (and many of these come from system header files, 
    such as X11R5 headers).

    Executables built for the ch_p4 and ch_shmem devices can be run on 
    a single processor without using mpirun - just run them like any
    other program.

Important Note

If you have any problems installing or using MPICH, first check the
installation and users manuals; both contain troubleshooting sections.
If you don't find the answer there, send mail to mpi-bugs@mcs.anl.gov.
Do NOT use the newsgroup comp.parallel.mpi ; this newsgroup is for discussions
about MPI, not any particular implementation of MPI.

			Version 1.0.13 (July 26, 1996)

This is a joint-effort project between Argonne (Bill Gropp and Rusty Lusk) and
Mississippi State (Tony Skjellum and Nathan Doss).  Of course the hardest work
was getting the specification right, which was done by the MPI Forum as a
whole.  Information on MPI can be found on the web through
http://www.mcs.anl.gov/mpi . 

This version represents a major restructuring of the internals of MPICH and
the introduction of a new abstract device interface (ADI) which is simpler in
its implementation while offering enhanced functionality.

In addition, a number of parts of the implementation, such as datatypes,
attributes, and topologies, have been modularized so that they can be used and
replaced separately from the rest of the MPICH implementation (note that there
are still many cross-definitions, so don't expect to be able to use the
attribute mechanism directly in your own code).

Error handling has been improved, particularly for errors caught by an MPI
routine called by another MPI routine.  The man pages now indicate (many of)
the errors detected by each MPI routine.  

MPICH is now supported by the Totalview debugger (see "Debugging" below).
This provides a truly parallel debugging environment for the ch_p4 device.

Major Changes

By default, configure now selects -mpe -mpedbg -nodevdebug .  You can get the
previous behavior with -nompe -nompedbg -devdebug .

SGI users should check the installation manual.  There is new support for
the various code modes (-64, -n32, and -32) and new architecture types.
To get 64 bit libraries, one should pass the -64 flag to the compilers and
linker with

   configure -cc='cc -64' -fc='f77 -64' ....

MPI_Cancel is now supported.

                           Getting Started

The "Quick Start" section of the installation guide (found in
mpich/doc/install.ps.Z) gives the following steps for installation and minimal
testing:

  Get mpich.tar.Z or mpich.tar.gz by anonymous ftp from info.mcs.anl.gov
       in the directory pub/mpi.

  (If this file is too big, you can get it in pieces from the subdirectory
   mpisplit.  If you are in Europe, look at 

    ftp://ftp.rus.uni-stuttgart.de/pub/parallelrechner/MPI/ARGONNE
    ftp://ftp.jussieu.fr/pub9/parallel/standards/mpi/anl
    ftp://ftp.ibp.fr/pub9/parallel/standards/mpi/anl
    ftp://ftp.unix.hensa.ac.uk/pub/parallel/standards/mpi/anl
    ftp://ftp.irisa.fr/pub/mirrors/mpi/
  )

  zcat mpich.tar.Z | tar xovf -

(Some systems don't accept the o argument to tar - you can use just xvf, but
you should upgrade to a better system)

OR

  gunzip -c mpich.tar.gz | tar xovf -

THEN

  cd mpich

  configure -arch=sun4 -device=ch_p4 (for example)

( 
  if you get errors from sed, try using configure.2 instead of configure; e.g.
  configure.2 -arch=sun4 -device=ch_p4
)
  make

  On workstation networks, or to run on a single workstation, edit
    the mpich/util/machines/machines.sun4 file to reflect your local host
    names.  (sun4 is used here as an example; most other workstations are
    supported as well.)  On parallel machines, this step is not needed.  See
    the README file in the mpich/util/machines directory.

  cd examples/basic

  make cpi

  mpirun -np 4 cpi

  The installation and user's guides (in doc/install.ps.Z and guide.ps.Z) 
  contain helpful information on diagnosing problems; please check there 
  before sending mail to mpi-bugs@mcs.anl.gov (see below on how to
  submit bug reports).

  More information on special configurations, installing MPICH, and 
  testing it is in the installation and users guides.
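
  If you would like a first program of your own to try with these steps,
  here is a minimal MPI C program (our sketch, in the spirit of cpi but not
  the cpi source itself):

      #include <stdio.h>
      #include "mpi.h"

      int main(int argc, char *argv[])
      {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          printf("Hello from process %d of %d\n", rank, size);
          MPI_Finalize();
          return 0;
      }

  Compile it with the mpicc script and run it with mpirun, for example
  (hello.c is a hypothetical file name):

      mpicc -o hello hello.c
      mpirun -np 4 hello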

			Changes since Version 1.0.12

1.  A new device, ch_nexus, has been added.  For more information, see
    http://www.mcs.anl.gov/people/geisler/projects/mpi.html

2.  Many fixes to the handling of heterogeneous systems.  

3.  The ch_p4 device is now more tolerant of buggy implementations of rsh.
    This is particularly true of Kerberized versions of rsh, which contain
    a serious bug (if you had this problem, tell your vendor to read
    the man page for select and make their rsh use select correctly).

4.  Fixed some bugs in the ch_shmem device, particularly in error handling.

5.  The test suite is more extensive.

6.  Various and sundry bugs have been fixed.

                            Known Problems

    On Cray PVP (XMP,YMP,C90) and Cray T3D, some operations with character
    data are not supported from Fortran.  This causes
    examples/test/pt2pt/structf to fail.  The 1.0.12 release contains code
    from Laurie Costella of CRAY that should fix some of the problems.

    Some systems may occasionally fail (die or deadlock) in the ch_p4 device 
    when creating a new connection.  This is caused by interactions between
    user code and code used in the signal handler that the ch_p4 device uses
    to dynamically establish new connections.  To date, we have only seen
    this problem on Solaris.  We are working on a fix for this, but it
    will involve a fairly significant rearrangement of the code.
    You should be able to work around this by sending 0 length messages to
    all of the processes that you will be communicating with before
    doing anything else.
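
    As an illustration of that workaround (our sketch, not code from the
    distribution), each process can exchange a zero-length message with
    every other process right after MPI_Init:

        #include "mpi.h"

        /* Establish all ch_p4 connections up front by exchanging
           zero-length messages; the lower rank sends first in each
           pair to avoid deadlock. */
        void warm_up_connections(void)
        {
            int rank, size, i;
            MPI_Status status;

            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            for (i = 0; i < size; i++) {
                if (i == rank) continue;
                if (rank < i) {
                    MPI_Send(NULL, 0, MPI_BYTE, i, 0, MPI_COMM_WORLD);
                    MPI_Recv(NULL, 0, MPI_BYTE, i, 0, MPI_COMM_WORLD,
                             &status);
                } else {
                    MPI_Recv(NULL, 0, MPI_BYTE, i, 0, MPI_COMM_WORLD,
                             &status);
                    MPI_Send(NULL, 0, MPI_BYTE, i, 0, MPI_COMM_WORLD);
                }
            }
        }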

    Fortran LOGICAL data is not handled correctly on some HETEROGENEOUS
    systems (basically, there is no XDR type for Fortran LOGICAL, and the
    current code doesn't convert to/from a standard representation).

    Don't forget to check the on-line buglist at the MPICH home page
    http://www.mcs.anl.gov/mpi/mpich/index.html 
    
What is here
============

This directory contains files related to the Test Implementation of the MPI
(Message-Passing Interface) Draft Standard.  They are:

mpich.tar.Z          - the implementation itself, sufficient to run on the
                       following machines: Intel NX (i860, Delta, Paragon), 
	               IBM SP1 (using EUI or EUIH), Meiko CS-2, 
	               IBM SP2 (using MPL), 
                       CM5 (using CMMD) and networks of workstations 
                       (Sun, SGI, RS6000, HP, DEC Alpha, ...), and symmetric
                       multiprocessors (SGI, Convex, Sun).
                       This is a link to the most recent release; you will
                       also see files of the form mpich-1.x.x.tar.Z; these
                       are the actual releases.
                       New releases are indicated by a change in the last 
                       number; the value when this readme was written
                       was 2 (as in 1.1.2).

You do not need anything else if you are using either the various MPP versions
(Cray T3D, Intel Paragon, TMC CM5, IBM SP2) or the integrated version of p4
(-device=ch_p4; see below).

Also of interest for workstation clusters is

sut.tar.Z             - Scalable Unix Commands.  This provides versions of
                        common Unix commands, such as ls or ps, that run 
                        in parallel on a selection of machines.  This is
                        a prototype under development and is a Ptools
                        consortium project; your feedback is welcome.
                        Currently only tested on SunOS 4.1.3 and AIX 3.2.5.
                        This is NOT part of MPI.

Installing MPI
==============

Create your mpi root directory (such as ~/mpitest ).  

Ftp the files that you need to that directory.  In particular, on most 
machines you only need the one file mpich.tar.Z .

Uncompress the files (uncompress mpich.tar.Z etc) and untar them
(tar xf mpich.tar etc).  ("zcat mpich.tar.Z | tar xf - "
also works)

To create the mpi implementation, uncompress and untar the distribution, which
will give you a directory called mpich.  In this directory, you will run
configure and then make.  To see the options for configure, say

    configure -u

On many systems, you can now just say:

    configure
    make

This will try to pick an appropriate set of options.  On platforms that
do cross-compilation (e.g., CM5, Paragon, or SP2 front ends), it is necessary
to give the specific device and architecture.  These are described below.
While MPICH is building, register your copy so that you can receive
notification about future releases.  Information on registering is provided
when you run configure.

****************************************************************************
If you are building MPICH for a network of workstations, and you may be
running some form of crippled remote shell (e.g., under Kerberos or AFS), you may
need to specify an alternate remote shell command with -rsh=commandname
to configure.  See the documentation for more information.
****************************************************************************

To build the network version for sun4's running SunOS 4.x, give the command

    configure -device=ch_p4 -arch=sun4
    make

To build the network version for sun4's running Solaris, give the command

    configure -device=ch_p4 -arch=solaris
    make

To build the network version for Intel x86s running Solaris, give the command

    configure -device=ch_p4 -arch=solaris86
    make

For SGI workstations, use

    configure -arch=sgi -device=ch_p4
    make 

For an SGI Challenge or Power Challenge, using shared-memory for messages, use

    configure -arch=sgi -device=ch_shmem
    make

For an SGI Challenge or Power Challenge, using shared-memory for messages and
sockets to other machines, use

    configure -arch=sgi -device=ch_p4 -comm=shared

If you need to generate a particular version that corresponds to the -32,
-n32, or -64 compiler/linker options on SGI, use the architectures
IRIX32, IRIXN32, or IRIX64 respectively instead of sgi.
Specifically, use the following for an R10000 SGI:

    configure -arch=IRIX64 \
	-cc="cc -64 -mips4 -r10000" \
	-fc="f77 -64 -mips4 -r10000" \
	-opt="-O2" \
	-device=ch_shmem

    configure -arch=IRIXN32 \
	-cc="cc -n32 -mips4 -r10000" \
	-fc="f77 -n32 -mips4 -r10000" \
	-opt="-O2" \
	-device=ch_shmem

    configure -arch=IRIX32 \
	-cc="cc -32" \
	-fc="f77 -32" \
	-opt="-O2" \
	-device=ch_shmem

(The optimization level is optional; -O2 has worked for some users.  Be
careful of aggressive optimization, particularly in the mpid/ch_shmem code.)

For a Convex Exemplar, please get the official version from Convex/HP.
This is based on MPICH, but has been tuned for better performance on the
Exemplar.  If for some reason you want to use the shared memory version
of MPICH on the Convex, use

    configure -arch=hpux -device=ch_shmem
    make

For a Sun multiprocessor (Solaris), use

    configure -arch=solaris -device=ch_shmem
    make

To build for NX on an iPSC/2 or iPSC/860, use

    configure -arch=intelnx -device=ch_nx
    make

To build for NX on an Intel Paragon, use

    configure -arch=paragon -device=ch_nx
    make
    Fortran programs will need to use an ABSOLUTE path for the mpif.h
    include file, due to a bug in the if77 compiler (it searches include
    directories in the wrong order).

To build with the native nx device on a Paragon, use
    (Not currently supported)
    configure -arch=paragon -device=nx
    make

To build for the Meiko CS-2, use
    (Not currently supported)
    configure -arch=meiko -device=meiko
    make

To build for UNICOS on a CRAY C90, CRAY Y-MP, or CRAY J90
(This will use TCP sockets for communication.)

    configure -device=ch_p4 -arch=cray
    make

To build for the Ncube, use
    (Not currently supported)
    configure -arch=ncube -device=ch_nc
    make

    NOTE: The ncube is currently unsupported as we have no access to one
    at this time.  In particular, there may be problems with the mpirun 
    command.  

To build for the CM-5, use

    configure -arch=sun4 -device=ch_cmmd
    make
    (Not extensively tested; bug reports, particularly with fixes, encouraged)

To build for SP1, SP2, or rs6000's using MPL, give the command

    configure -device=ch_mpl -arch=rs6000
    make

If you have version 2 of MPL, IBM has included their own MPI with it.  We 
recommend that you use that instead.  If you need to use ours, just use

    configure -device=ch_mpl -arch=rs6000 
    make

The Makefile contains some sample targets that run configure with the
correct options for our installation; you may find these instructive.

Some useful additional flags you can give to configure include:

-mpe	Build the MPE extensions to mpi. 

-mpedbg Enable support for starting a debugger when an MPI error or
        signal occurs.  Currently supported only for systems that can
        start xterms running dbx.

-nof77	Don't build the Fortran interface. This option has only been
	lightly tested, but should work.

-prefix=<location>
	Use <location> as the top-level directory to install MPI in
	when installing. This option can also be set at install time
	by specifying PREFIX=<location> to the make install.

To use mpirun on a homogeneous network, create the file
util/machines/machines.<arch> as a list of machines that mpirun can use to
run jobs.  Example machine files are installed in util/machines.anl.
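
A machines file is simply a list of host names, one per line; for example
(hypothetical hosts):

    sun1.foo.edu
    sun2.foo.edu
    sun3.foo.edu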

After building a specific version as above, you can install it with

    make install		  

if you used -prefix at configure time, or,

    make install PREFIX=/usr/local/mpi    (or whatever directory you like)

if you did not, or want to override the configure-time selection.
If you intend to leave MPI where you built it, you should NOT install it 
(install is used only to move the necessary parts of a built MPICH to another
location).

The installed copy will have the include files, libraries, man pages, and a
few other odds and ends, but not the whole source tree and examples.  There is
a small examples directory for testing the installation and a
location-independent Makefile built during installation, which users can copy
and modify to compile and link against the installed copy.

To rebuild a different version, say 

    make clean

to clean up everything, including the Makefiles constructed by configure,
and then run configure again with the new options.

Testing MPI
===========
The directory "examples/test" contains makefiles and scripts for testing 
the implementation.  If you have set up mpirun (see "Running Programs" below)
correctly, then

cd examples/test
make testing

will run a variety of test programs.  

If you have not set up mpirun, you can still use the test programs in these
directories; you'll just have to write your own script to run them.  
(And send mail to mpi-bugs@mcs.anl.gov describing what problems you had with
mpirun.)

Reporting Bugs
==============
If you have trouble, first check the installation and user manuals (in 
mpich/doc/{guide,user}.ps.Z) .  Next, check the on-line bug list at
http://www.mcs.anl.gov/mpi/mpich .  Finally, if you are still having problems,
send

   The type of system (often, uname -a)
   The output of configure
   The output of make
   Any sample programs or tests

to mpi-bugs@mcs.anl.gov .

Running Programs
================

How you start an MPI program depends on the type of device (the -device 
option to configure).  You can also use mpirun to start jobs on many
machines. Here are the choices:

*** mpirun *** 

"mpirun" is a shell script that attempts to hide the differences in starting
jobs for various devices from the user. Mpirun attempts to determine what kind
of machine it is running on and start the required number of jobs on that
machine. On workstation clusters you must supply a file that lists the
different machines that mpirun can use to run remote jobs, or specify such a
file each time you run mpirun with the -machinefile option. The default file
is util/machines/machines.<arch>.


mpirun typically works like this:
mpirun -np <number of processes> <program name and arguments>

If mpirun can't determine what kind of machine you are on, and your machine
is supported by the MPI implementation, you can use the -mr_machine and
-mr_arch options to tell it what kind of machine you are running on. The
current valid values for mr_machine include:

              meiko     (the ch_meiko device on the meiko)
              paragon   (the nx or ch_nx device on a paragon not running NQS)
              p4        (the ch_p4 device on a workstation cluster)
              sp1       (ch_mpl on ANL's sp2)
	      execer    (a custom script for starting ch_p4 programs
			 without using a procgroup file. This script
                         currently does not work well with interactive
			 jobs)

You should only have to specify mr_arch if mpirun does not recognize
your machine, the default value is wrong, and you are using the p4 or
execer devices. Other options include:

    -h   This help
    -machine <machine name>
         use startup procedure for <machine name>
    -machinefile <machine-file name>
	 Take the list of possible machines to run on from the
         file <machine-file name>
    -np <np>
         specify the number of processors to run on
    -nolocal
         don't run on the local machine (only works for the ch_p4 device)
    -e   Use execer to start p4 programs on workstation
         clusters
    -pg  Use a procgroup file to start p4 programs, not execer
	 (default)
    -leave_pg
         Don't delete the P4 procgroup file after running
    -t   Testing - do not actually run, just print what would be
         executed
    -v   Verbose - throw in some comments

If you don't want to use mpirun, use the instructions below, depending
on which device you configured your program for:
 
*** device=ch_p4 ***
The processes are specified by a p4 procgroup file.

Procgroup files contain information about which machines a network of
processes should run on.  

A typical procgroup file looks like:

local 0
sun1.foo.edu 1 /home/me/myprog
sun2.foo.edu 1 /home/me/myprog
sun3.foo.edu 1 /home/me/myprog

to run with 4 processes: one on the machine where the program is executed
(always called "local"), and the others on the machines sun1, sun2, and sun3.
The 0 in "local 0" means that no additional processes run on the local machine.

To run, say on sun0, call the above file something like myprog.pg and do

  myprog -p4pg myprog.pg

More elaborate and flexible procgroup files are possible.  See the p4 manual
for details.  Running on parallel machines often requires a procgroup file
containing "local 15" (for 16 = 15 - 1 processes).  The job also will need
to be started with the appropriate method for a parallel job on that system.

*** device=nx or device=ch_nx ***
Just start the job like any Intel NX program

*** device=ch_mpl ***
Just start the job like any MPL program.  For example, 

poe <programname> <commandline args>

where your environment contains the appropriate definitions.


Debugging
=========

MPICH offers a number of ways to debug programs.  You can start at least the
process with rank 0 in MPI_COMM_WORLD under a debugger by giving mpirun the
-dbx, -gdb, or -xxgdb flag for the dbx, gdb, and xxgdb debuggers respectively.
You can also use the Totalview debugger via the -tv or -totalview flags.
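
For example, to put the rank 0 process of cpi under gdb (illustrative,
combining the flags above with the usual mpirun arguments):

    mpirun -np 4 -gdb cpi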

*** Using Totalview *** 

If you start the program with -tv or -totalview, then the totalview debugger
is invoked on the parallel program. Totalview understands the ch_p4 startup
mechanism and various MPICH internal data structures, allowing it to acquire
all of the processes in the job automatically, and display the state of the
message queues.

mpirun uses the value of the environment variable TOTALVIEW to determine the
command to prepend to the ultimate command line when the -tv flag is
present.  If this environment variable is not set, then "totalview" is
used.  By setting this environment variable you can invoke totalview with
additional flags, or specify exactly where totalview is to be found.  However,
to perform remote debugging you must ensure that the totalview server (tvdsvr)
can be found on your path as it is set immediately after you log in, since the
server will be started via rsh.  (Alternatively, you can use the "Server
Launch Window" to modify the command totalview uses to start its servers.)
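
For example (csh syntax; the installation path is hypothetical):

    setenv TOTALVIEW /usr/local/totalview/bin/totalview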

Details of Totalview can be found at http://www.dolphinics.com/ .