MPI/Myrinet
-----------
Grid Engine Parallel Support for MPI/Myrinet
--------------------------------------------
This file describes how to set up a parallel environment integration
which supports running distributed parallel MPI jobs under Grid Engine
using the MPICH/GM software on clusters that use Myrinet cards for
communication.
Content
-------
1) Content of this directory hierarchy
2) mpi.template
3) mpich.template
4) mpich_multi.template
5) startmpi.sh
6) stopmpi.sh
7) sge_mpirun
8) Queue Configuration
9) gmps
10) Notes
11) Copyright
1) Content of this directory hierarchy
--------------------------------------
This directory contains the following files and directories:
README this file
README.x the README for MPI/Myrinet (pre MPICH-GM 1.2.4..8a release)
startmpi.sh startup script for MPI/Myrinet
startmpi.sh.x startup script for MPI/Myrinet (pre MPICH-GM 1.2.4..8a release)
stopmpi.sh shutdown script for MPI/Myrinet
mpi.template an MPICH/Myrinet PE template configuration for Grid Engine
(loose integration)
mpich.template an MPICH/Myrinet PE template configuration for Grid Engine
(tight integration)
mpich_multi.template an MPICH/Myrinet PE template configuration for mixed
MPI and OpenMP programs
gmps utility program for reporting Myrinet port usage
hostname a wrapper for the hostname command
sge_mpirun MPIRUN command replacement
sge_mpirun.x MPIRUN command replacement (pre MPICH-GM 1.2.4..8a release)
Please refer to the "Installation and Administration Guide" Chapter "Support
of Parallel Environments" for a general introduction to the Parallel
Environment Interface of Grid Engine.
2) mpi.template
---------------
Use this template as a starting point when establishing a parallel
environment for MPI/Myrinet. You need to replace
<a_list_of_parallel_queues>, <the_number_of_slots>, <your_sge_root>, and
<path_to_mpirun_command> with the appropriate information. See the qconf(1)
and qmon(1) man pages for additional information on how to create parallel
environments.
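For illustration, a filled-in mpi.template could end up looking roughly
like the following 'qconf -sp' output (queue names, slot count, and paths
are examples only; attributes not shown keep the values from the template):

  pe_name          mpi_gm
  queue_list       para1.q para2.q
  slots            32
  user_lists       NONE
  xuser_lists      NONE
  start_proc_args  /gridengine/mpi/myrinet/startmpi.sh $pe_hostfile /opt/mpich-gm/bin/mpirun.ch_gm
  stop_proc_args   /gridengine/mpi/myrinet/stopmpi.sh
  allocation_rule  $fill_up
  control_slaves   FALSE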
Grid Engine offers an additional interface that allows a tighter
integration with the public domain MPI port for Myrinet cards
(MPICH/GM). Tight integration means that all tasks
of your MPICH application are under full control of Grid Engine.
This is the prerequisite for these additional benefits:
- full accounting for all tasks of MPI jobs
- resource limits are effective for all tasks
- all tasks are started with the appropriate nice value which was
configured as 'priority' in the queue configuration
3) mpich.template
-----------------
Use this template as a starting point when establishing a parallel
environment for MPICH with tight integration. You need to replace
<a_list_of_parallel_queues>, <the_number_of_slots>, <your_sge_root>,
and <path_to_mpirun_command> with the appropriate information.
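The tight integration template differs from the loose one mainly in that
control_slaves is enabled, so that slave tasks are started via qrsh and
stay under Grid Engine's control. As a sketch, the relevant attributes
might read (using the same placeholders as above):

  start_proc_args  <your_sge_root>/mpi/myrinet/startmpi.sh $pe_hostfile <path_to_mpirun_command>
  stop_proc_args   <your_sge_root>/mpi/myrinet/stopmpi.sh
  allocation_rule  $fill_up
  control_slaves   TRUE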
Here is a list of problems for which tight integration provides solutions:
- resource limits are also enforced for tasks on slave hosts
- resource consumption on slave hosts is accounted for
- there is no need to write a customized terminate method to ensure
that the whole job is finished on qdel
Here is a list of problems which are not solved by the tight integration:
- a partially finished application does not automatically trigger the
end of the job. However, the MPICH/GM mpirun.ch_gm command has an option
called --gm-kill which handles this case nicely. A default value for the
--gm-kill option can be set in the sge_mpirun command.
4) mpich_multi.template
-----------------------
Use this PE template for running mixed mode MPI and OpenMP programs. A
single MPI task will be allocated on each host, allowing multiple
OpenMP threads per MPI task. The PE allocation_rule of 2 indicates that
2 slots should be allocated per host. The number in the allocation rule
should be equal to the number of CPUs per host.
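For example (the PE name below is a placeholder), on a cluster with 2 CPUs
per host a request such as

  qsub -pe mpich_gm_multi 8 myjob.sh

is filled with 4 hosts at 2 slots each, i.e. 4 MPI tasks which may each run
2 OpenMP threads (typically controlled by setting OMP_NUM_THREADS=2 in the
job script).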
5) startmpi.sh
--------------
The starter script 'startmpi.sh' takes command line arguments which are
configured with either qmon or qconf. The first one is the path to the
"$pe_hostfile" that startmpi.sh transforms into an MPI machine file. On
successful completion startmpi.sh creates a machine file in
$TMPDIR/machines to be passed to "mpirun.ch_gm" at job start. $TMPDIR is
a temporary directory created and removed by the Grid Engine execution
daemon.
The second command line argument of the starter script should be the
path of the MPICH/GM mpirun.ch_gm command.
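For illustration (hostnames are invented and the formats are simplified),
a $pe_hostfile granting two slots on each of two hosts

  node01 2 para.q UNDEFINED
  node02 2 para.q UNDEFINED

would typically be turned into a $TMPDIR/machines file that lists each
host once per allocated slot:

  node01
  node01
  node02
  node02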
6) stopmpi.sh
-------------
The stop script 'stopmpi.sh' removes files in $TMPDIR created by
startmpi.sh.
7) sge_mpirun
-------------
The sge_mpirun command should be used in the user's job script to
start the MPI program. The sge_mpirun command ensures that the right
number of tasks get started on the hosts scheduled by Grid Engine. The
sge_mpirun command is installed in $SGE_ROOT/mpi/myrinet. You may
want to create a link to it in $SGE_ROOT/bin/<arch>/sge_mpirun.
The sge_mpirun command will then be in the user's PATH, if they
have sourced the Grid Engine settings file.
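A minimal job script using it might look like this (the PE name, slot
count, and program are placeholders; further mpirun.ch_gm options can be
appended as needed):

  #!/bin/sh
  #$ -pe mpich_gm 8
  #$ -cwd
  sge_mpirun ./a.out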
An alternative to using the sge_mpirun command is for the user
to execute the MPICH/GM mpirun command directly and provide the machines
file created by Grid Engine, for example:
mpirun.ch_gm --gm-f $TMPDIR/machines --gm-kill 15 -np $NSLOTS a.out
8) Queue Configuration
----------------------
Earlier versions of this integration required a queue per processor
with the "processors" queue attribute containing a unique Myrinet port
number for each queue. This is no longer necessary.
The tmpdir queue attribute on all the parallel queues should be set
to a shared file system. The integration stores some files in the
user's TMPDIR, which must be readable on all the hosts. For instance,
if /usr/var/tmp is set up on your cluster as a shared file system, then
you can set the queue attribute tmpdir to /usr/var/tmp.
For loose integrations, the terminate_method attribute on all the
parallel queues should be set to SIGTERM. This tells Grid Engine to
terminate the MPI jobs using a SIGTERM signal instead of a SIGKILL.
This allows the mpirun.ch_gm command to "clean up" all the MPI tasks
by sending them a SIGTERM signal. The default SIGKILL will result in
the job being deleted, but the MPI tasks will continue running.
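As a sketch, for a loose integration the relevant part of each parallel
queue's configuration (edited with 'qconf -mq <queue>') would then contain
entries like (the tmpdir path is just the example from above):

  tmpdir            /usr/var/tmp
  terminate_method  SIGTERM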
Before attempting to run a job with the mpich tight integration, it
is a good idea to verify that the qrsh command works to all of your
parallel hosts. Try the command 'qrsh -l h=<hostname> ls'. One
common reason for a qrsh failure is that the rsh command is not
installed in $SGE_ROOT/utilbin/<arch>/rsh as a setuid root program.
9) gmps
-------
The 'gmps' command is a utility program to report and/or clean up
processes which are using Myrinet ports. For usage information,
type 'gmps -h'. This command uses a command called gm_board_info
which is part of the GM distribution from Myricom.
10) Notes
---------
The startmpi.sh, sge_mpirun, and gmps scripts use /bin/ksh (the
Korn shell). Make sure it is installed.
In a Kerberos environment, if you want the MPI tasks to have Kerberos
credentials, then it is critical that the user have valid forwardable
Kerberos tickets when the job is submitted. If the jobs may be queued
for a long period of time, then the user should also have renewable
tickets.
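For example, with MIT Kerberos a user can request forwardable, renewable
tickets with something like the following (flags may differ for other
Kerberos implementations):

  kinit -f -r 7d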
The Myrinet MPICH/GM software can be configured to use rsh or ssh
or a specific path to rsh or ssh. We recommend that you configure
MPICH/GM using the default rsh. This makes it easy for the integration
to override rsh and use Grid Engine's qrsh command to run the MPI tasks
under Grid Engine's control (tight integration). If your MPICH/GM
software is configured to use a hard-coded path to rsh (search for the
string "rexec" in the file mpirun.ch_gm.pl), then you will need to
either change the script in place, or copy the mpirun.ch_gm and
mpirun.ch_gm.pl scripts to a new location (e.g. $SGE_ROOT/mpi/myrinet),
modify the copied mpirun.ch_gm.pl to use $SGE_ROOT/mpi/rsh, and update
the start_proc_args attribute in your PE configuration so that it points
to the copied mpirun.ch_gm command.
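If you go the copying route, the updated PE entry might then read, for
example:

  start_proc_args  <your_sge_root>/mpi/myrinet/startmpi.sh $pe_hostfile <your_sge_root>/mpi/myrinet/mpirun.ch_gm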
11) Copyright
-------------
___INFO__MARK_BEGIN__
The Contents of this file are made available subject to the terms of the Sun
Industry Standards Source License Version 1.2
Sun Microsystems Inc., March, 2001
Sun Industry Standards Source License Version 1.2
=================================================
The contents of this file are subject to the Sun Industry Standards Source
License Version 1.2 (the "License"); You may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://gridengine.sunsource.net/Gridengine_SISSL_license.html
Software provided under this License is provided on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
WITHOUT LIMITATION, WARRANTIES THAT THE SOFTWARE IS FREE OF DEFECTS,
MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE, OR NON-INFRINGING.
See the License for the specific provisions governing your rights and
obligations concerning the Software.
The Initial Developer of the Original Code is: Sun Microsystems, Inc.
Copyright: 2001 by Sun Microsystems, Inc.
All Rights Reserved.
___INFO__MARK_END__