                                 MPI/Myrinet
                                 -----------
                 Grid Engine Parallel Support for MPI/Myrinet
                 --------------------------------------------

This file describes how to set up a parallel environment integration
that supports running distributed parallel MPI jobs under Grid Engine
using the MPICH/GM software on clusters that use Myrinet cards for
communication.

Content
-------

1) Content of this directory hierarchy
2) mpi.template 
3) mpich.template
4) mpich_multi.template
5) startmpi.sh
6) stopmpi.sh
7) sge_mpirun
8) Queue Configuration
9) gmps
10) Notes
11) Copyright


1) Content of this directory hierarchy
--------------------------------------

This directory contains the following files and directories:

   README           this file 
   README.x         the README for MPI/Myrinet (pre MPICH-GM 1.2.4..8a release)
   startmpi.sh      startup script for MPI/Myrinet
   startmpi.sh.x    startup script for MPI/Myrinet (pre MPICH-GM 1.2.4..8a release)
   stopmpi.sh       shutdown script for MPI/Myrinet
   mpi.template     an MPICH/Myrinet PE template configuration for Grid Engine
                    (loose integration)
   mpich.template   an MPICH/Myrinet PE template configuration for Grid Engine
                    (tight integration)
   gmps             utility program for reporting Myrinet port usage
   hostname         a wrapper for the hostname command
   sge_mpirun       MPIRUN command replacement
   sge_mpirun.x     MPIRUN command replacement (pre MPICH-GM 1.2.4..8a release)

Please refer to the "Installation and Administration Guide" Chapter "Support
of Parallel Environments" for a general introduction to the Parallel
Environment Interface of Grid Engine.


2) mpi.template
---------------

   Use this template as a starting point when establishing a parallel
   environment for MPI/Myrinet. You need to replace
   <a_list_of_parallel_queues>, <the_number_of_slots>, <your_sge_root>, and
   <path_to_mpirun_command> with the appropriate information. See the qconf(1)
   and qmon(1) man pages for additional information on how to create parallel
   environments.
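
   As an illustration, the placeholder substitution could be scripted
   roughly as follows. This is only a sketch: fill_pe_template is a
   hypothetical helper that is not part of Grid Engine, and it assumes
   that none of the values contain the characters '|' or '&', which
   have a special meaning in the sed expressions.

```sh
#!/bin/sh
# Sketch only: fill_pe_template is a hypothetical helper, not shipped
# with Grid Engine. It replaces the four placeholders named in this
# README and writes the result to standard output.
# Assumption: the values contain neither '|' nor '&'.
fill_pe_template() {
    # $1 template file   $2 queue list   $3 number of slots
    # $4 SGE root        $5 path to the mpirun command
    sed -e "s|<a_list_of_parallel_queues>|$2|g" \
        -e "s|<the_number_of_slots>|$3|g" \
        -e "s|<your_sge_root>|$4|g" \
        -e "s|<path_to_mpirun_command>|$5|g" "$1"
}
```

   The values below are examples only. The filled-in file can then be
   handed to qconf, for example with its -Ap option (see qconf(1)):

      fill_pe_template mpi.template all.q 16 /opt/sge \
          /usr/bin/mpirun.ch_gm > mpi.pe
      qconf -Ap mpi.pe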

   Grid Engine offers an additional interface that allows a tighter
   integration with MPICH/GM, the public domain MPI implementation
   ported for use with Myrinet cards. Tighter integration means that
   all tasks of your MPICH application are under full control of Grid
   Engine. This control is necessary for these additional benefits:

   - full accounting for all tasks of MPI jobs
   - resource limits are effective for all tasks
   - all tasks are started with the appropriate nice value, which was
     configured as 'priority' in the queue configuration


3) mpich.template
-----------------

   Use this template as a starting point when establishing a parallel
   environment for MPICH with tight integration. You need to replace
   <a_list_of_parallel_queues>, <the_number_of_slots>, <your_sge_root>,
   and <path_to_mpirun_command> with the appropriate information.

   Tight integration solves the following problems:

   - resource limits are also enforced for tasks on slave hosts
   - resource consumption on slave hosts can be accounted
   - there is no need to write a customized terminate method to ensure
     that the whole job is finished on qdel

   Tight integration does not solve the following problem:

   - Grid Engine cannot trigger the end of the job if the application
     finishes only partially. However, the MPICH/GM mpirun.ch_gm command
     has an option called --gm-kill which handles this case. A default
     value for the --gm-kill option can be set in the sge_mpirun command.


4) mpich_multi.template
-----------------------

   Use this PE template for running mixed mode MPI and OpenMP programs. A
   single MPI task will be allocated on each host, allowing multiple
   OpenMP threads per MPI task. The allocation_rule of 2 in the template
   indicates that 2 slots should be allocated per host. Change the number
   in the allocation rule so that it equals the number of CPUs per host.
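
   The resulting arithmetic can be sketched as follows. hybrid_layout is
   a hypothetical helper, not part of this integration. It assumes one
   MPI task per host and one OpenMP thread per slot on that host. How
   the task count is passed on to the MPI launcher depends on the
   scripts in use and is not shown here.

```sh
#!/bin/sh
# Sketch only: hybrid_layout is a hypothetical helper.
# Given the total slot count of the job and the slots per host (the
# integer allocation_rule), print "<mpi_tasks> <threads_per_task>".
# Assumption: one MPI task per host, one OpenMP thread per slot.
hybrid_layout() {
    nslots=$1
    per_host=$2
    if [ "$per_host" -le 0 ] || [ $((nslots % per_host)) -ne 0 ]; then
        echo "slot count $nslots is not a multiple of $per_host" >&2
        return 1
    fi
    echo "$((nslots / per_host)) $per_host"
}
```

   For a job that was granted 8 slots with an allocation_rule of 2,
   'hybrid_layout 8 2' prints "4 2": four MPI tasks with two OpenMP
   threads each. A job script could use the second number to set
   OMP_NUM_THREADS.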

5) startmpi.sh
--------------

   The starter script 'startmpi.sh' needs some command line arguments,
   which are configured with either qmon or qconf. The first argument is
   the path to the "$pe_hostfile", which startmpi.sh transforms into an
   MPI machine file. On successful completion, startmpi.sh creates a
   machine file in $TMPDIR/machines to be passed to "mpirun.ch_gm" at job
   start. $TMPDIR is a temporary directory created and removed by the
   Grid Engine execution daemon.

   The second command line argument of the starter script should be the
   path of the MPICH/GM mpirun.ch_gm command.
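
   The transformation of the hostfile can be pictured with the sketch
   below. It is not the shipped startmpi.sh. pe_hostfile_to_machines is
   a hypothetical helper, and it assumes a machine file with one host
   name per slot. Your MPICH/GM release may expect a different format.
   Each line of the Grid Engine hostfile holds a host name, the number
   of slots, the queue name and a processor range (see sge_pe(5)).

```sh
#!/bin/sh
# Sketch only: pe_hostfile_to_machines is a hypothetical helper, not
# the shipped startmpi.sh.
# Input line format: <host> <slots> <queue> <processor range>
# Assumed output format: one host name per granted slot.
pe_hostfile_to_machines() {
    # $1 path to the Grid Engine PE hostfile
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}
```

   A start procedure built on this helper would redirect the output to
   the machine file:

      pe_hostfile_to_machines "$pe_hostfile" > "$TMPDIR/machines"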


6) stopmpi.sh
-------------

   The stop script 'stopmpi.sh' removes files in $TMPDIR created by
   startmpi.sh.


7) sge_mpirun
-------------

   The sge_mpirun command should be used in the user's job script to
   start the MPI program. The sge_mpirun command ensures that the right
   number of tasks is started on the hosts scheduled by Grid Engine. The
   sge_mpirun command is installed in $SGE_ROOT/mpi/myrinet.  You may
   want to create a link to it in $SGE_ROOT/bin/<arch>/sge_mpirun.
   The sge_mpirun command will then be in the user's PATH, if they
   have sourced the Grid Engine settings file.

   An alternative to using the sge_mpirun command is for the user
   to execute the MPICH/GM mpirun command and provide the machines
   file created by Grid Engine.

      mpirun.ch_gm --gm-f $TMPDIR/machines --gm-kill 15 -np $NSLOTS a.out
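
   When mpirun.ch_gm is called directly, the job script is responsible
   for a consistent task count. The sketch below shows one possible
   sanity check. check_machines is a hypothetical helper, and it assumes
   that the machine file holds one entry per slot.

```sh
#!/bin/sh
# Sketch only: check_machines is a hypothetical helper.
# Succeeds if the machine file is readable and its number of non-empty
# lines equals the expected slot count.
# Assumption: the machine file holds one entry per slot.
check_machines() {
    machines=$1
    expected=$2
    if [ ! -r "$machines" ]; then
        echo "cannot read $machines" >&2
        return 1
    fi
    # Count non-empty lines; "n + 0" prints 0 for an empty file.
    entries=$(awk 'NF { n++ } END { print n + 0 }' "$machines")
    if [ "$entries" -ne "$expected" ]; then
        echo "$machines lists $entries entries, expected $expected" >&2
        return 1
    fi
}
```

   In a job script the check could guard the launch:

      check_machines "$TMPDIR/machines" "$NSLOTS" || exit 1
      mpirun.ch_gm --gm-f $TMPDIR/machines --gm-kill 15 -np $NSLOTS a.out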


8) Queue Configuration
----------------------

   Earlier versions of this integration required a queue per processor
   with the "processors" queue attribute containing a unique Myrinet port
   number for each queue.  This is no longer necessary.

   The tmpdir queue attribute on all the parallel queues should be set
   to a shared file system. The integration stores some files in the
   user's TMPDIR, which must be readable on all the hosts. For instance,
   if /usr/var/tmp is set up on your cluster as a shared file system, then
   you can set the queue attribute tmpdir to /usr/var/tmp.

   For loose integrations, the terminate_method attribute on all the
   parallel queues should be set to SIGTERM. This tells Grid Engine to
   terminate the MPI jobs using a SIGTERM signal instead of a SIGKILL.
   This allows the mpirun.ch_gm command to "clean up" all the MPI tasks
   by sending them a SIGTERM signal. The default SIGKILL will result in
   the job being deleted, but the MPI tasks will continue running.

   Before attempting to run a job with the mpich tight integration, it
   is a good idea to verify that the qrsh command can reach all of your
   parallel hosts.  Try the command 'qrsh -l h=<hostname> ls'.  One
   common reason for a qrsh failure is that the rsh command is not
   installed in $SGE_ROOT/utilbin/<arch>/rsh as a root setuid program.
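
   The check can be repeated for a list of hosts with a loop such as the
   sketch below. check_qrsh_hosts and the QRSH variable are hypothetical
   names introduced for this illustration. QRSH only exists so that the
   loop can be tried out with a stand-in command. The host names are
   supplied by the caller.

```sh
#!/bin/sh
# Sketch only: check_qrsh_hosts and the QRSH override are hypothetical.
# Runs 'qrsh -l h=<hostname> ls' for every host name given as an
# argument and reports the result per host. Returns non-zero if any
# host failed.
check_qrsh_hosts() {
    failed=0
    for host in "$@"; do
        # stdin is redirected so the remote command cannot consume it
        if ${QRSH:-qrsh} -l "h=$host" ls </dev/null >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

   Example call with two placeholder host names:

      check_qrsh_hosts node1 node2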


9) gmps
-------

   The 'gmps' command is a utility program to report and/or clean up
   processes which are using Myrinet ports. For usage information,
   type 'gmps -h'. This command uses a command called gm_board_info
   which is part of the GM distribution from Myrinet.

10) Notes
---------

   The startmpi.sh, sge_mpirun, and gmps scripts use /bin/ksh (the
   Korn shell).  Make sure it is installed.

   In a Kerberos environment, if you want the MPI tasks to have Kerberos
   credentials, then it is critical that the user have valid forwardable
   Kerberos tickets when the job is submitted. If the jobs may be queued
   for a long period of time, then the user should also have renewable
   tickets.

   The Myrinet MPICH/GM software can be configured to use rsh or ssh,
   or a specific path to rsh or ssh.  We recommend that you configure
   MPICH/GM using the default rsh.  This makes it easy for the integration
   to override rsh and use Grid Engine's qrsh command to run the MPI tasks
   under Grid Engine's control (tight integration).

   If your MPICH/GM software is configured to use a hard-coded path to
   rsh (search for the string "rexec" in the file mpirun.ch_gm.pl), you
   have two options.  Either change the script in place, or copy the
   mpirun.ch_gm and mpirun.ch_gm.pl scripts to a new location (e.g.
   $SGE_ROOT/mpi/myrinet), modify the copied mpirun.ch_gm.pl to use
   $SGE_ROOT/mpi/rsh, and update the start_proc_args attribute in your
   PE configuration to point to the path of the updated mpirun.ch_gm
   command.

11) Copyright
-------------

___INFO__MARK_BEGIN__
The Contents of this file are made available subject to the terms of the Sun
Industry Standards Source License Version 1.2

Sun Microsystems Inc., March, 2001

Sun Industry Standards Source License Version 1.2
=================================================

The contents of this file are subject to the Sun Industry Standards Source
License Version 1.2 (the "License"); You may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://gridengine.sunsource.net/Gridengine_SISSL_license.html

Software provided under this License is provided on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
WITHOUT LIMITATION, WARRANTIES THAT THE SOFTWARE IS FREE OF DEFECTS,
MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE, OR NON-INFRINGING.

See the License for the specific provisions governing your rights and
obligations concerning the Software.

The Initial Developer of the Original Code is: Sun Microsystems, Inc.

Copyright: 2001 by Sun Microsystems, Inc.

All Rights Reserved.
___INFO__MARK_END__