File: KnownBugs

This is a partial list of the known bugs and problems in MPICH. 


Current as of Release 1.2.4

General:
========

1. When using strict ANSI/ISO C compilation, the use of the "long long" type
generates an error.  This is tricky; "long long" is needed by ROMIO on some
systems to get 8-byte integers for file offsets.  Basically, if you want large
files, you may need to accept "long long". 
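
As a rough illustration of the trade-off in item 1, the C sketch below uses
"long long" to hold file offsets that do not fit in 32 bits.  The names
demo_offset_t, demo_block_offset, and demo_fits_in_32_bits are hypothetical
and are not taken from ROMIO or MPICH.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical 8-byte file-offset type; needs "long long" support
   (standard in C99, an extension under strict C89). */
typedef long long demo_offset_t;

/* Compile-time check: the array size is negative, and the build fails,
   if demo_offset_t has fewer than 64 bits. */
typedef char demo_offset_has_64_bits[
    (sizeof(demo_offset_t) * CHAR_BIT >= 64) ? 1 : -1];

/* Byte offset of block number `index` when every block holds
   `block_size` bytes.  The multiplication is done in the wide type. */
demo_offset_t demo_block_offset(long index, long block_size)
{
    assert(index >= 0 && block_size > 0);
    return (demo_offset_t)index * (demo_offset_t)block_size;
}

/* Return 1 if the offset can be stored in a signed 32-bit integer,
   0 if it needs the wider type (for example, files of 2 GB or more). */
int demo_fits_in_32_bits(demo_offset_t offset)
{
    return offset >= -2147483647LL - 1 && offset <= 2147483647LL;
}
```

With 1 GB blocks, the offset of block 3 already needs the wider type, which
is why giving up "long long" usually means giving up large files.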

2. The Fortran 90 and Fortran 77 compilers may be passed the same
shared-library flags (e.g., -PIC).  Some Fortran 90 compilers may not
recognize these flags and generate an "unrecognized argument" warning.

3. Some C++ compilers, particularly newer compilers under Solaris, may not
be able to compile the C++ interface code.  Configure with --disable-cxx to
turn off the C++ interface.

4. Debugger switches to mpirun (e.g., -dbx, -gdb) may require site-specific
customization.  This is caused by differences in the command line syntax for
the debuggers; even for a single vendor, the command line syntax for the
debugger may change from month to month.

To customize mpirun to support the debuggers at your site, look at the files
mpirun_dbg.<debuggername>, for example, mpirun_dbg.dbx or mpirun_dbg.gdb .
You may need to edit these files to fit the behavior of your debugger
(particularly mpirun_dbg.dbx).

5. MPICH is not thread-safe.  A thread-safe version is in our plans.  More
precisely, MPICH supports the MPI_THREAD_FUNNELED level of thread support.
This is usually adequate for OpenMP loop-parallelism.

6. The choice of default struct alignment is fixed and may not match your
compiler.  The best fix is to use an MPI_UB when using MPI_Type_struct to
define the end of the structure.  A subsequent version of MPICH will provide
more flexibility on structure alignment (this requires changes to the
implementation of MPI_Type_struct).
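
To see why the end of the structure matters in item 6, the following C
sketch reports how much padding the compiler adds after the last member of
a structure.  The structure and function names are hypothetical.  The value
of sizeof(struct demo_record) is the displacement at which an MPI_UB entry
would be placed in the arrays passed to MPI_Type_struct; the MPI calls
themselves are left out so that the sketch builds without MPICH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example structure: a double followed by a single char. */
struct demo_record {
    double value;
    char   flag;
};

/* Bytes the compiler appends after the last member so that every
   element of an array of struct demo_record stays aligned. */
size_t demo_trailing_padding(void)
{
    size_t end_of_last = offsetof(struct demo_record, flag) + sizeof(char);

    assert(sizeof(struct demo_record) >= end_of_last);
    return sizeof(struct demo_record) - end_of_last;
}

/* Print the layout that a matching MPI datatype would have to reproduce. */
void demo_print_layout(void)
{
    printf("offset of value  : %lu\n",
           (unsigned long)offsetof(struct demo_record, value));
    printf("offset of flag   : %lu\n",
           (unsigned long)offsetof(struct demo_record, flag));
    printf("sizeof(struct)   : %lu\n",
           (unsigned long)sizeof(struct demo_record));
    printf("trailing padding : %lu\n",
           (unsigned long)demo_trailing_padding());
}
```

If the trailing padding is not zero, a datatype built only from the member
displacements may end before the compiler's structure does, and arrays of
the structure would then be stepped through with the wrong stride.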

7. Shared libraries for MPI.  We're still working on this; the lack of a
common method for compiling/creating/using them is a burden.  Still, there are
some steps in this direction; see mpich/util/makesharedlib.  If you can help
us add to systems for which we can generate shared libraries, please send mail
to mpi-bugs@mcs.anl.gov .  The configure switch --enable-sharedlib will build
shared libraries for a few systems.  We know about libtool, but have decided
not to use it directly (it is a great source of information).

8. If your Fortran 77 and Fortran 90 compilers use different name mappings
(e.g., one uses a single trailing underscore and the other uses two), you
won't be able to use one of them.  Instead, you'll need to build two separate
MPICH versions, one for each name mapping.  We hoped to use weak symbols
support for this, but it turns out that only one weak symbol is supported, not
many, and we already use weak symbols to efficiently implement the profiling
interface.  In general, if your Fortran 77 and Fortran 90 compilers are
different (e.g., different command-line forms, different libraries), you'll
need to build separate Fortran versions for each.  
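
The name mappings mentioned in item 8 can be pictured with a small C
sketch.  The enum and the function demo_mangle are hypothetical and are not
part of MPICH; they only show how one routine name ends up as different
linker symbols, which is why a library built for one mapping cannot be
linked by a compiler that expects another.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for common Fortran external-name conventions. */
enum demo_mangling {
    DEMO_LOWER,          /* mpi_send   */
    DEMO_LOWER_USCORE,   /* mpi_send_  */
    DEMO_LOWER_2USCORE,  /* mpi_send__ */
    DEMO_UPPER           /* MPI_SEND   */
};

/* Write the linker-level name for `name` under `scheme` into `out`.
   Returns 0 on success, -1 if `out` (of `size` bytes) is too small. */
int demo_mangle(const char *name, enum demo_mangling scheme,
                char *out, size_t size)
{
    size_t len, extra, i;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    extra = (scheme == DEMO_LOWER_USCORE) ? 1
          : (scheme == DEMO_LOWER_2USCORE) ? 2 : 0;
    if (len + extra + 1 > size)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        out[i] = (char)(scheme == DEMO_UPPER ? toupper(c) : tolower(c));
    }
    for (i = 0; i < extra; i++)
        out[len + i] = '_';
    out[len + extra] = '\0';
    return 0;
}
```

A compiler using one trailing underscore looks for "mpi_send_", while one
using two looks for "mpi_send__"; a single library provides only one of
these symbols.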

9. Performance of collective routines is inadequate.  Many of these are
simple functional implementations.  If you would like to contribute better
versions, please contact us at mpi-bugs@mcs.anl.gov .  One project to
provide better collective operations is known to us: MagPIe at
http://www.cs.vu.nl/albatross/ .

10.  When configuring with the -f95nag switch or the -fc=f95 switch, the 
variable F77 is assigned the value of f95 and is substituted in the mpif77 
script.  Thus, the mpif77 script is used to run f95 programs.  This can be
very misleading.

11. Systems with a Unix System V heritage may show performance problems when
using the ch_p4 device.  This is due to the P4 code working around limitations
in the SysV/R2 interface.  Later versions don't have the limitation that P4 is
catering to.  We have added some code to bypass this old code (the P4
configure tests to make sure that the problem has been fixed).  The option
to use with the MPICH configure is -p4_opts=--enable-nonblockread .  We have 
*not* tested this option much yet, and if something goes wrong, you may see
error messages from MPICH/P4 about "Unexpected EOF on socket".  

12. Error reporting does not always go to the correct error handler.  In some
cases, MPICH loses track of which communicator's error handler should be
called.  In that case, it uses the error handler on MPI_COMM_WORLD.

13. The Fortran 90 module build can fail on systems (like SunOS 4.x) whose
ar commands support only short (e.g., 15 character) file names.  We do not
plan to fix this, since all new operating systems support longer names.

14. On Cray PVP (XMP,YMP,C90) and Cray T3D, some operations with character
data are not supported from Fortran.  This causes examples/test/pt2pt/structf
to fail.  The 1.0.12 and all subsequent releases contain code from
Laurie Costella of CRAY that should fix some of the problems.

15. Fortran LOGICAL data is not handled correctly on some HETEROGENEOUS
systems (basically, there is no XDR type for Fortran LOGICAL, and the current
code doesn't convert to/from a standard representation).


MPE:
====

1. MPE Logging doesn't currently record communicator information.  This can
cause upshot/jumpshot/nupshot to become confused about message arrows.

2. MPE Logging uses the MPICH definitions for MPI_PROC_NULL, MPI_ANY_TAG, and
MPI_ANY_SOURCE.  This is fine for MPICH, but can cause problems when using MPE
logging with other MPI implementations.

3. The mechanism that upshot/nupshot/jumpshot uses to match sends and receives
can be fooled by multiple messages from the same source and with the same tag,
particularly when nonblocking sends and receives are used.

JUMPSHOT:
=========

1.  There are no man pages for the java-based programs.

ROMIO:
======

1. There are problems with NFS (they are not in ROMIO).  
   NFS requires special care and use in order to get correct behavior when
   multiple hosts access or try to lock the same file.  Without this special
   care, NFS silently fails (for example, a file lock system call will succeed,
   but the actual file lock will not be correctly handled.  This is considered
   a feature, not a bug, of NFS.  Go figure).  If you need to use NFS, then you
   should do the following:  
       Make sure that you are using version 3 of NFS
       Make sure that attribute caching is turned off
   This will have various negative consequences (automounts won't work, some
   other operations will be slower).  The up side is that file operations will
   be correctly implemented.  This is an instance of "do you want it fast or
   correct; you can't have both".  More details on this may be found in the
   ROMIO README.

2. Error messages in ROMIO do not use the MPI Error_string service yet.

3. ROMIO Fortran tests may use getarg; some systems require pxfgetarg.

C++ Bindings:
=============

1. The configure that is part of the Notre Dame C++ bindings package (which
   is what MPICH uses) assigns compiler options for the C++ compiler based
   on the system type.  This will cause errors if you don't use the compiler
   that the C++ package expects.  For example, you cannot use g++ on HPUX 
   systems.  

Fortran 90 Modules
==================
1. Because the Fortran module defines an interface for the MPI I/O routines 
that are provided by ROMIO, you cannot build the MPI module if ROMIO is not
built.  

Device Specific:
================

All devices:

1.  Configuring with the --enable-devdebug flag does not propagate to the
    configure in the specific device.

ch_p4 device:

1. In certain cases, established connections can be lost by the TCP layer
(this is under the MPICH layer).  MPICH depends on TCP to be reliable; when
the TCP implementation decides that a connection has failed, so does MPICH.
It turns out that the algorithm that some TCP implementations use to decide
that a connection has failed can confuse network congestion and local node CPU
load with a failed connection.  To fix this, we will either need to use UDP
instead of TCP or add our own reliability layer on top of TCP.  Only under
LINUX has this been a serious problem, due to various bugs in the LINUX TCP
implementation.  See http://www.icase.edu/coral/LinuxTCP2.html for more
information about LINUX TCP problems and some fixes.

This problem affects more than MPICH; we have had xterms and Emacs sessions
fail because of this problem with TCP.

On the plus side, in MPICH version 1.2.0, we introduced a better flow control
algorithm that can reduce but not eliminate the likelihood of the TCP
implementation dropping the connection.

ch_meiko:

1. This device is no longer supported.  If someone would like to send us a
patch that would make it work, we'll make it available as part of the regular
release. 

ch_lfshmem:

1. The "lock free shared memory" device requires very specific memory ordering
of operations; this usually requires either adding assembly language
instructions (e.g., a write sync) or special compiler options.  You can't just
configure/make this device, alas.
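
The kind of ordering this device depends on can be sketched in portable C
using C11 atomics, which are newer than this release and are not what the
device itself uses.  The mailbox type and function names below are
hypothetical.  The release store and acquire load play the role of the
"write sync" mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox shared by one writer and one reader. */
struct demo_mailbox {
    int        payload;   /* ordinary data, written before the flag */
    atomic_int ready;     /* 0 = empty, 1 = payload is valid        */
};

void demo_mailbox_init(struct demo_mailbox *box)
{
    assert(box != NULL);
    box->payload = 0;
    atomic_init(&box->ready, 0);
}

/* Writer: store the payload, then publish it.  The release store keeps
   the payload write from being reordered after the flag write.  The
   writer must not call this again until the reader has emptied the slot. */
void demo_mailbox_put(struct demo_mailbox *box, int value)
{
    box->payload = value;
    atomic_store_explicit(&box->ready, 1, memory_order_release);
}

/* Reader: returns 1 and fills *value if a payload is available, else 0.
   The acquire load keeps the payload read from moving ahead of the
   flag read. */
int demo_mailbox_try_get(struct demo_mailbox *box, int *value)
{
    if (atomic_load_explicit(&box->ready, memory_order_acquire) == 0)
        return 0;
    *value = box->payload;
    atomic_store_explicit(&box->ready, 0, memory_order_release);
    return 1;
}
```

Without the release/acquire pair (or an equivalent barrier instruction), a
reader on another processor could see the flag set before the payload is
visible, which is the failure this device has to rule out.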

globus2:

1.  See the web page http://www.niu.edu/mpi for information about known
problems with the globus2 device. 

System Specific:
================

Cray J90:

1. The Fortran interfaces may not compile because of MPIR_FROM_FLOG.  This
is a macro that converts Fortran logical values to C booleans.  The fix is to
put the argument to MPIR_FROM_FLOG into a temporary int variable and pass that
to MPIR_FROM_FLOG.
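
The workaround might look like the following sketch.  DEMO_FROM_FLOG and
DEMO_FORTRAN_TRUE are hypothetical stand-ins; the real MPIR_FROM_FLOG macro
and the Fortran .TRUE. value come from the MPICH sources and may look
different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MPICH macro and the Fortran .TRUE.
   value; the real definitions may differ. */
#define DEMO_FORTRAN_TRUE 1
#define DEMO_FROM_FLOG(a) (((a) == DEMO_FORTRAN_TRUE) ? 1 : 0)

/* Convert the Fortran logical pointed to by `flag` into a C boolean.
   The value is first copied into a plain int so that the macro only
   ever sees a simple variable, as the workaround suggests. */
int demo_logical_to_c(const int *flag)
{
    int tmp;

    assert(flag != NULL);
    tmp = *flag;
    return DEMO_FROM_FLOG(tmp);
}
```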

LINUX:

1. System include files in 2.1 before 2.1.128 have errors that can keep MPICH
from compiling.  Update your version of Linux to fix these problems.

FreeBSD:

1. ch_shmem doesn't work.  On one system, the semget call generates a SIGSYS
interrupt.  This appears to be an error in FreeBSD, as semget should not
generate this signal (EACCES, EINVAL, or ENOSPC should be used if the semget
call fails in this circumstance).

Solaris86:

1. Some users have had trouble compiling with pgcc (in sys/signal.h); using
gcc fixed the problem.

IRIX:

1. In some cases, when an MPI program aborts, all of the processes in the
process group that created that program (including the *parents* of the MPI
process) will exit.  

Miscellaneous
=============

1. This is not an MPICH bug, but it may look like one.  If you are using file
servers running AIX, providing NFS file systems for LINUX machines, you may
experience file corruption problems.  You can either (1) work entirely in a
local (UFS, not NFS) file system or (2) change to an NFS file system on
something other than AIX.  

2. This is also not an MPICH bug.  Some older versions of the Absoft Fortran
compiler had a serious bug: they allowed only one -I option, and if there were
several -I options on the command line, they ignored all but one.  In this
case, you must either copy the include files to the current directory or set
up links to the necessary files.  More recent versions do not have this problem;
you should update your compiler to fix this. For more information, see the
MPICH installation manual.

3. Viewing the manuals with Pageview (Solaris).  pageview has trouble with the
PostScript files for the installation and user guides.  To work around this,
choose the option, "Ignore PostScript Structuring Comments", under
"Properties", in pageview. 

Also, some programs apparently don't do a good job of displaying these files.
If you are having this problem, you should try ghostscript instead (there is
nothing wrong with the PostScript).  For example, Adobe tools that convert
PostScript files into PDF do a remarkably poor job; we recommend that you view
the PostScript files rather than converting them into PDF.

An alternative solution is to view the PDF versions of the documents.