This is a partial list of the known bugs and problems in MPICH.
Current as of Release 1.2.4
1. When using strict ANSI/ISO C compilation, the use of the "long long" type
generates an error. This is tricky; "long long" is needed by ROMIO on some
systems to get 8-byte integers for file offsets. Basically, if you want large
files, you may need to accept "long long".
2. The Fortran 90 compiler and Fortran compilers may be passed the same
shared-library flags (e.g., -PIC). Some Fortran 90 compilers may not
recognize the same flags, and generate an "unrecognized argument" warning.
3. Some C++ compilers, particularly newer compilers under Solaris, may not
be able to compile the C++ interface code. Configure with --disable-cxx to
turn off the C++ interface.
4. Debugger switches to mpirun (e.g., -dbx, -gdb) may require site-specific
customization. This is caused by differences in the command line syntax for
the debuggers; even for a single vendor, the command line syntax for the
debugger may change from month to month.
To customize mpirun to support the debuggers at your site, look at the files
mpirun_dbg.<debuggername>, for example, mpirun_dbg.dbx or mpirun_dbx.gdb .
You may need to edit these files to fit the behavior of your debugger
5. MPICH is not thread-safe. A thread-safe version is in our plans. More
precisely, MPICH supports the MPI_THREAD_FUNNELED level of thread support.
This is usually adequate for OpenMP loop-parallelism.
6. The choice of default struct alignment is fixed and may not match your
compiler. The best fix is to use an MPI_UB when using MPI_Type_struct to
define the end of the structure. A subsequent version of MPICH will provide
more flexibility on structure alignment (this requires changes to the
implementation of MPI_Type_struct).
7. Shared libraries for MPI. We're still working on this; the lack of a
common method for compiling/creating/using them is a burden. Still, there are
some steps in this direction; see mpich/util/makesharedlib. If you can help
us add to systems for which we can generate shared libraries, please send mail
to firstname.lastname@example.org . The configure switch --enable-sharedlib will build
shared libraries for a few systems. We know about libtool, but have decided
not to use it directly (it is a great source of information).
8. If your Fortran 77 and Fortran90 compilers use different name mappings
(e.g., one uses a single trailing underscore and the other uses two), you
won't be able to use one of them. Instead, you'll need to build two separate
MPICH versions, one for each name mapping. We hoped to use weak symbols
support for this, but it turns out that only one weak symbol is supported, not
many, and we already use weak symbols to efficiently implement the profiling
interface. In general, if your Fortran 77 and Fortran 90 compilers are
different (e.g., different command-line forms, different libraries), you'll
need to build separate Fortran versions for each.
9. Performance of collective routines is inadequate. Many of these are
simple functional implementations. If you would like to contribute better
versions, please contact us at email@example.com . One project to
provide better collective operations is known to us: MagPIe at
10. When configuring with the -f95nag switch or the -fc=f95 switch, the
variable F77 is assigned the value of f95 and is substituted in the mpif77
script. Thus, the mpif77 script is used to run f95 programs. This can be
11. Systems with a Unix System V heritage may show performance problems when
using the ch_p4 device. This is due to the P4 code working around limitations
in the SysV/R2 interface. Later versions don't have the limitation that P4 is
catering to. We have added some code to bypass this old code (the P4
configure tests to make sure that the problem has been fixed). The option
to use with the MPICH configure is -p4_opts=--enable-nonblockread . We have
*not* tested this option much yet, and if something goes wrong, you may see
error messages from MPICH/P4 about "Unexpected EOF on socket".
12. Error reporting does not always go to the correct error handler. In some
cases, MPICH looses track of which communicator's error handler should be
called. In that case, it uses the error handler on MPI_COMM_WORLD.
13. The Fortran 90 module build can fail on systems (like SunOS 4.x) whose
ar commands support only short (e.g., 15 character) file names. We do not
plan to fix this, since all new operating systems support longer names.
14. On Cray PVP (XMP,YMP,C90) and Cray T3D, some operations with character
data are not supported from Fortran. This causes examples/test/pt2pt/structf
to fail. The 1.0.12 and all subsequent releases contains code from
Laurie Costella of CRAY that should fix some of the problems.
15. Fortran LOGICAL data is not handled correctly on some HETEROGENEOUS
systems (basically, there is no XDR type for Fortran LOGICAL, and the current
code doesn't convert to/from a standard representation).
1. MPE Logging doesn't currently record communicator information. This can
cause upshot/jumpshot/nupshot to become confused about message arrows.
2. MPE Logging uses the MPICH definitions for MPI_PROC_NULL, MPI_ANY_TAG, and
MPI_ANY_SOURCE. This is fine for MPICH, but can cause problems when using MPE
logging with other MPI implementations.
3. The mechanism that upshot/nupshot/jumpshot uses to match sends and receives
can be fooled by multiple messages from the same source and with the same tag,
particularly when nonblocking sends and receives are used.
1. There are no man pages for the java-based programs.
1. There are problems with NFS (they are not in ROMIO).
NFS requires special care and use in order to get correct behavior when
multiple hosts access or try to lock the same file. Without this special
care, NFS silently fails (for example, a file lock system call will succeed,
but the actual file lock will not be correctly handled. This is considered
a feature, not a bug, of NFS. Go figure). If you need to use NFS, then you
should do the following:
Make sure that you are using version 3 of NFS
Make sure that attribute caching is turned off
This will have various negative consequences (automounts won't work, some
other operations will be slower). The up side is that file operations will
be correctly implemented. This is an instance of "do you want it fast or
correct; you can't have both". More details on this may be found in the
2. Error messages in ROMIO do not use the MPI Error_string service yet.
3. ROMIO Fortran tests may use getarg; some systems require pxfgetarg.
1. The configure that is part of the Notre Dame C++ bindings package (which
is what MPICH uses) assigns compiler options for the C++ compiler based
on the system type. This will cause errors if you don't use the compiler
that the C++ package expects. For example, you cannot use g++ on HPUX
Fortran 90 Modules
1. Because the Fortran module defines an interface for the MPI I/O routines
that are provided by ROMIO, you cannot build the MPI module if ROMIO is not
1. Configuring with the --enable-devdebug flag does not propagate to the
configure in the specific device.
1. In certain cases, established connections can be lost by the TCP layer
(this is under the MPICH layer). MPICH depends on TCP to be reliable; when
the TCP implementation decides that a connection has failed, so does MPICH.
It turns out that the algorithm that some TCP implementations uses to decide
that a connection has failed can confuse network congestion and local node CPU
load with a failed connection. To fix this, we will either need to use UDP
instead of TCP or add our own reliability layer on top of TCP. Only under
LINUX has this been a serious problem, due to various bugs in the LINUX TCP
implementation. See http://www.icase.edu/coral/LinuxTCP2.html for more
information about LINUX TCP problems and some fixes.
This problem affects more than MPICH; we have had xterms and Emacs sessions
fail because of this problem with TCP.
On the plus side, in MPICH version 1.2.0, we introduced a better flow control
algorithm that can reduce but not eliminate the likelyhood of the TCP
implementation dropping the connection.
1. This device is no longer supported. If someone would like to send us a
patch that would make it work, we'll make it available as part of the regular
1. The "lock free shared memory" device requires very specific memory ordering
or operations; this usually requires either adding assembly language
instructions (e.g., a write sync) or special compiler options. You can't just
configure/make this device, alas.
1. See the web page http://www.niu.edu/mpi for information about known
problems with the globus2 device.
1. The Fortran interfaces may not compile because of MPIR_FROM_FLOG. This
is a macro that converts Fortran logical values to C booleans. The fix is to
put the argument to MPIR_FROM_FLOG into a temporary int variable and pass that
1. System include files in 2.1 before 2.1.128 have errors than can keep MPICH
from compiling. Update your version of Linux to fix these problems.
1. ch_shmem doesn't work. One one system, the semget call generates a SIGSYS
interrupt. This appears to be an error in FreeBSD, as semget should not
generate this signal (EACCESS, EINVAL, or ENOSPC should be used if the semget
calls fails in this circumstance).
1. Some users have had trouble compiling with pgcc (in sys/signal.h); using
gcc fixed the problem.
1. In some cases, when an MPI program aborts, all of the processes in the
process group that created that program (including the *parents* of the MPI
process) will exit.
1. This is not an MPICH bug, but it may look like one. If you are using file
servers running AIX, providing NFS file systems for LINUX machines, you may
experience file corruption problems. You can either (1) work entirely in a
local (UFS, not NFS) file system or (2) change to an NFS file system on
something other than AIX.
2. This is also not an MPICH bug. Some older versions of the Absoft Fortran
compiler had a serious bug: it only allowed one -I option, and if there are
several -I options on the command line, it ignores all but one. In this case,
you must either copy the include files to the current directory, or setup
links to the necessary files. More recent versions do not have this problem;
you should update your compiler to fix this. For more information, see the
MPICH installation manual.
3. Viewing the manuals with Pageview (Solaris). pageview has trouble with the
Postcript files for the installation and user guides. To work around this,
choose the option, "Ignore PostScript Structuring Comments", under
"Properties", in pageview.
Also, some programs apparently don't do a good job of displaying these files.
If you are having this problem, you should try ghostscript instead (there is
nothing wrong with the Postscript). For example, Adobe tools that convert
Postscript files into PDF do a remarkably poor job; we recommend that you view
the Postscript files rather than converting them into PDF.
An alternative solution is to view the PDF versions of the documents.