<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD> <link rel="canonical" href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERMUMPS.html" />
<META NAME="GENERATOR" CONTENT="DOCTEXT">
<TITLE>MATSOLVERMUMPS</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<div id="version" align=right><b>petsc-3.10.3 2018-12-18</b></div>
<div id="bugreport" align=right><a href="mailto:petsc-maint@mcs.anl.gov?subject=Typo or Error in Documentation &body=Please describe the typo or error in the documentation: petsc-3.10.3 v3.10.3 docs/manualpages/Mat/MATSOLVERMUMPS.html "><small>Report Typos and Errors</small></a></div>
<A NAME="MATSOLVERMUMPS"><H1>MATSOLVERMUMPS</H1></A>
A matrix type providing direct solvers (LU and Cholesky) for distributed and sequential matrices via the external package MUMPS. Works with <A HREF="../Mat/MATAIJ.html#MATAIJ">MATAIJ</A> and <A HREF="../Mat/MATSBAIJ.html#MATSBAIJ">MATSBAIJ</A> matrices
<P>
Use ./configure --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch to have PETSc installed with MUMPS
<P>
Use ./configure --with-openmp --download-hwloc (or --with-hwloc) to enable running MUMPS in MPI+OpenMP hybrid mode while the non-MUMPS part of the code runs in flat-MPI mode. See details below.
<P>
Use -pc_type lu (or -pc_type cholesky) together with -pc_factor_mat_solver_type mumps to use this direct solver
<P>
<H3><FONT COLOR="#CC3333">Options Database Keys</FONT></H3>
<TABLE border="0" cellpadding="0" cellspacing="0">
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_1 </B></TD><TD>- ICNTL(1): output stream for error messages
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_2 </B></TD><TD>- ICNTL(2): output stream for diagnostic printing, statistics, and warning messages
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_3 </B></TD><TD>- ICNTL(3): output stream for global information, collected on the host
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_4 </B></TD><TD>- ICNTL(4): level of printing (0 to 4)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_6 </B></TD><TD>- ICNTL(6): permutes the matrix to a zero-free diagonal and/or scales the matrix (0 to 7)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_7 </B></TD><TD>- ICNTL(7): computes a symmetric permutation in sequential analysis (0 to 7). 3=Scotch, 4=PORD, 5=Metis
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_8 </B></TD><TD>- ICNTL(8): scaling strategy (-2 to 8 or 77)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_10 </B></TD><TD>- ICNTL(10): maximum number of iterative refinements
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_11 </B></TD><TD>- ICNTL(11): statistics related to an error analysis (via -ksp_view)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_12 </B></TD><TD>- ICNTL(12): an ordering strategy for symmetric matrices (0 to 3)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_13 </B></TD><TD>- ICNTL(13): parallelism of the root node (enable ScaLAPACK) and its splitting
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_14 </B></TD><TD>- ICNTL(14): percentage increase in the estimated working space
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_19 </B></TD><TD>- ICNTL(19): computes the Schur complement
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_22 </B></TD><TD>- ICNTL(22): in-core/out-of-core factorization and solve (0 or 1)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_23 </B></TD><TD>- ICNTL(23): maximum size of the working memory (MB) that MUMPS can allocate per processor
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_24 </B></TD><TD>- ICNTL(24): detection of null pivot rows (0 or 1)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_25 </B></TD><TD>- ICNTL(25): compute a solution of a deficient matrix and a null space basis
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_26 </B></TD><TD>- ICNTL(26): drives the solution phase if a Schur complement matrix has been computed
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_28 </B></TD><TD>- ICNTL(28): use 1 for sequential analysis and icntl(7) ordering, or 2 for parallel analysis and icntl(29) ordering
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_29 </B></TD><TD>- ICNTL(29): parallel ordering 1 = ptscotch, 2 = parmetis
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_30 </B></TD><TD>- ICNTL(30): compute user-specified set of entries in inv(A)
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_31 </B></TD><TD>- ICNTL(31): indicates which factors may be discarded during factorization
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_icntl_33 </B></TD><TD>- ICNTL(33): compute determinant
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_cntl_1 </B></TD><TD>- CNTL(1): relative pivoting threshold
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_cntl_2 </B></TD><TD>- CNTL(2): stopping criterion of refinement
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_cntl_3 </B></TD><TD>- CNTL(3): absolute pivoting threshold
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_cntl_4 </B></TD><TD>- CNTL(4): value for static pivoting
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_cntl_5 </B></TD><TD>- CNTL(5): fixation for null pivots
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>-mat_mumps_use_omp_threads [m] </B></TD><TD>- run MUMPS in MPI+OpenMP hybrid mode as if omp_set_num_threads(m) were called before calling MUMPS.
m defaults to the number of cores per CPU package (socket) as reported by hwloc, as suggested by the MUMPS manual; see the Notes below for details.
</TD></TR></TABLE>
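<P>
The same controls can also be set programmatically through the factored matrix obtained from the preconditioner. Below is a minimal sketch; the helper name SetupMumpsExample and the particular ICNTL/CNTL values are illustrative assumptions, not part of PETSc. Call such a routine after <A HREF="../KSP/KSPSetOperators.html#KSPSetOperators">KSPSetOperators</A>() and before <A HREF="../KSP/KSPSolve.html#KSPSolve">KSPSolve</A>().
<pre>
#include &lt;petscksp.h&gt;

/* Illustrative helper (not a PETSc routine): select MUMPS as the LU solver and
   set a few MUMPS control parameters through the API instead of the options database. */
PetscErrorCode SetupMumpsExample(KSP ksp)
{
  PC             pc;
  Mat            F;                                        /* the factored matrix managed by the PC */
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverType(pc);CHKERRQ(ierr);     /* create the MUMPS factor matrix */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,24,1);CHKERRQ(ierr);           /* ICNTL(24): detect null pivot rows (example value) */
  ierr = MatMumpsSetCntl(F,3,1.e-6);CHKERRQ(ierr);         /* CNTL(3): absolute pivoting threshold (example value) */
  PetscFunctionReturn(0);
}
</pre>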
<P>
<P>
<H3><FONT COLOR="#CC3333">Notes</FONT></H3>
When a MUMPS factorization fails inside a <A HREF="../KSP/KSP.html#KSP">KSP</A> solve, for example with a <A HREF="../KSP/KSP_DIVERGED_PCSETUP_FAILED.html#KSP_DIVERGED_PCSETUP_FAILED">KSP_DIVERGED_PCSETUP_FAILED</A> reason, one can find the MUMPS information about the failure by calling
<pre>
<A HREF="../KSP/KSPGetPC.html#KSPGetPC">KSPGetPC</A>(ksp,&pc);
</pre>
<pre>
<A HREF="../PC/PCFactorGetMatrix.html#PCFactorGetMatrix">PCFactorGetMatrix</A>(pc,&mat);
</pre>
<pre>
<A HREF="../Mat/MatMumpsGetInfo.html#MatMumpsGetInfo">MatMumpsGetInfo</A>(mat,....);
</pre>
<pre>
<A HREF="../Mat/MatMumpsGetInfog.html#MatMumpsGetInfog">MatMumpsGetInfog</A>(mat,....); etc.
</pre>
Alternatively, run with -ksp_error_if_not_converged; the program will then stop and the information will be printed in the error message.
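<P>
For reference, a minimal sketch of the pattern above with full call signatures follows; ksp is assumed to be the <A HREF="../KSP/KSP.html#KSP">KSP</A> whose factorization failed, and the INFOG index 1 queried here is only an illustrative choice (see the MUMPS users' guide for the meaning of each INFO/INFOG entry).
<pre>
  PC             pc;
  Mat            F;
  PetscInt       infog1;
  PetscErrorCode ierr;

  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);        /* the factored (MUMPS) matrix held by the PC */
  ierr = MatMumpsGetInfog(F,1,&infog1);CHKERRQ(ierr);   /* INFOG(1): MUMPS exit status, negative on error */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"MUMPS INFOG(1) = %D\n",infog1);CHKERRQ(ierr);
</pre>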
<P>
If you want to run MUMPS in MPI+OpenMP hybrid mode (i.e., enable multithreading in MUMPS), but still want to run the non-MUMPS part
(i.e., the PETSc part) of your code in the so-called flat-MPI (aka pure-MPI) mode, you need to configure PETSc with --with-openmp --download-hwloc
(or --with-hwloc), and have an MPI that supports MPI-3.0's process shared memory (which is usually available). Since MUMPS calls BLAS
libraries, to get good performance you should also have multithreaded BLAS libraries such as Intel MKL, AMD ACML, Cray libSci or the open-source
OpenBLAS (PETSc has configure options to install/specify them). With these conditions met, you can run your program as before but with
an extra option, -mat_mumps_use_omp_threads [m]. It works as if OMP_NUM_THREADS=m were set for MUMPS, with m defaulting to the number of cores
per CPU socket (or package, in hwloc terms), or the number of PETSc MPI processes on a node, whichever is smaller.
<P>
Flat-MPI (or pure-MPI) mode means running your code with as many MPI ranks as there are cores. For example,
if a compute node has 32 cores and you run on two nodes, you may use "mpirun -n 64 ./test". To run MUMPS in MPI+OpenMP hybrid mode,
the traditional way is to set OMP_NUM_THREADS and run with fewer MPI ranks than cores. For example, if you want 16 OpenMP
threads per rank, you may use "export OMP_NUM_THREADS=16 && mpirun -n 4 ./test". The problem with this approach is that the non-MUMPS
part of your code runs on fewer cores, so CPUs are wasted. "-mat_mumps_use_omp_threads [m]" provides an alternative: you can
still run your code with as many MPI ranks as there are cores, yet have MUMPS run in MPI+OpenMP hybrid mode. In the example above,
you can use "mpirun -n 64 ./test -mat_mumps_use_omp_threads 16".
<P>
If you run your code through a job submission system, there are caveats in MPI rank mapping. We use MPI_Comm_split_type to obtain the MPI
processes on each compute node. Listing the processes in ascending rank order, we split the processes on a node into consecutive groups of
size m and create a communicator called omp_comm for each group. Rank 0 in an omp_comm is called the master rank, and the others in the omp_comm
are called slave ranks (or slaves). Only master ranks are visible to MUMPS; slaves are not. We free the CPUs assigned to slaves (which might be set
by CPU binding policies in job scripts) and make them available to the master so that the OpenMP threads spawned by MUMPS can run on those CPUs.
On a multi-socket compute node, MPI rank mapping becomes an issue. Continuing the example above, suppose your compute node has two sockets.
If you interleave MPI ranks across the two sockets (i.e., even ranks are placed on socket 0 and odd ranks on socket 1) and bind
MPI ranks to cores, then with -mat_mumps_use_omp_threads 16 a master rank (and the threads it spawns) will use half of the cores on socket 0 and half
of the cores on socket 1, which definitely hurts locality. On the other hand, if you map MPI ranks consecutively onto the two sockets, the
problem does not arise. Therefore, when you use -mat_mumps_use_omp_threads, keep an eye on your MPI rank mapping and CPU binding.
For example, with the Slurm job scheduler, one can use "srun --cpu-bind=verbose -m block:block" to map consecutive MPI ranks to sockets and
examine the mapping result.
<P>
PETSc does not control thread binding in MUMPS, so to get the best performance one still has to set OMP_PROC_BIND and OMP_PLACES in job scripts,
for example, export OMP_PLACES=threads and export OMP_PROC_BIND=spread. One does not need to export OMP_NUM_THREADS=m in job scripts, as PETSc
calls omp_set_num_threads(m) internally before calling MUMPS.
<P>
<H3><FONT COLOR="#CC3333">References</FONT></H3>
<TABLE border="0" cellpadding="0" cellspacing="0">
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>1. </B></TD><TD>- Heroux, Michael A., R. Brightwell, and Michael M. Wolf. "Bi-modal MPI and MPI+ threads computing on scalable multicore systems." IJHPCA (Submitted) (2011).
</TD></TR>
<TR><TD WIDTH=40></TD><TD ALIGN=LEFT VALIGN=TOP><B>2. </B></TD><TD>- Gutierrez, Samuel K., et al. "Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications." Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International. IEEE, 2017.
</TD></TR></TABLE>
<P>
<H3><FONT COLOR="#CC3333">See Also</FONT></H3>
<A HREF="../PC/PCFactorSetMatSolverType.html#PCFactorSetMatSolverType">PCFactorSetMatSolverType</A>(), <A HREF="../Mat/MatSolverType.html#MatSolverType">MatSolverType</A>, MatMumpsSetICntl(), <A HREF="../Mat/MatMumpsGetIcntl.html#MatMumpsGetIcntl">MatMumpsGetIcntl</A>(), <A HREF="../Mat/MatMumpsSetCntl.html#MatMumpsSetCntl">MatMumpsSetCntl</A>(), <A HREF="../Mat/MatMumpsGetCntl.html#MatMumpsGetCntl">MatMumpsGetCntl</A>(), <A HREF="../Mat/MatMumpsGetInfo.html#MatMumpsGetInfo">MatMumpsGetInfo</A>(), <A HREF="../Mat/MatMumpsGetInfog.html#MatMumpsGetInfog">MatMumpsGetInfog</A>(), <A HREF="../Mat/MatMumpsGetRinfo.html#MatMumpsGetRinfo">MatMumpsGetRinfo</A>(), <A HREF="../Mat/MatMumpsGetRinfog.html#MatMumpsGetRinfog">MatMumpsGetRinfog</A>(), <A HREF="../KSP/KSPGetPC.html#KSPGetPC">KSPGetPC</A>(), PCGetFactor(), <A HREF="../PC/PCFactorGetMatrix.html#PCFactorGetMatrix">PCFactorGetMatrix</A>()
<BR>
<P>
<P><H3><FONT COLOR="#CC3333">Level</FONT></H3>beginner<BR>
<H3><FONT COLOR="#CC3333">Location</FONT></H3>
<A HREF="../../../src/mat/impls/aij/mpi/mumps/mumps.c.html#MATSOLVERMUMPS">src/mat/impls/aij/mpi/mumps/mumps.c</A>
<P><H3><FONT COLOR="#CC3333">Examples</FONT></H3>
<A HREF="../../../src/ksp/ksp/examples/tutorials/ex52.c.html">src/ksp/ksp/examples/tutorials/ex52.c.html</A><BR>
<A HREF="../../../src/ksp/ksp/examples/tutorials/ex53.c.html">src/ksp/ksp/examples/tutorials/ex53.c.html</A><BR>
<A HREF="../../../src/ksp/ksp/examples/tutorials/ex52f.F90.html">src/ksp/ksp/examples/tutorials/ex52f.F90.html</A><BR>
<BR><A HREF="./index.html">Index of all Mat routines</A>
<BR><A HREF="../../index.html">Table of Contents for all manual pages</A>
<BR><A HREF="../singleindex.html">Index of all manual pages</A>
</BODY></HTML>