\vfill \eject
\par
\section{Serial Solution of $A X = Y$ using an $LU$ factorization}
\label{section:LU-serial}
\par
The user has some representation of the data that defines the 
linear system $AX = Y$ and wants the solution $X$.  
The {\bf SPOOLES} library takes $A$ and $Y$ 
and returns $X$ to the user.  
\par
The {\bf SPOOLES} library is based on an object-oriented
design philosophy.  The first object that the user
must interact with is {\tt InpMtx}\footnote{
{\tt InpMtx} stands for {\tt Inp}ut {\tt M}a{\tt t}ri{\tt x},
for it is the object into which the user inputs the matrix
entries.}.  
The {\tt InpMtx} object is where 
the {\bf SPOOLES} representation of $A$ is assembled.  
The user can input the representation of $A$ 
into the {\tt InpMtx} object with methods for
a single matrix entry (consisting of the row index, the column index,
and the value), for an array of entries, for a set of
entries in a specified row or column, and for a dense submatrix
(useful 
for finite element applications).  All of these methods can be used
interchangeably with each other.
\par
A complete listing of a sample program 
is found in Section~\ref{section:LU-serial-driver}.
We will now begin to
work our way through the program to illustrate the use of {\bf
SPOOLES}
to solve a system of linear equations.  
\par
\subsection{Reading the input parameters}
\label{subsection:serial:input-data}
\par
The program starts by declaring a variety of variables
and pointers for the program.  It then reads 
the following parameters from standard input.
\begin{itemize}
\item
The variable {\tt msglvl} controls the
level of output generated by the program and by {\bf SPOOLES}.
\item
The printed output is sent to {\tt messageFile}.
\item
Whether the matrix is real or complex is controlled by {\tt type} 
(1 for real, 2 for complex).
\item
Similarly, {\tt symmetryflag} controls whether the matrix is
symmetric (0), Hermitian (1), or nonsymmetric (2).
\item
The matrix data will be read from the file {\tt matrixFileName}.
The matrix data has a simple format with the first line
containing the number of rows ({\tt nrow}), the number of
columns ({\tt ncol}), and the number of entries ({\tt nent}).
The remaining {\tt nent} lines on the file contain the row
number, the column number, and value for each nonzero in the sparse
matrix.
In our sample case, the matrix is symmetric so only the entries
in the upper triangle are given on the file.
If the matrix were complex, each entry line would hold two values,
one for the real part and one for the imaginary part.  
{\bf SPOOLES} follows the C language convention for indexing 
all arrays starting with 0.  
So the row and column labels for a matrix of order {\tt neqns} range 
from 0 to {\tt neqns-1}.
\item
The right hand side matrix $Y$ 
will be read from the file {\tt rhsFileName}.
The first line of this file has two numbers: 
{\tt nrow}, the number of rows of $Y$ 
that are present in the file, 
followed by {\tt nrhs}, the number of columns of $Y$.
(The number of rows of $Y$ in the file may be different from the
number of rows in $Y$, since often right hand side matrices are
sparse.
This allows us the option of only reading in nonzero rows of $Y$.)
The remaining lines of the file have the following format:
the row id, followed by either {\tt nrhs} floating point numbers if
the system is real, or {\tt 2*nrhs} numbers if the system is complex.
\item
The {\tt seed} parameter is a random number seed used in the
ordering process.
\end{itemize}
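\par
As a minimal sketch of how the matrix file format described above
could be validated line by line, the helper functions below parse the
header line and one entry line and check that the zero-based indices
are in range.
The names {\tt parse\_header()}, {\tt parse\_real\_entry()} and
{\tt parse\_complex\_entry()} are hypothetical; they are not part of
{\bf SPOOLES}.
\begin{verbatim}
#include <stdio.h>

/* Parse the header line "nrow ncol nent".
   Returns 1 on success, 0 if the line is malformed. */
int parse_header(const char *line, int *nrow, int *ncol, int *nent) {
   if ( sscanf(line, "%d %d %d", nrow, ncol, nent) != 3 ) {
      return 0 ;
   }
   return ( *nrow > 0 && *ncol > 0 && *nent >= 0 ) ;
}

/* Parse one entry line of a real matrix: "irow jcol value".
   Indices are zero-based and must lie in [0,nrow) x [0,ncol).
   Returns 1 on success, 0 otherwise. */
int parse_real_entry(const char *line, int nrow, int ncol,
                     int *irow, int *jcol, double *value) {
   if ( sscanf(line, "%d %d %le", irow, jcol, value) != 3 ) {
      return 0 ;
   }
   return (  0 <= *irow && *irow < nrow
          && 0 <= *jcol && *jcol < ncol ) ;
}

/* Parse one entry line of a complex matrix: "irow jcol real imag".
   Returns 1 on success, 0 otherwise. */
int parse_complex_entry(const char *line, int nrow, int ncol,
                        int *irow, int *jcol,
                        double *real, double *imag) {
   if ( sscanf(line, "%d %d %le %le", irow, jcol, real, imag) != 4 ) {
      return 0 ;
   }
   return (  0 <= *irow && *irow < nrow
          && 0 <= *jcol && *jcol < ncol ) ;
}
\end{verbatim}
A caller could read each line with {\tt fgets()}, pass it to the
appropriate function, and stop with an error message when the return
value is 0.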
\par
\subsection{Communicating the data for the problem}
\label{subsection:serial:communicating-data}
\par
The following code segment from the full sample program opens the
file {\tt matrixFileName}, reads the first line of the file,
and then initializes the {\tt InpMtx} object.  
The program continues by reading each line of
the input matrix data and uses either the method
{\tt InpMtx\_inputRealEntry()} or {\tt InpMtx\_inputComplexEntry()}
to place that entry into the {\tt InpMtx} object.  
Finally, this code segment closes the file and
finalizes the input to {\tt InpMtx} by converting the internal
storage of the matrix entries to a vector form.
(This is necessary for later steps.)
\begin{verbatim}
inputFile = fopen(matrixFileName, "r") ;
fscanf(inputFile, "%d %d %d", &nrow, &ncol, &nent) ;
neqns = nrow ;
mtxA = InpMtx_new() ;
InpMtx_init(mtxA, INPMTX_BY_ROWS, type, nent, neqns) ;
if ( type == SPOOLES_REAL ) {
   double   value ;
   for ( ient = 0 ; ient < nent ; ient++ ) {
      fscanf(inputFile, "%d %d %le", &irow, &jcol, &value) ;
      InpMtx_inputRealEntry(mtxA, irow, jcol, value) ;
   }
} else {
   double   imag, real ;
   for ( ient = 0 ; ient < nent ; ient++ ) {
      fscanf(inputFile, "%d %d %le %le", &irow, &jcol, &real, &imag) ;
      InpMtx_inputComplexEntry(mtxA, irow, jcol, real, imag) ;
   }
}
fclose(inputFile) ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
if ( msglvl > 2 ) {
   fprintf(msgFile, "\n\n input matrix") ;
   InpMtx_writeForHumanEye(mtxA, msgFile) ;
   fflush(msgFile) ;
}
\end{verbatim}
\par
The {\tt InpMtx} object is created via a call to {\tt InpMtx\_new()}
and initialized via a call to {\tt InpMtx\_init()}.
The arguments to {\tt InpMtx\_init()} 
are the pointer to the {\tt InpMtx}
object created by {\tt InpMtx\_new()} followed by four integers,
{\tt coordType},
{\tt inputMode}, {\tt maxnent}, and {\tt maxnvector}.
\begin{itemize}
\item
The second argument {\tt coordType} = {\tt INPMTX\_BY\_ROWS}
represents a general purpose mode that is well suited
for most users.\footnote{Note that {\bf SPOOLES} has some
pre-defined parameters such as {\tt INPMTX\_BY\_ROWS} for some objects. 
These parameters are always uppercase and either begin with the name of 
the object which they apply to, or the library name, e.g.,
{\tt SPOOLES\_REAL}.
They are described in the reference manual in the section for the
particular object.}
Some users may want to use other settings for {\tt coordType}
whose complete descriptions are found in the reference
manual.
\item
The third argument {\tt inputMode} controls whether 
the matrix is real or complex.  
One use of {\bf SPOOLES} not illustrated here is that
the {\tt InpMtx} object can have no values.  
This allows {\bf SPOOLES}
to be used to generate an ordering for use by another package.
\item
The fourth argument {\tt maxnent} is an estimate of the number of
nonzero entries in the matrix.
\item
The fifth argument {\tt maxnvector} is an estimate of the
number of vectors that will be used, e.g., the number of rows or
the number of columns.
\end{itemize}
The {\tt maxnent} and {\tt maxnvector} arguments only
have to be estimates as they are used in the initial sizing of the
object.  
Either can be 0.  
The {\tt InpMtx} object resizes itself
as required to handle the linear system.
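\par
The idea of sizing from an estimate and growing on demand can be
illustrated with a small stand-alone sketch.
The {\tt TripleStore} type and its functions below are hypothetical
and are not the {\tt InpMtx} implementation; they only show how an
initial estimate of 0 can still work when the capacity is doubled
whenever the store is full.
\begin{verbatim}
#include <stdlib.h>

/* Hypothetical growable store of (row, column, value) triples.
   Illustration only; this is not how InpMtx is implemented. */
typedef struct {
   int      nent ;      /* number of entries currently stored */
   int      maxnent ;   /* current capacity                   */
   int      *rows ;
   int      *cols ;
   double   *vals ;
} TripleStore ;

/* Make room for at least newmax entries.  Returns 1 on success. */
static int TripleStore_reserve(TripleStore *s, int newmax) {
   int      *rows, *cols ;
   double   *vals ;
   if ( newmax <= s->maxnent ) { return 1 ; }
   rows = realloc(s->rows, (size_t) newmax * sizeof(int)) ;
   if ( rows == NULL ) { return 0 ; }
   s->rows = rows ;
   cols = realloc(s->cols, (size_t) newmax * sizeof(int)) ;
   if ( cols == NULL ) { return 0 ; }
   s->cols = cols ;
   vals = realloc(s->vals, (size_t) newmax * sizeof(double)) ;
   if ( vals == NULL ) { return 0 ; }
   s->vals = vals ;
   s->maxnent = newmax ;
   return 1 ;
}

/* Initialize with an estimated capacity; the estimate may be 0. */
int TripleStore_init(TripleStore *s, int maxnent) {
   s->nent = 0 ;
   s->maxnent = 0 ;
   s->rows = NULL ;
   s->cols = NULL ;
   s->vals = NULL ;
   return TripleStore_reserve(s, maxnent) ;
}

/* Append one entry, doubling the capacity when the store is full. */
int TripleStore_add(TripleStore *s, int irow, int jcol, double value) {
   if ( s->nent == s->maxnent ) {
      int newmax = ( s->maxnent > 0 ) ? 2*s->maxnent : 16 ;
      if ( ! TripleStore_reserve(s, newmax) ) { return 0 ; }
   }
   s->rows[s->nent] = irow ;
   s->cols[s->nent] = jcol ;
   s->vals[s->nent] = value ;
   s->nent++ ;
   return 1 ;
}

/* Release all storage and reset the store to empty. */
void TripleStore_free(TripleStore *s) {
   free(s->rows) ;
   free(s->cols) ;
   free(s->vals) ;
   s->rows = NULL ;
   s->cols = NULL ;
   s->vals = NULL ;
   s->nent = 0 ;
   s->maxnent = 0 ;
}
\end{verbatim}
A good estimate avoids the repeated reallocation, which is the reason
for supplying one when the size of the problem is known.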
\par
Every object in {\bf SPOOLES} has print methods to output the
contents of that object.  This is illustrated in this code segment
by printing the input matrix as contained in the {\tt InpMtx} object,
{\tt mtxA}.
To shorten this chapter we will from now on omit the part of
the code that prints debug output to {\tt msgFile} for the various
code segments.  
The complete sample program in Section~\ref{section:LU-serial-driver}
contains all of the debug print statements.
\par
After the matrix $A$ has been read in from the file and placed in an
{\tt InpMtx} object, the right hand matrix $Y$ is read in from a
file and placed in a {\tt DenseMtx} object.
The following code fragment does this operation.
\begin{verbatim}
inputFile = fopen(rhsFileName, "r") ;
fscanf(inputFile, "%d %d", &nrow, &nrhs) ;
mtxB = DenseMtx_new() ;
DenseMtx_init(mtxB, type, 0, 0, neqns, nrhs, 1, neqns) ;
DenseMtx_zero(mtxB) ;
if ( type == SPOOLES_REAL ) {
   double   value ;
   for ( irow = 0 ; irow < nrow ; irow++ ) {
      fscanf(inputFile, "%d", &jrow) ;
      for ( jrhs = 0 ; jrhs < nrhs ; jrhs++ ) {
         fscanf(inputFile, "%le", &value) ;
         DenseMtx_setRealEntry(mtxB, jrow, jrhs, value) ;
      }
   }
} else {
   double   imag, real ;
   for ( irow = 0 ; irow < nrow ; irow++ ) {
      fscanf(inputFile, "%d", &jrow) ;
      for ( jrhs = 0 ; jrhs < nrhs ; jrhs++ ) {
         fscanf(inputFile, "%le %le", &real, &imag) ;
         DenseMtx_setComplexEntry(mtxB, jrow, jrhs, real, imag) ;
      }
   }
}
fclose(inputFile) ;
\end{verbatim}
The dense matrix object is created by a call to {\tt DenseMtx\_new()}
and initialized via a call to {\tt DenseMtx\_init()}.
There are seven arguments to {\tt DenseMtx\_init()}, not counting
the initial pointer argument.
\begin{itemize}
\item
The second argument specifies the type of the matrix, real or
complex.
\item
The third and fourth arguments specify row and column ids of the
matrix.
This is useful when the dense matrix is a submatrix of a larger
block matrix, but this feature is not used in the present context.
\item
The fifth and sixth arguments are the number of rows and columns in
the matrix, here equal to {\tt neqns} and {\tt nrhs}.
\item
The seventh and eighth arguments are the row stride and column
stride for the matrix entries.
For our application we require a column major matrix, and so
the row stride is {\tt 1} and the column stride is the number of
rows, or {\tt neqns}.
\end{itemize}
The initialization step allocates storage for the matrix entries,
but it does not fill them with any values.
This is done explicitly via the {\tt DenseMtx\_zero()} method,
which places zeroes in all the entries.
This is necessary since the right hand side matrix $Y$ may be
sparse, and so the number of rows in the file may not equal the
number of equations.
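\par
The role of the two strides can be made concrete with a short
sketch.
Under the usual meaning of strides, entry $(i,j)$ is found at offset
$i \cdot \mbox{\tt inc1} + j \cdot \mbox{\tt inc2}$ from the base
address, so a column major {\tt neqns} $\times$ {\tt nrhs} matrix uses
{\tt inc1 = 1} and {\tt inc2 = neqns}.
The function names below are hypothetical and do not belong to the
{\tt DenseMtx} object.
\begin{verbatim}
#include <stddef.h>

/* Offset of entry (irow,jcol) in a dense matrix stored with
   row stride inc1 and column stride inc2. */
size_t dense_offset(int irow, int jcol, int inc1, int inc2) {
   return (size_t) irow * (size_t) inc1
        + (size_t) jcol * (size_t) inc2 ;
}

/* Store a real value at entry (irow,jcol). */
void dense_set(double *entries, int inc1, int inc2,
               int irow, int jcol, double value) {
   entries[dense_offset(irow, jcol, inc1, inc2)] = value ;
}

/* Fetch the real value at entry (irow,jcol). */
double dense_get(const double *entries, int inc1, int inc2,
                 int irow, int jcol) {
   return entries[dense_offset(irow, jcol, inc1, inc2)] ;
}
\end{verbatim}
For example, with {\tt neqns = 9} the column major offset of entry
$(2,3)$ is $2 + 3 \cdot 9 = 29$.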
\par
The right hand side entries are then read in, row by row, and placed
into their locations via one of the two ``set entries'' methods.
Note, the nonzero rows can be read from the file in any order.
\par
\subsection{Reordering the linear system}
\label{subsection:serial:reordering}
\par
The first step is to find the permutation matrix $P$, and then
permute $AX = Y$ into $(PAP^T)(PX) = PY$.
The result of the {\bf SPOOLES} ordering step is not just $P$ 
or its permutation vector; it is a {\it front tree} that defines
not just the permutation, but also the blocking of the factor matrices,
which in turn specifies the data structures and the computations
that are performed during the factorization and solves.
Determining this {\tt ETree} {\it front tree} object takes three
steps, as seen in the code fragment below.
\begin{verbatim}
adjIVL = InpMtx_fullAdjacency(mtxA) ;
nedges = IVL_tsize(adjIVL) ;
graph = Graph_new() ;
Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL,
            NULL, NULL) ;
frontETree = orderViaMMD(graph, seed, msglvl, msgFile) ;
\end{verbatim}
The ordering module requires a graph of $A + A^T$.
(The {\bf SPOOLES} $LU$ factorization works with matrices of
symmetric structure.)
The {\tt Graph} object represents the graph of the matrix.
Its internal representation uses adjacency lists, one for each
vertex, which in turn are stored in an {\tt IVL} object.
The {\tt Graph} and {\tt InpMtx} objects are at a high level 
in the object hierarchy.
To promote independence of the objects, the two do not know about
each other, so we cannot create one from the other.
Instead, the {\tt InpMtx} object creates the lower level {\tt IVL}
object\footnote{{\tt IVL} stands for {\tt I}nteger {\tt V}ector
{\tt L}ist, i.e., a list of integer vectors.}, which is then used
in the initialization step for the {\tt Graph} object.
The {\tt Graph} object is quite general, and can be used to
describe a graph with unit or non-unit vertices and edges.
We refrain from describing all the input parameters to initialize
the {\tt Graph} object and instead refer the reader to the
reference manual.
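\par
To make the phrase ``graph of $A + A^T$'' concrete, the sketch below
takes a list of matrix entries and counts, for each vertex, its
neighbours in the symmetrized structure.
It is an illustration under simplifying assumptions: it uses a dense
$n \times n$ marker array instead of adjacency lists, it ignores
diagonal entries, and the function name
{\tt full\_adjacency\_degrees()} is hypothetical.
\begin{verbatim}
#include <stdlib.h>

/* For the n x n matrix whose nonzero positions are
   (rows[k],cols[k]), k = 0..nent-1, fill degree[i] with the
   number of off-diagonal neighbours of vertex i in the graph
   of A + A^T.  Duplicate entries are counted once.
   Returns 0 on success, -1 if memory could not be allocated. */
int full_adjacency_degrees(int n, int nent,
                           const int *rows, const int *cols,
                           int *degree) {
   char   *adj ;
   int    i, j, k ;
   if ( n <= 0 ) { return 0 ; }
   adj = calloc((size_t) n * (size_t) n, 1) ;
   if ( adj == NULL ) { return -1 ; }
   for ( k = 0 ; k < nent ; k++ ) {
      i = rows[k] ;
      j = cols[k] ;
      if ( i != j ) {
         /* an entry at (i,j) gives an edge in both directions */
         adj[(size_t) i * n + j] = 1 ;
         adj[(size_t) j * n + i] = 1 ;
      }
   }
   for ( i = 0 ; i < n ; i++ ) {
      degree[i] = 0 ;
      for ( j = 0 ; j < n ; j++ ) {
         degree[i] += adj[(size_t) i * n + j] ;
      }
   }
   free(adj) ;
   return 0 ;
}
\end{verbatim}
Because each entry marks both $(i,j)$ and $(j,i)$, a matrix given only
by its upper triangle and a nonsymmetric matrix with the same combined
pattern produce the same graph.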
\par
Once a {\tt Graph} object has been created, it is ordered via the
multiple minimum degree method, whose return value is a front tree
object.
The minimum degree method is the simplest of the ordering methods
provided in the {\bf SPOOLES} library.
For more information on ordering, please see the user document
{\it ``Ordering Sparse Matrices and Transforming Front Trees''}.
\par
\subsection{Non-numeric work}
\label{subsection:serial:non-numeric}
\par
The next phase is to obtain the permutation matrix $P$ (stored
implicitly in a permutation vector) and apply it to the front
tree, the matrix $A$ and the right hand side $Y$.
This is done by the following code fragment.
\begin{verbatim}
oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ;
oldToNew = IV_entries(oldToNewIV) ;
newToOldIV = ETree_newToOldVtxPerm(frontETree) ;
newToOld   = IV_entries(newToOldIV) ;
ETree_permuteVertices(frontETree, oldToNewIV) ;
InpMtx_permute(mtxA, oldToNew, oldToNew) ;
if (  symmetryflag == SPOOLES_SYMMETRIC
   || symmetryflag == SPOOLES_HERMITIAN ) {
   InpMtx_mapToUpperTriangle(mtxA) ;
}
InpMtx_changeCoordType(mtxA, INPMTX_BY_CHEVRONS) ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
DenseMtx_permuteRows(mtxB, oldToNewIV) ;
\end{verbatim}
The {\tt oldToNewIV} and {\tt newToOldIV} variables are {\tt IV}
objects that represent an integer vector.
The {\tt oldToNew} and {\tt newToOld} variables are pointers to
{\tt int}, which point to the base address of the {\tt int} vector
in an {\tt IV} object.
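\par
The relationship between the two permutation vectors can be sketched
in a few lines of plain C.
The sketch assumes the convention that {\tt oldToNew[i]} is the new
index of old row {\tt i}, so that {\tt newToOld} is its inverse; the
function names are hypothetical and are not {\bf SPOOLES} methods.
\begin{verbatim}
/* Build the inverse permutation, newToOld[oldToNew[i]] = i.
   Returns 1 if oldToNew is a valid permutation of 0..n-1,
   0 otherwise. */
int invert_permutation(int n, const int *oldToNew, int *newToOld) {
   int   i ;
   for ( i = 0 ; i < n ; i++ ) {
      newToOld[i] = -1 ;
   }
   for ( i = 0 ; i < n ; i++ ) {
      int inew = oldToNew[i] ;
      if ( inew < 0 || inew >= n || newToOld[inew] != -1 ) {
         return 0 ;   /* out of range or repeated index */
      }
      newToOld[inew] = i ;
   }
   return 1 ;
}

/* Move entry i of in[] to position perm[i] of out[].
   The two arrays must not overlap. */
void permute_vector(int n, const int *perm,
                    const double *in, double *out) {
   int   i ;
   for ( i = 0 ; i < n ; i++ ) {
      out[perm[i]] = in[i] ;
   }
}
\end{verbatim}
Applying {\tt permute\_vector()} with {\tt oldToNew} and then with
{\tt newToOld} returns the original vector.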
\par
Once we have the permutation vector, we apply it to the front tree,
by the {\tt ETree\_permuteVertices()} method, and then to the
matrix with the {\tt InpMtx\_permute()} method.
If the matrix $A$ is symmetric or Hermitian, we expect all nonzero
entries to be in the upper triangle.
Permuting the matrix yields $PAP^T$, which may not have all of its
entries in the upper triangle.
If $A$ is symmetric or Hermitian, the call to
{\tt InpMtx\_mapToUpperTriangle()} ensures that all entries of
$PAP^T$ are in its upper triangle.
Permuting the matrix destroys the internal vector structure, which
has to be restored.
But first we need to change the coordinate type of the {\tt InpMtx}
object, from rows into {\it chevrons}.\footnote{The $i$-th
chevron of $A$ consists of the diagonal entry $A_{i,i}$,
the $i$-th row of the upper triangle of $A$,
and the $i$-th column of the lower triangle of $A$.}
This is necessary in order to assemble entries of $PAP^T$ during
the numerical factorization.
At this point the {\tt InpMtx} object holds $PAP^T$ in the form
required by the factorization.
What remains is to transform $Y$ into $PY$, which is done via a
call to {\tt DenseMtx\_permuteRows()}.
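\par
The effect of these steps on a single matrix entry can be followed
with a small sketch.
The three functions below are hypothetical stand-ins, not the
{\tt InpMtx} methods: the first applies the symmetric permutation to
an index pair, the second maps a pair from the lower triangle into the
upper triangle, and the third computes chevron coordinates.
The chevron number $\min(i,j)$ follows from the definition of a
chevron; the offset $j - i$ is an assumed convention chosen here so
that 0 marks the diagonal, positive values the row part and negative
values the column part.
\begin{verbatim}
/* Replace (irow,jcol) by its position in P A P^T. */
void permute_entry(const int *oldToNew, int *irow, int *jcol) {
   *irow = oldToNew[*irow] ;
   *jcol = oldToNew[*jcol] ;
}

/* If the entry lies below the diagonal, swap the indices so that
   it lies in the upper triangle.  This is valid for a symmetric
   matrix; for a Hermitian matrix the value would also have to
   be conjugated. */
void map_to_upper(int *irow, int *jcol) {
   if ( *irow > *jcol ) {
      int tmp = *irow ;
      *irow = *jcol ;
      *jcol = tmp ;
   }
}

/* Chevron coordinates of entry (irow,jcol):
   chevron = min(irow,jcol), offset = jcol - irow (assumed). */
void to_chevron(int irow, int jcol, int *chevron, int *offset) {
   *chevron = ( irow < jcol ) ? irow : jcol ;
   *offset  = jcol - irow ;
}
\end{verbatim}
For instance, with {\tt oldToNew = \{2, 0, 1\}} the upper triangular
entry $(0,1)$ moves to $(2,0)$, which lies in the lower triangle and is
mapped back to $(0,2)$.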
\par
The final step is to compute the symbolic factorization,
which is stored in an {\tt IVL} object.
\begin{verbatim}
symbfacIVL = SymbFac_initFromInpMtx(frontETree, mtxA) ;
\end{verbatim}
\par
\subsection{The Matrix Factorization}
\label{subsection:serial:factor}
\par
The numeric factorization step begins by initializing the {\tt
FrontMtx} object with the {\tt frontETree} and {\tt symbfacIVL} 
objects created in earlier steps.
The {\tt FrontMtx} object holds the actual factorization.
The code segment for the initialization is found below.
\begin{verbatim}
frontmtx = FrontMtx_new() ;
mtxmanager = SubMtxManager_new() ;
SubMtxManager_init(mtxmanager, NO_LOCK, 0) ;
FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, symmetryflag,
              FRONTMTX_DENSE_FRONTS, pivotingflag, NO_LOCK, 0, NULL,
              mtxmanager, msglvl, msgFile) ;
\end{verbatim}
Here is a brief description of the initialization method and its
input parameters.
\begin{itemize}
\item
The fourth parameter is the matrix type, real or complex.
\item
The fifth parameter specifies whether the matrix is symmetric,
Hermitian or nonsymmetric.
\item
The sixth parameter defines whether the fronts in the factor matrix
are stored as dense or sparse matrices.
The latter is necessary for an approximate factorization.
\item
The seventh parameter says whether pivoting is enabled for
numerical stability.
\item
The eighth, ninth and tenth parameters are used during a multithreaded
or MPI factorization.
Their present values are for a serial factorization.
\item
The eleventh parameter, {\tt mtxmanager}, is a {\tt SubMtxManager}
object, an object used to manage instances of submatrices that form
the factor matrices.
The {\tt FrontMtx} object does not concern itself with finding
storage for the factor matrices; instead it asks the {\tt
SubMtxManager} object for submatrices.
While this seems awkward, it allows the {\tt FrontMtx} to operate 
in serial, multithreaded and MPI environments with little internal
code differences, and it is the hook we have left in the library to
extend its capabilities to out-of-core factors and solves.
\item
The twelfth and thirteenth parameters define the message level and
message file for the factorization.
\end{itemize}
\par
The numeric factorization is performed by the 
{\tt FrontMtx\_factorInpMtx()} method.  
The code segment from the sample program for the numerical
factorization step is found below.
\begin{verbatim}
chvmanager = ChvManager_new() ;
ChvManager_init(chvmanager, NO_LOCK, 1) ;
DVfill(10, cpus, 0.0) ;
IVfill(20, stats, 0) ;
rootchv = FrontMtx_factorInpMtx(frontmtx, mtxA, tau, droptol,
             chvmanager, &error, cpus, stats, msglvl, msgFile) ;
ChvManager_free(chvmanager) ;
\end{verbatim}
Working storage used during the factorization is found in the form
of block {\it chevrons}, in a {\tt Chv} object, which holds the partial 
frontal matrix for a front.
Much as with the {\tt SubMtx} object, the {\tt FrontMtx} object does
not concern itself with managing working storage; instead it relies
on a {\tt ChvManager} object to manage the {\tt Chv} objects.
We now discuss the arguments to the factor method.
\begin{itemize}
\item
The third argument is used when pivoting for numerical stability is
enabled.
Each entry in $L$ and $U$ is bounded above in magnitude by {\tt tau}.
We recommend a value of 100 for this parameter.
\item
The fourth argument is a drop tolerance 
that is not relevant for this case.
When used with approximate factorizations, this argument is a lower
bound on the magnitude of the entries that are stored in the front
matrices.
\item
The sixth argument is an error flag.
On return, {\tt error < 0} indicates that the
factorization completed successfully.
If {\tt error > 0}, then the factorization failed at front {\tt error}.
\item
The seventh and eighth arguments are vectors to be filled with 
statistics and breakdown of cpu times.
\end{itemize}
The return value of the factorization is a {\tt Chv} object, which
will be {\tt NULL} if the factorization succeeded.
We have left this as a hook for future extensions where only 
portions of the factor matrices are created.
\par
The factorization is performed using a one-dimensional
decomposition of the factor matrices.
Keeping the factor matrices in this form severely limits the amount
of parallelism for the forward and backsolves.
We perform a post-processing step to convert the one-dimensional
data structures to submatrices of a two-dimensional block
decomposition of the factor matrices.
The following code fragment performs this operation.
\begin{verbatim}
FrontMtx_postProcess(frontmtx, msglvl, msgFile) ;
\end{verbatim}
\par
\subsection{The Forward and Backsolves}
\label{subsection:serial:solve}
\par
The following code fragment solves the linear system 
$(PAP^T) (PX) = PY$,
and permutes the solution $PX$ back into the original ordering,
yielding $X$.
\begin{verbatim}
mtxX = DenseMtx_new() ;
DenseMtx_init(mtxX, type, 0, 0, neqns, nrhs, 1, neqns) ;
DenseMtx_zero(mtxX) ;
FrontMtx_solve(frontmtx, mtxX, mtxB, mtxmanager,
               cpus, msglvl, msgFile) ;
DenseMtx_permuteRows(mtxX, newToOldIV) ;
\end{verbatim}
First we initialize a new {\tt DenseMtx} object to hold $X$
(and also $PX$).
(Note, in all cases {\it but} a nonsymmetric matrix with pivoting 
enabled in an MPI environment, $X$ may overwrite $Y$, and so we 
can use the same {\tt DenseMtx} object for $X$ and $Y$.)
We then solve the linear system with a call to
{\tt FrontMtx\_solve()}.
Note that one of the arguments is the {\tt mtxmanager} object,
first created for the numerical factorization.
The solve requires working submatrices, and so we continue the
convention of having the {\tt FrontMtx} ask the manager object for
working storage.
The last step is to permute the rows of the {\tt DenseMtx}
from the new ordering into the old ordering.
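\par
To see why the final permutation recovers $X$, recall that a
permutation matrix satisfies $P^T P = I$, so that
\[
AX = Y
\quad\Longrightarrow\quad
PA(P^TP)X = PY
\quad\Longrightarrow\quad
(PAP^T)(PX) = PY .
\]
Writing $\widehat{X}$ for the computed solution of the permuted system
(a symbol introduced only for this remark), the solve returns
$\widehat{X} = PX$, and the original unknowns are
\[
X = P^T \widehat{X} .
\]
In terms of the permutation vectors, and assuming the convention that
{\tt newToOld[k]} is the original index of row $k$ in the new
ordering, row $k$ of $\widehat{X}$ is row {\tt newToOld[k]} of $X$.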
\par
\subsection{Sample Matrix and Right Hand Side Files}
\label{subsection:serial:input-files}
\par
Immediately below are two sample files:
{\tt matrix.input} holds the matrix input
and {\tt rhs.input} holds the right hand side.
This example is for a symmetric Laplacian operator 
on a $3 \times 3$ grid.
Only entries in the upper triangle are stored.
The right hand side is the $9 \times 9$ identity matrix.
Note how the indices are zero-based as for C, instead of one-based
as for Fortran.
\begin{center}
\begin{tabular}{|l|}
\multicolumn{1}{c}{\tt matrix.input} \\ \hline
\begin{minipage}[t]{0.75 in}
\begin{verbatim}
9 9 21
0 0  4.0
1 1  4.0
2 2  4.0
3 3  4.0
4 4  4.0
5 5  4.0
6 6  4.0
7 7  4.0
8 8  4.0
0 1 -1.0
1 2 -1.0
3 4 -1.0
4 5 -1.0
6 7 -1.0
7 8 -1.0
0 3 -1.0
1 4 -1.0
2 5 -1.0
3 6 -1.0
4 7 -1.0
5 8 -1.0
\end{verbatim}
\end{minipage}
\\ \hline
\end{tabular}
\qquad
\begin{tabular}{|l|}
\multicolumn{1}{c}{\tt rhs.input} \\ \hline
\begin{minipage}[t]{2.75 in}
\begin{verbatim}
9 9
0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
\end{verbatim}
\end{minipage}
\\ \hline
\end{tabular}
\end{center}
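\par
The entries of {\tt matrix.input} follow a regular pattern, and a
short sketch can generate them for a grid of any size.
The code assumes that grid point $(i,j)$ of an $n_1 \times n_2$ grid
is numbered $i + j\,n_1$, which is consistent with the sample file;
the function names are hypothetical.
\begin{verbatim}
/* Number of upper triangular entries of the 5-point Laplacian
   on an n1 x n2 grid: diagonal + horizontal + vertical edges. */
int laplacian_nent(int n1, int n2) {
   return n1*n2 + (n1 - 1)*n2 + n1*(n2 - 1) ;
}

/* Fill rows[], cols[], vals[] with the upper triangle of the
   5-point Laplacian on an n1 x n2 grid, using zero-based
   indices.  The arrays must have room for
   laplacian_nent(n1,n2) entries.
   Returns the number of entries written. */
int laplacian_upper(int n1, int n2,
                    int *rows, int *cols, double *vals) {
   int   i, j, v, k = 0 ;
   /* diagonal entries */
   for ( v = 0 ; v < n1*n2 ; v++ ) {
      rows[k] = v ; cols[k] = v ; vals[k] = 4.0 ; k++ ;
   }
   /* neighbours within a grid line: v and v + 1 */
   for ( j = 0 ; j < n2 ; j++ ) {
      for ( i = 0 ; i + 1 < n1 ; i++ ) {
         v = i + j*n1 ;
         rows[k] = v ; cols[k] = v + 1 ; vals[k] = -1.0 ; k++ ;
      }
   }
   /* neighbours in adjacent grid lines: v and v + n1 */
   for ( j = 0 ; j + 1 < n2 ; j++ ) {
      for ( i = 0 ; i < n1 ; i++ ) {
         v = i + j*n1 ;
         rows[k] = v ; cols[k] = v + n1 ; vals[k] = -1.0 ; k++ ;
      }
   }
   return k ;
}
\end{verbatim}
For the $3 \times 3$ grid this gives $9 + 6 + 6 = 21$ entries, in the
same order as the lines of {\tt matrix.input}.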
\par