File: QR_serial.tex

package info (click to toggle)
spooles 2.2-9
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 19,012 kB
  • sloc: ansic: 146,834; csh: 3,615; makefile: 2,040; perl: 74
file content (249 lines) | stat: -rw-r--r-- 8,492 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
\vfill \eject
\par
\section{Serial Solution of $A X = Y$ using an $QR$ factorization}
\label{section:QR-serial}
\par
Let us review the steps is solving $A X = Y$ using a $QR$
factorization.
\begin{itemize}
\item 
{\bf communicate} the data for the problem as $A$, $X$ and $Y$.
\item 
{\bf reorder} as
${\widetilde A} {\widetilde X} = Y$, where
${\widetilde A} = A P^T$ and
${\widetilde X} = P X$.
and $P$ is a permutation matrix.
\item 
{\bf factor} $ {\widetilde A} = Q R$,
where $Q$ is orthogonal and $R$ is upper triangular.
\item 
{\bf solve}  $R^T R (P X) = A^T Y$ (if real)
or {\bf solve}  $R^H R (P X) = A^H Y$ (if complex).
\end{itemize}
\par
A complete listing of a sample program 
is found in Section~\ref{section:QR-serial-driver}.
We will now begin to
work our way through the program to illustrate the use 
of {\bf SPOOLES} to solve a system of linear equations.  
\par
\subsection{Reading the input parameters}
\label{subsection:QR:input-data}
\par
The input parameters are identical to those of the serial $LU$
driver program described in
Section~\ref{subsection:serial:input-data}
with the exception that the {\tt symmetryflag} is not present.
\par
\subsection{Communicating the data for the problem}
\label{subsection:QR:communicating-data}
\par
This step is identical to the serial code, as described in
Section~\ref{subsection:serial:communicating-data}
\par
\subsection{Reordering the linear system}
\label{subsection:QR:reordering}
For the $LU$ factorization of $A$, we used the graph of $A + A^T$.
For the $QR$ factorization of $A$, we need the graph of $A^TA$.
The only difference between the two orderings is how we create
the {\tt IVL} object for the graph.
For the $QR$ factorization, we use 
{\tt InpMtx\_adjForATA()}, as we see below.
\begin{verbatim}
adjIVL = InpMtx_adjForATA(mtxA) ;
nedges = IVL_tsize(adjIVL) ;
graph = Graph_new() ;
Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL,
            NULL, NULL) ;
frontETree = orderViaMMD(graph, seed, msglvl, msgFile) ;
\end{verbatim}
The minimum degree method is the simplest of the ordering methods
provided in the {\bf SPOOLES} library.
For more information on ordering, please see the user document
{\it ``Ordering Sparse Matrices and Transforming Front Trees''}.
\par
\subsection{Non-numeric work}
\label{subsection:QR:non-numeric}
\par
The next phase is to obtain the permutation matrix $P$, (stored
implicitly in a permutation vector), and apply it to the matrix $A$.
This is done by the following code fragment.
\begin{verbatim}
oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ;
oldToNew   = IV_entries(oldToNewIV) ;
newToOldIV = ETree_newToOldVtxPerm(frontETree) ;
newToOld   = IV_entries(newToOldIV) ;
InpMtx_permute(mtxA, NULL, oldToNew)) ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
\end{verbatim}
The {\tt oldToNewIV} and {\tt newToOldIV} variables are {\tt IV}
objects that represent an integer vector.
The {\tt oldToNew} and {\tt newToOld} variables are pointers to
{\tt int}, which point to the base address of the {\tt int} vector
in an {\tt IV} object.
Once we have the permutation vector, we apply it to the front tree,
by the {\tt ETree\_permuteVertices()} method. 
We need $A P^T$, so we permute the {\tt InpMtx}
object using a {\tt NULL} pointer for the row permutation (which
means do not permute the rows) and the {\tt oldToNew} vector for
the column permutation.
At this point the {\tt InpMtx} object holds $AP^T$ in the form
required by the factorization.
\par
The final steps are to compute the symbolic factorization,
which is stored in an {\tt IVL} object, and to permute the vertices
in the front tree.
The symbolic factorization differs slightly from the $LU$ case.
\begin{verbatim}
symbfacIVL = SymbFac_initFromGraph(frontETree, graph) ;
IVL_overwrite(symbfacIVL, oldToNewIV) ;
IVL_sortUp(symbfacIVL) ;
ETree_permuteVertices(frontETree, oldToNewIV) ;
\end{verbatim}
We do not have the $A^TA$ matrix object, so we constuct the
symbolic factorization using the front tree and the {\tt Graph} object.
Note, at this point in time, both the graph and front tree are in
terms of the original ordering, so after the {\tt IVL} object is
created, its vertices must be mapped into the new permutation and
sorted into ascending order.
Then the vertices in the front tree are mapped into the new ordering.
\par
\subsection{The Matrix Factorization}
\label{subsection:QR:factor}
\par
The numeric factorization step begins by initializing the {\tt
FrontMtx} object with the {\tt frontETree} and {\tt symbacIVL} 
objects created in early steps.
The {\tt FrontMtx} object holds the actual factorization.
The code segment for the initialization is found below.
\begin{verbatim}
frontmtx = FrontMtx_new() ;
mtxmanager = SubMtxManager_new() ;
SubMtxManager_init(mtxmanager, NO_LOCK, 0) ;
if ( type == SPOOLES_REAL ) {
   FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, 
                 SPOOLES_SYMMETRIC, FRONTMTX_DENSE_FRONTS, 
                 SPOOLES_NO_PIVOTING, NO_LOCK, 0, NULL,
                 mtxmanager, msglvl, msgFile) ;
} else {
   FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, 
                 SPOOLES_HERMITIAN, FRONTMTX_DENSE_FRONTS, 
                 SPOOLES_NO_PIVOTING, NO_LOCK, 0, NULL,
                 mtxmanager, msglvl, msgFile) ;
}
\end{verbatim}
This differs little from the initialization in
Section~\ref{subsection:serial:factor}, except that the matrix type
is symmetric or Hermitian, and no pivoting is used for stability.
\par
The numeric factorization is performed by the 
{\tt FrontMtx\_QR\_factor()} method.  
The code segment from the sample program for the numerical
factorization step is found below.
\begin{verbatim}
chvmanager = ChvManager_new() ;
ChvManager_init(chvmanager, NO_LOCK, 1) ;
DVzero(10, cpus) ;
facops = 0.0 ;
FrontMtx_QR_factor(frontmtx, mtxA, chvmanager, cpus, &facops, msglvl, msgFile) ;
ChvManager_free(chvmanager) ;
\end{verbatim}
Working storage used during the factorization is found in the form
of block {\it chevrons}, in a {\tt Chv} object, which hold the partial 
frontal matrix for a front.
Much as with the {\tt SubMtx} object, the {\tt FrontMtx} object does
not concern itself with managing working storage, instead it relies
on a {\tt ChvManager} object to manage the {\tt Chv} objects.
On return {\tt facops} contains the number of floating point
operations performed during the factorization.
\par
The factorization is performed using a one-dimensional
decomposition of the factor matrices.
Keeping the factor matrices in this form severely limits the amount
of parallelism for the forward and backsolves.
We perform a post-processing step to convert the one-dimensional
data structures to submatrices of a two-dimensional block
decomposition of the factor matrices.
The following code fragment performs this operation.
\begin{verbatim}
FrontMtx_postProcess(frontmtx, msglvl, msgFile) ;
\end{verbatim}
\par
\subsection{Solving the linear system}
\label{subsection:QR:solve}
\par
The following code fragment solves the linear system 
$R^T R {\widehat X} = {\widehat A}^T Y$ if real
or
$R^H R {\widehat X} = {\widehat A}^H Y$ if complex.
\begin{verbatim}
mtxX = DenseMtx_new() ;
DenseMtx_init(mtxX, type, 0, 0, neqns, nrhs, 1, neqns) ;
FrontMtx_QR_solve(frontmtx, mtxA, mtxX, mtxB, mtxmanager,
                  cpus, msglvl, msgFile) ;
\end{verbatim}
Last, we permute the rows of ${\tt widehat X}$ back into $X$.
\begin{verbatim}
DenseMtx_permuteRows(mtxX, newToOldIV) ;
\end{verbatim}
\par
\subsection{Sample Matrix and Right Hand Side Files}
\label{subsection:QR:input-files}
\par
Immediately below are two sample files:
{\tt qr.matrix.input} holds the matrix input
and {\tt qr.rhs.input} holds the right hand side.
This simple example is an $8 \times 6$ matrix $A$
and a single right hand side.
The solution is the vector of all ones.
Note how the indices are zero-based as for C, instead of one-based
as for Fortran.
\begin{center}
\begin{tabular}{|l|}
\multicolumn{1}{c}{\tt matrix.input} \\ \hline
\begin{minipage}[t]{0.5 in}
\begin{verbatim}
8 6 18
0 1 1.0
0 3 2.0
1 2 3.0
1 3 1.0
1 5 1.0
2 0 1.0
2 2 2.0
3 0 3.0
3 2 4.0
3 4 2.0
4 3 1.0
5 1 2.0
5 4 3.0
5 5 1.0
6 0 2.0
6 3 3.0
7 1 1.0
7 4 3.0
\end{verbatim}
\end{minipage}
\\ \hline
\end{tabular}
\qquad
\begin{tabular}{|l|}
\multicolumn{1}{c}{\tt rhs.input} \\ \hline
\begin{minipage}[t]{0.5 in}
\begin{verbatim}
8 1
0 3.0
1 5.0
2 3.0
3 9.0
4 1.0
5 6.0
6 5.0
7 4.0
\end{verbatim}
\end{minipage}
\\ \hline
\end{tabular}
\end{center}
\par