\par
\section{Prototypes and descriptions of {\tt MT} methods}
\label{section:MT:proto}
\par
This section contains brief descriptions including prototypes
of all methods found in the {\tt MT} source directory.
\par
\subsection{Matrix-matrix multiply methods}
\label{subsection:MT:proto:mvm}
\par
There are five methods that multiply a sparse matrix $A$, stored in an
{\tt InpMtx} object, times a dense matrix.
The first three methods, called {\tt InpMtx\_MT\_nonsym\_mmm*()},
are straightforward: they compute
$y := y + \alpha A x$, $y := y + \alpha A^T x$ or
$y := y + \alpha A^H x$, where $A$ is nonsymmetric and $\alpha$ is
real (if $A$ is real) or complex (if $A$ is complex).
The fourth method, {\tt InpMtx\_MT\_sym\_mmm()},
is used when the matrix is real symmetric or complex symmetric,
though it is not necessary that only the lower or upper
triangular entries are stored.
(If one fills the {\tt InpMtx} object with only the entries in
the lower triangle of $A$, and then permutes the matrix as $PAP^T$,
the entries will not generally be found in only the lower or upper
triangle. However, the code is still correct.)
The last method, 
{\tt InpMtx\_MT\_herm\_mmm()}, is used when the matrix is
complex hermitian.
\par
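The parenthetical remark above can be illustrated with a small
stand-alone sketch in C.
It is not the library's implementation: the {\tt Triple} type and the
{\tt sym\_mvm()} function are hypothetical names introduced here, and
the sketch assumes a real symmetric matrix whose off-diagonal pairs
are each stored exactly once.
Because every stored entry $a_{i,j}$ with $i \ne j$ updates both
$y_i$ and $y_j$, the result does not depend on the triangle in which
the entry is found.
\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* One stored entry a(row,col) = value of a real symmetric matrix. */
typedef struct { int row ; int col ; double value ; } Triple ;

/*
 * y := y + alpha * A * x for real symmetric A, where each
 * off-diagonal pair a(i,j) = a(j,i) is stored exactly once,
 * in either triangle.
 */
void sym_mvm ( size_t nent, const Triple *ent, double alpha,
               const double *x, double *y )
{
    assert(ent != NULL && x != NULL && y != NULL) ;
    for ( size_t k = 0 ; k < nent ; k++ ) {
        int    i = ent[k].row, j = ent[k].col ;
        double a = alpha * ent[k].value ;
        y[i] += a * x[j] ;
        if ( i != j ) {
            y[j] += a * x[i] ;   /* mirrored entry a(j,i) */
        }
    }
}
\end{verbatim}
Storing an entry as $a_{0,1}$ instead of $a_{1,0}$ gives the same
product, which mirrors what happens to a triangle of entries under
a symmetric permutation.
\par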
%=======================================================================
\begin{enumerate}
%-----------------------------------------------------------------------
\item
\begin{verbatim}
void InpMtx_MT_nonsym_mmm ( InpMtx *A, DenseMtx *Y, double alpha[], DenseMtx *X,
                            int nthread, int msglvl, FILE *msgFile ) ;
void InpMtx_MT_sym_mmm ( InpMtx *A, DenseMtx *Y, double alpha[], DenseMtx *X,
                         int nthread, int msglvl, FILE *msgFile ) ;
void InpMtx_MT_herm_mmm ( InpMtx *A, DenseMtx *Y, double alpha[], DenseMtx *X,
                          int nthread, int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{InpMtx_MT_nonsym_mmm@{\tt InpMtx\_MT\_nonsym\_mmm()}}
\index{InpMtx_MT_sym_mmm@{\tt InpMtx\_MT\_sym\_mmm()}}
\index{InpMtx_MT_herm_mmm@{\tt InpMtx\_MT\_herm\_mmm()}}
These methods compute the matrix-vector product $y := y + \alpha A x$,
where $y$ is found in the {\tt Y DenseMtx} object,
$\alpha$ is real or complex in {\tt alpha[]},
$A$ is found in the {\tt A InpMtx} object, and
$x$ is found in the {\tt X DenseMtx} object.
If any of the input objects are {\tt NULL}, an error message is
printed and the program exits.
{\tt A}, {\tt X} and {\tt Y} must all be real or all be complex.
When {\tt A} is real, $\alpha$ = {\tt alpha[0]}.
When {\tt A} is complex, $\alpha$ =
{\tt alpha[0]} + $i$ {\tt alpha[1]}.
This means that one cannot call the methods with a constant as the
third parameter, e.g.,
{\tt InpMtx\_MT\_nonsym\_mmm(A, Y, 3.22, X, nthread, msglvl, msgFile)},
for this may result in a segmentation violation.
The values of $\alpha$ must be loaded into an array of length 1
(real) or 2 (complex).
The number of threads is specified by the {\tt nthread} parameter;
if {\tt nthread} is {\tt 1}, the serial method is called.
The {\tt msglvl} and {\tt msgFile} parameters are used for
diagnostics during the creation of the threads' individual data
structures.
\par \noindent {\it Error checking:}
If {\tt A}, {\tt Y} or {\tt X} are {\tt NULL},
or if {\tt coordType} is not {\tt INPMTX\_BY\_ROWS},
{\tt INPMTX\_BY\_COLUMNS} or {\tt INPMTX\_BY\_CHEVRONS},
or if {\tt storageMode} is not one of {\tt INPMTX\_RAW\_DATA},
{\tt INPMTX\_SORTED} or {\tt INPMTX\_BY\_VECTORS},
or if {\tt inputMode} is not {\tt SPOOLES\_REAL} or
{\tt SPOOLES\_COMPLEX},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\item
\begin{verbatim}
void InpMtx_MT_nonsym_mmm_T ( InpMtx *A, DenseMtx *Y, double alpha[], DenseMtx *X,
                              int nthread, int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{InpMtx_MT_nonsym_mmm_T@{\tt InpMtx\_MT\_nonsym\_mmm\_T()}}
This method computes the matrix-vector product $y := y + \alpha A^T x$,
where $y$ is found in the {\tt Y DenseMtx} object,
$\alpha$ is real or complex in {\tt alpha[]},
$A$ is found in the {\tt A InpMtx} object, and
$x$ is found in the {\tt X DenseMtx} object.
If any of the input objects are {\tt NULL}, an error message is
printed and the program exits.
{\tt A}, {\tt X} and {\tt Y} must all be real or all be complex.
When {\tt A} is real, $\alpha$ = {\tt alpha[0]}.
When {\tt A} is complex, $\alpha$ =
{\tt alpha[0]} + $i$ {\tt alpha[1]}.
This means that one cannot call the method with a constant as the
third parameter, e.g.,
{\tt InpMtx\_MT\_nonsym\_mmm\_T(A, Y, 3.22, X, nthread, msglvl, msgFile)},
for this may result in a segmentation violation.
The values of $\alpha$ must be loaded into an array of length 1
(real) or 2 (complex).
The number of threads is specified by the {\tt nthread} parameter;
if {\tt nthread} is {\tt 1}, the serial method is called.
The {\tt msglvl} and {\tt msgFile} parameters are used for
diagnostics during the creation of the threads' individual data
structures.
\par \noindent {\it Error checking:}
If {\tt A}, {\tt Y} or {\tt X} are {\tt NULL},
or if {\tt coordType} is not {\tt INPMTX\_BY\_ROWS},
{\tt INPMTX\_BY\_COLUMNS} or {\tt INPMTX\_BY\_CHEVRONS},
or if {\tt storageMode} is not one of {\tt INPMTX\_RAW\_DATA},
{\tt INPMTX\_SORTED} or {\tt INPMTX\_BY\_VECTORS},
or if {\tt inputMode} is not {\tt SPOOLES\_REAL} or
{\tt SPOOLES\_COMPLEX},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\item
\begin{verbatim}
void InpMtx_MT_nonsym_mmm_H ( InpMtx *A, DenseMtx *Y, double alpha[], DenseMtx *X,
                              int nthread, int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{InpMtx_MT_nonsym_mmm_H@{\tt InpMtx\_MT\_nonsym\_mmm\_H()}}
This method computes the matrix-vector product $y := y + \alpha A^H x$,
where $y$ is found in the {\tt Y DenseMtx} object,
$\alpha$ is complex in {\tt alpha[]},
$A$ is found in the {\tt A InpMtx} object, and
$x$ is found in the {\tt X DenseMtx} object.
If any of the input objects are {\tt NULL}, an error message is
printed and the program exits.
{\tt A}, {\tt X} and {\tt Y} must all be complex.
The number of threads is specified by the {\tt nthread} parameter;
if {\tt nthread} is {\tt 1}, the serial method is called.
The {\tt msglvl} and {\tt msgFile} parameters are used for
diagnostics during the creation of the threads' individual data
structures.
\par \noindent {\it Error checking:}
If {\tt A}, {\tt Y} or {\tt X} are {\tt NULL},
or if {\tt coordType} is not {\tt INPMTX\_BY\_ROWS},
{\tt INPMTX\_BY\_COLUMNS} or {\tt INPMTX\_BY\_CHEVRONS},
or if {\tt storageMode} is not one of {\tt INPMTX\_RAW\_DATA},
{\tt INPMTX\_SORTED} or {\tt INPMTX\_BY\_VECTORS},
or if {\tt inputMode} is not {\tt SPOOLES\_COMPLEX},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\end{enumerate}
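\par
The convention for {\tt alpha[]} can be summarized by the following
stand-alone sketch for a single entry.
The function {\tt scaled\_update()} and its {\tt is\_complex} flag are
hypothetical and are not part of the library; complex values are
assumed to be stored as (real, imaginary) pairs, as the description
of {\tt alpha[]} above suggests.
\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/*
 * y := y + alpha * a * x for one scalar entry.
 * is_complex == 0 : alpha has length 1, a, x, y have length 1.
 * is_complex != 0 : alpha has length 2, a, x, y are
 *                   (real, imaginary) pairs.
 */
void scaled_update ( int is_complex, const double alpha[],
                     const double a[], const double x[], double y[] )
{
    assert(alpha != NULL && a != NULL && x != NULL && y != NULL) ;
    if ( ! is_complex ) {
        y[0] += alpha[0] * a[0] * x[0] ;
    } else {
        /* t = alpha * a */
        double tr = alpha[0] * a[0] - alpha[1] * a[1] ;
        double ti = alpha[0] * a[1] + alpha[1] * a[0] ;
        /* y += t * x */
        y[0] += tr * x[0] - ti * x[1] ;
        y[1] += tr * x[1] + ti * x[0] ;
    }
}
\end{verbatim}
In a call to one of the methods above, a real scalar such as 3.22
would therefore be passed as an array, e.g.,
{\tt double alpha[2] = \{3.22, 0.0\} ;},
and never as a bare constant.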
\par
\subsection{Multithreaded Factorization methods}
\label{subsection:FrontMtx:proto:factorMT}
\par
%=======================================================================
\begin{enumerate}
%-----------------------------------------------------------------------
\item
\begin{verbatim}
Chv * FrontMtx_MT_factorInpMtx ( FrontMtx *frontmtx, InpMtx *inpmtx, 
             double tau, double droptol, ChvManager *chvmanager,
             IV *ownersIV, int lookahead, double cpus[], int stats[],  
             int msglvl, FILE *msgFile ) ;
Chv * FrontMtx_MT_factorPencil ( FrontMtx *frontmtx, Pencil *pencil, 
             double tau, double droptol, ChvManager *chvmanager,
             IV *ownersIV, int lookahead, double cpus[], int stats[],  
             int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{FrontMtx_MT_factorInpMtx@{\tt FrontMtx\_MT\_factorInpMtx()}}
\index{FrontMtx_MT_factorPencil@{\tt FrontMtx\_MT\_factorPencil()}}
These two methods compute a multithreaded factorization for a matrix
$A$ (stored in {\tt inpmtx}) or a matrix pencil
$A + \sigma B$ (stored in {\tt pencil}).
The {\tt tau} parameter is used when pivoting is enabled; each
entry in $U$ and $L$ (when nonsymmetric) will then have magnitude less
than or equal to {\tt tau}.
The {\tt droptol} parameter is used when the fronts are stored in
a sparse format; each entry kept in $U$ and $L$ (when nonsymmetric)
will have magnitude greater than or equal to {\tt droptol}.
The map from fronts to owning threads is found in {\tt ownersIV}.
The {\tt lookahead} parameter governs the
``upward--looking'' nature of the computations.
Choosing {\tt lookahead = 0} is usually the most conservative with
respect to working storage, while positive values increase the
working storage and sometimes decrease the factorization time.
On return, the {\tt cpus[]} vector is filled with the following
information.
\begin{itemize}
\item
{\tt cpus[0]} --- time spent managing working storage.
\item
{\tt cpus[1]} --- time spent initializing the fronts
                  and loading the original entries.
\item
{\tt cpus[2]} --- time spent accumulating updates from descendants.
\item
{\tt cpus[3]} --- time spent inserting aggregate fronts.
\item
{\tt cpus[4]} --- time spent removing and assembling aggregate fronts.
\item
{\tt cpus[5]} --- time spent assembling postponed data.
\item
{\tt cpus[6]} --- time spent factoring the fronts.
\item
{\tt cpus[7]} --- time spent extracting postponed data.
\item
{\tt cpus[8]} --- time spent storing the factor entries.
\item
{\tt cpus[9]} --- miscellaneous time.
\end{itemize}
On return, the {\tt stats[]} vector is filled with the following
information.
\begin{itemize}
\item
{\tt stats[0]} --- number of pivots.
\item
{\tt stats[1]} --- number of pivot tests.
\item
{\tt stats[2]} --- number of delayed rows and columns.
\item
{\tt stats[3]} --- number of entries in $D$.
\item
{\tt stats[4]} --- number of entries in $L$.
\item
{\tt stats[5]} --- number of entries in $U$.
\item
{\tt stats[6]} --- number of locks of the {\tt FrontMtx} object.
\item
{\tt stats[7]} --- number of locks of aggregate list.
\item
{\tt stats[8]} --- number of locks of postponed list.
\end{itemize}
\par \noindent {\it Error checking:}
If {\tt frontmtx}, {\tt inpmtx} (or {\tt pencil}), {\tt cpus} or {\tt stats}
is {\tt NULL},
or if {\tt msglvl > 0} and {\tt msgFile} is {\tt NULL},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\end{enumerate}
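\par
The slot layout of {\tt cpus[]} and {\tt stats[]} listed above can be
given names in the calling code.
The sketch below is one way to do so; the enumeration constants and
the two helper functions are hypothetical and are not defined by the
library.
It assumes only the ten timing slots and nine statistics slots that
are documented above, and it makes no claim about the array lengths
the factorization methods require.
\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the documented slots of cpus[]. */
enum {
    CPU_STORAGE = 0, CPU_INIT, CPU_UPDATE, CPU_INSERT_AGG,
    CPU_ASSEMBLE_AGG, CPU_ASSEMBLE_POSTPONED, CPU_FACTOR,
    CPU_EXTRACT_POSTPONED, CPU_STORE, CPU_MISC, NCPU
} ;
/* Hypothetical names for the documented slots of stats[]. */
enum {
    STAT_PIVOTS = 0, STAT_PIVOT_TESTS, STAT_DELAYED, STAT_NENT_D,
    STAT_NENT_L, STAT_NENT_U, STAT_LOCKS_FRONTMTX,
    STAT_LOCKS_AGGREGATE, STAT_LOCKS_POSTPONED, NSTAT
} ;

/* Sum of the ten documented timing slots. */
double factor_total_time ( const double cpus[] )
{
    double total = 0.0 ;
    assert(cpus != NULL) ;
    for ( int k = 0 ; k < NCPU ; k++ ) {
        total += cpus[k] ;
    }
    return total ;
}

/* Number of factor entries: entries in D plus those in L and U. */
int factor_entries ( const int stats[] )
{
    assert(stats != NULL) ;
    return stats[STAT_NENT_D] + stats[STAT_NENT_L]
         + stats[STAT_NENT_U] ;
}
\end{verbatim}
Named slots keep reporting code readable, e.g.,
{\tt cpus[CPU\_FACTOR]} instead of {\tt cpus[6]}.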
\par
\subsection{Multithreaded $QR$ Factorization method}
\label{subsection:FrontMtx:proto:factorQR_MT}
\par
%=======================================================================
\begin{enumerate}
%-----------------------------------------------------------------------
\item
\begin{verbatim}
void FrontMtx_MT_QR_factor ( FrontMtx *frontmtx, InpMtx *mtxA,
                             ChvManager *chvmanager, IV *ownersIV, double cpus[],
                             double *pfacops,  int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{FrontMtx_MT_QR_factor@{\tt FrontMtx\_MT\_QR\_factor()}}
This method computes the
$(U^T+I)D(I+U)$ factorization of $A^TA$ if $A$ is real
or
$(U^H+I)D(I+U)$ factorization of $A^HA$ if $A$ is complex.
The {\tt chvmanager} object manages the working storage.
The map from fronts to threads is found in {\tt ownersIV}.
On return, the {\tt cpus[]} vector is filled as follows.
\begin{itemize}
\item
{\tt cpus[0]} -- time to set up the factorization.
\item
{\tt cpus[1]} -- time to set up the fronts.
\item
{\tt cpus[2]} -- time to factor the matrices.
\item
{\tt cpus[3]} -- time to scale and store the factor entries.
\item
{\tt cpus[4]} -- time to store the update entries.
\item
{\tt cpus[5]} -- miscellaneous time.
\item
{\tt cpus[6]} -- total time.
\end{itemize}
On return, {\tt *pfacops} contains the number of floating point
operations done by the factorization.
\par \noindent {\it Error checking:}
If {\tt frontmtx}, {\tt mtxA} or {\tt chvmanager} is {\tt NULL},
or if {\tt msglvl > 0} and {\tt msgFile} is {\tt NULL},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\end{enumerate}
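\par
For orientation, the link between this factorization and a $QR$
factorization can be written out as follows.
This is a standard identity and not a description of the code;
it assumes that $A$ is real with full column rank and that
$A = QR$ with $Q^TQ = I$ and $R$ upper triangular with nonzero
diagonal.
\[
A^TA = R^TQ^TQR = R^TR = (U^T+I)D(I+U),
\qquad
D = {\rm diag}(R)^2,
\quad
I+U = {\rm diag}(R)^{-1}R .
\]
In the complex case the transpose is replaced by the conjugate
transpose and the diagonal entries of $D$ are the squared magnitudes
of the diagonal entries of $R$.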
\par
\subsection{Multithreaded Solve method}
\label{subsection:FrontMtx:proto:solve-multithreaded}
\par
\begin{enumerate}
%=======================================================================
\item
\begin{verbatim}
void FrontMtx_MT_solve ( FrontMtx *frontmtx, DenseMtx *mtxX, DenseMtx *mtxB,
                         SubMtxManager *mtxmanager, SolveMap *solvemap,
                         double cpus[], int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{FrontMtx_MT_solve@{\tt FrontMtx\_MT\_solve()}}
This method is used to solve one of three linear systems of equations
using a multithreaded solve
---
$(U^T + I)D(I + U) X = B$,
$(U^H + I)D(I + U) X = B$ or
$(L + I)D(I + U) X = B$.
Entries of $B$ are {\it read} from {\tt mtxB} and
entries of $X$ are written to {\tt mtxX}.
Therefore, {\tt mtxX} and {\tt mtxB} can be the same object.
(Note, this does not hold true for an MPI factorization with pivoting.)
The submatrix manager object manages the working storage.
The {\tt solvemap} object contains the map from submatrices to
threads.
On return the {\tt cpus[]} vector is filled with the following.
\begin{itemize}
\item
{\tt cpus[0]} --- set up the solves
\item
{\tt cpus[1]} --- fetch right hand side and store solution
\item
{\tt cpus[2]} --- forward solve
\item
{\tt cpus[3]} --- diagonal solve
\item
{\tt cpus[4]} --- backward solve
\item
{\tt cpus[5]} --- total time in the method.
\end{itemize}
\par \noindent {\it Error checking:}
If {\tt frontmtx}, {\tt mtxX}, {\tt mtxB}, {\tt mtxmanager},
{\tt solvemap} or {\tt cpus} is {\tt NULL},
or if {\tt msglvl > 0} and {\tt msgFile} is {\tt NULL},
an error message is printed and the program exits.
%-----------------------------------------------------------------------
\end{enumerate}
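\par
The forward, diagonal and backward phases timed in {\tt cpus[2]},
{\tt cpus[3]} and {\tt cpus[4]} can be sketched for a small dense
system of the form $(U^T + I)D(I + U) x = b$.
The function {\tt udu\_solve()} below is hypothetical, serial and
dense; it is meant only to show the order of the three phases and
why the solution may overwrite the right hand side.
It assumes that $U$ is strictly upper triangular, stored by rows,
and that the diagonal entries of $D$ are nonzero.
\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/*
 * Solve (U^T + I) D (I + U) x = b in place for a dense system.
 * u : n-by-n, row-major, only the strict upper triangle is read.
 * d : the n diagonal entries of D, assumed nonzero.
 * b : right hand side on entry, solution x on return.
 */
void udu_solve ( size_t n, const double *u, const double *d,
                 double *b )
{
    assert(u != NULL && d != NULL && b != NULL) ;
    /* forward solve : (U^T + I) z = b */
    for ( size_t i = 0 ; i < n ; i++ ) {
        for ( size_t k = 0 ; k < i ; k++ ) {
            b[i] -= u[k*n + i] * b[k] ;
        }
    }
    /* diagonal solve : D w = z */
    for ( size_t i = 0 ; i < n ; i++ ) {
        b[i] /= d[i] ;
    }
    /* backward solve : (I + U) x = w */
    for ( size_t i = n ; i-- > 0 ; ) {
        for ( size_t k = i + 1 ; k < n ; k++ ) {
            b[i] -= u[i*n + k] * b[k] ;
        }
    }
}
\end{verbatim}
Each phase reads an entry of the vector before overwriting it, which
is the property that lets one object serve as both right hand side
and solution in this sketch.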
\par
\subsection{Multithreaded $QR$ Solve method}
\label{subsection:FrontMtx:proto:QRsolve-MT}
\par
\begin{enumerate}
%=======================================================================
\item
\begin{verbatim}
void FrontMtx_MT_QR_solve ( FrontMtx *frontmtx, InpMtx *mtxA, DenseMtx *mtxX,
              DenseMtx *mtxB, SubMtxManager *mtxmanager, SolveMap *solvemap,
              double cpus[], int msglvl, FILE *msgFile ) ;
\end{verbatim}
\index{FrontMtx_MT_QR_solve@{\tt FrontMtx\_MT\_QR\_solve()}}
This method is used to minimize $\|B - AX\|_F$, where
$A$ is stored in {\tt mtxA},
$B$ is stored in {\tt mtxB},
and $X$ will be stored in {\tt mtxX}.
The {\tt frontmtx} object contains a
$(U^T+I)D(I+U)$ factorization of $A^TA$ if $A$ is real
or
$(U^H+I)D(I+U)$ factorization of $A^HA$ if $A$ is complex.
We solve the seminormal equations
$(U^T+I)D(I+U)X = A^TB$ or $(U^H+I)D(I+U)X = A^HB$
for $X$.
On return the {\tt cpus[]} vector is filled with the following.
\begin{itemize}
\item
{\tt cpus[0]} --- set up the solves
\item
{\tt cpus[1]} --- fetch right hand side and store solution
\item
{\tt cpus[2]} --- forward solve
\item
{\tt cpus[3]} --- diagonal solve
\item
{\tt cpus[4]} --- backward solve
\item
{\tt cpus[5]} --- total time in the solve method.
\item
{\tt cpus[6]} --- time to compute $A^TB$ or $A^HB$.
\item
{\tt cpus[7]} --- total time.
\end{itemize}
Only the solve is presently done in parallel.
\par \noindent {\it Error checking:}
If {\tt frontmtx}, {\tt mtxA}, {\tt mtxX}, {\tt mtxB}, {\tt mtxmanager},
{\tt solvemap} or {\tt cpus} is {\tt NULL},
or if {\tt msglvl > 0} and {\tt msgFile} is {\tt NULL},
an error message is printed and the program exits.
%=======================================================================
\end{enumerate}
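\par
The steps behind the seminormal equations can be written out as
follows for the real case; this is the standard least squares
argument, stated here for reference, and the intermediate names
$C$, $Z$ and $W$ are introduced only for this summary.
Minimizing $\|B - AX\|_F$ leads to the normal equations, and
substituting the factorization of $A^TA$ gives a sequence of
simpler systems.
\[
A^TAX = A^TB
\quad \Longrightarrow \quad
(U^T+I)D(I+U)X = C, \qquad C = A^TB ,
\]
\[
(U^T+I)Z = C, \qquad DW = Z, \qquad (I+U)X = W .
\]
Forming $C$ corresponds to the time reported in {\tt cpus[6]}, and
the three systems correspond to the forward, diagonal and backward
solves reported in {\tt cpus[2]}, {\tt cpus[3]} and {\tt cpus[4]}.
In the complex case $A^T$ and $U^T$ are replaced by $A^H$ and $U^H$.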