File: partIntro.tex

package info (click to toggle)
spooles 2.2-16
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 19,760 kB
  • sloc: ansic: 146,836; sh: 7,571; csh: 3,615; makefile: 1,970; perl: 74
file content (518 lines) | stat: -rw-r--r-- 22,081 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
\part{Introduction}
\chapter{Introduction}
\par
The {\bf SPOOLES} package is used to solve two types of 
real or complex linear systems:
\begin{itemize}
\item
$AX = Y$ or $(A + \sigma B)X = Y$ where $A$ and B are square.
$A$ and $B$ can be real or complex,
symmetric, Hermitian or nonsymmetric.
The factorization can proceed with or without pivoting for
numerical stability.
The factor matrices can be stored with or without dropping small
entries.
\item
Minimize $\|AX_{*,j} - Y_{*,j}\|_2$ for each column of the solution
matrix $X$ and right hand side matrix $Y$.
This is done by computing a $QR$ factorization of $A$ and then
solving $R^T R X = A^T Y$ or $R^H R X = A^H Y$.
\end{itemize}
In both cases, the linear systems can be permuted to reduce the
fill in the factor matrices.
\par
The {\bf SPOOLES} software is written in an object oriented fashion
in the C language.
Parts of the software run in serial mode, 
multithreading using Solaris or POSIX threads, and with MPI.
\par
The software objects are naturally partitioned into three families of
objects.
\par
\begin{center}
\begin{tabular}{|l|l|} 
\multicolumn{2}{c}{\bf Utility objects} \\ \hline
{\tt A2} & dense two dimensional array \\
{\tt Coords} & object to hold coordinates in any number of dimensions \\
{\tt DV} & double precision vector \\
{\tt Drand} & random number generator \\
{\tt I2Ohash} & hash table for the factor submatrices \\
{\tt IIheap} & simple heap object \\
{\tt IV} & int vector \\
{\tt IVL} & int list object \\
{\tt Ideq} & simple dequeue object \\
{\tt Lock} & abstract mutual exclusion lock \\
{\tt Perm} & permutation vector object \\
{\tt Utilities} & various vector and linked list utility methods \\
{\tt ZV} & double precision complex vector \\
\hline
\end{tabular}
\end{center}
\par
\begin{center}
\begin{tabular}{|l|l|} 
\multicolumn{2}{c}{\bf Ordering objects} \\ \hline
{\tt BKL} & Block Kernihan-Lin algorithm object \\
{\tt BPG} & bipartite graph object \\
{\tt DSTree} & domain/separator tree object \\
{\tt EGraph} & element graph object \\
{\tt ETree} & front tree object \\
{\tt GPart} & graph partitioning algorithm object \\
{\tt Graph} & graph object \\
{\tt MSMD} & multi-stage minimum degree algorithm object \\
{\tt Network} & network object for solving max flow problems \\
{\tt SolveMap} & map of submatrices to processes for solves \\
{\tt Tree} & tree object \\
\hline
\end{tabular}
\end{center}
\par
\begin{center}
\begin{tabular}{|l|l|} 
\multicolumn{2}{c}{\bf Numeric objects} \\ \hline
{\tt Chv} & block chevron object for fronts \\
{\tt ChvList} & object to hold lists of {\tt Chv} objects \\
{\tt ChvManager} & object to manager instances of {\tt Chv} objects \\
{\tt DenseMtx} & dense matrix object \\
{\tt FrontMtx} & front matrix object \\
{\tt ILUMtx} & simple preconditioner matrix object \\
{\tt InpMtx} & sparse matrix object \\
{\tt Iter} & Krylov methods for iterative solves \\
{\tt PatchAndGoInfo} & 
    modified factors in the presence of zero or small pivots \\
{\tt Pencil} & object to contain $A + \sigma B$ \\
{\tt SemiImplMtx} & semi-implicit factorization matrix object \\
{\tt SubMtx} & object for dense or sparse submatrices \\
{\tt SubMtxList} & object to hold lists of {\tt SubMtx} objects \\
{\tt SubMtxManager} & 
       object to manager instances of {\tt SubMtx} objects \\
{\tt SymbFac} & algorithm object to compute a symbolic factorization \\
\hline
\end{tabular}
\end{center}
The {\tt MT} directory contains all the multithreaded methods 
and drivers programs.
The {\tt MPI} directory contains all the {\tt MPI} methods and drivers.
The {\tt misc} directory contains miscellaneous methods and drivers.
\par
Each of the following objects that hold numeric entries ---
{\tt A2}, {\tt Chv}, {\tt DenseMtx}, {\tt FrontMtx}, 
{\tt ILUMtx},
{\tt InpMtx},
{\tt Pencil},
{\tt SemiImplMtx} and 
{\tt SubMtx} --- can hold real or complex entries.
An object knows its {\tt type},
{\tt 1} for real (define'd constant {\tt SPOOLES\_REAL})
or
{\tt 2} for complex (define'd constant {\tt SPOOLES\_COMPLEX}).
Since C does not yet have a standard structure for complex numbers,
we have followed the FORTRAN convention of storing the real and
imaginary parts of a complex number in consecutive memory locations.
Internally, we unroll the complex arithmetic into real arithmetic.
The user need not be burdened by this process if (s)he uses the
input/output methods for the different object.
For example, 
{\tt DenseMtx\_setRealEntry()} sets an entry of a real dense matrix, 
while {\tt DenseMtx\_setComplexEntry()} sets an entry of a complex
dense matrix. 
\par
All the heavily used computational tasks have been expanded where
possible into BLAS2 or BLAS3 kernels, for both the real and complex
cases.
There are a multitude of driver programs that test the
functionality of the objects.
A common output of a driver program is a file that can be input
into Matlab to check the errors of the computations.
This convention inspires confidence in the correctness of the
kernel computations.
\par
\section{Software Design}
\label{chapter:softwareDesign}
\par
The {\bf SPOOLES} library is written in the C language and uses
object oriented design.
There are some routines that manipulate native C data types such as
vectors, but the vast bulk of the code is centered around objects,
data objects and algorithm objects.
By necessity, the implementation of an object is through the C {\tt
struct} data type.
We use the following naming convention --- a method (i.e., function)
associated with an object of type {\tt Object} has the form

\centerline{{\it (return value type)}
{\tt Object\_}{\it methodName}{\tt (Object * obj}, $\ldots${\tt )};}

The method's name begins with the name of the object it is
associated with and the first parameter in the calling sequence is
a pointer to the instance of the object. 
Virtually the only exception to this rule is the {\it constructor} 
method.

\centerline{\tt Object * Object\_new(void) ;}

\noindent 
Two objects, the {\tt Chv} and {\tt DenseMtx} objects, have
methods that return the number of bytes needed to hold their data,
e.g.,

\centerline{\tt 
int Chv\_nbytesNeeded(int nD, int nL, int nU, int type, int symflag) ;}
\par
Scan the directory structure of the source code and you will notice
a number of subdirectories --- each deals with an object.
For example, 
the {\tt Graph} directory holds code and documentation for an
object that represents a graph:
its {\tt doc} subdirectory holds \LaTeX files with documentation;
its {\tt src} subdirectory holds C files that contain methods
associated with the object ;
% (each method is a C function of the form {\tt Graph\_*}); 
and its {\tt driver} subdirectory holds driver programs to test or
validate some behavior of the object.
\par
The directory structure is fairly flat --- no object directory
contains another --- because the C language does not support
inheritance.
This can be inelegant at times.
For example, a bipartite graph (a {\tt BPG} object) 
{\it is--a} graph (a {\tt Graph} object), but instead of {\tt BPG}
inheriting from {\tt Graph} data fields and methods from {\tt Graph},
we must use the {\it has--a} relation.
A {\tt BPG} object contains a pointer to a {\tt Graph} object
that represents the adjacency structure.
The situation is even more cumbersome for the objects that deal
with trees of one form or another: an elimination tree {\tt ETree}
and a domain/separator tree {\tt DSTree} each contain a pointer to
a generic tree object {\tt Tree} in their structure.
\par
Predecessors to this library were written in C++ and 
Objective-C.\footnote{The knowledgeable reader is encouraged to
peruse the source to discover the prejudices both pro and con
towards these two languages.}
The port to the present C library was painless, almost mechanical.
We expect the port back to C++ and/or Objective-C to be simple.
\par
Objects are one of two types: 
{\it data objects} whose primary function is to store data
and
{\it algorithm objects} whose function is to manipulate some data
object(s) and return new information.
Naturally this distinction can be fuzzy --- algorithm objects have
their own data that may be persistent and data objects can execute
some simple functionality --- but it holds in general.
To be more explicit, data objects have the following properties:
\begin{itemize}
\item
There is a delicate balance between encapsulation and openness.
The C language does not support any private or protected data
fields, so the C {\tt struct} that holds the data for an object
is completely open.
As an example, the {\tt Graph} object has a function to return the
size of and pointers to a vector that contains an adjacency list,
namely
\begin{verbatim}
     void Graph_adjAndSize(Graph *g, int v, int *psize, int **padj)
\end{verbatim}
where the pointers {\tt psize} and {\tt padj} are filled with the
size of the adjacency structure and a pointer to its vector.
One can get this same information by chasing pointers as follows.
\begin{verbatim}
     vsize = g->adjIVL->sizes[v] ;
     vadj  = g->adjIVL->p_ind[v] ;
\end{verbatim}
One can do the latter but we encourage the former.
As an experiment we replaced every instance of 
{\tt Graph\_adjAndSize()} with the appropriate pointer chasing
(and a similar operation for the {\tt IVL} object) and achieved
around a ten per cent reduction in the ordering time.
For a production code, this savings might drive the change in code,
but for our research code we kept the function call.
\item
Persistent storage needs to be supported.
% Currently this means file storage, but in the future we expect to
% need to ``bundle'' a data object into a block of storage to be
% passed or communicated to a different processor.
Each data object has eight different methods to deal with file I/O.
Two methods deal with reading from and writing to a file whose
suffix is associated with the object name, e.g., {\tt *graph\{f,b\}}
for a formatted or binary file to hold a {\tt Graph} object.
Four methods deal with reading and writing objects from and to a
file that is already opened and positioned, necessary for composite
objects (e.g., a {\tt Graph} object contains an {\tt IVL} object).
Two methods deal with writing the objects to a formatted file to be
examined by the user.
We strongly encourage any new data object added to the library to 
supply this functionality.
\item
Some data objects need to have compact storage requirements.
Two examples are our {\tt Chv} and {\tt SubMtx} objects.
Both objects need to be communicated between processes
in the MPI implementation, the former during the factorization,
the latter during the solve.
Each has a workspace buffer that contains all the information
needed to {\it regenerate} the object upon reception by another
process.
\item
By and large, data objects have simple methods.
A {\tt Graph} object does {\bf not} have methods to find a good
bisector; this is a sufficiently sophisticated function that it
should be implemented by an algorithm object.
The major exception to this rule is that our {\tt FrontMtx} object
{\it contains} the factorization data but also {\it performs} the
factorization, forward and backsolves.
In the future we intend to separate these two functionalities.
For example, one can implement an alternative forward and backsolve
by using methods to {\it access} the factor data stored in the {\tt
FrontMtx} object.
As a second example, massive changes to the storage format,
e.g., in an out-of-core implementation, can be encapsulated in the
access methods for the data, and any changes to the factorization
or solve functions could be minimal.
\end{itemize}
Algorithm objects have these properties.
\begin{itemize}
\item
Algorithm objects use data objects.
Some data objects are created within an algorithm objects method; 
these are owned by the algorithm object and free'd by that object.
Data objects that are passed to algorithm objects can be queried
or {\it temporarily} changed.
\item
They do not destroy or free data objects that are passed to them.
Any side effects on the data objects should be innocent, e.g.,
when a {\tt Graph} object is passed to the graph partitioning
object ({\tt GPart}) or the multistage minimum degree object
({\tt MSMD}), on return the adjacency lists may not be in the input
order, but they contain the values they had on input.
\item
Algorithm objects should support diagnostic, logfile and debug output.
This convention is not entirely thought out or supported at present. 
The rationale is that an algorithm object should be able to respond
to its environment to a greater degree than a data object.
\end{itemize}
\par
Data and algorithm objects share two common properties.
\begin{itemize}
\item
Each object has four basic methods: to allocate storage for an
object, set the default fields of an object, 
clear the data fields of an object,
and free the storage occupied and owned by an object.
\item
Ownership of data is very rigidly defined.
In most cases,
an object owns all data that is allocated inside one of its
methods, 
and when this does not hold it is very plainly documented.
For example, the bipartite graph object {\tt BPG} has a data field
that points to a {\tt Graph} object.
One of its initialization methods has a {\tt Graph} pointer in its
calling sequence.
The {\tt BPG} object then owns the {\tt Graph} object and when it
is free'd or has its data cleared, the {\tt Graph} object is free'd
by a call to its appropriate method.
\end{itemize}
\par
By and large these conventions hold throughout the library.
There are fuzzy areas and objects still ``under construction''.
Here are two examples.
\begin{itemize}
\item
We have an {\tt IIheap} object that maintains integer 
$\langle$ key, value $\rangle$ pairs in a priority heap.
Normally we think of a heap as a data structure, but another 
perspective is that of a continuously running algorithm that
supports insert, delete and identification of a minimum pair.
\item
Our {\tt BPG} bipartite graph object is a data object, 
but it has a method to find the Dulmage-Mendelsohn decomposition, 
a fairly involved algorithm used to refine a separator of a graph.
At present, we are not willing to create a new algorithm object
just to find the Dulmage-Mendelsohn decomposition, so we leave
this method to the domain of the data object.
The desired functionality, identifying minimal
weight separators for a region of a graph, can be modeled using
max flow techniques from network optimization.
We also provide a {\tt BPG} method that finds this
Dulmage-Mendelsohn decomposition by solving a max flow problem on
a bipartite network.
Both these methods have been superceded by the {\tt Network} object
that contains a method to find a max flow and one or more min-cuts 
of a network (not necessarily bipartite).
\end{itemize}
\par
The {\bf SPOOLES} software library is continuously evolving in an
almost organic fashion.
Growth and change are to be expected, and welcomed, but some
discipline is required to keep the complexity, both code and human
understanding, under control.
The guidelines we have just presented have two purposes:
to let the user and researcher get a better understanding of
the design of the library, and to point out some conventions 
to be used in extending the library.
\par
\section{Changes from Release 1.0}
\label{section:intro:changes-1.0}
\par
There are two major changes from the first release of the {\bf
SPOOLES} package: we now support complex linear systems, and the
storage format of the sparse factor matrices has changed from a
one-dimensional data decomposition to a two-dimensional
decomposition. 
The factors are now submatrix based, and thus allow a parallel
solve to be much faster than in Release 1.0.
\par
In the first release, all numeric objects had a {\tt `D`} as the
leading letter in their name, e.g., {\tt DA2}, {\tt DChv}, etc.
A natural way to implement complex data types would be to write 
``parallel'' objects, e.g., {\tt ZA2}, {\tt ZChv}, etc, as is done
in LINPACK and LAPACK for subroutine names.
However, a {\tt DA2} and {\tt ZA2} object share so much common code
that it is a better decision to combine the real and complex
functionality into one object.
This is even more pronounced for the {\tt FrontMtx} object where
there is virtually no code that is dependent on whether or not the
matrix is real or complex.
\par
Virtually no new work has been done on the ordering objects and methods.
Their algorithms were state of the art two years ago, but a recent
comparison with the {\bf EXTREME} \cite{hr98-msndtalk}
and {\bf METIS} \cite{karypis98metis} packages 
on a large collection
of finite element problems shows that the {\bf SPOOLES} orderings
are still competitive.
\par
The serial, multithreaded and MPI code has been modified to force
greater sharing of code between the environments.
``What'' is done is identical in the three cases.
The multithreaded and MPI codes share the same ``choreography'',
in other words, who does what and how.
The main differences between multithreaded and MPI
are that the data structures are global versus local,
and that explicit message passing is done in the latter.
This common structure of the codes has a nonzero impact on the speed 
and efficiency of the individual codes, but the gains from a common
code base are well worth the cost.
\par
The MPI methods have been extensively reworked since the first
release. 
A number of bugs and logic errors have been detected and fixed.
The code appears to be more robust than the first release.
\par
\section{Changes from Release 2.0}
\label{section:intro:changes-2.0}
\par
Release 2.2 is partly a maintenance release.
Some bugs were found and fixed in the MPI factors and solves.
Some minor new methods were added to the 
{\tt DenseMtx}, {\tt FrontMtx}, {\tt InpMtx}
and {\tt Utilities} directories.
The multithreaded methods and drivers have been removed from the
{\tt FrontMtx} directory and placed in a new {\tt MT} directory,
much like the {\tt MPI} methods have their own directory.
\par
Some new functionality has been added.
\begin{itemize}
\item
There are now multithreaded
and distributed matrix-matrix multiply methods.
See the {\tt MT} and {\tt MPI} directories.
\item
The {\tt FrontMtx} object now supports more robust reporting of
errors encountered during the factorization.
There is one additional parameter in the factorization calling
sequences, an error return that signals that the factorization has
failed.
\item
In response to customer requirements, we have added some
``patch-and-go'' functionality to the sparse $LU$ and $U^TDU$
factorizations without pivoting.
There are applications in optimization and structural analysis
where pivoting is not necessary for stabilty, but where the
location of small or zero pivots on the diagonal is meaningful.
Normally the factorization would be ustable or stop, but special
action is taken, the factors are ``patched'' and the factorization
continues.
\par
There is a new {\tt PatchAndGoInfo} object that encapsulates the
``patch-and-go'' strategy and gathers optional statistics about the
action that was taken during the factorization.
This object is attached to the {\tt FrontMtx} object which passes
it unchanged to the {\tt Chv} object that performs the
factorization of each front.
If the user does not need this functionality, no changes are
necessary to their code, i.e., no calling sequences are affected.
\item
New MPI broadcast methods for the {\tt Graph}, {\tt IVL} and
{\tt ETree} objects have been added to the library.
\item
The {\tt Iter} directory contains the following Krylov accelerators 
for the iterative solution of linear systems:
Block GMRES, BiCGStab, conjugate gradient and transpose-free QMR.
Each is available in both left- and right-preconditioned forms.
The preconditioner that these methods use is a {\tt FrontMtx}
object that contains a drop tolerance approximate factorization.
The {\tt ILUMtx} object contains a simple vector-based drop
tolerance factorization object.
(The {\tt FrontMtx} approximate factorization is submatrix-based in
both its data structures and computational kernels, and supports
pivoting for numerical stability, which the {\tt ILUMtx} object
does not.)
We have not written Krylov methods that use the {\tt ILUMtx}
object, but it would be simple to replace the {\tt FrontMtx}
preconditioner with the {\tt ILUMtx} preconditioner.
\item
The {\tt SemiImplMtx} object contains
a {\it semi-implicit} factorization, 
a technique that can require less storage and solve operations than
the present explicit factorization. It is based on the equation
$$
\left \lbrack \begin{array}{cc}
A_{0,0} & A_{0,1} \cr
A_{1,0} & A_{1,1} 
\end{array} \right \rbrack
=
\left \lbrack \begin{array}{cc}
L_{0,0} & 0 \cr
L_{1,0} & L_{1,1} 
\end{array} \right \rbrack
\left \lbrack \begin{array}{cc}
U_{0,0} & U_{0,1} \cr
 0 & U_{1,1} 
\end{array} \right \rbrack,
=
\left \lbrack \begin{array}{cc}
L_{0,0} & 0 \cr
A_{1,0}U_{0,0}^{-1} & L_{1,1} 
\end{array} \right \rbrack
\left \lbrack \begin{array}{cc}
U_{0,0} & L_{0,0}^{-1}U_{0,1} \cr
 0 & U_{1,1}
\end{array} \right \rbrack.
$$
A solve of $AX = B$ 
with the explicit factorization does the following steps
\begin{itemize}
\item solve $L_{0,0} Y_0 = B_0$
\item solve $L_{1,1} U_{1,1} X_1 = B_1 - L_{1,0} Y_0$
\item solve $U_{0,0} X_0 = Y_0 - U_{0,1} X_1$
\end{itemize}
while an implicit factorization has the following form.
\begin{itemize}
\item solve $L_{0,0} U_{0,0} Z_0 = B_0$
\item solve $L_{1,1} U_{1,1} X_1 = B_1 - A_{1,0} Z_0$
\item solve $L_{0,0} U_{0,0} X_0 = B_0 - A_{0,1} X_1$
\end{itemize}
The difference is that the semi-implicit factorization stores
and computes with 
$A_{1,0}$ and $A_{0,1}$ instead of $L_{1,0}$ and $U_{0,1}$,
(this can be a modest savings in storage and operation count),
and performs two solves with $L_{0,0}$ and $U_{0,0}$
instead of one.
This technique works with either a direct or approximate
factorization of $A$.
The semi-implicit factorization is constructed via a
post-processing of any factorization computed by the {\tt
FrontMtx} object.
\end{itemize}