File: poisson3Db.rst

package info (click to toggle)
amgcl 1.4.4-1
  • links: PTS, VCS
  • area: contrib
  • in suites: sid
  • size: 5,676 kB
  • sloc: cpp: 34,043; python: 747; pascal: 258; f90: 196; makefile: 20
file content (351 lines) | stat: -rw-r--r-- 13,485 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
Poisson problem
---------------

This system may be downloaded from the poisson3Db_ page (use the `Matrix
Market`_ download option). The system matrix has 85,623 rows and 2,374,949
nonzeros (which is on average is about 27 non-zero elements per row). The
matrix has an interesting structure, presented on the figure below:

.. _poisson3Db: https://sparse.tamu.edu/FEMLAB/poisson3Db
.. _Matrix Market: https://math.nist.gov/MatrixMarket

.. figure:: ../../tutorial/1.poisson3Db/Poisson3D.png
   :width: 90%

   Poisson3Db matrix portrait

A Poisson problem should be an ideal candidate for a solution with an AMG
preconditioner, but before we start writing any code, let us check this with
the `examples/solver`_ utility provided by AMGCL. It can take the input matrix
and the RHS in the `Matrix Market`_ format, and allows to play with various
solver and preconditioner options.

.. _examples/solver: https://github.com/ddemidov/amgcl/blob/master/examples/solver.cpp

.. note::

    The `examples/solver`_ is convenient not only for testing the systems
    obtained elsewhere. You can also save your own matrix and the RHS vector
    into the `Matrix Market`_ format with :cpp:func:`amgcl::io::mm_write()`
    function. This way you can find the AMGCL options that work for your
    problem without the need to rewrite the code and recompile the program.

The default options of BiCGStab iterative solver preconditioned with a smoothed
aggregation AMG (a simple diagonal SPAI(0) relaxation is used on each level of
the AMG hierarchy) should work very well for a Poisson problem::

    $ solver -A poisson3Db.mtx -f poisson3Db_b.mtx
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    58.93 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     50.07 M (83.20%)
        1         6361         446833      7.78 M (15.65%)
        2          384          32566      1.08 M ( 1.14%)

    Iterations: 24
    Error:      8.33789e-09

    [Profile:       2.351 s] (100.00%)
    [  reading:     1.623 s] ( 69.01%)
    [  setup:       0.136 s] (  5.78%)
    [  solve:       0.592 s] ( 25.17%)

As we can see from the output, the solution converged in 24 iterations to the
default relative error of 1e-8. The solver setup took 0.136 seconds and the
solution time is 0.592 seconds. The iterative solver used 4.57M of memory, and
the preconditioner required 58.93M. This looks like a well-performing solver
already, but we can try a couple of things just in case. We can not use the
simpler CG solver, because the matrix is reported as a non-symmetric on the
poisson3Db_ page. Using the GMRES solver seems to work equally well (the
solution time is just slightly lower, but the solver requires more memory to
store the orthogonal vectors). The number of iterations seems to have grown,
but keep in mind that each iteration of BiCGStab requires two matrix-vector
products and two preconditioner applications, while GMRES only makes one of
each::

    $ solver -A poisson3Db.mtx -f poisson3Db_b.mtx solver.type=gmres
    Solver
    ======
    Type:             GMRES(30)
    Unknowns:         85623
    Memory footprint: 20.91 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    58.93 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     50.07 M (83.20%)
        1         6361         446833      7.78 M (15.65%)
        2          384          32566      1.08 M ( 1.14%)

    Iterations: 39
    Error:      9.50121e-09

    [Profile:       2.282 s] (100.00%)
    [  reading:     1.612 s] ( 70.66%)
    [  setup:       0.135 s] (  5.93%)
    [  solve:       0.533 s] ( 23.38%)

We can also try differrent relaxation options for the AMG preconditioner. But
as we can see below, the simplest SPAI(0) works well enough for a Poisson
problem.  The incomplete LU decomposition with zero fill-in makes less
iterations, but is more expensive to setup::

    $ solver -A poisson3Db.mtx -f poisson3Db_b.mtx precond.relax.type=ilu0
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    103.44 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     87.63 M (83.20%)
        1         6361         446833     14.73 M (15.65%)
        2          384          32566      1.08 M ( 1.14%)

    Iterations: 12
    Error:      7.99207e-09

    [Profile:       2.510 s] (100.00%)
    [ self:         0.005 s] (  0.19%)
    [  reading:     1.614 s] ( 64.30%)
    [  setup:       0.464 s] ( 18.51%)
    [  solve:       0.427 s] ( 17.01%)


On the other hand, the Chebyshev relaxation has cheap setup but its
application is expensive as it involves multiple matrix-vector products. So,
even though it requires less iterations, the overall solution time does not
improve that much::

    $ solver -A poisson3Db.mtx -f poisson3Db_b.mtx precond.relax.type=chebyshev
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    59.63 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     50.72 M (83.20%)
        1         6361         446833      7.83 M (15.65%)
        2          384          32566      1.08 M ( 1.14%)

    Iterations: 8
    Error:      5.21588e-09

    [Profile:       2.316 s] (100.00%)
    [  reading:     1.607 s] ( 69.39%)
    [  setup:       0.134 s] (  5.78%)
    [  solve:       0.574 s] ( 24.80%)

Now that we have the feel of the problem, we can actually write some code. The
complete source may be found in `tutorial/1.poisson3Db/poisson3Db.cpp`_ and is
presented below:

.. literalinclude:: ../../tutorial/1.poisson3Db/poisson3Db.cpp
   :caption: The source code for the solution of the poisson3Db problem.
   :language: cpp
   :linenos:

.. _tutorial/1.poisson3Db/poisson3Db.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/1.poisson3Db/poisson3Db.cpp

In lines 4--10 we include the necessary AMGCL headers:
the builtin backend uses the OpenMP threading model; the ``crs_tuple`` matrix
adaper allows to use a ``std::tuple`` of CRS arrays as an input matrix; the
:cpp:class:`amgcl::make_solver` class binds together a preconditioner and an iterative
solver; :cpp:class:`amgcl::amg` class is the AMG preconditioner;
:cpp:class:`amgcl::coarsening::smoothed_aggregation` defines the smoothed
aggreation coarsening strategy; :cpp:class:`amgcl::relaxation::spai0` is the
sparse approximate inverse relaxation used on each level of the AMG hierarchy;
and :cpp:class:`amgcl::solver::bicgstab` is the BiCGStab iterative solver.
In lines 12--13 we include the `Matrix Market`_ reader and the AMGCL profiler.

After checking the validity of the command line arguments (lines 16--20), and
initializing the profiler (line 23), we read the system matrix and the RHS
vector from the `Matrix Market`_ files specified on the command line (lines
30--36).

Now we are ready to actually solve the system. First, we define the backends
that we use with the iterative solver and the preconditioner (lines 44--51).
The backend have to belong to the same class (in this case,
:cpp:class:`amgcl::backend::builtin`), but may have different value type
precision. Here we use a double precision backend for the iterative solver, but
choose either a double or a single precision for the preconditioner backend,
depending on whether the preprocessor macro ``MIXED_PRECISION`` was defined
during compilation. Using a single precision preconditioner may be both more
memory efficient and faster, since the iterative solvers performance is usually
memory-bound.

The defined backends are used in the solver definition (lines 53--60). Here we
are using the :cpp:class:`amgcl::make_solver` class to couple the AMG
preconditioner with the BiCGStab iterative solver. We istantiate the solver in
line 64.

.. note:

    The same solver may be used multiple times with different RHS vectors.

In line 76 we solve the system for the given RHS vector, starting with a zero
initial approximation (the ``x`` vector acts as an initial approximation on
input, and contains the solution on output).

Below is the output of the program when compiled with a double precision
preconditioner. The results are close to what we have seen with the
`examples/solver`_ utility above, which is a good sign::

    $ ./poisson3Db poisson3Db.mtx poisson3Db_b.mtx
    Matrix poisson3Db.mtx: 85623x85623
    RHS poisson3Db_b.mtx: 85623x1
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    58.93 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     50.07 M (83.20%)
        1         6361         446833      7.78 M (15.65%)
        2          384          32566      1.08 M ( 1.14%)

    Iters: 24
    Error: 8.33789e-09

    [poisson3Db:     2.412 s] (100.00%)
    [  read:         1.618 s] ( 67.08%)
    [  setup:        0.143 s] (  5.94%)
    [  solve:        0.651 s] ( 26.98%)

Looking at the output of the mixed precision version, it is apparent that it
uses less memory for the preconditioner (43.59M as opposed to 58.93M in the
double-precision case), and is slightly faster during both the setup and the
solution phases::

    $ ./poisson3Db_mixed poisson3Db.mtx poisson3Db_b.mtx
    Matrix poisson3Db.mtx: 85623x85623
    RHS poisson3Db_b.mtx: 85623x1
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    43.59 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     37.23 M (83.20%)
        1         6361         446833      5.81 M (15.65%)
        2          384          32566    554.90 K ( 1.14%)

    Iters: 24
    Error: 7.33493e-09

    [poisson3Db:     2.234 s] (100.00%)
    [  read:         1.559 s] ( 69.78%)
    [  setup:        0.125 s] (  5.59%)
    [  solve:        0.550 s] ( 24.62%)

We may also try to switch to the CUDA backend in order to accelerate the
solution using an NVIDIA GPU. We only need to use the
:cpp:class:`amgcl::backend::cuda` instead of the ``builtin`` backend, and we
also need to initialize the CUSPARSE library and pass the handle to AMGCL as
the backend parameters. Unfortunately, we can not use the mixed precision
approach, as CUSPARSE does not support that (we could use the VexCL_ backend
though, see :doc:`poisson3DbMPI`). The source code is very close to what we
have seen above and is available at
`tutorial/1.poisson3Db/poisson3Db_cuda.cu`_. The listing below has the
differences highligted:

.. literalinclude:: ../../tutorial/1.poisson3Db/poisson3Db_cuda.cu
   :caption: The source code for the solution of the poisson3Db problem using the CUDA backend.
   :language: cuda
   :linenos:
   :emphasize-lines: 4,22-27,51,63-64,69,73,83-84

.. _VexCL: https://github.com/ddemidov/vexcl
.. _tutorial/1.poisson3Db/poisson3Db_cuda.cu: https://github.com/ddemidov/amgcl/blob/master/tutorial/1.poisson3Db/poisson3Db_cuda.cu

Using the consumer level GeForce GTX 1050 Ti GPU, the solution phase is almost
4 times faster than with the OpenMP backend. On the contrary, the setup is
slower, because we now need to additionally initialize the GPU-side structures.
Overall, the complete solution is about twice faster (comparing with the double
precision OpenMP version)::

    $ ./poisson3Db_cuda poisson3Db.mtx poisson3Db_b.mtx
    GeForce GTX 1050 Ti
    Matrix poisson3Db.mtx: 85623x85623
    RHS poisson3Db_b.mtx: 85623x1
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         85623
    Memory footprint: 4.57 M

    Preconditioner
    ==============
    Number of levels:    3
    Operator complexity: 1.20
    Grid complexity:     1.08
    Memory footprint:    44.81 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0        85623        2374949     37.86 M (83.20%)
        1         6361         446833      5.86 M (15.65%)
        2          384          32566      1.09 M ( 1.14%)

    Iters: 24
    Error: 8.33789e-09

    [poisson3Db:     2.253 s] (100.00%)
    [ self:          0.223 s] (  9.90%)
    [  read:         1.676 s] ( 74.39%)
    [  setup:        0.183 s] (  8.12%)
    [  solve:        0.171 s] (  7.59%)