File: Serena.rst

package info (click to toggle)
amgcl 1.4.4-1
  • links: PTS, VCS
  • area: contrib
  • in suites: sid
  • size: 5,676 kB
  • sloc: cpp: 34,043; python: 747; pascal: 258; f90: 196; makefile: 20
file content (426 lines) | stat: -rw-r--r-- 16,942 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
Structural problem
------------------

This system may be downloaded from the Serena_ page (use the `Matrix Market`_
download option). According to the description, the system represents a 3D gas
resevoir simulation for CO2 sequestration, and was obtained from a structural
problem discretizing a gas reservoir with tetrahedral Finite Elements. The
medium is strongly heterogeneous and characterized by a complex geometry
consisting of alternating sequences of thin clay and sand layers. More details
available in [FGJT10]_.  Note that the RHS vector for the Serena problem is not
provided, and we use the RHS vector filled with ones.

The system matrix is symmetric, and has 1,391,349 rows and 64,131,971 nonzero
values, which corresponds to an average of 46 nonzeros per row. The matrix
portrait is shown on the figure below.

.. _Serena: https://sparse.tamu.edu/Janna/Serena
.. _Matrix Market: https://math.nist.gov/MatrixMarket
.. _examples/solver: https://github.com/ddemidov/amgcl/blob/master/examples/solver.cpp
.. _examples/mm2bin: https://github.com/ddemidov/amgcl/blob/master/examples/mm2bin.cpp

.. figure:: ../../tutorial/2.Serena/Serena.png
   :width: 90%
   :name: serena_spy

   Serena matrix portrait

.. [FGJT10] M. Ferronato, G. Gambolati, C. Janna, P. Teatini. "Geomechanical issues of anthropogenic CO2 sequestration in exploited gas fields", Energy Conversion and Management, 51, pp. 1918-1928, 2010.

As in the case of :doc:`poisson3Db` tutorial, we start experimenting with the
`examples/solver`_ utility provided by AMGCL. The default options do not seem
to work this time. The relative error did not reach the required threshold of
1e-8 and the solver exited after the default limit of 100 iterations::

    $ solver -A Serena.mtx
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         1391349
    Memory footprint: 74.31 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.22
    Grid complexity:     1.08
    Memory footprint:    1.45 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0      1391349       64531701      1.22 G (82.01%)
        1        98824       13083884    218.40 M (16.63%)
        2         5721        1038749     16.82 M ( 1.32%)
        3          279          29151    490.75 K ( 0.04%)

    Iterations: 100
    Error:      0.000874761

    [Profile:      74.102 s] (100.00%)
    [  reading:    18.505 s] ( 24.97%)
    [  setup:       2.101 s] (  2.84%)
    [  solve:      53.489 s] ( 72.18%)

The system is quite large and just reading from the text-based `Matrix Market`_
format takes 18.5 seconds. No one has that amount of free time on their hands,
so lets convert the matrix into the binary format with the `examples/mm2bin`_
utility. This should make the experiments slightly less painful::

    mm2bin -i Serena.mtx -o Serena.bin
    Wrote 1391349 by 1391349 sparse matrix, 64531701 nonzeros

The ``-B`` option tells the solver that the input is in
binary format now. Lets also increase the maximum iteration limit this time to
see if the solver manages to converge at all::

    $ solver -B -A Serena.bin solver.maxiter=1000
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         1391349
    Memory footprint: 74.31 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.22
    Grid complexity:     1.08
    Memory footprint:    1.45 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0      1391349       64531701      1.22 G (82.01%)
        1        98824       13083884    218.40 M (16.63%)
        2         5721        1038749     16.82 M ( 1.32%)
        3          279          29151    490.75 K ( 0.04%)

    Iterations: 211
    Error:      8.54558e-09

    [Profile:     114.703 s] (100.00%)
    [  reading:     0.550 s] (  0.48%)
    [  setup:       2.114 s] (  1.84%)
    [  solve:     112.034 s] ( 97.67%)

The input matrix is read much faster now, and the solver does converge, but the
convergence rate is not great. Looking closer at the :ref:`serena_spy` figure,
the matrix seems to have block structure with :math:`3\times3` blocks.  This is
usually the case when the system has been obtained via discretization of a
system of coupled PDEs, or has vector unknowns. We have to guess here, but
since the problem is described as "structural", then each block probably
corresponds to the 3D displacement vector of a single grid node. We can
communicate this piece of information to AMGCL using the ``block_size``
parameter of the aggregation method::

    $ solver -B -A Serena.bin solver.maxiter=1000 \
          precond.coarsening.aggr.block_size=3
    Solver
    ======
    Type:             BiCGStab
    Unknowns:         1391349
    Memory footprint: 74.31 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.31
    Grid complexity:     1.08
    Memory footprint:    1.84 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0      1391349       64531701      1.50 G (76.48%)
        1       109764       17969220    316.66 M (21.30%)
        2         6291        1788507     29.51 M ( 2.12%)
        3          429          82719      1.23 M ( 0.10%)

    Iterations: 120
    Error:      9.73074e-09

    [Profile:      73.296 s] (100.00%)
    [  reading:     0.587 s] (  0.80%)
    [  setup:       2.709 s] (  3.70%)
    [  solve:      69.994 s] ( 95.49%)

This has definitely improved the convergence! We also know that the matrix is
symmetric, so lets switch the solver from the default BiCGStab to the slightly
less expensive CG::

    $ solver -B -A Serena.bin \
          solver.type=cg \
          solver.maxiter=1000 \
          precond.coarsening.aggr.block_size=3
    Solver
    ======
    Type:             CG
    Unknowns:         1391349
    Memory footprint: 42.46 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.31
    Grid complexity:     1.08
    Memory footprint:    1.84 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0      1391349       64531701      1.50 G (76.48%)
        1       109764       17969220    316.66 M (21.30%)
        2         6291        1788507     29.51 M ( 2.12%)
        3          429          82719      1.23 M ( 0.10%)

    Iterations: 177
    Error:      8.6598e-09

    [Profile:      55.250 s] (100.00%)
    [  reading:     0.550 s] (  1.00%)
    [  setup:       2.801 s] (  5.07%)
    [  solve:      51.894 s] ( 93.92%)

This reduces the solution time, even though the number of iterations
has grown. Each iteration of BiCGStab costs about twice as much as a CG
iteration, because BiCGStab does two matrix-vector products and preconditioner
applications per iteration, while CG only does one.

The problem description states that *the medium is strongly heterogeneous and
characterized by a complex geometry consisting of alternating sequences of thin
clay and sand layers*. This may result in high contrast between matrix
coefficients in the neighboring rows, which is confirmed by the plot of the
matrix diagonal in :ref:`serena_spy`: the diagonal coefficients span more than
10 orders of magnitude! Scaling the matrix (so that it has the unit diagonal)
should help with the convergence. The ``-s`` option tells the solver to do
that::

    $ solver -B -A Serena.bin -s \
          solver.type=cg solver.maxiter=200 \
          precond.coarsening.aggr.block_size=3
    Solver
    ======
    Type:             CG
    Unknowns:         1391349
    Memory footprint: 42.46 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.29
    Grid complexity:     1.08
    Memory footprint:    1.82 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0      1391349       64531701      1.51 G (77.81%)
        1       100635       16771185    294.81 M (20.22%)
        2         5643        1571157     25.92 M ( 1.89%)
        3          342          60264    802.69 K ( 0.07%)

    Iterations: 112
    Error:      9.84457e-09

    [Profile:      36.021 s] (100.00%)
    [ self:         0.204 s] (  0.57%)
    [  reading:     0.564 s] (  1.57%)
    [  setup:       2.684 s] (  7.45%)
    [  solve:      32.568 s] ( 90.42%)

And the convergence has indeed been improved! Finally, when the matrix has
block structure, as in this case, it often pays to use the block-valued
backend, so that the system matrix has three times fewer rows and columns, but
each nonzero entry is a statically sized :math:`3\times3` matrix. This should
be done instead of specifying the ``block_size`` aggregation parameter, as the
aggregation now naturally operates with the :math:`3\times3` blocks::

    $ solver -B -A Serena.bin solver.type=cg solver.maxiter=200 -s -b3
    Solver
    ======
    Type:             CG
    Unknowns:         463783
    Memory footprint: 42.46 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.27
    Grid complexity:     1.08
    Memory footprint:    1.04 G

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       463783        7170189    891.53 M (78.80%)
        1        33052        1772434    159.22 M (19.48%)
        2         1722         151034     12.72 M ( 1.66%)
        3           98           5756    612.72 K ( 0.06%)

    Iterations: 162
    Error:      9.7497e-09

    [Profile:      31.122 s] (100.00%)
    [ self:         0.204 s] (  0.66%)
    [  reading:     0.550 s] (  1.77%)
    [  setup:       1.013 s] (  3.26%)
    [  solve:      29.354 s] ( 94.32%)

Note that the preconditioner now requires 1.04G of memory as opposed to 1.82G
in the scalar case.  The setup is about 2.5 times faster, and the solution
phase performance has been slightly improved, even though the number of
iteration has grown.  This is explained by the fact that the matrix is now
symbolically smaller, and is easier to analyze during setup. The matrix also
occupies less memory for the CRS arrays, and is more cache-friendly, which
helps to speed up the solution phase. This seems to be the best we can get with
this system, so let us implement this version. We will also use the mixed
precision approach in order to get as much performance as possible from the
solution. The listing below shows the complete solution and is also available
in `tutorial/2.Serena/serena.cpp`_.

.. literalinclude:: ../../tutorial/2.Serena/serena.cpp
   :caption: The source code for the solution of the Serena problem.
   :language: cpp
   :linenos:

.. _tutorial/2.Serena/serena.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/2.Serena/serena.cpp

In addition the the includes described in :doc:`poisson3Db`, we also include
the headers for the :cpp:class:`amgcl::static_matrix` value type, and the
:cpp:func:`amgcl::adapter::block_matrix` adapter that transparently converts
a scalar matrix to the block format. In lines 42--58 we apply the scaling
according to the following formula:

.. math:: A_s = D^{-1/2} A D^{-1/2}, \quad f_s = D^{-1/2} f

where :math:`A_s` and :math:`f_s` are the scaled matrix and the RHS vector, and
:math:`D` is the diagonal of the matrix :math:`A`.  After solving the scaled
system :math:`A_s y = f_s`, the solution to the original system may be found as
:math:`x = D^{-1/2} y`.

In lines 66--68 we define the block value types for the matrix and the RHS and
solution vectors. ``dmat_type`` and ``smat_type`` are :math:`3\times3` static
matrices used as value types with the double precision solver backend and the
single precision preconditioner backend. ``dvec_type`` is a double precision
:math:`3\times1` matrix (or a vector) used as a value type for the RHS and the
solution.

The solver class definition in lines 73--80 is almost the same as in the
:doc:`poisson3Db` case, with the exception that we are using the CG iterative
solver this time.  In lines 83--84 we define the solver parameters. Namely, we
increase the maximum iterations limit to 500 iterations.

In lines 90--91 we instantiate the solver, using the ``block_matrix`` adapter
in order to convert the scalar matrix into the block format. The adapter
operates on a row-by-row basis and does not create a temporary copy of the
matrix.

In lines 103--106 we convert the scalar RHS and solution vectors to the
block-valued ones. We use the fact that 3 consecutive elements of a scalar
array may be reinterpreted as a single :math:`3\times1` static matrix. Using
the ``reinterpret_cast`` trick we can get the block-valued view into the RHS
and the solution vectors data without extra memory copies.

Here is the output of the program::

    $ ./serena Serena.mtx 
    Matrix Serena.mtx: 1391349x1391349
    Solver
    ======
    Type:             CG
    Unknowns:         463783
    Memory footprint: 42.46 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.27
    Grid complexity:     1.08
    Memory footprint:    585.33 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       463783        7170189    490.45 M (78.80%)
        1        33052        1772434     87.58 M (19.48%)
        2         1722         151034      7.00 M ( 1.66%)
        3           98           5756    306.75 K ( 0.06%)

    Iters: 162
    Error: 9.74929e-09

    [Serena:     48.427 s] (100.00%)
    [ self:       0.166 s] (  0.34%)
    [  read:     21.115 s] ( 43.60%)
    [  setup:     0.749 s] (  1.55%)
    [  solve:    26.397 s] ( 54.51%)

Note that due to the use of mixed precision the preconditioner consumes 585.33M
of memory as opposed to 1.08G from the example above. The setup and the
solution are faster that the full precision version by about 30% and 10%
correspondingly.

Let us see if using a GPU backend may further improve the performance. The CUDA
backend does not support block value types, so we will use the VexCL backend
(which, in turn, may use either OpenCL, CUDA, or OpenMP). The listing below
contains the complete source for the solution (available at
`tutorial/2.Serena/serena_vexcl.cpp`_). The differences with the builtin
backend version are highlighted.

.. literalinclude:: ../../tutorial/2.Serena/serena_vexcl.cpp
   :caption: The solution of the Serena problem with the VexCL backend.
   :language: cpp
   :linenos:
   :emphasize-lines: 4-5,28-29,32-33,81-82,98-99,106,119-120,127-128,131

.. _tutorial/2.Serena/serena_vexcl.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/2.Serena/serena_vexcl.cpp

In the include section, we replace the header for the builtin backend with the
one for the VexCL backend, and also include the header with support for block
values in VexCL (lines 4--5). In lines 28--29 we initialize the VexCL context,
and in lines 32--33 we enable the VexCL support for :math:`3\times3` static
matrices in both double and single precision.

In lines 81--82 we define the solver and preconditioner backends as VexCL
backends with the corresponding matrix value types. In lines 98--99 we
reference the VexCL context in the backend parameters.

Since we are using the GPU backend, we have to explicitly form the block valued
matrix and transfer it to the GPU. This is done in lines 119--120. In lines
127--128 we copy the RHS and the solution vectors to the GPU, and we solve the
system in line 131.

The output of the program is shown below::

    $ ./serena_vexcl_cuda Serena.mtx 
    1. GeForce GTX 1050 Ti

    Matrix Serena.mtx: 1391349x1391349
    Solver
    ======
    Type:             CG
    Unknowns:         463783
    Memory footprint: 42.46 M

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.27
    Grid complexity:     1.08
    Memory footprint:    585.33 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       463783        7170189    490.45 M (78.80%)
        1        33052        1772434     87.58 M (19.48%)
        2         1722         151034      7.00 M ( 1.66%)
        3           98           5756    309.04 K ( 0.06%)

    Iters: 162
    Error: 9.74928e-09

    [Serena (VexCL):  27.208 s] (100.00%)
    [ self:            0.180 s] (  0.66%)
    [  GPU matrix:     0.604 s] (  2.22%)
    [  read:          18.699 s] ( 68.73%)
    [  setup:          1.308 s] (  4.81%)
    [  solve:          6.417 s] ( 23.59%)

The setup time has increased from 0.7 seconds for the builtin backend to 1.3
seconds, and we also see the additional 0.6 seconds for transferring the matrix to the
GPU. But the solution time has decreased from 26.4 to 6.4 seconds, which is
about 4 times faster.