File: Stokes.rst

package info (click to toggle)
amgcl 1.4.4-1
  • links: PTS, VCS
  • area: contrib
  • in suites: sid
  • size: 5,676 kB
  • sloc: cpp: 34,043; python: 747; pascal: 258; f90: 196; makefile: 20
file content (374 lines) | stat: -rw-r--r-- 13,095 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
Stokes-like problem
-------------------

In this section we consider a saddle point system which was obtained by
discretization of the steady incompressible Stokes flow equations in a unit
cube with a locally divergence-free weak Galerkin finite element method. The
UCube(4) system studied here may be downloaded from the the `dataset
<https://doi.org/10.5281/zenodo.4134357>`_ accompanying the paper [DeMW20]_. We
will use the UCube(4) system from the dataset. The system matrix is symmetric
and has 554,496 rows and 14,292,884 nonzero values, which corresponds to an
average of 26 nonzero entries per row. The matrix sparsity portrait is shown on
the figure below.

.. _SuiteSparse Matrix Collection: https://sparse.tamu.edu/

.. figure:: ../../tutorial/4.Stokes/ucube_4.png
   :width: 90%
   :name: ucube_spy

   UCube(4) matrix portrait

As with any Stokes-like problem, the system has a general block-wise structure:

.. math::

    \begin{bmatrix} A & B_1^T \\ B_2 & C \end{bmatrix}
    \begin{bmatrix} \mathbf x_u \\ \mathbf x_p \end{bmatrix} =
    \begin{bmatrix} \mathbf b_u \\ \mathbf b_p \end{bmatrix}

In this case, the upper left subblock :math:`A` corresponds the the flow
unknowns, and itself has block-wise structure with small :math:`3\times3`
blocks. The lower right subblock :math:`C` corresponds to the pressure
unknowns. There is a lot of research dedicated to the efficient solution of
such systems, see [BeGL05]_ for an extensive overview.  The direct approach of
using a monolithic preconditioner usually does not work very well, but we may
try it to have a reference point. The AMG preconditioning does not yield a
converging solution, but a single level ILU(0) relaxation seems to work with a
CG iterative solver::

    $ solver -B -A ucube_4_A.bin -f ucube_4_b.bin \
          solver.type=cg solver.maxiter=500 \
          precond.class=relaxation precond.type=ilu0
    Solver
    ======
    Type:             CG
    Unknowns:         554496
    Memory footprint: 16.92 M

    Preconditioner
    ==============
    Relaxation as preconditioner
      Unknowns: 554496
      Nonzeros: 14292884
      Memory:   453.23 M

    Iterations: 270
    Error:      6.84763e-09

    [Profile:       9.300 s] (100.00%)
    [  reading:     0.133 s] (  1.43%)
    [  setup:       0.561 s] (  6.03%)
    [  solve:       8.599 s] ( 92.46%)

A preconditioner that takes the structure of the system into account should be
a better choice performance-wise. AMGCL provides an implementation of the Schur
complement pressure correction preconditioner. The preconditioning step
consists of solving two linear systems:

.. math::
   :label: schurpc_eq

    \begin{aligned}
        S \mathbf x_p &= \mathbf b_p - B_2 A^{-1} \mathbf b_u, \\
        A \mathbf x_u &= \mathbf b_u - B_1^T \mathbf b_p.
    \end{aligned}

Here :math:`S` is the Schur complement :math:`S = C - B_2 A^{-1} B_1^T`.  Note
that forming the Schur complement matrix explicitly is prohibitively expensive,
and the following approximation is used to create the preconditioner for the
first equation in :eq:`schurpc_eq`:

.. math::

    \hat S = C - \mathop{\mathrm{diag}}\left( B_2 \mathop{\mathrm{diag}}(A)^{-1} B_1^T \right).

There is no need to solve the equations :eq:`schurpc_eq` exactly. It is enough
to perform a single application of the corresponding preconditioner as an
approximation to :math:`S^{-1}` and :math:`A^{-1}`. This means that the
overall preconditioner is linear, and we may use a non-flexible iterative
solver with it.  The approximation matrix :math:`\hat S` has a simple band
diagonal structure, and a diagonal SPAI(0) preconditioner should have
reasonable performance.

.. _examples/solver: https://github.com/ddemidov/amgcl/blob/master/examples/solver.cpp
.. _examples/schur_pressure_correction: https://github.com/ddemidov/amgcl/blob/master/examples/schur_pressure_correction.cpp

Similar to the `examples/solver`_, the `examples/schur_pressure_correction`_
utility allows to play with the Schur pressure correction preconditioner
options before trying to write any code. We found that using the non-smoothed
aggregation with ILU(0) smoothing on each level for the flow subsystem
(``usolver``) and single-level SPAI(0) relaxation for the Schur complement
subsystem (``psolver``) works best. We also disable lumping of the diagonal of
the :math:`A` matrix in the Schur complement approximation with the
``precond.simplec_dia=false`` option, and enable block-valued backend for the
flow susbsystem with the ``--ub 3`` option. The ``-m '>456192'`` option sets
the pressure mask pattern. It tells the solver that all unknowns starting with
the 456192-th belong to the pressure subsystem::

    $ schur_pressure_correction -B -A ucube_4_A.bin -f ucube_4_b.bin -m '>456192' \
        -p solver.type=cg solver.maxiter=200 \
           precond.simplec_dia=false \
           precond.usolver.solver.type=preonly \
           precond.usolver.precond.coarsening.type=aggregation \
           precond.usolver.precond.relax.type=ilu0 \
           precond.psolver.solver.type=preonly \
           precond.psolver.precond.class=relaxation \
           --ub 3
    Solver
    ======
    Type:             CG
    Unknowns:         554496
    Memory footprint: 16.92 M

    Preconditioner
    ==============
    Schur complement (two-stage preconditioner)
      Unknowns: 554496(98304)
      Nonzeros: 14292884
      Memory:  587.45 M

    [ U ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         152064
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.25
    Grid complexity:     1.14
    Memory footprint:    233.07 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       152064         982416    188.13 M (80.25%)
        1        18654         197826     35.07 M (16.16%)
        2         2619          35991      6.18 M ( 2.94%)
        3          591           7953      3.69 M ( 0.65%)


    [ P ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         98304
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Relaxation as preconditioner
      Unknowns: 98304
      Nonzeros: 274472
      Memory:   5.69 M


    Iterations: 35
    Error:      8.57921e-09

    [Profile:                3.872 s] (100.00%)
    [  reading:              0.131 s] (  3.38%)
    [  schur_complement:     3.741 s] ( 96.62%)
    [   self:                0.031 s] (  0.79%)
    [    setup:              0.301 s] (  7.78%)
    [    solve:              3.409 s] ( 88.05%)

Lets see how this translates to the code. Below is the complete listing of the
solver (`tutorial/4.Stokes/stokes_ucube.cpp`_) which uses the mixed precision approach.

.. _tutorial/4.Stokes/stokes_ucube.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/4.Stokes/stokes_ucube.cpp

.. literalinclude:: ../../tutorial/4.Stokes/stokes_ucube.cpp
   :caption: The source code for the solution of the UCube(4) problem.
   :language: cpp
   :linenos:

Schur pressure correction is composite preconditioner. Its definition includes
definition of two nested iterative solvers, one for the "flow" (U) subsystem,
and the other for the "pressure" (P) subsystem. In lines 55--58 we define the
backends used in the outer iterative solver, and in the two nested solvers.
Note that both backends for nested solvers use single precision values, and the
flow subsystem backend has block value type:

.. literalinclude:: ../../tutorial/4.Stokes/stokes_ucube.cpp
   :language: cpp
   :linenos:
   :lines: 55-58
   :lineno-start: 55

In lines 60-79 we define the solver type. The flow solver is defined in lines
62-69, and the pressure solver -- in lines 70--77. Both are using
:cpp:class:`amgcl::solver::preonly` as "iterative" solver, which in fact only
applies the specified preconditioner once. The flow solver is defined with
:cpp:class:`amgcl::make_block_solver`, which automatically converts its
input matrix :math:`A` to the block format during the setup and reinterprets
the scalar RHS and solution vectors as having block values during solution:

.. literalinclude:: ../../tutorial/4.Stokes/stokes_ucube.cpp
   :language: cpp
   :linenos:
   :lines: 60-79
   :lineno-start: 60

In the solver parameters we disable lumping of the matrix :math:`A` diagonal
for the Schur complement approimation (line 83) and fill the pressure mask to
indicate which unknowns correspond to the pressure subsystem (lines 84--85):

.. literalinclude:: ../../tutorial/4.Stokes/stokes_ucube.cpp
   :language: cpp
   :linenos:
   :lines: 81-85
   :lineno-start: 81

Here is the output from the compiled program. The preconditioner uses 398M or
memory, as opposed to 587M in the case of the full precision preconditioner
used in the `examples/schur_pressure_correction`_, and both the setup and the
solution are about 50% faster due to the use of the mixed precision approach::

    $ ./stokes_ucube ucube_4_A.bin ucube_4_b.bin 456192
    Matrix ucube_4_A.bin: 554496x554496
    RHS ucube_4_b.bin: 554496x1
    Solver
    ======
    Type:             CG
    Unknowns:         554496
    Memory footprint: 16.92 M

    Preconditioner
    ==============
    Schur complement (two-stage preconditioner)
      Unknowns: 554496(98304)
      Nonzeros: 14292884
      Memory:  398.39 M

    [ U ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         152064
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.25
    Grid complexity:     1.14
    Memory footprint:    130.49 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       152064         982416    105.64 M (80.25%)
        1        18654         197826     19.56 M (16.16%)
        2         2619          35991      3.44 M ( 2.94%)
        3          591           7953      1.85 M ( 0.65%)


    [ P ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         98304
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Relaxation as preconditioner
      Unknowns: 98304
      Nonzeros: 274472
      Memory:   4.27 M


    Iters: 35
    Error: 8.57996e-09

    [UCube4:      2.502 s] (100.00%)
    [  read:      0.129 s] (  5.16%)
    [  setup:     0.240 s] (  9.57%)
    [  solve:     2.132 s] ( 85.19%)

Converting the solver to the VexCL backend in order to accelerate the solution
with GPGPU is straightforward. Below is the complete source code of the solver
(`tutorial/4.Stokes/stokes_ucube_vexcl.cpp`_), with the differences between the
OpenMP and the VexCL versions highlighted. Note that the GPU version of the
ILU(0) smoother approximates the lower and upper triangular solves in the
incomplete LU decomposition with a couple of Jacobi iterations [ChPa15]_. Here
we set the number of iterations to 4 (line 94).

.. _tutorial/4.Stokes/stokes_ucube_vexcl.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/4.Stokes/stokes_ucube_vexcl.cpp

.. literalinclude:: ../../tutorial/4.Stokes/stokes_ucube_vexcl.cpp
   :caption: The source code for the solution of the UCube(4) problem with the VexCL backend.
   :language: cpp
   :linenos:
   :emphasize-lines: 4-5,32-39,65-67,94,99-100,104,111-113,118-120,123

The output of the VexCL version is shown below. The solution phase is about
twice as fast as the OpenMP version::

    $ ./stokes_ucube_vexcl_cuda ucube_4_A.bin ucube_4_b.bin 456192
    1. GeForce GTX 1050 Ti

    Matrix ucube_4_A.bin: 554496x554496
    RHS ucube_4_b.bin: 554496x1
    Solver
    ======
    Type:             CG
    Unknowns:         554496
    Memory footprint: 16.92 M

    Preconditioner
    ==============
    Schur complement (two-stage preconditioner)
      Unknowns: 554496(98304)
      Nonzeros: 14292884
      Memory:  399.66 M

    [ U ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         152064
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Number of levels:    4
    Operator complexity: 1.25
    Grid complexity:     1.14
    Memory footprint:    131.76 M

    level     unknowns       nonzeros      memory
    ---------------------------------------------
        0       152064         982416    106.76 M (80.25%)
        1        18654         197826     19.68 M (16.16%)
        2         2619          35991      3.45 M ( 2.94%)
        3          591           7953      1.86 M ( 0.65%)


    [ P ]
    Solver
    ======
    Type:             PreOnly
    Unknowns:         98304
    Memory footprint: 0.00 B

    Preconditioner
    ==============
    Relaxation as preconditioner
      Unknowns: 98304
      Nonzeros: 274472
      Memory:   4.27 M


    Iters: 36
    Error: 7.26253e-09

    [UCube4 (VexCL):   1.858 s] (100.00%)
    [ self:            0.004 s] (  0.20%)
    [  GPU matrix:     0.213 s] ( 11.46%)
    [  read:           0.128 s] (  6.87%)
    [  setup:          0.519 s] ( 27.96%)
    [  solve:          0.994 s] ( 53.52%)