.. _multi_node_architecture:

Multi-node Architectures
========================

Pg_auto_failover allows you to have more than one standby node, and offers
advanced control over your production architecture characteristics.

Architectures with two standby nodes
------------------------------------

When adding your second standby node with default settings, you get the
following architecture:

.. figure:: ./tikz/arch-multi-standby.svg
   :alt: pg_auto_failover architecture with two standby nodes

   pg_auto_failover architecture with two standby nodes

In this case, three nodes get set up with the same characteristics, achieving
HA for both the Postgres service and the production dataset. An important
setting for this architecture is ``number_sync_standbys``.

The replication setting ``number_sync_standbys`` sets how many standby nodes
the primary should wait for when committing a transaction. To maintain good
availability in your system, pg_auto_failover requires
``number_sync_standbys + 1`` standby nodes participating in the replication
quorum: this allows any standby node to fail without impacting the system's
ability to respect the replication quorum.

When only two nodes are registered in a group on the monitor, we have a
primary and a single secondary node, and ``number_sync_standbys`` can only
be set to zero. When a second standby node is added to a pg_auto_failover
group, the monitor automatically increments ``number_sync_standbys`` to
one, as we see in the diagram above.

When ``number_sync_standbys`` is set to zero then pg_auto_failover
implements the *Business Continuity* setup as seen in
:ref:`architecture_basics`: synchronous replication is then used as a way to
guarantee that failover can be implemented without data loss.

In more detail:

 1. With ``number_sync_standbys`` set to one, this architecture always
    maintains two copies of the dataset: one on the current primary node
    (node A in the previous diagram), and one on the standby that
    acknowledges the transaction first (either node B or node C in the
    diagram).

    When one of the standby nodes is unavailable, the second copy of the
    dataset can still be maintained thanks to the remaining standby.

    When both standby nodes are unavailable, it is no longer possible to
    guarantee the replication quorum, and thus writes on the primary are
    blocked. The Postgres primary node waits until at least one standby
    node acknowledges the transactions it has committed locally, degrading
    your Postgres service to read-only.

 2. It is possible to manually set ``number_sync_standbys`` to zero even
    when two standby nodes are registered with the monitor, overriding the
    default behavior.

    In that case, when the second standby node becomes unhealthy at the same
    time as the first standby node, the primary node is assigned the state
    :ref:`wait_primary`. In that state, synchronous replication is disabled
    on the primary by setting ``synchronous_standby_names`` to an empty
    string. Writes are allowed on the primary, even though there's no extra
    copy of the production dataset available at this time.

    Setting ``number_sync_standbys`` to zero allows data to be written even
    when both standby nodes are down. In this case, a single copy of the
    production data set is kept and, if the primary were then to fail, some
    data would be lost. How much depends on your backup and recovery
    mechanisms.
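
For instance, to override the default and keep accepting writes even with
both standby nodes down, the following command (detailed in the next
section) sets the formation-level parameter to zero::

  pg_autoctl set formation number-sync-standbys 0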

.. _architecture_setup:

Replication Settings and Postgres Architectures
-----------------------------------------------

The entire flexibility of pg_auto_failover can be leveraged with the
following three replication settings:

  - Number of sync standbys
  - Replication quorum
  - Candidate priority

.. _number_sync_standbys:

Number Sync Standbys
^^^^^^^^^^^^^^^^^^^^

This parameter is used by Postgres in the `synchronous_standby_names`__
parameter: ``number_sync_standbys`` is the number of synchronous standbys
that transactions must wait for replies from.

__ https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES

This parameter can be set at the *formation* level in pg_auto_failover, meaning
that it applies to the current primary, and "follows" a failover to apply to
any new primary that might replace the current one.

To set this parameter to the value ``<n>``, use the following command::

  pg_autoctl set formation number-sync-standbys <n>

The default value in pg_auto_failover is zero. When set to zero, the
Postgres parameter ``synchronous_standby_names`` can be set to either
``'*'`` or to ``''``:

- ``synchronous_standby_names = '*'`` means that any standby may
  participate in the replication quorum for transactions with
  ``synchronous_commit`` set to ``on`` or higher values.

  pg_auto_failover uses ``synchronous_standby_names = '*'`` when there's at
  least one standby that is known to be healthy.

- ``synchronous_standby_names = ''`` (empty string) disables synchronous
  commit and makes all your commits asynchronous, meaning that transaction
  commits will not wait for replication. In other words, a single copy of
  your production data is maintained when ``synchronous_standby_names`` is
  set that way.

  pg_auto_failover uses ``synchronous_standby_names = ''`` only when
  ``number_sync_standbys`` is set to zero and there's no standby node known
  to be healthy by the monitor.
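
To check which of those two values is currently in effect, the setting can
be queried on the primary with plain SQL; the host and port in this sketch
are illustrative::

  $ psql -h localhost -p 5001 -c 'SHOW synchronous_standby_names;'
   synchronous_standby_names
  ---------------------------
   *
  (1 row)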

In order to set ``number_sync_standbys`` to a non-zero value,
pg_auto_failover requires that at least ``number_sync_standbys + 1`` standby
nodes be registered in the system.

When the first standby node is added to the pg_auto_failover monitor, the
only acceptable value for ``number_sync_standbys`` is zero. When a second
standby is added that participates in the replication quorum, then
``number_sync_standbys`` is automatically set to one.

The command ``pg_autoctl set formation number-sync-standbys`` can be used to
change the value of this parameter in a formation, even when all the nodes
are already running in production. The pg_auto_failover monitor then sets a
transition for the primary to update its local value of
``synchronous_standby_names``.
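
For example, assuming a formation with three healthy standby nodes in the
replication quorum, the following command raises the setting to two on a
live system::

  pg_autoctl set formation number-sync-standbys 2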

Replication Quorum
^^^^^^^^^^^^^^^^^^

The replication quorum setting is a boolean, defaults to ``true``, and can
be set per node. Pg_auto_failover includes a given node in
``synchronous_standby_names`` only when its replication quorum parameter
has been set to true. This means that asynchronous replication is used for
nodes where ``replication-quorum`` is set to ``false``.

It is possible to force asynchronous replication globally by setting the
replication quorum to false on all the nodes in a formation. Remember that
failovers will happen: when needed, set your replication settings on the
current primary node too, as it is going to be a standby later.

To set this parameter to either true or false, use one of the following
commands::

  pg_autoctl set node replication-quorum true
  pg_autoctl set node replication-quorum false

.. _candidate_priority:

Candidate Priority
^^^^^^^^^^^^^^^^^^

The candidate priority setting is an integer that can be set to any value
between 0 (zero) and 100 (one hundred). The default value is 50. When the
pg_auto_failover monitor decides to orchestrate a failover, it uses each
node's candidate priority to pick the new primary node.

When the candidate priority of a node is set to zero, that node will never
be selected for promotion as the new primary when the monitor orchestrates
a failover. Instead, the monitor waits until another registered node is
healthy and in a position to be promoted.

To set this parameter to the value ``<n>``, use the following command::

  pg_autoctl set node candidate-priority <n>

When nodes have the same candidate priority, the monitor then picks the
standby with the most advanced LSN position published to the monitor. When
more than one node has published the same LSN position, a random one is
chosen.

When the candidate for failover has not published the most advanced LSN
position in the WAL, pg_auto_failover orchestrates an intermediate step in
the failover mechanism. Before being promoted, the candidate fetches the
missing WAL bytes from one of the standbys with the most advanced LSN
position. Postgres allows this operation thanks to cascading replication:
any standby can be the upstream node for another standby.

It is required at all times that at least two nodes have a non-zero candidate
priority in any pg_auto_failover formation. Otherwise no failover is possible.
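
For instance, a standby dedicated to reporting can be excluded from
failover by running the following command from that node, using the zero
value discussed above::

  pg_autoctl set node candidate-priority 0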

Auditing replication settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The command ``pg_autoctl get formation settings`` (also known as
``pg_autoctl show settings``) can be used to obtain a summary of all the
replication settings currently in effect in a formation. Still using the
first diagram on this page, we get the following summary::

  $ pg_autoctl get formation settings
    Context |    Name |                   Setting | Value
  ----------+---------+---------------------------+-------------------------------------------------------------
  formation | default |      number_sync_standbys | 1
    primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
       node |  node_A |        replication quorum | true
       node |  node_B |        replication quorum | true
       node |  node_C |        replication quorum | true
       node |  node_A |        candidate priority | 50
       node |  node_B |        candidate priority | 50
       node |  node_C |        candidate priority | 50

We can see that ``number_sync_standbys`` has been used to compute the
current value of the `synchronous_standby_names`__ setting on the primary.

__ https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES

Because all the nodes in that example have the same default candidate
priority (50), pg_auto_failover uses the form ``ANY 1`` with the list of
standby nodes that currently participate in the replication quorum.

The entries in the ``synchronous_standby_names`` list are meant to match
the ``application_name`` connection setting used in ``primary_conninfo``.
The format used by pg_auto_failover there is the format string
``"pgautofailover_standby_%d"``, where ``%d`` is replaced by the node id.
This allows keeping the same connection string to the primary when the node
name is changed (using the command ``pg_autoctl set metadata --name``).
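
As an illustration, the ``primary_conninfo`` that node B (node id 2) uses
to connect to the primary then carries an ``application_name`` of this
shape; the host and port here are hypothetical::

  primary_conninfo = 'host=localhost port=5001 application_name=pgautofailover_standby_2'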

The node id of each registered Postgres node can be seen in the output of
the following command::

  $ pg_autoctl show state
    Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
  -------+-------+----------------+-----------+-----------+---------------------+--------------------
  node_A |     1 | localhost:5001 | 0/7002310 |       yes |             primary |             primary
  node_B |     2 | localhost:5002 | 0/7002310 |       yes |           secondary |           secondary
  node_C |     3 | localhost:5003 | 0/7002310 |       yes |           secondary |           secondary

The per-formation ``number_sync_standbys`` setting and the per-node
replication quorum and candidate priority settings are used together to
compute the ``synchronous_standby_names`` value on the primary node. This
value is automatically maintained on the primary by pg_auto_failover, and
is updated either when replication settings are changed or when a failover
happens.

The other situation where the pg_auto_failover replication settings are
used is a candidate election, when a failover happens and more than two
nodes are registered in a group. The node with the highest candidate
priority is then selected, as detailed above in the
:ref:`candidate_priority` section.

Sample architectures with three standby nodes
---------------------------------------------

When setting the three parameters above, it's possible to design very
different Postgres architectures for your production needs.

.. figure:: ./tikz/arch-three-standby.svg
   :alt: pg_auto_failover architecture with three standby nodes

   pg_auto_failover architecture with three standby nodes

In this case, the system is set up with three standby nodes, all configured
the same way with default parameters. These defaults allow setting
``number_sync_standbys = 2``, which means that Postgres maintains three
copies of the production data set at all times.

On the other hand, if two standby nodes were to fail at the same time,
despite the fact that two copies of the data are still maintained, the
Postgres service would be degraded to read-only.

With this architecture diagram, here's the summary that we obtain::

  $ pg_autoctl show settings
    Context |    Name |                   Setting | Value
  ----------+---------+---------------------------+---------------------------------------------------------------------------------------
  formation | default |      number_sync_standbys | 2
    primary |  node_A | synchronous_standby_names | 'ANY 2 (pgautofailover_standby_2, pgautofailover_standby_4, pgautofailover_standby_3)'
       node |  node_A |        replication quorum | true
       node |  node_B |        replication quorum | true
       node |  node_C |        replication quorum | true
       node |  node_D |        replication quorum | true
       node |  node_A |        candidate priority | 50
       node |  node_B |        candidate priority | 50
       node |  node_C |        candidate priority | 50
       node |  node_D |        candidate priority | 50

Sample architecture with three standby nodes, one async
-------------------------------------------------------

.. figure:: ./tikz/arch-three-standby-one-async.svg
   :alt: pg_auto_failover architecture with three standby nodes, one async

   pg_auto_failover architecture with three standby nodes, one async

In this case, the system is set up with two standby nodes participating in
the replication quorum, allowing for ``number_sync_standbys = 1``. The
system always maintains at least two copies of the data set: one on the
primary, another on either node B or node C. Whenever we lose one of those
two nodes, the guarantee of having two copies of the data set still holds.

Additionally, standby server D has been set up not to participate in the
replication quorum, so node D is not listed in the
``synchronous_standby_names`` list of nodes. Node D is also set up to never
be a candidate for failover, with ``candidate-priority = 0``.
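
A sketch of the two commands that produce this setup for node D, run from
that node itself; both commands are documented above::

  pg_autoctl set node replication-quorum false
  pg_autoctl set node candidate-priority 0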

This architecture would fit a situation where nodes A, B, and C are
deployed in the same data center or availability zone, and node D in
another one. Those three nodes are set up to support the main production
traffic and implement high availability of both the Postgres service and
the data set.

Node D might be set up for Business Continuity in case the first data center
is lost, or maybe for reporting needs on another application domain.

With this architecture diagram, here's the summary that we obtain::

  $ pg_autoctl show settings
    Context |    Name |                   Setting | Value
  ----------+---------+---------------------------+-------------------------------------------------------------
  formation | default |      number_sync_standbys | 1
    primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
       node |  node_A |        replication quorum | true
       node |  node_B |        replication quorum | true
       node |  node_C |        replication quorum | true
       node |  node_D |        replication quorum | false
       node |  node_A |        candidate priority | 50
       node |  node_B |        candidate priority | 50
       node |  node_C |        candidate priority | 50
       node |  node_D |        candidate priority | 0