File: faq.rst

Frequently Asked Questions
==========================

These questions have been asked by several people in the project's
`GitHub issues`__. If you have more questions, feel free to open a new
issue; your question and its answer might then make it into this FAQ.

__ https://github.com/citusdata/pg_auto_failover/issues

I stopped the primary and no failover is happening for 20s to 30s, why?
-----------------------------------------------------------------------

To avoid spurious failovers when network connectivity is unstable,
pg_auto_failover implements a 20s timeout before acting on a node that is
known to be unavailable. This timeout adds to the delay between health
checks and to the retry policy.

See the :ref:`configuration` section for more information about how to set
up the different delays and timeouts involved in the decision making.

See also :ref:`pg_autoctl_watch` for a dashboard that helps you understand
the system and what is going on at the moment.
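
As a quick check, here is a hedged sketch of how to review those timeouts
directly on the monitor. The ``pgautofailover.*`` parameter names are the
ones described in the :ref:`configuration` section, and the connection
string is a placeholder for your monitor::

  # show the health check timeouts on the monitor node (URI is an example)
  $ psql "postgres://autoctl@monitor.example.com:5432/pg_auto_failover" \
      -c 'SHOW pgautofailover.node_considered_unhealthy_timeout' \
      -c 'SHOW pgautofailover.health_check_period'

  # live dashboard of nodes, states, and recent events
  $ pg_autoctl watch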

The secondary is blocked in the CATCHING_UP state, what should I do?
--------------------------------------------------------------------

In the pg_auto_failover design, the following two things are needed for the
monitor to fully orchestrate the integration of a node:

 1. Health Checks must be successful

    The monitor runs periodic health checks with all the nodes registered
    in the system. Those *health checks* are Postgres connections from the
    monitor to the registered Postgres nodes, and use the ``hostname`` and
    ``port`` as registered.

    The *Reachable* column of the ``pg_autoctl show state`` command output
    contains "yes" when the monitor could connect to a specific node, "no"
    when this connection failed, and "unknown" when no connection has been
    attempted yet since the last startup of the monitor.

    This column must show "yes" for a new standby node before it can be
    orchestrated up to the "secondary" goal state.

 2. pg_autoctl service must be running

    The pg_auto_failover monitor works by assigning goal states to
    individual Postgres nodes. The monitor will not assign a new goal state
    until the current one has been reached.

    To implement a transition from the current state to the goal state
    assigned by the monitor, the pg_autoctl service must be running on
    every node.

When your new standby node stays in the "catchingup" state for a long time,
check that the node is reachable from the monitor using the ``hostname`` and
``port`` registered on the monitor, and check that the ``pg_autoctl run``
service is running for this node.
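
A minimal sketch of those two checks, assuming a default installation (the
systemd unit name below is the one a default ``pg_autoctl`` setup generates;
yours may differ)::

  # on the monitor (or any node): list nodes, their Reachable column, states
  $ pg_autoctl show state

  # on the standby node, when using the systemd integration
  $ systemctl status pgautofailover

  # or, without systemd, check for the running pg_autoctl process
  $ pgrep -af 'pg_autoctl run'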

When things are not obvious, the next step is to go read the logs. Both the
output of the ``pg_autoctl`` command and the Postgres logs are relevant. See
the :ref:`logs` question for details.

.. _logs:

Should I read the logs? Where are the logs?
-------------------------------------------

Yes. If anything seems strange to you, please do read the logs.

As maintainers of the ``pg_autoctl`` tool, we can't foresee everything that
may happen to your production environment. Still, a lot of effort is spent
on producing meaningful output. So when you're in a situation that's hard to
understand, please make sure to read the ``pg_autoctl`` logs and the
Postgres logs.

When using the systemd integration, the ``pg_autoctl`` logs are handled
entirely by the systemd journal. Please refer to ``journalctl`` to view the
logs.
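
For instance, assuming a unit named ``pgautofailover`` (the actual unit name
depends on how the service was installed)::

  # follow the pg_autoctl logs for the local service
  $ journalctl -u pgautofailover -f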

The Postgres logs are to be found in the ``$PGDATA/log`` directory with the
default configuration deployed by ``pg_autoctl create ...``. When a custom
Postgres setup is used, please refer to your actual setup to find Postgres
logs.
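
A hedged sketch for the default layout, with ``PGDATA`` standing in for your
actual data directory and the file name pattern being an example::

  # list the Postgres log files, most recent first
  $ ls -t "$PGDATA"/log/

  # follow the current log file (adjust the name to the most recent file)
  $ tail -f "$PGDATA"/log/postgresql-*.log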

The state of the system is blocked, what should I do?
-----------------------------------------------------

This question covers the general case, which is similar in nature to the
previous situation of adding a new standby to a group of Postgres nodes.
Please check the same two elements: that the monitor health checks are
successful, and that the ``pg_autoctl run`` service is running.

When things are not obvious, the next step is to go read the logs. Both the
output of the ``pg_autoctl`` command and the Postgres logs are relevant. See
the :ref:`logs` question for details.

Impossible / unresolvable state after crash - How to recover?
--------------------------------------------------------------

The pg_auto_failover :ref:`failover_state_machine` greatly simplifies node
management and orchestrates multi-node operations such as a switchover or a
failover. That said, in some cases the FSM may be unable to proceed, usually
after a hard crash of some components of the system, and mostly due to bugs.

Even though we have an extensive test suite to prevent such bugs, you might
have to deal with a situation that the monitor doesn't know how to solve.

The FSM has been designed with a last resort operation mode. It is always
possible to unregister a node from the monitor with the
:ref:`pg_autoctl_drop_node` command. This helps the FSM get back to a
simpler situation; the simplest possible one is when only one node is left
registered in a given formation and group (its state is then SINGLE).
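
A hedged sketch of that last resort operation, run on the node you want to
unregister (the ``--pgdata`` path is a placeholder; see
:ref:`pg_autoctl_drop_node` for the exact options)::

  # unregister this node from the monitor, keeping its data directory
  $ pg_autoctl drop node --pgdata /path/to/pgdata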

When the monitor is back on its feet again, you may register your nodes
again with the :ref:`pg_autoctl_create_postgres` command. The command
detects that a Postgres service is already running and recovers from where
you left off.
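
A minimal sketch of re-registering an existing node, where the path, the
hostname, and the monitor URI are placeholders and your original
``pg_autoctl create postgres`` options should be reused::

  $ pg_autoctl create postgres \
      --pgdata /path/to/pgdata \
      --hostname node1.example.com \
      --monitor postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover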

In some cases you might also have to delete the local ``pg_autoctl`` state
file; error messages will instruct you about the situation.

The monitor is a SPOF in pg_auto_failover design, how should we handle that?
----------------------------------------------------------------------------

When using pg_auto_failover, the monitor is needed to make decisions and
orchestrate changes in all the registered Postgres groups. Decisions are
transmitted to the Postgres nodes by the monitor assigning them a goal state
that is different from their current state.

Consequences of the monitor being unavailable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nodes contact the monitor each second and call the ``node_active`` stored
procedure, which returns a goal state that is possibly different from the
current state.

The monitor only assigns a new goal state to a Postgres node when a
cluster-wide operation is needed. In practice, only the following operations
require the monitor to assign a new goal state to a Postgres node, as shown
in the command sketch after this list:

 - a new node is registered
 - a failover needs to happen, either triggered automatically or manually
 - a node is being put into maintenance
 - a node replication setting is being changed.
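
As an illustration, here is a hedged sketch of the ``pg_autoctl`` commands
behind the last three operations; see the reference documentation for the
exact options and arguments::

  # trigger a manual failover for the current group
  $ pg_autoctl perform failover

  # put a node into maintenance, then bring it back
  $ pg_autoctl enable maintenance
  $ pg_autoctl disable maintenance

  # change a node replication setting
  $ pg_autoctl set node candidate-priority 50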

When the monitor node is not available, the ``pg_autoctl`` processes on the
Postgres nodes fail to contact the monitor every second and log that
failure. Additionally, no orchestration is possible.

Postgres streaming replication does not need the monitor to be available in
order to deliver its service guarantees to your application, so your
Postgres service remains available when the monitor is not.

To repair your installation after having lost the monitor, consider the
following scenarios.

The monitor node can be brought up again without data having been lost
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is typically the case in Cloud Native environments such as Kubernetes,
where you could have a service migrated to another pod and re-attached to
its disk volume. This scenario is well supported by pg_auto_failover, and no
intervention is needed.

It is also possible to use synchronous archiving with the monitor, so that
you can recover from the current archives and continue operating without
intervention on the Postgres nodes, except for updating their monitor URI.
This requires an archiving setup that uses synchronous replication, so that
any transaction committed on the monitor is known to have been replicated in
your WAL archive.

At the moment, you have to take care of that setup yourself. Here's a quick
summary of what needs to be done, with a sketch of the corresponding
commands after the list:

  1. Schedule base backups

     Use ``pg_basebackup`` every once in a while to have a full copy of the
     monitor Postgres database available.

  2. Archive WAL files in a synchronous fashion

     Use ``pg_receivewal --sync ...`` as a service to keep a WAL archive in
     sync with the monitor Postgres instance at all times.

  3. Prepare a recovery tool on top of your archiving strategy

     Write a utility that knows how to create a new monitor node from your
     most recent pg_basebackup copy and the archived WAL files.

     Bonus points if that tool/script is tested at least once a day, so that
     you avoid surprises on the unfortunate day that you actually need to
     use it in production.
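
Here is a hedged sketch of steps 1 and 2, where the hosts, user, and target
directories are placeholders; check the PostgreSQL documentation of
``pg_basebackup`` and ``pg_receivewal`` for the options that match your
setup, and remember that true synchronous archiving also requires the
monitor's ``synchronous_standby_names`` to include the archiver::

  # 1. periodic base backup of the monitor (for example from cron)
  $ pg_basebackup -h monitor.example.com -U replication_user \
      -D /archives/monitor/base/$(date +%Y%m%d) --wal-method=stream

  # 2. continuous, synchronous WAL archiving of the monitor
  $ pg_receivewal -h monitor.example.com -U replication_user \
      -D /archives/monitor/wal --synchronous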

A future version of pg_auto_failover will include this facility, but the
current versions don't.

The monitor node can only be built from scratch again
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you don't have synchronous archiving set up for the monitor, then you
might not be able to restore a monitor database with the expected up-to-date
node metadata. Specifically, we need the nodes' state to be in sync with
what each ``pg_autoctl`` process received the last time it could contact the
monitor, before the monitor became unavailable.
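
In that case, a new monitor must first be created from scratch. A minimal,
hedged sketch where the path and hostname are placeholders and the options
from your original deployment should be reused::

  $ pg_autoctl create monitor \
      --pgdata /path/to/monitor/pgdata \
      --hostname new-monitor.example.com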

It is possible to register currently running nodes to a new monitor without
restarting Postgres on the primary. To do that, follow the procedure
described in :ref:`replacing_monitor_online`, using the following
commands::

  $ pg_autoctl disable monitor
  $ pg_autoctl enable monitor
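
In practice these commands typically take a few more arguments. Here is a
hedged sketch in which the monitor URI is a placeholder and the exact
options, such as ``--force`` when the old monitor is unreachable, are the
ones documented in :ref:`replacing_monitor_online`::

  $ pg_autoctl disable monitor --force
  $ pg_autoctl enable monitor \
      postgres://autoctl_node@new-monitor.example.com:5432/pg_auto_failover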