File: how-to.rst

.. _how-to:

Main pg_autoctl commands
========================

pg_auto_failover includes the command line tool ``pg_autoctl``, which
implements many commands to manage your Postgres nodes. A small subset of
those commands is generally enough to implement the Postgres architectures
described in this documentation, and more.

This section of the documentation is a short introduction to the main
commands that are useful when getting started with pg_auto_failover. More
commands are available and help deal with a variety of situations; see the
:ref:`manual` for the whole list.

To understand which replication settings to use in your case, see the
:ref:`architecture_basics` section and then the
:ref:`multi_node_architecture` section.

To follow a step by step guide that you can reproduce on your own Azure
subscription and create a production Postgres setup from VMs, see the
:ref:`tutorial` section.

To understand how to set up pg_auto_failover in a way that is compliant with
your internal security guidelines, read the :ref:`security` section.

Command line environment, configuration files, etc
--------------------------------------------------

As a command line tool, ``pg_autoctl`` depends on some environment variables.
Mostly, the tool re-uses the Postgres environment variables that you might
already know.

To manage a Postgres node, pg_auto_failover needs to know its data directory
location on disk. For that, some users will find it easier to export the
``PGDATA`` variable in their environment. The alternative consists of always
using the ``--pgdata`` option that is available to all the ``pg_autoctl``
commands.
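
The two approaches can be sketched as follows, assuming a hypothetical data
directory location (the path is an illustration only, not a default)::

  # Hypothetical data directory, used for illustration only.
  NODE_PGDATA="/var/lib/postgresql/pgaf/node_a"

  # Approach 1: export PGDATA once, then omit --pgdata in this shell:
  #   pg_autoctl show state
  export PGDATA="$NODE_PGDATA"

  # Approach 2: leave the environment alone and pass the option each time:
  #   pg_autoctl show state --pgdata "$NODE_PGDATA"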

Creating Postgres Nodes
-----------------------

To get started with the simplest Postgres failover setup, 3 nodes are
needed: the pg_auto_failover monitor, and 2 Postgres nodes that will get
assigned roles by the monitor. One Postgres node will be assigned the
primary role, the other one will get assigned the secondary role.

To create the monitor use the command::

  $ pg_autoctl create monitor

To create the Postgres nodes, use the following command on each node you
want to create::

  $ pg_autoctl create postgres
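
In practice the *create* commands are given options. The sketch below wraps
them in two hypothetical shell functions. The host names, paths, and monitor
URI are placeholders, and the combination of ``--auth trust`` with
``--ssl-self-signed`` is an assumption suited to testing only (see the
:ref:`security` section before choosing production values)::

  # Placeholder monitor connection string; replace with your monitor's URI.
  MONITOR_URI="postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require"

  # Hypothetical helper: initialize the monitor in data directory $1,
  # advertising hostname $2 to the other nodes.
  create_monitor() {
    pg_autoctl create monitor \
      --pgdata "$1" --hostname "$2" \
      --auth trust --ssl-self-signed
  }

  # Hypothetical helper: initialize a Postgres node in data directory $1
  # with hostname $2, and register it on the monitor.
  create_postgres_node() {
    pg_autoctl create postgres \
      --pgdata "$1" --hostname "$2" \
      --auth trust --ssl-self-signed \
      --monitor "$MONITOR_URI"
  }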

Those *create* commands only initialize your nodes: you still have to
actually run the Postgres services that are expected to be running. For
that, you can manually run the following command on every node::

  $ pg_autoctl run

It is also possible (and recommended) to integrate the pg_auto_failover
service in your usual service management facility. When using **systemd**
the following command can be used to produce the required unit file
configuration::

  $ pg_autoctl show systemd
  INFO  HINT: to complete a systemd integration, run the following commands:
  INFO  pg_autoctl -q show systemd --pgdata "/tmp/pgaf/m" | sudo tee /etc/systemd/system/pgautofailover.service
  INFO  sudo systemctl daemon-reload
  INFO  sudo systemctl enable pgautofailover
  INFO  sudo systemctl start pgautofailover
  [Unit]
  ...

While it is expected that for a production deployment each node actually is
a separate machine (virtual or physical, or even a container), it is also
possible to run several Postgres nodes all on the same machine for testing
or development purposes.

.. tip::

   When running several ``pg_autoctl`` nodes on the same machine for testing
   or contributing to pg_auto_failover, each Postgres instance needs to run
   on its own port, and with its own data directory. It can make things
   easier to then set the environment variables ``PGDATA`` and ``PGPORT``
   in each terminal, shell, or tab where each instance is started.
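
One way to do that is a small shell function, sketched here under the
assumption that all test nodes live under a single base directory
(``PGAF_BASE`` and ``use_node`` are hypothetical names, not part of
pg_auto_failover)::

  # Hypothetical base directory holding one data directory per test node.
  PGAF_BASE="${PGAF_BASE:-/tmp/pgaf}"

  # Point the current shell at one node: its own data directory and port.
  use_node() {
    export PGDATA="$PGAF_BASE/$1"
    export PGPORT="$2"
  }

For instance, ``use_node node_a 5501`` in one terminal and
``use_node node_b 5502`` in another let each terminal run its own
``pg_autoctl`` commands without further options.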

Inspecting nodes
----------------

Once your Postgres nodes have been created, and once each ``pg_autoctl``
service is running, it is possible to inspect the current state of the
formation with the following command::

  $ pg_autoctl show state

The ``pg_autoctl show state`` command outputs the current state of the
system only once. Sometimes it would be nice to have an auto-updated
display, as provided by common tools such as `watch(1)` or `top(1)`. For
that, the following commands are available (see also
:ref:`pg_autoctl_watch`)::

  $ pg_autoctl watch
  $ pg_autoctl show state --watch

To analyze what's been happening to get to the current state, it is possible
to review the past events generated by the pg_auto_failover monitor with the
following command::

  $ pg_autoctl show events

.. hint::

   The ``pg_autoctl show`` commands can be run from any node in your system.
   Those commands need to connect to the monitor and print the current state
   or the currently known list of events, as per the monitor's view of the
   system.

   Use ``pg_autoctl show state --local`` to have a view of the local state
   of a given node without connecting to the monitor Postgres instance.

   The option ``--json`` is available in most ``pg_autoctl`` commands and
   switches the output format from a human readable table form to a program
   friendly JSON pretty-printed output.
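
The inspection commands above can be combined in a script. The following is
a minimal sketch with hypothetical helper names. The ``--count`` option of
``pg_autoctl show events`` is written from memory of the command usage, so
check ``pg_autoctl show events --help`` on your version::

  # Hypothetical helper: print the monitor's view, then this node's local
  # view, then the most recent events ($1 events, 10 by default).
  inspect_formation() {
    pg_autoctl show state &&
    pg_autoctl show state --local &&
    pg_autoctl show events --count "${1:-10}"
  }

  # Hypothetical helper: save the state as JSON in file $1, for another
  # program to read.
  save_state_json() {
    pg_autoctl show state --json > "$1"
  }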

Inspecting and Editing Replication Settings
-------------------------------------------

When creating a node it is possible to use the ``--candidate-priority`` and
the ``--replication-quorum`` options to set the replication properties as
required by your choice of Postgres architecture.

To review the current replication settings of a formation, use one of the
two following commands, which are convenient aliases (the same command with
two ways to invoke it)::

  $ pg_autoctl show settings
  $ pg_autoctl get formation settings

It is also possible to edit those replication settings at any time while
your nodes are in production: you can change your mind or adjust to new
elements without having to re-deploy everything. Just use the following
commands to adjust the replication settings on the fly::

  $ pg_autoctl set formation number-sync-standbys
  $ pg_autoctl set node replication-quorum
  $ pg_autoctl set node candidate-priority
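
Each of those commands takes the new value as its argument. As a sketch,
the hypothetical wrappers below document the expected values. They act on
the node selected by ``PGDATA``, and are kept separate because each change
needs the system to be stable again before the next one::

  # Require $1 synchronous standbys in the formation.
  set_sync_standbys() {
    pg_autoctl set formation number-sync-standbys "$1"
  }

  # Set whether the local node takes part in the replication quorum
  # ($1 is true or false).
  set_replication_quorum() {
    pg_autoctl set node replication-quorum "$1"
  }

  # Set the local node's candidate priority ($1 is an integer from 0 to
  # 100, where 0 means the node is never a failover candidate).
  set_candidate_priority() {
    pg_autoctl set node candidate-priority "$1"
  }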

.. important::

   The ``pg_autoctl get`` and ``pg_autoctl set`` commands always connect to
   the monitor Postgres instance.

   The ``pg_autoctl set`` command then changes the replication settings on
   the node registration on the monitor. Then the monitor assigns the
   APPLY_SETTINGS state to the current primary node in the system for it to
   apply the new replication settings to its Postgres streaming replication
   setup.

   As a result, the ``pg_autoctl set`` commands require a stable state in
   the system to be allowed to proceed. Namely, the current primary node in
   the system must have both its Current State and its Assigned State set to
   primary, as per the ``pg_autoctl show state`` output.
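
A script can check that precondition before changing settings. The sketch
below is a hypothetical helper, not a pg_auto_failover feature. It assumes
that the ``--json`` output is pretty-printed with one key per line, and that
the states appear under the keys ``current_group_state`` and
``assigned_group_state``; verify both against your version's output::

  # Succeeds when a single node reports both states as "primary".
  # Assumptions: one JSON key per line, and the two key names used below.
  primary_is_stable() {
    pg_autoctl show state --json | awk '
      /[{]/ { current = ""; assigned = "" }
      /"current_group_state"/  { current = $0 }
      /"assigned_group_state"/ { assigned = $0 }
      /[}]/ { if (current ~ /"primary"/ && assigned ~ /"primary"/) found = 1 }
      END   { exit !found }
    '
  }

A calling script could then run ``primary_is_stable`` and only proceed with
a ``pg_autoctl set`` command when it succeeds.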

Implementing Maintenance Operations
-----------------------------------

When a Postgres node must be taken offline for a maintenance operation, such
as a kernel security upgrade or a minor Postgres update, it is best to let
the pg_auto_failover monitor know about it.

 - For one thing, a node that is known to be in maintenance does not
   participate in failovers. If you are running with two Postgres nodes,
   then failover operations are entirely prevented while the standby node is
   in maintenance.

 - Moreover, depending on your replication settings, enabling maintenance on
   your standby ensures that the primary node switches to async replication
   before Postgres is shut down on the secondary, which prevents write
   queries from being blocked.

To implement maintenance operations, use the following commands::

  $ pg_autoctl enable maintenance
  $ pg_autoctl disable maintenance

The main ``pg_autoctl run`` service that is expected to be running in the
background should continue to run during the whole maintenance operation.
When a node is in the maintenance state, the ``pg_autoctl`` service is not
controlling the Postgres service anymore.
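
A maintenance window can be sketched as a hypothetical wrapper that enables
maintenance, runs the operation, and disables maintenance only when the
operation succeeded. Leaving the node in maintenance after a failure is a
design choice of this sketch, made so that the node can be inspected
first::

  # Hypothetical helper: $1 is the node's data directory, the remaining
  # arguments are the maintenance command to run.
  run_in_maintenance() {
    maint_pgdata="$1"
    shift
    pg_autoctl enable maintenance --pgdata "$maint_pgdata" || return 1
    # On failure, return early and leave the node in maintenance.
    "$@" || return 1
    pg_autoctl disable maintenance --pgdata "$maint_pgdata"
  }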

Note that it is possible to enable maintenance on a primary Postgres node,
and that operation then requires a failover to happen first. It is possible
to have pg_auto_failover orchestrate that for you when using the command::

  $ pg_autoctl enable maintenance --allow-failover

.. important::

   The ``pg_autoctl enable`` and ``pg_autoctl disable`` commands require a
   stable state in the system to be allowed to proceed. Namely, the current
   primary node in the system must have both its Current State and its
   Assigned State set to primary, as per the ``pg_autoctl show state``
   output.

Manual failover, switchover, and promotions
-------------------------------------------

When a failover is needed without an actual node failure, the
pg_auto_failover monitor can be used to orchestrate the operation. Use one
of the following commands, which are synonyms in the pg_auto_failover
design::

  $ pg_autoctl perform failover
  $ pg_autoctl perform switchover

Finally, it is also possible to “elect” a new primary node in your formation
with the command::

  $ pg_autoctl perform promotion
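
Those commands accept options to select their target. A minimal sketch with
hypothetical helper names, assuming the formation is named ``default`` and
the group is ``0`` unless told otherwise::

  # Hypothetical helper: trigger a failover in group $2 (default 0) of
  # formation $1 (default "default").
  failover_group() {
    pg_autoctl perform failover --formation "${1:-default}" --group "${2:-0}"
  }

  # Hypothetical helper: promote the node named $1 in formation $2
  # (default "default").
  promote_node() {
    pg_autoctl perform promotion --name "$1" --formation "${2:-default}"
  }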

.. important::

   The ``pg_autoctl perform`` commands require a stable state in the system
   to be allowed to proceed. Namely, the current primary node in the system
   must have both its Current State and its Assigned State set to primary,
   as per the ``pg_autoctl show state`` output.

What's next?
------------

This section of the documentation is meant to help users get started by
focusing on the main commands of the ``pg_autoctl`` tool. Each command has
many options, whose impact ranges from very small to significant in terms
of security or architecture. Read the rest of the manual to understand how
to best use the many ``pg_autoctl`` options to implement your specific
Postgres production architecture.