File: azure-tutorial.rst

.. _azure_tutorial:

Azure VMs Tutorial
==================

In this guide we’ll create a primary and secondary Postgres node and set
up pg_auto_failover to replicate data between them. We’ll simulate failure in
the primary node and see how the system smoothly switches (fails over)
to the secondary.

For illustration, we'll run our databases on virtual machines in the Azure
platform, but the techniques here are relevant to any cloud provider or
on-premise network. We'll use four virtual machines: a primary database, a
secondary database, a monitor, and an "application." The monitor watches the
other nodes’ health, manages global state, and assigns nodes their roles.

Create virtual network
----------------------

Our database machines need to talk to each other and to the monitor node, so
let's create a virtual network.

.. code-block:: bash

   az group create \
       --name ha-demo \
       --location eastus

   az network vnet create \
       --resource-group ha-demo \
       --name ha-demo-net \
       --address-prefix 10.0.0.0/16
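
If you want to confirm the network exists before moving on, a quick optional
check is to ask for the address space we just configured:

.. code-block:: bash

   # optional check: the vnet should report the 10.0.0.0/16 address space
   az network vnet show \
       --resource-group ha-demo \
       --name ha-demo-net \
       --query "addressSpace.addressPrefixes" \
       --output tsv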

We need to open ports 5432 (Postgres) and 22 (SSH) between the machines, and
also give ourselves access from our remote IP. We'll do this with a network
security group and a subnet.

.. code-block:: bash

   az network nsg create \
       --resource-group ha-demo \
       --name ha-demo-nsg

   az network nsg rule create \
       --resource-group ha-demo \
       --nsg-name ha-demo-nsg \
       --name ha-demo-ssh-and-pg \
       --access allow \
       --protocol Tcp \
       --direction Inbound \
       --priority 100 \
       --source-address-prefixes `curl ifconfig.me` 10.0.1.0/24 \
       --source-port-range "*" \
       --destination-address-prefix "*" \
       --destination-port-ranges 22 5432

   az network vnet subnet create \
       --resource-group ha-demo \
       --vnet-name ha-demo-net \
       --name ha-demo-subnet \
       --address-prefixes 10.0.1.0/24 \
       --network-security-group ha-demo-nsg
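
To double-check the security rule before creating the VMs, you can list the
rules attached to the group (optional):

.. code-block:: bash

   # optional check: the ha-demo-ssh-and-pg rule should appear here
   az network nsg rule list \
       --resource-group ha-demo \
       --nsg-name ha-demo-nsg \
       --output table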

Finally add four virtual machines (ha-demo-a, ha-demo-b, ha-demo-monitor, and
ha-demo-app). For speed we background the ``az vm create`` processes and run
them in parallel:

.. code-block:: bash

   # create VMs in parallel
   for node in monitor a b app
   do
   az vm create \
       --resource-group ha-demo \
       --name ha-demo-${node} \
       --vnet-name ha-demo-net \
       --subnet ha-demo-subnet \
       --nsg ha-demo-nsg \
       --public-ip-address ha-demo-${node}-ip \
       --image debian \
       --admin-username ha-admin \
       --generate-ssh-keys &
   done
   wait
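
Once the loop finishes, an optional check confirms that all four machines are
running and have public IP addresses:

.. code-block:: bash

   # optional check: every VM should be "VM running" with a public IP
   az vm list \
       --resource-group ha-demo \
       --show-details \
       --output table \
       --query "[].{name: name, powerState: powerState, publicIps: publicIps}"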

To make it easier to SSH into these VMs in future steps, let's make a shell
function to retrieve their IP addresses:

.. code-block:: bash

  # run this in your local shell as well

  vm_ip () {
    az vm list-ip-addresses -g ha-demo -n ha-demo-$1 -o tsv \
      --query '[] [] .virtualMachine.network.publicIpAddresses[0].ipAddress'
  }

  # for convenience with ssh

  for node in monitor a b app
  do
  ssh-keyscan -H `vm_ip $node` >> ~/.ssh/known_hosts
  done
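
A quick sanity check: the function should print one public IP address per
node, and ssh should now connect without prompting about host keys.

.. code-block:: bash

   vm_ip monitor
   ssh -l ha-admin `vm_ip monitor` -- hostname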

Let's review what we created so far.

.. code-block:: bash

  az resource list --output table --query \
    "[?resourceGroup=='ha-demo'].{ name: name, flavor: kind, resourceType: type, region: location }"

This shows the following resources:

::

    Name                             ResourceType                                           Region
    -------------------------------  -----------------------------------------------------  --------
    ha-demo-a                        Microsoft.Compute/virtualMachines                      eastus
    ha-demo-app                      Microsoft.Compute/virtualMachines                      eastus
    ha-demo-b                        Microsoft.Compute/virtualMachines                      eastus
    ha-demo-monitor                  Microsoft.Compute/virtualMachines                      eastus
    ha-demo-appVMNic                 Microsoft.Network/networkInterfaces                    eastus
    ha-demo-aVMNic                   Microsoft.Network/networkInterfaces                    eastus
    ha-demo-bVMNic                   Microsoft.Network/networkInterfaces                    eastus
    ha-demo-monitorVMNic             Microsoft.Network/networkInterfaces                    eastus
    ha-demo-nsg                      Microsoft.Network/networkSecurityGroups                eastus
    ha-demo-a-ip                     Microsoft.Network/publicIPAddresses                    eastus
    ha-demo-app-ip                   Microsoft.Network/publicIPAddresses                    eastus
    ha-demo-b-ip                     Microsoft.Network/publicIPAddresses                    eastus
    ha-demo-monitor-ip               Microsoft.Network/publicIPAddresses                    eastus
    ha-demo-net                      Microsoft.Network/virtualNetworks                      eastus

Install the "pg_autoctl" executable
-----------------------------------

This guide uses Debian Linux, but similar steps will work on other
distributions. All that differs are the packages and paths. See :ref:`install`.

The pg_auto_failover system is distributed as a single ``pg_autoctl`` binary
with subcommands to initialize and manage a replicated PostgreSQL service.
We’ll install the binary with the operating system package manager on all
nodes. It will help us run and observe PostgreSQL.

.. code-block:: bash

  for node in monitor a b app
  do
  az vm run-command invoke \
     --resource-group ha-demo \
     --name ha-demo-${node} \
     --command-id RunShellScript \
     --scripts \
        "sudo touch /home/ha-admin/.hushlogin" \
        "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
        "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-common" \
        "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" \
        "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-11-auto-failover-1.4" \
        "sudo usermod -a -G postgres ha-admin" &
  done
  wait
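
To verify the installation went through on every node, one option is to ask
each machine for the installed version over ssh:

.. code-block:: bash

   # optional check: every node should report the same pg_autoctl version
   for node in monitor a b app
   do
   ssh -l ha-admin `vm_ip $node` -- pg_autoctl version
   done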

.. _tutorial_run_monitor:

Run a monitor
-------------

The pg_auto_failover monitor is the first component to run. It periodically
attempts to contact the other nodes and watches their health. It also
maintains global state that “keepers” on each node consult to determine their
own roles in the system.

.. code-block:: bash

   # on the monitor virtual machine

   ssh -l ha-admin `vm_ip monitor` -- \
     pg_autoctl create monitor \
       --auth trust \
       --ssl-self-signed \
       --pgdata monitor \
       --pgctl /usr/lib/postgresql/11/bin/pg_ctl

This command initializes a PostgreSQL cluster at the location pointed to by
the ``--pgdata`` option. When ``--pgdata`` is omitted, ``pg_autoctl`` attempts
to use the ``PGDATA`` environment variable. If a PostgreSQL instance already
existed in the destination directory, this command would have configured it to
serve as a monitor.

The command also creates a database named ``pg_auto_failover``, installs the
``pgautofailover`` Postgres extension, and grants access to a new
``autoctl_node`` user.

In this tutorial we use ``--auth trust`` to avoid complex security settings.
The Postgres `trust authentication method`__ is not considered a reasonable
choice for production environments. Consider either using the ``--skip-pg-hba``
option or ``--auth scram-sha-256`` and then setting up passwords yourself.

__ https://www.postgresql.org/docs/current/auth-trust.html
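
For illustration only, a production-oriented variant of the command above
could use password authentication instead. This is a sketch, not meant to be
run in addition to the ``--auth trust`` command; you would still have to set
up the passwords yourself afterwards.

.. code-block:: bash

   # sketch: create the monitor with password authentication instead of trust
   ssh -l ha-admin `vm_ip monitor` -- \
     pg_autoctl create monitor \
       --auth scram-sha-256 \
       --ssl-self-signed \
       --pgdata monitor \
       --pgctl /usr/lib/postgresql/11/bin/pg_ctl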

At this point the monitor is created. Now we'll install it as a service with
systemd so that it will resume if the VM restarts.

.. code-block:: bash

   ssh -T -l ha-admin `vm_ip monitor` << CMD
     pg_autoctl -q show systemd --pgdata ~ha-admin/monitor > pgautofailover.service
     sudo mv pgautofailover.service /etc/systemd/system
     sudo systemctl daemon-reload
     sudo systemctl enable pgautofailover
     sudo systemctl start pgautofailover
   CMD
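
An optional check that the service is indeed running:

.. code-block:: bash

   ssh -l ha-admin `vm_ip monitor` -- systemctl status pgautofailover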


Bring up the nodes
------------------

We’ll create the primary database using the ``pg_autoctl create`` subcommand.

.. code-block:: bash

   ssh -l ha-admin `vm_ip a` -- \
     pg_autoctl create postgres \
       --pgdata ha \
       --auth trust \
       --ssl-self-signed \
       --username ha-admin \
       --dbname appdb \
       --hostname ha-demo-a.internal.cloudapp.net \
       --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
       --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

Notice the user and database name in the monitor connection string -- these
are the ones created during monitor initialization. We also pass the path to
``pg_ctl`` so that the keeper keeps using the correct version of ``pg_ctl`` in
the future, even if other versions of Postgres are installed on the system.
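
Both values end up in the keeper's configuration file, which you can inspect
later with ``pg_autoctl config get`` as an optional check:

.. code-block:: bash

   # optional check: print the pg_ctl path and monitor URI the keeper recorded
   ssh -l ha-admin `vm_ip a` -- \
     pg_autoctl config get postgresql.pg_ctl --pgdata ha
   ssh -l ha-admin `vm_ip a` -- \
     pg_autoctl config get pg_autoctl.monitor --pgdata ha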

In the example above, the keeper creates a primary database. It chooses to set
up node A as primary because the monitor reports there are no other nodes in
the system yet. This is one example of how the keeper is state-based: it makes
observations and then adjusts its state, in this case from "init" to "single."

Also add a setting to trust connections from our "application" VM:

.. code-block:: bash

   ssh -T -l ha-admin `vm_ip a` << CMD
     echo 'hostssl "appdb" "ha-admin" ha-demo-app.internal.cloudapp.net trust' \
       >> ~ha-admin/ha/pg_hba.conf
   CMD

At this point the monitor and primary node are created and running. Next we
need to run the keeper. It's an independent process so that it can continue
operating even if the PostgreSQL process terminates on the node. We'll
install it as a service with systemd so that it will resume if the VM restarts.

.. code-block:: bash

   ssh -T -l ha-admin `vm_ip a` << CMD
     pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
     sudo mv pgautofailover.service /etc/systemd/system
     sudo systemctl daemon-reload
     sudo systemctl enable pgautofailover
     sudo systemctl start pgautofailover
   CMD
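
At this point an optional check against the monitor should show node A
registered on its own, in the "single" state:

.. code-block:: bash

   ssh -l ha-admin `vm_ip monitor` -- \
     pg_autoctl show state --pgdata monitor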

Next, connect to node B and repeat the process. We'll do both steps at once:

.. code-block:: bash

   ssh -l ha-admin `vm_ip b` -- \
     pg_autoctl create postgres \
       --pgdata ha \
       --auth trust \
       --ssl-self-signed \
       --username ha-admin \
       --dbname appdb \
       --hostname ha-demo-b.internal.cloudapp.net \
       --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
       --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

   ssh -T -l ha-admin `vm_ip b` << CMD
     pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
     sudo mv pgautofailover.service /etc/systemd/system
     sudo systemctl daemon-reload
     sudo systemctl enable pgautofailover
     sudo systemctl start pgautofailover
   CMD

Node B discovers from the monitor that a primary already exists, switches its
own state to hot standby, and begins streaming WAL contents from the primary.

Node communication
------------------

For convenience, pg_autoctl modifies each node's ``pg_hba.conf`` file to allow
the nodes to connect to one another. For instance, pg_autoctl added the
following lines to node A:

.. code-block:: ini

   # automatically added to node A

   hostssl "appdb" "ha-admin" ha-demo-a.internal.cloudapp.net trust
   hostssl replication "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
   hostssl "appdb" "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust

For ``pg_hba.conf`` on the monitor node pg_autoctl inspects the local network
and makes its best guess about the subnet to allow. In our case it guessed
correctly:

.. code-block:: ini

   # automatically added to the monitor

   hostssl "pg_auto_failover" "autoctl_node" 10.0.1.0/24 trust

If the Postgres nodes have more ad-hoc addresses and are not all in the same
subnet, it's better to disable pg_autoctl's automatic modification of
``pg_hba.conf`` with the ``--skip-pg-hba`` command line option during creation.
You will then need to edit the HBA file by hand. Another reason for manual
edits is the use of special authentication methods.
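
For instance, a hand-maintained entry could restrict access to the subnet and
use password authentication. This is only an illustration; the tutorial itself
keeps the automatically generated ``trust`` entries.

.. code-block:: ini

   # example of a manually managed pg_hba.conf entry
   hostssl "appdb" "ha-admin" 10.0.1.0/24 scram-sha-256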

Watch the replication
---------------------

First let’s verify that the monitor knows about our nodes, and see what
states it has assigned them:

.. code-block:: bash

   ssh -l ha-admin `vm_ip monitor` pg_autoctl show state --pgdata monitor

     Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
   -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
   node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/3000060 |       yes |             primary |             primary
   node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/3000060 |       yes |           secondary |           secondary


This looks good. We can add data to the primary, and later see it appear in the
secondary. We'll connect to the database from inside our "app" virtual machine,
using a connection string obtained from the monitor.

.. code-block:: bash

   ssh -l ha-admin `vm_ip monitor` pg_autoctl show uri --pgdata monitor

         Type |    Name | Connection String
   -----------+---------+-------------------------------
      monitor | monitor | postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net:5432/pg_auto_failover?sslmode=require
    formation | default | postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require

Now we'll get the connection string and store it in a local environment
variable:

.. code-block:: bash

   APP_DB_URI=$( \
     ssh -l ha-admin `vm_ip monitor` \
       pg_autoctl show uri --formation default --pgdata monitor \
   )

The connection string contains both our nodes, comma separated, and includes
the URL parameter ``?target_session_attrs=read-write`` telling psql that we
want to connect to whichever of these servers supports reads *and* writes.
That will be the primary server.

.. code-block:: bash

   # connect to database via psql on the app vm and
   # create a table with a million rows
   ssh -l ha-admin -t `vm_ip app` -- \
     psql "'$APP_DB_URI'" \
       -c "'CREATE TABLE foo AS SELECT generate_series(1,1000000) bar;'"

Cause a failover
----------------

Now that we've added data to node A, let's switch which is considered
the primary and which the secondary. After the switch we'll connect again
and query the data, this time from node B.

.. code-block:: bash

   # initiate failover to node B
   ssh -l ha-admin -t `vm_ip monitor` \
     pg_autoctl perform switchover --pgdata monitor
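
To follow the transition in real time, you can poll the monitor from another
terminal while the switchover runs (the same command we'll use again later):

.. code-block:: bash

   ssh -t -l ha-admin `vm_ip monitor` -- \
     watch -n 1 -d pg_autoctl show state --pgdata monitor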

Once node B is marked "primary" (or "wait_primary") we can connect and verify
that the data is still present:

.. code-block:: bash

   # connect to database via psql on the app vm
   ssh -l ha-admin -t `vm_ip app` -- \
     psql "'$APP_DB_URI'" \
       -c "'SELECT count(*) FROM foo;'"

It shows

.. code-block:: bash

    count
  ---------
   1000000

Cause a node failure
--------------------

This plot is too boring; time to introduce a problem. We'll turn off the VM
for node B (currently the primary after our previous failover) and watch node
A get promoted.

In one terminal let’s keep an eye on events:

.. code-block:: bash

   ssh -t -l ha-admin `vm_ip monitor` -- \
     watch -n 1 -d pg_autoctl show state --pgdata monitor

In another terminal we’ll turn off the virtual server.

.. code-block:: bash

   az vm stop \
     --resource-group ha-demo \
     --name ha-demo-b

After a number of failed attempts to talk to node B, the monitor determines
the node is unhealthy and puts it into the "demoted" state.  The monitor
promotes node A to be the new primary.

.. code-block:: bash

     Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
   -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
   node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/6D4E068 |       yes |        wait_primary |        wait_primary
   node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/6D4E000 |       yes |             demoted |          catchingup

Node A cannot be considered in full "primary" state since there is no secondary
present, but it can still serve client requests. It is marked as "wait_primary"
until a secondary appears, to indicate that it's running without a backup.

Let's add some data while B is offline.

.. code-block:: bash

   # notice how $APP_DB_URI continues to work no matter which node
   # is serving as primary
   ssh -l ha-admin -t `vm_ip app` -- \
     psql "'$APP_DB_URI'" \
       -c "'INSERT INTO foo SELECT generate_series(1000001, 2000000);'"

Resurrect node B
----------------

Run this command to bring node B back online:

.. code-block:: bash

   az vm start \
     --resource-group ha-demo \
     --name ha-demo-b

When the VM comes back up, systemd restarts the keeper service, which brings
Postgres back and reconnects to the monitor. Node B goes through the
"catchingup" state while it updates its data to match node A. Once that's
done, B becomes a secondary, and A is a full primary again.

.. code-block:: bash

     Name |  Node |                            Host:Port |        LSN | Reachable |       Current State |      Assigned State
   -------+-------+--------------------------------------+------------+-----------+---------------------+--------------------
   node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/12000738 |       yes |             primary |             primary
   node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/12000738 |       yes |           secondary |           secondary

What's more, if we connect directly to the database again, all two million rows
are still present.

.. code-block:: bash

   ssh -l ha-admin -t `vm_ip app` -- \
     psql "'$APP_DB_URI'" \
       -c "'SELECT count(*) FROM foo;'"

It shows

.. code-block:: bash

    count
  ---------
   2000000
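
When you are done experimenting, you can remove every resource created in this
tutorial by deleting the resource group; this also deletes the VMs, disks,
network interfaces, and public IP addresses it contains:

.. code-block:: bash

   az group delete --name ha-demo --yes --no-wait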