# Coordinate disaster recovery with Secret Manager
## Background
This document assumes you are already using the following strategy to
detect and trigger failovers:
1. Use an independent service to detect when the primary is down
2. Trigger a promotion of an existing read replica to become the new primary
3. Update a Secret Manager secret with the name of the current primary
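As a rough illustration of step 3, the sketch below stores the new primary's name as a new version of the secret. The helper name `publish_primary` and the example values are hypothetical. It assumes the secret already exists and that the stored value is the instance connection name the proxy expects.

```sh
#!/bin/bash
# Hypothetical helper (not part of the original example): store the instance
# connection name of the new primary as the latest version of the secret.
publish_primary() {
    local secret_id="$1"
    local instance="$2"
    # printf avoids a trailing newline in the stored value; "--data-file=-"
    # tells gcloud to read the secret payload from stdin.
    printf '%s' "$instance" | gcloud secrets versions add "$secret_id" --data-file=-
}

# Example call with placeholder values:
# publish_primary "my-secret-id" "my-project:us-east1:my-replica"
```

Anything that reads the `latest` version of the secret will then see the new value the next time it checks.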
## Restart Auth proxy when secret changes
This option uses a wrapper script around the Cloud SQL Auth proxy to detect
when the secret has changed, and restart the proxy with the new value. This
could be done in many languages, but here’s an example using bash:
> [failover.sh](examples/disaster-recovery/failover.sh)
```sh
#!/bin/bash
SECRET_ID="my-secret-id" # TODO(developer): replace this value
REFRESH_INTERVAL=5
PORT=5432 # TODO(developer): change this port as needed
# Get the latest version of the secret and start the proxy
INSTANCE=$(gcloud secrets versions access "latest" --secret="$SECRET_ID")
cloud_sql_proxy -instances="$INSTANCE"=tcp:"$PORT" &
PID=$!
# Every REFRESH_INTERVAL seconds, get the latest version of the secret. If it
# has changed, restart the proxy with the new value.
while true; do
    sleep "$REFRESH_INTERVAL"
    NEW=$(gcloud secrets versions access "latest" --secret="$SECRET_ID")
    # Skip empty results (e.g. a failed gcloud call) so the proxy is never
    # restarted without an instance name.
    if [ -n "$NEW" ] && [ "$INSTANCE" != "$NEW" ]; then
        INSTANCE=$NEW
        kill "$PID"
        wait "$PID"
        cloud_sql_proxy -instances="$INSTANCE"=tcp:"$PORT" &
        PID=$!
    fi
done
```
## Benefits of this approach
This approach helps you handle failovers without reconfiguring your
application. Because only the proxy is restarted, the application always
connects to 127.0.0.1 and does not need to restart to apply configuration
changes. It also helps prevent split-brain syndrome by ensuring that your
application can only connect to the current “primary”.
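To make this concrete, here is a minimal sketch of application-side settings that stay the same across failovers. The variable names `DB_HOST` and `DB_PORT` are hypothetical, and the commented `psql` line assumes a PostgreSQL instance with placeholder user and database names.

```sh
#!/bin/bash
# Hypothetical application settings: they point at the local proxy rather than
# at a specific Cloud SQL instance, so a failover does not change them.
export DB_HOST="127.0.0.1"
export DB_PORT="5432" # must match PORT in the wrapper script

# Example connection through the proxy (PostgreSQL assumed, placeholder names):
# psql "host=$DB_HOST port=$DB_PORT user=my-user dbname=my-db"
```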