OVSDB replication implementation
--------------------------------
Overview
========
Given two Open vSwitch databases with the same schema, OVSDB
replication keeps these databases in the same state, i.e. each of the
databases has the same contents at any given time, even if they are
not running on the same host. This document describes the
implementation details that provide this functionality.
Terminology
===========
- Source of truth database: database whose content will be replicated
to another database.
- Active server: ovsdb-server providing the RPC interface to the source of
truth database.
- Standby server: ovsdb-server providing the RPC interface to the database
that is not the source of truth.
Design
======
The overall design of replication consists of one ovsdb-server (active server)
communicating the state of its databases to another ovsdb-server
(standby server) so that the latter keeps its own databases in that same state.
To achieve this, the standby server acts as a client of the active
server: it sends a monitor request to keep up to date with
the changes in the active server databases. When a notification from the
active server arrives, the standby server executes the set of
operations needed for its databases to reach the same state as the active
server databases. Below is the design represented as a diagram.

    +--------------+    replication     +--------------+
    |    Active    |<-------------------|   Standby    |
    | OVSDB-server |                    | OVSDB-server |
    +--------------+                    +--------------+
           |                                   |
           |                                   |
       +-------+                           +-------+
       |  SoT  |                           |       |
       | OVSDB |                           | OVSDB |
       +-------+                           +-------+

Setting up the replication
==========================
To initiate the replication process, the standby server must be started with
the command line option "--sync-from=server", which gives the location of the
active server. Here, server can take any form described in the
ovsdb-client manpage, but it must specify an active connection type (tcp, unix,
ssl). This option causes the standby server to attempt to send a monitor
request to the active server in every main loop iteration, until the active
server responds.
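
The constraint on the "--sync-from" argument can be illustrated with a small Python sketch. It is not part of ovsdb-server: the helper names `is_active_remote` and `standby_argv`, the database file name, and the socket path are all hypothetical, and the check covers only the three connection types named above.

```python
# Prefixes of the active (outgoing) connection types named in the text.
ACTIVE_PREFIXES = ("tcp:", "unix:", "ssl:")


def is_active_remote(remote):
    """Return True if `remote` starts with an active connection type."""
    return remote.startswith(ACTIVE_PREFIXES)


def standby_argv(db_file, active_remote):
    """Build a command line for a standby server (hypothetical helper).

    Rejects passive remotes such as "ptcp:6640", since the standby
    server must connect out to the active server.
    """
    if not is_active_remote(active_remote):
        raise ValueError("not an active connection type: %r" % active_remote)
    return ["ovsdb-server", db_file, "--sync-from=" + active_remote]


if __name__ == "__main__":
    # Example values only; adjust the paths to the local installation.
    print(" ".join(standby_argv("conf.db", "unix:/tmp/active.sock")))
```

The sketch only builds the argument list; it does not start a server.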
When sending a monitor request, the standby server does the following:
1. Erase the content of the databases for which it is providing an RPC
   interface.
2. Open the jsonrpc channel to communicate with the active server.
3. Fetch all the databases located in the active server.
4. For each database with the same schema in both the active and
   standby servers: construct and send a monitor request message
   specifying the tables that will be monitored (i.e. all the tables in
   the database except the ones blacklisted*).
5. Set the standby database to the current state of the active
   database.
Once the monitor request message is sent, the standby server will continuously
receive notifications of changes occurring to the tables specified in the
request. The process of handling these notifications is detailed in the next
section.
*A set of tables that will be excluded from replication can be
configured as a blacklist of tables via the command line option
"--sync-exclude-tables=db:table[,db:table]...", where db corresponds
to the database where the table resides.
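
Step 4 above, together with the table blacklist, can be sketched in Python as follows. This is an illustration under assumptions, not the ovsdb-server code: the function names are hypothetical, the message layout follows the "monitor" method of the OVSDB management protocol (RFC 7047), and reusing the database name as the monitor identifier is an arbitrary choice made for the example.

```python
import json


def parse_exclude_tables(spec):
    """Parse "db:table[,db:table]..." into a set of (db, table) pairs."""
    excluded = set()
    for item in filter(None, spec.split(",")):
        db, sep, table = item.partition(":")
        if not sep or not db or not table:
            raise ValueError("expected db:table, got %r" % item)
        excluded.add((db, table))
    return excluded


def build_monitor_request(db, tables, excluded, request_id=0):
    """Build a JSON-RPC "monitor" request for every non-excluded table.

    An empty per-table object asks for the protocol defaults.
    The second element of "params" is the monitor identifier chosen by
    the client; using the database name here is an assumption.
    """
    monitored = {t: {} for t in tables if (db, t) not in excluded}
    return {"method": "monitor", "params": [db, db, monitored], "id": request_id}


if __name__ == "__main__":
    excluded = parse_exclude_tables("Open_vSwitch:Manager,Open_vSwitch:SSL")
    request = build_monitor_request(
        "Open_vSwitch", ["Bridge", "Manager", "Port", "SSL"], excluded)
    print(json.dumps(request, sort_keys=True))
```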
Replication process
===================
The replication process consists of handling the update notifications that the
standby server receives as a result of the monitor request it previously sent to
the active server. In every loop iteration, the standby server attempts to
receive a message from the active server, which can be an error, an echo
message (used to keep the connection alive) or an update notification. If
the message is a fatal error, the standby server will disconnect from the
active server without dropping the replicated data. If it is an echo message, the
standby server will reply with an echo message as well. If the message is an
update notification, the following process occurs:
1. Create a new transaction.
2. Get the \<table-updates\> object from the "params" member of the
   notification.
3. For each \<table-update\> in the \<table-updates\> object do:
    1. For each \<row-update\> in \<table-update\> check what kind of
       operation should be executed according to the following criteria about
       the presence of the object members:
        - If "old" member is not present, execute an insert operation
          using \<row\> from the "new" member.
        - If "old" member is present and "new" member is not present,
          execute a delete operation using \<row\> from the "old"
          member.
        - If both "old" and "new" members are present, execute an
          update operation using \<row\> from the "new" member.
4. Commit the transaction.
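
The steps above can be sketched in Python against a toy in-memory database. This is only an illustration: the dict-of-dicts database, the function names, and the use of a deep copy as the "transaction" are assumptions made for the example, and the position of \<table-updates\> as the second element of "params" follows the update notification layout in RFC 7047.

```python
import copy


def classify_row_update(row_update):
    """Pick the operation from the presence of "old" and "new"."""
    has_old = "old" in row_update
    has_new = "new" in row_update
    if not has_old and not has_new:
        raise ValueError("row-update has neither 'old' nor 'new'")
    if not has_old:
        return "insert"
    if not has_new:
        return "delete"
    return "update"


def handle_update_notification(db, notification):
    """Apply one update notification; return the new database state.

    `db` maps table name -> {row uuid -> row}. The input is left
    untouched, so a failure part-way through changes nothing.
    """
    staged = copy.deepcopy(db)                       # 1. new "transaction"
    table_updates = notification["params"][1]        # 2. <table-updates>
    for table, table_update in table_updates.items():        # 3.
        rows = staged.setdefault(table, {})
        for uuid, row_update in table_update.items():        # 3.1
            op = classify_row_update(row_update)
            if op == "delete":
                rows.pop(uuid, None)
            else:
                # Insert and update both take the row from "new".
                rows[uuid] = dict(row_update["new"])
    return staged                                    # 4. "commit"


if __name__ == "__main__":
    db = {"Bridge": {"u1": {"name": "br0"}}}
    note = {"method": "update", "id": None,
            "params": ["monitor-id", {"Bridge": {"u2": {"new": {"name": "br1"}}}}]}
    print(handle_update_notification(db, note))
```

Working on a copy and returning it keeps the all-or-nothing behaviour of a transaction in this toy model; the real server uses its own transaction machinery.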
If an error occurs during the replication process, replication is
restarted from the beginning by sending a new monitor request, as described in
the section "Setting up the replication".
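
The per-message handling described at the start of this section can be sketched as a small dispatch function. This is a simplified illustration, not the server's code: the function name and the returned action strings are hypothetical, the echo reply layout follows JSON-RPC as used by RFC 7047, and treating every message with a non-null "error" member as fatal is an assumption, since this document does not define which errors are fatal.

```python
def handle_message(msg):
    """Decide what to do with one message from the active server.

    Returns (action, reply), where reply is a message to send back
    or None.
    """
    method = msg.get("method")
    if method == "echo":
        # Keep-alive: answer with the same parameters and request id.
        reply = {"result": msg.get("params", []), "error": None,
                 "id": msg.get("id")}
        return "reply", reply
    if method == "update":
        # The caller applies the <table-updates> in a transaction.
        return "apply_update", None
    if msg.get("error") is not None:
        # Assumed fatal: disconnect, but keep the replicated data.
        return "disconnect", None
    return "ignore", None


if __name__ == "__main__":
    print(handle_message({"method": "echo", "params": [], "id": "echo"}))
```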
Runtime management commands
===========================
Runtime management commands can be sent to a running standby server via
ovs-appctl in order to configure the replication functionality. The available
commands are the following.
- ovsdb-server/set-remote-ovsdb-server {server}: sets the name of the active
server.
- ovsdb-server/get-remote-ovsdb-server: gets the name of the active server.
- ovsdb-server/connect-remote-ovsdb-server: causes the server to attempt to
send a monitor request every main loop iteration.
- ovsdb-server/disconnect-remote-ovsdb-server: closes the jsonrpc channel
to the active server and frees the memory used for the replication
configuration.
- ovsdb-server/set-sync-exclude-tables {db:table,...}: sets the list of tables
that will be excluded from replication.
- ovsdb-server/get-sync-excluded-tables: gets the list of tables that is
currently excluded from replication.
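
As a usage illustration, the sketch below builds ovs-appctl command lines for the commands listed above. The helper name `appctl_argv`, the control socket path, and the server address are hypothetical; the sketch relies only on the ovs-appctl "-t target" option to select the running standby server, and it builds the argument list without executing it.

```python
import shlex


def appctl_argv(target, command, *args):
    """Build an ovs-appctl invocation addressed to `target`."""
    return ["ovs-appctl", "-t", target, command] + list(args)


if __name__ == "__main__":
    # Hypothetical control socket of the standby ovsdb-server.
    target = "/var/run/openvswitch/standby.ctl"
    steps = [
        appctl_argv(target, "ovsdb-server/set-remote-ovsdb-server",
                    "tcp:192.0.2.1:6640"),
        appctl_argv(target, "ovsdb-server/connect-remote-ovsdb-server"),
    ]
    for argv in steps:
        # Pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```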