# Storage for Praefect's replication queue
## Rationale
Praefect is the traffic router and replication manager for Gitaly Cluster.
Praefect is currently (November 2019) under development and far from
being a minimum viable HA solution. We are at a point where we think we
need to add a database to Praefect's architecture.
The router part of Praefect detects Gitaly calls that modify
repositories, and submits jobs to a job queue indicating that the
repository that got modified needs to have its replicas updated. The
replication manager part consumes the job queue. Currently, this queue
is implemented in-memory in the Praefect process.
While useful for prototyping, this is unsuitable for real HA Gitaly for
two reasons:
1. The job queue must be **persistent**. Currently, the queue is
emptied if a Praefect process restarts. This can lead to data loss
if we fail over away from a repository that is ahead of its
replicas.
1. The job queue must be **shared**. We expect multiple Praefect
processes to be serving the same Gitaly storage cluster. This is
so that Praefect itself is not a single point of failure. These
Praefect processes must all see and use the same job queue.
## Does it have to be a queue?
We don't strictly need a queue. We need a shared, persistent database
that allows the router to mark a repository as being in need of
replication, and that allows the replication manager to query for
repositories that need to be replicated -- and afterwards to clear their
"needs replication" status. A queue is just one way of modeling this
communication pattern.
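
To make that pattern concrete, here is a minimal sketch of the contract the
router and the replication manager would share. The names (`ReplicationJob`,
`Enqueue`, `Dequeue`, `Acknowledge`) are illustrative only, not existing
Praefect code, and any shared, persistent store could sit behind the
interface:

```go
// A minimal sketch of the queue contract, assuming a shared, persistent
// backing store. Names are illustrative and not existing Praefect code.
package queue

import "context"

// ReplicationJob marks one repository as needing its replicas updated.
type ReplicationJob struct {
	ID            int64
	RelativePath  string // repository that was mutated
	SourceStorage string // Gitaly node that received the write
	TargetStorage string // replica that must catch up
}

// ReplicationQueue is what the router writes to and the replication
// manager reads from.
type ReplicationQueue interface {
	// Enqueue records that a repository needs replication.
	Enqueue(ctx context.Context, job ReplicationJob) error
	// Dequeue returns up to count jobs that still need replication.
	Dequeue(ctx context.Context, count int) ([]ReplicationJob, error)
	// Acknowledge clears jobs once their target replicas are up to date.
	Acknowledge(ctx context.Context, ids []int64) error
}
```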
## Does the queue database need to have special properties?
Different types of databases make different trade-offs in their semantics
and reliability. For our purposes, the most important thing is that
**messages get delivered at least once**. Delivering more than once is
wasteful but otherwise harmless: this is because we are doing idempotent
Git fetches.
If a message gets lost, however, a replica can silently stay out of
date, and that can lead to data loss if we later fail over to it.
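
To illustrate why duplicate delivery is harmless, here is a rough sketch that
assumes a replication job boils down to a plain `git fetch` from an up-to-date
copy into the replica; `replicate` and `sourceURL` are hypothetical, and the
real implementation would go through Gitaly's internal APIs. Fetching refs
that are already current is a no-op, so processing the same job twice leaves
the replica unchanged:

```go
// Sketch: replicating is a fetch from an up-to-date copy into the
// replica, and fetching refs that are already current is a no-op,
// so processing the same job twice leaves the replica unchanged.
package replication

import (
	"context"
	"fmt"
	"os/exec"
)

// replicate brings the repository at replicaPath up to date with
// sourceURL. sourceURL stands in for the address of the up-to-date
// Gitaly node.
func replicate(ctx context.Context, replicaPath, sourceURL string) error {
	cmd := exec.CommandContext(ctx, "git", "-C", replicaPath,
		"fetch", "--prune", sourceURL, "+refs/*:refs/*")
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("replication fetch: %v: %s", err, out)
	}
	return nil
}
```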
## What sort of throughput do we expect?
Currently (November 2019), `gitlab.com` has about 5000 Gitaly calls per
second. About 300 of those [are labeled as
"mutators"](https://prometheus.gprd.gitlab.net/graph?g0.range_input=7d&g0.expr=sum(rate(gitaly_cacheinvalidator_optype_total%5B5m%5D))%20by%20(type)&g0.tab=0),
which suggests that today we'd see about 300 replication jobs per
second. Each job may need multiple writes as it progresses through
different states; say 5 state changes. That makes 1500 writes per
second.
Note that we have room to maneuver with sharding. Unlike the SQL
database of GitLab itself, which is more or less monolithic across all
projects, there is no functional requirement to co-locate any two
repositories on the same Gitaly server, nor on the same Praefect
cluster. So if you have 1 million repos, you could make 1 million
Praefect clusters, with 1 million queue database instances (one behind
each Praefect cluster). Each queue database would then see a very, very
low job insertion rate.
This scenario is impractical from an operational standpoint, but
functionally it would be OK. In other words, we have horizontal leeway
to avoid vertically scaling the queue database. There will of course be
practical limits on how many instances of the queue database we can run,
especially because each of those instances must itself be highly available.
## The queue database must be highly available
If the queue database is unavailable, Praefect has to go into a
read-only mode, because it can no longer record which repositories need
replication. This is undesirable, so I think we can say we want the
queue database to be highly available itself.
## Running the queue database should be operationally feasible
As always at GitLab, we want to choose solutions that are suitable for
self-managed GitLab installations.
- Should be open source
- Don't pick an open core solution and then rely on features that are not
in the core
- Don't assume that "the cloud" makes problems go away; assume there
is no cloud
- Running the queue database should require as little expertise as
possible, or it should be a commodity component
## Do we have other database needs in Praefect?
This takes us into YAGNI territory but it's worth considering.
Praefect serves as a front end for a cluster of Gitaly servers (the
"internal Gitaly nodes") that store the actual repository data. We will
need some form of consensus over which internal Gitaly nodes are good
(available) or bad (offline). This is not a YAGNI; we will need it.
Like the queue, this would be shared state. The most natural fit for
this, within GitLab's current architecture, would be Consul. But Consul
is not a good fit for storing the queue.
We might want Praefect to have a catalogue of all repositories it is
storing. With Gitaly, there is no such catalogue; the filesystem is the
single source of truth. This strikes me as a YAGNI though. Even with
Praefect, there will be filesystems "in the back" on the internal Gitaly
nodes, and those could serve as the source of truth.
## What are our options?
### Redis
Pro:
- Already used in GitLab
- Has queue primitives
Con:
- Deployed with snapshot persistence (RDB dumps) in GitLab, which is
  not the durability I think we want: jobs enqueued since the last
  snapshot would be lost in a crash
### Postgres
Pro:
- Already used in GitLab
- Gold standard for persistence
- General purpose database: likely to be able to grow with us as we
develop other needs
Con:
- Can be used for queues, but it is not designed as one
- Need to find a queueing library, or develop an SQL-backed queue
  ourselves (hard and subtle to get right; see the sketch after this
  list)
- Because it is not meant to be a queue, it may have a lower ceiling
  beyond which we are forced to scale horizontally. When we hit that
  ceiling we would have to run multiple Praefect clusters, each with
  its own HA Postgres cluster behind it
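
For illustration, here is one way an SQL-backed dequeue could be built, and a
hint of why it is subtle: Postgres' `FOR UPDATE SKIP LOCKED` (available since
9.5) lets several Praefect processes drain the queue without claiming the same
job twice. The `replication_queue` table, its columns, and the `$1`
placeholder (which assumes a Postgres driver registered with `database/sql`)
are hypothetical:

```go
// A sketch of how an SQL-backed dequeue could avoid handing the same
// job to two Praefect processes, using Postgres' FOR UPDATE SKIP LOCKED.
// Table and column names are hypothetical.
package queue

import (
	"context"
	"database/sql"
)

// dequeueOne claims a single pending job inside a transaction. Rows
// locked by other workers are skipped instead of blocking, so multiple
// replication managers can drain the queue concurrently.
func dequeueOne(ctx context.Context, db *sql.DB) (id int64, relativePath string, err error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, "", err
	}
	defer tx.Rollback() // harmless after a successful Commit

	row := tx.QueryRowContext(ctx, `
		SELECT id, relative_path
		FROM replication_queue
		WHERE state = 'pending'
		ORDER BY id
		LIMIT 1
		FOR UPDATE SKIP LOCKED`)
	if err := row.Scan(&id, &relativePath); err != nil {
		return 0, "", err // sql.ErrNoRows means the queue is empty right now
	}

	if _, err := tx.ExecContext(ctx,
		`UPDATE replication_queue SET state = 'in_progress' WHERE id = $1`, id); err != nil {
		return 0, "", err
	}
	return id, relativePath, tx.Commit()
}
```

Acknowledging completed jobs, retrying failed ones, and cleaning up old rows
would need similar care, which is where most of the "hard, subtle" part lives.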
### Kafka
Pro:
- Closely matches description of "durable queue"
Con:
- Would be new to GitLab: we have neither development nor operational
  experience with it
### SQLite or BoltDB
Embedded databases such as SQLite or BoltDB don't meet our requirements
because we need shared access. An embedded database is only accessible
to the process that embeds it, while access over the network is an
essential feature for us: it is what lets multiple machines run Praefect
against the same queue.
### Consul
Consul is something that GitLab already relies on. You could consider
it a database, although its authors do not present it as one. The
advertised use cases are service discovery and service mesh.
Consul does contain a key-value store you can use to store values
smaller than 512KB. But the [documentation
states](https://www.consul.io/docs/install/performance.html#memory-requirements):
> NOTE: Consul is not designed to serve as a general purpose database,
> and you should keep this in mind when choosing what data are populated
> to the key/value store.
## Conclusion
I am strongly leaning towards Postgres because it seems like a safe,
boring choice. It has strong persistence and it is generic, which is
useful because we don't know what our needs are yet.
Running your own HA Postgres is challenging but it's a challenge you
need to take on anyway when you deploy HA GitLab.