File: README.md

package info (click to toggle)
rabbitmq-server 4.0.5-6
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 37,948 kB
  • sloc: erlang: 257,835; javascript: 22,466; sh: 2,796; makefile: 2,517; python: 1,966; xml: 646; cs: 335; java: 244; ruby: 212; php: 100; perl: 63; awk: 13
file content (207 lines) | stat: -rw-r--r-- 8,407 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
# RabbitMQ Sharding Plugin

This plugin introduces the concept of sharded queues for
RabbitMQ. Sharding is performed by exchanges, that is, messages
will be partitioned across "shard" queues by one exchange that we should
define as sharded. The machinery used behind the scenes implies
defining an exchange that will partition, or shard messages across
queues. The partitioning will be done automatically for you, i.e: once
you define an exchange as _sharded_, then the supporting queues will
be automatically created on every cluster node and messages will be sharded across them.

## Project Maturity

This plugin is reasonably mature and known to have production users.

## Overview

The following graphic depicts how the plugin works from the standpoint
of a publisher and a consumer:

![Sharding Overview](https://raw.githubusercontent.com/rabbitmq/rabbitmq-sharding/master/docs/sharded_queues.png)

On the picture above the producer publishes a series of
messages, those messages get partitioned to different queues, and then
our consumer get messages from one of those queues. Therefore if there is
a partition with 3 queues, it is assumed that there are at least 3
consumers to get all the messages from those queues.

Queues in RabbitMQ are [units of concurrency](https://www.rabbitmq.com/queues.html#runtime-characteristics)
(and, if there are enough cores available, parallelism). This plugin makes
it possible to have a single logical queue that is partitioned into
multiple regular queues ("shards"). This trades off total ordering
on the logical queue for gains in parallelism.

Message distribution between shards (partitioning) is achieved
with a custom exchange type that distributes messages by applying
a hashing function to the routing key.

## Sharding and Queue Replication

Sharding performed by this plugin makes sense for **non-replicated classic queues** only.

Combining sharding with a replicated queue type, e.g. [quorum queues]() or
(**deprecated**) mirrored classic queues will lose most or all of the benefits offered
by this plugin.

Do not use this plugin with quorum queues. Avoid classic mirrored queues in general.

## Messages Distribution Between Shards (Partitioning)

The exchanges that ship by default with RabbitMQ work in an "all or
nothing" fashion, i.e: if a routing key matches a set of queues bound
to the exchange, then RabbitMQ will route the message to all the
queues in that set. For this plugin to work it is necessary to
route messages to an exchange that would partition messages, so they
are routed to _at most_ one queue (a subset).

The plugin provides a new exchange type, `"x-modulus-hash"`, that will use
a hashing function to partition messages routed to a logical queue
across a number of regular queues (shards).

The `"x-modulus-hash"` exchange will hash the routing key used to
publish the message and then it will apply a `Hash mod N` to pick the
queue where to route the message, where N is the number of queues
bound to the exchange. **This exchange will completely ignore the
binding key used to bind the queue to the exchange**.

There are other exchanges with similar behaviour:
the _Consistent Hash Exchange_ or the _Random Exchange_.
Those were designed with regular queues in mind, not this plugin, so `"x-modulus-hash"`
is highly recommended.

If message partitioning is the only feature necessary and the automatic scaling
of the number of shards (covered below) is not needed or desired, consider using
[Consistent Hash Exchange](https://github.com/rabbitmq/rabbitmq-consistent-hash-exchange)
instead of this plugin.


## Auto-scaling When Nodes are Added

One of the main properties of this plugin is that when a new node
is added to the RabbitMQ cluster, then the plugin will automatically create
more shards on the new node. Say there is a shard with 4 queues on
`node a` and `node b` just joined the cluster. The plugin will
automatically create 4 queues on `node b` and "join" them to the shard
partition. Already delivered messages _will not_ be rebalanced but
newly arriving messages will be partitioned to the new queues.


## Consuming From a Sharded [Pseudo-]Queue ##

While the plugin creates a bunch of "shard" queues behind the scenes, the idea
is that those queues act like a big logical queue where you consume
messages from it. Total ordering of messages between shards is not defined.

An example should illustrate this better: let's say you declared the
exchange _images_ to be a sharded exchange. Then RabbitMQ creates
several "shard" queues behind the scenes:

 * _shard: - nodename images 1_
 * _shard: - nodename images 2_
 * _shard: - nodename images 3_
 * _shard: - nodename images 4_.

To consume from a sharded queue, register a consumer on the `"images"` pseudo-queue
using the `basic.consume` method. RabbitMQ will attach the consumer to a shard
behind the scenes. Note that **consumers must not declare a queue with the same
name as the sharded pseudo-queue prior to consuming**.

TL;DR: if you have a shard called _images_, then you can directly
consume from a queue called _images_.

How does it work? The plugin will chose the queue from the shard with
the _least amount of consumers_, provided the queue contents are local
to the broker you are connected to.

**NOTE: there's a small race condition between RabbitMQ updating the
queue's internal stats about consumers and when clients issue
`basic.consume` commands.** The problem with this is that if your
client issue many `basic.consume` commands without too much time in
between, it might happen that the plugin assigns the consumers to
queues in an uneven way.


## Load Distribution and Consumer Balancing

Shard queues declaration by this plugin will ignore queue master locator policy, if any.


## How Evenly Will Messages Be Distributed?

As with many data distribution approaches based on a hashing function,
even distribution between shards depends on the distribution (variability) of inputs,
that is, routing keys. In other words the larger the set of routing keys is,
the more even will message distribution between shareds be. If all messages had
the same routing key, they would all end up on the same shard.


## Installation

This plugin ships with modern versions of RabbitMQ.
Like all plugins, it [must be enabled](https://www.rabbitmq.com/plugins.html) before it can be used:

``` bash
# this might require sudo
rabbitmq-plugins enable rabbitmq_sharding
```


## Usage

Once the plugin is installed you can define an exchange as sharded by
setting up a policy that matches the exchange name. For example if we
have the exchange called `shard.images`, we could define the following
policy to shard it:

```bash
$CTL set_policy images-shard "^shard.images$" '{"shards-per-node": 2, "routing-key": "1234"}'
```

Note, on Windows, above command will be like this:
```
rabbitmqctl set_policy images-shard "^shard.images$" '{\"shards-per-node\": 2, \"routing-key\": \"1234\"}'
```

This will create `2` sharded queues per node in the cluster, and will
bind those queues using the `"1234"` routing key.

### About the routing-key policy definition ###

In the example above we use the routing key `1234` when defining the
policy. This means that the underlying exchanges used for sharding
will bind the sharded queues to the exchange using the `1234` routing
key specified above. This means that for a direct exchange, _only
messages that are published with the routing key `1234` will be routed
to the sharded queues. If you decide to use a fanout exchange for
sharding, then the `1234` routing key, while used during binding, will
be ignored by the exchange. If you use the `"x-modulus-hash"`
exchange, then the routing key will be ignored as well. So depending
on the exchange you use, will be the effect the `routing-key` policy
definition has while routing messages.

The `routing-key` policy definition is optional.


## Building from Source

Get the RabbitMQ Public Umbrella ready as explained in the
[RabbitMQ Plugin Development Guide](https://www.rabbitmq.com/plugin-development.html).

Move to the umbrella folder an then run the following commands, to
fetch dependencies:

```bash
make up
cd deps/rabbitmq-sharding
make dist
```

## LICENSE ##

See the LICENSE file.

## Extra information ##

Some information about how the plugin affects message ordering and
some other details can be found in the file README.extra.md