File: NOTES.selection-roundrobin

package info (click to toggle)
proftpd-mod-proxy 0.9.2-1%2Bdeb12u1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 4,972 kB
  • sloc: perl: 43,469; ansic: 43,171; sh: 3,479; makefile: 247
file content (229 lines) | stat: -rw-r--r-- 7,075 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229

Issues for ProxyBackendSelection 'roundRobin':

  1.  Selection strategy is per-vhost.

  2.  Selection lists are per-vhost.

     This means that a 'core.fork' event, before the vhost is known,
     would not work easily.  We don't just want to increment some
     counter/index for all vhosts, as that would not actually be
     the expected "round robin" behavior.

To implement round robin, we need:

  1.  An ordered list of backend servers, whose order is preferably stable.

     How would healthchecks, with servers moving into and out of the "live"
     list, affect this?

     What happens if the "previous selection" or "next selection" (whichever
     we persist) is not available in the "live" list for the next connection?

  2.  Knowledge of what the "next" server should be (or, conversely,
      what the previous server was) 

  3.  Persistence of #2 in some storage accessible across connections to
      that vhost (i.e. vhost-specific storage).

     Possibilities: SysV shm, file, SQL, memcache, external process.
     What about an mmap'd file?  Still needs locking (with retries?)

     Any of these would require a postparse (startup?) event listener, to see
     if any vhost has RoundRobin selection configured, to create/prep the
     shared storage area.

What if there are THREE backend server lists:

  configured:	conf/
  live		live/
  dead		dead/

The "configured" list would be static, wouldn't change, would have stable
ordering.  That could then be the reference server/index for round robin.

  proxy_select_backends:
    vhost_sid INTEGER,
    name TEXT, // e.g. "server1.example.com"
    backend_id INTEGER, // used for ordering
    healthy BOOLEAN // used for healthchecks
    nconns INTEGER  // used for least conns

  SELECT name FROM proxy_backends WHERE vhost_sid = {...} ORDER BY position ASC;

  proxy_select_roundrobin:
    vhost_sid INTEGER,
    current_backend_id INTEGER (FK into proxy_select_backends.backend_id?)

  proxy_select_shuffle:
    insert rows for used backend IDs; delete them all on _reset()
    (OR insert rows for UNUSED backend IDs; delete as used)

  a selection policy object, with callbacks into the database

Basic data structure:

  vhost1
    index (into 'configured' list) of current/selected backend server
    max index value (i.e. length of 'configured' list minus 1)

   ...
  vhostN

Could use SID to identify vhost.

  unsigned int idx;

  int proxy_roundrobin_get_index(main_server->sid, &idx);
  /* Get backend for index */
  idx++;
  if (idx == max_idx) {
    idx = 0;
  }
  int proxy_roundrobin_set_index(main_server->sid, idx);

OR:

  unsigned int idx;

  int proxy_roundrobin_incr_index(main_server->sid, &idx);
    This would "atomically" return the current index, and
    increment (with wraparound) the index for the next call.
    Callers, then, don't need to know about the max_idx.

  With this arrangement, an on-disk mmap'd file would have range
  locking, and a "row" would be:

    uint32_t sid
    uint32_t backend_server_count
    uint32_t backend_server_idx

  Alternatively, the entire selection database could be a single JSON file;
  the "locking" would be done on the basis of the entire file.

  OR, alternatively, the selection database could be a SQL database (e.g.
  SQLite), a la mod_auth_otp.

  Store these databases in a policy-specific, vhost-specific file (to reduce
  contention)?  This would make it easy for each policy to have its own
  format, as needed.

    CONF_ROOT|CONF_VIRTUAL, NO <Global>.
    Leaning toward SQLite.
      SELECT ...
      INSERT sid, ...
      UPDATE ...

       Per Policy!  Hrm.  Maybe have mod_proxy create its own tables, etc.
       (just configure a path).  That'd work nicely.  On startup, delete
       table if it exists (warn about this!), create needed schema.  Much
       less fuss for the admin.

      Make lib/db/sqlite.c file, use sqlite3 directly (NOT via
      mod_sql+mod_sql_sqlite). 

  This would be the "select.dat" file/table, the basis for backend
  selection.

  For health checks:

    uint32_t sid;
    uint32_t backend_server_count
    uint32_t live_servers[backend_server_count]
    uint32_t dead_servers[backend_server_count]

    ...

  Note: Use big-endian values for all numeric values on disk, as if writing
    to the network, for file "reuse" on other servers?

  Note: Would be nice, given the above format, to NOT have to scan the
  entire file to find the vhost in question, given the variable length
  of the server lists per vhost.

  Could deal with that by having a header, which would be the offset of
  that SID into the file.  I.e.:

    off_t sid_offs[vhost_count];

      sid 1: off = 0
      sid 2: off = 42

    and remember that vhost SIDs start at 1!

      off_t sid_off = sid_offs[main_server->sid-1];
      lseek(fd, sid_off, SEEK_SET);
      readv(...)
      readv(...)

  Note: have a version uint32_t as first value in value, for indicating file
    format version.

  Note: have ftpdctl proxy action for dumping out (as JSON) the contents of
   the select.dat file?

  Note: files/structure:

    select.c
    select-random.c
    select-roundrobin.c
    select-shuffle.c
    ...

    policy/
      random/
        {sid}.json
      roundrobin/
        {sid}.json
      shuffle/
        {sid}.json 
       
  When writing out the file initially, scan each vhost, figure out its
  position, then write out header, then vhost entry.

And, to support the leastConns, equalConns, and leastResponseTime policies,
we'll need a separate table (also mmap'd):

    unsigned int idx
    unsigned long curr_conns
    unsigned long curr_response_ms

OR, even better, to handle the leastConns/leastResponseTime etc strategies,
use a fourth list:

  configured
  ranked
  live
  dead

Where "ranked" is a list of indices (into the configured list) in
preference/ranked order.  In the case of least conns, this ranking is done
by number of current connections.  Hmm, no, we'd still need to know that
number of current connections somewhere, not just the relative rank of
that server versus others -- consider that the number of connections may
change, changing the relative ranking, and we'd need to know the number of
current connections in order to do the re-ranking.

To make this more general (e.g. for other selection policies), maybe:

  int proxy_reverse_select_index_next(main_server->sid, &idx, policy);

Where policy could be:

  POLICY_RANDOM
  POLICY_ROUND_ROBIN
  POLICY_LEAST_CONNS
  POLICY_EQUAL_CONNS
  POLICY_LOWEST_RESPONSE_TIME

If that backend is chosen/used successfully, then:

  int policy_reverse_select_index_used(main_server->sid, idx, response_ms);

Note when we would need to maintain an fd on the database until the
session ended (e.g. for leastConns), so that we could decrement the number
of connections to a given SID when that connection ended.  This is not
necessary for roundRobin, which doesn't care about number of current
connections per sid, only next SID to index.  LeastConns, on the other hand,
DOES care about number of current connections.