UNICAST2 design
===============
(see UNICAST.txt for the old design)
Author: Bela Ban
Motivation
----------
UNICAST has issues when one end of the connection unilaterally closes the connection and discards the state in
the connection table.
Example: we have a conn between A and B. There's a partition such that A sees {A,B} but B sees only {B}.
B will clear its connection table for A on reception of the view, whereas A will keep it.
Now the partition heals and A and B can communicate again.
Assuming A's next seqno to B is #25 (and A expects #7 as the next seqno from B),
B will store A's next message instead of delivering it, because it expects #1 from A (new connection). In fact, B will
store *and not deliver* all subsequent messages from A!
The reverse direction is also bad: B will send #1 to A, but A expects #7, so A will discard the message. The first 6
messages from B are discarded at A!
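
The failure can be reproduced with a toy in-order receiver window. This is only a sketch in Python, not the UNICAST
code: the class ReceiverWindow and its method names are hypothetical, and the only behaviour assumed is the one
described above (lower seqnos are discarded, higher ones are stored until the gap is filled).

```python
class ReceiverWindow:
    """Toy in-order receiver window (hypothetical, for illustration only)."""

    def __init__(self, next_expected=1):
        self.next_expected = next_expected
        self.buffered = {}    # seqno -> payload, received but not yet deliverable
        self.delivered = []

    def receive(self, seqno, payload):
        if seqno < self.next_expected:
            return "discarded"              # looks like an old duplicate
        self.buffered[seqno] = payload
        # Deliver for as long as the next expected seqno is available.
        while self.next_expected in self.buffered:
            self.delivered.append(self.buffered.pop(self.next_expected))
            self.next_expected += 1
        return "delivered" if seqno < self.next_expected else "stored"


# B cleared its state for A and expects #1, while A continues at #25.
b_window_for_a = ReceiverWindow(next_expected=1)
a_to_b = [b_window_for_a.receive(s, f"m{s}") for s in (25, 26, 27)]

# A kept its state for B and expects #7, while B restarts at #1.
a_window_for_b = ReceiverWindow(next_expected=7)
b_to_a = [a_window_for_b.receive(s, f"m{s}") for s in range(1, 8)]
```

In this sketch every message from A ends up "stored" and nothing is delivered at B, while B's messages #1 to #6 are
"discarded" at A and only #7 gets through.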
Goals
-----
#1 Handle the above scenarios
#2 Handle the scenario where a member communicates with a non-member (get rid of enabled_mbrs and prev_mbrs)
#3 Handle the scenario where a member talks to a nonexistent (or previous) member. Get rid of
ENABLE_UNICASTS_TO and age out connections to nonexistent members after some time (JGRP-942)
#4 Should be usable without group communication ('Unicast JGroups')
Design
------
As an example, we have a unicast connection between A and B. A is the sender and B the receiver:

        A <-------------------------------------------------> B
  B:entry.seqno=#25                                     A:entry.seqno=#7
    recv_win=#7                                           recv_win=#25
    send-conn-id=322649                                   send-conn-id=101200
    recv-conn-id=101200                                   recv-conn-id=322649

A has an entry in the connection table for B, and B has an entry for A. Each connection has a connection ID (conn-id).
Each entry also has a seqno which is the highest seqno sent to the peer so far, and a recv_win which has the highest
seqno received from the peer so far. For example, A's next message to B will be #25, and the next seqno expected
from B is #7.
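
The per-peer state can be pictured as a small record. The following Python sketch is an illustration only: the class
name Entry and the helper mirrored() are assumptions, the field names follow the text, and recv_win is reduced to a
single number although the real receiver window holds messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One connection-table entry for a peer (hypothetical layout)."""
    seqno: int = 0                    # sending side, as described in the text
    recv_win: Optional[int] = None    # receiving side, reduced to one seqno here
    send_conn_id: int = 0             # ID chosen by us for messages we send
    recv_conn_id: int = 0             # ID chosen by the peer for messages we receive


# The values from the example connection between A and B.
a_entry_for_b = Entry(seqno=25, recv_win=7, send_conn_id=322649, recv_conn_id=101200)
b_entry_for_a = Entry(seqno=7, recv_win=25, send_conn_id=101200, recv_conn_id=322649)


def mirrored(x: Entry, y: Entry) -> bool:
    """True if each side's send-conn-id is the other side's recv-conn-id."""
    return x.send_conn_id == y.recv_conn_id and x.recv_conn_id == y.send_conn_id
```

For a healthy connection, mirrored(a_entry_for_b, b_entry_for_a) holds; the scenarios below are the cases in which
one side has lost or replaced its half.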
A sends a message to B:
-----------------------
- If the entry for B is null, or the seqno=0:
    - Create an entry, set the seqno to 1 and set send-conn-id to the current time (needs to be unique, could also use UUIDs)
    - Send the message with the next seqno and the current conn-id and first=true
- Else
    - Send the message with the next seqno and the current conn-id
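
The sending steps above could be sketched as follows. This is Python for illustration, not the actual implementation:
SenderEntry, send() and the dict-based message are hypothetical names, and a simple counter stands in for "the
current time" because the ID only has to be unique.

```python
import itertools
from dataclasses import dataclass, field

# Stand-in for "current time" (assumption): any source of unique IDs will do.
_conn_ids = itertools.count(1)


@dataclass
class SenderEntry:
    send_conn_id: int
    seqno: int = 0                                  # 0 means nothing sent yet
    sent_win: dict = field(default_factory=dict)    # seqno -> payload awaiting ack


def send(table, dest, payload):
    """Return the message that would go on the wire for dest."""
    entry = table.get(dest)
    first = entry is None or entry.seqno == 0
    if first:
        # New connection: fresh entry with a new connection ID.
        entry = SenderEntry(send_conn_id=next(_conn_ids))
        table[dest] = entry
    entry.seqno += 1
    entry.sent_win[entry.seqno] = payload           # kept for retransmission
    return {"seqno": entry.seqno, "conn_id": entry.send_conn_id,
            "first": first, "payload": payload}
```

With an empty table, the first call to send(table, "B", ...) yields seqno 1 with first=True, and the second call
yields seqno 2 with first=False and the same conn_id.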
B receives a message from A:
----------------------------
- If first == true
    - If B's entry (or entry.recv_win) for A == null
        - Create a new entry.recv_win with msg.seqno
        - Set entry.recv-conn-id to conn-id
    - Else:
        - If conn-id != entry.recv-conn-id:
            - Create a new entry.recv_win with msg.seqno
            - Set entry.recv-conn-id to conn-id
        - Else
            - NOP (prevents duplicate connection establishments)
- Else
    - If entry.recv_win == null || conn-id != recv-conn-id: no-op
        - Drop message
        - Send SEND_FIRST_SEQNO to A
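
The receiver's decision can be written as one function. The sketch below is Python for illustration only; Entry,
on_receive() and the returned strings are hypothetical, recv_win is reduced to the seqno the window starts at, and the
"accept" result for a regular message on a known connection is an assumption, since the steps above do not spell out
that branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    recv_win: Optional[int] = None        # seqno the window starts at; None = no window
    recv_conn_id: Optional[int] = None


def on_receive(table, sender, seqno, conn_id, first):
    """Decide what the receiver does with a message from sender."""
    entry = table.get(sender)
    unknown = (entry is None or entry.recv_win is None
               or conn_id != entry.recv_conn_id)
    if first:
        if unknown:
            # New or replaced connection: (re)create the receiver window.
            entry = table.setdefault(sender, Entry())
            entry.recv_win = seqno
            entry.recv_conn_id = conn_id
            return "new-window"
        return "nop"                      # same conn-id: duplicate connection establishment
    if unknown:
        return "send-first-seqno"         # drop the message and ask for the first seqno
    return "accept"                       # assumption: regular message, goes to recv_win
```

A retransmitted first message carries the same conn-id as the original, so it falls into the "nop" branch and the
already populated receiver window is left alone.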
A receives SEND_FIRST_SEQNO from B:
-----------------------------------
- If conn-id != send-conn-id: drop message
- A grabs the first message in its sent_win
- A adds the entry.send-conn-id to the UnicastHeader (if not yet present), sets first=true and sends the message to B
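
A possible shape for this handler, again as a Python sketch rather than the real code. The function name, the
dict-based entry and message, and the two early returns for a missing entry or an empty sent_win are assumptions that
the text does not cover.

```python
def on_send_first_seqno(entry, conn_id):
    """Return the message to resend with first=True, or None if the request is dropped.

    entry is a dict with 'send_conn_id' and 'sent_win' (seqno -> payload).
    """
    if entry is None or conn_id != entry["send_conn_id"]:
        return None                       # request refers to a different connection: drop
    if not entry["sent_win"]:
        return None                       # nothing unacked to resend (assumption)
    lowest = min(entry["sent_win"])       # first message in the sender window
    return {"seqno": lowest, "conn_id": entry["send_conn_id"],
            "first": True, "payload": entry["sent_win"][lowest]}
```

Because the resent message is marked first=true and carries the current send-conn-id, the receiver treats it as a
connection establishment and creates its receiver window at that seqno.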
Scenarios
---------
The scenarios are tested in UNICAST_ConnectionTests
#1 A creates new connection to B:
- The entry for B is null, a new entry is created and added to the connection table
- Entry.send-conn-id is set and sent with the message
- Entry.seqno now is 1
#2 B receives new connection:
- B creates a new entry and entry.recv_win (with msg.seqno) for A
- B sets entry.recv-conn-id to msg.conn-id
- B adds the message to entry.recv_win
#3 A and B close connection (e.g. based on a view change (partition)):
- Both A and B reset (cancelling pending retransmissions) and remove the entry for their peer from the connection table
#4 A closes the connection unilaterally (B keeps it open), then reopens it and sends a message:
- A removes the entry for B from its connection table, cancelling all pending retransmissions
- (Assuming that B's entry.recv_win for A is at #25)
- A creates a new entry for B in its connection table
- Entry.send-conn-id is set and sent with the message
- Entry.seqno now is 1
- B receives the message with a new conn-id
- B does have an entry for A, but entry.recv-conn-id doesn't match msg.conn-id
- B creates a new entry.recv_win, sets it to msg.seqno
- B sets entry.recv-conn-id to msg.conn-id
#5 B closes its connection unilaterally, then A sends a message to B:
- B doesn't find an entry for A in its connection table
- B discards the message and sends a SEND_FIRST_SEQNO to A
- A receives the SEND_FIRST_SEQNO message. It grabs the message with the lowest seqno
  in its entry.send_win, adds a UnicastHeader with entry.send-conn-id and sends the
  message to B
- B receives the message and creates a new entry and entry.recv_win (with msg.seqno)
- B sets entry.recv-conn-id to msg.conn-id
#6 Same as #4, but after re-establishing the connection to B, A loses the first message
(first part of #4)
- A creates a new sender window for B
- A sends #1(conn-id=322649) #2(conn-id=0) #3(conn-id=0), but loses #1
- B receives #2 first. It thinks this is part of a regular connection, so it doesn't trash its receiver window
- B expects a seqno higher than #2 (from the prev conversation with A), and discards #2, but *acks* it nevertheless
- A removes #2 from its sender window
- B now finally receives #1, and creates a new receiver window for A at #1
- A retransmits #3
- B stores #3 but doesn't deliver it because it hasn't received #2 yet
- However, B will *never* receive #2 from A because that seqno has been removed from A's sender window!
#7 Merge where A and B are in different partitions:
- Both A and B remove the entries for each other in their respective connection tables
- When the partition heals, both A and B will create new entries (see scenario #2)
#8 Merge where A and B are in overlapping partitions A: {A}, B: {A,B}:
- (This case is currently handled by shunning, not merging)
- A sends a message to B
- A removed its entry for B, but B kept its entry for A
- A now creates a new connection to B (scenario #1) and sends the message
- B receives the message, but entry.recv-conn-id doesn't match msg.conn-id, so B
removes entry.recv_win, sets entry.recv-conn-id to msg.conn-id and creates a new
entry.recv_win with msg.seqno (same as second half of scenario #4)
#9 Merge where A and B are in overlapping partitions A: {A,B}, B: {B}:
- A sends a message to B (msg.seqno=25)
- B doesn't have an entry for A
- B discards the message and sends a SEND_FIRST_SEQNO to A
- A receives the SEND_FIRST_SEQNO message. It grabs the message with the lowest seqno
  in its entry.send_win, adds a UnicastHeader with entry.send-conn-id and sends the
  message to B
- B receives the message and creates a new entry and entry.recv_win (with msg.seqno)
- B sets entry.recv-conn-id to msg.conn-id
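
Scenario #9 can be walked through end to end with a few lines of Python. This is a sketch under assumptions, not the
code exercised by UNICAST_ConnectionTests: make_first(), b_receive() and the dict layouts are hypothetical, recv_win
is modelled as a plain list of accepted seqnos, and #25 is assumed to be the only unacked message in A's sender
window.

```python
def make_first(entry):
    """A's side: resend the lowest unacked message, flagged as first."""
    seqno = min(entry["sent_win"])
    return {"seqno": seqno, "conn_id": entry["send_conn_id"], "first": True}


def b_receive(b_table, msg):
    """B's side: returns the reply B sends back to A, or None."""
    entry = b_table.get("A")
    known = entry is not None and entry["recv_conn_id"] == msg["conn_id"]
    if msg["first"]:
        if not known:
            # New (or replaced) connection: start the receiver window at msg.seqno.
            b_table["A"] = {"recv_conn_id": msg["conn_id"],
                            "recv_win": [msg["seqno"]]}
        return None                       # known conn-id: NOP
    if not known:
        # The message is dropped; B asks A for its first seqno.
        return {"type": "SEND_FIRST_SEQNO", "conn_id": msg["conn_id"]}
    entry["recv_win"].append(msg["seqno"])
    return None


# A kept its entry for B; #25 is still unacked in A's sender window.
a_entry = {"send_conn_id": 322649, "sent_win": {25: "payload"}}
b_table = {}                              # B removed its entry for A

msg25 = {"seqno": 25, "conn_id": a_entry["send_conn_id"], "first": False}
reply = b_receive(b_table, msg25)         # no entry at B: drop + SEND_FIRST_SEQNO

resent = None
if reply and reply["conn_id"] == a_entry["send_conn_id"]:
    resent = make_first(a_entry)          # A resends #25 with first=true
    b_receive(b_table, resent)            # B creates entry and recv_win at #25
```

After the exchange, B's table holds an entry for A whose recv-conn-id equals A's send-conn-id and whose receiver
window starts at #25, which is the end state listed in the scenario.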
Issues
------
- How do we handle retransmissions of the first message (first=true)? We *cannot* create a new entry.recv_win, or
  else we trash already received msgs! Use a UUID (as connection-ID) instead of first=true? Maybe the system time
  is sufficient? After all, the ID only has to be unique between A and B!
==> Solved by using connection IDs (see above)