
UNICAST2 design
===============

(see UNICAST.txt for the old design)

Author: Bela Ban


Motivation
----------

UNICAST has issues when one end of the connection unilaterally closes the connection and discards the state in
the connection table. Example: we have a connection between A and B. There's a partition such that A sees {A,B}
but B sees only {B}. B will clear its connection table for A on reception of the view, whereas A will keep it.

Now the partition heals and A and B can communicate again. Assuming A's next seqno to B is #25 (and #7 for
receiving messages from B), B will store the message because it expects #1 from A (new connection). As a matter
of fact, B will store *and not deliver* all subsequent messages from A !

The reverse direction is also bad: B will send #1 to A, but A expects #7, so A will discard the message. The
first 6 messages from B are discarded at A !


Goals
-----

#1 Handle the above scenarios

#2 Handle the scenario where a member communicates with a non-member (get rid of enabled_mbrs and prev_mbrs)

#3 Handle the scenario where a member talks to a non-existing (or previous) member. Get rid of
   ENABLE_UNICASTS_TO and age out connections to non-existing members after some time (JGRP-942)

#4 Should be usable without group communication ('Unicast JGroups')


Design
------

As an example, we have a unicast connection between A and B. A is the sender and B the receiver:

            A  <------------------------------------------------->  B
   B:entry.seqno=#25                                    A:entry.seqno=#7
           recv_win=#7                                          recv_win=#25
           send-conn-id=322649                                  send-conn-id=101200
           recv-conn-id=101200                                  recv-conn-id=322649

A has an entry in the connection table for B, and B has an entry for A. Each connection has a connection ID
(conn-id). Each entry also has a seqno which is the highest seqno sent to the peer so far, and a recv_win which
has the highest seqno received from the peer so far. For example, A's next message to B will be #25, and the
next seqno expected from B is #7.

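The connection table state from the diagram can be written down as a small data structure. This is only an
illustrative Python sketch, not JGroups code: the Entry class, the ids_match helper and the use of plain
strings as peer addresses are assumptions made for the example, and recv_win is reduced to a single seqno.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    """Hypothetical per-peer entry in a member's connection table."""
    seqno: int = 0                   # send-side seqno counter for this peer
    recv_win: Optional[int] = None   # receive-side seqno; None means no window yet
    send_conn_id: int = 0            # ID of the connection we opened to the peer
    recv_conn_id: int = 0            # ID of the connection the peer opened to us


# One connection table per member, keyed by peer address (values from the diagram).
table_a: Dict[str, Entry] = {
    "B": Entry(seqno=25, recv_win=7, send_conn_id=322649, recv_conn_id=101200),
}
table_b: Dict[str, Entry] = {
    "A": Entry(seqno=7, recv_win=25, send_conn_id=101200, recv_conn_id=322649),
}


def ids_match(sender: Entry, receiver: Entry) -> bool:
    """True if the receiver's recv-conn-id equals the sender's send-conn-id."""
    return sender.send_conn_id == receiver.recv_conn_id
```

In a healthy connection both directions match: A's send-conn-id is B's recv-conn-id and vice versa. The
scenarios below are all cases where one side has lost or replaced its half of this state.
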

A sends a message to B:
-----------------------
- If the entry for B is null, or the seqno=0:
    - Create an entry, set the seqno to 1 and set send-conn-id to the current time (needs to be unique, could
      also use UUIDs)
    - Send the message with the next seqno and the current conn-id and first=true
- Else
    - Send the message with the next seqno and the current conn-id


B receives a message from A:
----------------------------
- If first == true
    - If entry or entry.recv_win for A == null
        - Create a new entry.recv_win with msg.seqno
        - Set entry.recv-conn-id to conn-id
    - Else:
        - If conn-id != entry.recv-conn-id:
            - Create a new entry.recv_win with msg.seqno
            - Set entry.recv-conn-id to conn-id
        - Else
            - NOP (prevents duplicate connection establishments)
- Else
    - If entry.recv_win == null || conn-id != recv-conn-id:
        - Drop message
        - Send SEND_FIRST_SEQNO to A


A receives SEND_FIRST_SEQNO from B:
-----------------------------------
- If conn-id != send-conn-id: drop message
- A grabs the first message in its entry.send_win
- A adds the entry.send-conn-id to the UnicastHeader (if not yet present), sets first=true and sends the
  message to B


Scenarios
---------

The scenarios are tested in UNICAST_ConnectionTests

#1 A creates new connection to B:
- The entry for B is null, a new entry is created and added to the connection table
- Entry.send-conn-id is set and sent with the message
- Entry.seqno now is 1

#2 B receives new connection:
- B creates a new entry and entry.recv_win (with msg.seqno) for A
- B sets entry.recv-conn-id to msg.conn-id
- B adds the message to entry.recv_win

#3 A and B close connection (e.g. based on a view change (partition)):
- Both A and B reset (cancelling pending retransmissions) and remove the entry for their peer from the
  connection table

#4 A closes the connection unilaterally (B keeps it open), then reopens it and sends a message:
- A removes the entry for B from its connection table, cancelling all pending retransmissions
- (Assuming that B's entry.recv_win for A is at #25)
- A creates a new entry for B in its connection table
- Entry.send-conn-id is set and sent with the message
- Entry.seqno now is 1
- B receives the message with a new conn-id
- B does have an entry for A, but entry.recv-conn-id doesn't match msg.conn-id
- B creates a new entry.recv_win, sets it to msg.seqno
- B sets entry.recv-conn-id to msg.conn-id

#5 B closes its connection unilaterally, then A sends a message to B:
- B doesn't find an entry for A in its connection table
- B discards the message and sends a SEND_FIRST_SEQNO to A
- A receives the SEND_FIRST_SEQNO message. It grabs the message with the lowest seqno in its entry.send_win,
  adds a UnicastHeader with entry.send-conn-id and sends the message to B
- B receives the message and creates a new entry and entry.recv_win (with msg.seqno)
- B sets entry.recv-conn-id to msg.conn-id

#6 Same as #4, but after re-establishing the connection to B, A loses the first message (first part of #4):
- A creates a new sender window for B
- A sends #1(conn-id=322649) #2(conn-id=0) #3(conn-id=0), but loses #1
- B receives #2 first. It thinks this is part of a regular connection, so it doesn't trash its receiver window
- B expects a seqno higher than #2 (from the prev conversation with A), and discards #2, but *acks* it
  nevertheless
- A removes #2 from its sender window
- B now finally receives #1, and creates a new receiver window for A at #1
- A retransmits #3
- B stores #3 but doesn't deliver it because it hasn't received #2 yet
- However, B will *never* receive #2 from A because that seqno has been removed from A's sender window !

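Reading scenario #6 against the receive rules in 'B receives a message from A' suggests what changes when #2
and #3 carry conn-id=322649 instead of 0: B sees a conn-id mismatch on #2, so it drops the message and asks
for the first seqno instead of acking it. The Python sketch below replays that sequence. It is an
illustration under assumptions, not the JGroups implementation: RecvEntry, on_receive, the returned action
labels and the old conn-id 111111 are hypothetical, "entry == null" is folded into "recv_win is None", and
recv_win is reduced to the seqno the window was created with.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecvEntry:
    """Hypothetical receive-side half of B's entry for A."""
    recv_win: Optional[int] = None   # seqno the window was created with; None = no window
    recv_conn_id: int = 0            # conn-id of the connection opened by the peer


def on_receive(entry: RecvEntry, seqno: int, conn_id: int, first: bool) -> str:
    """Apply the receive rules and return a label for the action taken."""
    if first:
        if entry.recv_win is None or conn_id != entry.recv_conn_id:
            # New connection: create the window at msg.seqno and remember the conn-id.
            entry.recv_win = seqno
            entry.recv_conn_id = conn_id
            return "new_window"
        # Same conn-id again (e.g. a retransmitted first message): keep the window.
        return "keep_window"
    if entry.recv_win is None or conn_id != entry.recv_conn_id:
        # Unknown connection: drop (no ack) and ask the sender for its first seqno.
        return "drop_and_send_first_seqno"
    return "accept"


def replay_scenario_6() -> List[str]:
    """Scenario #6, but with the new conn-id on every message."""
    b = RecvEntry(recv_win=25, recv_conn_id=111111)   # state left from the old connection
    new_id = 322649
    return [
        on_receive(b, 2, new_id, first=False),  # #1 was lost, #2 arrives first
        on_receive(b, 1, new_id, first=True),   # #1 arrives after retransmission
        on_receive(b, 2, new_id, first=False),  # #2 is retransmitted (never acked)
        on_receive(b, 3, new_id, first=False),
        on_receive(b, 1, new_id, first=True),   # duplicate of the first message
    ]
```

Because #2 is dropped without an ack, it stays in A's sender window and is retransmitted once the new
receiver window exists. The last step shows a duplicate first=true message leaving that window untouched.
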
#7 Merge where A and B are in different partitions:
- Both A and B remove the entries for each other in their respective connection tables
- When the partition heals, both A and B will create new entries (see scenario #2)

#8 Merge where A and B are in overlapping partitions A: {A}, B: {A,B}:
- (This case is currently handled by shunning, not merging)
- A sends a message to B
- A removed its entry for B, but B kept its entry for A
- A now creates a new connection to B (scenario #1) and sends the message
- B receives the message, but entry.recv-conn-id doesn't match msg.conn-id, so B removes entry.recv_win,
  sets entry.recv-conn-id to msg.conn-id and creates a new entry.recv_win with msg.seqno
  (same as second half of scenario #4)

#9 Merge where A and B are in overlapping partitions A: {A,B}, B: {B}:
- A sends a message to B (msg.seqno=25)
- B doesn't have an entry for A
- B discards the message and sends a SEND_FIRST_SEQNO to A
- A receives the SEND_FIRST_SEQNO message. It grabs the message with the lowest seqno in its entry.send_win,
  adds a UnicastHeader with entry.send-conn-id and sends the message to B
- B receives the message and creates a new entry and entry.recv_win (with msg.seqno)
- B sets entry.recv-conn-id to msg.conn-id


Issues
------
- How do we handle retransmissions of the first message (first=true) ? We *cannot* create a new
  entry.recv_win, or else we trash already received msgs ! Use a UUID (as connection-ID) instead of
  first=true ? Maybe the system time is sufficient ? After all, the ID only has to be unique between A and B !
  ==> Solved by using connection IDs (see above)

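The sender side of scenarios #5 and #9 can be sketched in the same way. This is a minimal Python
illustration, not JGroups code, and it rests on several assumptions: Sender, SendEntry and the
(seqno, conn_id, first) tuple are hypothetical names, a counter stands in for "current time" as the conn-id
source, the first message of a connection is taken to be #1, and the SEND_FIRST_SEQNO request is assumed to
echo the conn-id of the message that triggered it.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Msg = Tuple[int, int, bool]   # (seqno, conn_id, first)

# Hypothetical stand-in for "current time"; any value unique between the two peers would do.
_ids = itertools.count(322649)


@dataclass
class SendEntry:
    """Hypothetical send-side half of a connection table entry."""
    seqno: int = 0
    send_conn_id: int = 0
    send_win: Dict[int, Msg] = field(default_factory=dict)   # unacked messages by seqno


class Sender:
    def __init__(self) -> None:
        self.table: Dict[str, SendEntry] = {}

    def send(self, dest: str) -> Msg:
        """Send the next message; open a new connection if there is none."""
        entry = self.table.get(dest)
        if entry is None or entry.seqno == 0:
            entry = SendEntry(seqno=1, send_conn_id=next(_ids))
            self.table[dest] = entry
            first = True
        else:
            entry.seqno += 1
            first = False
        msg = (entry.seqno, entry.send_conn_id, first)
        entry.send_win[entry.seqno] = msg   # kept until acked, for retransmission
        return msg

    def ack(self, dest: str, seqno: int) -> None:
        """Remove an acked message from the sender window."""
        self.table[dest].send_win.pop(seqno, None)

    def on_send_first_seqno(self, src: str, conn_id: int) -> Optional[Msg]:
        """Answer SEND_FIRST_SEQNO with the lowest unacked message, marked first=true."""
        entry = self.table.get(src)
        if entry is None or conn_id != entry.send_conn_id or not entry.send_win:
            return None   # stale or unanswerable request: drop it
        seqno, cid, _ = entry.send_win[min(entry.send_win)]
        return (seqno, cid, True)
```

For example, if A has sent #1 to #3 and only #1 was acked before B discarded its state, the reply to B's
SEND_FIRST_SEQNO is #2 with first=true, so B creates its new receiver window at #2. A request carrying any
other conn-id is dropped.
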