FLUSH2 design ============= Author: Bela Ban Prerequisites: - A flush is always started and stopped by the *same* member, both for joins and state transfer - For state transfers (but not for joins), we can have different members start a flush at the same time Structures: - FlushState - flush_active: whether we are currently running a flush - flush_leaders: set of members which started the flush (no duplicates) - start_flush_set: list of members from whom we expect a START-FLUSH-OK message - digests: digests from all members, received with START-FLUSH-OK message - stop_flush_set: list of members to whom we send a STOP-FLUSH message - down_mutex: for blocking of multicast down calls, unicasts are passed at all times On SUSPEND event: - If FlushState.flush_active: - Add flush leader to flush_leaders (if not yet present) - If already present: terminate (don't send START-FLUSH message) -Else - Set FlushState.flush_active to true - Add flush leader to flush_leaders (if not yet present) - Set start_flush_set and stop_flush_set - Multicast START-FLUSH - Block until start_flush_set is empty (block on start_flush_set) - Look at FlushState.digests: - If all digests are the same --> return - Else --> run reconciliation phase On START-FLUSH(sender=P): - If FlushState.flush_active: - Add flush leader to flush_leaders (if not yet present) -Else - Set FlushState.flush_active to true - Add flush leader to flush_leaders (if not yet present) - Call block() - Make down_mutex block multicast messages from now on - Return START-FLUSH-OK with my digest to P On START-FLUSH-OK(sender=P,digest D): - Add digest D to FlushState.digests - Remove P from FlushPhase.start_flush_set - Notify FlushPhase.start_flush_set if empty On RESUME event: - Multicast STOP-FLUSH event (asynchronously, don't wait for results) On STOP-FLUSH(sender=P): - Remove P from flush_leaders - If flush_leaders is empty: - Unblock down_mutex On SUSPECT(S): - Remove S from start_flush_set and stop_flush_set - If start_flush_set is now empty --> notify thread blocking on it - Remove S from FlushState.flush_leaders - If FlushState.flush_leaders is empty: (flush leader S crashed during flush !) - Unblock down_mutex On view change V: - Remove all members that are not part of V from start_flush_set and stop_flush_set - If start_flush_set is now empty --> notify thread blocking on it - Remove all members from FlushState.flush_leaders which are not part of V - If FlushState.flush_leaders is empty: (flush leader crashed during flush !) - Unblock down_mutex Reconciliation phase (in NAKACK): - Sends XMIT requests for message not in local digest, but in digest received from REBROADCAST_MSGS - Wait until local digest (updated when missing messages are received) is same as digest received from REBROADCAST_MSGS - Takes suspects into account Concurrent flushes ------------------ - Concurrent flushes to overlapping member sets will cause only 1 flush to succeed, and all others to fail - A failed flush set the flushInProgress flag back to false on all members on which it successfully set it - Algorithm: - Send a START-FLUSH to all members in the target set - Every member sets the flushInProgress flag to true - If this succeeds, the member sends back an OK - Else a FAIL is sent back - If we received OKs from all members, startFlush() succeeded and returns true - If 1 FAIL is received: - Send an ABORT-FLUSH to all members which took part in the flush, causes flushInProgress to be set to false - startFlush() failed and returns false Concurrent total flushing (members={A,B,C,D}) --------------------------------------------- - In general, concurrent total flushes are not allowed - Concurrent flushes on the same member: - The first thread to call Channel.startFlush() runs the flush phase - Subsequent threads to call Channel.startFlush() block until the current flush phase has completed (Channel.stopFlush()) - Concurrent flushes on different members: - A.startFlush() and B.startFlush() are called concurrently - A sends a START-FLUSH to all members, and so does B - When a START-FLUSH is received, a flag 'flushing-in-progress' is set - If flushing-in-progress cannot be set (because another flush protocol is running), we send back a START-FLUSH-FAIL. (We might add a small timeout to block until flushing-in-progress can be acquired) - On reception of a START-FLUSH-FAIL, we 'roll back' the flush, setting all of our acquired 'flushing-in-progess' flags back to false - SUMMARY: either A fails, or B fails, or both A and B fail in a concurrent flush phase (similar to concurrent transactions) Concurrent partial flushing (members={A,B,C,D}) ----------------------------------------------- - In general, concurrent partial flushes are not allowed (same as for concurrent total flushes)