Separation of merges from view handling ======================================= Author: Bela Ban JIRA: https://jira.jboss.org/jira/browse/JGRP-1009 Goal: ----- We don't want concurrent merges and view changes (join or leave processing). During a merge, join and leave requests should be discarded. Likewise, during a join/leave processing, merge requests should be discarded. We already do discard join or leave requests during a merge (ViewHandler is suspended), but not the other way round. JGRP-1009 leads to spurious merges: when a join or leave request is being processed and the view is being disseminated, if a merge is permitted to occur during this, the merge leader might detect different views (due to them arriving at different members at different times, maybe a few millisconds apart) and initiate a merge. This won't happen when the merge is discarded during view processing / installation. Design: ------- There are 3 types of events we need to take into acccount: - The coord receiving a JOIN/JOIN_WITH_STATE/LEAVE/SUSPECT event (anything which leads to a new view being installed) - The coord receiving a MERGE event (e.g. from MERGE2 somewhere below in the stack) - The coord receiving a MERGE-REQ request (from a coord in a different partition) On reception of a JOIN/JOIN_WITH_STATE/LEAVE/SUSPECT event ---------------------------------------------------------- - If the ViewHandler is suspended --> discard the event - Else, add the event - When starting to process the event(s) in the queue: - Suspend the ViewHandler - Start the Resumer task (which resumes the ViewHandler after N seconds) - Resume the ViewHandler when done processing On reception of a MERGE event ----------------------------- - If the ViewHandler is suspended --> discard the event - Else: - If there are JOIN/LEAVE/etc events in the queue: discard the event and start the processing of the queued events - Else: - Process the MERGE event - Suspend the ViewHandler - Start the Resumer task (which resumes the ViewHandler after N seconds) - Resume the ViewHandler when done processing On reception of a MERGE-REQ --------------------------- - If the ViewHandler is suspended --> reject the MERGE-REQ (send MERGE-RSP with merge_rejected=true) - Else: - Suspend the ViewHandler - Start the Resumer task - When the merge is done --> resume the ViewHandler - On Resumer timeout: resume the ViewHandler (this could happen for instance when a remote coord starts a merge, then crashes before merge completion) Resuming the view handler: -------------------------- The following 4 cases can resume the view handler #1 JOIN/LEAVE ------------- - When the view has been installed by the coord, the view handler is resumed - The view handler needs to be resumed also if the view installation fails, e.g. due to a failed flush #2 MERGE -------- - On competion of the merge (successful or failed), the view handler is resumed #3 MERGE-REQ ------------ - Resume the view handler when getting a MergeView - Special case: if the merge leader crashes before merge completion: - On a MERGE-REQ, record the merge leader's address (when suspending the view handler) - When the merge completes, null the merge leaders address again - When we get a view excluding the merge leader, and the leader's address is non-null, resume the view handler and null the merge leader's address #4 The Resumer kicks in ----------------------- - The Resumer is started whenever the view handler is suspended - It resumes the view handler when run - When the view handler is resumed regularly, the Resumer is stopped Issues: ------- - What if the client sends a JOIN_WITH_STATE, the coord processes the JOIN, but suspends the queue after it and before processing the GET_STATE ?