File: MERGE_View_Separation.txt

Separation of merges from view handling
=======================================

Author: Bela Ban
JIRA: https://jira.jboss.org/jira/browse/JGRP-1009

Goal:
-----

We don't want merges and view changes (join or leave processing) to run concurrently. During a merge, join and leave
requests should be discarded. Likewise, during join/leave processing, merge requests should be discarded.

We already discard join and leave requests during a merge (the ViewHandler is suspended), but not the other way round.

JGRP-1009 leads to spurious merges: while a join or leave request is being processed and the new view is being
disseminated, the view arrives at different members at different times, maybe a few milliseconds apart. If a merge
is permitted during this window, the merge leader might detect different views and initiate a merge.

This won't happen when the merge is discarded during view processing / installation.

Design:
-------

There are 3 types of events we need to take into account:
- The coord receiving a JOIN/JOIN_WITH_STATE/LEAVE/SUSPECT event (anything which leads to a new view being installed)
- The coord receiving a MERGE event (e.g. from MERGE2 somewhere below in the stack)
- The coord receiving a MERGE-REQ request (from a coord in a different partition)

On reception of a JOIN/JOIN_WITH_STATE/LEAVE/SUSPECT event
----------------------------------------------------------
- If the ViewHandler is suspended --> discard the event
- Else, add the event to the queue
- When starting to process the event(s) in the queue:
  - Suspend the ViewHandler
  - Start the Resumer task (which resumes the ViewHandler after N seconds)
  - Resume the ViewHandler when done processing
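
The steps above can be sketched in Java as follows. This is a minimal illustration, not the actual JGroups
ViewHandler: the class JoinLeaveQueueSketch, its methods and the installer callback are hypothetical names, and the
Resumer task is left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names are taken from the JGroups code base.
class JoinLeaveQueueSketch {
    enum Type { JOIN, JOIN_WITH_STATE, LEAVE, SUSPECT }

    private final Queue<Type> queue = new ArrayDeque<>();
    private boolean suspended;

    /** Returns true if the event was queued, false if it was discarded. */
    synchronized boolean add(Type event) {
        if (suspended)
            return false;              // ViewHandler is suspended --> discard the event
        queue.add(event);
        return true;
    }

    /** Processes all queued events as one batch; the handler is suspended meanwhile. */
    List<Type> processQueued(Consumer<List<Type>> installer) {
        List<Type> batch;
        synchronized (this) {
            if (suspended || queue.isEmpty())
                return Collections.emptyList();
            suspended = true;          // suspend before processing starts
            batch = new ArrayList<>(queue);
            queue.clear();
        }
        try {
            installer.accept(batch);   // assumed hook: compute and install the new view
        } finally {
            resume();                  // resume when done, also if installation fails
        }
        return batch;
    }

    synchronized void resume() { suspended = false; }

    synchronized boolean isSuspended() { return suspended; }
}
```

Any event that arrives while installer is running is rejected by add(), which is the behavior the first bullet asks for.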


On reception of a MERGE event
-----------------------------
- If the ViewHandler is suspended --> discard the event
- Else:
  - If there are JOIN/LEAVE/etc events in the queue: discard the event and start the processing of the queued events
  - Else:
      - Process the MERGE event
      - Suspend the ViewHandler
      - Start the Resumer task (which resumes the ViewHandler after N seconds)
      - Resume the ViewHandler when done processing
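
A minimal Java sketch of this decision, under the assumption that the caller reacts to the returned outcome
(starting queue processing or the merge itself). MergeEventSketch, Outcome and the method names are hypothetical and
only illustrate the branching; the Resumer task is not modelled here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; none of these names are taken from the JGroups code base.
class MergeEventSketch {
    enum Outcome { DISCARDED_SUSPENDED, DISCARDED_VIEW_CHANGES_PENDING, MERGE_STARTED }

    private final Queue<String> pendingViewChanges = new ArrayDeque<>();
    private boolean suspended;

    /** Queues a JOIN/LEAVE/etc request unless the handler is suspended. */
    synchronized boolean queueViewChange(String request) {
        if (suspended)
            return false;
        pendingViewChanges.add(request);
        return true;
    }

    /** Decides what happens to a MERGE event coming up the stack. */
    synchronized Outcome onMergeEvent() {
        if (suspended)
            return Outcome.DISCARDED_SUSPENDED;
        if (!pendingViewChanges.isEmpty())
            // view changes win: the caller should now process the queued events
            return Outcome.DISCARDED_VIEW_CHANGES_PENDING;
        suspended = true;              // no join/leave requests are accepted during the merge
        return Outcome.MERGE_STARTED;
    }

    /** Called once the queued view changes have been processed. */
    synchronized void viewChangesProcessed() { pendingViewChanges.clear(); }

    /** Called on completion of the merge, successful or failed. */
    synchronized void mergeDone() { suspended = false; }
}
```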


On reception of a MERGE-REQ
---------------------------
- If the ViewHandler is suspended --> reject the MERGE-REQ (send MERGE-RSP with merge_rejected=true)
- Else:
  - Suspend the ViewHandler
  - Start the Resumer task
  - When the merge is done --> resume the ViewHandler
  - On Resumer timeout: resume the ViewHandler
    (this could happen for instance when a remote coord starts a merge, then crashes before merge completion)
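
The accept/reject decision could look roughly like this in Java. MergeReqSketch and MergeRsp are stand-ins invented
for this sketch (only the merge_rejected flag is modelled), and the Resumer is reduced to a callback that a timer
would invoke.

```java
// Hypothetical sketch; none of these names are taken from the JGroups code base.
class MergeReqSketch {
    /** Stand-in for a MERGE-RSP; only the merge_rejected flag is modelled. */
    static final class MergeRsp {
        final boolean mergeRejected;
        MergeRsp(boolean mergeRejected) { this.mergeRejected = mergeRejected; }
    }

    private boolean suspended;

    /** Handles a MERGE-REQ from a coord in a different partition. */
    synchronized MergeRsp onMergeReq() {
        if (suspended)
            return new MergeRsp(true);   // busy with a view change or another merge
        suspended = true;                // the Resumer task would be started here
        return new MergeRsp(false);
    }

    /** Regular path: the merge is done. */
    synchronized void onMergeDone() { suspended = false; }

    /** Fallback path: the Resumer fired, e.g. because the remote coord crashed. */
    synchronized void onResumerTimeout() { suspended = false; }

    synchronized boolean isSuspended() { return suspended; }
}
```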


Resuming the view handler:
--------------------------

The following 4 cases can resume the view handler:

#1 JOIN/LEAVE
-------------
- When the view has been installed by the coord, the view handler is resumed
- The view handler needs to be resumed also if the view installation fails, e.g. due to a failed flush

#2 MERGE
--------
- On completion of the merge (successful or failed), the view handler is resumed

#3 MERGE-REQ
------------
- Resume the view handler when getting a MergeView
- Special case: if the merge leader crashes before merge completion:
  - On a MERGE-REQ, record the merge leader's address (when suspending the view handler)
  - When the merge completes, null the merge leader's address again
  - When we get a view excluding the merge leader, and the leader's address is non-null, resume the
    view handler and null the merge leader's address
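
The bookkeeping for the crashed-leader case can be sketched as below. This is an illustration under assumptions:
addresses are plain strings, a view is just a collection of member addresses, and MergeLeaderTrackerSketch with its
methods is a hypothetical name.

```java
import java.util.Collection;

// Hypothetical sketch; none of these names are taken from the JGroups code base.
class MergeLeaderTrackerSketch {
    private boolean suspended;
    private String mergeLeader;   // non-null only while a remotely started merge is in progress

    /** Returns true if the MERGE-REQ was accepted, false if it has to be rejected. */
    synchronized boolean onMergeReq(String leader) {
        if (suspended)
            return false;
        suspended = true;
        mergeLeader = leader;     // record the leader when suspending the view handler
        return true;
    }

    /** Regular completion: a MergeView was received. */
    synchronized void onMergeView() {
        mergeLeader = null;
        suspended = false;
    }

    /** Any other view: resume only if it excludes the recorded merge leader. */
    synchronized void onView(Collection<String> members) {
        if (mergeLeader != null && !members.contains(mergeLeader)) {
            mergeLeader = null;
            suspended = false;
        }
    }

    synchronized boolean isSuspended() { return suspended; }
}
```

A view that still contains the merge leader leaves the handler suspended, so an unrelated view does not end the
merge early.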

#4 The Resumer kicks in
-----------------------
- The Resumer is started whenever the view handler is suspended
- It resumes the view handler when run
- When the view handler is resumed regularly, the Resumer is stopped
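
One possible shape for the Resumer, sketched with a ScheduledExecutorService from java.util.concurrent. ResumerSketch,
the epoch counter and awaitResumed() are assumptions made for this example; the epoch only guards against a stale
Resumer resuming a later suspension.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are taken from the JGroups code base.
class ResumerSketch {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resumer");
            t.setDaemon(true);             // never keeps the JVM alive
            return t;
        });
    private final long timeoutMs;
    private boolean suspended;
    private long epoch;                    // identifies the current suspension
    private ScheduledFuture<?> resumer;

    ResumerSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    /** Suspends the handler and starts the Resumer; false if already suspended. */
    synchronized boolean suspend() {
        if (suspended)
            return false;
        suspended = true;
        final long myEpoch = ++epoch;
        resumer = timer.schedule(() -> onTimeout(myEpoch), timeoutMs, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Regular resume: stops the Resumer as well. */
    synchronized void resume() {
        suspended = false;
        if (resumer != null) {
            resumer.cancel(false);
            resumer = null;
        }
        notifyAll();
    }

    private synchronized void onTimeout(long timedOutEpoch) {
        if (suspended && timedOutEpoch == epoch)   // ignore a stale Resumer
            resume();
    }

    /** Waits until the handler is resumed or maxWaitMs has elapsed. */
    synchronized boolean awaitResumed(long maxWaitMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (suspended) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0)
                return false;
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    synchronized boolean isSuspended() { return suspended; }

    void shutdown() { timer.shutdownNow(); }
}
```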


Issues:
-------

- What if the client sends a JOIN_WITH_STATE, and the coord processes the JOIN but suspends the queue after it,
  before processing the GET_STATE?