Release Notes JGroups 2.5 ========================= Version: $Id: ReleaseNotes-2.5.txt,v 1.11 2007/07/03 12:28:05 belaban Exp $ Author: Bela Ban JGroups 2.5 is still API-backwards compatible with previous versions (down to 2.2.5). However, there are some changes: - JDK 5 is required - some older protocols were tossed out, e.g. if you used protocols listed in vsync.xml, they won't be found anymore The biggest new functionality is the concurrent stack, which allows for concurrent processing of unrelated messages. Below is a summary (with links to the detailed description) of the major new features. Concurrent stack ---------------- [http://jira.jboss.com/jira/browse/JGRP-181] The concurrent stack is a major performance improvement for clusters where multiple nodes are sending messages at the same time. Up to and including 2.4.x, all messages from all senders were placed into a single queue and delivered in order of reception (FIFO) to the application. This means that, for a given message M, all messages ahead of M had to get processed before M could get processed, even if some of those messages were from different senders. Now, messages from different senders are processed concurrently. This is done through 2 thread pools, one for default messages and another one for out-of-band messages. Both pools can be configured through XML, e.g. core and max number of threads, rejection policy ("run", "discard", "discardoldest" etc), whether to use a queue and if so, queue length etc. The concurrent stack will improve performance dramatically when - there are multiple senders and/or - the processing of a message takes some time. In a cluster of N with N senders, X messages and T think time/message, we have seen total processing time of all messages drop from X * N * T to (X * T) + ! Out-of-band (unordered) messages -------------------------------- [http://jira.jboss.com/jira/browse/JGRP-205] In some cases, messages do not need to get delivered in the order in which they were sent. For example, if a sender A sends messages M1 --> M2 --> M3 --> M4 --> M5 (--> means followed by), and all messages except M3 (heartbeat) and M5 (discovery request) are regular messages, then all 5 messages will be delivered sequentially. This means that M3 has to wait for M1 and M2 to get processed, and M5 has to wait for all 4 messages ahead of it, until it gets processed. An out-of-band (OOB) message is one that is tagged: Message msg; msg.setFlag(Message.OOB) An OOB message is reliably transmitted, that is if the network drops it, JGroups will retransmit it. However, the ordering defined by the stack is ignored for an OOB messages, e.g. in the above case, M3 and M5 can be delivered out of sequence with regard to the other messages. This is perfect for messages like heartbeats or discovery requests or responses, which do not need to be delivered in FIFO order with respect to other messages from the same sender. If a message is tagged as OOB, it will be handled by the OOB thread pool rather than the regular thread pool. Concurrent Multiplexer ---------------------- [http://jira.jboss.com/jira/browse/JGRP-415] A similar problem that the concurrent stack solved, was encountered in the Multiplexer: requests to different services, but sent by the same sender, were processed sequentially (even with the concurrent stack, where messages from the same sender are processed in FIFO order). This was solved by adding a thread pool to the Multiplexer, which handles requests to different services concurrently, even if they were sent by the same sender. Simplified and fast flow control (SFC) -------------------------------------- [http://jira.jboss.com/jira/browse/JGRP-402] SFC is a simple flow control protocol for group (= multipoint) messages. It is simpler than FC, but as the performance report (http://www.jgroups.org/javagroupsnew/perfnew/Report.html) shows, it has about the same performance as FC, so we may combine the 2 in the future. Note, however, that SFC does not apply flow control to unicast messages, whereas FC does. Every sender has max_credits bytes for sending multicast messages to the group. Every multicast message (we don't consider unicast messages) decrements max_credits by its size. When max_credits falls below 0, the sender asks all receivers for new credits and blocks until *all* credits have been received from all members. When the receiver receives a credit request, it checks whether it has received max_credits bytes from the requester since the last credit request. If yes, it sends new credits to the requester and resets the max_credits for the requester. Else, it takes a note of the credit request from P and - when max_credits bytes have finally been received from P - it sends the credits to P and resets max_credits for P. The maximum amount of memory for received messages is therefore * max_credits. The relationship with STABLE is as follows: when a member Q is slow, it will prevent STABLE from collecting messages above the ones seen by Q (everybody else has seen more messages). However, because Q will *not* send credits back to the senders until it has processed all messages worth max_credits bytes, the senders will block. This in turn allows STABLE to progress and eventually garbage collect most messages from all senders. Therefore, SFC and STABLE complement each other, with SFC blocking senders so that STABLE can catch up. Full support for virtual synchrony ---------------------------------- [http://jira.jboss.com/jira/browse/JGRP-341] The FLUSH protocol in 2.4.x supported virtual synchrony ([1]), but the flush phase didn't include a message reconciliation part. This has now been added in 2.5. FLUSH is very important for the Multiplexer, where a flush phase is run whenever a member joins, leaves or crashes, or when a new member acquires the state from an existing member (state transfer). Simple failure detection protocol (FD_ALL) ------------------------------------------ [http://jira.jboss.com/jira/browse/JGRP-395] Failure detection based on simple heartbeat protocol. Every member periodically multicasts a heartbeat. Every member also maintains a table of all members (minus itself). When data or a heartbeat from P are received, we reset the timestamp for P to the current time. Periodically, we check for expired members, and suspect those. Example: In the exampe above, we send a heartbeat every 3 seconds and suspect members if we haven't received a heartbeat (or traffic) for more than 10 seconds. Note that since we check the timestamps every 'interval' milliseconds, we will suspect a member after roughly 4 * 3s == 12 seconds. If we set the timeout to 8500, then we would suspect a member after 3 * 3 secs == 9 seconds. FD_ALL is interchangeable with FD. Better naming of threads ------------------------ Almost all threads have the cluster name and local address appended to their names. This is good if we have multiple clusters in the same JVM (e.g. in JBossAS), allowing for more meaningful stack traces (knowing which thread belongs to which cluster). No need for escape characters in stack configuration ---------------------------------------------------- Now backslashes are not needed for protocol attribute values, e.g. to define an IPv6 mcast_addr, the following works: The colons in mcast_addr do not need to be escaped with a backslash any longer. Note that '(', ')' and '=' are still reserved characters and cannot be used as part of an attribute name or value. Switch to java.util.concurrent classes (JDK 5) ---------------------------------------------- [http://jira.jboss.com/jira/browse/JGRP-391] We switched from concurrent.jar to java.util.concurrent, resulting in slightly better performance and we could also drop a JAR (concurrent.jar). Manual ------ The manual is online at http://www.jgroups.org/javagroupsnew/docs/manual/html/index.html Performance ----------- We measured the performance of the 2.5 stack in a cluster of 4, 6 and 8 nodes. The results are discussed in http://www.jgroups.org/javagroupsnew/perfnew/Report.html. We'll present more comprehensive performance numbers (sepecially for the concurrent stack) in 2.6. Bug fixes --------- The list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP. Bela Ban, Kreuzlingen, Switzerland, June 2007 [1] Reliable communication in presence of failures. Kenneth P. Birman, Thomas A.Joseph. ACM Transactions on Computer Systems, Vol. 5, No. 1, Feb. 1987