Date: 2011-08-25 Author: Sébastien Boisvert Subject: VirtualCommunicator Written on: VIA train 60 (Toronto -> Montréal) The standard way of starting a program in the command-line prompt is: ./MyProgram If MyProgram utilises threads and the systems has more than 1 processor logical core, then MyProgram may use them if needed and/or was designed to. == On the way to passing messages == When using the message-passing interface (MPI), a program built around it can be run on more than one processor logical core. With the message-passing interface, a program is started using an auxiliary program. Its standard name is mpiexec, and this name stands for Message-Passing-Interface EXECution (or perhaps EXEcuter). Other names for such an auxiliary program are mpirun and orterun. With mpiexec, the same program (in our example that would be MyProgram) is started on numerous processor logical cores and each of these program instances is given a unique identifier. Each of these numbered running instances are called MPI ranks. The identifying numbers of 4 MPI ranks would be 0, 1, 2 and 3. To start a program within this framework, the following command must be utilised: mpiexec -n 4 ./MyProgram Basically, this mpiexec program will take start the same program 4 times. The magic of this is that the 4 MPI ranks can be on different computers, thus enabling the true potential of compute clusters with distributed resources. For instance, 16 computers connected with an Internet protocol (TCP/IP) network is a compute cluster with distributed resources. Let's assume that each has 8 processor logical cores. In the MPI world, it means that we can have 8 MPI ranks per computer. 16 computers thus give a total of 16*8 = 128 MPI ranks. == Sending and receiving information == It would be obviously pointless to start several instance of the same program with the same exact command-line arguments if these can not exchange information. Fortunately, there is one single thing that truly differentiate all these otherwise identical running instances of the same program executable. This handy attribute that I refer to is these MPI rank numbers. Using these, any MPI rank can send a message to another specific MPI rank. And any MPI rank can check for new unread messages recently received. Any message thus has a sender MPI rank, a receiver MPI rank and actual information. Any message also has an MPI tag to identify the type of information it contains. The hardest part is presumably to implement programs using this general idea of MPI ranks. Following the single-program multiple-data paradigm (SPMD), the source code of a MPI program is the same regardless of the actual number of a MPI rank. Indeed, when started, each MPI rank gets 2 important values from mpiexec. The first is its MPI rank number that we already discussed above. The second is the total number of MPI ranks, which we also discussed above. == Sending messages is the bottleneck == Communicating messages enables the creation of truly parallel and distributed computational systems. But sending messages will be slowest part in your code on most hardware. see mpi-communication-latency.txt == VirtualCommunicator == Let's say that MPI rank 4 needs to send 500 messages, each having 8 bytes of data, to MPI rank 13. And let's also assume that all these messages have the same MPI tag. Assuming that the maximum size of a single message is 4000 bytes, all these 500 messages can be agglomerated in one large message of 4000 bytes (500 * 8 = 4000). The idea of the VirtualCommunicator is to push each of these small 500 messages individually to a virtual communication device. This way, the programmer does not need to care about actually grouping its messages at all. This can be tricky and/or boring to do manually in the code. This so-called VirtualCommunicator will do the multiplexing and de-multiplexing of all message contents. == Implementation == Author: Sébastien Boisvert License: GPL code/communication/VirtualCommunicator.h code/communication/VirtualCommunicator.cpp = Feeding the VirtualCommunicator == Effectively write code that can push a lot of small messages on the VirtualCommunicator is also easy. You simply have to split the work load in slices. Then, you have to assign each of these slices to a single worker. see VirtualProcessor.txt