Module MasterSlave
Distributed computing using a master-slave model
The classes in this module provide a simple way to parallelize
independent computations in a program. The communication is handled by
the Pyro package, which must be installed before this module can be used.
Pyro can be obtained from http://pyro.sourceforge.net/. By default, the
Pyro name server is used to initialize communication. See the Pyro
documentation to learn how to use the name server.

The principle of the master-slave model is that there is a single
master process that defines computational tasks and any number of slave
processes that execute these tasks. The master defines task requests and
then waits for the results to come in. The slaves wait for a task
request, execute it, return the result, and wait for the next task. There
can be any number of slave processes, which can be started and terminated
independently, the only condition being that no slave process can be
started before its master process. This setup makes it possible to
perform a lengthy computation using a variable number of processors.

Communication between the master and the slave processes passes
through a TaskManager object that is created automatically as part of the
master process. The task manager stores and hands out task requests and
results. The task manager also keeps track of the slave processes. When a
slave process disappears (because it was killed or because of a hardware
failure), the task manager re-schedules its active task(s) to another
slave process. This makes the master-slave system tolerant of failures
in individual slave processes.
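
The rescheduling behaviour described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the module's TaskManager: the class ToyTaskManager and its methods add_task, get_task, store_result and slave_lost are hypothetical names invented for this example, and no Pyro communication is involved.

```python
from collections import deque


class ToyTaskManager:
    """In-memory stand-in for a task manager (hypothetical, not the real class)."""

    def __init__(self):
        self.waiting = deque()   # ids of tasks not yet handed to a slave
        self.running = {}        # task id -> id of the slave working on it
        self.results = {}        # task id -> result waiting to be picked up
        self.tasks = {}          # task id -> (tag, parameters)
        self.next_id = 0

    def add_task(self, tag, *parameters):
        """Called by the master: queue a task and return its id."""
        task_id = self.next_id
        self.next_id += 1
        self.tasks[task_id] = (tag, parameters)
        self.waiting.append(task_id)
        return task_id

    def get_task(self, slave_id):
        """Called by a slave: take the oldest waiting task, or None."""
        if not self.waiting:
            return None
        task_id = self.waiting.popleft()
        self.running[task_id] = slave_id
        tag, parameters = self.tasks[task_id]
        return task_id, tag, parameters

    def store_result(self, task_id, result):
        """Called by a slave: hand in the result of a running task."""
        del self.running[task_id]
        self.results[task_id] = result

    def slave_lost(self, slave_id):
        """Put the tasks of a lost slave back at the front of the queue."""
        lost = [t for t, s in self.running.items() if s == slave_id]
        for task_id in reversed(lost):
            del self.running[task_id]
            self.waiting.appendleft(task_id)
        return lost


manager = ToyTaskManager()
ids = [manager.add_task("sqrt", float(x)) for x in (4, 9)]

first = manager.get_task("slave-1")   # slave-1 takes task 0 ...
manager.slave_lost("slave-1")         # ... and disappears before finishing

# slave-2 receives the re-queued task 0 first, then task 1
while True:
    job = manager.get_task("slave-2")
    if job is None:
        break
    task_id, tag, (x,) = job
    manager.store_result(task_id, x ** 0.5)

final_results = dict(manager.results)
```

In this model the master never notices that slave-1 vanished: both results end up in the result store, which is the property the task manager is meant to provide.
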
Each task manager has a label that makes it possible to distinguish
between several master-slave groups running at the same time. It is by
the label that slave processes identify the master process for which they
work.

The script "task_manager" prints statistics about a currently active
task manager; it takes the label as an argument. It
shows the number of currently active processes (master plus slaves), the
number of waiting and running tasks, and the number of results waiting to
be picked up.

The script Examples/master_slave_demo.py illustrates the use of the
master-slave setup in a simple script. Both master and slave processes
are defined in the same script. The scripts Examples/master.py and
Examples/slave.py show a master-slave setup using two distinct scripts.
The two-script setup is more flexible because task requests and result
retrievals can be made from anywhere in the master code.

initializeMasterProcess(label, slave_script=None, slave_module=None, use_name_server=True)

Initializes a master process.
- Parameters:
    label (str) - the label that identifies the task manager
    slave_script (str) - the file name of the script that defines the
        corresponding slave process
    slave_module (str) - the name of the module that defines the
        corresponding slave process
    use_name_server (bool) - If True (default), the task manager is
        registered with the Pyro name server. If False, the name server
        is not used and slave processes need to know the host on which
        the master process is running.
- Returns: MasterProcess
    a process object on which the methods requestTask() and
    retrieveResult() can be called.

runJob(label, master_class, slave_class, watchdog_period=120.0, launch_slaves=0)

Creates an instance of the master_class and runs it. A copy of the
script and the current working directory are stored in the TaskManager
object to enable the task_manager script to launch slave processes.
- Parameters:
    label (str) - the label that identifies the task manager
    master_class - the class implementing the master process (a subclass
        of MasterProcess)
    slave_class - the class implementing the slave process (a subclass
        of SlaveProcess)
    watchdog_period (int or NoneType) - the interval (in seconds) at
        which each slave process sends messages to the manager to signal
        that it is still alive. If None, no messages are sent at all. In
        that case, the manager cannot detect whether a slave job has
        crashed or been killed.
    launch_slaves (int) - the number of slave jobs to launch immediately
        on the same machine that runs the master process
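
The role of watchdog_period can be illustrated with a small stand-alone check. This is a sketch of the general heartbeat idea only, not the manager's actual logic: the function find_silent_slaves and its grace factor are hypothetical, since this page does not state how long the manager waits before it considers a slave lost.

```python
def find_silent_slaves(last_ping, now, watchdog_period, grace=2.0):
    """Return ids of slaves whose last ping is older than grace * watchdog_period.

    last_ping maps a slave id to the time (in seconds) of its last message.
    The grace factor is an assumption made for this sketch.
    With watchdog_period=None no pings are sent, so nothing can be detected.
    """
    if watchdog_period is None:
        return []
    limit = grace * watchdog_period
    return sorted(s for s, t in last_ping.items() if now - t > limit)


# Slave "a" reported 50 s ago, slave "b" 300 s ago.
last_ping = {"a": 950.0, "b": 700.0}
silent = find_silent_slaves(last_ping, now=1000.0, watchdog_period=120.0)
undetected = find_silent_slaves(last_ping, now=1000.0, watchdog_period=None)
```

With the default period of 120 seconds and the assumed grace factor, only slave "b" is flagged. With watchdog_period=None the list is always empty, which mirrors the caveat in the parameter description.
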

startSlaveProcess(label=None, master_host=None)

Starts a slave process. Must be called at the end of a script that
defines or imports all task handlers.
- Parameters:
    label (str or NoneType) - the label that identifies the task manager.
        May be omitted if the slave process is started through the
        task_manager script.
    master_host (str or NoneType) - If None (default), the task manager
        of the master process is located using the Pyro name server. If
        no name server is used, this parameter must be the hostname of
        the machine on which the master process runs, plus the port
        number if it is different from the default (7766).
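
As an illustration of the master_host convention, the following sketch splits such a value into host and port. It assumes that the port is appended as "host:port", which this page does not spell out, and the helper split_master_host is hypothetical; it is not part of the module and does not handle IPv6 literals.

```python
DEFAULT_PORT = 7766  # default port given in the parameter description


def split_master_host(master_host):
    """Split 'host' or 'host:port' into (host, port).

    The colon-separated form is an assumption made for this sketch.
    """
    host, sep, port = master_host.partition(":")
    return host, int(port) if sep else DEFAULT_PORT


# Hypothetical host names, used only for illustration.
default_case = split_master_host("node1.example.org")
custom_case = split_master_host("node1.example.org:7800")
```

A value without a port falls back to 7766, so the port only needs to be given when the master process listens somewhere else.
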