Introduction

Threads, processes and the GIL

To run more than one piece of code at the same time on the same computer one has the choice of either using multiple processes or multiple threads.

Although a program can be made up of multiple processes, these processes are in effect completely independent of one another: different processes are not able to cooperate with one another unless one sets up some means of communication between them (such as by using sockets). If a lot of data must be transferred between processes then this can be inefficient.

On the other hand, multiple threads within a single process are intimately connected: they share their data but often can interfere badly with one another. It is often argued that the only way to make multithreaded programming "easy" is to avoid relying on any shared state and for the threads to only communicate by passing messages to each other.

CPython has a Global Interpreter Lock (GIL) which in many ways makes threading easier than it is in most languages by making sure that only one thread can manipulate the interpreter's objects at a time. As a result, it is often safe to let multiple threads access data without using any additional locking as one would need to in a language such as C.

One downside of the GIL is that on multi-processor (or multi-core) systems a multithreaded Python program can only make use of one processor at a time. This is a problem that can be overcome by using multiple processes instead.

Python gives little direct support for writing programs using multiple process. This package allows one to write multi-process programs using much the same API that one uses for writing threaded programs.

Forking and spawning

There are two ways of creating a new process in Python:

The current process can fork a new child process by using the os.fork() function. This effectively creates an identical copy of the current process which is now able to go off and perform some task set by the parent process. This means that the child process inherits copies of all variables that the parent process had.

However, os.fork() is not available on every platform: in particular Windows does not support it.
Alternatively, the current process can spawn a completely new Python interpreter by using the subprocess module or one of the os.spawn*() functions.

Getting this new interpreter in to a fit state to perform the task set for it by its parent process is, however, a bit of a challenge.

The processing package uses os.fork() if it is available since it makes life a lot simpler. Forking the process is also more efficient in terms of memory usage and the time needed to create the new process.

The Process class

In the processing package processes are spawned by creating a Process object and then calling its start() method. processing.Process follows the API of threading.Thread. A trivial example of a multiprocess program is

from processing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Here the function f is run in a child process.

For an explanation of why (on Windows) the if __name__ == '__main__' part is necessary see Programming guidelines.

Exchanging objects between processes

processing supports two types of communication channel between processes:

Queues:

The function Queue() returns a near clone of Queue.Queue -- see the Python standard documentation. For example

from processing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()

Queues are thread and process safe. See Queues.

Pipes:

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example

from processing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time. See Pipes.

Synchronization between processes

processing contains equivalents of all the synchronization primitives from threading. For instance one can use a lock to ensure that only one process prints to standard output at a time:

from processing import Process, Lock

def f(l, i):
    l.acquire()
    print 'hello world', i
    l.release()

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        Process(target=f, args=(lock, num)).start()

Without using the lock output from the different processes is liable to get all mixed up.

Sharing state between processes

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then processing provides a couple of ways of doing so.

Shared memory:

Data can be stored in a shared memory map using Value or Array. For example the following code

from processing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print num.value
    print arr[:]

will print

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

The 'd' and 'i' arguments used when creating num and arr are typecodes of the kind used by the array module: 'd' indicates a double precision float and 'i' inidicates a signed integer. These shared objects will be process and thread safe.

For more flexibility in using shared memory one can use the processing.sharedctypes module which supports the creation of arbitrary ctypes objects allocated from shared memory.

Server process:

A manager object returned by Manager() controls a server process which holds python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. For example:

from processing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=(d, l))
    p.start()
    p.join()

    print d
    print l

will print

{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Creating managers which support other types is not hard --- see Customized managers.

Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory. See Server process managers.

Using a pool of workers

The Pool() function returns an object representing a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways.

For example:

from processing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.applyAsync(f, [10])     # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"

See Process pools.

Speed

The following benchmarks were performed on a single core Pentium 4, 2.5Ghz laptop running Windows XP and Ubuntu Linux 6.10 --- see benchmarks.py.

Number of 256 byte string objects passed between processes/threads per sec:

Connection type	Windows	Linux
Queue.Queue	49,000	17,000-50,000 [1]
processing.Queue	22,000	21,000
Queue managed by server	6,900	6,500
processing.Pipe	52,000	57,000

[1]	For some reason the performance of `Queue.Queue` is very variable on Linux.

Number of acquires/releases of a lock per sec:

Lock type	Windows	Linux
threading.Lock	850,000	560,000
processing.Lock	420,000	510,000
Lock managed by server	10,000	8,400
threading.RLock	93,000	76,000
processing.RLock	420,000	500,000
RLock managed by server	8,800	7,400

Number of interleaved waits/notifies per sec on a condition variable by two processes:

Condition type	Windows	Linux
threading.Condition	27,000	31,000
processing.Condition	26,000	25,000
Condition managed by server	6,600	6,000

Number of integers retrieved from a sequence per sec:

Sequence type	Windows	Linux
list	6,400,000	5,100,000
unsynchornized shared array	3,900,000	3,100,000
synchronized shared array	200,000	220,000
list managed by server	20,000	17,000