
Introduction

Threads, processes and the GIL

To run more than one piece of code at the same time on the same computer, one can use either multiple processes or multiple threads.

Although a program can be made up of multiple processes, these processes are in effect completely independent of one another: different processes are not able to cooperate with one another unless one sets up some means of communication between them (such as by using sockets). If a lot of data must be transferred between processes then this can be inefficient.

On the other hand, multiple threads within a single process are intimately connected: they share their data but often can interfere badly with one another. It is often argued that the only way to make multithreaded programming "easy" is to avoid relying on any shared state and for the threads to only communicate by passing messages to each other.
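The message-passing style described above can be sketched with the standard library alone. This is only an illustration and not part of the processing package: it is written for Python 3 (where the Queue module is named queue), and the names worker and square_all are made up for the example.

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker touches no shared variables: it only reads messages
    # from inbox and writes replies to outbox.
    while True:
        item = inbox.get()
        if item is None:          # sentinel meaning "no more work"
            break
        outbox.put(item * item)

def square_all(numbers):
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(None)               # tell the worker to stop
    t.join()
    # A single worker replies in the order the messages were sent.
    return [outbox.get() for _ in numbers]

if __name__ == '__main__':
    print(square_all([1, 2, 3]))
```

Because the two threads only exchange messages, no explicit lock appears anywhere in the code.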

CPython has a Global Interpreter Lock (GIL) which in many ways makes threading easier than it is in most languages by making sure that only one thread can manipulate the interpreter's objects at a time. As a result, it is often safe to let multiple threads access data without using any additional locking as one would need to in a language such as C.
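As an illustration of the kind of unlocked access that is safe here, the sketch below lets several threads append to one list without any explicit locking. It assumes CPython and Python 3 syntax, and collect is a hypothetical helper. This covers single operations such as list.append only: a read-modify-write such as counter += 1 is not atomic and still needs a lock.

```python
import threading

def collect(n_threads=4, per_thread=1000):
    shared = []

    def work(tag):
        for i in range(per_thread):
            # A single list.append call is not interrupted half-way,
            # so no item is lost even without a lock.
            shared.append((tag, i))

    threads = [threading.Thread(target=work, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared

if __name__ == '__main__':
    print(len(collect()))
```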

One downside of the GIL is that on multi-processor (or multi-core) systems a multithreaded Python program can only make use of one processor at a time. This is a problem that can be overcome by using multiple processes instead.

Python gives little direct support for writing programs using multiple processes. This package allows one to write multi-process programs using much the same API that one uses for writing threaded programs.

Forking and spawning

There are two ways of creating a new process in Python: forking the current process, or spawning a completely new one.

The processing package uses os.fork() if it is available since it makes life a lot simpler. Forking the process is also more efficient in terms of memory usage and the time needed to create the new process.
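What forking means in practice can be shown with os.fork() directly. The sketch below assumes a Unix-like system and Python 3; fork_demo is a hypothetical name, and this is not a description of how the processing package is implemented internally. The child starts as a copy of the parent, so it sees data created before the fork, but changes it makes afterwards stay in the child.

```python
import os

def fork_demo():
    data = ['set before fork']
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of data.
        try:
            os.close(read_fd)
            data.append('added by child')
            os.write(write_fd, repr(data).encode())
        finally:
            os._exit(0)           # leave at once, skipping normal cleanup
    # Parent process: read what the child reported, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as f:
        child_view = f.read()
    os.waitpid(pid, 0)
    return data, child_view

if __name__ == '__main__':
    print(fork_demo())
```

The parent's list is unchanged after the child exits, which is why some explicit channel (here a raw pipe) is needed to get results back.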

The Process class

In the processing package processes are spawned by creating a Process object and then calling its start() method. processing.Process follows the API of threading.Thread. A trivial example of a multiprocess program is

from processing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Here the function f is run in a child process.

For an explanation of why (on Windows) the if __name__ == '__main__' part is necessary see Programming guidelines.

Exchanging objects between processes

processing supports two types of communication channel between processes:

Queues:

The function Queue() returns a near clone of Queue.Queue -- see the Python standard documentation. For example

from processing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()

Queues are thread and process safe. See Queues.

Pipes:

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example

from processing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time. See Pipes.
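The difference between a duplex and a one-way pipe can be sketched as follows. The example uses multiprocessing from the Python 3 standard library, the successor of this package, on the assumption that its Pipe(duplex=...) parameter matches what is described here. Both ends are used from a single process only to keep the sketch short; normally one end would be handed to a child process. pipe_demo is a hypothetical name.

```python
from multiprocessing import Pipe

def pipe_demo():
    # Duplex (the default): either end can send and receive.
    left, right = Pipe()
    left.send('ping')
    first = right.recv()
    right.send('pong')
    second = left.recv()

    # One-way: the first connection only receives, the second only sends.
    receiver, sender = Pipe(duplex=False)
    sender.send([42, None, 'hello'])
    third = receiver.recv()

    for conn in (left, right, receiver, sender):
        conn.close()
    return first, second, third

if __name__ == '__main__':
    print(pipe_demo())
```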

Synchronization between processes

processing contains equivalents of all the synchronization primitives from threading. For instance one can use a lock to ensure that only one process prints to standard output at a time:

from processing import Process, Lock

def f(l, i):
    l.acquire()
    print 'hello world', i
    l.release()

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        Process(target=f, args=(lock, num)).start()

Without using the lock, output from the different processes is liable to get all mixed up.

Sharing state between processes

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then processing provides a couple of ways of doing so.

Shared memory:

Data can be stored in a shared memory map using Value or Array. For example the following code

from processing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print num.value
    print arr[:]

will print

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

The 'd' and 'i' arguments used when creating num and arr are typecodes of the kind used by the array module: 'd' indicates a double precision float and 'i' indicates a signed integer. These shared objects will be process and thread safe.
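One caveat is worth a sketch. In the standard library's multiprocessing module (the successor of this package), individual reads and writes of a shared Value are synchronized, but a compound update such as += is a read followed by a write and needs the object's lock held explicitly. The example below assumes Python 3 and multiprocessing's get_lock() method; whether processing exposes the same method is not confirmed here. Threads are used instead of processes only to keep the example short, and add_many and run are hypothetical names.

```python
import threading
from multiprocessing import Value

def add_many(counter, times):
    for _ in range(times):
        # Hold the lock across the whole read-modify-write.
        with counter.get_lock():
            counter.value += 1

def run(n_threads=4, times=1000):
    counter = Value('i', 0)       # 'i': signed integer, as above
    threads = [threading.Thread(target=add_many, args=(counter, times))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == '__main__':
    print(run())
```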

For more flexibility in using shared memory one can use the processing.sharedctypes module which supports the creation of arbitrary ctypes objects allocated from shared memory.

Server process:

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. For example:

from processing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=(d, l))
    p.start()
    p.join()

    print d
    print l

will print

{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Creating managers which support other types is not hard --- see Customized managers.

Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory. See Server process managers.

Using a pool of workers

The Pool() function returns an object representing a pool of worker processes. It has methods which allow tasks to be offloaded to the worker processes in a few different ways.

For example:

from processing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.applyAsync(f, [10])     # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"

See Process pools.

Speed

The following benchmarks were performed on a single-core Pentium 4, 2.5 GHz laptop running Windows XP and Ubuntu Linux 6.10 --- see benchmarks.py.

Number of 256 byte string objects passed between processes/threads per sec:

Connection type             Windows     Linux
Queue.Queue                 49,000      17,000-50,000 [1]
processing.Queue            22,000      21,000
Queue managed by server     6,900       6,500
processing.Pipe             52,000      57,000

[1] For some reason the performance of Queue.Queue is very variable on Linux.

Number of acquires/releases of a lock per sec:

Lock type                   Windows     Linux
threading.Lock              850,000     560,000
processing.Lock             420,000     510,000
Lock managed by server      10,000      8,400
threading.RLock             93,000      76,000
processing.RLock            420,000     500,000
RLock managed by server     8,800       7,400
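A measurement of this kind can be approximated with a simple timing loop. The sketch below is not benchmarks.py and will not reproduce the figures above; it assumes Python 3 and uses multiprocessing.Lock from the standard library as a stand-in for processing.Lock. acquires_per_sec is a hypothetical helper.

```python
import threading
import time
from multiprocessing import Lock as ProcessLock

def acquires_per_sec(lock, iterations=100000):
    # Time a tight loop of uncontended acquire/release pairs.
    start = time.perf_counter()
    for _ in range(iterations):
        lock.acquire()
        lock.release()
    elapsed = time.perf_counter() - start
    return iterations / elapsed

if __name__ == '__main__':
    print('threading.Lock      ', int(acquires_per_sec(threading.Lock())))
    print('multiprocessing.Lock', int(acquires_per_sec(ProcessLock())))
```

Results depend heavily on the machine, the operating system and the Python version, so only the relative ordering is worth comparing.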

Number of interleaved waits/notifies per sec on a condition variable by two processes:

Condition type                  Windows     Linux
threading.Condition             27,000      31,000
processing.Condition            26,000      25,000
Condition managed by server     6,600       6,000

Number of integers retrieved from a sequence per sec:

Sequence type                   Windows       Linux
list                            6,400,000     5,100,000
unsynchronized shared array     3,900,000     3,100,000
synchronized shared array       200,000       220,000
list managed by server          20,000        17,000