.. _start_methods:

Process start method
====================

.. contents:: Contents
    :depth: 2
    :local:

The ``multiprocessing`` package can start processes using one of several methods: ``'fork'``, ``'spawn'``, or
``'forkserver'``. Threading is also available through ``'threading'``. For details on multiprocessing contexts, refer
to the multiprocessing documentation_ and its caveats_ section. In short:

fork
    Copies the parent process so that the child process is effectively identical. This includes copying everything
    currently in memory, which is sometimes useful, but at other times useless or even a serious bottleneck. ``fork``
    enables the use of copy-on-write shared objects (see :ref:`shared_objects`).

spawn
    Starts a fresh Python interpreter that inherits only the resources it needs.

forkserver
    First starts a server process (using ``'spawn'``). Whenever a new process is needed, the parent process asks the
    server to fork one.

threading
    Starts child threads instead of processes. Threads are limited by the Global Interpreter Lock (GIL), but work
    fine for I/O-intensive tasks.

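
The remark that threads work fine for I/O-intensive tasks can be illustrated with a minimal sketch. It uses the
standard library's ``ThreadPoolExecutor`` as a stand-in rather than the ``'threading'`` start method itself, and
``fake_io_task`` and ``run_in_threads`` are hypothetical names that simulate I/O with ``time.sleep``:

.. code-block:: python

    import time
    from concurrent.futures import ThreadPoolExecutor


    def fake_io_task(task_id, delay=0.05):
        """Simulate an I/O-bound call; time.sleep releases the GIL while waiting."""
        time.sleep(delay)
        return task_id


    def run_in_threads(task_ids, n_threads=4):
        """Run the simulated I/O tasks on a small thread pool, preserving input order."""
        with ThreadPoolExecutor(max_workers=n_threads) as executor:
            return list(executor.map(fake_io_task, task_ids))


    if __name__ == "__main__":
        start = time.perf_counter()
        results = run_in_threads(range(8))
        elapsed = time.perf_counter() - start
        # The waits overlap, so the elapsed time should be well below 8 * 0.05 seconds
        print(results, f"{elapsed:.2f}s")

Because the threads spend their time waiting rather than computing, the GIL is not the limiting factor here. A
CPU-bound function would not see a comparable speed-up from threads.
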
For an overview of start method availability and defaults, please refer to the following table:

.. list-table::
    :header-rows: 1

    * - Start method
      - Available on Unix
      - Available on Windows
    * - ``fork``
      - Yes (default)
      - No
    * - ``spawn``
      - Yes
      - Yes (default)
    * - ``forkserver``
      - Yes
      - No
    * - ``threading``
      - Yes
      - Yes

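
Availability can also be checked at runtime. The following is a minimal sketch using only the standard library's
``multiprocessing.get_all_start_methods()``; the helper ``choose_start_method`` is a hypothetical name, and the
standard library does not report ``'threading'``, which is not one of its start methods:

.. code-block:: python

    import multiprocessing as mp


    def choose_start_method(preferred=("fork", "forkserver", "spawn")):
        """Return the first preferred start method that this platform supports."""
        available = mp.get_all_start_methods()
        for method in preferred:
            if method in available:
                return method
        raise ValueError(f"none of {preferred} is available; platform offers {available}")


    if __name__ == "__main__":
        print("available:", mp.get_all_start_methods())
        print("chosen:", choose_start_method())

On Windows this would be expected to select ``'spawn'``, since the other two methods are not offered there.
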
Spawn and forkserver
--------------------
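When using ``spawn`` or ``forkserver`` as the start method, be aware that global variables may not hold the value you
expect: child processes do not inherit the parent's memory, so changes made to a global at runtime in the parent are
not visible in the children (constants are fine). You also have to import packages within the called function: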

.. code-block:: python

    import os

    from mpire import WorkerPool

    def failing_job(folder, filename):
        return os.path.join(folder, filename)

    # This will fail because 'os' is not copied to the child processes
    with WorkerPool(n_jobs=2, start_method='spawn') as pool:
        pool.map(failing_job, [('folder', '0.p3'), ('folder', '1.p3')])

.. code-block:: python

    from mpire import WorkerPool

    def working_job(folder, filename):
        # Import inside the function so the module is available in the child process
        import os
        return os.path.join(folder, filename)

    # This will work
    with WorkerPool(n_jobs=2, start_method='spawn') as pool:
        pool.map(working_job, [('folder', '0.p3'), ('folder', '1.p3')])

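
The caveat about global variables can be reproduced with the standard library alone. The sketch below is an
illustration under stated assumptions, not part of the ``WorkerPool`` API: ``COUNTER``, ``read_counter``, and
``counter_seen_by_child`` are hypothetical names, and the file is assumed to be saved and run as a script (it is
not expected to work from an interactive session):

.. code-block:: python

    import multiprocessing as mp

    COUNTER = 0  # module-level global; re-initialised when a spawned child imports this module


    def read_counter(queue):
        """Runs in the child: report the value of COUNTER as the child sees it."""
        queue.put(COUNTER)


    def counter_seen_by_child(start_method):
        """Start one child with the given start method and return the COUNTER value it observed."""
        ctx = mp.get_context(start_method)
        queue = ctx.Queue()
        process = ctx.Process(target=read_counter, args=(queue,))
        process.start()
        value = queue.get(timeout=30)
        process.join()
        return value


    if __name__ == "__main__":
        COUNTER = 42  # changed at runtime, in the parent only
        print("parent sees:", COUNTER)                              # 42
        print("spawn child sees:", counter_seen_by_child("spawn"))  # 0

The spawned child imports the module afresh, so it sees the value assigned at import time rather than the value set
later in the parent. Passing such values to the function as explicit arguments avoids the discrepancy. The
``if __name__ == "__main__":`` guard keeps the child from re-running the process-starting code when it imports the
script.
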

The progress bar, dashboard, and nested pools (with multiple progress bars) have been built to work with ``spawn``
and ``forkserver`` as well, so these features should behave the same regardless of the start method.

.. _documentation: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
.. _caveats: https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods