Starting a WorkerPool
=====================

.. contents:: Contents
    :depth: 2
    :local:

The :obj:`mpire.WorkerPool` class controls a pool of worker processes, similar to a ``multiprocessing.Pool``. It
contains all the ``map``-like functions (with the addition of :meth:`mpire.WorkerPool.map_unordered`), together with
the ``apply`` and ``apply_async`` functions (see :ref:`apply-family`).

An :obj:`mpire.WorkerPool` can be started in two different ways. The first and recommended way to do so is using a
context manager:

.. code-block:: python

    from mpire import WorkerPool

    # Start a pool of 4 workers
    with WorkerPool(n_jobs=4) as pool:
        # Do some processing here
        pass

The ``with`` statement takes care of properly joining/terminating the spawned worker processes after the block has
ended.
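
Once the pool is running, the ``map``-like and ``apply`` functions mentioned above can be called inside the ``with``
block. A minimal sketch of what that can look like (``square`` is an illustrative function, not part of MPIRE):

.. code-block:: python

    from mpire import WorkerPool

    def square(x):
        return x * x

    if __name__ == '__main__':
        with WorkerPool(n_jobs=4) as pool:
            # Ordered results, comparable to multiprocessing.Pool.map
            results = pool.map(square, range(10))

            # Results in order of completion instead of input order
            unordered_results = pool.map_unordered(square, range(10))

            # Submit a single task and fetch its result asynchronously
            async_result = pool.apply_async(square, args=(5,))
            print(async_result.get())
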
The other way is to do it manually:

.. code-block:: python

    # Start a pool of 4 workers
    pool = WorkerPool(n_jobs=4)

    # Do some processing here
    pass

    # Only needed when keep_alive=True:
    # Clean up pool (this will block until all processing has completed)
    pool.stop_and_join()  # or use pool.join(), which is an alias of stop_and_join()

    # In case you want to kill the processes, even though they are still busy
    pool.terminate()

When using ``n_jobs=None``, MPIRE will spawn as many processes as there are CPUs on your system. Specifying more jobs
than you have CPUs is, of course, possible as well.
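
As a quick sketch of that default (``double`` is just an illustrative function), letting MPIRE pick the number of
workers looks like this:

.. code-block:: python

    from mpire import WorkerPool

    def double(x):
        return x * 2

    if __name__ == '__main__':
        # n_jobs=None: MPIRE starts one worker per available CPU
        with WorkerPool(n_jobs=None) as pool:
            results = pool.map(double, range(100))
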
.. warning::

    In the manual approach, the results queue should be drained before joining the workers, otherwise you can get a
    deadlock. If you want to join either way, use :meth:`mpire.WorkerPool.terminate`. For more information, see the
    warnings in the Python docs here_.

.. _here: https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues

Nested WorkerPools
------------------

By default, the :obj:`mpire.WorkerPool` class spawns daemon child processes, which are not able to create child
processes themselves, so nested pools are not allowed. There's an option to create non-daemon child processes to allow
for nested structures:

.. code-block:: python

    def job(...):
        with WorkerPool(n_jobs=4) as p:
            # Do some work
            results = p.map(...)

    with WorkerPool(n_jobs=4, daemon=True, start_method='spawn') as pool:
        # This will raise an AssertionError telling you daemon processes
        # can't start child processes
        pool.map(job, ...)

    with WorkerPool(n_jobs=4, daemon=False, start_method='spawn') as pool:
        # This will work just fine
        pool.map(job, ...)

.. note::

    Nested pools aren't supported when using threading.

.. warning::

    Spawning processes is not thread-safe_! Both the ``start`` and ``join`` methods of the ``Process`` class alter
    global variables. If you still want to have nested pools, the safest bet is to use ``spawn`` as the start method.

.. note::

    Due to a strange bug in Python, using ``forkserver`` as the start method in a nested pool is not allowed when the
    outer pool uses ``fork``, as the forkserver will not have been started there. For it to work, the outer pool has
    to use either ``spawn`` or ``forkserver`` as its start method.

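A hedged sketch of these start-method constraints (``outer_job`` and ``inner_job`` are illustrative names, and the
example assumes a platform where ``forkserver`` is available):

.. code-block:: python

    from mpire import WorkerPool

    def inner_job(x):
        return x * 2

    def outer_job(n):
        # The inner pool uses 'forkserver'; this only works because the outer
        # pool below uses 'spawn' rather than 'fork'
        with WorkerPool(n_jobs=2, start_method='forkserver') as inner_pool:
            return sum(inner_pool.map(inner_job, range(n)))

    if __name__ == '__main__':
        # The outer pool must be non-daemon and must not use 'fork'
        with WorkerPool(n_jobs=2, daemon=False, start_method='spawn') as outer_pool:
            results = outer_pool.map(outer_job, range(1, 6))
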
.. warning::

    Nested pools aren't production ready. Error handling and keyboard interrupts when using nested pools can, on some
    rare occasions (~1% of the time), still cause deadlocks. Use at your own risk.

    When a function is guaranteed to finish successfully, using nested pools is absolutely fine.

.. _thread-safe: https://bugs.python.org/issue40860