.. _Task chunking:

Task chunking
=============

.. contents:: Contents
    :depth: 2
    :local:

By default, MPIRE splits the given tasks into ``64 * n_jobs`` chunks. Each worker receives one chunk of tasks at a time
and processes it before returning its results. This usually makes processing faster when the individual tasks are
computationally small, because arguments and results are pickled/unpickled whenever they are sent to a worker or back
to the main process. Chunking the tasks and results ensures that each process has to pickle/unpickle less often.
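
The effect of batching on serialization can be illustrated without MPIRE. The following is a minimal sketch that only
uses the standard ``pickle`` module. It compares pickling every task separately with pickling tasks in chunks; the
chunk size of four is an arbitrary choice for illustration, and the sketch says nothing about how MPIRE transfers data
internally:

.. code-block:: python

    import pickle

    tasks = [(x,) for x in range(1000)]

    # One message per task: every argument tuple is pickled on its own
    per_task = [pickle.dumps(task) for task in tasks]

    # One message per chunk of four tasks (arbitrary chunk size for this sketch)
    chunk_size = 4
    chunks = [tuple(tasks[i:i + chunk_size]) for i in range(0, len(tasks), chunk_size)]
    per_chunk = [pickle.dumps(chunk) for chunk in chunks]

    # Number of pickle calls and total payload size in bytes for both variants
    print(len(per_task), sum(len(payload) for payload in per_task))
    print(len(per_chunk), sum(len(payload) for payload in per_chunk))

The chunked variant needs a quarter of the pickle calls, and because every pickled message carries a small fixed
overhead, its total payload is smaller as well.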

To determine the number of tasks in the argument list, however, the iterable must implement the ``__len__`` method.
Built-in containers such as ``list`` and ``tuple`` provide it, as does the ``range`` object, but generators do not.
To allow working with generators, each ``map`` function accepts the iterable length through the ``iterable_len``
parameter:

.. code-block:: python

    from mpire import WorkerPool

    def square(x):
        return x * x

    with WorkerPool(n_jobs=4) as pool:
        # 1. This will issue a warning and set the chunk size to 1
        results = pool.map(square, ((x,) for x in range(1000)))

        # 2. This will issue a warning as well and set the chunk size to 1
        results = pool.map(square, ((x,) for x in range(1000)), n_splits=4)

        # 3. Square the numbers from a generator using a specific number of splits
        results = pool.map(square, ((x,) for x in range(1000)), iterable_len=1000, n_splits=4)

        # 4. Square the numbers from a generator using automatic chunking
        results = pool.map(square, ((x,) for x in range(1000)), iterable_len=1000)

        # 5. Square the numbers from a generator using a fixed chunk size
        results = pool.map(square, ((x,) for x in range(1000)), chunk_size=4)

In the first two examples the function call issues a warning: because the total number of tasks is unknown, MPIRE
cannot determine how large the chunks should be and falls back to a chunk size of 1. The third example works as
expected and uses 4 chunks. The fourth example uses 256 chunks (the default of 64 times the number of workers). The
last example uses a fixed chunk size of four, so MPIRE doesn't need to know the iterable length.
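
The chunk counts mentioned above follow from simple arithmetic. The sketch below spells it out for 1000 tasks and 4
workers; the variable names are illustrative and not part of the MPIRE API, and the average chunk sizes assume the
tasks are spread evenly over the chunks:

.. code-block:: python

    import math

    n_jobs = 4
    iterable_len = 1000

    # Example 4: automatic chunking uses the default of 64 chunks per worker
    default_n_splits = 64 * n_jobs
    print(default_n_splits, iterable_len / default_n_splits)  # 256 3.90625

    # Example 3: four splits give an average of 250 tasks per chunk
    explicit_n_splits = 4
    print(iterable_len / explicit_n_splits)  # 250.0

    # Example 5: a fixed chunk size of four yields 250 chunks for 1000 tasks
    fixed_chunk_size = 4
    print(math.ceil(iterable_len / fixed_chunk_size))  # 250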

You can also call the chunking function, ``chunk_tasks``, manually:

.. code-block:: python

    from mpire.utils import chunk_tasks

    # Convert to list because chunk_tasks returns a generator
    print(list(chunk_tasks(range(10), n_splits=3)))
    print(list(chunk_tasks(range(10), chunk_size=2.5)))
    print(list(chunk_tasks((x for x in range(10)), iterable_len=10, n_splits=6)))

will output:

.. code-block:: python

    [(0, 1, 2, 3), (4, 5, 6), (7, 8, 9)]
    [(0, 1, 2), (3, 4), (5, 6, 7), (8, 9)]
    [(0, 1), (2, 3), (4,), (5, 6), (7, 8), (9,)]
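
The uneven chunk sizes in this output come from spreading a fractional average chunk size over whole tasks. The
following pure-Python sketch shows one way to do that: round the running size up and carry the surplus over to the
next chunk. The helper name ``sketch_chunks`` is hypothetical, and the sketch is an illustration that reproduces the
three outputs shown here, not necessarily how ``chunk_tasks`` is implemented:

.. code-block:: python

    from fractions import Fraction
    from itertools import islice
    from math import ceil


    def sketch_chunks(iterable, iterable_len=None, chunk_size=None, n_splits=None):
        """Hypothetical helper: yield tuples whose sizes average out to chunk_size."""
        if chunk_size is None:
            if n_splits is None:
                raise ValueError("Provide either chunk_size or n_splits")
            if iterable_len is None:
                # Fails for generators, which is why iterable_len can be passed in
                iterable_len = len(iterable)
            chunk_size = Fraction(iterable_len, n_splits)
        else:
            # Exact fractions avoid floating point drift for sizes such as 2.5
            chunk_size = Fraction(chunk_size)

        items = iter(iterable)
        current = chunk_size
        while True:
            take = ceil(current)
            chunk = tuple(islice(items, take))
            if not chunk:
                return
            yield chunk
            # Carry the rounding surplus over to the next chunk
            current = current + chunk_size - take


    print(list(sketch_chunks(range(10), n_splits=3)))
    print(list(sketch_chunks(range(10), chunk_size=2.5)))
    print(list(sketch_chunks((x for x in range(10)), iterable_len=10, n_splits=6)))

With ``n_splits=3`` the average chunk size is 10/3, so the first chunk takes four tasks and the remaining two take
three each, which matches the first line of the output.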