File: local_execution.rst

Local execution
===============

Dask offers several `scheduler options <https://docs.dask.org/en/latest/scheduler-overview.html>`_
that allow workflow execution without requiring external services. These include:

- **Sequential**: Tasks are executed one after another, in a single thread.
- **Multi-threading**: Tasks are executed concurrently in multiple threads within the same process.
- **Multi-processing**: Tasks are executed concurrently in separate processes started by the current process.
- **Cluster**: A temporary Dask cluster is created for the duration of the workflow execution.

Example Workflow
-----------------

In the following example, we define a workflow with two independent nodes.
Since the nodes are not connected, they can execute in parallel. Each node
simply calls ``time.sleep`` for 10 seconds, so the total run time shows how
much parallelism the chosen scheduler provides:

.. code:: python

    from ewoksdask import execute_graph

    example_workflow = {
        "graph": {"id": "test"},
        "nodes": [
            {
                "id": "node1",
                "task_type": "method",
                "task_identifier": "time.sleep",
                "default_inputs": [{"name": 0, "value": 10}],
            },
            {
                "id": "node2",
                "task_type": "method",
                "task_identifier": "time.sleep",
                "default_inputs": [{"name": 0, "value": 10}],
            }
        ]
    }

    result = execute_graph(example_workflow, scheduler=..., scheduler_options=...)

You can choose different schedulers depending on your requirements:

Sequential
----------

Tasks are run one after another in a single thread.
With two 10-second sleeps, this example completes in ~20 seconds.

.. code:: python

    result = execute_graph(example_workflow, scheduler=None)
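
As a rough illustration of why this takes ~20 seconds, here is a plain-Python
analogue of the two independent sleep tasks executed back to back (standard
library only, not ewoksdask itself; durations are shortened for brevity):

.. code:: python

    import time

    SLEEP = 0.2  # stand-in for the 10-second sleeps in the workflow above

    start = time.perf_counter()
    for _ in range(2):  # node1, then node2: one after the other
        time.sleep(SLEEP)
    elapsed = time.perf_counter() - start
    print(f"sequential: {elapsed:.2f}s")  # ~0.4 s, i.e. the sum of both sleeps
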

Multi-Threading
---------------

Runs tasks in parallel using multiple threads.
This example completes in ~10 seconds using two threads:

.. code:: python

    result = execute_graph(
        example_workflow, scheduler="multithreading", scheduler_options={"num_workers": 2}
    )
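
The ~10-second figure follows because the two sleeps overlap. A minimal
standard-library analogue (not ewoksdask itself) of two workers running the
independent nodes concurrently, again with shortened durations:

.. code:: python

    import time
    from concurrent.futures import ThreadPoolExecutor

    SLEEP = 0.2  # stand-in for the 10-second sleeps in the workflow above

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # submit both independent "nodes" at once; they sleep concurrently
        futures = [pool.submit(time.sleep, SLEEP) for _ in range(2)]
        for future in futures:
            future.result()  # propagate any exception
    elapsed = time.perf_counter() - start
    print(f"threaded: {elapsed:.2f}s")  # ~0.2 s, not 0.4 s
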

Multi-Processing
----------------

Runs tasks in parallel using multiple subprocesses.
This example also completes in ~10 seconds using two processes:

.. code:: python

    if __name__ == "__main__":
        result = execute_graph(
            example_workflow,
            scheduler="multiprocessing",
            scheduler_options={"num_workers": 2, "context": "spawn"},
        )

.. note::

    The ``if __name__ == "__main__":`` guard is required when using the *spawn* context.

By default:

- On Linux, the default context is *fork*.

- On Windows/macOS, the default context is *spawn*.

See the `Python multiprocessing docs <https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods>`_ for details on contexts.
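
To see which start method your platform uses, and how to request one
explicitly, the standard library exposes both (this snippet is plain
``multiprocessing``, independent of ewoksdask):

.. code:: python

    import multiprocessing as mp

    # Platform default: "fork" on Linux, "spawn" on Windows and macOS
    print(mp.get_start_method())

    # A context pins the start method regardless of the platform default;
    # the "context" scheduler option above accepts the same names
    spawn_ctx = mp.get_context("spawn")
    print(spawn_ctx.get_start_method())  # spawn
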

Cluster
-------

Creates a temporary local Dask cluster with the specified number of workers.
This example completes in ~10 seconds using two workers:

.. code:: python

    result = execute_graph(
        example_workflow, scheduler="cluster", scheduler_options={"n_workers": 2}
    )

Additional cluster configuration options are available in the
`Dask distributed.Client documentation <https://docs.dask.org/en/stable/futures.html?highlight=client#distributed.Client>`_.