File: config.rst

Config Objects
==============

A :class:`~parsl.config.Config` object defines how Parsl connects to and manages
the compute resources used to execute tasks.

The main part of the ``Config`` is a list of
**executors**, each of which defines a type of worker.

Consider the following example, a configuration to run 7168 parallel workers on
the Frontera supercomputer at TACC:

.. code-block:: python

    from parsl.config import Config
    from parsl.providers import SlurmProvider
    from parsl.executors import HighThroughputExecutor
    from parsl.launchers import SrunLauncher
    from parsl.addresses import address_by_interface

    config = Config(
        executors=[
            HighThroughputExecutor(
                label="frontera_htex",
                address=address_by_interface('ib0'),
                max_workers_per_node=56,
                provider=SlurmProvider(
                    partition='normal',
                    nodes_per_block=128,
                    init_blocks=1,
                    launcher=SrunLauncher(),
                ),
            )
        ],
    )


The worker options are those for :class:`~parsl.executors.HighThroughputExecutor`,
the baseline Executor available in Parsl.
The options include a name for the workers of this type (``label``),
how the worker connects to the main Parsl process (``address``),
and how many workers to place on each compute node (``max_workers_per_node``).

The provider options define the queue used for submission (``partition``)
and tell Parsl to request one job (``init_blocks``) of 128 nodes (``nodes_per_block``),
for a total of 128 × 56 = 7168 workers.

The launcher uses the default mechanism for starting programs on Frontera compute nodes, ``srun``.
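
The same pattern scales down for local testing; a minimal sketch, assuming a
single workstation where ``LocalProvider`` stands in for the batch scheduler
(the label and worker count here are illustrative):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import LocalProvider

    config = Config(
        executors=[
            HighThroughputExecutor(
                label='local_htex',
                max_workers_per_node=4,  # illustrative value
                provider=LocalProvider(init_blocks=1),
            )
        ],
    )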

Using a Config Object
---------------------

Use the ``Config`` object to start Parsl's data flow kernel with the ``parsl.load`` function:

.. code-block:: python

    import parsl
    from parsl import python_app
    from parsl.configs.htex_local import config

    @python_app
    def double(x: int):
        return x * 2

    with parsl.load(config):
        future = double(2)
        print(future.result())  # blocks until the task completes

The ``parsl.load`` function creates a DataFlowKernel ("DFK") object that maintains the workflow's state.
The DFK acquires the resources used by Parsl and should be closed to release them.
While the DFK can be closed manually (``dfk.cleanup()``), the preferred and Pythonic route is to use it as a context manager (the ``with`` statement),
which avoids unnecessary code and ensures cleanup occurs even if the workflow fails.
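
If a context manager does not fit the structure of your application, a
``try``/``finally`` block achieves the same guarantee; a minimal sketch:

.. code-block:: python

    import parsl
    from parsl.configs.htex_local import config

    dfk = parsl.load(config)
    try:
        ...  # launch and wait on tasks here
    finally:
        dfk.cleanup()  # release Parsl's resources even if the workflow fails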

The ``load`` call can happen after Apps are defined but must occur before any tasks are launched.

A :class:`~parsl.config.Config` object may not be used again once loaded.
If the application will shut down and re-launch the DFK, wrap the configuration
in a function so that each ``load`` receives a fresh object:

.. code-block:: python

    from parsl.config import Config
    import parsl

    def make_config() -> Config:
        return Config(...)

    with parsl.load(make_config()):
        ...  # Your workflow here

    # Section which does not require Parsl

    with parsl.load(make_config()):
        ...  # Another workflow here


Config Options
--------------

Options for the :class:`~parsl.config.Config` object apply to Parsl's general behavior
and affect all executors.
Common options include:

- ``run_dir`` for setting where Parsl writes log files
- ``retries`` to restart failed tasks
- ``usage_tracking`` to help Parsl `by reporting how you use it <../advanced/usage_tracking.html>`_
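
As a brief sketch of these options together (the ``ThreadPoolExecutor`` and the
particular values here are illustrative choices, not requirements):

.. code-block:: python

    from parsl.config import Config
    from parsl.executors import ThreadPoolExecutor

    config = Config(
        executors=[ThreadPoolExecutor(label='local_threads')],
        run_dir='runinfo',  # where Parsl writes its log files
        retries=2,          # re-run each failed task up to two times
    )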

Consult the :py:class:`API documentation for Config <parsl.config.Config>`
or the `advanced documentation <../advanced/index.html>`_ to learn about the full set of options.

.. _config-multiple:

Multiple Executors
------------------

A single application can configure multiple executors.

Every executor defines a ``label`` field that is used to
route tasks to specific workers.
All types of Apps accept an ``executors`` option which takes
a list of executor labels.
For example, tasks from the following App will only run on an executor labelled "frontera_htex".

.. code-block:: python

    from parsl import python_app

    @python_app(executors=['frontera_htex'])
    def single_threaded_task(x: int):
        return x * 2 + 1


Consider using multiple executors in the following cases:

- *Different resource requirements between tasks* (sketched after this list),
  such as a workflow with a simulation stage that runs on the CPU nodes of an
  HPC system followed by an analysis and visualization stage that runs on GPU nodes.
- *Different scales between workflow stages*, such as a workflow
  with a "fan-out" stage of many long-running tasks on a cluster
  and quick "fan-in" computations which can run on fewer nodes.
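
A minimal sketch of the first case; the labels and apps below are hypothetical,
and a real configuration would use executors backed by CPU and GPU partitions
rather than thread pools:

.. code-block:: python

    from parsl import python_app
    from parsl.config import Config
    from parsl.executors import ThreadPoolExecutor

    config = Config(
        executors=[
            ThreadPoolExecutor(label='cpu'),  # stand-in for a CPU-node executor
            ThreadPoolExecutor(label='gpu'),  # stand-in for a GPU-node executor
        ],
    )

    @python_app(executors=['cpu'])
    def simulate(x):
        return x ** 2  # placeholder for a CPU-bound simulation

    @python_app(executors=['gpu'])
    def analyze(y):
        return y + 1  # placeholder for a GPU analysis step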