File: overview.rst

package info (click to toggle)
python-parsl 2025.01.13%2Bds-1
links: PTS, VCS
area: main
in suites: sid, trixie
size: 12,072 kB
sloc: python: 23,817; makefile: 349; sh: 276; ansic: 45
file content (320 lines) | stat: -rw-r--r-- 16,042 bytes
Overview
========

Parsl is designed to enable straightforward parallelism and orchestration of asynchronous
tasks into dataflow-based workflows, in Python. Parsl manages the concurrent execution of
these tasks across various computation resources, from laptops to supercomputers,
scheduling each task only when its dependencies (e.g., input data dependencies) are met.

Developing a Parsl program is a two-step process:

1. Define Parsl apps by annotating Python functions to indicate that they can be executed concurrently.
2. Use standard Python code to invoke Parsl apps, creating asynchronous tasks and adhering to dependencies defined between apps.

We aim in this section to provide a mental model of how Parsl programs behave.
We discuss how Parsl programs create concurrent tasks, how tasks communicate,
and the nature of the environment on which Parsl programs can perform
operations. In each case, we compare and contrast the behavior of Python
programs that use Parsl constructs with those of conventional Python
programs.

.. note::
	The behavior of a Parsl program can vary in minor respects depending on the
	Executor used (see :ref:`label-execution`). We focus here on the behavior seen when
	using the recommended `parsl.executors.HighThroughputExecutor` (HTEX).

Parsl and Concurrency
---------------------
Any call to a Parsl app creates a new task that executes concurrently with the
main program and any other task(s) that are currently executing. Different
tasks may execute on the same nodes or on different nodes, and on the same or
different computers.

The Parsl execution model thus differs from the Python native execution model,
which is inherently sequential. A Python program that does not contain Parsl
constructs, or make use of other concurrency mechanisms, executes statements
one at a time, in the order that they appear in the program. This behavior is
illustrated in the following figure, which shows a Python program on the left
and, on the right, the statements executed over time when that program is run,
from top to bottom. Each time that the program calls a function, control passes
from the main program (in black) to the function (in red). Execution of the
main program resumes only after the function returns.

.. image:: ../images/overview/python-concurrency.png
   :scale: 70
   :align: center

In contrast, the Parsl execution model is inherently concurrent. Whenever a
program calls an app, a separate thread of execution is created, and the main
program continues without pausing. Thus in the example shown in the figure
below. There is initially a single task: the main program (black). The first
call to ``double`` creates a second task (red) and the second call to ``double``
creates a third task (orange). The second and third task terminate as the
function that they execute returns. (The dashed lines represent the start and
finish of the tasks). The calling program will only block (wait) when it is
explicitly told to do so (in this case by calling ``result()``)

.. image:: ../images/overview/parsl-concurrency.png


.. note::
	Note: We talk here about concurrency rather than parallelism for a reason.
	Two activities are concurrent if they can execute at the same time. Two
	activities occur in parallel if they do run at the same time. If a Parsl
	program creates more tasks that there are available processors, not all
	concurrent activities may run in parallel.


Parsl and Execution
-------------------
We have now seen that Parsl tasks are executed concurrently alongside the main
Python program and other Parsl tasks. We now turn to the question of how and
where are those tasks executed. Given the range of computers on which parallel
programs may be executed, Parsl allows tasks to be executed using different
executors (:py:class:`parsl.executors`). Executors are responsible for taking a queue of tasks and executing
them on local or remote resources.

We briefly describe two of Parsl's most commonly used executors.
Other executors are described in :ref:`label-execution`.

The `parsl.executors.HighThroughputExecutor` (HTEX) implements a *pilot job model* that enables
fine-grain task execution using across one or more provisioned nodes.
HTEX can be used on a single node (e.g., a laptop) and will make use of
multiple processes for concurrent execution.
As shown in the following figure, HTEX uses Parsl's provider abstraction (:py:class:`parsl.providers`) to
communicate with a resource manager (e.g., batch scheduler or cloud API) to
provision a set of nodes (e.g., Parsl will use Slurm’s sbatch command to request
nodes on a Slurm cluster) for the duration of execution.
HTEX deploys a lightweight worker agent on the nodes which subsequently connects
back to the main Parsl process. Parsl tasks are then sent from the main program
to the connected workers for execution and the results are sent back via the
same mechanism. This approach has a number of advantages over other methods:
it avoids long job scheduler queue delays by acquiring one set of resources
for the entire program and it allows for scheduling of many tasks on individual
nodes.

.. image:: ../images/overview/htex-model.png

.. Note:
	Note: when deploying HTEX, or any pilot job model such as the
	WorkQueueExecutor, it is important that the worker nodes be able to connect
	back to the main Parsl process. Thus, you should verify that there is network
  connectivity between the workers and the Parsl process and ensure that the
	correct network address is used by the workers. Parsl provides a helper
	function to automatically detect network addresses
	(`parsl.addresses.address_by_query`).


The `parsl.executors.ThreadPoolExecutor` allows tasks to be executed on a pool of locally
accessible threads. As execution occurs on the same computer, on a pool of
threads forked from the main program, the tasks share memory with one another
(this is discussed further in the following sections).


Parsl and Communication
-----------------------
Parsl tasks typically need to communicate in order to perform useful work.
Parsl provides for two forms of communication: by parameter passing
and by file passing.
As described in the next section, Parsl programs may also communicate by
interacting with shared filesystems and services its environment.

Parameter Passing
^^^^^^^^^^^^^^^^^

The figure above illustrates communication via parameter passing.
The call ``double(3)`` to the app ``double`` in the main program creates a new task
and passes the parameter value, 3, to that new task. When the task completes
execution, its return value, 6, is returned to the main program. Similarly, the
second task is passed the value 5 and returns the value 10. In this case, the
parameters passed are simple primitive types (i.e., integers); however, complex
objects (e.g., Numpy Arrays, Pandas DataFrames, custom objects) can also be
passed to/from tasks.

File Passing
^^^^^^^^^^^^
Parsl supports communication via files in both Bash apps and Python apps.
Files may be used in place of parameter passing for many reasons, such as for
apps are designed to support files, when data to be exchanged are large,
or when data cannot be easily serialized into Python objects.
As Parsl tasks may be executed on remote nodes, without shared file systems,
Parsl offers a Parsl :py:class:`parsl.data_provider.files.File` construct for location-independent reference
to files. Parsl will translate file objects to worker-accessible paths
when executing dependent apps.
Parsl is also able to transfer files in, out, and between Parsl
apps using one of several methods (e.g., FTP, HTTP(S), Globus and rsync).
To accommodate the asynchronous nature of file transfer, Parsl treats
data movement like a Parsl app, adding a dependency to the execution graph
and waiting for transfers to complete before executing dependent apps.
More information is provided in  :ref:`label-data`).

Futures
^^^^^^^
Communication via parameter and file passing also serves a second purpose, namely
synchronization. As we discuss in more detail in :ref:`label-futures`, a call to an
app returns a special object called a future that has a special unassigned
state until such time as the app returns, at which time it takes the return
value. (In the example program, two futures are thus created, d1 and d2.) The
AppFuture function result() blocks until the future to which it is applied takes
a value. Thus the print statement in the main program blocks until both child
tasks created by the calls to the double app return. The following figure
captures this behavior, with time going from left to right rather than top to
bottom as in the preceding figure. Task 1 is initially active as it starts
Tasks 2 and 3, then blocks as a result of calls to d1.result() and d2.result(),
and when those values are available, is active again.

.. image:: ../images/overview/communication.png

The Parsl Environment
---------------------
Regular Python and Parsl-enhanced Python differ in terms of the environment in
which code executes. We use the term *environment* here to refer to the
variables and modules (the *memory environment*), the file system(s)
(the *file system environment*), and the services (the *service environment*)
that are accessible to a function.

An important question when it comes to understanding the behavior of Parsl
programs is the environment in which this new task executes: does it have the
same or different memory, file system, or service environment as its parent
task or any other task? The answer, depends on the executor used, and (in the
case of the file system environment) where the task executes.
Below we describe behavior for the most commonly used `parsl.executors.HighThroughputExecutor`
which is representative of all Parsl executors except the `parsl.executors.ThreadPoolExecutor`.

.. Warning:
	The `parsl.executors.ThreadPoolExecutor` behaves differently than other Parsl executors as
	it allows tasks to share memory.

Memory environment
^^^^^^^^^^^^^^^^^^

In Python, the variables and modules that are accessible to a function are defined
by Python scoping rules, by which a function has access to both variables defined
within the function (*local* variables) and those defined outside the function
(*global* variables). Thus in the following code, the print statement in the
print_answer function accesses the global variable "answer", and we see as output
"the answer is 42."

.. code-block:: python

    answer = 42

    def print_answer():
        print('the answer is', answer)

    print_answer()


In Parsl (except when using the `parsl.executors.ThreadPoolExecutor`) a Parsl app is executed
in a distinct environment that only has access to local variables associated
with the app function. Thus, if the program above is executed with say the
`parsl.executors.HighThroughputExecutor`, will print "the answer is 0" rather than "the answer
is 42," because the print statement in provide_answer does not have access to
the global variable that has been assigned the value 42.  The program will
run without errors when using the `parsl.executors.ThreadPoolExecutor`.

Similarly, the same scoping rules apply to import statements, and thus
the following program will run without errors with the `parsl.executors.ThreadPoolExecutor`,
but raise errors when run with any other executor, because the return statement
in ``ambiguous_double`` refers to a variable (factor) and a module (random) that are
not known to the function.

.. code-block:: python

    import random
    factor = 5

    @python_app
    def ambiguous_double(x):
        return x * random.random() * factor

    print(ambiguous_double(42))


To allow this program to run correctly with all Parsl executors, the random
library must be imported within the app, and the factor variable must be
passed as an argument, as follows.

.. code-block:: python

    import random
    factor = 5

    @python_app
    def good_double(factor, x):
        import random
        return x * random.random() * factor

    print(good_double(factor, 42))


File system environment
^^^^^^^^^^^^^^^^^^^^^^^

In a regular Python program the environment that is accessible to a Python
program also includes the file system(s) of the computer on which it is
executing.
Thus in the following code, a value written to a file "answer.txt" in the
current directory can be retrieved by reading the same file, and the print
statement outputs "the answer is 42."

.. code-block:: python

    def print_answer_file():
        with open('answer.txt','r') as f:
            print('the answer is',  f.read())

    with open('answer.txt','w') as f:
        f.write('42')
        f.close()

    print_answer_file()


The question of which file system environment is accessible to a Parsl app
depends on where the app executes. If two tasks run on nodes that share a
file system, then those tasks (e.g., tasks A and B in the figure below,
but not task C) share a file system environment. Thus the program above will
output "the answer is 42" if the parent task and the child task run on
nodes 1 and 2, but not if they run on nodes 2 and 3.

.. image:: ../images/overview/filesystem.png
   :scale: 70
   :align: center

Service Environment
^^^^^^^^^^^^^^^^^^^

We use the term service environment to refer to network services that may be
accessible to a Parsl program, such as a Redis server or Globus data management
service. These services are accessible to any task.

Environment Summary
^^^^^^^^^^^^^^^^^^^

As we summarize in the table, if tasks execute with the `parsl.executors.ThreadPoolExecutor`,
they share the memory and file system environment of the parent task. If they
execute with any other executor, they have a separate memory environment, and
may or may not share their file system environment with other tasks, depending
on where they are placed. All tasks typically have access to the same network
services.

+--------------------+--------------------+--------------------+---------------------------+------------------+
|                    | Share memory       | Share file system  | Share file system         | Share service    |
|                    | environment with   | environment with   | environment with other    | environment      |
|                    | parent/other tasks | parent             | tasks                     | with other tasks |
+====================+====================+====================+===========================+==================+
+--------------------+--------------------+--------------------+---------------------------+------------------+
| Python             | Yes                | Yes                | N/A                       |     N/A          |
| without            |                    |                    |                           |                  |
| Parsl              |                    |                    |                           |                  |
+--------------------+--------------------+--------------------+---------------------------+------------------+
| Parsl              | Yes                | Yes                | Yes                       |     N/A          |
| ThreadPoolExecutor |                    |                    |                           |                  |
|                    |                    |                    |                           |                  |
+--------------------+--------------------+--------------------+---------------------------+------------------+
| Other Parsl        | No                 | If executed on the | If tasks are executed on  |     N/A          |
| executors          |                    | same node with     | the same node or with     |                  |
|                    |                    | file system access | access to the same file   |                  |
|                    |                    |                    | system                    |                  |
+--------------------+--------------------+--------------------+---------------------------+------------------+