File: futures.rst

package info (click to toggle)
python-parsl 2025.01.13%2Bds-1
  • links: PTS, VCS
  • area: main
  • in suites: trixie
  • size: 12,072 kB
  • sloc: python: 23,817; makefile: 349; sh: 276; ansic: 45
file content (165 lines) | stat: -rw-r--r-- 7,396 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
.. _label-futures:

Futures
=======

When an ordinary Python function is invoked in a Python program, the Python interpreter waits for the function to complete execution
before proceeding to the next statement. 
But if a function is expected to execute for a long period of time, it may be preferable not to wait for
its completion but instead to proceed immediately with executing subsequent statements.
The function can then execute concurrently with that other computation.

Concurrency can be used to enhance performance when independent activities
can execute on different cores or nodes in parallel. The following
code fragment demonstrates this idea, showing that overall execution time
may be reduced if the two function calls are executed concurrently. 

.. code-block:: python

    v1 = expensive_function(1)
    v2 = expensive_function(2)
    result = v1 + v2
     
However, concurrency also introduces a need for **synchronization**.
In the example, it is not possible to compute the sum of ``v1`` and ``v2`` 
until both function calls have completed.
Synchronization provides a way of blocking execution of one activity
(here, the statement ``result = v1 + v2``) until other activities 
(here, the two calls to ``expensive_function()``) have completed.

Parsl supports concurrency and synchronization as follows. 
Whenever a Parsl program calls a Parsl app (a function annotated with a Parsl
app decorator, see :ref:`apps`),
Parsl will create a new ``task`` and immediately return a 
`future <https://en.wikipedia.org/wiki/Futures_and_promises>`_ in lieu of that function's result(s). 
The program will then continue immediately to the next statement in the program.
At some point, for example when the task's dependencies are met and there
is available computing capacity, Parsl will execute the task. Upon
completion, Parsl will set the value of the future to contain the task's 
output. 

A future can be used to track the status of an asynchronous task. 
For example, after creation, the future may be interrogated to determine 
the task's status (e.g., running, failed, completed), access results, 
and capture exceptions. Further, futures may be used for synchronization, 
enabling the calling Python program to block until the future 
has completed execution. 

Parsl provides two types of futures: `AppFuture` and `DataFuture`. 
While related, they enable subtly different parallel patterns.

AppFutures
----------

AppFutures are the basic building block upon which Parsl programs are built. Every invocation of a Parsl app returns an AppFuture that may be used to monitor and manage the task's execution.
AppFutures are inherited from Python's `concurrent library <https://docs.python.org/3/library/concurrent.futures.html>`_.
They provide three key capabilities:

1. An AppFuture's ``result()`` function can be used to wait for an app to complete, and then access any result(s).
This function is blocking: it returns only when the app completes or fails. 
The following code fragment implements an example similar to the ``expensive_function()`` example above.
Here, the ``sleep_double`` app simply doubles the input value. The program invokes
the ``sleep_double`` app twice, and returns futures in place of results. The example
shows how the future's ``result()`` function can be used to wait for the results from the 
two ``sleep_double`` app invocations to be computed.

.. code-block:: python

    @python_app
    def sleep_double(x):
        import time
        time.sleep(2)   # Sleep for 2 seconds
        return x*2

    # Start two concurrent sleep_double apps. doubled_x1 and doubled_x2 are AppFutures
    doubled_x1 = sleep_double(10)
    doubled_x2 = sleep_double(5)

    # The result() function will block until each of the corresponding app calls have completed
    print(doubled_x1.result() + doubled_x2.result())

2. An AppFuture's ``done()`` function can be used to check the status of an app, *without blocking*.
The following example shows that calling the future's ``done()`` function will not stop execution of the main Python program.

.. code-block:: python

    @python_app
    def double(x):
        return x*2

    # doubled_x is an AppFuture
    doubled_x = double(10)

     # Check status of doubled_x, this will print True if the result is available, else False
     print(doubled_x.done())

3. An AppFuture provides a safe way to handle exceptions and errors while asynchronously executing
apps. The example shows how exceptions can be captured in the same way as a standard Python program
when calling the future's ``result()`` function.

.. code-block:: python

    @python_app
    def bad_divide(x):
        return 6/x

    # Call bad divide with 0, to cause a divide by zero exception
    doubled_x = bad_divide(0)

    # Catch and handle the exception.
    try:
        doubled_x.result()
    except ZeroDivisionError as ze:
        print('Oops! You tried to divide by 0')
    except Exception as e:
        print('Oops! Something really bad happened')


In addition to being able to capture exceptions raised by a specific app, Parsl also raises ``DependencyErrors`` when apps are unable to execute due to failures in prior dependent apps. 
That is, an app that is dependent upon the successful completion of another app will fail with a dependency error if any of the apps on which it depends fail.


DataFutures
-----------

While an AppFuture represents the execution of an asynchronous app, 
a DataFuture represents a file to be produced by that app.
Parsl's dataflow model requires such a construct so that it can determine 
when dependent apps, apps that that are to consume a file produced by another app, 
can start execution. 

When calling an app that produces files as outputs, Parsl requires that a list of output files be specified (as a list of `File` objects passed in via the ``outputs`` keyword argument). Parsl will return a DataFuture for each output file as part AppFuture when the app is executed. 
These DataFutures are accessible in the AppFuture's ``outputs`` attribute.

Each DataFuture will complete when the App has finished executing,
and the corresponding file has been created (and if specified, staged out).

When a DataFuture is passed as an argument to a subsequent app invocation,
that subsequent app will not begin execution until the DataFuture is
completed. The input argument will then be replaced with an appropriate
File object.

The following code snippet shows how DataFutures are used. In this
example, the call to the echo Bash app specifies that the results
should be written to an output file ("hello1.txt"). The main
program inspects the status of the output file (via the future's
``outputs`` attribute) and then blocks waiting for the file to 
be created (``hello.outputs[0].result()``).

.. code-block:: python

      # This app echoes the input string to the first file specified in the
      # outputs list
      @bash_app
      def echo(message, outputs=()):
          return 'echo {} &> {}'.format(message, outputs[0])

      # Call echo specifying the output file
      hello = echo('Hello World!', outputs=[File('hello1.txt')])

      # The AppFuture's outputs attribute is a list of DataFutures
      print(hello.outputs)

      # Print the contents of the output DataFuture when complete
      with open(hello.outputs[0].result().filepath, 'r') as f:
           print(f.read())