(asyncresult)=
# The AsyncResult object
In non-blocking mode, {meth}`~.View.apply`, {meth}`~.View.map`, and friends
submit the command to be executed and then return an {class}`~.AsyncResult` object immediately.
The AsyncResult object gives you a way of getting a result at a later
time through its {meth}`get` method, but it also collects metadata
on execution.
## Beyond stdlib AsyncResult and Future
The {class}`AsyncResult` is a subclass of {py:class}`concurrent.futures.Future`.
This means it can be integrated into existing async workflows,
with e.g. {py:func}`asyncio.wrap_future`.
It also extends the {py:class}`multiprocessing.pool.AsyncResult` API.
```{seealso}
- {py:class}`multiprocessing.pool.AsyncResult` API
- {py:class}`concurrent.futures.Future` API
```
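Because AsyncResult is a {py:class}`concurrent.futures.Future`, it can be awaited in an asyncio program once wrapped with {py:func}`asyncio.wrap_future`. Here is a minimal sketch using a plain thread-pool future as a stand-in, since any `concurrent.futures.Future` behaves the same way here (no cluster needed):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def work():
    return 42

async def main():
    with ThreadPoolExecutor() as pool:
        # Any concurrent.futures.Future -- including an ipyparallel
        # AsyncResult -- can be awaited once wrapped:
        future = pool.submit(work)  # stand-in for view.apply_async(work)
        return await asyncio.wrap_future(future)

print(asyncio.run(main()))  # -> 42
```

With a real cluster, `pool.submit(work)` would be replaced by something like `view.apply_async(work)`, and the rest stays the same.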
In addition to these common features,
our AsyncResult objects add a number of convenient methods for working with parallel results,
beyond what is provided by the standard library classes on which they are based.
### get_dict
{meth}`.AsyncResult.get_dict` pulls results as a dictionary,
keyed by engine_id, rather than a flat list. This is useful for quickly
coordinating or distributing information about all of the engines.
As an example, here is a quick call that gives every engine a dict showing
the PID of every other engine:
```ipython
In [9]: import os
In [10]: ar = rc[:].apply_async(os.getpid)
In [11]: pids = ar.get_dict()
In [12]: rc[:]['pid_map'] = pids
```
This trick is particularly useful when setting up inter-engine communication,
as in IPython's {file}`examples/parallel/interengine` examples.
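Conceptually, `get_dict()` just pairs each engine's ID with that engine's result instead of returning a flat list, roughly as in this sketch (the engine IDs and PID-like results here are made up, standing in for a real AsyncResult's engine IDs and `ar.get()`):

```python
# Hypothetical engine IDs and per-engine results, standing in for
# the engine IDs and ar.get() of a real AsyncResult
engine_ids = [0, 1, 2, 3]
results = [1001, 1002, 1003, 1004]  # e.g. PIDs

# get_dict() keys results by engine_id rather than returning a flat list
pid_map = dict(zip(engine_ids, results))
print(pid_map)  # -> {0: 1001, 1: 1002, 2: 1003, 3: 1004}
```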
## Metadata
IPython Parallel tracks some metadata about the tasks, which is stored
in the {attr}`.Client.metadata` dict. The AsyncResult object gives you an
interface to this information as well, including timestamps, stdout/stderr,
and engine IDs.
### Timing
IPython tracks various timestamps as {py:class}`~datetime.datetime` objects,
and the AsyncResult object has a few properties that turn these into useful
durations (in seconds, as floats).
For use while the tasks are still pending:
- {attr}`ar.elapsed` is the elapsed seconds since submission, for use
before the AsyncResult is complete.
- {attr}`ar.progress` is the number of tasks that have completed. Fractional progress
would be:
```python
1.0 * ar.progress / len(ar)
```
- {meth}`AsyncResult.wait_interactive` will wait for the result to finish, but
print out status updates on progress and elapsed time while it waits.
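A `wait_interactive`-style polling loop can be sketched with the standard library alone; here a {py:class}`~concurrent.futures.ThreadPoolExecutor` stands in for a set of engines, and the counters play the roles of `ar.progress` and `ar.elapsed`:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(t):
    time.sleep(t)
    return t

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, 0.01 * i) for i in range(8)]
    # Roughly what AsyncResult.wait_interactive does, using
    # ar.progress and ar.elapsed on a real AsyncResult:
    while True:
        done = sum(f.done() for f in futures)  # cf. ar.progress
        elapsed = time.perf_counter() - start  # cf. ar.elapsed
        print(f"{done}/{len(futures)} tasks finished after {elapsed:.2f} s")
        if done == len(futures):
            break
        time.sleep(0.05)
```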
For use after the tasks are done:
- {attr}`ar.serial_time` is the sum of the computation time of all of the tasks
done in parallel.
- {attr}`ar.wall_time` is the time between the first task submitted and last result
received. This is the actual cost of computation, including IPython overhead.
```{note}
wall_time is only precise if the Client is waiting for results when
the tasks finish, because the `received` timestamp is taken when the result is
unpacked by the Client, triggered by the {meth}`~.Client.spin` call. If you
are doing work in the Client and not waiting/spinning, then `received` might
be artificially late.
```
An often interesting metric is how long the computation took in parallel
relative to its serial equivalent; this speedup can be computed with
```python
speedup = ar.serial_time / ar.wall_time
```
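For instance, with hypothetical timings (four tasks of roughly half a second each, finishing within 0.55 s of wall-clock time; the numbers below are made up for illustration, not from a real run):

```python
# Hypothetical per-task computation times in seconds, as a real
# AsyncResult would summarize via ar.serial_time
task_times = [0.51, 0.48, 0.52, 0.49]

serial_time = sum(task_times)  # total work done: 2.0 s
wall_time = 0.55               # hypothetical ar.wall_time
speedup = serial_time / wall_time

print(f"speedup: {speedup:.2f}x on {len(task_times)} engines")
```

Here the four engines deliver a speedup of about 3.6x, with the shortfall from a perfect 4x accounting for scheduling and communication overhead.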
## Map results are iterable!
When an AsyncResult object has multiple results (e.g. the {class}`~.AsyncMapResult`
object), you can iterate through the results and act on them as they arrive:
```{literalinclude} ../examples/itermapresult.py
:language: python
:lines: 20-67
```
That is to say, if you treat an AsyncMapResult as if it were a list of your actual
results, it should behave as you would expect, with the only difference being
that you can start iterating through the results before they have even been computed.
This lets you do a simple version of map/reduce with the builtin Python functions,
and the only difference between doing this locally and doing it remotely in parallel
is using the asynchronous `view.map` instead of the builtin `map`.
Here is a simple one-line RMS (root-mean-square) implemented with Python's builtin map/reduce.
```ipython
In [38]: X = np.linspace(0, 100)
In [39]: from math import sqrt
In [40]: from functools import reduce
In [41]: add = lambda a, b: a + b
In [42]: sq = lambda x: x * x
In [43]: sqrt(reduce(add, map(sq, X)) / len(X))
Out[43]: 58.028845747399714
In [44]: sqrt(reduce(add, view.map(sq, X)) / len(X))
Out[44]: 58.028845747399714
```
To break that down:
1. `map(sq, X)` computes the square of each element in the list (locally, or in parallel)
2. `reduce(add, sqX) / len(X)` computes the mean by summing over the list (or AsyncMapResult)
   and dividing by its size
3. `sqrt` takes the square root of the resulting number
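The local half of that session can be reproduced with the standard library alone (in Python 3, `reduce` lives in {py:mod}`functools`); this sketch replaces `np.linspace` with a plain list comprehension:

```python
from functools import reduce
from math import sqrt

# 50 evenly spaced points from 0 to 100, like np.linspace(0, 100)
X = [100 * i / 49 for i in range(50)]

add = lambda a, b: a + b
sq = lambda x: x * x

# Swapping the builtin map for view.map is all it takes to parallelize this
rms = sqrt(reduce(add, map(sq, X)) / len(X))
print(rms)  # close to the 58.0288... from the session above
```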
```{seealso}
When AsyncResult or the AsyncMapResult don't provide what you need (for instance,
handling individual results as they arrive, but with metadata), you can always
split the original result's `msg_ids` attribute, and handle them as you like.
For an example of this, see {file}`examples/customresult.py`
```