File: automatic_dispatch.md

# Automatic dispatch

The primary function of [`autoray`](autoray) is to enable writing high level
array / tensor code that is agnostic to the backend arrays being supplied.
It does this via ***'automatic dispatch'***, which has a few notable
differences to other approaches:

* It is automatic - generally neither you nor the backend array library needs
  to implement any dispatch logic; instead [`autoray`](autoray) finds, if
  necessary 'translates', and then caches the relevant functions when they are
  first called.

* It is specialized for array functions and treats [`numpy`](numpy) as the
  reference interface for the call signatures of 'equivalent' functions,
  although it doesn't rely on `numpy` or require it to be installed.

* Despite this, there is no fixed API as such - if a backend can be
  inferred, and the relevant function imported, a [`do`](autoray.do) call is
  valid.


## Basics

The main function of [`autoray`](autoray) is [`do`](autoray.do), which takes a
function name followed by `*args` and `**kwargs`, and automatically looks up
(and caches) the correct backend function. There are four main ways that the
backend is inferred:

***1. Automatic backend:***

```python
do('sqrt', x)
```

Here the backend is inferred from ``x``. By default dispatch happens on the
first argument, but various functions (such as ``'stack'`` and ``'einsum'``)
know to dispatch on other arguments.

***2. Backend 'like' another array:***

```python
do('random.normal', size=(2, 3, 4), like=x)
```

Here the backend is inferred from another array and can thus be implicitly
propagated, even when functions take no array arguments. Some creation routines
such as ``"eye"`` and ``"zeros"`` will also set the default ``dtype`` and / or
device to match ``like`` in this case.

***3. Explicit backend:***

```python
do('einsum', eq, x, y, like='customlib')
```

Here one simply supplies the desired function backend explicitly.

***4. Context manager:***

```python
with backend_like('autoray.lazy'):
    xy = do('tensordot', x, y, 1)
    z = do('trace', xy)
```

Here you set a default backend for a whole block of code. This default
overrides method 1 above, but methods 2 and 3 still take precedence. The
argument to [`backend_like`](autoray.backend_like) can be a backend string or
an example array.


````{hint}
In all the above cases `do(fn_name, *args, like=like, **kwargs)` could be
replaced with:
```python
from autoray import numpy as np

np.fn_name(*args, like=like, **kwargs)
```
````

### Manual dispatch functions

You can manually break the process into two steps with the following functions:

* [`autoray.infer_backend`](autoray.infer_backend) - return the backend name
  for a single array.
* [`autoray.infer_backend_multi`](autoray.infer_backend_multi) - return the
  backend name based on multiple arrays.
* [`autoray.get_lib_fn`](autoray.get_lib_fn) - return the actual function for a
  given backend and function name.

If you know you are going to use a function repeatedly, you can thus avoid the
(albeit minor) overhead of dispatching each call separately, for instance:

```python
import functools

from autoray import infer_backend, get_lib_fn

def matmul_chain(*arrays):
    # if the arrays might be a mix of backends, use infer_backend_multi,
    # but here we just dispatch on the first array
    backend = infer_backend(arrays[0])
    fn = get_lib_fn(backend, 'matmul')
    return functools.reduce(fn, arrays)
```

### Other special functions

There are a few high level functions that might be preferred to attribute
access, for reasons of consistency:

* [`autoray.shape`](autoray.shape) - return the shape of an array. In most
  cases `x.shape` is fine, but this ensures the output is `tuple[int]`
  and also works for builtins without calling `numpy`.
* [`autoray.ndim`](autoray.ndim) - return the number of dimensions of an array.
* [`autoray.size`](autoray.size) - return the total number of elements in an
  array.
* [`autoray.dag`](autoray.dag) - return the adjoint of an array, i.e. the
  transpose with complex conjugation.

Functions for dealing with dtypes:

* [`autoray.get_dtype_name`](autoray.get_dtype_name) - return the name of the
  dtype of an array as a string.
* [`autoray.to_backend_dtype`](autoray.to_backend_dtype) - turn a string
  specified dtype into the equivalent dtype for a given backend.
* [`autoray.astype`](autoray.astype) - cast an array to a given dtype,
  specified as a string.

And for converting any array to a numpy array:

* [`autoray.to_numpy`](autoray.to_numpy)

```{hint}
All of these can be called via [`do`](autoray.do) as well, e.g.
`do('shape', x)`.
```


## Backends

In [`autoray`](autoray) a backend is internally specified simply by a string.
By default, the `backend` of an array is the name of the library that its
class is defined in, and the relevant functions are assumed to be in the
namespace of `backend`. If that is the case (e.g. `cupy`), then that library
is already compatible with `autoray`. Note that all backend lookups are cached
on `obj.__class__` for speed.

`autoray` also handles common cases where the functions are in a different
library or sub-module (such as `jax -> jax.numpy`). This requires a simple
mapping to be specified, which `autoray` does for various libraries.

You can explicitly register a backend name (and thus default location) for a
specific class with the function
[`register_backend`](autoray.register_backend):

```python
register_backend(mylib.myobjs.MyClass, 'mylib.myfuncs')
```
Now when `autoray` encounters an instance of `MyClass` it will look for
functions in `mylib.myfuncs` instead of `mylib`. You could also use an
arbitrary name for the backend, and then alias it to the correct location
separately.


````{note}
`autoray` is aware of the `scipy` namespace and relevant submodules for
`numpy`, `cupy`, `jax`, for example:

```python
do('scipy.linalg.expm', x)
```
````

## Functions

Once a `backend` is inferred and the location of the relevant functions is
known, `autoray` tries to import and cache the relevant function from that
namespace. Many libraries (e.g. `cupy`, `dask`, `jax`, `autograd`, `sparse`,
...) actively mirror the `numpy` API, so there is little else to be done.

Some other libraries (e.g. `tensorflow`, `pytorch`, ...) diverge from the
`numpy` API more, yet have largely equivalent functions, simply defined in
slightly different places with different names and / or signatures. `autoray`
has a simple translation mechanism for:

* when functions are in a different module (e.g.
  `'trace' -> tensorflow.linalg.trace`)
* when functions have a different name (e.g. `'sum' -> tensorflow.reduce_sum`)
* when functions have a different signature (e.g.
  `tensordot(a, b, axes) -> torch.tensordot(a, b, dims)`)

If you want to directly provide a missing or *alternative* implementation of
some function for a particular backend you can swap one in with
[`register_function`](autoray.register_function):

```python
import autoray as ar

def my_custom_torch_svd(x):
    import torch

    print('Hello SVD!')
    u, s, v = torch.svd(x)

    return u, s, v.T

ar.register_function('torch', 'linalg.svd', my_custom_torch_svd)

x = ar.do('random.uniform', size=(3, 4), like='torch')

ar.do('linalg.svd', x)
# Hello SVD!
# (tensor([[-0.5832,  0.6188, -0.5262],
#          [-0.5787, -0.7711, -0.2655],
#          [-0.5701,  0.1497,  0.8078]]),
#  tensor([2.0336, 0.8518, 0.4572]),
#  tensor([[-0.4568, -0.3166, -0.6835, -0.4732],
#          [-0.5477,  0.2825, -0.2756,  0.7377],
#          [ 0.2468, -0.8423, -0.0993,  0.4687]]))
```

If you want to make use of the existing function you can supply ``wrap=True``
in which case the custom function supplied should act like a decorator:

```python
import autoray as ar

def my_custom_sum_wrapper(old_fn):

    def new_fn(*args, **kwargs):
        print('Hello sum!')
        return old_fn(*args, **kwargs)

    return new_fn

ar.register_function('torch', 'sum', my_custom_sum_wrapper, wrap=True)

ar.do('sum', x)
# Hello sum!
# tensor(5.4099)
```

Be careful though: if you call
[`register_function`](autoray.register_function) again it will now wrap the
*new* function! Note you can combine
[`register_backend`](autoray.register_backend) and
[`register_function`](autoray.register_function) to dynamically define array
types and functions from anywhere. See also
[`register_dispatch`](autoray.register_dispatch) for controlling which
arguments are used to infer the backend for any function.


### Composing new functions

Sometimes you want to define a function that is composed of many array
functions, but you want to dispatch at the level of the whole block, not each
individual call, or indeed use a completely different implementation. For
instance, you might want to use a [`numba`](https://numba.pydata.org/) or
[`pythran`](https://pythran.readthedocs.io/en/latest/) compiled version for
`numpy`.

The [`autoray.compose`](autoray.compose) function allows you to do this. You
decorate a function, that forms the default implementation, then you can
register alternative implementations for specific backends. For instance:

```python
from numba import njit

from autoray import compose, do

@compose
def my_func(x):
    # count how many elements are needed to sum to 20
    return do('sum', do('cumsum', x, 0) < 20)

# register a numba implementation
@my_func.register('numpy')
@njit
def my_func_numba(x):
    s = 0.0
    i = 0
    while s < 20:
        s += x[i]
        i += 1
    return i - 1

# any calls like this now dispatch to my_func_numba
do('my_func', x_numpy)
```


### Deviations from `numpy`

As stated above, `autoray` does not have an explicit API, but where there exist
equivalent functions, `autoray` uses the call signature of `numpy` as a
reference. The following are deviations from this:

* `do('linalg.svd', x)` - `autoray` defaults to `full_matrices=False`, since
  this is almost always what is desired, and many libraries do not even
  support `full_matrices=True`.


-------------------------------------------------------------------------------

## Comparison to alternatives

* The ``__array_function__`` protocol has been
  [suggested](https://www.numpy.org/neps/nep-0018-array-function-protocol.html)
  and is now implemented in ``numpy``. This may eventually be a nice solution
  for array dispatch, but it requires the backend library to implement the
  protocol, which has not yet been done for many common libraries.

* The [uarray](https://github.com/Quansight-Labs/uarray) project appears to
  have similar goals but is still being developed.

* [`functools.singledispatch`](https://docs.python.org/3/library/functools.html#functools.singledispatch) is a general *single* dispatch mechanism, but it is slower
  and requires the user to explicitly register each function they want to
  dispatch on.

* [`plum`](https://github.com/beartype/plum) is a general *multiple* dispatch
  mechanism, but again it would require registering every function for every
  backend explicitly.