File: how-to-use-in-numba-features.md

---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.10.3
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

Awkward Array features that are supported in Numba-compiled functions
=====================================================================

See the [Numba documentation](https://numba.readthedocs.io/), which maintains lists of

* [supported Python language features](https://numba.pydata.org/numba-doc/dev/reference/pysupported.html) and
* [supported NumPy library features](https://numba.readthedocs.io/en/stable/reference/numpysupported.html)

in JIT-compiled functions. This page describes the supported Awkward Array library features.

```{code-cell} ipython3
import awkward as ak
import numpy as np
import numba as nb
```

## Passing Awkward Arrays as arguments to a function

The main use is to pass an Awkward Array into a function that has been JIT-compiled by Numba. As many arguments as you want can be Awkward Arrays, and they don't have to have the same length or shape.

```{code-cell} ipython3
array1 = ak.Array([[0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
array2 = ak.Array([
    [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
    [],
    [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}]
])
```

```{code-cell} ipython3
@nb.jit
def first_array(array):
    for i, list_of_numbers in enumerate(array):
        for x in list_of_numbers:
            if x == 3.3:
                return i

@nb.jit
def second_array(array):
    for i, list_of_records in enumerate(array):
        for record in list_of_records:
            if record.x == 3.3:
                return i

@nb.jit
def where_is_3_point_3(a, b):
    return first_array(a), second_array(b)
```

```{code-cell} ipython3
where_is_3_point_3(array1, array2)
```

The only constraint is that union types can't be _accessed_ within the compiled function. (Heterogeneous _parts_ of an array can be ignored and passed through a compiled function.)

## Returning Awkward Arrays from a function

Parts of the input array can be returned from a compiled function.

```{code-cell} ipython3
@nb.jit
def first_array(array):
    for list_of_numbers in array:
        for x in list_of_numbers:
            if x == 3.3:
                return list_of_numbers

@nb.jit
def second_array(array):
    for list_of_records in array:
        for record in list_of_records:
            if record.x == 3.3:
                return record

@nb.jit
def find_3_point_3(a, b):
    return first_array(a), second_array(b)
```

```{code-cell} ipython3
found_a, found_b = find_3_point_3(array1, array2)
```

```{code-cell} ipython3
found_a
```

```{code-cell} ipython3
found_b
```

## Cannot use `ak.*` functions or ufuncs

Outside of a compiled function, Awkward's vectorized `ak.*` functions and NumPy's [universal functions (ufuncs)](https://numpy.org/doc/stable/reference/ufuncs.html) are strongly preferred over for-loop iteration because they are much faster.

Inside of a compiled function, however, they can't be used at all. Use for-loops and if-statements instead.

This is an either-or choice at the boundary of a `@nb.jit`-compiled function. (Even if the `ak.*` functions had been implemented in Numba's compiled context, they would be slower than _compiled_ for-loops and if-statements because of the intermediate arrays they would necessarily create.)

## Cannot use fancy slicing

Similarly, any slicing other than

* a single integer, like `array[i]` where `i` is an integer, or
* a single record field as a _constant, literal_ string, like `array["x"]` or `array.x`,

is not allowed. Unpack the data structures one level at a time.

## Casting one-dimensional arrays as NumPy

One-dimensional Awkward Arrays of numbers, which are completely equivalent to NumPy arrays, can be _cast_ as NumPy arrays within the compiled function.

```{code-cell} ipython3
@nb.jit
def return_last_y_list_squared(array):
    y_list_squared = None
    for list_of_records in array:
        for record in list_of_records:
            y_list_squared = np.asarray(record.y)**2
    return y_list_squared
```

```{code-cell} ipython3
return_last_y_list_squared(array2)
```

This ability to cast Awkward Arrays as NumPy arrays, and then use NumPy's ufuncs or fancy slicing, softens the law against vectorized functions in the compiled context. (However, making intermediate NumPy arrays is just as bad as making intermediate Awkward Arrays.)

## Creating new arrays with `ak.ArrayBuilder`

Numba can create NumPy arrays inside a compiled function and return them as NumPy arrays in Python, but Awkward Arrays are more complex and this is not possible. (Aside from implementation, what would be the interface? Data in Numba's compiled context must be fully typed, and Awkward Array types are complex.)

Instead, arrays can be built with {obj}`ak.ArrayBuilder`, which can be used in compiled contexts and discovers type dynamically. Each {obj}`ak.ArrayBuilder` must be instantiated outside of a compiled function and passed in, and then its {func}`ak.ArrayBuilder.snapshot` (which creates the {obj}`ak.Array`) must be called outside of the compiled function, like this:

```{code-cell} ipython3
@nb.jit
def create_ragged_array(builder, n):
    for i in range(n):
        builder.begin_list()
        for j in range(i):
            builder.integer(j)
        builder.end_list()
    return builder
```

```{code-cell} ipython3
builder = ak.ArrayBuilder()

create_ragged_array(builder, 10)

array = builder.snapshot()

array
```

or, more succinctly,

```{code-cell} ipython3
create_ragged_array(ak.ArrayBuilder(), 10).snapshot()
```

Note that we didn't need to specify that the type of the data would be `var * int64`; this was determined by the way that {obj}`ak.ArrayBuilder` was called: {func}`ak.ArrayBuilder.integer` was only ever called between {func}`ak.ArrayBuilder.begin_list` and {func}`ak.ArrayBuilder.end_list`, and hence the type is `var * int64`.

Note that {obj}`ak.ArrayBuilder` can be used outside of compiled functions, too, so it can be tested interactively:

```{code-cell} ipython3
with builder.record():
    builder.field("x").real(3.14)
    with builder.field("y").list():
        builder.string("one")
        builder.string("two")
        builder.string("three")
```

```{code-cell} ipython3
builder.snapshot()
```

But the context managers, `with builder.record()` and `with builder.list()`, don't work in Numba-compiled functions because Numba does not yet support `with` statements on arbitrary objects as a language feature.

## Overriding behavior with `ak.behavior`

Just as behaviors can be customized for Awkward Arrays in general, they can be customized in the compiled context as well. See the last section of the {obj}`ak.behavior` reference for details.