---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.10.3
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

How to perform computations with NumPy
======================================

Awkward Array's integration with NumPy allows you to use NumPy's array functions on data with complex structures, including ragged and heterogeneous arrays. 

```{code-cell} ipython3
import awkward as ak
import numpy as np
```

## Universal functions (ufuncs)

[NumPy's universal functions (ufuncs)](https://numpy.org/doc/stable/reference/ufuncs.html) are functions that operate elementwise on arrays. Because they broadcast their arguments, they extend naturally to the ragged data structures that Awkward Arrays commonly hold.

Here's an example of applying `np.sqrt`, a NumPy ufunc, to an Awkward Array:

```{code-cell} ipython3
data = ak.Array([[1, 4, 9], [], [16, 25]])

np.sqrt(data)
```

Notice that the ufunc applies to the numeric data, passing through all dimensions of nested lists, even if those lists have variable length. This also applies to heterogeneous data, in which the data are not all of the same type.

```{code-cell} ipython3
data = ak.Array([[1, 4, 9], [], 16, [[[25]]]])

np.sqrt(data)
```

Unary and binary operators on Awkward Arrays, such as `+`, `-`, `>`, and `==`, call NumPy ufuncs under the hood. For instance, `+`:

```{code-cell} ipython3
array1 = ak.Array([[1, 2, 3], [], [4, 5]])
array2 = ak.Array([[10, 20, 30], [], [40, 50]])

array1 + array2
```

is actually `np.add`:

```{code-cell} ipython3
np.add(array1, array2)
```

### Arrays with record fields

Ufuncs can only be applied to numerical data in lists, not records.

```{code-cell} ipython3
records = ak.Array([{"x": 4, "y": 9}, {"x": 16, "y": 25}])
```

```{code-cell} ipython3
---
editable: true
slideshow:
  slide_type: ''
tags: [raises-exception]
---
np.sqrt(records)
```

However, you can pull each field out of a record and apply the ufunc to it.

```{code-cell} ipython3
np.sqrt(records.x)
```

```{code-cell} ipython3
np.sqrt(records.y)
```

If you want the result wrapped up in a new array of records, you can use {func}`ak.zip` to do that.

```{code-cell} ipython3
ak.zip({"x": np.sqrt(records.x), "y": np.sqrt(records.y)})
```

Here's an idiom that would apply a ufunc to every field individually, and then wrap up the result as a new record with the same fields (using {func}`ak.fields`, {func}`ak.unzip`, and {func}`ak.zip`):

```{code-cell} ipython3
ak.zip({key: np.sqrt(value) for key, value in zip(ak.fields(records), ak.unzip(records))})
```

The reason Awkward Array does not do this automatically is to prevent mistakes: it's common for records to represent coordinates of data points, and if the coordinates are not Cartesian, the one-to-one application is not correct.

+++

### Using non-NumPy ufuncs

NumPy-compatible ufuncs exist in other libraries, like SciPy, and can be applied in the same way. Here’s how you can apply `scipy.special.gamma` and `scipy.special.erf`:

```{code-cell} ipython3
import scipy.special

data = ak.Array([[0.1, 0.2, 0.3], [], [0.4, 0.5]])
```

```{code-cell} ipython3
scipy.special.gamma(data)
```

```{code-cell} ipython3
scipy.special.erf(data)
```

You can even create your own ufuncs using Numba's `@nb.vectorize`:

```{code-cell} ipython3
import numba as nb

@nb.vectorize
def gcd_euclid(x, y):
    # computation that is more complex than a formula
    while y != 0:
        x, y = y, x % y
    return x
```

```{code-cell} ipython3
x = ak.Array([[10, 20, 30], [], [40, 50]])
y = ak.Array([[5, 40, 15], [], [24, 255]])
```

```{code-cell} ipython3
gcd_euclid(x, y)
```

Since Numba has JIT-compiled this function, it would run much faster on large arrays than custom Python code.

+++

## Non-ufunc NumPy functions

Some NumPy functions don't satisfy the ufunc protocol, but have been implemented for Awkward Arrays because they are useful. You can tell that a NumPy function has an Awkward Array implementation if a function with the same name and signature exists in both libraries.

For instance, `np.where` works on Awkward Arrays because {func}`ak.where` exists:

```{code-cell} ipython3
np.where(y % 2 == 0, x, y)
```

(The above selects elements from `x` when `y` is even and elements from `y` when `y` is odd.)

Similarly, `np.concatenate` works on Awkward Arrays because {func}`ak.concatenate` exists:

```{code-cell} ipython3
np.concatenate([x, y])
```

```{code-cell} ipython3
np.concatenate([x, y], axis=1)
```

Other NumPy functions, without an equivalent in the Awkward Array library, will work only if the Awkward Array can be converted into a NumPy array.

Ragged arrays can't be converted to NumPy:

```{code-cell} ipython3
---
editable: true
slideshow:
  slide_type: ''
tags: [raises-exception]
---
np.fft.fft(ak.Array([[1.1, 2.2, 3.3], [], [7.7, 8.8, 9.9]]))
```

But arrays with equal-sized lists can:

```{code-cell} ipython3
np.fft.fft(ak.Array([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]]))
```