File: how-to-create-lists.md

package info (click to toggle)
python-awkward 2.6.5-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 23,088 kB
  • sloc: python: 148,689; cpp: 33,562; sh: 432; makefile: 21; javascript: 8
file content (266 lines) | stat: -rw-r--r-- 7,383 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.1
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

How to create arrays of lists
=============================

```{code-cell} ipython3
import awkward as ak
import numpy as np
```

From Python lists
-----------------

If you have a collection of Python lists, the easiest way to turn them into an Awkward Array is to pass them to the {class}`ak.Array` constructor, which recognizes a non-dict, non-NumPy iterable and calls {func}`ak.from_iter`.

```{code-cell} ipython3
python_lists = [[1, 2, 3], [], [4, 5], [6], [7, 8, 9, 10]]
python_lists
```

```{code-cell} ipython3
awkward_array = ak.Array(python_lists)
awkward_array
```

The lists of lists can be arbitrarily deep.

```{code-cell} ipython3
python_lists = [[[[], [1, 2, 3]]], [[[4, 5]]], []]
python_lists
```

```{code-cell} ipython3
awkward_array = ak.Array(python_lists)
awkward_array
```

The "`var *`" in the type string indicates nested levels of variable-length lists. This is an array of lists of lists of lists of integers.

+++

The advantage of the Awkward Array is that the numerical data are now all in one array buffer and calculations are vectorized across the array, such as NumPy [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html).

```{code-cell} ipython3
np.sqrt(awkward_array)
```

Unlike Python lists, arrays consist of a homogeneous type. A Python list wouldn't notice if numerical data were given at two different levels of nesting, but that's a big difference to an Awkward Array.

```{code-cell} ipython3
union_array = ak.Array([[[[], [1, 2, 3]]], [[4, 5]], []])
union_array
```

In this example, the data type is a "union" of two levels deep and three levels deep.

```{code-cell} ipython3
union_array.type
```

Some operations are possible with union arrays, but not all. ([Iteration in Numba](https://github.com/scikit-hep/awkward-1.0/issues/174) is one such example.)

+++

From NumPy arrays
-----------------

The {class}`ak.Array` constructor loads NumPy arrays differently from Python lists. The inner dimensions of a NumPy array are guaranteed to have the same lengths, so they are interpreted as a fixed-length list type.

```{code-cell} ipython3
numpy_array = np.arange(2 * 3 * 5).reshape(2, 3, 5)
numpy_array
```

```{code-cell} ipython3
regular_array = ak.Array(numpy_array)
regular_array
```

The type in this case has no "`var *`" in it, only "`2 *`", "`3 *`", and "`5 *`". It's a length-2 array of length-3 lists containing length-5 lists of integers.

+++

Furthermore, if NumPy arrays are _nested within_ Python lists (or other iterables), they'll be treated as variable-length ("`var *`") because there's no guarantee at the start of a sequence that all NumPy arrays in the sequence will have the same shape.

```{code-cell} ipython3
numpy_arrays = [
    np.arange(3 * 5).reshape(3, 5),
    np.arange(3 * 5, 2 * 3 * 5).reshape(3, 5),
]
numpy_arrays
```

```{code-cell} ipython3
irregular_array = ak.Array(numpy_arrays)
irregular_array
```

Both `regular_array` and `irregular_array` have the same data values:

```{code-cell} ipython3
regular_array.to_list() == irregular_array.to_list()
```

but they have different types:

```{code-cell} ipython3
regular_array.type, irregular_array.type
```

This can make a difference in some operations, such as [broadcasting](how-to-math-broadcasting).

+++

If you want more control over this, use the explicit {func}`ak.from_iter` and {func}`ak.from_numpy` functions instead of the general-purpose {class}`ak.Array` constructor.

+++

Unflattening
------------

Another difference between {func}`ak.from_iter` and {func}`ak.from_numpy` is that iteration over Python lists is slow and necessarily copies the data, whereas ingesting a NumPy array is zero-copy. (You can see that it's zero copy by [changing the data in-place](how-to-convert-numpy.html#mutability-of-awkward-arrays-from-numpy).)

In some cases, list-making can be vectorized. If you have a flat NumPy array of data and an array of "counts" that add up to the length of the data, then you can {func}`ak.unflatten` it.

```{code-cell} ipython3
data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
counts = np.array([3, 0, 1, 4])

unflattened = ak.unflatten(data, counts)
unflattened
```

The first list has length `3`, the second has length `0`, the third has length `1`, and the last has length `4`. This is close to Awkward Array's internal representation of variable-length lists, so it can be performed quickly.

+++

This function is named {func}`ak.unflatten` because it has the opposite effect as {func}`ak.flatten` and {func}`ak.num`:

```{code-cell} ipython3
ak.flatten(unflattened)
```

```{code-cell} ipython3
ak.num(unflattened)
```

With ArrayBuilder
-----------------

{class}`ak.ArrayBuilder` is described in more detail [in this tutorial](how-to-create-arraybuilder), but you can also construct arrays of lists using the `begin_list`/`end_list` methods or the `list` context manager.

(This is what {func}`ak.from_iter` uses internally to accumulate lists.)

```{code-cell} ipython3
builder = ak.ArrayBuilder()

builder.begin_list()
builder.append(1)
builder.append(2)
builder.append(3)
builder.end_list()

builder.begin_list()
builder.end_list()

builder.begin_list()
builder.append(4)
builder.append(5)
builder.end_list()

array = builder.snapshot()
array
```

```{code-cell} ipython3
builder = ak.ArrayBuilder()

with builder.list():
    builder.append(1)
    builder.append(2)
    builder.append(3)

with builder.list():
    pass

with builder.list():
    builder.append(4)
    builder.append(5)

array = builder.snapshot()
array
```

In Numba
--------

Functions that Numba Just-In-Time (JIT) compiles can use {class}`ak.ArrayBuilder` or construct flat data and "counts" arrays for {func}`ak.unflatten`.

([At this time](https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#language), Numba can't use context managers, the `with` statement, in fully compiled code. {class}`ak.ArrayBuilder` can't be constructed or converted to an array using `snapshot` inside a JIT-compiled function, but can be outside the compiled context. Similarly, `ak.*` functions like {func}`ak.unflatten` can't be called inside a JIT-compiled function, but can be outside.)

```{code-cell} ipython3
import numba as nb
```

```{code-cell} ipython3
@nb.jit
def append_list(builder, start, stop):
    builder.begin_list()
    for x in range(start, stop):
        builder.append(x)
    builder.end_list()


@nb.jit
def example(builder):
    append_list(builder, 1, 4)
    append_list(builder, 999, 999)
    append_list(builder, 4, 6)
    return builder


builder = example(ak.ArrayBuilder())

array = builder.snapshot()
array
```

```{code-cell} ipython3
@nb.jit
def append_list(i, data, j, counts, start, stop):
    for x in range(start, stop):
        data[i] = x
        i += 1
    counts[j] = stop - start
    j += 1
    return i, j


@nb.jit
def example():
    data = np.empty(5, np.int64)
    counts = np.empty(3, np.int64)
    i, j = 0, 0
    i, j = append_list(i, data, j, counts, 1, 4)
    i, j = append_list(i, data, j, counts, 999, 999)
    i, j = append_list(i, data, j, counts, 4, 6)
    return data, counts


data, counts = example()

array = ak.unflatten(data, counts)
array
```