1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266
|
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.1
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---
How to create arrays of lists
=============================
```{code-cell} ipython3
import awkward as ak
import numpy as np
```
From Python lists
-----------------
If you have a collection of Python lists, the easiest way to turn them into an Awkward Array is to pass them to the {class}`ak.Array` constructor, which recognizes a non-dict, non-NumPy iterable and calls {func}`ak.from_iter`.
```{code-cell} ipython3
python_lists = [[1, 2, 3], [], [4, 5], [6], [7, 8, 9, 10]]
python_lists
```
```{code-cell} ipython3
awkward_array = ak.Array(python_lists)
awkward_array
```
The lists of lists can be arbitrarily deep.
```{code-cell} ipython3
python_lists = [[[[], [1, 2, 3]]], [[[4, 5]]], []]
python_lists
```
```{code-cell} ipython3
awkward_array = ak.Array(python_lists)
awkward_array
```
The "`var *`" in the type string indicates nested levels of variable-length lists. This is an array of lists of lists of lists of integers.
+++
The advantage of the Awkward Array is that the numerical data are now all in one array buffer and calculations are vectorized across the array, such as NumPy [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html).
```{code-cell} ipython3
np.sqrt(awkward_array)
```
Unlike Python lists, arrays consist of a homogeneous type. A Python list wouldn't notice if numerical data were given at two different levels of nesting, but that's a big difference to an Awkward Array.
```{code-cell} ipython3
union_array = ak.Array([[[[], [1, 2, 3]]], [[4, 5]], []])
union_array
```
In this example, the data type is a "union" of two levels deep and three levels deep.
```{code-cell} ipython3
union_array.type
```
Some operations are possible with union arrays, but not all. ([Iteration in Numba](https://github.com/scikit-hep/awkward-1.0/issues/174) is one such example.)
+++
From NumPy arrays
-----------------
The {class}`ak.Array` constructor loads NumPy arrays differently from Python lists. The inner dimensions of a NumPy array are guaranteed to have the same lengths, so they are interpreted as a fixed-length list type.
```{code-cell} ipython3
numpy_array = np.arange(2 * 3 * 5).reshape(2, 3, 5)
numpy_array
```
```{code-cell} ipython3
regular_array = ak.Array(numpy_array)
regular_array
```
The type in this case has no "`var *`" in it, only "`2 *`", "`3 *`", and "`5 *`". It's a length-2 array of length-3 lists containing length-5 lists of integers.
+++
Furthermore, if NumPy arrays are _nested within_ Python lists (or other iterables), they'll be treated as variable-length ("`var *`") because there's no guarantee at the start of a sequence that all NumPy arrays in the sequence will have the same shape.
```{code-cell} ipython3
numpy_arrays = [
np.arange(3 * 5).reshape(3, 5),
np.arange(3 * 5, 2 * 3 * 5).reshape(3, 5),
]
numpy_arrays
```
```{code-cell} ipython3
irregular_array = ak.Array(numpy_arrays)
irregular_array
```
Both `regular_array` and `irregular_array` have the same data values:
```{code-cell} ipython3
regular_array.to_list() == irregular_array.to_list()
```
but they have different types:
```{code-cell} ipython3
regular_array.type, irregular_array.type
```
This can make a difference in some operations, such as [broadcasting](how-to-math-broadcasting).
+++
If you want more control over this, use the explicit {func}`ak.from_iter` and {func}`ak.from_numpy` functions instead of the general-purpose {class}`ak.Array` constructor.
+++
Unflattening
------------
Another difference between {func}`ak.from_iter` and {func}`ak.from_numpy` is that iteration over Python lists is slow and necessarily copies the data, whereas ingesting a NumPy array is zero-copy. (You can see that it's zero copy by [changing the data in-place](how-to-convert-numpy.html#mutability-of-awkward-arrays-from-numpy).)
In some cases, list-making can be vectorized. If you have a flat NumPy array of data and an array of "counts" that add up to the length of the data, then you can {func}`ak.unflatten` it.
```{code-cell} ipython3
data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
counts = np.array([3, 0, 1, 4])
unflattened = ak.unflatten(data, counts)
unflattened
```
The first list has length `3`, the second has length `0`, the third has length `1`, and the last has length `4`. This is close to Awkward Array's internal representation of variable-length lists, so it can be performed quickly.
+++
This function is named {func}`ak.unflatten` because it has the opposite effect as {func}`ak.flatten` and {func}`ak.num`:
```{code-cell} ipython3
ak.flatten(unflattened)
```
```{code-cell} ipython3
ak.num(unflattened)
```
With ArrayBuilder
-----------------
{class}`ak.ArrayBuilder` is described in more detail [in this tutorial](how-to-create-arraybuilder), but you can also construct arrays of lists using the `begin_list`/`end_list` methods or the `list` context manager.
(This is what {func}`ak.from_iter` uses internally to accumulate lists.)
```{code-cell} ipython3
builder = ak.ArrayBuilder()
builder.begin_list()
builder.append(1)
builder.append(2)
builder.append(3)
builder.end_list()
builder.begin_list()
builder.end_list()
builder.begin_list()
builder.append(4)
builder.append(5)
builder.end_list()
array = builder.snapshot()
array
```
```{code-cell} ipython3
builder = ak.ArrayBuilder()
with builder.list():
builder.append(1)
builder.append(2)
builder.append(3)
with builder.list():
pass
with builder.list():
builder.append(4)
builder.append(5)
array = builder.snapshot()
array
```
In Numba
--------
Functions that Numba Just-In-Time (JIT) compiles can use {class}`ak.ArrayBuilder` or construct flat data and "counts" arrays for {func}`ak.unflatten`.
([At this time](https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#language), Numba can't use context managers, the `with` statement, in fully compiled code. {class}`ak.ArrayBuilder` can't be constructed or converted to an array using `snapshot` inside a JIT-compiled function, but can be outside the compiled context. Similarly, `ak.*` functions like {func}`ak.unflatten` can't be called inside a JIT-compiled function, but can be outside.)
```{code-cell} ipython3
import numba as nb
```
```{code-cell} ipython3
@nb.jit
def append_list(builder, start, stop):
builder.begin_list()
for x in range(start, stop):
builder.append(x)
builder.end_list()
@nb.jit
def example(builder):
append_list(builder, 1, 4)
append_list(builder, 999, 999)
append_list(builder, 4, 6)
return builder
builder = example(ak.ArrayBuilder())
array = builder.snapshot()
array
```
```{code-cell} ipython3
@nb.jit
def append_list(i, data, j, counts, start, stop):
for x in range(start, stop):
data[i] = x
i += 1
counts[j] = stop - start
j += 1
return i, j
@nb.jit
def example():
data = np.empty(5, np.int64)
counts = np.empty(3, np.int64)
i, j = 0, 0
i, j = append_list(i, data, j, counts, 1, 4)
i, j = append_list(i, data, j, counts, 999, 999)
i, j = append_list(i, data, j, counts, 4, 6)
return data, counts
data, counts = example()
array = ak.unflatten(data, counts)
array
```
|