1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
|
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.10.3
kernelspec:
display_name: Python 3
language: python
name: python3
---
How to examine an array with simple slicing
===========================================
Slicing data from an array is a basic operation in array-oriented data analysis. Awkward Array extends [NumPy's slicing capabilities](https://numpy.org/doc/stable/user/basics.indexing.html) to handle nested and ragged data structures. This tutorial illustrates several ways to slice an array.
For a complete list of slicing features, see {func}`ak.Array.__getitem__`.
```{code-cell} ipython3
import awkward as ak
import numpy as np
```
## Basic slicing with ranges
Much like NumPy, you can slice Awkward Arrays using simple ranges specified with colons (`start:stop:step`). Here’s an example of a regular (non-ragged) Awkward Array:
```{code-cell} ipython3
array = ak.Array(np.arange(10)**2) # squaring numbers for clarity
array
```
To select the first five elements:
```{code-cell} ipython3
array[:5]
```
To select from the fifth-to-last onward:
```{code-cell} ipython3
array[-5:]
```
To select every other element starting from the second:
```{code-cell} ipython3
array[1::2]
```
## Multiple ranges for multiple dimensions
Similarly, for multidimensional data,
```{code-cell} ipython3
np_array3d = np.arange(2*3*5).reshape(2, 3, 5)
np_array3d
```
```{code-cell} ipython3
array3d = ak.Array(np_array3d)
array3d
```
```{code-cell} ipython3
np_array3d[1, ::2, 1:-1]
```
```{code-cell} ipython3
array3d[1, ::2, 1:-1]
```
Just as with NumPy, a single colon (`:`) means "take everything from this dimension" and an ellipsis (`...`) expands to all dimensions between two slices.
```{code-cell} ipython3
array3d[:, :, 1:-1]
```
```{code-cell} ipython3
array3d[..., 1:-1]
```
## Boolean array slices
Like NumPy's [advanced slicing](https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing), an array of booleans filters individual items. For instance, consider an array of booleans constructed by asking which elements of `array` are greater than 20:
```{code-cell} ipython3
array > 20
```
When applied to `array` between square brackets, the boolean array eliminates all items in which `array > 20` is `False`:
```{code-cell} ipython3
array[array > 20]
```
Boolean array slicing is more powerful than range slicing because the `True` and `False` values may have any pattern. The following selects only even numbers.
```{code-cell} ipython3
array % 2 == 0
```
```{code-cell} ipython3
array[array % 2 == 0]
```
## Integer array slices
You can also use arrays of integer indices to select specific elements.
```{code-cell} ipython3
indices = ak.Array([2, 5, 3])
array[indices]
```
If you are passing indexes directly between the `array`'s square brackets, be sure that they, too, are nested within square brackets (to be a list, rather than a tuple).
```{code-cell} ipython3
array[[2, 5, 3]]
```
In addition to picking elements out of order, you can pick the same element multiple times.
```{code-cell} ipython3
array[[2, 5, 5, 5, 5, 5, 3]]
```
Any slices that could be performed by boolean arrays can be performed by integer arrays, but only integer arrays can reorder and duplicate elements.
## Ragged array slicing
One of the unique features of Awkward Array is its ability to handle ragged arrays efficiently. Here's an example of a ragged array:
```{code-cell} ipython3
ragged_array = ak.Array([[10, 20, 30], [40], [], [50, 60]])
ragged_array
```
You can slice individual sublists like this:
```{code-cell} ipython3
ragged_array[1]
```
And you can perform slices that operate across the sublists:
```{code-cell} ipython3
ragged_array[:, :2] # get first two elements of each sublist
```
Ranges and single indices mixed with slice notation allow the complexity of ragged slicing to express selecting ranges in nested lists, a feature unique to Awkward Array beyond NumPy’s capabilities. Here's an example where we skip the first element of each sublist that has more than one element:
```{code-cell} ipython3
ragged_array[ak.num(ragged_array) > 1, 1:]
```
## Boolean array slicing with missing data
When working with boolean arrays for slicing, the arrays can include `None` (missing) values. Awkward Array handles missing data gracefully during boolean slicing:
```{code-cell} ipython3
bool_mask = ak.Array([True, None, False, True])
array[bool_mask]
```
This ability to cope with missing data without failing or needing imputation is invaluable in data analysis tasks where missing data is common.
|