---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

How to filter with arrays containing missing values
===================================================

```{code-cell} ipython3
import awkward as ak
import numpy as np
```

(how-to-filter-ragged:indexing-with-missing-values)=
## Indexing with missing values
In {ref}`how-to-filter-masked:building-an-awkward-index`, we looked at building arrays of integers to perform awkward indexing using {func}`ak.argmin` and {func}`ak.argmax`. In particular, the `keepdims` argument of these functions is very useful for creating arrays that can be used to index into the original array. However, reducers such as {func}`ak.argmax` behave differently when they are asked to operate upon empty lists.

Let's first create an array that contains empty sublists:

```{code-cell} ipython3
array = ak.Array(
    [
        [],
        [10, 3, 2, 9],
        [4, 5, 5, 12, 6],
        [],
        [8, 9, -1],
    ]
)
array
```

Awkward reducers accept a `mask_identity` argument, which changes the {attr}`ak.Array.type` and the values of the result:

```{code-cell} ipython3
ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
```

```{code-cell} ipython3
ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)
```

Setting `mask_identity=False` yields the identity value for the reducer instead of `None` when reducing empty lists. From the above examples, we can see that the identity for {func}`ak.argmax` is `-1`. What happens if we try to use the array produced with `mask_identity=False` to index into `array`?

+++

As discussed in {ref}`how-to-filter-ragged:indexing-with-argmin-and-argmax`, we first need to convert _at least_ one dimension to a ragged dimension:

```{code-cell} ipython3
index = ak.from_regular(
    ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
)
```

Now, if we try to index into `array` with `index`, it will raise an exception:

```{code-cell} ipython3
:tags: [raises-exception]

array[index]
```

From the error message, it is clear that for some sublist(s) the index `-1` is out of range. This makes sense; some of our sublists are empty, meaning that there is no valid integer to index into them. 

Now let's look at the result of indexing with `mask_identity=True`. 

```{code-cell} ipython3
index = ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)
```

Because it contains an option type, `index` already satisfies rule (2) in {ref}`how-to-filter-masked:building-an-awkward-index`, and we do not need to convert it to a ragged array. We can see that this index succeeds:

```{code-cell} ipython3
array[index]
```

Here, the missing values in the index array correspond to missing values _in the output array_.

+++

## Indexing with missing sublists

Ragged indexing also supports using `None` in place of _empty sublists_ within an index. For example, given the following array

```{code-cell} ipython3
array = ak.Array(
    [
        [10, 3, 2, 9],
        [4, 5, 5, 12, 6],
        [],
        [8, 9, -1],
    ]
)
array
```

let's build a ragged index to pull out some particular values. Rather than using empty lists, we can use `None` to mask out sublists that we don't care about:

```{code-cell} ipython3
array[
    [
        [0, 1],
        None,
        [],
        [2],
    ],
]
```

If we compare this with simply providing an empty sublist,

```{code-cell} ipython3
array[
    [
        [0, 1],
        [],
        [],
        [2],
    ],
]
```

we can see that the `None` value introduces an option-type into the final result. `None` values can be used at _any_ level in the index array to introduce an option-type at that depth in the result.