File: how-to-combinatorics-cartesian-combinations.md

package info (click to toggle)
python-awkward 2.6.5-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 23,088 kB
  • sloc: python: 148,689; cpp: 33,562; sh: 432; makefile: 21; javascript: 8
file content (206 lines) | stat: -rw-r--r-- 6,131 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.10.3
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

How to find all combinations of elements: Cartesian (cross) product and "n choose k"
====================================================================================

```{code-cell} ipython3
import awkward as ak
import numpy as np
```

## Motivation

In non-array code that operates on arbitrary data structures, such as Python for loops and Python objects, doubly nested for loops like the following are pretty common:

```{code-cell} ipython3
class City:
    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

cities_us = [
    City("New York", 40.7128, -74.0060),
    City("Los Angeles", 34.0522, -118.2437),
    City("Chicago", 41.8781, -87.6298),
]
cities_canada = [
    City("Toronto", 43.6510, -79.3470),
    City("Vancouver", 49.2827, -123.1207),
    City("Montreal", 45.5017, -73.5673),
]
```

Cartesian product:

```{code-cell} ipython3
class CityPair:
    def __init__(self, city1, city2):
        self.city1 = city1
        self.city2 = city2
    def __repr__(self):
        return f"<CityPair {self.city1.name} {self.city2.name}>"

pairs = []

for city_us in cities_us:
    for city_canada in cities_canada:
        pairs.append(CityPair(city_us, city_canada))

pairs
```

and "n choose k" (combinations without replacement):

```{code-cell} ipython3
all_cities = cities_us + cities_canada
```

```{code-cell} ipython3
pairs = []

for i, city1 in enumerate(all_cities):
    for city2 in all_cities[i + 1:]:
        pairs.append(CityPair(city1, city2))

pairs
```

These kinds of combinations are common enough that there are special functions for them in Python's [itertools](https://docs.python.org/3/library/itertools.html) library:

* [itertools.product](https://docs.python.org/3/library/itertools.html#itertools.product) for the Cartesian product,
* [itertools.combinations](https://docs.python.org/3/library/itertools.html#itertools.combinations) for combinations without replacement.

```{code-cell} ipython3
import itertools
```

```{code-cell} ipython3
list(
    CityPair(city1, city2)
    for city1, city2 in itertools.product(cities_us, cities_canada)
)
```

```{code-cell} ipython3
list(
    CityPair(city1, city2)
    for city1, city2 in itertools.combinations(all_cities, 2)
)
```

Awkward Array has special functions for these kinds of combinations as well:

* {func}`ak.cartesian` for the Cartesian product,
* {func}`ak.combinations` for combinations without replacement.

```{code-cell} ipython3
def instance_to_dict(city):
    return {"name": city.name, "latitude": city.latitude, "longitude": city.longitude}

cities_us = ak.Array([instance_to_dict(city) for city in cities_us])
cities_canada = ak.Array([instance_to_dict(city) for city in cities_canada])

all_cities = ak.concatenate([cities_us, cities_canada])
```

```{code-cell} ipython3
ak.cartesian([cities_us, cities_canada], axis=0)
```

```{code-cell} ipython3
ak.combinations(all_cities, 2, axis=0)
```

## Combinations with `axis=1`

The default `axis` for these functions is 1, rather than 0, as in the motivating example. Problems that are big enough to benefit from vectorized combinations would produce very large output arrays, which likely wouldn't fit in any computer's memory. (Those problems are a better fit for SQL's `CROSS JOIN`; note that Python has a built-in interface to [sqlite3](https://docs.python.org/3/library/sqlite3.html) in-memory tables. You could even use SQL to populate an array of integer indexes to later slice an Awkward Array...)

The most useful application of Awkward Array combinatorics are on problems in which small, variable-length lists need to be combined—and there are many of them. This is `axis=1` (default) or `axis > 1`.

Here is an example of many Cartesian products:

![](cartoon-cartesian.png)

```{code-cell} ipython3
numbers = ak.Array([[1, 2, 3], [], [4, 5], [6, 7, 8, 9]] * 250)
letters = ak.Array([["a", "b"], ["c"], ["d", "e", "f", "g"], ["h", "i"]] * 250)
```

```{code-cell} ipython3
ak.cartesian([numbers, letters])
```

Here is an example of many combinations without replacement:

![](cartoon-combinations.png)

```{code-cell} ipython3
ak.combinations(numbers, 2)
```

## Calculations on pairs

Usually, you'll want to do some calculation on each pair (or on each triple or quadruple, etc.). To get the left-side and right-side of each pair into separate arrays, so they can be used in a calculation, you could address the members of the tuple individually:

```{code-cell} ipython3
tuples = ak.combinations(numbers, 2)
```

```{code-cell} ipython3
tuples["0"], tuples["1"]
```

Be sure to use integers in strings when addressing fields of a tuple ("columns") and plain integers when addressing array elements ("rows"). The above is different from

```{code-cell} ipython3
tuples[0], tuples[1]
```

Once they're in separate arrays, they can be used in a formula:

```{code-cell} ipython3
tuples["0"] * tuples["1"]
```

Another way to get fields of a tuple (or fields of a record) as individual arrays is to use {func}`ak.unzip`:

```{code-cell} ipython3
lefts, rights = ak.unzip(tuples)

lefts * rights
```

## Maintaining groups

In combinations like

```{code-cell} ipython3
ak.cartesian([np.arange(5), np.arange(4)], axis=0)
```

produce a flat list of combinations, but some calculations need triples with the same first or second value in the same list, for instance if they're going to {func}`ak.max` over lists ("find the best combination in which...") or compute {func}`ak.any` or {func}`ak.all` ("is there any combination in which...?"). The `nested` argument controls this.

```{code-cell} ipython3
result = ak.cartesian([np.arange(5), np.arange(4)], axis=0, nested=True)
result
```

For instance, "is there any combination in which |_left_ - _right_| ≥ 3?"

```{code-cell} ipython3
lefts, rights = ak.unzip(result)

ak.any(abs(lefts - rights) >= 3, axis=1)
```