# Uproot Awkward Columnar HATS

_Originally presented as [part](https://github.com/jpivarski-talks/2021-06-14-uproot-awkward-columnar-hats/blob/main/3-awkward-array.ipynb) of [CMS HATS training on June 14, 2021](https://indico.cern.ch/event/1042866/)._

<br><br><br><br><br>

## What about an array of lists?

In [None]:
import skhep_testdata
import awkward as ak
import numpy as np
import uproot

In [None]:
events = uproot.open(skhep_testdata.data_path("uproot-HZZ.root"))["events"]
events.show()

In [None]:
events["Muon_Px"].array()

In [None]:
events["Muon_Px"].array(entry_stop=20).tolist()

This is what Awkward Array was made for. NumPy's equivalent, an array of Python objects that each hold a separate array, is cumbersome and inefficient.

In [None]:
jagged_numpy = events["Muon_Px"].array(entry_stop=20, library="np")
jagged_numpy

What if I want the first item in each list as an array?

In [None]:
np.array([x[0] for x in jagged_numpy])

This violates the rule from [1-python-performance.ipynb](https://github.com/jpivarski-talks/2021-06-14-uproot-awkward-columnar-hats/blob/main/1-python-performance.ipynb): don't iterate in Python.

In [None]:
jagged_awkward = events["Muon_Px"].array(entry_stop=20, library="ak")
jagged_awkward

In [None]:
jagged_awkward[:, 0]

<br><br><br><br><br>

## Awkward Array is a general-purpose library: NumPy-like idioms on JSON-like data

![](pivarski-one-slide-summary.svg)

<br><br><br><br><br>

## Main idea: slicing through structure is computationally inexpensive

Slicing by field name doesn't modify any large buffers, and [ak.zip](https://awkward-array.readthedocs.io/en/latest/_auto/ak.zip.html) only scans them to ensure they're compatible (it skips even that check if `depth_limit=1`).

In [None]:
array = events.arrays()
array

Think of this as zero-cost:

In [None]:
array.Muon_Px, array.Muon_Py, array.Muon_Pz

Think of this as zero-cost, too:

In [None]:
ak.zip({"px": array.Muon_Px, "py": array.Muon_Py, "pz": array.Muon_Pz})

(The above is a manual version of what `events.arrays(how="zip")` does.)

<br><br><br>

NumPy ufuncs work on these arrays (if they're "[broadcastable](https://awkward-array.readthedocs.io/en/latest/_auto/ak.broadcast_arrays.html)").

In [None]:
np.sqrt(array.Muon_Px**2 + array.Muon_Py**2)

<br><br><br>

And there are specialized operations that only make sense in a variable-length context.

{func}`ak.cartesian`

![](cartoon-cartesian.png)

{func}`ak.combinations`

![](cartoon-combinations.png)


In [None]:
ak.cartesian((array.Muon_Px, array.Jet_Px))

In [None]:
ak.combinations(array.Muon_Px, 2)

<br><br><br><br><br>

## Arrays can have custom [behavior](https://awkward-array.readthedocs.io/en/latest/ak.behavior.html)

The following come from the new [Vector](https://github.com/scikit-hep/vector#readme) library.

In [None]:
import vector
vector.register_awkward()

In [None]:
muons = ak.zip({"px": array.Muon_Px, "py": array.Muon_Py, "pz": array.Muon_Pz, "E": array.Muon_E}, with_name="Momentum4D")
muons

This is an array of lists of vectors, and properties like `pt`, `eta`, and `phi` apply through the whole array.

In [None]:
muons.pt

In [None]:
muons.eta

In [None]:
muons.phi

<br><br><br>

Let's try an example: ΔR(muons, jets)

In [None]:
jets = ak.zip({"px": array.Jet_Px, "py": array.Jet_Py, "pz": array.Jet_Pz, "E": array.Jet_E}, with_name="Momentum4D")
jets

In [None]:
ak.num(muons), ak.num(jets)

In [None]:
ms, js = ak.unzip(ak.cartesian((muons, jets)))
ms, js

In [None]:
ak.num(ms), ak.num(js)

In [None]:
ms.deltaR(js)
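
For reference, here is a plain-NumPy sketch of the quantity being computed, assuming the conventional definition ΔR = sqrt(Δη² + Δφ²) with Δφ wrapped into [-π, π). The helper names `delta_phi` and `delta_r` are hypothetical and this is not Vector's implementation, only the formula it is expected to follow.

```python
import numpy as np

def delta_phi(phi1, phi2):
    # Wrap the azimuthal difference into [-pi, pi).
    return (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi

def delta_r(eta1, phi1, eta2, phi2):
    # Angular distance in the (eta, phi) plane.
    return np.sqrt((eta1 - eta2) ** 2 + delta_phi(phi1, phi2) ** 2)

# Two directions on either side of the phi = +/-pi boundary are close together,
# even though their raw phi values differ by 6 radians.
near_boundary = delta_r(0.0, 3.0, 0.0, -3.0)
print(near_boundary)
```

The wrapping step is the part that is easy to forget when writing this by hand.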

<br><br><br>

And another: muon pairs (all combinations, not just the first two per event).

In [None]:
ak.num(muons)

In [None]:
m1, m2 = ak.unzip(ak.combinations(muons, 2))
m1, m2

In [None]:
ak.num(m1), ak.num(m2)

In [None]:
m1 + m2

In [None]:
(m1 + m2).mass

In [None]:
import hist

hist.Hist.new.Reg(120, 0, 120, name="mass").Double().fill(
    ak.flatten((m1 + m2).mass)
).plot()

None

<br><br><br>

### It doesn't matter which coordinates were used to construct it

In [None]:
array2 = uproot.open(
    "https://github.com/jpivarski-talks/2023-12-18-hsf-india-tutorial-bhubaneswar/raw/main/data/SMHiggsToZZTo4L.root:Events"
).arrays(["Muon_pt", "Muon_eta", "Muon_phi", "Muon_charge"], entry_stop=100000)

In [None]:
import particle

muons2 = ak.zip({"pt": array2.Muon_pt, "eta": array2.Muon_eta, "phi": array2.Muon_phi, "q": array2.Muon_charge}, with_name="Momentum4D")
muons2["mass"] = particle.Particle.findall("mu-")[0].mass / 1000.0
muons2

As long as you use properties (dots, not strings in brackets), you don't need to care what coordinates it's based on.

In [None]:
muons2.px

In [None]:
muons2.py

In [None]:
muons2.pz

In [None]:
muons2.E
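
The Cartesian properties above are derived from the stored `pt`, `eta`, `phi`, and `mass`. A plain-NumPy sketch of the standard conversion follows. The function name `to_cartesian` is hypothetical and this is not Vector's code, only the textbook relations.

```python
import numpy as np

def to_cartesian(pt, eta, phi, mass):
    # Transverse components come from the azimuthal angle.
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    # The longitudinal component comes from the pseudorapidity.
    pz = pt * np.sinh(eta)
    # Energy from the mass-shell relation E^2 = |p|^2 + m^2.
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return px, py, pz, energy

px, py, pz, energy = to_cartesian(pt=1.0, eta=0.0, phi=0.0, mass=0.0)
print(px, py, pz, energy)
```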

In [None]:
m1, m2 = ak.unzip(ak.combinations(muons2, 2))
hist.Hist.new.Log(200, 0.1, 120, name="mass").Double().fill(
    ak.flatten((m1 + m2).mass)
).plot()

None

<br><br><br>

## Awkward Arrays and Vector in Numba

Remember Numba, the JIT compiler from [1-python-performance.ipynb](https://github.com/jpivarski-talks/2021-06-14-uproot-awkward-columnar-hats/blob/main/1-python-performance.ipynb)? Both Awkward Array and Vector are implemented as Numba extensions, so they can be used inside JIT-compiled functions.

In [None]:
import numba as nb

@nb.njit
def first_big_dimuon(events):
    for event in events:
        for i in range(len(event)):
            mu1 = event[i]
            for j in range(i + 1, len(event)):
                mu2 = event[j]
                dimuon = mu1 + mu2
                if dimuon.mass > 10:
                    return dimuon

In [None]:
first_big_dimuon(muons2)