.. _usage-accumulators:

Accumulators
============

Common properties
-----------------

All accumulators can be filled like a histogram: call ``.fill`` with values, and
the accumulator behaves like a single-bin or "scalar" histogram. As with
histograms, the fill happens in place.

All accumulators also have a ``.value`` property, which gives the primary value
being accumulated.

Types
-----

There are four accumulators: Sum, WeightedSum, Mean, and WeightedMean.

Sum
^^^

This is the simplest accumulator, and it is never returned from a histogram. The
Double and Unlimited storages use it internally to perform sums when needed. It
computes the floating-point sum with a highly accurate Neumaier sum, which
carries a correction term alongside the running total. Because a histogram never
returns this accumulator, it has no view form; it is available only as a single
accumulator, for comparison and for access to the algorithm. The following
example (written for Python 3.8+, which added the ``=`` f-string specifier) shows
how less accurate sums fail to produce the obvious answer, 2.0::

    import math

    import numpy as np
    import boost_histogram as bh

    values = [1.0, 1e100, 1.0, -1e100]
    print(f"{sum(values) = } (simple)")
    print(f"{math.fsum(values) = }")
    print(f"{np.sum(values) = } (pairwise)")
    print(f"{bh.accumulators.Sum().fill(values) = }")

.. code-block:: text

    sum(values) = 0.0 (simple)
    math.fsum(values) = 2.0
    np.sum(values) = 0.0 (pairwise)
    bh.accumulators.Sum().fill(values) = Sum(0 + 2)

Note that Sum is still designed for performance and does not guarantee
correctness as ``math.fsum`` does. In general, the values must not span more
than two widely separated orders of magnitude::

    values = [1., 1e100, 1e50, 1., -1e50, -1e100]
    print(f"{math.fsum(values) = }")
    print(f"{bh.accumulators.Sum().fill(values) = }")

.. code-block:: text

    math.fsum(values) = 2.0
    bh.accumulators.Sum().fill(values) = Sum(0 + 0)

This is a highly contrived example; the Sum accumulator should still be more
accurate than simple and pairwise summation, at a minimal performance cost.
Most notably, the failure requires large cancellations against negative values,
which histograms generally do not have.
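
Both results above can be reproduced with a minimal pure-Python sketch of
Neumaier summation. This only illustrates the technique and is not the
library's compiled implementation; the function name ``neumaier_sum`` is
hypothetical. It returns the running sum and the correction term separately,
which appears to correspond to the two numbers printed in ``Sum(0 + 2)``.

```python
def neumaier_sum(values):
    """Return (large, small): the running sum and its correction term.

    Illustrative sketch only; ``neumaier_sum`` is a hypothetical helper,
    not part of boost-histogram.
    """
    large = 0.0  # running sum, rounded at every step
    small = 0.0  # accumulated rounding error (the correction term)
    for x in values:
        t = large + x
        if abs(large) >= abs(x):
            # Low-order digits of x were lost in t; recover them.
            small += (large - t) + x
        else:
            # Low-order digits of large were lost in t; recover them.
            small += (x - t) + large
        large = t
    return large, small


# Two magnitude scales (1 and 1e100): the correction term keeps the answer.
print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # (0.0, 2.0)

# Three magnitude scales: the correction term itself loses the 1.0 values.
print(neumaier_sum([1.0, 1e100, 1e50, 1.0, -1e50, -1e100]))  # (0.0, 0.0)
```

In the second call the correction term has to hold both ``1e50`` and ``1.0`` at
once, and a single float cannot represent that sum exactly, so the small
values are rounded away.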
You can use ``+=`` with a float value or a Sum to fill as well.

WeightedSum
^^^^^^^^^^^

This accumulator is contained in the Weight storage, and supports Views. It
provides two values: ``.value`` and ``.variance``. The value is the sum of the
weights, and the variance is the sum of the squared weights.
For example, you could sum the following values::

    import boost_histogram as bh

    values = [10]*10
    smooth = bh.accumulators.WeightedSum().fill(values)
    print(f"{smooth = }")

    values = [1]*9 + [91]
    rough = bh.accumulators.WeightedSum().fill(values)
    print(f"{rough = }")

.. code-block:: text

    smooth = WeightedSum(value=100, variance=1000)
    rough = WeightedSum(value=100, variance=8290)

When filling, you can optionally provide a ``variance=`` keyword, with either a
single value or a matching-length array of values.

You can also fill with ``+=`` on a value or another WeightedSum.
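
The arithmetic behind the two printed results can be checked by hand. The
sketch below is plain Python, not the library's implementation, and the helper
name ``weighted_sum`` is hypothetical; it only covers the default case where no
``variance=`` keyword is given.

```python
def weighted_sum(weights):
    """Return (value, variance) for a list of weights.

    Hypothetical helper for illustration: value is the sum of the weights,
    variance is the sum of the squared weights.
    """
    value = sum(weights)
    variance = sum(w * w for w in weights)
    return value, variance


print(weighted_sum([10] * 10))       # (100, 1000)
print(weighted_sum([1] * 9 + [91]))  # (100, 8290)
```

Both inputs have the same total, but the single large weight of 91 contributes
91 squared (8281) to the variance, which is why the "rough" case is so much
larger.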

Mean
^^^^

This accumulator is contained in the Mean storage, and supports Views. It
provides three values: ``.count``, ``.value``, and ``.variance``. Internally,
the variance is stored as ``_sum_of_deltas_squared``, which is used to compute
``variance``.
For example, you could compute the mean of the following values::

    import boost_histogram as bh

    values = [10]*10
    smooth = bh.accumulators.Mean().fill(values)
    print(f"{smooth = }")

    values = [1]*9 + [91]
    rough = bh.accumulators.Mean().fill(values)
    print(f"{rough = }")

.. code-block:: text

    smooth = Mean(count=10, value=10, variance=0)
    rough = Mean(count=10, value=10, variance=810)

You can add a ``weight=`` keyword when filling, with either a single value
or a matching-length array of values.

You can also call a Mean with a value or with another Mean to fill in place.
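
To show how a running sum of squared deltas can yield the variance, here is a
minimal pure-Python sketch using Welford's online algorithm. It is an
illustration under assumptions, not the library's code: the class name
``RunningMean`` is hypothetical, and dividing by ``count - 1`` (the sample
variance) is assumed because it reproduces the ``variance=810`` printed above.

```python
class RunningMean:
    """Hypothetical stand-in for illustration; unweighted fills only."""

    def __init__(self):
        self.count = 0
        self.value = 0.0
        self._sum_of_deltas_squared = 0.0

    def fill(self, values):
        for x in values:
            self.count += 1
            delta = x - self.value           # distance from the old mean
            self.value += delta / self.count
            # Welford update: old-mean delta times new-mean delta.
            self._sum_of_deltas_squared += delta * (x - self.value)
        return self

    @property
    def variance(self):
        # Assumed: sample variance, undefined for fewer than two entries.
        return self._sum_of_deltas_squared / (self.count - 1)


rough = RunningMean().fill([1] * 9 + [91])
print(rough.count, rough.value, rough.variance)  # 10 10.0 810.0
```

Storing the sum of squared deltas rather than the variance itself lets each
fill update the state in one pass, without revisiting earlier values.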

WeightedMean
^^^^^^^^^^^^

This accumulator is contained in the WeightedMean storage, and supports Views.
It provides four values: ``.sum_of_weights``, ``.sum_of_weights_squared``,
``.value``, and ``.variance``. Internally, the variance is stored as
``_sum_of_weighted_deltas_squared``, which is used to compute ``variance``.
For example, you could compute the mean of the following values::

    import boost_histogram as bh

    values = [1]*9 + [91]
    wm = bh.accumulators.WeightedMean().fill(values, weight=2)
    print(f"{wm = }")

.. code-block:: text

    wm = WeightedMean(sum_of_weights=20, sum_of_weights_squared=40, value=10, variance=810)

You can add a ``weight=`` keyword when filling, with either a single value or a
matching-length array of values.

You can also call a WeightedMean with a value or with another WeightedMean to
fill in place.
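
The four fields in the printed result can be reproduced with a weighted form of
the online mean update. The sketch below is plain Python written for
illustration; the function name ``weighted_mean`` is hypothetical, and the
variance formula is an assumption chosen because it matches the output shown
above.

```python
def weighted_mean(values, weight):
    """Return (sum_of_weights, sum_of_weights_squared, value, variance).

    Hypothetical helper; ``weight`` is a single number applied to every value.
    """
    sum_of_weights = 0.0
    sum_of_weights_squared = 0.0
    value = 0.0
    sum_of_weighted_deltas_squared = 0.0
    for x in values:
        sum_of_weights += weight
        sum_of_weights_squared += weight * weight
        delta = x - value                       # distance from the old mean
        value += weight * delta / sum_of_weights
        sum_of_weighted_deltas_squared += weight * delta * (x - value)
    # Assumed normalization: effective degrees of freedom for weighted data.
    variance = sum_of_weighted_deltas_squared / (
        sum_of_weights - sum_of_weights_squared / sum_of_weights
    )
    return sum_of_weights, sum_of_weights_squared, value, variance


print(weighted_mean([1] * 9 + [91], weight=2))  # (20.0, 40.0, 10.0, 810.0)
```

With a constant weight the mean and variance equal the unweighted results;
only the two weight sums change.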

Views
-----

Every accumulator except Sum supports a View, which is what a histogram returns
when ``.view()`` is requested. A View is a structured NumPy ndarray with a few
small additions that make it easier to work with. As with a NumPy recarray, you
can access the fields as attributes; you can even access (but not set) computed
attributes like ``.variance``. Selecting a single item from a view returns an
accumulator instance. You can set a view's contents with a stacked array, in
which each item in the stack supplies one of the (computed) values that a normal
constructor would take. For example, WeightedMean can take an array whose final
dimension has length four, holding the ``sum_of_weights``,
``sum_of_weights_squared``, ``value``, and ``variance`` elements, even though
several of these values are computed from the internal representation.