.. _usage-accumulators:

Accumulators
============

Common properties
-----------------

All accumulators can be filled like a histogram: call ``.fill`` with values,
and it looks and behaves like filling a single-bin, or "scalar", histogram. As
with histograms, the fill happens in place.

All accumulators have a ``.value`` property as well, which gives the primary
value being accumulated.

Types
-----

There are four accumulators: Sum, WeightedSum, Mean, and WeightedMean.

Sum
^^^

This is the simplest accumulator, and it is never returned from a histogram. It
is used internally by the Double and Unlimited storages to perform sums when
needed. It uses a highly accurate Neumaier sum, which computes the floating
point sum together with a correction term. Since this accumulator is never
returned by a histogram, it is not available in view form; it exists only as a
single accumulator, for comparison and for access to the algorithm. The
following example requires Python 3.8 or later (for the ``=`` specifier in
f-strings) and shows how less accurate sums fail to produce the obvious answer,
2.0::

    import math
    import numpy as np
    import boost_histogram as bh

    values = [1.0, 1e100, 1.0, -1e100]
    print(f"{sum(values) = } (simple)")
    print(f"{math.fsum(values) = }")
    print(f"{np.sum(values) = } (pairwise)")
    print(f"{bh.accumulators.Sum().fill(values) = }")

.. code-block:: text

    sum(values) = 0.0 (simple)
    math.fsum(values) = 2.0
    np.sum(values) = 0.0 (pairwise)
    bh.accumulators.Sum().fill(values) = Sum(0 + 2)


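Note that this accumulator is still designed for performance, and it does not
guarantee an accurate result the way ``math.fsum`` does. In general, the values
must not span more than two widely separated orders of magnitude, since a
single correction term cannot absorb cancellations at every scale::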

    import math
    import boost_histogram as bh

    values = [1., 1e100, 1e50, 1., -1e50, -1e100]
    print(f"{math.fsum(values) = }")
    print(f"{bh.accumulators.Sum().fill(values) = }")

.. code-block:: text

    math.fsum(values) = 2.0
    bh.accumulators.Sum().fill(values) = Sum(0 + 0)

This is a highly contrived example; the Sum accumulator should still be more
accurate than simple and pairwise summation, at minimal performance cost.
Triggering this failure requires large cancellations with negative values,
which histograms generally do not have.

You can use ``+=`` with a float value or a Sum to fill as well.
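
To make the algorithm concrete, here is a minimal pure-Python sketch of
Neumaier summation. It is an illustration only, not boost-histogram's
implementation, and the function name ``neumaier_sum`` is hypothetical. It
keeps a running sum and a separate correction term, which appear to correspond
to the two numbers in the ``Sum(0 + 2)`` output, and it reproduces both results
above.

.. code-block:: python

    def neumaier_sum(values):
        """Return (running_sum, correction) using Neumaier's compensated summation.

        Illustrative sketch only; not boost-histogram's implementation.
        """
        total = 0.0       # running floating-point sum
        correction = 0.0  # accumulated rounding error
        for x in values:
            t = total + x
            # Recover the low-order bits lost when forming t, using whichever
            # operand has the larger magnitude as the reference.
            if abs(total) >= abs(x):
                correction += (total - t) + x
            else:
                correction += (x - t) + total
            total = t
        return total, correction


    for values in ([1.0, 1e100, 1.0, -1e100], [1.0, 1e100, 1e50, 1.0, -1e50, -1e100]):
        total, correction = neumaier_sum(values)
        print(f"{total} + {correction} = {total + correction}")

This prints ``0.0 + 2.0 = 2.0`` and then ``0.0 + 0.0 = 0.0``: in the second
case the correction term is sitting at the scale of ``1e50`` when the two ``1.``
values arrive, so they are lost.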

WeightedSum
^^^^^^^^^^^

This accumulator is contained in the Weight storage, and supports Views. It
provides two values: ``.value`` and ``.variance``. The value is the sum of the
weights, and the variance is the sum of the squared weights.

For example, you could sum the following values::

    import boost_histogram as bh

    values = [10]*10
    smooth = bh.accumulators.WeightedSum().fill(values)
    print(f"{smooth = }")

    values = [1]*9 + [91]
    rough = bh.accumulators.WeightedSum().fill(values)
    print(f"{rough =  }")

.. code-block:: text

    smooth = WeightedSum(value=100, variance=1000)
    rough =  WeightedSum(value=100, variance=8290)

When filling, you can optionally provide a ``variance=`` keyword, with either a
single value or an array of values of matching length.

You can also fill with ``+=``, using a value or another WeightedSum.
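
The bookkeeping is simple enough to sketch in plain Python. The helper below
(``weighted_sum``, a hypothetical name, not part of boost-histogram) applies
the two definitions above, treating each filled value as a weight, and
reproduces the ``smooth`` and ``rough`` results. It does not model the
``variance=`` keyword.

.. code-block:: python

    def weighted_sum(weights):
        """Return (value, variance) as defined for WeightedSum.

        value    = sum of the weights
        variance = sum of the squared weights
        """
        value = 0.0
        variance = 0.0
        for w in weights:
            value += w
            variance += w * w
        return value, variance


    print(weighted_sum([10] * 10))       # smooth
    print(weighted_sum([1] * 9 + [91]))  # rough

Both inputs have the same total weight; only the variance distinguishes many
similar weights from a sum dominated by one large weight.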

Mean
^^^^

This accumulator is contained in the Mean storage, and supports Views. It
provides three values: ``.count``, ``.value``, and ``.variance``. Internally,
the variance is not stored directly; the accumulator keeps
``_sum_of_deltas_squared``, from which ``.variance`` is computed.

For example, you could compute the mean of the following values::

    import boost_histogram as bh

    values = [10]*10
    smooth = bh.accumulators.Mean().fill(values)
    print(f"{smooth = }")

    values = [1]*9 + [91]
    rough = bh.accumulators.Mean().fill(values)
    print(f"{rough =  }")

.. code-block:: text

    smooth = Mean(count=10, value=10, variance=0)
    rough =  Mean(count=10, value=10, variance=810)

You can add a ``weight=`` keyword when filling, with either a single value or
an array of values of matching length.

You can also call a Mean with a value, or with another Mean, to fill it in
place.
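
A running mean and a sum of squared deltas can be maintained with Welford's
online algorithm. The sketch below assumes that update rule and a
sample-variance denominator of ``count - 1``; it is an illustration, not
boost-histogram's code, and ``running_mean`` is a hypothetical name. Under
these assumptions it reproduces both outputs above (for ``rough``, the squared
deltas sum to 7290, and 7290 / 9 = 810).

.. code-block:: python

    def running_mean(values):
        """Return (count, mean, variance) via Welford's online algorithm."""
        count = 0
        mean = 0.0
        sum_of_deltas_squared = 0.0
        for x in values:
            count += 1
            delta = x - mean       # distance from the old mean
            mean += delta / count  # update the mean incrementally
            # Multiply the old delta by the distance to the new mean
            # (Welford's numerically stable update).
            sum_of_deltas_squared += delta * (x - mean)
        # Sample variance; undefined for fewer than two entries.
        variance = sum_of_deltas_squared / (count - 1) if count > 1 else float("nan")
        return count, mean, variance


    print(running_mean([10] * 10))       # smooth
    print(running_mean([1] * 9 + [91]))  # rough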

WeightedMean
^^^^^^^^^^^^

This accumulator is contained in the WeightedMean storage, and supports Views.
It provides four values: ``.sum_of_weights``, ``.sum_of_weights_squared``,
``.value``, and ``.variance``. Internally, the variance is not stored directly;
the accumulator keeps ``_sum_of_weighted_deltas_squared``, from which
``.variance`` is computed.

For example, you could compute the mean of the following values::

    import boost_histogram as bh

    values = [1]*9 + [91]
    wm = bh.accumulators.WeightedMean().fill(values, weight=2)
    print(f"{wm = }")

.. code-block:: text

    wm = WeightedMean(sum_of_weights=20, sum_of_weights_squared=40, value=10, variance=810)

You can add a ``weight=`` keyword when filling, with either a single value or
an array of values of matching length.

You can also call a WeightedMean with a value, or with another WeightedMean, to
fill it in place.
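
The weighted case generalizes the same idea. The sketch below assumes a
weighted Welford update and computes the variance as the sum of weighted
squared deltas divided by ``sum_of_weights - sum_of_weights_squared /
sum_of_weights`` (a reliability-weights correction). That denominator is an
assumption, chosen because it reproduces ``variance=810`` in the example output
above: 14580 / (20 - 40 / 20) = 810. The name ``running_weighted_mean`` is
hypothetical, and this is not boost-histogram's code.

.. code-block:: python

    def running_weighted_mean(values, weights):
        """Return (sum_of_weights, sum_of_weights_squared, mean, variance)."""
        sum_w = 0.0
        sum_w2 = 0.0
        mean = 0.0
        sum_of_weighted_deltas_squared = 0.0
        for x, w in zip(values, weights):
            if w == 0:
                continue  # a zero weight changes nothing (and could divide by zero)
            sum_w += w
            sum_w2 += w * w
            delta = x - mean
            mean += w * delta / sum_w
            sum_of_weighted_deltas_squared += w * delta * (x - mean)
        # Reliability-weights denominator; equals count - 1 when every weight is 1.
        denom = sum_w - sum_w2 / sum_w if sum_w > 0 else 0.0
        variance = sum_of_weighted_deltas_squared / denom if denom > 0 else float("nan")
        return sum_w, sum_w2, mean, variance


    values = [1] * 9 + [91]
    print(running_weighted_mean(values, [2] * len(values)))

With all weights equal to 1, the denominator becomes ``count - 1``, so this
agrees with the unweighted Mean result.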

Views
-----

All accumulators except Sum support a View, which is what a histogram returns
when ``.view()`` is requested. A view is a structured NumPy ndarray with a few
small additions that make it easier to work with. As with a NumPy recarray, you
can access the fields as attributes; you can even access (but not set) computed
attributes like ``.variance``. Selecting a single item from a view returns an
accumulator instance.

You can set a view's contents with a stacked array, and each item in the stack
is used for the (computed) values that a normal constructor would take. For
example, a WeightedMean view can be set from an array whose last dimension has
length four, holding ``sum_of_weights``, ``sum_of_weights_squared``, ``value``,
and ``variance``, even though several of these values are computed from the
internal representation.
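
The view machinery can be imitated with plain NumPy. The sketch below is
hypothetical: it builds a structured array with fields named after the Mean
accumulator's stored values (``count``, ``value``, and
``_sum_of_deltas_squared``; the exact layout is an assumption), computes
``variance`` on demand instead of storing it, and fills the array from a
stacked array of ``(count, value, variance)`` rows by converting each variance
back to the stored form. It assumes ``variance = _sum_of_deltas_squared /
(count - 1)``, which matches the ``rough`` Mean output above. The helper names
are illustrative only.

.. code-block:: python

    import numpy as np

    # Hypothetical field layout mirroring the Mean accumulator's stored values.
    mean_dtype = np.dtype(
        [("count", "f8"), ("value", "f8"), ("_sum_of_deltas_squared", "f8")]
    )


    def variance(arr):
        """Computed attribute: derived from the stored fields, never stored itself."""
        return arr["_sum_of_deltas_squared"] / (arr["count"] - 1)


    def set_from_stack(arr, stacked):
        """Fill `arr` from an array whose last axis holds (count, value, variance)."""
        count, value, var = np.moveaxis(np.asarray(stacked, dtype=float), -1, 0)
        arr["count"] = count
        arr["value"] = value
        # Convert the user-facing variance back to the internal representation.
        arr["_sum_of_deltas_squared"] = var * (count - 1)


    storage = np.zeros(2, dtype=mean_dtype)
    set_from_stack(storage, [[10, 10, 0], [10, 10, 810]])

    rec = storage.view(np.recarray)  # attribute access, like a NumPy recarray
    print(rec.count, rec.value, variance(storage))
    print(storage[1])  # a single item is one record

In boost-histogram itself, selecting a single item returns an accumulator
instance rather than a bare record, as described above.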