File: index.rst

package info (click to toggle)
astropy 7.0.1-3
  • links: PTS, VCS
  • area: main
  • in suites: trixie
  • size: 35,328 kB
  • sloc: python: 233,437; ansic: 55,264; javascript: 17,680; lex: 8,621; sh: 3,317; xml: 2,287; makefile: 191
file content (371 lines) | stat: -rw-r--r-- 15,060 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
.. _astropy-uncertainty:

.. testsetup::

    >>> import numpy as np
    >>> np.random.seed(12345)

*******************************************************
Uncertainties and Distributions (`astropy.uncertainty`)
*******************************************************

Introduction
============

.. note:: This subpackage is still in development.

``astropy`` provides a |Distribution| object to represent statistical
distributions in a form that acts as a drop-in replacement for a |Quantity|
object or a regular |ndarray|. Used in this manner, |Distribution| provides
uncertainty propagation at the cost of additional computation. It can also more
generally represent sampled distributions for Monte Carlo calculation
techniques, for instance.

The core object for this feature is the |Distribution|. Currently, all
such distributions are Monte Carlo sampled. While this means each distribution
may take more memory, it allows arbitrarily complex operations to be performed
on distributions while maintaining their correlation structure. Some specific
well-behaved distributions (e.g., the normal distribution) have
analytic forms which may eventually enable a more compact and efficient
representation. In the future, these may provide a coherent uncertainty
propagation mechanism to work with `~astropy.nddata.NDData`. However, this is
not currently implemented. Hence, details of storing uncertainties for
`~astropy.nddata.NDData` objects can be found in the :ref:`astropy_nddata`
section.

Getting Started
===============

To demonstrate a basic use case for distributions, consider the problem of
uncertainty propagation of normal distributions. Assume there are two
measurements you wish to add, each with normal uncertainties. We start
with some initial imports and setup::

  >>> import numpy as np
  >>> from astropy import units as u
  >>> from astropy import uncertainty as unc
  >>> rng = np.random.default_rng(12345)  # ensures reproducible example numbers

Now we create two |Distribution| objects to represent our distributions::

  >>> a = unc.normal(1*u.kpc, std=30*u.pc, n_samples=10000)
  >>> b = unc.normal(2*u.kpc, std=40*u.pc, n_samples=10000)

For normal distributions, the centers should add as expected, and the standard
deviations add in quadrature. We can check these results (to the limits of our
Monte Carlo sampling) trivially with |Distribution| arithmetic and attributes::

  >>> c = a + b
  >>> c # doctest: +ELLIPSIS
  <QuantityDistribution [...] kpc with n_samples=10000>
  >>> c.pdf_mean() # doctest: +FLOAT_CMP
  <Quantity 2.99970555 kpc>
  >>> c.pdf_std().to(u.pc) # doctest: +FLOAT_CMP
  <Quantity 50.07120457 pc>

Indeed these are close to the expectations. While this may seem unnecessary for
the basic Gaussian case, for more complex distributions or arithmetic
operations where error analysis becomes untenable, |Distribution| still powers
through::

  >>> d = unc.uniform(center=3*u.kpc, width=800*u.pc, n_samples=10000)
  >>> e = unc.Distribution(((rng.beta(2,5, 10000)-(2/7))/2 + 3)*u.kpc)
  >>> f = (c * d * e) ** (1/3)
  >>> f.pdf_mean() # doctest: +FLOAT_CMP
  <Quantity 2.99760998 kpc>
  >>> f.pdf_std() # doctest: +FLOAT_CMP
  <Quantity 0.08308941 kpc>
  >>> from matplotlib import pyplot as plt # doctest: +SKIP
  >>> from astropy.visualization import quantity_support # doctest: +SKIP
  >>> with quantity_support():
  ...     plt.hist(f.distribution, bins=50) # doctest: +SKIP

.. plot::

  import numpy as np
  from astropy import units as u
  from astropy import uncertainty as unc
  from astropy.visualization import quantity_support
  from matplotlib import pyplot as plt
  rng = np.random.default_rng()
  a = unc.normal(1*u.kpc, std=30*u.pc, n_samples=10000)
  b = unc.normal(2*u.kpc, std=40*u.pc, n_samples=10000)
  c = a + b
  d = unc.uniform(center=3*u.kpc, width=800*u.pc, n_samples=10000)
  e = unc.Distribution(((rng.beta(2,5, 10000)-(2/7))/2 + 3)*u.kpc)
  f = (c * d * e) ** (1/3)
  with quantity_support():
      plt.hist(f.distribution, bins=50)


Using `astropy.uncertainty`
===========================

Creating Distributions
----------------------

.. EXAMPLE START: Creating Distributions Using Arrays or Quantities

The most direct way to create a distribution is to use an array or |Quantity|
that carries the samples in the *last* dimension::

  >>> import numpy as np
  >>> from astropy import units as u
  >>> from astropy import uncertainty as unc
  >>> rng = np.random.default_rng(123456)  # ensures "random" numbers match examples below
  >>> unc.Distribution(rng.poisson(12, (1000)))  # doctest: +IGNORE_OUTPUT
  NdarrayDistribution([..., 12,...]) with n_samples=1000
  >>> pq = rng.poisson([1, 5, 30, 400], (1000, 4)).T * u.ct # note the transpose, required to get the sampling on the *last* axis
  >>> distr = unc.Distribution(pq)
  >>> distr # doctest: +ELLIPSIS
  <QuantityDistribution [[...],
             [...],
             [...],
             [...]] ct with n_samples=1000>

Note the distinction for these two distributions: the first is built from an
array and therefore does not have |Quantity| attributes like ``unit``, while the
latter does have these attributes. This is reflected in how they interact with
other objects, for example, the ``NdarrayDistribution`` will not combine with
|Quantity| objects containing units.

.. EXAMPLE END

.. EXAMPLE START: Creating Distributions Using Helper Functions

For commonly used distributions, helper functions exist to make creating them
more convenient. The examples below demonstrate several equivalent ways to
create a normal/Gaussian distribution::

  >>> center = [1, 5, 30, 400]
  >>> n_distr = unc.normal(center*u.kpc, std=[0.2, 1.5, 4, 1]*u.kpc, n_samples=1000)
  >>> n_distr = unc.normal(center*u.kpc, var=[0.04, 2.25, 16, 1]*u.kpc**2, n_samples=1000)
  >>> n_distr = unc.normal(center*u.kpc, ivar=[25, 0.44444444, 0.625, 1]*u.kpc**-2, n_samples=1000)
  >>> n_distr.distribution.shape
  (4, 1000)
  >>> unc.normal(center*u.kpc, std=[0.2, 1.5, 4, 1]*u.kpc, n_samples=100).distribution.shape
  (4, 100)
  >>> unc.normal(center*u.kpc, std=[0.2, 1.5, 4, 1]*u.kpc, n_samples=20000).distribution.shape
  (4, 20000)

Additionally, Poisson and uniform |Distribution| creation functions exist::

  >>> unc.poisson(center*u.count, n_samples=1000) # doctest: +ELLIPSIS
  <QuantityDistribution [[...],
               [...],
               [...],
               [...]] ct with n_samples=1000>
  >>> uwidth = [10, 20, 10, 55]*u.pc
  >>> unc.uniform(center=center*u.kpc, width=uwidth, n_samples=1000) # doctest: +ELLIPSIS
  <QuantityDistribution [[...],
               [...],
               [...],
               [...]] kpc with n_samples=1000>
  >>> unc.uniform(lower=center*u.kpc - uwidth/2,  upper=center*u.kpc + uwidth/2, n_samples=1000)  # doctest: +ELLIPSIS
  <QuantityDistribution [[...],
               [...],
               [...],
               [...]] kpc with n_samples=1000>

.. EXAMPLE END

Users are free to create their own distribution classes following similar
patterns.

Using Distributions
-------------------

.. EXAMPLE START: Accessing Properties of Distributions

This object now acts much like a |Quantity| or |ndarray| for all but the
non-sampled dimension, but with additional statistical operations that work on
the sampled distributions::

  >>> distr.shape
  (4,)
  >>> distr.size
  4
  >>> distr.unit
  Unit("ct")
  >>> distr.n_samples
  1000
  >>> distr.pdf_mean() # doctest: +FLOAT_CMP
  <Quantity [  1.034,   5.026,  29.994, 400.365] ct>
  >>> distr.pdf_std() # doctest: +FLOAT_CMP
  <Quantity [ 1.04539179,  2.19484031,  5.47776998, 19.87022333] ct>
  >>> distr.pdf_var() # doctest: +FLOAT_CMP
  <Quantity [  1.092844,   4.817324,  30.005964, 394.825775] ct2>
  >>> distr.pdf_median()
  <Quantity [   1.,   5.,  30., 400.] ct>
  >>> distr.pdf_mad()  # Median absolute deviation # doctest: +FLOAT_CMP
  <Quantity [ 1.,  1.,  4., 13.] ct>
  >>> distr.pdf_smad()  # Median absolute deviation, rescaled to match std for normal # doctest: +FLOAT_CMP
  <Quantity [ 1.48260222,  1.48260222,  5.93040887, 19.27382884] ct>
  >>> distr.pdf_percentiles([10, 50, 90])
  <Quantity [[  0. ,   2. ,  23. , 375. ],
             [  1. ,   5. ,  30. , 400. ],
             [  2. ,   8. ,  37. , 426.1]] ct>
  >>> distr.pdf_percentiles([.1, .5, .9]*u.dimensionless_unscaled)
  <Quantity [[  0. ,   2. ,  23. , 375. ],
            [  1. ,   5. ,  30. , 400. ],
            [  2. ,   8. ,  37. , 426.1]] ct>

If need be, the underlying array can then be accessed from the ``distribution``
attribute::

  >>> distr.distribution
  <Quantity [[  2.,   2.,   0., ...,   1.,   0.,   1.],
             [  3.,   2.,   8., ...,   8.,   3.,   3.],
             [ 31.,  30.,  32., ...,  20.,  34.,  31.],
             [354., 373., 384., ..., 410., 404., 395.]] ct>
  >>> distr.distribution.shape
  (4, 1000)

.. EXAMPLE END

.. EXAMPLE START: Interaction Between Quantity Objects and Distributions

A |Quantity| distribution interacts naturally with non-|Distribution|
|Quantity| objects, assuming the |Quantity| is a Dirac delta distribution::

  >>> distr_in_kpc = distr * u.kpc/u.count  # for the sake of round numbers in examples
  >>> distrplus = distr_in_kpc + [2000,0,0,500]*u.pc
  >>> distrplus.pdf_median()
  <Quantity [   3. ,   5. ,  30. , 400.5] kpc>
  >>> distrplus.pdf_var() # doctest: +FLOAT_CMP
  <Quantity [  1.092844,   4.817324,  30.005964, 394.825775] kpc2>

It also operates as expected with other distributions (but see below for a
discussion of covariances)::

  >>> means = [2000, 0, 0, 500]
  >>> sigmas = [1000, .01, 3000, 10]
  >>> another_distr = unc.Distribution((rng.normal(means, sigmas, (1000,4))).T * u.pc)
  >>> combined_distr = distr_in_kpc + another_distr
  >>> combined_distr.pdf_median()  # doctest: +FLOAT_CMP
  <Quantity [  2.81374275,   4.99999631,  29.7150889 , 400.49576691] kpc>
  >>> combined_distr.pdf_var()  # doctest: +FLOAT_CMP
  <Quantity [  2.15512118,   4.817324  ,  39.0614616 , 394.82969655] kpc2>

.. EXAMPLE END

Covariance in Distributions and Discrete Sampling Effects
---------------------------------------------------------

One of the main applications for distributions is uncertainty propagation, which
critically requires proper treatment of covariance. This comes naturally in the
Monte Carlo sampling approach used by the |Distribution| class, as long as
proper care is taken with sampling error.

.. EXAMPLE START: Covariance in Distributions

To start with a basic example, two un-correlated distributions should produce
an un-correlated joint distribution plot:

.. plot::
  :context: close-figs
  :include-source:

  >>> import numpy as np
  >>> from astropy import units as u
  >>> from astropy import uncertainty as unc
  >>> from matplotlib import pyplot as plt # doctest: +SKIP
  >>> n1 = unc.normal(center=0., std=1, n_samples=10000)
  >>> n2 = unc.normal(center=0., std=2, n_samples=10000)
  >>> plt.scatter(n1.distribution, n2.distribution, s=2, lw=0, alpha=.5) # doctest: +SKIP
  >>> plt.xlim(-4, 4) # doctest: +SKIP
  >>> plt.ylim(-4, 4) # doctest: +SKIP

Indeed, the distributions are independent. If we instead construct a covariant
pair of Gaussians, it is immediately apparent:

.. plot::
  :context: close-figs
  :include-source:

  >>> rng = np.random.default_rng(357)
  >>> ncov = rng.multivariate_normal([0, 0], [[1, .5], [.5, 2]], size=10000)
  >>> n1 = unc.Distribution(ncov[:, 0])
  >>> n2 = unc.Distribution(ncov[:, 1])
  >>> plt.scatter(n1.distribution, n2.distribution, s=2, lw=0, alpha=.5) # doctest: +SKIP
  >>> plt.xlim(-4, 4) # doctest: +SKIP
  >>> plt.ylim(-4, 4) # doctest: +SKIP

Most importantly, the proper correlated structure is preserved or generated as
expected by appropriate arithmetic operations. For example, ratios of
uncorrelated normal distribution gain covariances if the axes are not
independent, as in this simulation of iron, hydrogen, and oxygen abundances in
a hypothetical collection of stars:

.. plot::
  :context: close-figs
  :include-source:

  >>> fe_abund = unc.normal(center=-2, std=.25, n_samples=10000)
  >>> o_abund = unc.normal(center=-6., std=.5, n_samples=10000)
  >>> h_abund = unc.normal(center=-0.7, std=.1, n_samples=10000)
  >>> feh = fe_abund - h_abund
  >>> ofe = o_abund - fe_abund
  >>> plt.scatter(ofe.distribution, feh.distribution, s=2, lw=0, alpha=.5) # doctest: +SKIP
  >>> plt.xlabel('[Fe/H]') # doctest: +SKIP
  >>> plt.ylabel('[O/Fe]') # doctest: +SKIP

This demonstrates that the correlations naturally arise from the variables, but
there is no need to explicitly account for it: the sampling process naturally
recovers correlations that are present.

.. EXAMPLE END

.. EXAMPLE START: Preserving Covariance in Distributions

An important note of warning, however, is that the covariance is only preserved
if the sampling axes are exactly matched sample by sample. If they are not, all
covariance information is (silently) lost:

.. plot::
  :context: close-figs
  :include-source:

  >>> n2_wrong = unc.Distribution(ncov[::-1, 1])  #reverse the sampling axis order
  >>> plt.scatter(n1.distribution, n2_wrong.distribution, s=2, lw=0, alpha=.5) # doctest: +SKIP
  >>> plt.xlim(-4, 4) # doctest: +SKIP
  >>> plt.ylim(-4, 4) # doctest: +SKIP

Moreover, an insufficiently sampled distribution may give poor estimates or
hide correlations. The example below is the same as the covariant Gaussian
example above, but with 200x fewer samples:


.. plot::
  :context: close-figs
  :include-source:

  >>> ncov = rng.multivariate_normal([0, 0], [[1, .5], [.5, 2]], size=50)
  >>> n1 = unc.Distribution(ncov[:, 0])
  >>> n2 = unc.Distribution(ncov[:, 1])
  >>> plt.scatter(n1.distribution, n2.distribution, s=5, lw=0) # doctest: +SKIP
  >>> plt.xlim(-4, 4) # doctest: +SKIP
  >>> plt.ylim(-4, 4) # doctest: +SKIP
  >>> np.cov(n1.distribution, n2.distribution) # doctest: +FLOAT_CMP
  array([[0.95534365, 0.35220031],
         [0.35220031, 1.99511743]])

The covariance structure is much less apparent by eye, and this is reflected
in significant discrepancies between the input and output covariance matrix.
In general this is an intrinsic trade-off using sampled distributions: a smaller
number of samples is computationally more efficient, but leads to larger
uncertainties in any of the relevant quantities. These tend to be of order
:math:`\sqrt{n_{\rm samples}}` in any derived quantity, but that depends on the
complexity of the distribution in question.

.. EXAMPLE END

.. note that if this section gets too long, it should be moved to a separate
   doc page - see the top of performance.inc.rst for the instructions on how to do
   that
.. include:: performance.inc.rst

Reference/API
=============

.. automodapi:: astropy.uncertainty