File: robust.rst

package info (click to toggle)
astropy 5.2.1-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 41,972 kB
  • sloc: python: 219,331; ansic: 147,297; javascript: 13,556; lex: 8,496; sh: 3,319; xml: 1,622; makefile: 185
file content (191 lines) | stat: -rw-r--r-- 6,397 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
.. _stats-robust:

.. testsetup::

    >>> import numpy as np
    >>> np.random.seed(0)

*****************************
Robust Statistical Estimators
*****************************

Robust statistics provides reliable estimates of basic statistics for complex
distributions. The statistics package includes several robust statistical
functions that are commonly used in astronomy. This includes methods for
rejecting outliers as well as statistical description of the underlying
distributions.

In addition to the functions mentioned here, models can be fit with outlier
rejection using :func:`~astropy.modeling.fitting.FittingWithOutlierRemoval`.

Sigma Clipping
==============

Sigma clipping provides a fast method for identifying outliers in a
distribution. For a distribution of points, a center and a standard
deviation are calculated. Values which are less or more than a
specified number of standard deviations from a center value are
rejected. The process can be iterated to further reject outliers.

The `astropy.stats` package provides both a functional and
object-oriented interface for sigma clipping. The function is called
:func:`~astropy.stats.sigma_clip` and the class is called
:class:`~astropy.stats.SigmaClip`. By default, they both return a
masked array where the rejected points are masked.

Examples
--------

..
  EXAMPLE START
  Functional Sigma Clipping with astropy.stats.sigma_clip

We can start by generating some data that has a mean of 0 and standard
deviation of 0.2, but with outliers:

.. doctest-requires:: scipy

     >>> import numpy as np
     >>> import scipy.stats as stats
     >>> rng = np.random.default_rng(0)
     >>> x = np.arange(200)
     >>> y = np.zeros(200)
     >>> c = stats.bernoulli.rvs(0.35, size=x.shape)
     >>> y += (rng.normal(0., 0.2, x.shape) +
     ...       c * rng.normal(3.0, 5.0, x.shape))

Now we can use :func:`~astropy.stats.sigma_clip` to perform sigma
clipping on the data:

.. doctest-requires:: scipy

     >>> from astropy.stats import sigma_clip
     >>> filtered_data = sigma_clip(y, sigma=3, maxiters=10)

The output masked array then can be used to calculate statistics on
the data, fit models to the data, or otherwise explore the data.

..
  EXAMPLE END

..
  EXAMPLE START
  Object-Oriented Sigma Clipping with the astropy.stats.SigmaClip Class

To perform the same sigma clipping with the
:class:`~astropy.stats.SigmaClip` class:

.. doctest-requires:: scipy

     >>> from astropy.stats import SigmaClip
     >>> sigclip = SigmaClip(sigma=3, maxiters=10)
     >>> print(sigclip)  # doctest: +SKIP
     <SigmaClip>
        sigma: 3
        sigma_lower: None
        sigma_upper: None
        maxiters: 10
        cenfunc: <function median at 0x108dbde18>
        stdfunc: <function std at 0x103ab52f0>
     >>> filtered_data = sigclip(y)

Note that once the ``sigclip`` instance is defined above, it can be
applied to other data using the same already defined sigma-clipping
parameters.

..
  EXAMPLE END

For basic statistics, :func:`~astropy.stats.sigma_clipped_stats` is a
convenience function to calculate the sigma-clipped mean, median, and
standard deviation of an array. As can be seen, rejecting the
outliers returns accurate values for the underlying distribution.

..
  EXAMPLE START
  Calculating the Sigma-Clipped Mean, Median, and Standard Deviation of an Array

To use :func:`~astropy.stats.sigma_clipped_stats` for sigma-clipped statistics
calculation:

.. doctest-requires:: scipy

     >>> from astropy.stats import sigma_clipped_stats
     >>> y.mean(), np.median(y), y.std()  # doctest: +FLOAT_CMP
     (0.7068938765410144, 0.013567387681385379, 3.599605215851649)
     >>> sigma_clipped_stats(y, sigma=3, maxiters=10)  # doctest: +FLOAT_CMP
     (-0.0228473012826993, -0.02356858871405204, 0.2079616996908159)

:func:`~astropy.stats.sigma_clip` and
:class:`~astropy.stats.SigmaClip` can be combined with other robust
statistics to provide improved outlier rejection as well.

.. plot::
    :include-source:

    import numpy as np
    import scipy.stats as stats
    from matplotlib import pyplot as plt
    from astropy.stats import sigma_clip, mad_std

    # Generate fake data that has a mean of 0 and standard deviation of 0.2 with outliers
    rng = np.random.default_rng(0)
    x = np.arange(200)
    y = np.zeros(200)
    c = stats.bernoulli.rvs(0.35, size=x.shape)
    y += (rng.normal(0., 0.2, x.shape) +
          c * rng.normal(3.0, 5.0, x.shape))

    filtered_data = sigma_clip(y, sigma=3, maxiters=1, stdfunc=mad_std)

    # plot the original and rejected data
    plt.figure(figsize=(8,5))
    plt.plot(x, y, '+', color='#1f77b4', label="original data")
    plt.plot(x[filtered_data.mask], y[filtered_data.mask], 'x',
             color='#d62728', label="rejected data")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend(loc=2, numpoints=1)

.. automodapi:: astropy.stats.sigma_clipping

..
  EXAMPLE END

Median Absolute Deviation
=========================

The median absolute deviation (MAD) is a measure of the spread of a
distribution and is defined as ``median(abs(a - median(a)))``. The
MAD can be calculated using `~astropy.stats.median_absolute_deviation`. For a
normal distribution, the MAD is related to the standard deviation by a factor
of 1.4826, and a convenience function, `~astropy.stats.mad_std`, is
available to apply the conversion.

.. note::

   A function can be supplied to the
   `~astropy.stats.median_absolute_deviation` to specify the median
   function to be used in the calculation. Depending on the version
   of NumPy and whether the array is masked or contains irregular
   values, significant performance increases can be had by
   preselecting the median function. If the median function is not
   specified, `~astropy.stats.median_absolute_deviation` will attempt
   to select the most relevant function according to the input data.


Biweight Estimators
===================

A set of functions are included in the `astropy.stats` package that use the
biweight formalism. These functions have long been used in astronomy,
particularly to calculate the velocity dispersion of galaxy clusters [1]_. The
following set of tasks are available for biweight measurements:

.. automodapi:: astropy.stats.biweight


References
----------

.. [1] Beers, Flynn, and Gebhardt (1990; AJ 100, 32) (https://ui.adsabs.harvard.edu/abs/1990AJ....100...32B)