.. _chi2_fitting_test:

Chi-squared test
----------------

The :math:`\chi^2` test is a statistical test of whether a given sample of data is drawn
from a given discrete distribution. The library only provides the :math:`\chi^2` test for
distributions of dimension 1.

We denote by :math:`\left\{ x_1,\dots,x_{\sampleSize} \right\}` a sample of dimension 1.
Let :math:`F` be the (unknown) cumulative distribution function of the distribution from which the sample is drawn.
We want to
test whether the sample is drawn from the discrete distribution characterized by the
probabilities :math:`\left\{ p(x;\vect{\theta}) \right\}_{x \in \cE}` where
:math:`\vect{\theta}` is the set of parameters of the distribution
and :math:`\cE` its support. Let :math:`G` be the cumulative distribution function of this candidate distribution.

The test relies on a test statistic which measures the distance between the number of
values equal to :math:`x` observed in the sample and the theoretical mean number of such
values under the candidate distribution, accumulated over all the values :math:`x` of the support.

Let :math:`X_1, \ldots , X_{\sampleSize}` be i.i.d. random variables following the
distribution with CDF :math:`F`. According to the tested distribution :math:`G`,
the theoretical mean number of values equal to :math:`x` is :math:`\sampleSize p(x;\vect{\theta})`
whereas the number evaluated from :math:`X_1, \ldots , X_{\sampleSize}` is
:math:`N(x) = \sum_{i=1}^{\sampleSize} 1_{X_i=x}`.
Then the test statistic is defined by:

  .. math::

         D_{\sampleSize} = \sum_{x \in \cE} \frac{\left[\sampleSize p(x;\vect{\theta})-N(x)\right]^2}{N(x)}.
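
The statistic can be sketched in a few lines of standard-library Python. This is only an
illustration and not the library implementation: the helper name ``chi2_statistic``, the
sample and the candidate distribution (a fair six-sided die) are hypothetical. The sketch
follows the formula above as written, so it assumes that every value of the support is
observed at least once in the sample.

.. code-block:: python

    from collections import Counter


    def chi2_statistic(sample, probabilities):
        """Return D_n for a sample and a dict {x: p(x)} describing the candidate distribution."""
        n = len(sample)
        counts = Counter(sample)  # N(x): number of sample points equal to x
        statistic = 0.0
        for x, p in probabilities.items():
            observed = counts[x]
            if observed == 0:
                # The formula divides by N(x): unobserved values must be gathered in classes first.
                raise ValueError(f"value {x!r} is not observed in the sample")
            expected = n * p  # theoretical mean number of values equal to x
            statistic += (expected - observed) ** 2 / observed
        return statistic


    # Hypothetical data: 36 rolls of a die, tested against the uniform distribution on {1, ..., 6}.
    sample = [1] * 5 + [2] * 7 + [3] * 6 + [4] * 6 + [5] * 5 + [6] * 7
    probabilities = {x: 1.0 / 6.0 for x in range(1, 7)}
    d_n = chi2_statistic(sample, probabilities)

Here each value has a theoretical mean count of 6, so only the four values whose observed
count differs from 6 contribute to ``d_n``.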

If some values of :math:`x` have not been observed in the sample, we have to gather values in
classes so that each class contains at least 5 data points (empirical rule). Then the theoretical
probabilities of all the values in a class are added to get the
theoretical probability of the class.
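
The gathering step can be sketched as follows. The text does not say how the classes are
built, so the greedy merging of consecutive values used here is an assumption made for
illustration, and it may differ from what the library does. The helper name
``group_into_classes``, the sample and the probabilities are hypothetical, and the support
is assumed to be finite.

.. code-block:: python

    from collections import Counter


    def group_into_classes(sample, probabilities, min_count=5):
        """Greedily merge consecutive support values into classes of at least min_count points.

        Returns a list of (values, observed_count, class_probability) tuples.
        """
        counts = Counter(sample)
        classes = []
        values, observed, probability = [], 0, 0.0
        for x in sorted(probabilities):
            values.append(x)
            observed += counts[x]
            probability += probabilities[x]  # class probability: sum over the values of the class
            if observed >= min_count:
                classes.append((values, observed, probability))
                values, observed, probability = [], 0, 0.0
        if values:
            # Leftover values that do not reach min_count are merged into the last class, if any.
            if classes:
                last_values, last_observed, last_probability = classes.pop()
                values = last_values + values
                observed += last_observed
                probability += last_probability
            classes.append((values, observed, probability))
        return classes


    # Hypothetical sample on the support {0, 1, 2, 3, 4}; the value 4 is never observed.
    sample = [0] * 6 + [1] * 9 + [2] * 7 + [3] * 3
    probabilities = {0: 0.2, 1: 0.35, 2: 0.25, 3: 0.15, 4: 0.05}
    classes = group_into_classes(sample, probabilities)

With this data, the values 3 and 4 do not gather 5 points on their own, so they are merged
with the value 2 into a single class whose theoretical probability is the sum of the three
individual probabilities.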

Let :math:`d_{\sampleSize}` be the realization of the test statistic :math:`D_{\sampleSize}`
on the sample :math:`\left\{ x_1,\dots,x_{\sampleSize} \right\}`.
Under the null hypothesis :math:`\mathcal{H}_0 = \{ G = F\}`,
the asymptotic distribution of the test statistic :math:`D_{\sampleSize}` is
known: this is the :math:`\chi^2(J-1)` distribution, where :math:`J` is the number
of distinct values in the support of :math:`G` (or the number of classes when values have been gathered).
We apply the test as follows.

We fix a risk :math:`\alpha` (type I error) and we evaluate the associated critical value
:math:`d_\alpha`, which is the quantile of order :math:`1-\alpha` of the distribution of
:math:`D_{\sampleSize}` under :math:`\mathcal{H}_0`.
Then a decision is made, either by comparing the realization :math:`d_{\sampleSize}` to the threshold
:math:`d_\alpha`, or equivalently by evaluating the p-value of the sample, defined as
:math:`\Prob{D_{\sampleSize} > d_{\sampleSize}}`, and comparing it to :math:`\alpha`:

-  if :math:`d_{\sampleSize}>d_{\alpha}` (or equivalently
   :math:`\Prob{D_{\sampleSize} > d_{\sampleSize}} < \alpha`),
   then we reject :math:`G`,

-  if :math:`d_{\sampleSize} \leq d_{\alpha}` (or equivalently
   :math:`\Prob{D_{\sampleSize} > d_{\sampleSize}} \geq \alpha`),
   then :math:`G` is considered acceptable.
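
The decision rule above can be sketched with the Python standard library only. This is an
illustration under stated assumptions, not the library implementation: the helper names
``chi2_sf`` and ``chi2_quantile`` are hypothetical, the realization ``d_n`` and the number of
classes are made-up values, and the quantile is obtained by a simple bisection rather than a
dedicated algorithm.

.. code-block:: python

    import math


    def chi2_sf(x, dof):
        """Survival function P(D > x) of the chi-squared distribution, integer dof >= 1."""
        if x <= 0.0:
            return 1.0
        half = 0.5 * x
        if dof % 2 == 0:
            a, q = 1.0, math.exp(-half)  # Q(1, x/2)
        else:
            a, q = 0.5, math.erfc(math.sqrt(half))  # Q(1/2, x/2)
        # Recurrence on the regularized upper incomplete gamma function:
        # Q(a + 1, z) = Q(a, z) + z^a exp(-z) / Gamma(a + 1)
        while a < 0.5 * dof:
            q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
            a += 1.0
        return q


    def chi2_quantile(prob, dof):
        """Quantile of order prob, found by bisection on the survival function."""
        low, high = 0.0, 1.0
        while chi2_sf(high, dof) > 1.0 - prob:
            high *= 2.0
        for _ in range(200):
            mid = 0.5 * (low + high)
            if chi2_sf(mid, dof) > 1.0 - prob:
                low = mid
            else:
                high = mid
        return 0.5 * (low + high)


    alpha = 0.05  # risk (type I error)
    d_n = 0.6857  # hypothetical realization of the test statistic
    n_classes = 6  # J, hypothetical number of classes
    dof = n_classes - 1

    d_alpha = chi2_quantile(1.0 - alpha, dof)  # critical value
    p_value = chi2_sf(d_n, dof)  # P(D_n > d_n) under the null hypothesis
    reject = d_n > d_alpha  # equivalent to p_value < alpha

Both forms of the rule lead to the same decision, so in practice it is enough to compute
the p-value and compare it to :math:`\alpha`.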


.. topic:: API:

    - See :py:func:`~openturns.FittingTest.ChiSquared`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/statistical_tests/plot_chi2_fitting_test`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [sprent2001]_
    - [bhattacharyya1997]_
    - [burnham2002]_