.. _chi2_independence_test:

Chi-squared test for independence
---------------------------------

The :math:`\chi^2` test can be used to detect dependencies between two discrete random variables.

Let :math:`\vect{X} = (X^1, X^2)` be a random vector of dimension 2 with values in
:math:`\{b_1, \dots, b_{\ell} \} \times \{c_1, \dots, c_{r} \}`.

We want to test whether :math:`\vect{X}` has independent components.

Let :math:`\vect{X}_1, \ldots , \vect{X}_\sampleSize` be i.i.d. random variables following the distribution of :math:`\vect{X}`. Two test statistics can be defined by:

.. math::

       D_{\sampleSize}^{(1)}  = \sum_{i=1}^{\ell} \sum_{j=1}^{r} \dfrac{\left(N_{i,j} -
       \frac{N_{i,.}N_{.,j}}{\sampleSize}\right)^2}{N_{i,j}} \\
       D_{\sampleSize}^{(2)}  = \sampleSize \sum_{i=1}^{\ell} \sum_{j=1}^{r}
       \dfrac{\left(N_{i,j} - \frac{N_{i,.}N_{.,j}}{\sampleSize}\right)^2}{N_{i,.}N_{.,j}}


where:

-  :math:`N_{i,j} = \sum_{k=1}^{\sampleSize}1_{X^1_k = b_i, X^2_k = c_j}` is the number of pairs
   equal to :math:`(b_i, c_j)`,

-  :math:`N_{i,.}= \sum_{k=1}^{\sampleSize}1_{X^1_k = b_i}` is the number of pairs
   whose first component is equal to :math:`b_i`,

-  :math:`N_{., j}= \sum_{k=1}^{\sampleSize}1_{X^2_k = c_j}` is the number of pairs
   whose second component is equal to :math:`c_j`.
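
As an illustration, the counts and both statistics can be computed from a sample of
pairs with the standard library alone. This is a minimal sketch, not the library's
implementation: the function name ``chi2_independence_statistics`` and the sample data
are hypothetical. The deviation is squared in each term, and the first statistic is
assumed to be used only when every cell count :math:`N_{i,j}` is positive.

.. code-block:: python

    from collections import Counter


    def chi2_independence_statistics(pairs):
        """Return (d1, d2, dof) for a list of (x1, x2) pairs."""
        n = len(pairs)
        joint = Counter(pairs)                  # N_{i,j}
        row = Counter(x1 for x1, _ in pairs)    # N_{i,.}
        col = Counter(x2 for _, x2 in pairs)    # N_{.,j}
        d1 = 0.0
        d2 = 0.0
        for b in row:
            for c in col:
                n_ij = joint[(b, c)]
                if n_ij == 0:
                    # Assumption of this sketch: d1 divides by N_{i,j}
                    raise ValueError("empty cell: first statistic undefined")
                deviation = n_ij - row[b] * col[c] / n
                d1 += deviation ** 2 / n_ij
                d2 += n * deviation ** 2 / (row[b] * col[c])
        dof = (len(row) - 1) * (len(col) - 1)   # (l - 1)(r - 1)
        return d1, d2, dof


    # Hypothetical 2 x 3 contingency table, expanded into a sample of 120 pairs
    table = {
        ("b1", "c1"): 10, ("b1", "c2"): 20, ("b1", "c3"): 30,
        ("b2", "c1"): 20, ("b2", "c2"): 20, ("b2", "c3"): 20,
    }
    pairs = [cell for cell, count in table.items() for _ in range(count)]
    d1, d2, dof = chi2_independence_statistics(pairs)
    print(d1, d2, dof)

On this table the two realizations are close but not equal (about 5.83 and 5.33), with
2 degrees of freedom.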

Let :math:`d_{\sampleSize}^{(i)}` be the realization of the test statistic
:math:`D_{\sampleSize}^{(i)}` on the sample
:math:`\left\{ \vect{x}_1,\dots,\vect{x}_{\sampleSize} \right\}` with :math:`i=1,2`.

Under the null hypothesis :math:`\mathcal{H}_0 = \{ \vect{X} \mbox{ has independent components}\}`,
both test statistics :math:`D_{\sampleSize}^{(i)}` have the same known asymptotic distribution:
when :math:`\sampleSize \rightarrow +\infty`, they follow
the :math:`\chi^2((\ell-1)(r-1))` distribution.
If :math:`\sampleSize` is sufficiently large, we can use the asymptotic distribution to apply
the test as follows.

We fix a risk :math:`\alpha` (the probability of a type I error) and we evaluate the associated
critical value :math:`d_\alpha^{(i)}`, which is the quantile of order
:math:`1-\alpha` of the asymptotic distribution of :math:`D_{\sampleSize}^{(i)}`.

A decision is then made either by comparing the realization :math:`d_{\sampleSize}^{(i)}` to the
critical value :math:`d_\alpha^{(i)}`, or equivalently by evaluating the p-value of the sample, defined as
:math:`\Prob{D_{\sampleSize}^{(i)} > d_{\sampleSize}^{(i)}}`, and comparing it to :math:`\alpha`:

-  if :math:`d_{\sampleSize}^{(i)}>d_{\alpha}^{(i)}` (or equivalently
   :math:`\Prob{D_{\sampleSize}^{(i)} > d_{\sampleSize}^{(i)}} < \alpha`),
   then we reject the independence between the components,

-  if :math:`d_{\sampleSize}^{(i)} \leq d_{\alpha}^{(i)}` (or equivalently
   :math:`\Prob{D_{\sampleSize}^{(i)} > d_{\sampleSize}^{(i)}} \geq \alpha`),
   then the independence between the components is considered acceptable.
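
The p-value form of this decision rule can be sketched in plain Python. The sketch
assumes that :math:`(\ell-1)(r-1)` is even, because the :math:`\chi^2` survival function
then has the closed form
:math:`e^{-x/2} \sum_{k=0}^{m-1} (x/2)^k / k!` with :math:`m = (\ell-1)(r-1)/2`.
The function names and the numerical values are hypothetical; in practice the method
listed in the API section below should be preferred.

.. code-block:: python

    import math


    def chi2_survival_even_dof(x, dof):
        """P(D > x) for a chi-squared variable with an even number of dof."""
        if dof <= 0 or dof % 2 != 0:
            raise ValueError("this sketch only handles even, positive dof")
        half = x / 2.0
        term = 1.0      # (x/2)^k / k! for k = 0
        total = 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total


    def reject_independence(statistic, dof, alpha):
        """Return (rejected, p_value) using the asymptotic distribution."""
        p_value = chi2_survival_even_dof(statistic, dof)
        return p_value < alpha, p_value


    # Hypothetical realization of the statistic with (l - 1)(r - 1) = 2
    rejected, p_value = reject_independence(16.0 / 3.0, dof=2, alpha=0.05)
    print(rejected, p_value)

With these values the p-value is above :math:`\alpha = 0.05`, so independence between
the components is not rejected.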


.. topic:: API:

    - See :py:func:`~openturns.HypothesisTest.ChiSquared`


.. topic:: Examples:

    - See :doc:`/auto_data_analysis/statistical_tests/plot_test_independence`


.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_