File: plot_test_independence.py

"""
Test independence
=================
"""

# %%
import openturns as ot

# %%
# Sample independence test
# ------------------------
#
# In this section we perform tests to assess whether two 1-d samples generated
# by two random variables :math:`X` and :math:`Y` are independent.
#
# The following tests are available:
#
# - the ChiSquared test: this test can only be used for discrete variables. Refer to :ref:`chi2_independence_test` for
#   more details.
#
# - the Pearson test: this test checks if there exists a linear
#   relationship between :math:`X` and :math:`Y`. It is equivalent to an independence test only
#   if the random vector :math:`(X,Y)` is a Gaussian vector. Refer to :ref:`pearson_test` for
#   more details.
#
# - the Spearman test: this test checks if there exists a monotonic
#   relationship between :math:`X` and :math:`Y`. Refer to :ref:`spearman_test` for
#   more details.
#
# - independence test using regression: this test checks if there exists a linear relation between
#   :math:`X` and :math:`Y` using a linear model.

# %%
# Case 1: Pearson and Spearman tests
# ----------------------------------
#
# We create a sample generated by a bivariate Gaussian vector :math:`(X,Y)` with independent components.
sample_Biv = ot.Normal(2).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)

# %%
# To test the independence between both samples, we first use the Pearson test with
# the Type I error equal to 0.1 (which is the probability of wrongly rejecting the null hypothesis).
# The Pearson test checks if there is a linear correlation between both random variables.
# The null hypothesis is: *There is no linear relation*.
# As :math:`(X,Y)` is a Gaussian vector, this is equivalent to testing the independence of the components.
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)

# %%
# We can then display the result of the test as a yes/no answer with
# the `getBinaryQualityMeasure` method. We can retrieve the p-value and the threshold with the `getPValue`
# and `getThreshold` methods.
print(
    "Is the Pearson correlation coefficient is null ?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)

# %%
# **Conclusion**: The Pearson test validates that there is no linear correlation between both samples:
# the null hypothesis assuming that the Pearson correlation coefficient is null is accepted. As
# :math:`(X,Y)` is a Gaussian vector, this means that the components are independent.
# In the general case, the Gaussian vector hypothesis must be validated!
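
# %%
# As a partial, illustrative check of that hypothesis (a small sketch, not part of the test itself),
# we can at least test the normality of each marginal sample with the Anderson-Darling normality
# test. Note that marginal normality is necessary but not sufficient for :math:`(X,Y)` to be a
# Gaussian vector.
for k, marginal in enumerate([sample1, sample2]):
    normalityResult = ot.NormalityTest.AndersonDarlingNormal(marginal, 0.10)
    print(
        "Is marginal %d Gaussian?" % k,
        normalityResult.getBinaryQualityMeasure(),
        "p-value=%.6g" % normalityResult.getPValue(),
    )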

# %%
# We can also use the Spearman test with
# the Type I error equal to 0.1 (which is the probability of wrongly rejecting the null hypothesis).
# The Spearman test checks if there exists a monotonic relationship between :math:`X` and :math:`Y`.
# The null hypothesis is: *There is no monotonic relation*.
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Is the Spearman correlation coefficient is null ?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)

# %%
# **Conclusion**: The Spearman test validates that there is no monotonic correlation between both samples:
# the null hypothesis assuming that the Spearman correlation coefficient is null is accepted.
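
# %%
# As a complementary sketch, recall that the Spearman test is designed to detect monotonic but
# possibly nonlinear relationships: if we build a second sample as a monotonic transformation of
# the first one, the null hypothesis should typically be rejected.
g = ot.SymbolicFunction(["x"], ["exp(x)"])
sample2_monotonic = g(sample1)
resultSpearmanMonotonic = ot.HypothesisTest.Spearman(sample1, sample2_monotonic, 0.10)
print(
    "Is the Spearman correlation coefficient null?",
    resultSpearmanMonotonic.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearmanMonotonic.getPValue(),
)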

# %%
# Here, we create a bivariate sample from a Gaussian vector whose components are correlated. We note
# that the Pearson test and the Spearman test both detect a correlation as both null hypotheses are
# rejected.
cor_Matrix = ot.CorrelationMatrix(2)
cor_Matrix[0, 1] = 0.8
sample_Biv = ot.Normal([0] * 2, [1] * 2, cor_Matrix).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print("Pearson test:", resultPearson)
print("Spearman test:", resultSpearman)

# %%
# We now consider a discrete distribution. Let us create two independent samples.
sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)

# %%
# We use the Chi2 test to check independence.
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)

# %%
# We display the results.
print(
    "Are the components independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)

# %%
# **Conclusion**: The Chi2 test validates that both samples are independent:
# the null hypothesis assuming the independence is accepted.
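
# %%
# As an additional sketch, we can build two clearly dependent discrete samples, here by taking
# the second sample as a deterministic function of the first one, and check that the ChiSquared
# test now rejects the null hypothesis (the exact p-value depends on the simulated sample).
sample1_dep = ot.Poisson(1.0).getSample(100)
g = ot.SymbolicFunction(["x"], ["2 * x"])
sample2_dep = g(sample1_dep)
resultChi2Dep = ot.HypothesisTest.ChiSquared(sample1_dep, sample2_dep, 0.10)
print(
    "Are the components independent?",
    resultChi2Dep.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2Dep.getPValue(),
)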


# %%
# Case 2: Independence test using regression
# ------------------------------------------
#
# This test consists in fitting a linear model between :math:`X` and :math:`Y` and analysing
# if the coefficients are significantly different from 0.

# %%
# We create a sample generated by a Gaussian vector :math:`(X_1, X_2, X_3)` with zero mean, unit
# variance and whose components :math:`X_1` and :math:`X_3` are correlated.
corr_Matrix = ot.CorrelationMatrix(3)
corr_Matrix[0, 2] = 0.9
distribution = ot.Normal([0] * 3, [1] * 3, corr_Matrix)
sample = distribution.getSample(100)

# %%
# Next, we split the sample into two samples: the first one is associated with :math:`(X_1, X_2)` and the
# second one with :math:`X_3`.
first_Sample = sample.getMarginal([0, 1])
second_Sample = sample.getMarginal(2)

# %%
# We fit a linear model of :math:`X_3` with respect to :math:`(X_1, X_2)`:
# :math:`X_3 = a_0 + a_1X_1 + a_2X_2`.
# Then, we test if each coefficient :math:`a_k` is significantly different from 0.
# For each coefficient :math:`a_k`, the null hypothesis is *The coefficient of the linear model is equal to zero*.
# When the result is *True*, the null hypothesis is accepted, which means that the corresponding
# coefficient is considered to be null: :math:`X_3` has no linear dependence on the associated component.
# When the result is *False*, the null hypothesis is rejected, which means that there is a linear
# relationship between :math:`X_3` and the associated component.
test_results = ot.LinearModelTest.FullRegression(first_Sample, second_Sample)
for i in range(len(test_results)):
    print(
        "Coefficient a" + str(i) + " is equal to 0?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )

# %%
# **Conclusion**: The test detects the correlation between :math:`X_1` and :math:`X_3` and the
# independence between :math:`X_2` and :math:`X_3`. It also detects that :math:`a_0` is null.
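
# %%
# To complement the test, here is a minimal sketch (based on the `LinearModelAlgorithm` class)
# that fits the same linear model explicitly so that the estimated coefficients can be inspected
# directly: :math:`a_1` should be close to 0.9 while :math:`a_0` and :math:`a_2` should be close
# to zero.
algo = ot.LinearModelAlgorithm(first_Sample, second_Sample)
algo.run()
linearModelResult = algo.getResult()
print("Estimated coefficients (a0, a1, a2):", linearModelResult.getCoefficients())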