"""
Test independence
=================
"""
# %%
import openturns as ot
# %%
# Sample independence test
# ------------------------
#
# In this example we perform tests to assess whether two 1-d samples generated
# by two random variables :math:`X` and :math:`Y` are independent or not.
#
# The following tests are available:
#
# - the ChiSquared test: it can only be used with discrete variables. Refer to :ref:`chi2_independence_test` for
# more details.
#
# - the Pearson test: this test checks if there exists a linear
# relationship between :math:`X` and :math:`Y`. It is equivalent to an independence test only
# if the random vector :math:`(X,Y)` is a Gaussian vector. Refer to :ref:`pearson_test` for
# more details.
#
# - the Spearman test: this test checks if there exists a monotonic
# relationship between :math:`X` and :math:`Y`. Refer to :ref:`spearman_test` for
# more details.
#
# - independence test using regression: this test checks if there exists a linear relation between
# :math:`X` and :math:`Y` using a linear model.
# %%
# Case 1: Pearson and Spearman tests
# ----------------------------------
#
# We create a sample generated by a bivariate Gaussian vector :math:`(X,Y)` with independent components.
sample_Biv = ot.Normal(2).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
# %%
# To test the independence of the two samples, we first use the Pearson test with
# the Type I error set to 0.1 (the probability of wrongly rejecting the null hypothesis).
# The Pearson test checks if there is a linear correlation between both random variables.
# The null hypothesis is: *There is no linear relation*.
# As :math:`(X,Y)` is a Gaussian vector, it is equivalent to test the independence of the components.
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
# %%
# We can then display the result of the test as a yes/no answer with
# the `getBinaryQualityMeasure` method. We can retrieve the p-value and the threshold with the `getPValue`
# and `getThreshold` methods.
print(
    "Is the Pearson correlation coefficient null?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)
# %%
# **Conclusion**: The Pearson test does not detect any linear correlation between the two samples:
# the null hypothesis assuming that the Pearson correlation coefficient is null is accepted. Since
# :math:`(X,Y)` is a Gaussian vector, this means that the components are independent.
# In the general case, the Gaussian vector hypothesis must be validated!
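# %%
# As a complementary check (a minimal sketch, not part of the original example), the Gaussian
# assumption of each marginal can be tested, for instance with the Anderson-Darling normality test.
# Note that marginal normality is necessary but not sufficient for :math:`(X,Y)` to be a Gaussian vector.
print(
    "Is sample1 Gaussian?",
    ot.NormalityTest.AndersonDarlingNormal(sample1, 0.10).getBinaryQualityMeasure(),
)
print(
    "Is sample2 Gaussian?",
    ot.NormalityTest.AndersonDarlingNormal(sample2, 0.10).getBinaryQualityMeasure(),
)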
# %%
# We can also use the Spearman test with
# the Type I error set to 0.1 (the probability of wrongly rejecting the null hypothesis).
# The Spearman test checks if there exists a monotonic relationship between :math:`X` and :math:`Y`.
# The null hypothesis is: *There is no monotonic relation*.
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Is the Spearman correlation coefficient null?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)
# %%
# **Conclusion**: The Spearman test does not detect any monotonic relationship between the two samples:
# the null hypothesis assuming that the Spearman correlation coefficient is null is accepted.
# %%
# Here, we create a bivariate sample from a Gaussian vector whose components are correlated. We note
# that the Pearson test and the Spearman test both detect the correlation, as both null hypotheses are
# rejected.
cor_Matrix = ot.CorrelationMatrix(2)
cor_Matrix[0, 1] = 0.8
sample_Biv = ot.Normal([0] * 2, [1] * 2, cor_Matrix).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print("Pearson test:", resultPearson)
print("Spearman test:", resultSpearman)
# %%
# We now consider a discrete distribution. Let us create two independent samples.
sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)
# %%
# We use the Chi2 test to check independence.
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
# %%
# We display the results.
print(
    "Are the components independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)
# %%
# **Conclusion**: The Chi2 test does not detect any dependence between the two samples:
# the null hypothesis of independence is accepted.
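# %%
# As an illustrative counterexample (a minimal sketch, not part of the original example), we build two
# dependent discrete samples: the second one is a noisy copy of the first. The ChiSquared test then
# typically rejects the independence hypothesis.
sample1 = ot.Poisson(1.0).getSample(100)
sample2 = sample1 + ot.Poisson(0.2).getSample(100)
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
print("Are the components independent?", resultChi2.getBinaryQualityMeasure())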
# %%
# Case 2: Independence test using regression
# ------------------------------------------
#
# This test consists of fitting a linear model between :math:`X` and :math:`Y` and analysing
# whether the coefficients are significantly different from 0.
# %%
# We create a sample generated by a Gaussian vector :math:`(X_1, X_2, X_3)` with zero mean, unit
# variance and whose components :math:`X_1` and :math:`X_3` are correlated.
corr_Matrix = ot.CorrelationMatrix(3)
corr_Matrix[0, 2] = 0.9
distribution = ot.Normal([0] * 3, [1] * 3, corr_Matrix)
sample = distribution.getSample(100)
# %%
# Next, we split the sample into two samples: the first one is associated with :math:`(X_1, X_2)` and the
# second one is associated with :math:`X_3`.
first_Sample = sample.getMarginal([0, 1])
second_Sample = sample.getMarginal(2)
# %%
# We fit a linear model of :math:`X_3` with respect to :math:`(X_1, X_2)`:
# :math:`X_3 = a_0 + a_1X_1 + a_2X_2`.
# Then, we test if each coefficient :math:`a_k` is significantly different from 0.
# The null hypothesis is *The coefficient of the linear model is equal to zero*.
# When the result is *True*, the null hypothesis is accepted, which means that the corresponding
# coefficient is not significantly different from zero: no linear dependence is detected. When the
# result is *False*, the null hypothesis is rejected, which means that there is a significant linear
# relationship between the corresponding components.
test_results = ot.LinearModelTest.FullRegression(first_Sample, second_Sample)
for i in range(len(test_results)):
    print(
        "Coefficient a" + str(i) + " is equal to 0?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )
# %%
# **Conclusion**: The test detects the correlation between :math:`X_1` and :math:`X_3` and the
# independence between :math:`X_2` and :math:`X_3`. It also detects that :math:`a_0` is null.
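# %%
# To complement the test (a minimal illustrative sketch, not part of the original example), we can fit
# the linear model explicitly and inspect the estimated coefficients: :math:`a_0` and :math:`a_2` should
# be close to zero while :math:`a_1` should be close to 0.9.
algo = ot.LinearModelAlgorithm(first_Sample, second_Sample)
algo.run()
# The coefficients are returned in the order (a0, a1, a2).
print("Estimated coefficients:", algo.getResult().getCoefficients())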