"""
Test independence
=================
"""
# %%
import openturns as ot
# %%
# Sample independence test
# ------------------------
#
# In this example we perform tests to assess whether two 1-d samples generated
# by two random variables :math:`X` and :math:`Y` are independent or not.
#
# The following tests are available:
#
# - the ChiSquared test: it can only be used with discrete variables. Refer to :ref:`chi2_independence_test` for
# more details.
#
# - the Pearson test: this test checks if there exists a linear
# relationship between :math:`X` and :math:`Y`. It is equivalent to an independence test only
# if the random vector :math:`(X,Y)` is a Gaussian vector. Refer to :ref:`pearson_test` for
# more details.
#
# - the Spearman test: this test checks if there exists a monotonic
# relationship between :math:`X` and :math:`Y`. Refer to :ref:`spearman_test` for
# more details.
#
# - independence test using regression: this test checks if there exists a linear relation between
# :math:`X` and :math:`Y` using a linear model.
# %%
# Case 1: Pearson and Spearman tests
# ----------------------------------
#
# We create a sample generated by a bivariate Gaussian vector :math:`(X,Y)` with independent components.
sample_Biv = ot.Normal(2).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
# %%
# To test the independence of the two samples, we first use the Pearson test with
# the Type I error set to 0.1 (the probability of wrongly rejecting the null hypothesis).
# The Pearson test checks if there is a linear correlation between both random variables.
# The null hypothesis is: *There is no linear relation*.
# As :math:`(X,Y)` is a Gaussian vector, it is equivalent to test the independence of the components.
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
# %%
# We can then display the result of the test as a yes/no answer with
# the `getBinaryQualityMeasure` method. We can retrieve the p-value and the threshold with the `getPValue`
# and `getThreshold` methods.
print(
    "Is the Pearson correlation coefficient null?",
    resultPearson.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultPearson.getPValue(),
    "threshold=%.6g" % resultPearson.getThreshold(),
)
# %%
# **Conclusion**: The Pearson test does not detect any linear correlation between the two samples:
# the null hypothesis assuming that the Pearson correlation coefficient is null is accepted. Since
# :math:`(X,Y)` is a Gaussian vector, this means that the components are independent.
# In the general case, the Gaussian vector hypothesis must be validated!
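# %%
# As a complementary check (a minimal sketch, not part of the original example), the Gaussian
# assumption of each marginal can be tested, for instance with the Anderson-Darling normality test.
# Note that marginal normality is necessary but not sufficient for :math:`(X,Y)` to be a Gaussian vector.
print(
    "Is sample1 Gaussian?",
    ot.NormalityTest.AndersonDarlingNormal(sample1, 0.10).getBinaryQualityMeasure(),
)
print(
    "Is sample2 Gaussian?",
    ot.NormalityTest.AndersonDarlingNormal(sample2, 0.10).getBinaryQualityMeasure(),
)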
# %%
# We can also use the Spearman test with
# the Type I error set to 0.1 (the probability of wrongly rejecting the null hypothesis).
# The Spearman test checks if there exists a monotonic relationship between :math:`X` and :math:`Y`.
# The null hypothesis is: *There is no monotonic relation*.
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print(
    "Is the Spearman correlation coefficient null?",
    resultSpearman.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultSpearman.getPValue(),
    "threshold=%.6g" % resultSpearman.getThreshold(),
)
# %%
# **Conclusion**: The Spearman test does not detect any monotonic relationship between the two samples:
# the null hypothesis assuming that the Spearman correlation coefficient is null is accepted.
# %%
# Here, we create a bivariate sample from a Gaussian vector whose components are correlated. We note
# that the Pearson test and the Spearman test both detect the correlation, as both null hypotheses are
# rejected.
cor_Matrix = ot.CorrelationMatrix(2)
cor_Matrix[0, 1] = 0.8
sample_Biv = ot.Normal([0] * 2, [1] * 2, cor_Matrix).getSample(1000)
sample1 = sample_Biv.getMarginal(0)
sample2 = sample_Biv.getMarginal(1)
resultPearson = ot.HypothesisTest.Pearson(sample1, sample2, 0.10)
resultSpearman = ot.HypothesisTest.Spearman(sample1, sample2, 0.10)
print("Pearson test:", resultPearson)
print("Spearman test:", resultSpearman)
# %%
# We now consider a discrete distribution. Let us create two independent samples.
sample1 = ot.Poisson(0.2).getSample(100)
sample2 = ot.Poisson(0.2).getSample(100)
# %%
# We use the Chi2 test to check independence.
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
# %%
# We display the results.
print(
    "Are the components independent?",
    resultChi2.getBinaryQualityMeasure(),
    "p-value=%.6g" % resultChi2.getPValue(),
    "threshold=%.6g" % resultChi2.getThreshold(),
)
# %%
# **Conclusion**: The Chi2 test does not detect any dependence between the two samples:
# the null hypothesis of independence is accepted.
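# %%
# As an illustrative counterexample (a minimal sketch, not part of the original example), we build two
# dependent discrete samples: the second one is a noisy copy of the first. The ChiSquared test then
# typically rejects the independence hypothesis.
sample1 = ot.Poisson(1.0).getSample(100)
sample2 = sample1 + ot.Poisson(0.2).getSample(100)
resultChi2 = ot.HypothesisTest.ChiSquared(sample1, sample2, 0.10)
print("Are the components independent?", resultChi2.getBinaryQualityMeasure())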
# %%
# Case 2: Independence test using regression
# ------------------------------------------
#
# This test consists of fitting a linear model between :math:`X` and :math:`Y` and analysing
# whether the coefficients are significantly different from 0.
# %%
# We create a sample generated by a Gaussian vector :math:`(X_1, X_2, X_3)` with zero mean, unit
# variance and whose components :math:`X_1` and :math:`X_3` are correlated.
corr_Matrix = ot.CorrelationMatrix(3)
corr_Matrix[0, 2] = 0.9
distribution = ot.Normal([0] * 3, [1] * 3, corr_Matrix)
sample = distribution.getSample(100)
# %%
# Next, we split the sample into two samples: the first one is associated with :math:`(X_1, X_2)` and the
# second one is associated with :math:`X_3`.
first_Sample = sample.getMarginal([0, 1])
second_Sample = sample.getMarginal(2)
# %%
# We fit a linear model of :math:`X_3` with respect to :math:`(X_1, X_2)`:
# :math:`X_3 = a_0 + a_1X_1 + a_2X_2`.
# Then, we test if each coefficient :math:`a_k` is significantly different from 0.
# The null hypothesis is *The coefficient of the linear model is equal to zero*.
# When the result is *True*, the null hypothesis is accepted, which means that the corresponding
# coefficient is not significantly different from zero: no linear dependence is detected. When the
# result is *False*, the null hypothesis is rejected, which means that there is a significant linear
# relationship between the corresponding components.
test_results = ot.LinearModelTest.FullRegression(first_Sample, second_Sample)
for i in range(len(test_results)):
    print(
        "Coefficient a" + str(i) + " is equal to 0?",
        test_results[i].getBinaryQualityMeasure(),
        "p-value=%.6g" % test_results[i].getPValue(),
        "threshold=%.6g" % test_results[i].getThreshold(),
    )
# %%
# **Conclusion**: The test detects the correlation between :math:`X_1` and :math:`X_3` and the
# independence between :math:`X_2` and :math:`X_3`. It also detects that :math:`a_0` is null.
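# %%
# To complement the test (a minimal illustrative sketch, not part of the original example), we can fit
# the linear model explicitly and inspect the estimated coefficients: :math:`a_0` and :math:`a_2` should
# be close to zero while :math:`a_1` should be close to 0.9.
algo = ot.LinearModelAlgorithm(first_Sample, second_Sample)
algo.run()
# The coefficients are returned in the order (a0, a1, a2).
print("Estimated coefficients:", algo.getResult().getCoefficients())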