.. _pearson_coefficient:
Pearson correlation coefficient
-------------------------------
Pearson’s correlation coefficient :math:`\rho_{U,V}` measures
the strength of the linear relationship between two random variables
:math:`U` and :math:`V`. It is defined as follows:
.. math::
\begin{aligned}
\rho_{U,V} = \frac{\displaystyle \Cov{U,V}}{\sigma_U \sigma_V}
\end{aligned}
where
:math:`\Cov{U,V} = \Expect{ \left( U - m_U \right) \left( V - m_V \right) }`,
:math:`m_U= \Expect{U}`, :math:`m_V= \Expect{V}`,
:math:`\sigma_U= \sqrt{\Var{U}}` and :math:`\sigma_V= \sqrt{\Var{V}}`.
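As an illustrative special case, assume that :math:`V = aU + b` with
:math:`a \neq 0` and :math:`\sigma_U > 0` (the constants :math:`a` and
:math:`b` are introduced here only for this example). The definition then
gives:

.. math::

    \begin{aligned}
      \Cov{U,V} &= a \Var{U} = a \sigma_U^2, \\
      \sigma_V &= |a| \sigma_U, \\
      \rho_{U,V} &= \frac{a \sigma_U^2}{\sigma_U \, |a| \sigma_U} = \frac{a}{|a|}
    \end{aligned}

so that :math:`\rho_{U,V} = 1` if :math:`a > 0` and :math:`\rho_{U,V} = -1`
if :math:`a < 0`: an exact affine relationship yields a coefficient of
absolute value 1, whatever the slope.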
Given a sample of :math:`N` pairs
:math:`\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}`, Pearson’s
correlation coefficient can be estimated using the formula:
.. math::
\begin{aligned}
\widehat{\rho}_{U,V} = \frac{ \displaystyle \sum_{i=1}^N \left( u_i - \overline{u} \right) \left( v_i - \overline{v} \right) }{ \sqrt{\displaystyle \sum_{i=1}^N \left( u_i - \overline{u} \right)^2 \sum_{i=1}^N \left( v_i - \overline{v} \right)^2} }
\end{aligned}
where :math:`\overline{u}` and :math:`\overline{v}` represent the
empirical means of the samples :math:`(u_1,\ldots,u_N)` and
:math:`(v_1,\ldots,v_N)`.
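The estimator can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the OpenTURNS implementation: the helper name ``pearson_estimate`` and the small data set are hypothetical, and the function raises an error when one of the samples is constant, since the ratio is then undefined.

```python
import math


def pearson_estimate(u, v):
    """Estimate Pearson's correlation coefficient from paired observations."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("u and v must have the same length (at least 2)")
    n = len(u)
    u_bar = sum(u) / n  # empirical mean of (u_1, ..., u_N)
    v_bar = sum(v) / n  # empirical mean of (v_1, ..., v_N)
    # Numerator: sum of the products of the centered observations
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    # Denominator: square root of the product of the two sums of squares
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    if s_uu == 0.0 or s_vv == 0.0:
        raise ValueError("the estimate is undefined for a constant sample")
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical data: two exact affine relationships with opposite slopes
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v_up = [2.0 * ui + 1.0 for ui in u]
v_down = [-3.0 * ui + 4.0 for ui in u]

r_up = pearson_estimate(u, v_up)
r_down = pearson_estimate(u, v_down)
print(r_up, r_down)
```

With these exactly affine data sets, the estimate reaches the two extreme values: 1 for the increasing relationship and -1 for the decreasing one.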
Pearson’s correlation coefficient takes values between -1 and 1. The
closer its absolute value is to 1, the stronger the indication is that a
linear relationship exists between variables :math:`U` and :math:`V`.
The sign of Pearson’s coefficient indicates whether the two variables
tend to vary in the same direction (positive coefficient) or in
opposite directions (negative coefficient). We note that a correlation
coefficient equal to 0 does not necessarily imply the independence of
variables :math:`U` and :math:`V`: this implication is guaranteed in
particular when the pair :math:`(U, V)` follows a bivariate Normal
distribution (Normal marginals alone are not sufficient). In general, a
zero Pearson’s correlation coefficient corresponds to one of two
situations:
- the variables :math:`U` and :math:`V` are in fact independent,
- or a non-linear relationship exists between :math:`U` and :math:`V`.
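The second situation can be illustrated with a small pure-Python sketch. The data are hypothetical: the values of :math:`u` are symmetric around zero and :math:`v = u^2`, so :math:`v` is completely determined by :math:`u`, yet the estimated coefficient is zero. The helper ``pearson_estimate`` is an illustrative name, not part of the OpenTURNS API.

```python
import math


def pearson_estimate(u, v):
    """Estimate Pearson's correlation coefficient from paired observations."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical sample: u is symmetric around 0 and v is a deterministic,
# non-linear function of u
u = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
v = [ui ** 2 for ui in u]

r = pearson_estimate(u, v)
print(r)
```

The positive and negative halves of the sample contribute opposite amounts to the numerator, which therefore cancels out: the coefficient detects no linear trend even though the dependence is perfect.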
.. plot::
import openturns as ot
from openturns.viewer import View
N = 20
ot.RandomGenerator.SetSeed(10)
x = ot.Uniform(0.0, 10.0).getSample(N)
f = ot.SymbolicFunction(['x'], ['5*x+10'])
y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
graph = f.draw(0.0, 10.0)
graph.setTitle('A linear relationship exists between U and V:\n Pearson\'s coefficient is a relevant measure of dependency')
graph.setXTitle('u')
graph.setYTitle('v')
cloud = ot.Cloud(x, y)
cloud.setPointStyle('circle')
cloud.setColor('orange')
graph.add(cloud)
View(graph)
.. plot::
import openturns as ot
from openturns.viewer import View
N = 20
ot.RandomGenerator.SetSeed(10)
x = ot.Uniform(0.0, 10.0).getSample(N)
f = ot.SymbolicFunction(['x'], ['x^2'])
y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
graph = f.draw(0.0, 10.0)
graph.setTitle('There is a strong, non-linear relationship between U and V:\n Pearson\'s coefficient is not a relevant measure of dependency')
graph.setXTitle('u')
graph.setYTitle('v')
cloud = ot.Cloud(x, y)
cloud.setPointStyle('circle')
cloud.setColor('orange')
graph.add(cloud)
View(graph)
.. plot::
import openturns as ot
from openturns.viewer import View
N = 20
ot.RandomGenerator.SetSeed(10)
x = ot.Uniform(0.0, 10.0).getSample(N)
f = ot.SymbolicFunction(['x'], ['5'])
y = ot.Uniform(0.0, 10.0).getSample(N)
graph = f.draw(0.0, 10.0)
graph.setTitle('Pearson\'s coefficient estimate is quite close to zero\nbecause U and V are independent')
graph.setXTitle('u')
graph.setYTitle('v')
cloud = ot.Cloud(x, y)
cloud.setPointStyle('circle')
cloud.setColor('orange')
graph.add(cloud)
View(graph)
.. plot::
import openturns as ot
from openturns.viewer import View
N = 20
ot.RandomGenerator.SetSeed(10)
x = ot.Uniform(0.0, 10.0).getSample(N)
f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
graph = f.draw(0.0, 10.0)
graph.setTitle('Pearson\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
graph.setXTitle('u')
graph.setYTitle('v')
cloud = ot.Cloud(x, y)
cloud.setPointStyle('circle')
cloud.setColor('orange')
graph.add(cloud)
View(graph)
The estimate :math:`\widehat{\rho}` of Pearson’s correlation
coefficient is sometimes denoted by :math:`r`.
.. topic:: API:
- See method :py:meth:`~openturns.CorrelationAnalysis.computeLinearCorrelation`
- See method :py:meth:`~openturns.Sample.computeLinearCorrelation`
.. topic:: Examples:
- See :doc:`/auto_data_analysis/manage_data_and_samples/plot_sample_correlation`
.. topic:: References:
- [saporta1990]_
- [dixon1983]_
- [nisthandbook]_
- [dagostino1986]_
- [bhattacharyya1997]_
- [sprent2001]_
- [burnham2002]_