.. _spearman_coefficient:

Spearman correlation coefficient
--------------------------------

Spearman’s correlation coefficient :math:`\rho^S_{U,V}` measures the
strength of a monotonic relationship between two random variables
:math:`U` and :math:`V`. It is equal to Pearson’s correlation
coefficient computed after transforming :math:`U` and :math:`V` by their
cumulative distribution functions, a transformation that turns any
perfectly monotonic relationship into a linear one (recall that
Pearson’s correlation coefficient only measures the strength of linear
relationships, see :ref:`Pearson’s correlation coefficient <pearson_coefficient>`):

.. math::

    \begin{aligned}
    \rho^S_{U,V} = \rho_{F_U(U),F_V(V)}
    \end{aligned}

where :math:`F_U` and :math:`F_V` denote the cumulative distribution
functions of :math:`U` and :math:`V`.
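
As a sketch of why this definition leads to a rank-based estimator, the
snippet below replaces :math:`F_U` and :math:`F_V` by their empirical
counterparts :math:`\widehat{F}_U(u_i) = \#\{j : u_j \le u_i\}/N` and
computes Pearson’s coefficient of the transformed values. It is plain
Python rather than the OpenTURNS API, and the values in ``v`` are
hypothetical, chosen to be a nonlinear but increasing function of ``u``.

.. code-block:: python

    import math


    def pearson(x, y):
        """Sample Pearson correlation of two equal-length sequences."""
        n = len(x)
        mx = sum(x) / n
        my = sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)


    def empirical_cdf_values(sample):
        """Return F_hat(s_i) = #{j : s_j <= s_i} / N for each observation."""
        n = len(sample)
        return [sum(1 for t in sample if t <= s) / n for s in sample]


    u = [1.5, 0.7, 5.1, 4.3]
    v = [2.0, 1.1, 30.0, 12.5]  # hypothetical data, increasing but not linear in u

    # Plug-in estimate of rho_{F_U(U), F_V(V)} using the empirical CDFs
    rho_s = pearson(empirical_cdf_values(u), empirical_cdf_values(v))
    print(rho_s)           # 1.0: the relationship is perfectly monotonic
    print(pearson(u, v))   # below 1: the raw relationship is not linear

Without ties, :math:`\widehat{F}_U(u_i)` is exactly the rank of
:math:`u_i` divided by :math:`N`. Since Pearson’s coefficient is unchanged
by a positive affine rescaling of either variable, this plug-in estimate
coincides with Pearson’s coefficient computed on the ranks.
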

Given a sample of :math:`N` pairs
:math:`\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}`,
estimating Spearman’s correlation coefficient first requires ranking
each of the two samples :math:`(u_1,\ldots,u_N)` and
:math:`(v_1,\ldots,v_N)`. The rank :math:`u_{[i]}` of the observation
:math:`u_i` is its position in the sample sorted in ascending order: if
:math:`u_i` is the smallest value in the sample :math:`(u_1,\ldots,u_N)`,
its rank is 1; if :math:`u_i` is the second smallest value in the
sample, its rank is 2, and so forth. The ranking transformation takes the
sample :math:`(u_1,\ldots,u_N)` as input and produces the sample
:math:`(u_{[1]},\ldots,u_{[N]})` as output; the sample
:math:`(v_1,\ldots,v_N)` is ranked in the same way.

For example, consider the sample
:math:`(u_1,u_2,u_3,u_4) = (1.5,0.7,5.1,4.3)`. Its ranks are
:math:`(u_{[1]},u_{[2]},u_{[3]},u_{[4]}) = (2,1,4,3)`: :math:`u_1 = 1.5`
is the second smallest value in the original sample, :math:`u_2 = 0.7`
the smallest, and so on.
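
The ranking transformation can be sketched in plain Python as follows.
This section does not discuss ties; the sketch assumes the common
convention of giving tied values the average of the positions they
occupy, which is an assumption and not necessarily what OpenTURNS does.

.. code-block:: python

    def ranks(sample):
        """Return 1-based ranks; tied values get the average of their positions."""
        # Indices of the observations, sorted by increasing value
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        result = [0.0] * len(sample)
        pos = 0
        while pos < len(order):
            # Extend the run of equal values starting at this position
            end = pos
            while end + 1 < len(order) and sample[order[end + 1]] == sample[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
            for k in range(pos, end + 1):
                result[order[k]] = avg_rank
            pos = end + 1
        return result


    print(ranks([1.5, 0.7, 5.1, 4.3]))  # [2.0, 1.0, 4.0, 3.0]
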

Spearman’s correlation coefficient is then estimated by Pearson’s
coefficient computed from the :math:`N` pairs of ranks
:math:`(u_{[1]},v_{[1]})`, :math:`(u_{[2]},v_{[2]})`, …,
:math:`(u_{[N]},v_{[N]})`:

.. math::

    \begin{aligned}
    \widehat{\rho}^S_{U,V} = \frac{ \displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right) \left( v_{[i]} - \overline{v}_{[]} \right) }{ \sqrt{\displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right)^2} \sqrt{\displaystyle \sum_{i=1}^N \left( v_{[i]} - \overline{v}_{[]} \right)^2} }
    \end{aligned}

where :math:`\overline{u}_{[]}` and :math:`\overline{v}_{[]}` represent
the empirical means of the samples :math:`(u_{[1]},\ldots,u_{[N]})` and
:math:`(v_{[1]},\ldots,v_{[N]})`.

Spearman’s correlation coefficient takes values in :math:`[-1, 1]`.
The closer its absolute value is to 1, the stronger the evidence that a
monotonic relationship exists between :math:`U` and :math:`V`. Its sign
indicates whether the two variables tend to vary in the same direction
(positive coefficient) or in opposite directions (negative coefficient).
Note that a zero correlation coefficient does not necessarily imply that
:math:`U` and :math:`V` are independent. A zero Spearman’s correlation
coefficient can arise in two situations:

- the variables :math:`U` and :math:`V` are in fact independent,
- or a non-monotonic relationship exists between :math:`U` and
  :math:`V`.

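
The second situation can be checked on a small deterministic example:
with :math:`u = 0, 1, \ldots, 10` and :math:`v = (u-5)^2`, :math:`V` is
completely determined by :math:`U`, yet the relationship first decreases
then increases and the rank correlation vanishes. The sketch below is
plain Python (not the OpenTURNS API) and uses average ranks for the tied
values of :math:`v`, which is an assumed convention.

.. code-block:: python

    import math


    def average_ranks(sample):
        """1-based ranks; tied values share the average of their positions."""
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        result = [0.0] * len(sample)
        pos = 0
        while pos < len(order):
            end = pos
            while end + 1 < len(order) and sample[order[end + 1]] == sample[order[pos]]:
                end += 1
            for k in range(pos, end + 1):
                result[order[k]] = (pos + end) / 2 + 1
            pos = end + 1
        return result


    def pearson(x, y):
        """Sample Pearson correlation of two equal-length sequences."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)


    u = [float(k) for k in range(11)]  # 0, 1, ..., 10
    v = [(a - 5.0) ** 2 for a in u]    # fully dependent on u, but not monotonic
    rho_s = pearson(average_ranks(u), average_ranks(v))
    print(rho_s)  # 0.0
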

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    # Monotonic but nonlinear relationship: v = u^2 plus Gaussian noise
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['x^2'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    # Draw the underlying trend, then overlay the noisy sample
    graph = f.draw(0.0, 10.0)
    graph.setTitle('There is a monotonic relationship between U and V:\nSpearman\'s coefficient is a relevant measure of dependency...')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)


.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5*x+10'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('... because the rank transformation turns any monotonic trend\ninto a linear relation for which Pearson\'s correlation is relevant')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)


.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    # U and V are sampled independently of each other
    x = ot.Uniform(0.0, 10.0).getSample(N)
    y = ot.Uniform(0.0, 10.0).getSample(N)
    # Constant function drawn only as a horizontal visual reference
    f = ot.SymbolicFunction(['x'], ['5'])
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is close to zero\nbecause U and V are independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)


.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    # Non-monotonic dependence: v = 30 sin(u) plus Gaussian noise
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

Spearman’s coefficient is often referred to as the rank correlation
coefficient.

.. topic:: API:

    - See method :py:meth:`~openturns.CorrelationAnalysis.computeSpearmanCorrelation`
    - See method :py:meth:`~openturns.Sample.computeSpearmanCorrelation`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_sample_correlation`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_