.. _spearman_coefficient:

Spearman correlation coefficient
--------------------------------

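The Spearman rank correlation coefficient measures the strength of the monotonic
relationship between two random variables. Unlike the Pearson coefficient, it is computed
from the variables transformed by their own cumulative distribution functions (CDFs), which are
bounded, so it does not require the variables to have a finite variance.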

Let :math:`(X,Y)` be a random vector whose components have continuous CDFs :math:`F_X` and :math:`F_Y`.
Spearman’s rank correlation coefficient :math:`\rho_S(X,Y)` is defined by:

.. math::
    \rho_S(X,Y) = \dfrac{\Cov{F_X(X),F_Y(Y)}}{\sqrt{\Var{F_X(X)}\Var{F_Y(Y)}}}

where :math:`\Cov{\cdot}` and :math:`\Var{\cdot}` denote the covariance and variance operators.

The Spearman correlation between two variables is equal to the
:ref:`Pearson correlation coefficient <pearson_coefficient>` between the variables transformed by their
own CDFs, which play the role of normalized ranks:

.. math::
    \rho_S(X,Y) = \rho_P(F_X(X), F_Y(Y))
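
As a numerical illustration of this identity, the sketch below (Python standard library only,
written for this page rather than taken from OpenTURNS) draws a standard bivariate normal sample with
linear correlation :math:`\rho`, maps each component through its CDF and computes the Pearson
coefficient of the transformed pairs. The result is compared with the classical closed form
:math:`\rho_S = \frac{6}{\pi}\arcsin(\rho/2)` for a Gaussian vector, which is a standard result not
derived on this page. The helper names ``pearson`` and ``spearman_from_cdfs`` are hypothetical.

.. code-block:: python

    import math
    import random
    from statistics import NormalDist


    def pearson(a, b):
        """Empirical Pearson correlation of two sequences of equal length."""
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
        var_a = sum((p - mean_a) ** 2 for p in a)
        var_b = sum((q - mean_b) ** 2 for q in b)
        return cov / math.sqrt(var_a * var_b)


    def spearman_from_cdfs(rho, size, seed=0):
        """Monte Carlo estimate of rho_P(F_X(X), F_Y(Y)) for a standard bivariate
        normal vector (X, Y) with linear correlation rho."""
        rng = random.Random(seed)
        cdf = NormalDist().cdf  # F_X = F_Y = standard normal CDF
        u, v = [], []
        for _ in range(size):
            x = rng.gauss(0.0, 1.0)
            # Y = rho X + sqrt(1 - rho^2) Z is standard normal with corr(X, Y) = rho
            y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            u.append(cdf(x))
            v.append(cdf(y))
        return pearson(u, v)


    rho = 0.5
    estimate = spearman_from_cdfs(rho, size=20000, seed=1)
    exact = 6.0 / math.pi * math.asin(rho / 2.0)  # closed form, Gaussian case
    print(f"Monte Carlo: {estimate:.3f}  closed form: {exact:.3f}")

With 20000 points the Monte Carlo estimate should agree with the closed form (about 0.483) to roughly
two decimal places; note that it differs slightly from :math:`\rho = 0.5`, the Pearson coefficient of
:math:`(X,Y)` itself.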


If :math:`C` is the CDF of the copula of the random vector :math:`(X,Y)`, then we get:

.. math::
   \rho_S(X,Y) = \rho_P(F_X(X),F_Y(Y)) = 12 \iint_{[0,1]^2} C(u,v)\,du\,dv - 3

which shows that the Spearman correlation depends only on the copula of :math:`(X,Y)` and not on its
marginal distributions.
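
The integral formula can be checked numerically. The sketch below (plain Python, with hypothetical
helper names) approximates :math:`12 \iint_{[0,1]^2} C(u,v)\,du\,dv - 3` with a midpoint rule for a
few copulas whose Spearman coefficient is known in closed form: :math:`0` for the independent copula,
:math:`1` and :math:`-1` for the Fréchet bounds :math:`M(u,v)=\min(u,v)` and
:math:`W(u,v)=\max(u+v-1,0)`, and :math:`\theta/3` for the Farlie-Gumbel-Morgenstern copula
:math:`C(u,v)=uv\,(1+\theta(1-u)(1-v))`. These closed forms are standard results quoted here, not
derived on this page.

.. code-block:: python

    def copula_spearman(copula, n=200):
        """Midpoint-rule approximation of 12 * integral of C(u, v) over [0, 1]^2 - 3."""
        h = 1.0 / n
        nodes = [(i + 0.5) * h for i in range(n)]
        total = sum(copula(u, v) for u in nodes for v in nodes)
        return 12.0 * total * h * h - 3.0


    def independent(u, v):
        """Independent copula: C(u, v) = u v."""
        return u * v


    def upper_frechet(u, v):
        """Frechet upper bound M (comonotonic dependence)."""
        return min(u, v)


    def lower_frechet(u, v):
        """Frechet lower bound W (countermonotonic dependence)."""
        return max(u + v - 1.0, 0.0)


    def fgm(theta):
        """Farlie-Gumbel-Morgenstern copula with parameter theta in [-1, 1]."""
        return lambda u, v: u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


    for name, copula in [("independent", independent), ("M", upper_frechet),
                         ("W", lower_frechet), ("FGM(0.9)", fgm(0.9))]:
        print(f"{name:12s} {copula_spearman(copula):+.4f}")

With the default grid of :math:`200 \times 200` nodes, the printed values should agree with the
closed forms above (:math:`0`, :math:`1`, :math:`-1` and :math:`0.3`) to roughly four decimal places.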

Let :math:`((x_1, y_1), \dots, (x_\sampleSize, y_\sampleSize))` be a sample generated
by the bivariate random vector :math:`(X,Y)`.
We denote by :math:`(r_1, s_1), \dots, (r_\sampleSize, s_\sampleSize)` the rank sample,
which means that :math:`r_k` is the rank of the value :math:`x_k` within the sample
:math:`(x_1, \dots, x_\sampleSize)` and :math:`s_k` is the rank of the value :math:`y_k` within the
sample :math:`(y_1, \dots, y_\sampleSize)`. The estimator :math:`\hat{\rho}_S(X,Y)` is the Pearson estimator
:math:`\hat{\rho}_P` computed on the rank sample :math:`(r_1, s_1), \dots, (r_\sampleSize, s_\sampleSize)`,
that is:

.. math::
    :label: SpearmanEstim

    \hat{\rho}_S(X,Y) = \dfrac{\sum_{k=1}^\sampleSize (r_k- \bar{r})(s_k- \bar{s})}
    {\sqrt{\sum_{k=1}^\sampleSize(r_k- \bar{r})^2\sum_{k=1}^\sampleSize(s_k- \bar{s})^2}}

where :math:`\bar{r} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize r_k` and
:math:`\bar{s} = \dfrac{1}{\sampleSize} \sum_{k=1}^\sampleSize s_k` are the empirical mean ranks of the two samples.
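
The estimator :eq:`SpearmanEstim` can be sketched in a few lines of plain Python. The helpers
``ranks`` and ``spearman_estimate`` below are hypothetical names, not the OpenTURNS API, and ``ranks``
assumes that the sample contains no ties, so that every rank is well defined. Because the estimate only
depends on the ranks, applying increasing transformations to the :math:`x_k` and to the :math:`y_k`
leaves it unchanged, which is the empirical counterpart of the fact that :math:`\rho_S` depends only on
the copula.

.. code-block:: python

    import math


    def ranks(values):
        """Rank (1..n) of each value within its sample; assumes no ties."""
        order = sorted(range(len(values)), key=lambda k: values[k])
        result = [0] * len(values)
        for position, k in enumerate(order, start=1):
            result[k] = position
        return result


    def spearman_estimate(x, y):
        """Estimator SpearmanEstim: Pearson coefficient of the rank sample (r_k, s_k)."""
        r, s = ranks(x), ranks(y)
        n = len(r)
        r_bar, s_bar = sum(r) / n, sum(s) / n  # empirical mean ranks
        num = sum((rk - r_bar) * (sk - s_bar) for rk, sk in zip(r, s))
        den = math.sqrt(sum((rk - r_bar) ** 2 for rk in r)
                        * sum((sk - s_bar) ** 2 for sk in s))
        return num / den


    x = [0.3, 2.1, 1.4, 5.0, 3.3, 4.2]
    y = [1.0, 2.5, 3.1, 4.4, 2.0, 5.2]
    rho_hat = spearman_estimate(x, y)
    # Increasing transformations keep the ranks, hence the estimate, unchanged
    rho_hat_transformed = spearman_estimate([math.exp(v) for v in x], [v ** 3 for v in y])
    print(rho_hat, rho_hat_transformed)  # both approximately 0.714 (= 5/7)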


We sum up some interesting features of the coefficient:

- The Spearman correlation coefficient takes values between -1 and 1.

- If :math:`|\rho_S(X,Y)|=1`, then there exists a monotonic function
  :math:`\varphi` (increasing if :math:`\rho_S(X,Y)=1`, decreasing if :math:`\rho_S(X,Y)=-1`)
  such that :math:`Y=\varphi(X)` almost surely.

- The closer :math:`|\rho_S(X,Y)|` is to 1, the stronger the indication
  that a monotonic relationship exists between :math:`X` and
  :math:`Y`. The sign of the Spearman coefficient indicates whether the two
  variables vary in the same direction (positive
  coefficient) or in opposite directions (negative coefficient).

- If :math:`X` and :math:`Y` are independent, then :math:`\rho_S(X,Y)=0`.

- The converse is false: :math:`\rho_S(X,Y)=0` does not imply that
  :math:`X` and :math:`Y` are independent. It may only mean that the relationship between
  the two variables is not monotonic.
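
The following sketch illustrates two of these features on small deterministic samples, using plain
Python with hypothetical helper names. For :math:`y_k = e^{x_k}` the relationship is monotonic but not
linear, so the Spearman estimate equals 1 whereas the Pearson estimate does not. For
:math:`y_k = x_k^2` with :math:`x_k` symmetric around 0, :math:`y_k` is a deterministic function of
:math:`x_k`, yet the Spearman estimate is 0 because the relationship is not monotonic. Tied values of
:math:`y_k` receive the average of their ranks; this tie convention is an assumption, as it is not
specified on this page.

.. code-block:: python

    import math


    def average_ranks(values):
        """Ranks 1..n of the values; tied values share the mean of their ranks
        (assumed tie convention)."""
        first_position, multiplicity = {}, {}
        for position, value in enumerate(sorted(values), start=1):
            first_position.setdefault(value, position)
            multiplicity[value] = multiplicity.get(value, 0) + 1
        return [first_position[v] + (multiplicity[v] - 1) / 2.0 for v in values]


    def pearson(a, b):
        """Empirical Pearson correlation of two sequences of equal length."""
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
        den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                        * sum((q - mean_b) ** 2 for q in b))
        return num / den


    def spearman(x, y):
        """Spearman estimate: Pearson coefficient of the (average) ranks."""
        return pearson(average_ranks(x), average_ranks(y))


    x = [float(k) for k in range(-5, 6)]  # -5, -4, ..., 5

    # Monotonic but nonlinear relationship
    y_monotonic = [math.exp(v) for v in x]
    print(spearman(x, y_monotonic))  # 1.0
    print(pearson(x, y_monotonic))   # about 0.69, below 1

    # Deterministic but non-monotonic relationship
    y_parabola = [v ** 2 for v in x]
    print(spearman(x, y_parabola))   # 0.0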

.. plot::

    import openturns as ot
    import openturns.viewer as otv

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['x^2'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('There is a monotonic relationship between U and V:\nSpearman\'s coefficient is a relevant measure of dependency...')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    otv.View(graph)

.. plot::

    import openturns as ot
    import openturns.viewer as otv

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5*x+10'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('... because the rank transformation turns any monotonic trend\ninto a linear relation for which Pearson\'s correlation is relevant')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    otv.View(graph)

.. plot::

    import openturns as ot
    import openturns.viewer as otv

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5'])
    y = ot.Uniform(0.0, 10.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is close to zero\nbecause U and V are independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    otv.View(graph)

.. plot::

    import openturns as ot
    import openturns.viewer as otv

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    otv.View(graph)

Spearman’s coefficient is often referred to as the rank correlation
coefficient.


.. topic:: API:

    - See :class:`~openturns.CorrelationAnalysis` class method :py:meth:`~openturns.CorrelationAnalysis.computeSpearmanCorrelation`
    - See :class:`~openturns.Sample` class method :py:meth:`~openturns.Sample.computeSpearmanCorrelation`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/sample_analysis/plot_sample_correlation`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_