.. _spearman_coefficient:

Spearman correlation coefficient
--------------------------------

Spearman’s correlation coefficient :math:`\rho^S_{U,V}` measures the
strength of a monotonic relationship between two random variables
:math:`U` and :math:`V`. It is equivalent to Pearson’s correlation
coefficient computed after transforming :math:`U` and :math:`V` so that
any monotonic relationship becomes linear (recall that Pearson’s
correlation coefficient only measures the strength of linear
relationships, see :ref:`Pearson’s correlation coefficient <pearson_coefficient>`):

.. math::

   \begin{aligned}
       \rho^S_{U,V} = \rho_{F_U(U),F_V(V)}
     \end{aligned}

where :math:`F_U` and :math:`F_V` denote the cumulative distribution
functions of :math:`U` and :math:`V`.

Given a sample of :math:`N` pairs
:math:`\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}`, the
estimation of Spearman’s correlation coefficient first requires ranking
the two samples :math:`(u_1,\ldots,u_N)` and :math:`(v_1,\ldots,v_N)`
separately. The rank :math:`u_{[i]}` of the observation
:math:`u_i` is defined as the position of :math:`u_i` in the sample
sorted in ascending order: if :math:`u_i` is the smallest value in
the sample :math:`(u_1,\ldots,u_N)`, its rank equals 1; if
:math:`u_i` is the second smallest value in the sample, its rank
equals 2, and so forth. The rank transformation is a procedure
that takes the sample :math:`(u_1,\ldots,u_N)` as input and
produces the sample :math:`(u_{[1]},\ldots,u_{[N]})` as output.

For example, consider the sample
:math:`(u_1,u_2,u_3,u_4) = (1.5,0.7,5.1,4.3)`. We then have
:math:`(u_{[1]},u_{[2]},u_{[3]},u_{[4]}) = (2,1,4,3)`: :math:`u_1 = 1.5`
is the second smallest value in the original sample, :math:`u_2 = 0.7`
is the smallest, and so on.
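
The rank transformation can be sketched in a few lines of plain Python.
This is only an illustration under the assumption that the sample
contains no tied values; the helper name ``rank_transform`` is
hypothetical and is not part of the library.

.. code-block:: python

    def rank_transform(sample):
        """Return the 1-based rank of each observation (assumes no ties)."""
        # Indices of the observations, ordered by increasing value.
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        ranks = [0] * len(sample)
        # The observation at position k of the sorted sample gets rank k.
        for position, index in enumerate(order, start=1):
            ranks[index] = position
        return ranks

    u = [1.5, 0.7, 5.1, 4.3]
    u_ranks = rank_transform(u)
    print(u_ranks)  # [2, 1, 4, 3]

The output matches the ranks given in the example above.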

The estimation of Spearman’s correlation coefficient is therefore equal
to Pearson’s coefficient estimated with the aid of the :math:`N` pairs
:math:`(u_{[1]},v_{[1]})`, :math:`(u_{[2]},v_{[2]})`, …,
:math:`(u_{[N]},v_{[N]})`:

.. math::

   \begin{aligned}
       \widehat{\rho}^S_{U,V} = \frac{ \displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right) \left( v_{[i]} - \overline{v}_{[]} \right) }{ \sqrt{\displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right)^2 \sum_{i=1}^N \left( v_{[i]} - \overline{v}_{[]} \right)^2} }
     \end{aligned}

where :math:`\overline{u}_{[]}` and :math:`\overline{v}_{[]}` represent
the empirical means of the samples :math:`(u_{[1]},\ldots,u_{[N]})` and
:math:`(v_{[1]},\ldots,v_{[N]})`.
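
As a minimal sketch of this estimator, the following plain-Python code
applies Pearson’s coefficient to the ranks of the two samples. The
helper names (``rank_transform``, ``pearson``, ``spearman``) and the
data are hypothetical, and ties are assumed to be absent.

.. code-block:: python

    import math

    def rank_transform(sample):
        """Return the 1-based rank of each observation (assumes no ties)."""
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        ranks = [0] * len(sample)
        for position, index in enumerate(order, start=1):
            ranks[index] = position
        return ranks

    def pearson(x, y):
        """Empirical Pearson correlation coefficient of two samples."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        # Sum of cross-products of the centered values.
        cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        # Sums of squares of the centered values.
        square_x = sum((a - mean_x) ** 2 for a in x)
        square_y = sum((b - mean_y) ** 2 for b in y)
        return cross / math.sqrt(square_x * square_y)

    def spearman(u, v):
        """Spearman estimate: Pearson's coefficient applied to the ranks."""
        return pearson(rank_transform(u), rank_transform(v))

    # Illustrative data: v is an increasing but nonlinear function of u.
    u = [1.0, 2.0, 3.0, 4.0, 5.0]
    v = [x ** 3 for x in u]
    rho_spearman = spearman(u, v)
    rho_pearson = pearson(u, v)
    print(rho_spearman, rho_pearson)

Because :math:`v = u^3` is strictly increasing, both samples have the
same ranks and the Spearman estimate equals 1, whereas the Pearson
estimate on the raw values is below 1 since the relationship is not
linear.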

Spearman’s correlation coefficient takes values between -1 and 1.
The closer its absolute value is to 1, the stronger the indication
that a monotonic relationship exists between the variables :math:`U` and
:math:`V`. The sign of Spearman’s coefficient indicates whether the two
variables tend to vary in the same direction (positive
coefficient) or in opposite directions (negative coefficient). Note
that a correlation coefficient equal to 0 does not necessarily imply the
independence of the variables :math:`U` and :math:`V`. There are two
possible situations when Spearman’s correlation coefficient is zero:

-  the variables :math:`U` and :math:`V` are in fact independent,

-  or a non-monotonic relationship exists between :math:`U` and
   :math:`V`.
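
The second situation can be illustrated with a small sketch in plain
Python. The data below are made up for illustration: :math:`v` roughly
follows a V-shaped (non-monotonic) trend in :math:`u`, so the two
variables are clearly dependent, yet the rank-based estimate vanishes.
The helper names are hypothetical and ties are assumed to be absent.

.. code-block:: python

    import math

    def rank_transform(sample):
        """Return the 1-based rank of each observation (assumes no ties)."""
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        ranks = [0] * len(sample)
        for position, index in enumerate(order, start=1):
            ranks[index] = position
        return ranks

    def spearman(u, v):
        """Pearson's coefficient computed on the ranks of u and v."""
        ru, rv = rank_transform(u), rank_transform(v)
        n = len(ru)
        mean_u = sum(ru) / n
        mean_v = sum(rv) / n
        cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(ru, rv))
        square_u = sum((a - mean_u) ** 2 for a in ru)
        square_v = sum((b - mean_v) ** 2 for b in rv)
        return cross / math.sqrt(square_u * square_v)

    # Made-up data: v decreases, then increases as u grows.
    u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    v = [9.1, 4.2, 1.3, 0.1, 0.9, 3.8, 9.4]
    v_ranks = rank_transform(v)
    rho = spearman(u, v)
    print(v_ranks, rho)

Here the ranks of :math:`v` are :math:`(6,5,3,1,2,4,7)`; the
contributions of the decreasing and increasing branches cancel in the
numerator, so the estimate is zero despite the strong dependence.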

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['x^2'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('There is a monotonic relationship between U and V:\nSpearman\'s coefficient is a relevant measure of dependency...')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5*x+10'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('... because the rank transformation turns any monotonic trend\ninto a linear relation for which Pearson\'s correlation is relevant')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5'])
    y = ot.Uniform(0.0, 10.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is close to zero\nbecause U and V are independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

Spearman’s coefficient is often referred to as the rank correlation
coefficient.


.. topic:: API:

    - See method :py:meth:`~openturns.CorrelationAnalysis.computeSpearmanCorrelation`
    - See method :py:meth:`~openturns.Sample.computeSpearmanCorrelation`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_sample_correlation`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_