.. _graphical_fitting_test:

Graphical goodness-of-fit tests
-------------------------------

We gather some graphical tools to assess whether a given sample of data
is drawn from a given continuous distribution of dimension 1.

We denote by :math:`\left\{ x_1,\ldots,x_{\sampleSize} \right\}` the one-dimensional data,
assumed to be independent realizations of the random variable :math:`X`.
Let :math:`F` be a continuous cumulative distribution function.
We want to check whether :math:`X` follows the distribution characterized by :math:`F`.

QQ-plot
~~~~~~~

The Quantile-Quantile plot (QQ-plot) compares the quantiles of the tested distribution
with the empirical quantiles of the sample. Let :math:`q_{X}(\alpha)` be the quantile of order
:math:`\alpha` of the distribution :math:`F`, with :math:`\alpha \in (0, 1)`. It is defined by:

.. math::

    q_{X}(\alpha) = \inf \{ x \in \Rset \, |\, F(x) \geq \alpha \}

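
When :math:`F` is only available as a function, the infimum in this definition can be approximated numerically. The following sketch is plain Python, not the OpenTURNS implementation; it assumes that an interval ``[lower, upper]`` bracketing the quantile is known, and the name ``generalized_quantile`` as well as the exponential example are illustrative choices.

.. code-block:: python

    import math


    def generalized_quantile(cdf, alpha, lower, upper, tol=1e-12):
        """Approximate inf{x : cdf(x) >= alpha} by bisection.

        Assumes that cdf is non-decreasing on [lower, upper] and that
        cdf(lower) < alpha <= cdf(upper), so the infimum lies in (lower, upper].
        """
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1)")
        while upper - lower > tol:
            middle = 0.5 * (lower + upper)
            if cdf(middle) >= alpha:
                upper = middle  # middle belongs to {x : F(x) >= alpha}
            else:
                lower = middle  # the infimum lies to the right of middle
        return upper


    def exponential_cdf(x):
        """CDF of the exponential distribution with rate 1."""
        return 0.0 if x < 0.0 else 1.0 - math.exp(-x)


    # The median of the exponential distribution with rate 1 is log(2).
    median = generalized_quantile(exponential_cdf, 0.5, 0.0, 50.0)
    print(median, math.log(2.0))
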
The empirical quantile of order :math:`\alpha` built on the sample is defined by:

.. math::

    \widehat{q}_{X}(\alpha) = x_{([\sampleSize \alpha]+1)}

where :math:`[\sampleSize\alpha]` denotes the integer part of :math:`\sampleSize \alpha`
and :math:`\left\{ x_{(1)},\ldots,x_{(\sampleSize)} \right\}` is the sample sorted in ascending order:

.. math::

    x_{(1)} \leq \dots \leq x_{(\sampleSize)}

Thus, the :math:`j^\textrm{th}` smallest value :math:`x_{(j)}` of the sample
is the estimate :math:`\widehat{q}_{X}(\alpha)` of the
:math:`\alpha`-quantile where :math:`\alpha = (j-1)/\sampleSize`, for :math:`1 < j \leq \sampleSize`:
indeed, :math:`[\sampleSize \alpha] + 1 = j` for this value of :math:`\alpha`, and :math:`j = 1` is excluded
because it would give :math:`\alpha = 0`.

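
The definition of the empirical quantile translates directly into code. This is a minimal plain-Python sketch; the helper name ``empirical_quantile`` is hypothetical and not part of OpenTURNS.

.. code-block:: python

    import math


    def empirical_quantile(sample, alpha):
        """Return x_([n * alpha] + 1), the empirical quantile of order alpha."""
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1)")
        ordered = sorted(sample)           # x_(1) <= ... <= x_(n)
        n = len(ordered)
        rank = math.floor(n * alpha) + 1   # 1-based rank [n * alpha] + 1, at most n
        return ordered[rank - 1]           # convert the rank to a 0-based index


    sample = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0, 7.0]
    print(empirical_quantile(sample, 0.25))  # rank [2.5] + 1 = 3, prints 2.6
    print(empirical_quantile(sample, 0.5))   # rank [5.0] + 1 = 6, prints 4.0

The rank is computed from :math:`\sampleSize \alpha` in floating point, which can shift it by one when :math:`\alpha = (j-1)/\sampleSize` (in IEEE double precision, ``49 * (1 / 49)`` evaluates to slightly less than 1). When the order is of this form, indexing the sorted sample with the integer :math:`j` avoids the issue.
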
The QQ-plot draws the couples
:math:`\left(x_{(j)}, q_{X}\left(\dfrac{j-1}{\sampleSize}\right)\right)_{1 < j \leq \sampleSize}`.
If :math:`X` follows the distribution :math:`F`, then the points should be close to the diagonal.

The following figure illustrates a QQ-plot with a
sample of size :math:`\sampleSize=150`. In this example, the
points remain close to the diagonal and the hypothesis “:math:`F` is the
cumulative distribution function of :math:`X`” does not seem to be contradicted,
even if a more quantitative analysis should be
carried out to confirm this.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample is drawn from the tested distribution itself
    distribution = ot.Normal(3.0, 2.0)
    sample = distribution.getSample(150)
    graph = ot.VisualTest.DrawQQplot(sample, distribution)
    View(graph)

In this second example, the tested continuous distribution clearly
does not fit the sample.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample comes from Normal(3, 3) but is tested against Normal(2, 1)
    distribution = ot.Normal(3.0, 3.0)
    distribution2 = ot.Normal(2.0, 1.0)
    sample = distribution.getSample(150)
    graph = ot.VisualTest.DrawQQplot(sample, distribution2)
    View(graph)

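
The couples drawn by the QQ-plot can also be computed without plotting. This minimal plain-Python sketch mirrors the two examples above using ``statistics.NormalDist`` from the Python standard library instead of OpenTURNS; the helper names and the mean gap to the diagonal, used here only as a crude summary of what the eye judges on the plot, are illustrative assumptions and not a formal test.

.. code-block:: python

    import random
    from statistics import NormalDist


    def qq_plot_points(sample, inverse_cdf):
        """Return the couples (x_(j), q_X((j - 1) / n)) for 1 < j <= n."""
        ordered = sorted(sample)
        n = len(ordered)
        # Integer ranks j = 2, ..., n give the orders alpha = (j - 1) / n in (0, 1).
        return [(ordered[j - 1], inverse_cdf((j - 1) / n)) for j in range(2, n + 1)]


    def mean_gap_to_diagonal(points):
        """Mean absolute distance between the two coordinates (distance to y = x)."""
        return sum(abs(x - q) for x, q in points) / len(points)


    random.seed(0)
    # Sample of size 150 drawn from N(3, 2) (mean 3, standard deviation 2).
    sample = [random.gauss(3.0, 2.0) for _ in range(150)]

    good_points = qq_plot_points(sample, NormalDist(3.0, 2.0).inv_cdf)  # right model
    bad_points = qq_plot_points(sample, NormalDist(2.0, 1.0).inv_cdf)   # wrong model
    print(mean_gap_to_diagonal(good_points), mean_gap_to_diagonal(bad_points))
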
Normal probability plot (Henry’s line)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This test is dedicated to the normal distribution.
The test relies on the following result: if :math:`X` follows the :math:`\cN(\mu,\sigma)` distribution,
where :math:`\sigma` is the standard deviation,
then :math:`(X-\mu) / \sigma` follows the :math:`\cN(0,1)` distribution. Furthermore, let :math:`q_{\cN(\mu,\sigma)}(\alpha)`
be the quantile of order :math:`\alpha` of :math:`\cN(\mu,\sigma)` and let :math:`q_{\cN(0,1)}(\alpha)`
be the quantile of order :math:`\alpha` of :math:`\cN(0,1)`. Then we have the relation:

.. math::

    q_{\cN(0,1)}(\alpha) = \dfrac{q_{\cN(\mu,\sigma)}(\alpha) - \mu}{\sigma}

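
This relation can be checked numerically. The sketch below uses ``statistics.NormalDist`` from the Python standard library (an assumption of this example, independent of OpenTURNS), whose parameters are the mean and the standard deviation, as in :math:`\cN(\mu,\sigma)`; the values of :math:`\mu`, :math:`\sigma` and :math:`\alpha` are arbitrary.

.. code-block:: python

    from statistics import NormalDist

    mu, sigma = 10.0, 2.0
    standard = NormalDist(0.0, 1.0)   # N(0, 1)
    general = NormalDist(mu, sigma)   # N(mu, sigma), sigma being the standard deviation

    differences = []
    for alpha in (0.01, 0.25, 0.5, 0.9):
        lhs = standard.inv_cdf(alpha)                 # q_N(0,1)(alpha)
        rhs = (general.inv_cdf(alpha) - mu) / sigma   # (q_N(mu,sigma)(alpha) - mu) / sigma
        differences.append(abs(lhs - rhs))
    print(max(differences))  # zero up to rounding errors
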
The Henry line is the QQ-plot built from the empirical quantiles of order :math:`\dfrac{j-1}{\sampleSize}`, :math:`1 < j \leq \sampleSize`,
and the quantiles of the same order of the :math:`\cN(0,1)` distribution: it draws the couples
:math:`\left(x_{(j)}, q_{\cN(0,1)}\left(\dfrac{j-1}{\sampleSize}\right)\right)_{1 < j \leq \sampleSize}`.
If the sample comes from the :math:`\cN(\mu,\sigma)` distribution, then, by the previous relation,
the points should be close to the line of equation :math:`y = \dfrac{x-\mu}{\sigma}`.

The following figure illustrates the Henry line
with a sample of size :math:`\sampleSize=50`. In this
example, the points remain close to a line and the hypothesis “:math:`X` follows
a normal distribution” does not seem to be contradicted,
even if a more quantitative analysis
should be carried out to confirm this.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # Sample of size 50 drawn from a normal distribution
    distribution = ot.Normal(10.0, 2.0)
    sample = distribution.getSample(50)
    graph = ot.VisualTest.DrawHenryLine(sample)
    View(graph)

In this second example, the hypothesis of a normal distribution seems
far less plausible because of the behavior for small values of
:math:`X`.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # Sample of size 50 drawn from a log-normal distribution
    distribution = ot.LogNormal(2.0, 1.0, 0.0)
    sample = distribution.getSample(50)
    graph = ot.VisualTest.DrawHenryLine(sample)
    View(graph)

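
The points of the Henry line can also be computed directly. The sketch below is plain Python with the standard library, not the OpenTURNS implementation, and mirrors the two examples above. As an illustrative complement that is not part of the test described here, it fits a least-squares line to the points: for a normal sample, the slope is close to :math:`1/\sigma` and the intercept to :math:`-\mu/\sigma`, and the correlation coefficient of the points measures how straight they are. The helper names are hypothetical.

.. code-block:: python

    import math
    import random
    from statistics import NormalDist, fmean


    def henry_points(sample):
        """Couples (x_(j), q_N(0,1)((j - 1) / n)) for 1 < j <= n."""
        ordered = sorted(sample)
        n = len(ordered)
        standard = NormalDist(0.0, 1.0)
        return [(ordered[j - 1], standard.inv_cdf((j - 1) / n)) for j in range(2, n + 1)]


    def fit_line(points):
        """Least-squares fit y = slope * x + intercept, plus the correlation coefficient."""
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        mean_x, mean_y = fmean(xs), fmean(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        syy = sum((y - mean_y) ** 2 for y in ys)
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        return slope, intercept, sxy / math.sqrt(sxx * syy)


    random.seed(0)
    # First example: sample of size 50 from N(10, 2).
    normal_sample = [random.gauss(10.0, 2.0) for _ in range(50)]
    slope, intercept, r_normal = fit_line(henry_points(normal_sample))
    # Matching y = (x - mu) / sigma: slope ~ 1 / sigma and intercept ~ -mu / sigma.
    sigma_hat = 1.0 / slope
    mu_hat = -intercept / slope
    print(mu_hat, sigma_hat, r_normal)

    # Second example: sample of size 50 from a log-normal distribution.
    lognormal_sample = [random.lognormvariate(2.0, 1.0) for _ in range(50)]
    _, _, r_lognormal = fit_line(henry_points(lognormal_sample))
    print(r_lognormal)
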
Kendall plot
~~~~~~~~~~~~

In the bivariate case, the Kendall plot test allows one to check whether a sample is drawn from
a given copula, or whether two samples share
the same copula.

Let :math:`\inputRV = (X_1, X_2)` be a bivariate random vector with copula :math:`C` and
marginal cumulative distribution functions :math:`(F_1, F_2)`.
Let :math:`(U_1, U_2) = (F_1(X_1), F_2(X_2))` be the random vector with :math:`\cU(0,1)` marginal distributions
and copula :math:`C`.
Let :math:`(\inputReal_i)_{1 \leq i \leq \sampleSize}` be a sample drawn from :math:`\inputRV`. We build the rank sample
:math:`(\vect{u}_i)_{1 \leq i \leq \sampleSize}` where :math:`\vect{u}_i =(F_1(x_{1,i}), F_2(x_{2,i}))`.

We define:

.. math::

    H = C(U,V)

where :math:`(U,V)` is a bivariate random vector with :math:`\cU(0,1)` marginal distributions and copula :math:`C`.
We denote by :math:`K_0` the cumulative distribution function of :math:`H`.
We can get a sample :math:`(h_i)_{1 \leq i \leq \sampleSize}` of :math:`H` from the sample
:math:`(\vect{u}_i)_{1 \leq i \leq \sampleSize}` as follows:

.. math::

    h_i & = C(u_{1,i}, u_{2,i}) \\
        & = \Prob{F_1(X_1) \leq u_{1,i}, F_2(X_2) \leq u_{2,i}} \\
        & = F_{(U_1, U_2)}(u_{1,i}, u_{2,i}) \\
        & \approx \widehat{F}_{(U_1, U_2)}(u_{1,i}, u_{2,i})

where :math:`\widehat{F}_{(U_1, U_2)}` is the empirical cumulative distribution function
of the sample :math:`(\vect{u}_i)_{1 \leq i \leq \sampleSize}`.

Then, we have, for all :math:`1 \leq i \leq \sampleSize`:

.. math::

    \widehat{h}_i = \frac{1}{\sampleSize-1} \operatorname{Card}
    \left\{ j \in \{1, \dots, \sampleSize\}, \, j \neq i \, | \, x_{1,j} \leq x_{1,i} \mbox{ and } x_{2,j} \leq x_{2,i} \right\}

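
The estimate :math:`\widehat{h}_i` only involves pairwise comparisons, so it can be computed directly on the original sample. The following is a minimal, quadratic-time sketch in plain Python; it is not the OpenTURNS implementation, and the function name ``kendall_statistics`` is hypothetical.

.. code-block:: python

    def kendall_statistics(sample):
        """Return the values hat{h}_i of a bivariate sample given as (x1, x2) pairs.

        hat{h}_i is the proportion of the other points j != i such that
        x1_j <= x1_i and x2_j <= x2_i.
        """
        n = len(sample)
        if n < 2:
            raise ValueError("the sample must contain at least two points")
        values = []
        for i, (x1_i, x2_i) in enumerate(sample):
            count = sum(
                1
                for j, (x1_j, x2_j) in enumerate(sample)
                if j != i and x1_j <= x1_i and x2_j <= x2_i
            )
            values.append(count / (n - 1))
        return values


    sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
    print(kendall_statistics(sample))  # [0.0, 0.0, 0.6666666666666666, 1.0]
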
From the sample :math:`(h_i)_{1 \leq i \leq \sampleSize}`, we build the ordered sample
:math:`(h_{(i)})_{1 \leq i \leq \sampleSize}`.
Let :math:`(H_{(1)}, \dots, H_{(\sampleSize)})` be the order statistics of :math:`(H_1, \dots, H_{\sampleSize})`,
where :math:`H_1, \dots, H_{\sampleSize}` are independent copies of :math:`H`.
Then the cumulative distribution function of :math:`H_{(i)}` is the composition of the cumulative
distribution function of the :math:`Beta(i, \sampleSize-i+1)` distribution with the cumulative distribution function :math:`K_0` of :math:`H`:

.. math::

    F_{H_{(i)}} = F_{Beta(i, \sampleSize-i+1)} \circ K_0

Let :math:`w_i` be the quantity defined by:

.. math::

    w_i = \Expect{H_{(i)}}

Differentiating the previous relation gives the density of :math:`H_{(i)}`, so that:

.. math::
    :label: wi

    w_i = \sampleSize \binom{\sampleSize-1}{i-1} \int_0^1 t \, K_0(t)^{i-1} (1-K_0(t))^{\sampleSize-i} \, dK_0(t)

For a given copula :math:`C`, equation :eq:`wi` is evaluated by Monte Carlo
sampling: we generate :math:`N` samples of size
:math:`\sampleSize` from the copula :math:`C`, in order to get
:math:`N` realizations of the statistics
:math:`H_{(i)}, 1 \leq i \leq \sampleSize`, which are used to compute :math:`w_i`
as the empirical mean of the realizations of :math:`H_{(i)}`.

The Kendall plot draws the points :math:`(w_i, h_{(i)})_{1 \leq i \leq \sampleSize}`.
If the points lie close to the first diagonal, the copula :math:`C` is
validated.

In particular, we can use the Kendall plot to test the independence between :math:`X_1` and :math:`X_2`
by using the independence copula to compute the values :math:`(w_i)_{1 \leq i \leq \sampleSize}`.

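
The Monte Carlo estimation of the :math:`w_i` and the independence case can be sketched in plain Python as follows. This is an illustration under assumptions, not the OpenTURNS implementation: it uses the independence copula, whose samples are simply pairs of independent :math:`\cU(0,1)` variables, with :math:`N = 500` replications and a sample size of 20, both chosen arbitrarily; the function names are hypothetical.

.. code-block:: python

    import random
    from statistics import fmean


    def kendall_statistics(sample):
        """Return hat{h}_i, the proportion of points j != i dominated by point i."""
        n = len(sample)
        return [
            sum(1 for j, (a, b) in enumerate(sample) if j != i and a <= x1 and b <= x2)
            / (n - 1)
            for i, (x1, x2) in enumerate(sample)
        ]


    def independent_copula_sample(size, rng):
        """Sample of the independence copula: two independent U(0, 1) components."""
        return [(rng.random(), rng.random()) for _ in range(size)]


    def estimate_w(copula_sampler, size, replications, rng):
        """Monte Carlo estimate of w_i = E[H_(i)], i = 1, ..., size."""
        totals = [0.0] * size
        for _ in range(replications):
            ordered = sorted(kendall_statistics(copula_sampler(size, rng)))
            for i, h in enumerate(ordered):
                totals[i] += h
        return [total / replications for total in totals]


    rng = random.Random(0)
    size = 20
    w = estimate_w(independent_copula_sample, size, replications=500, rng=rng)
    # Under independence each hat{h}_i has expectation 1/4, so this mean is close to 0.25.
    print(fmean(w))

    # Kendall plot points (w_i, h_(i)) for a sample of the same size; here the tested
    # sample is itself drawn from the independence copula.
    data = independent_copula_sample(size, rng)
    points = list(zip(w, sorted(kendall_statistics(data))))
    print(points[:3])
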
To test whether two samples share the same copula, the Kendall
plot test draws the points
:math:`(h^1_{(i)}, h^2_{(i)})_{1 \leq i \leq \sampleSize}` respectively
associated with the first and second samples. Note that the two samples
must have the same size.

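
The two-sample version only needs the ordered statistics of both samples. In this plain-Python sketch (hypothetical helper names, not the OpenTURNS API), a sample of the independence copula is compared with a comonotonic sample built from the same uniform values. The comonotonic sample gives :math:`h_{(i)} = (i-1)/(\sampleSize-1)`, the largest possible values, so the points fall on or below the first diagonal instead of along it.

.. code-block:: python

    import random


    def kendall_statistics(sample):
        """Return hat{h}_i, the proportion of points j != i dominated by point i."""
        n = len(sample)
        return [
            sum(1 for j, (a, b) in enumerate(sample) if j != i and a <= x1 and b <= x2)
            / (n - 1)
            for i, (x1, x2) in enumerate(sample)
        ]


    def two_sample_kendall_points(sample1, sample2):
        """Couples (h^1_(i), h^2_(i)) of the ordered statistics of two samples."""
        if len(sample1) != len(sample2):
            raise ValueError("the two samples must have the same size")
        h1 = sorted(kendall_statistics(sample1))
        h2 = sorted(kendall_statistics(sample2))
        return list(zip(h1, h2))


    rng = random.Random(1)
    u = [rng.random() for _ in range(30)]
    v = [rng.random() for _ in range(30)]
    independent = list(zip(u, v))   # independence copula
    comonotonic = list(zip(u, u))   # perfect positive dependence
    points = two_sample_kendall_points(independent, comonotonic)
    print(points[-3:])
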
In the first example, the Kendall plot test validates the use of the Frank copula for the given sample.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample is drawn from the tested Frank copula
    copula = ot.FrankCopula(1.5)
    sample = copula.getSample(100)
    graph = ot.VisualTest.DrawKendallPlot(sample, copula)
    View(graph)

In the second example, the sample is drawn from a Frank copula, and the Kendall plot test
invalidates the use of the Gumbel copula for this sample.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    ot.RandomGenerator.SetSeed(0)
    # The sample comes from a Frank copula but is tested against a Gumbel copula
    copula = ot.FrankCopula(1.5)
    copula2 = ot.GumbelCopula(4.5)
    sample = copula.getSample(100)
    graph = ot.VisualTest.DrawKendallPlot(sample, copula2)
    View(graph)

Remark: when you want to test a sample with respect to a
specific copula and the size of the sample is greater than 500, we
recommend using the second form of the Kendall plot test: generate a
sample of the same size from your copula and then compare both samples.
Testing this way is more efficient.

.. topic:: API:

    - See :py:func:`~openturns.VisualTest.DrawQQplot` to draw a QQ plot
    - See :py:func:`~openturns.VisualTest.DrawHenryLine` to draw the Henry line
    - See :py:func:`~openturns.VisualTest.DrawKendallPlot` to draw the Kendall plot

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/statistical_tests/plot_qqplot_graph`
    - See :doc:`/auto_data_analysis/statistical_tests/plot_test_normality`
    - See :doc:`/auto_data_analysis/statistical_tests/plot_test_copula`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_