.. _quantile_estimation_wilks:
Estimation of a quantile upper bound by Wilks' method
-----------------------------------------------------
We consider a random variable :math:`X` of dimension 1 and the unknown quantile :math:`x_{\alpha}`
of level :math:`\alpha \in [0, 1]` of its distribution.
We seek an upper bound of :math:`x_{\alpha}` with a confidence greater than or equal to
:math:`\beta`, using a given order statistic.
Let :math:`(X_1, \dots, X_\sampleSize)` be independent copies of :math:`X`.
Let :math:`X_{(k)}` be the :math:`k`-th order statistic of :math:`(X_1, \dots, X_\sampleSize)`, which means that
:math:`X_{(k)}` is the :math:`k`-th smallest value of :math:`(X_1, \dots, X_\sampleSize)` for :math:`1 \leq k \leq \sampleSize`. For
example, :math:`X_{(1)} = \min (X_1, \dots, X_\sampleSize)` is the minimum
and :math:`X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize)` is the maximum. We have:
.. math::
X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}
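As a minimal illustration in plain Python, sorting a sample in increasing order gives its observed order statistics. The sample values and the helper ``order_statistic`` are hypothetical, chosen only for this sketch:

.. code-block:: python

    # Hypothetical observations, chosen only for illustration.
    sample = [2.7, -0.4, 1.3, 0.8, 3.1]

    # Sorting in increasing order gives x_(1) <= x_(2) <= ... <= x_(n).
    ordered = sorted(sample)


    def order_statistic(ordered_sample, k):
        """Return x_(k), where k is the 1-based rank used in the text."""
        return ordered_sample[k - 1]


    x_min = order_statistic(ordered, 1)             # x_(1), the minimum
    x_max = order_statistic(ordered, len(ordered))  # x_(n), the maximum
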
Smallest rank for an upper bound to the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let :math:`(x_1, \dots, x_\sampleSize)` be an i.i.d. sample of size :math:`\sampleSize` of
the random variable :math:`X`.
Given a quantile level :math:`\alpha \in [0,1]`, a confidence level
:math:`\beta \in [0,1]`, and a sample size :math:`\sampleSize`, we seek the smallest
rank :math:`k \in \llbracket 1, \sampleSize \rrbracket` such that:
.. math::
:label: EqOrderStat
\Prob{x_{\alpha} \leq X_{(k)}} \geq \beta
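Let :math:`F` and :math:`p` denote the cumulative distribution function and the probability
density function of :math:`X`. The cumulative distribution function and the probability density
function of the order statistic :math:`X_{(k)}` are: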
.. math::
:label: DistOrderStat
F_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i}\left(F(x)
\right)^i \left(1-F(x)
\right)^{\sampleSize-i} \\
p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1}\left(F(x)\right)^{k-1}
\left(1-F(x)
\right)^{\sampleSize-k} p(x)
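The first formula can be checked by simulation. The sketch below is plain Python under the assumption that :math:`X` is uniform on :math:`[0, 1]`, so that :math:`F(x) = x`. The function names, the number of trials and the seed are illustrative choices:

.. code-block:: python

    import random
    from math import comb


    def order_statistic_cdf(k, n, u):
        """CDF of X_(k) at a point x such that F(x) = u."""
        return sum(comb(n, i) * u**i * (1.0 - u) ** (n - i) for i in range(k, n + 1))


    def empirical_cdf(k, n, u, trials=20000, seed=0):
        """Fraction of uniform samples of size n whose k-th smallest value is <= u."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(trials):
            ordered = sorted(rng.random() for _ in range(n))
            if ordered[k - 1] <= u:
                hits += 1
        return hits / trials


    # Compare the formula with the simulated frequency for n = 10, k = 8, x = 0.7.
    exact = order_statistic_cdf(8, 10, 0.7)
    approx = empirical_cdf(8, 10, 0.7)

The two values should agree up to the Monte Carlo error, which decreases as the number of trials grows.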
We notice that :math:`F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1)` where
:math:`F_{(\sampleSize,F(x))}` is the cumulative
distribution function of the Binomial distribution :math:`\cB(\sampleSize,F(x))` and
:math:`\overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k)` is the
complementary cumulative distribution function (also named survival function in dimension
1).
Therefore:
.. math::
F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
= \overline{F}_{(\sampleSize,\alpha)}(k-1)
and equation :eq:`EqOrderStat` implies:
.. math::
:label: EqOrderStat2
1-F_{X_{(k)}}(x_{\alpha})\geq \beta
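Since :math:`\overline{F}_{(\sampleSize,\alpha)}(k-1) = 1 - F_{(\sampleSize,\alpha)}(k-1)`, this is equivalent to:
.. math::
F_{(\sampleSize, \alpha)}(k-1)\geq \beta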
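The smallest rank :math:`k_{sol}` such that the previous equation is satisfied is:
.. math::
k_{sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{(\sampleSize, \alpha)}(k-1)\geq \beta \}\\
& = 1 + \min \{ k \in \llbracket 0, \sampleSize - 1 \rrbracket \, | \, F_{(\sampleSize, \alpha)}(k)\geq \beta \}
This set is non-empty if and only if :math:`1 - \alpha^{\sampleSize} \geq \beta`, since
:math:`F_{(\sampleSize, \alpha)}(\sampleSize - 1) = 1 - \alpha^{\sampleSize}`.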
An upper bound of :math:`x_{\alpha}` is estimated by the value of :math:`X_{(k_{sol})}`
on the sample
:math:`(x_1, \dots, x_\sampleSize)`.
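The search for the smallest rank can be sketched in plain Python as follows. This is an illustration of the inequality on the Binomial cumulative distribution function, not the implementation of :class:`~openturns.Wilks`. The names ``binomial_cdf`` and ``smallest_rank`` are hypothetical:

.. code-block:: python

    from math import comb, fsum


    def binomial_cdf(k, n, p):
        """P(B <= k) for B following the Binomial distribution with parameters (n, p)."""
        if k < 0:
            return 0.0
        if k >= n:
            return 1.0
        return fsum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


    def smallest_rank(alpha, beta, n):
        """Smallest k in [1, n] with P(B <= k - 1) >= beta, or None if there is none."""
        for k in range(1, n + 1):
            if binomial_cdf(k - 1, n, alpha) >= beta:
                return k
        return None


    # With alpha = beta = 0.95 and n = 100, the selected rank is k = 99,
    # i.e. the second largest observation.
    k_sol = smallest_rank(0.95, 0.95, 100)

The function returns ``None`` when the sample is too small for any order statistic to reach the requested confidence, for example with :math:`\alpha = \beta = 0.95` and a sample of size 58.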
Minimum sample size for an upper bound to the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Given :math:`\alpha`, :math:`\beta`, and an index :math:`i \geq 0`, we seek the smallest sample size
:math:`\sampleSize`
such that equation :eq:`EqOrderStat` is satisfied with the rank :math:`k = \sampleSize - i`. To do so, we solve
equation :eq:`EqOrderStat2` with respect to the sample size :math:`\sampleSize`.
Once the smallest size :math:`\sampleSize` has been computed, a sample of size
:math:`\sampleSize` can be
generated from
:math:`X` and an upper bound of :math:`x_{\alpha}` is estimated using
:math:`x_{(\sampleSize-i)}`, i.e. the :math:`(\sampleSize - i)`-th observation
in the ordered sample :math:`(x_{(1)}, \dots, x_{(\sampleSize)})`.
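A minimal sketch of this search in plain Python is given below. It assumes that the bound is :math:`x_{(\sampleSize - i)}`, so that the condition reads :math:`F_{(\sampleSize, \alpha)}(\sampleSize - i - 1) \geq \beta`. The function names and the cap ``n_max`` are hypothetical, and the code is independent of :class:`~openturns.Wilks`:

.. code-block:: python

    from math import comb, fsum


    def coverage(n, i, alpha):
        """P(x_alpha <= X_(n-i)) = 1 - P(B >= n - i), with B ~ Binomial(n, alpha)."""
        tail = fsum(
            comb(n, j) * alpha**j * (1.0 - alpha) ** (n - j)
            for j in range(n - i, n + 1)
        )
        return 1.0 - tail


    def minimum_sample_size(alpha, beta, i=0, n_max=10000):
        """Smallest n > i such that coverage(n, i, alpha) >= beta, or None up to n_max."""
        for n in range(i + 1, n_max + 1):
            if coverage(n, i, alpha) >= beta:
                return n
        return None


    # Largest observation as the bound (i = 0), then the second largest (i = 1).
    n_max_value = minimum_sample_size(0.95, 0.95, i=0)
    n_second = minimum_sample_size(0.95, 0.95, i=1)

For a fixed :math:`i`, the coverage increases with :math:`\sampleSize`, so the first size that satisfies the inequality is the smallest one and every larger size also satisfies it. With :math:`\alpha = \beta = 0.95`, the sketch gives 59 for :math:`i = 0` and 93 for :math:`i = 1`.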
.. topic:: API:
- See :class:`~openturns.Wilks`
.. topic:: Examples:
- See :doc:`/auto_data_analysis/manage_data_and_samples/plot_quantile_estimation_wilks`
.. topic:: References:
- Wilks, S. S. (1941). Determination of sample sizes for setting tolerance limits. The Annals of Mathematical Statistics, 12(1), 91-96.
- Robert, C. P., Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed., Springer, ISBN 0-387-21239-6.
- Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method, John Wiley & Sons.