.. _quantile_estimation_wilks:

Estimation of a quantile upper bound by Wilks' method
-----------------------------------------------------

We consider a random variable :math:`X` of dimension 1 and the unknown quantile :math:`x_{\alpha}`
of level :math:`\alpha \in [0, 1]` of its distribution.
We seek an upper bound of :math:`x_{\alpha}` with a confidence greater than or equal to
:math:`\beta`, using a given order statistic.

Let :math:`(X_1, \dots, X_\sampleSize)` be independent copies of :math:`X`.
Let :math:`X_{(k)}` be the :math:`k`-th order statistic of :math:`(X_1, \dots, X_\sampleSize)`, which means that
:math:`X_{(k)}` is the :math:`k`-th smallest value of :math:`(X_1, \dots, X_\sampleSize)` for :math:`1 \leq k \leq \sampleSize`. For
example, :math:`X_{(1)} = \min (X_1, \dots, X_\sampleSize)` is the minimum
and :math:`X_{(\sampleSize)} = \max (X_1, \dots, X_\sampleSize)` is the maximum. We have:

.. math::

    X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(\sampleSize)}
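
In practice, the observed order statistics :math:`x_{(1)} \leq \dots \leq x_{(\sampleSize)}` are obtained by sorting the sample in increasing order. The following Python sketch uses only the standard library; ``order_statistic`` is a hypothetical helper chosen for illustration, not part of OpenTURNS. It keeps the 1-based rank convention of the text.

.. code-block:: python

    def order_statistic(sample, k):
        """Return the k-th order statistic x_(k) of a sample, with 1 <= k <= n.

        x_(1) is the minimum and x_(n) is the maximum, as in the text.
        """
        n = len(sample)
        if not 1 <= k <= n:
            raise ValueError("the rank k must satisfy 1 <= k <= n")
        # Sorting in increasing order gives x_(1) <= x_(2) <= ... <= x_(n);
        # the rank is shifted by one because Python lists are 0-based.
        return sorted(sample)[k - 1]


    sample = [2.3, -0.7, 1.1, 4.2, 0.5]
    print(order_statistic(sample, 1))  # -0.7, the minimum
    print(order_statistic(sample, 5))  # 4.2, the maximum
    print(order_statistic(sample, 2))  # 0.5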


Smallest rank for an upper bound to the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let :math:`(x_1, \dots, x_\sampleSize)` be an i.i.d. sample of size :math:`\sampleSize` of
the random variable :math:`X`.
Given a quantile level :math:`\alpha \in [0,1]`, a confidence level
:math:`\beta \in [0,1]`, and a sample size :math:`\sampleSize`, we seek the smallest
rank :math:`k \in \llbracket 1, \sampleSize \rrbracket` such that:

.. math::
    :label: EqOrderStat

    \Prob{x_{\alpha} \leq X_{(k)}} \geq \beta

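Let :math:`F` and :math:`p` denote the cumulative distribution function and the probability
density function of :math:`X`. The cumulative distribution function and the probability density
function of the order statistic :math:`X_{(k)}` are: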

.. math::
    :label: DistOrderStat

    F_{X_{(k)}}(x) & = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \left(F(x)\right)^i \left(1-F(x)\right)^{\sampleSize-i} \\
    p_{X_{(k)}}(x) & = (\sampleSize-k+1)\binom{\sampleSize}{k-1} \left(F(x)\right)^{k-1} \left(1-F(x)\right)^{\sampleSize-k} p(x)
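
Both formulas only involve the values :math:`F(x)` and :math:`p(x)`, so they can be evaluated pointwise. The Python sketch below (standard library only; the function names are hypothetical) implements them and checks them on the uniform distribution on :math:`[0, 1]`, an arbitrary choice for which :math:`F(x) = x` and :math:`p(x) = 1`: the density is compared with a centered finite difference of the cumulative distribution function.

.. code-block:: python

    from math import comb


    def order_stat_cdf(k, n, Fx):
        """CDF of X_(k) at a point x where the CDF of X equals Fx."""
        return sum(comb(n, i) * Fx**i * (1.0 - Fx) ** (n - i) for i in range(k, n + 1))


    def order_stat_pdf(k, n, Fx, px):
        """Density of X_(k) at a point x where X has CDF value Fx and density px."""
        return (n - k + 1) * comb(n, k - 1) * Fx ** (k - 1) * (1.0 - Fx) ** (n - k) * px


    # Uniform distribution on [0, 1]: F(x) = x and p(x) = 1.
    n, k, x, h = 10, 3, 0.4, 1e-6
    finite_diff = (order_stat_cdf(k, n, x + h) - order_stat_cdf(k, n, x - h)) / (2.0 * h)
    print(finite_diff, order_stat_pdf(k, n, x, 1.0))

For :math:`k = \sampleSize` the first formula reduces to :math:`F(x)^{\sampleSize}` (distribution of the maximum) and for :math:`k = 1` to :math:`1 - (1 - F(x))^{\sampleSize}` (distribution of the minimum), which gives two more quick checks.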

We notice that :math:`F_{X_{(k)}}(x) = \overline{F}_{(\sampleSize,F(x))}(k-1)`, where
:math:`F_{(\sampleSize,F(x))}` is the cumulative
distribution function of the Binomial distribution :math:`\cB(\sampleSize,F(x))` and
:math:`\overline{F}_{(\sampleSize,F(x))}(k) = 1 - F_{(\sampleSize,F(x))}(k)` is the
complementary cumulative distribution function (also called the survival function in dimension
1).
Assuming that :math:`F` is continuous, we have :math:`F(x_{\alpha}) = \alpha`. Therefore:

.. math::

    F_{X_{(k)}}(x_{\alpha}) = \sum_{i=k}^{\sampleSize} \binom{\sampleSize}{i} \alpha^i (1-\alpha)^{\sampleSize-i}
    = \overline{F}_{(\sampleSize,\alpha)}(k-1)

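and, since :math:`\Prob{x_{\alpha} \leq X_{(k)}} = 1 - F_{X_{(k)}}(x_{\alpha})` for a continuous distribution, equation :eq:`EqOrderStat` becomes: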

.. math::
    :label: EqOrderStat2

    1-F_{X_{(k)}}(x_{\alpha})\geq \beta

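Since :math:`1 - \overline{F}_{(\sampleSize,\alpha)}(k-1) = F_{(\sampleSize,\alpha)}(k-1)`, this is equivalent to:

.. math::

    F_{(\sampleSize,\alpha)}(k-1)\geq \beta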
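
As a worked example, take :math:`\alpha = \beta = 0.95` and :math:`k = \sampleSize`, i.e. the sample maximum is used as the upper bound. Then :math:`F_{(\sampleSize,\alpha)}(\sampleSize-1) = 1 - \alpha^{\sampleSize}`, and the condition :math:`1 - 0.95^{\sampleSize} \geq 0.95` first holds for :math:`\sampleSize = 59`, since :math:`1 - 0.95^{59} \approx 0.9515` while :math:`1 - 0.95^{58} \approx 0.9490`. The Python sketch below (standard library only; helper names are hypothetical) evaluates this confidence and compares it with a Monte Carlo estimate of :math:`\Prob{x_{\alpha} \leq X_{(k)}}` for the uniform distribution on :math:`[0, 1]`, an arbitrary choice for which :math:`x_{\alpha} = \alpha`.

.. code-block:: python

    import random
    from math import comb


    def binomial_cdf(j, n, p):
        """P(B <= j) for B ~ Binomial(n, p)."""
        return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(0, j + 1))


    def confidence(k, n, alpha):
        """P(x_alpha <= X_(k)) = F_{(n, alpha)}(k - 1), valid for a continuous distribution."""
        return binomial_cdf(k - 1, n, alpha)


    def monte_carlo_confidence(k, n, alpha, n_rep=10_000, seed=0):
        """Fraction of uniform samples of size n whose k-th order statistic is >= x_alpha = alpha."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(n_rep):
            sample = sorted(rng.random() for _ in range(n))
            if alpha <= sample[k - 1]:
                hits += 1
        return hits / n_rep


    alpha = 0.95
    print(confidence(59, 59, alpha))  # about 0.9515
    print(confidence(58, 58, alpha))  # about 0.9490
    print(monte_carlo_confidence(59, 59, alpha))

The simulated frequency fluctuates around the exact value; with 10,000 repetitions its standard deviation is about 0.002.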

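The smallest rank :math:`k_{sol}` such that :math:`F_{(\sampleSize,\alpha)}(k-1)\geq \beta` is:

.. math::

    k_{sol} & = \min \{ k \in \llbracket 1, \sampleSize \rrbracket \, | \, F_{(\sampleSize,\alpha)}(k-1)\geq \beta \}\\
            & = 1 + \min \{ j \in \llbracket 0, \sampleSize-1 \rrbracket \, | \, F_{(\sampleSize,\alpha)}(j)\geq \beta \}

Since :math:`F_{(\sampleSize,\alpha)}(\sampleSize-1) = 1 - \alpha^{\sampleSize}`, this set is empty
when :math:`1 - \alpha^{\sampleSize} < \beta`: in that case, no order statistic of a sample of size
:math:`\sampleSize` gives an upper bound of :math:`x_{\alpha}` with confidence :math:`\beta`.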

An upper bound of :math:`x_{\alpha}` is then estimated by the value :math:`x_{(k_{sol})}` taken by
:math:`X_{(k_{sol})}` on the sample :math:`(x_1, \dots, x_\sampleSize)`.
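
The minimum over :math:`k` can be found by accumulating the probabilities :math:`\Prob{B = k-1}`, where :math:`B \sim \cB(\sampleSize, \alpha)`, until the cumulative sum reaches :math:`\beta`. The Python sketch below uses only the standard library; ``smallest_rank`` and ``upper_bound`` are hypothetical names, not the OpenTURNS API. It returns ``None`` when the set defining :math:`k_{sol}` is empty. The Gaussian sample is an arbitrary choice for illustration.

.. code-block:: python

    import random
    from math import comb


    def smallest_rank(alpha, beta, n):
        """Smallest k in [1, n] with F_{(n, alpha)}(k - 1) >= beta, or None if there is none."""
        cdf = 0.0
        for k in range(1, n + 1):
            # Add P(B = k - 1) for B ~ Binomial(n, alpha): cdf is now F_{(n, alpha)}(k - 1).
            cdf += comb(n, k - 1) * alpha ** (k - 1) * (1.0 - alpha) ** (n - k + 1)
            if cdf >= beta:
                return k
        return None


    def upper_bound(sample, alpha, beta):
        """Observed value x_(k_sol), used as an upper bound of the alpha-quantile."""
        k_sol = smallest_rank(alpha, beta, len(sample))
        if k_sol is None:
            raise ValueError("the sample is too small for this (alpha, beta)")
        return sorted(sample)[k_sol - 1]


    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print(smallest_rank(0.95, 0.95, 100))  # 99
    print(upper_bound(sample, 0.95, 0.95))

For :math:`\alpha = \beta = 0.95`, this gives :math:`k_{sol} = 59` for :math:`\sampleSize = 59`, :math:`k_{sol} = 99` for :math:`\sampleSize = 100`, and no solution for :math:`\sampleSize \leq 58`.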

Minimum sample size for an upper bound to the quantile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

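Given :math:`\alpha`, :math:`\beta`, and an integer :math:`i \geq 0`, we seek the smallest sample size
:math:`\sampleSize` such that equation :eq:`EqOrderStat` is satisfied for the rank
:math:`k = \sampleSize - i`, i.e. when the upper bound is the :math:`(i+1)`-th largest observation. To do so, we solve
equation :eq:`EqOrderStat2` with :math:`k = \sampleSize - i` with respect to the sample size :math:`\sampleSize`.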
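
With :math:`k = \sampleSize - i`, the condition reads :math:`F_{(\sampleSize,\alpha)}(\sampleSize-i-1) \geq \beta`, or equivalently :math:`\Prob{B \geq \sampleSize - i} \leq 1 - \beta` with :math:`B \sim \cB(\sampleSize, \alpha)`; the second form only needs :math:`i + 1` terms. The Python sketch below (standard library only; ``minimum_sample_size`` is a hypothetical name) solves it by a plain linear search on :math:`\sampleSize`, which is not necessarily the algorithm used by OpenTURNS. For :math:`i = 0` the condition reduces to :math:`\alpha^{\sampleSize} \leq 1 - \beta`, hence the closed form :math:`\sampleSize = \lceil \ln(1-\beta) / \ln(\alpha) \rceil`.

.. code-block:: python

    from math import ceil, comb, log


    def upper_tail(n, alpha, i):
        """P(B >= n - i) for B ~ Binomial(n, alpha), i.e. P(X_(n - i) < x_alpha)."""
        # B >= n - i means that at most i observations lie above x_alpha.
        return sum(comb(n, m) * alpha ** (n - m) * (1.0 - alpha) ** m for m in range(i + 1))


    def minimum_sample_size(alpha, beta, i=0, n_max=1_000_000):
        """Smallest n such that P(x_alpha <= X_(n - i)) >= beta, by linear search."""
        n = i + 1  # the rank k = n - i must be at least 1
        while n <= n_max:
            # F_{(n, alpha)}(n - i - 1) >= beta  <=>  P(B >= n - i) <= 1 - beta
            if upper_tail(n, alpha, i) <= 1.0 - beta:
                return n
            n += 1
        raise ValueError("no sample size found up to n_max")


    print(minimum_sample_size(0.95, 0.95, 0))  # 59
    print(minimum_sample_size(0.95, 0.95, 1))  # 93
    print(ceil(log(1 - 0.95) / log(0.95)))     # 59, closed form for i = 0

The linear search is valid because, for a fixed :math:`i`, the confidence :math:`\Prob{x_{\alpha} \leq X_{(\sampleSize-i)}}` does not decrease when :math:`\sampleSize` grows.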

Once the smallest size :math:`\sampleSize` has been computed, a sample of size :math:`\sampleSize` is
generated from :math:`X` and an upper bound of :math:`x_{\alpha}` is estimated by
:math:`x_{(\sampleSize-i)}`, i.e. the :math:`(\sampleSize - i)`-th observation
in the ordered sample :math:`(x_{(1)}, \dots, x_{(\sampleSize)})`.
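
The guarantee can be checked by simulation. The Python sketch below uses only the standard library; the standard normal distribution, the seed and the number of repetitions are arbitrary choices for illustration, and ``wilks_bound`` and ``coverage`` are hypothetical names. It takes :math:`\alpha = \beta = 0.95` and :math:`i = 1`, for which the smallest sample size is :math:`\sampleSize = 93`, repeatedly draws a sample of that size, takes :math:`x_{(\sampleSize-i)}` as the upper bound, and records how often it lies above the true quantile given by ``statistics.NormalDist().inv_cdf``.

.. code-block:: python

    import random
    from statistics import NormalDist

    alpha, beta = 0.95, 0.95
    i, n = 1, 93  # smallest sample size for alpha = beta = 0.95 and i = 1
    x_alpha = NormalDist().inv_cdf(alpha)  # true alpha-quantile of the standard normal


    def wilks_bound(sample, i):
        """Return x_(n - i), the (i + 1)-th largest value of the sample."""
        return sorted(sample)[len(sample) - i - 1]


    def coverage(n_rep=4000, seed=2):
        """Fraction of repetitions in which the bound x_(n - i) is >= x_alpha."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(n_rep):
            sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if x_alpha <= wilks_bound(sample, i):
                hits += 1
        return hits / n_rep


    print(coverage())

The exact coverage for these settings is :math:`F_{(93,0.95)}(91) \approx 0.9500`, so the printed frequency should be close to :math:`\beta`, up to Monte Carlo noise of about 0.003.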


.. topic:: API:

    - See :class:`~openturns.Wilks`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_quantile_estimation_wilks`

.. topic:: References:

    - Wilks S.S. (1941). Determination of sample sizes for setting tolerance limits, The Annals of Mathematical Statistics, 12(1), 91-96.
    - Robert C.P., Casella G. (2004). Monte Carlo Statistical Methods, Springer, ISBN 0-387-21239-6, 2nd ed.
    - Rubinstein R.Y. (1981). Simulation and the Monte Carlo Method, John Wiley & Sons.