.. _empirical_cdf:

Empirical cumulative distribution function
------------------------------------------


The empirical cumulative distribution function provides a graphical
representation of the probability distribution of a random vector
without making any prior assumption about the form of this
distribution. It is a non-parametric approach, which makes it possible
to describe complex behavior that parametric approaches do not
necessarily detect.

Using general notation, we are looking for an
estimator :math:`\widehat{F}_N` of the cumulative distribution function
:math:`F_{X}` of the random vector
:math:`\vect{X} = \left( X^1,\ldots,X^{n_X} \right)`:

.. math::

    \begin{aligned}
        \widehat{F}_N \leftrightarrow F_{X}
    \end{aligned}


Let us first consider the uni-dimensional case, and let us denote
:math:`\vect{X} = X^1 = X`. The empirical probability distribution is
the distribution created from a sample of observed values
:math:`\left\{x_1, x_2, \ldots, x_N\right\}`. It corresponds to a
discrete uniform distribution on
:math:`\left\{x_1, x_2, \ldots, x_N\right\}`: if :math:`X'` follows
this distribution, then

.. math::

    \begin{aligned}
        \forall \; i \in \left\{1,\ldots, N\right\} ,\ \textrm{Pr}\left(X'=x_i\right) = \frac{1}{N}
    \end{aligned}


The empirical cumulative distribution function :math:`\widehat{F}_N`
of this distribution is defined as follows:

.. math::

    \begin{aligned}
        \widehat{F}_N(x) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{ \left\{ x_i \leq x \right\} }
    \end{aligned}


The empirical cumulative distribution function :math:`\widehat{F}_N(x)`
is the proportion of observations that are less than or equal to
:math:`x`. It is thus an approximation of the cumulative distribution
function :math:`F_X(x)`, which is the probability that an observation is
less than or equal to :math:`x`:

.. math::

    \begin{aligned}
        F_X(x) = \textrm{Pr} \left( X \leq x \right)
    \end{aligned}

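
The quality of this approximation can be explored numerically. The
following is a minimal sketch in plain Python, not OpenTURNS code: it
assumes, purely for illustration, that :math:`X` is uniform on
:math:`[0, 1]`, so that :math:`F_X(x) = x` on that interval. The helper
``empirical_cdf``, the evaluation point and the sample sizes are all
hypothetical choices made for this example.

.. code-block:: python

    import random


    def empirical_cdf(sample, x):
        """Return the proportion of observations in `sample` that are <= x."""
        if not sample:
            raise ValueError("sample must not be empty")
        return sum(1 for xi in sample if xi <= x) / len(sample)


    # Assumption for this sketch: X is uniform on [0, 1], hence F_X(x) = x.
    random.seed(0)  # fixed seed so that the run is reproducible
    x = 0.3
    errors = {}
    for n in (10, 100, 10000):
        sample = [random.random() for _ in range(n)]
        # Absolute gap between the empirical and the theoretical CDF at x
        errors[n] = abs(empirical_cdf(sample, x) - x)
        print(n, errors[n])

The gap at a given point typically shrinks as the sample size grows,
although it does not have to decrease at every step for one particular
sample.
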
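
The diagram below shows the empirical cumulative distribution function
of the ordered sample :math:`\left\{5,6,10,22,27\right\}`.

.. plot::

    import openturns as ot
    from matplotlib import pyplot as plt
    from openturns.viewer import View

    # Ordered sample of five one-dimensional observations
    sample = [[5.0], [6.0], [10.0], [22.0], [27.0]]
    # Plotting range, chosen to cover the whole sample
    xmin = 0.0
    xmax = 30.0
    # The empirical distribution gives the same weight to each observation
    graph = ot.UserDefined(sample).drawCDF(xmin, xmax)
    graph.setTitle('Empirical CDF')
    View(graph)
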
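
The steps of this curve can be checked by hand. The sketch below is
plain Python rather than OpenTURNS code, and the helper
``empirical_cdf`` and the evaluation points are hypothetical names and
values chosen for illustration. It evaluates the empirical cumulative
distribution function of the sample
:math:`\left\{5,6,10,22,27\right\}` by counting, in the ordered sample,
the observations that are less than or equal to :math:`x`.

.. code-block:: python

    from bisect import bisect_right

    # Same observations as in the plot, kept in increasing order
    sample = sorted([5.0, 6.0, 10.0, 22.0, 27.0])


    def empirical_cdf(x):
        """Proportion of observations in `sample` that are <= x."""
        # bisect_right returns the number of sorted observations <= x
        return bisect_right(sample, x) / len(sample)


    points = [4.0, 5.0, 8.0, 22.0, 30.0]
    values = [empirical_cdf(x) for x in points]
    print(values)  # prints [0.0, 0.2, 0.4, 0.8, 1.0]

Each observation adds a jump of :math:`1/N = 0.2`, and the function is
constant between two consecutive observations.
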

The method is similar in the case :math:`n_X>1`. The empirical
probability distribution is the distribution created from a sample
:math:`\left\{\vect{x}_1, \vect{x}_2, \ldots, \vect{x}_N\right\}`. It
corresponds to a discrete uniform distribution on
:math:`\left\{\vect{x}_1, \vect{x}_2, \ldots, \vect{x}_N\right\}`: if
:math:`\vect{X}'` follows this distribution, then

.. math::

    \begin{aligned}
        \forall \; i \in \left\{1,\ldots, N\right\} ,\ \textrm{Pr}\left(\vect{X}'=\vect{x}_i\right) = \frac{1}{N}
    \end{aligned}


The empirical cumulative distribution function is then:

.. math::

    \begin{aligned}
        \widehat{F}_N(\vect{x}) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{ \left\{ x^1_i \leq x^1,\ldots,x^{n_X}_i \leq x^{n_X} \right\} }
    \end{aligned}

which is to be compared with the theoretical cumulative distribution
function :math:`F_X`:

.. math::

    \begin{aligned}
        F_X(\vect{x}) = \Prob{X^1 \leq x^1,\ldots,X^{n_X} \leq x^{n_X}}
    \end{aligned}

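
In the multivariate case, an observation is counted only if all of its
components are less than or equal to the corresponding components of
:math:`\vect{x}`. The sketch below illustrates this in plain Python; the
helper ``empirical_cdf_nd`` and the two-dimensional sample are
hypothetical and made up for this example.

.. code-block:: python

    def empirical_cdf_nd(sample, x):
        """Proportion of points whose components are all <= those of x."""
        if not sample:
            raise ValueError("sample must not be empty")
        count = sum(
            1 for xi in sample
            if all(c <= t for c, t in zip(xi, x))
        )
        return count / len(sample)


    # Made-up sample of N = 4 observations in dimension 2
    sample_2d = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (4.0, 2.0)]

    # Only (2, 1) and (3, 3) satisfy both inequalities at x = (3, 3)
    joint = empirical_cdf_nd(sample_2d, (3.0, 3.0))

    # Marginal empirical CDFs of each component, evaluated at 3
    first = empirical_cdf_nd([(p[0],) for p in sample_2d], (3.0,))
    second = empirical_cdf_nd([(p[1],) for p in sample_2d], (3.0,))
    print(joint, first, second)  # prints 0.5 0.75 0.75

In this example the joint value differs from the product of the two
marginal values, which shows that the multivariate empirical cumulative
distribution function carries information about the dependence between
components.
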
This method is also referred to in the literature as the empirical
distribution function.

.. topic:: API:

    - See :class:`~openturns.UserDefined` for the empirical distribution

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/sample_analysis/plot_visualize_empirical_cdf`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_