1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
|
[section:kolmogorov_smirnov_dist Kolmogorov-Smirnov Distribution]
``#include <boost/math/distributions/kolmogorov_smirnov.hpp>``
namespace boost{ namespace math{
template <class RealType = double,
class ``__Policy`` = ``__policy_class`` >
class kolmogorov_smirnov_distribution;
typedef kolmogorov_smirnov_distribution<> kolmogorov_smirnov;
template <class RealType, class ``__Policy``>
class kolmogorov_smirnov_distribution
{
public:
typedef RealType value_type;
typedef Policy policy_type;
// Constructor:
kolmogorov_smirnov_distribution(RealType n);
// Accessor to parameter:
RealType number_of_observations()const;
}} // namespaces
The Kolmogorov-Smirnov test in statistics compares two empirical distributions,
or an empirical distribution against any theoretical distribution.[footnote
[@https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test Wikipedia:
Kolmogorov-Smirnov test]] It makes use of a specific distribution which is
informally known in the literature as the Kolmogorv-Smirnov distribution,
implemented here.
Formally, if /n/ observations are taken from a theoretical distribution /G(x)/,
and if /G/[sub /n/](/x/) represents the empirical CDF of those /n/ observations,
then the test statistic
[equation kolmogorov_smirnov_test_statistic] [/ D_n = \sup_x|G(x)-\hat{G}(x)| ]
will be distributed according to a Kolmogorov-Smirnov distribution parameterized by /n/.
The exact form of a Kolmogorov-Smirnov distribution is the subject of a large,
decades-old literature.[footnote
[@https://www.jstatsoft.org/article/view/v039i11 Simard, R. and L'Ecuyer, P.
(2011) "Computing the Two-Sided Kolmogorov-Smirnov Distribution". Journal of
Statistical Software, vol. 39, no. 11.]] In the interest of simplicity, Boost
implements the first-order, limiting form of this distribution (the same form
originally identified by Kolmogorov[footnote Kolmogorov A (1933). "Sulla
determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari.
4: 83–91.]), namely
[equation kolmogorov_smirnov_distribution]
[/ \lim_{n \to \infty} F_n(x/\sqrt{n})=1+2\sum_{k=1}^\infty (-1)^k e^{-2k^2x^2} ]
Note that while the exact distribution only has support over \[0, 1\], this
limiting form has positive mass above unity, particularly for small /n/. The
following graph illustrations how the distribution changes for different values
of /n/:
[graph kolmogorov_smirnov_pdf]
[h4 Member Functions]
kolmogorov_smirnov_distribution(RealType n);
Constructs a Kolmogorov-Smirnov distribution with /n/ observations.
Requires n > 0, otherwise calls __domain_error.
RealType number_of_observations()const;
Returns the parameter /n/ from which this object was constructed.
[h4 Non-member Accessors]
All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.
The domain of the random variable is \[0, +[infin]\].
[h4 Accuracy]
The CDF of the Kolmogorov-Smirnov distribution is implemented in terms of the
fourth [link math_toolkit.jacobi_theta.jacobi_theta_overview Jacobi Theta
function]; please refer to the accuracy ULP plots for that function.
The PDF is implemented separately, and the following ULP plot illustrates its
accuracy:
[graph kolmogorov_smirnov_pdf_ulp]
Because PDF values are simply scaled out and up by the square root of /n/, the
above plot is representative for all values of /n/. Note that for present
purposes, "accuracy" refers to deviations from the limiting approximation,
rather than deviations from the exact distribution.
[h4 Implementation]
In the following table, /n/ is the number of observations, /x/ is the random variable,
[pi] is Archimedes' constant, and [zeta](3) is Apéry's constant.
[table
[[Function][Implementation Notes]]
[[cdf][Using the relation: cdf = __jacobi_theta4tau(0, 2*x*x/[pi])]]
[[pdf][Using a manual derivative of the CDF]]
[[cdf complement][
When x*x*n == 0: 1
When 2*x*x*n <= [pi]: 1 - __jacobi_theta4tau(0, 2*x*x*n/[pi])
When 2*x*x*n > [pi]: -__jacobi_theta4m1tau(0, 2*x*x*n/[pi])]]
[[quantile][Using a Newton-Raphson iteration]]
[[quantile from the complement][Using a Newton-Raphson iteration]]
[[mode][Using a run-time PDF maximizer]]
[[mean][sqrt([pi]/2) * ln(2) / sqrt(n)]]
[[variance][([pi][super 2]/12 - [pi]/2*ln[super 2](2))/n]]
[[skewness][(9/16*sqrt([pi]/2)*[zeta](3)/n[super 3/2] - 3 * mean * variance - mean[super 2] * variance) / (variance[super 3/2])]]
[[kurtosis][(7/720*[pi][super 4]/n[super 2] - 4 * mean * skewness * variance[super 3/2] - 6 * mean[super 2] * variance - mean[super 4]) / (variance[super 2])]]
]
[endsect] [/section:kolmogorov_smirnov_dist Kolmogorov-Smirnov]
[/
Copyright Evan Miller 2020.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]
|