File: kolmogorov_smirnov.qbk

package info (click to toggle)
scipy 1.16.0-1exp7
links: PTS, VCS
area: main
in suites: experimental
size: 234,820 kB
sloc: cpp: 503,145; python: 344,611; ansic: 195,638; javascript: 89,566; fortran: 56,210; cs: 3,081; f90: 1,150; sh: 848; makefile: 785; pascal: 284; csh: 135; lisp: 134; xml: 56; perl: 51
file content (129 lines) | stat: -rw-r--r-- 4,962 bytes
parent folder | download | duplicates (9)
[section:kolmogorov_smirnov_dist Kolmogorov-Smirnov Distribution]

``#include <boost/math/distributions/kolmogorov_smirnov.hpp>``

   namespace boost{ namespace math{

   template <class RealType = double,
             class ``__Policy``   = ``__policy_class`` >
   class kolmogorov_smirnov_distribution;

   typedef kolmogorov_smirnov_distribution<> kolmogorov_smirnov;

   template <class RealType, class ``__Policy``>
   class kolmogorov_smirnov_distribution
   {
   public:
      typedef RealType  value_type;
      typedef Policy    policy_type;

      // Constructor:
      kolmogorov_smirnov_distribution(RealType n);

      // Accessor to parameter:
      RealType number_of_observations()const;

   }} // namespaces

The Kolmogorov-Smirnov test in statistics compares two empirical distributions,
or an empirical distribution against any theoretical distribution.[footnote
[@https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test Wikipedia:
Kolmogorov-Smirnov test]]  It makes use of a specific distribution which is
informally known in the literature as the Kolmogorv-Smirnov distribution,
implemented here.

Formally, if /n/ observations are taken from a theoretical distribution /G(x)/,
and if /G/[sub /n/](/x/) represents the empirical CDF of those /n/ observations,
then the test statistic

[equation kolmogorov_smirnov_test_statistic] [/ D_n = \sup_x|G(x)-\hat{G}(x)| ]

will be distributed according to a Kolmogorov-Smirnov distribution parameterized by /n/.

The exact form of a Kolmogorov-Smirnov distribution is the subject of a large,
decades-old literature.[footnote
[@https://www.jstatsoft.org/article/view/v039i11 Simard, R. and L'Ecuyer, P.
(2011) "Computing the Two-Sided Kolmogorov-Smirnov Distribution". Journal of
Statistical Software, vol. 39, no. 11.]] In the interest of simplicity, Boost
implements the first-order, limiting form of this distribution (the same form
originally identified by Kolmogorov[footnote Kolmogorov A (1933). "Sulla
determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari.
4: 83–91.]), namely

[equation kolmogorov_smirnov_distribution]
[/ \lim_{n \to \infty} F_n(x/\sqrt{n})=1+2\sum_{k=1}^\infty (-1)^k e^{-2k^2x^2} ]

Note that while the exact distribution only has support over \[0, 1\], this
limiting form has positive mass above unity, particularly for small /n/. The
following graph illustrations how the distribution changes for different values
of /n/:

[graph kolmogorov_smirnov_pdf]

[h4 Member Functions]

        kolmogorov_smirnov_distribution(RealType n);

Constructs a Kolmogorov-Smirnov distribution with /n/ observations.

Requires n > 0, otherwise calls __domain_error.

      RealType number_of_observations()const;
      
Returns the parameter /n/ from which this object was constructed.

[h4 Non-member Accessors]

All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

The domain of the random variable is \[0, +[infin]\].

[h4 Accuracy]

The CDF of the Kolmogorov-Smirnov distribution is implemented in terms of the
fourth [link math_toolkit.jacobi_theta.jacobi_theta_overview Jacobi Theta
function]; please refer to the accuracy ULP plots for that function.

The PDF is implemented separately, and the following ULP plot illustrates its
accuracy:

[graph kolmogorov_smirnov_pdf_ulp]

Because PDF values are simply scaled out and up by the square root of /n/, the
above plot is representative for all values of /n/. Note that for present
purposes, "accuracy" refers to deviations from the limiting approximation,
rather than deviations from the exact distribution.

[h4 Implementation]

In the following table, /n/ is the number of observations, /x/ is the random variable,
[pi] is Archimedes' constant, and [zeta](3) is Apéry's constant.

[table
[[Function][Implementation Notes]]
[[cdf][Using the relation: cdf = __jacobi_theta4tau(0, 2*x*x/[pi])]]
[[pdf][Using a manual derivative of the CDF]]
[[cdf complement][
When x*x*n == 0: 1

When 2*x*x*n <= [pi]: 1 - __jacobi_theta4tau(0, 2*x*x*n/[pi])

When 2*x*x*n > [pi]: -__jacobi_theta4m1tau(0, 2*x*x*n/[pi])]]
[[quantile][Using a Newton-Raphson iteration]]
[[quantile from the complement][Using a Newton-Raphson iteration]]
[[mode][Using a run-time PDF maximizer]]
[[mean][sqrt([pi]/2) * ln(2) / sqrt(n)]]
[[variance][([pi][super 2]/12 - [pi]/2*ln[super 2](2))/n]]
[[skewness][(9/16*sqrt([pi]/2)*[zeta](3)/n[super 3/2] - 3 * mean * variance - mean[super 2] * variance) / (variance[super 3/2])]]
[[kurtosis][(7/720*[pi][super 4]/n[super 2] - 4 * mean * skewness * variance[super 3/2] - 6 * mean[super 2] * variance - mean[super 4]) / (variance[super 2])]]
]

[endsect] [/section:kolmogorov_smirnov_dist Kolmogorov-Smirnov]

[/
  Copyright Evan Miller 2020.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]