File: statistics.texi

package info (click to toggle)
gsl-doc 2.3-1
links: PTS
area: non-free
in suites: buster
size: 27,748 kB
ctags: 15,177
sloc: ansic: 235,014; sh: 11,585; makefile: 925
file content (827 lines) | stat: -rw-r--r-- 29,077 bytes
@cindex statistics
@cindex mean
@cindex standard deviation
@cindex variance
@cindex estimated standard deviation
@cindex estimated variance
@cindex t-test
@cindex range
@cindex min
@cindex max

This chapter describes the statistical functions in the library.  The
basic statistical functions include routines to compute the mean,
variance and standard deviation.  More advanced functions allow you to
calculate absolute deviations, skewness, and kurtosis as well as the
median and arbitrary percentiles.  The algorithms use recurrence
relations to compute average quantities in a stable way, without large
intermediate values that might overflow. 

The functions are available in versions for datasets in the standard
floating-point and integer types.  The versions for double precision
floating-point data have the prefix @code{gsl_stats} and are declared in
the header file @file{gsl_statistics_double.h}.  The versions for integer
data have the prefix @code{gsl_stats_int} and are declared in the header
file @file{gsl_statistics_int.h}.   All the functions operate on C 
arrays with a @var{stride} parameter specifying the spacing between 
elements.  

@menu
* Mean and standard deviation and variance::  
* Absolute deviation::          
* Higher moments (skewness and kurtosis)::  
* Autocorrelation::             
* Covariance::                  
* Correlation::                  
* Weighted Samples::            
* Maximum and Minimum values::  
* Median and Percentiles::      
* Example statistical programs::  
* Statistics References and Further Reading::  
@end menu

@node Mean and standard deviation and variance
@section Mean, Standard Deviation and Variance

@deftypefun double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the arithmetic mean of @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The arithmetic mean, or
@dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as,
@tex
\beforedisplay
$$
{\Hat\mu} = {1 \over N} \sum x_i
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\mu = (1/N) \sum x_i
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  For
samples drawn from a gaussian distribution the variance of
@math{\Hat\mu} is @math{\sigma^2 / N}.
@end deftypefun

@deftypefun double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated, or @dfn{sample}, variance of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
estimated variance is denoted by @math{\Hat\sigma^2} and is defined by,
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  Note that
the normalization factor of @math{1/(N-1)} results from the derivation
of @math{\Hat\sigma^2} as an unbiased estimator of the population
variance @math{\sigma^2}.  For samples drawn from a Gaussian distribution
the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / N}.

This function computes the mean via a call to @code{gsl_stats_mean}.  If
you have already computed the mean then you can pass it directly to
@code{gsl_stats_variance_m}.
@end deftypefun

@deftypefun double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function returns the sample variance of @var{data} relative to the
given value of @var{mean}.  The function is computed with @math{\Hat\mu}
replaced by the value of @var{mean} that you supply,
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n})
@deftypefunx double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
The standard deviation is defined as the square root of the variance.
These functions return the square root of the corresponding variance
functions above.
@end deftypefun

@deftypefun double gsl_stats_tss (const double @var{data}[], size_t @var{stride}, size_t @var{n})
@deftypefunx double gsl_stats_tss_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
These functions return the total sum of squares (TSS) of @var{data} about
the mean.  For @code{gsl_stats_tss_m} the user-supplied value of
@var{mean} is used, and for @code{gsl_stats_tss} it is computed using
@code{gsl_stats_mean}.
@tex
\beforedisplay
$$
{\rm TSS} = \sum (x_i - mean)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
TSS =  \sum (x_i - mean)^2
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes an unbiased estimate of the variance of
@var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}.  In this case the estimator for
the variance uses the factor @math{1/N} and the sample mean
@math{\Hat\mu} is replaced by the known population mean @math{\mu},
@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function calculates the standard deviation of @var{data} for a
fixed population mean @var{mean}.  The result is the square root of the
corresponding variance function.
@end deftypefun

@node Absolute deviation
@section Absolute deviation

@deftypefun double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the absolute deviation from the mean of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
absolute deviation from the mean is defined as,
@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - {\Hat\mu}|
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev  = (1/N) \sum |x_i - \Hat\mu|
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The
absolute deviation from the mean provides a more robust measure of the
width of a distribution than the variance.  This function computes the
mean of @var{data} via a call to @code{gsl_stats_mean}.
@end deftypefun

@deftypefun double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes the absolute deviation of the dataset @var{data}
relative to the given value of @var{mean},
@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - mean|
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev  = (1/N) \sum |x_i - mean|
@end example

@end ifinfo
@noindent
This function is useful if you have already computed the mean of
@var{data} (and want to avoid recomputing it), or wish to calculate the
absolute deviation relative to another value (such as zero, or the
median).
@end deftypefun

@node Higher moments (skewness and kurtosis)
@section Higher moments (skewness and kurtosis)
@cindex skewness
@cindex kurtosis

@deftypefun double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the skewness of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The skewness is defined as,
@tex
\beforedisplay
$$
skew = {1 \over N} \sum 
 {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
@end example

@end ifinfo
@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The skewness
measures the asymmetry of the tails of a distribution.

The function computes the mean and estimated standard deviation of
@var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}.
@end deftypefun

@deftypefun double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the skewness of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},
@tex
\beforedisplay
$$
skew = {1 \over N}
     \sum {\left( x_i - mean \over sd \right)}^3
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (1/N) \sum ((x_i - mean)/sd)^3
@end example

@end ifinfo
@noindent
These functions are useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefun

@deftypefun double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the kurtosis of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The kurtosis is defined as,
@tex
\beforedisplay
$$
kurtosis = \left( {1 \over N} \sum 
 {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 
 \right) 
 - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3
@end example

@end ifinfo
@noindent
The kurtosis measures how sharply peaked a distribution is, relative to
its width.  The kurtosis is normalized to zero for a Gaussian
distribution.
@end deftypefun

@deftypefun double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the kurtosis of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},
@tex
\beforedisplay
$$
kurtosis = {1 \over N}
  \left( \sum {\left(x_i - mean \over sd \right)}^4 \right) 
  - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
@end example

@end ifinfo
@noindent
This function is useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefun

@node Autocorrelation
@section Autocorrelation

@deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n})
This function computes the lag-1 autocorrelation of the dataset @var{data}.
@tex
\beforedisplay
$$
a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
\over
\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
$$
\afterdisplay
@end tex
@ifinfo

@example
a_1 = @{\sum_@{i = 2@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i-1@} - \Hat\mu)
       \over
       \sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i@} - \Hat\mu)@}
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean})
This function computes the lag-1 autocorrelation of the dataset
@var{data} using the given value of the mean @var{mean}.

@end deftypefun

@node Covariance
@section Covariance
@cindex covariance, of two datasets

@deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n})
This function computes the covariance of the datasets @var{data1} and
@var{data2} which must both be of the same length @var{n}.
@tex
\beforedisplay
$$
covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)
$$
\afterdisplay
@end tex
@ifinfo

@example
covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2})
This function computes the covariance of the datasets @var{data1} and
@var{data2} using the given values of the means, @var{mean1} and
@var{mean2}.  This is useful if you have already computed the means of
@var{data1} and @var{data2} and want to avoid recomputing them.
@end deftypefun

@node Correlation
@section Correlation
@cindex correlation, of two datasets

@deftypefun double gsl_stats_correlation (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n})
This function efficiently computes the Pearson correlation coefficient
between the datasets @var{data1} and @var{data2} which must both be of
the same length @var{n}.
@tex
\beforedisplay
$$
r = {cov(x, y) \over \Hat\sigma_x \Hat\sigma_y} =
{{1 \over n-1} \sum (x_i - \Hat x) (y_i - \Hat y)
\over
\sqrt{{1 \over n-1} \sum (x_i - {\Hat x})^2}
\sqrt{{1 \over n-1} \sum (y_i - {\Hat y})^2}
}
$$
\afterdisplay
@end tex
@ifinfo
@example
r = cov(x, y) / (\Hat\sigma_x \Hat\sigma_y)
  = @{1/(n-1) \sum (x_i - \Hat x) (y_i - \Hat y)
     \over
     \sqrt@{1/(n-1) \sum (x_i - \Hat x)^2@} \sqrt@{1/(n-1) \sum (y_i - \Hat y)^2@}
    @}
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_spearman (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, double @var{work}[])
This function computes the Spearman rank correlation coefficient between
the datasets @var{data1} and @var{data2} which must both be of the same
length @var{n}. Additional workspace of size 2*@var{n} is required in
@var{work}. The Spearman rank correlation between vectors @math{x} and
@math{y} is equivalent to the Pearson correlation between the ranked
vectors @math{x_R} and @math{y_R}, where ranks are defined to be the
average of the positions of an element in the ascending order of the values.
@end deftypefun

@node Weighted Samples
@section Weighted Samples

The functions described in this section allow the computation of
statistics for weighted samples.  The functions accept an array of
samples, @math{x_i}, with associated weights, @math{w_i}.  Each sample
@math{x_i} is considered as having been drawn from a Gaussian
distribution with variance @math{\sigma_i^2}.  The sample weight
@math{w_i} is defined as the reciprocal of this variance, @math{w_i =
1/\sigma_i^2}.  Setting a weight to zero corresponds to removing a
sample from a dataset.

@deftypefun double gsl_stats_wmean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the weighted mean of the dataset @var{data} with
stride @var{stride} and length @var{n}, using the set of weights @var{w}
with stride @var{wstride} and length @var{n}.  The weighted mean is defined as,
@tex
\beforedisplay
$$
{\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\mu = (\sum w_i x_i) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun


@deftypefun double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated variance of the dataset @var{data}
with stride @var{stride} and length @var{n}, using the set of weights
@var{w} with stride @var{wstride} and length @var{n}.  The estimated
variance of a weighted dataset is calculated as,
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} 
                \sum w_i (x_i - \Hat\mu)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) 
                \sum w_i (x_i - \Hat\mu)^2
@end example

@end ifinfo
@noindent
Note that this expression reduces to an unweighted variance with the
familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero
weights.
@end deftypefun

@deftypefun double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the estimated variance of the weighted dataset
@var{data} using the given weighted mean @var{wmean}.
@end deftypefun

@deftypefun double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance} above.
@end deftypefun

@deftypefun double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance_m} above.
@end deftypefun

@deftypefun double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
This function computes an unbiased estimate of the variance of the weighted
dataset @var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}.  In this case the estimator for
the variance replaces the sample mean @math{\Hat\mu} by the known
population mean @math{\mu},
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function above.
@end deftypefun

@deftypefun double gsl_stats_wtss (const double @var{w}[], const size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
@deftypefunx double gsl_stats_wtss_m (const double @var{w}[], const size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
These functions return the weighted total sum of squares (TSS) of
@var{data} about the weighted mean.  For @code{gsl_stats_wtss_m} the
user-supplied value of @var{wmean} is used, and for @code{gsl_stats_wtss}
it is computed using @code{gsl_stats_wmean}.
@tex
\beforedisplay
$$
{\rm TSS} = \sum w_i (x_i - wmean)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
TSS =  \sum w_i (x_i - wmean)^2
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted absolute deviation from the weighted
mean of @var{data}.  The absolute deviation from the mean is defined as,
@tex
\beforedisplay
$$
absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function computes the absolute deviation of the weighted dataset
@var{data} about the given weighted mean @var{wmean}.
@end deftypefun

@deftypefun double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted skewness of the dataset @var{data}.
@tex
\beforedisplay
$$
skew = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^3} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (\sum w_i ((x_i - \Hat x)/\Hat \sigma)^3) / (\sum w_i)
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted skewness of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefun

@deftypefun double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted kurtosis of the dataset @var{data}.
@tex
\beforedisplay
$$
kurtosis = {{\sum w_i ((x_i - {\Hat x})/{\Hat \sigma})^4} \over {\sum w_i}} - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((\sum w_i ((x_i - \Hat x)/\Hat \sigma)^4) / (\sum w_i)) - 3
@end example
@end ifinfo
@end deftypefun

@deftypefun double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted kurtosis of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefun

@node Maximum and Minimum values
@section Maximum and Minimum values

The following functions find the maximum and minimum values of a
dataset (or their indices).  If the data contains @code{NaN}s then a
@code{NaN} will be returned, since the maximum or minimum value is
undefined.  For functions which return an index, the location of the
first @code{NaN} in the array is returned.

@deftypefun double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the maximum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The maximum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.

If you want instead to find the element with the largest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefun

@deftypefun double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the minimum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The minimum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$}
@math{x_i <= x_j} for all @math{j}.

If you want instead to find the element with the smallest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefun

@deftypefun void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function finds both the minimum and maximum values @var{min},
@var{max} in @var{data} in a single pass.
@end deftypefun

@deftypefun size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the maximum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}.  The maximum value is
defined as the value of the element @math{x_i} which satisfies 
@c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.  When there are several equal maximum
elements then the first one is chosen.
@end deftypefun

@deftypefun size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the minimum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}.  The minimum value
is defined as the value of the element @math{x_i} which satisfies
@c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.  When there are several equal
minimum elements then the first one is chosen.
@end deftypefun

@deftypefun void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the indexes @var{min_index}, @var{max_index} of
the minimum and maximum values in @var{data} in a single pass.
@end deftypefun

@node Median and Percentiles
@section Median and Percentiles

The median and percentile functions described in this section operate on
sorted data.  For convenience we use @dfn{quantiles}, measured on a scale
of 0 to 1, instead of percentiles (which use a scale of 0 to 100).

@deftypefun double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n})
This function returns the median value of @var{sorted_data}, a dataset
of length @var{n} with stride @var{stride}.  The elements of the array
must be in ascending numerical order.  There are no checks to see
whether the data are sorted, so the function @code{gsl_sort} should
always be used first.

When the dataset has an odd number of elements the median is the value
of element @math{(n-1)/2}.  When the dataset has an even number of
elements the median is the mean of the two nearest middle values,
elements @math{(n-1)/2} and @math{n/2}.  Since the algorithm for
computing the median involves interpolation this function always returns
a floating-point number, even for integer data types.
@end deftypefun

@deftypefun double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f})
This function returns a quantile value of @var{sorted_data}, a
double-precision array of length @var{n} with stride @var{stride}.  The
elements of the array must be in ascending numerical order.  The
quantile is determined by the @var{f}, a fraction between 0 and 1.  For
example, to compute the value of the 75th percentile @var{f} should have
the value 0.75.

There are no checks to see whether the data are sorted, so the function
@code{gsl_sort} should always be used first.

The quantile is found by interpolation, using the formula
@tex
\beforedisplay
$$
\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}
$$
\afterdisplay
@end tex
@ifinfo

@example
quantile = (1 - \delta) x_i + \delta x_@{i+1@}
@end example

@end ifinfo
@noindent
where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is
@math{(n-1)f - i}.

Thus the minimum value of the array (@code{data[0*stride]}) is given by
@var{f} equal to zero, the maximum value (@code{data[(n-1)*stride]}) is
given by @var{f} equal to one and the median value is given by @var{f}
equal to 0.5.  Since the algorithm for computing quantiles involves
interpolation this function always returns a floating-point number, even
for integer data types.
@end deftypefun


@comment @node Statistical tests
@comment @section Statistical tests

@comment FIXME, do more work on the statistical tests

@comment -@deftypefun double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})
@comment -@deftypefunx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})

@comment The function @code{gsl_stats_ttest} computes the t-test statistic for
@comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and
@comment -@var{n2} respectively.

@comment The t-test statistic measures the difference between the means of two
@comment datasets.

@node Example statistical programs
@section Examples
Here is a basic example of how to use the statistical functions:

@example
@verbatiminclude examples/stat.c
@end example

The program should produce the following output,

@example
@verbatiminclude examples/stat.txt
@end example


Here is an example using sorted data,

@example
@verbatiminclude examples/statsort.c
@end example

This program should produce the following output,

@example
@verbatiminclude examples/statsort.txt
@end example

@node Statistics References and Further Reading
@section References and Further Reading

The standard reference for almost any topic in statistics is the
multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart.

@itemize @w{}
@item
Maurice Kendall, Alan Stuart, and J. Keith Ord.
@cite{The Advanced Theory of Statistics} (multiple volumes)
reprinted as @cite{Kendall's Advanced Theory of Statistics}.
Wiley, ISBN 047023380X.
@end itemize

@noindent
Many statistical concepts can be more easily understood by a Bayesian
approach.  The following book by Gelman, Carlin, Stern and Rubin gives a
comprehensive coverage of the subject.

@itemize @w{}
@item
Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
@cite{Bayesian Data Analysis}.
Chapman & Hall, ISBN 0412039915.
@end itemize

@noindent
For physicists the Particle Data Group provides useful reviews of
Probability and Statistics in the ``Mathematical Tools'' section of its
Annual Review of Particle Physics. 

@itemize @w{}
@item
@cite{Review of Particle Properties}
R.M. Barnett et al., Physical Review D54, 1 (1996)
@end itemize

@noindent
The Review of Particle Physics is available online at
the website @uref{http://pdg.lbl.gov/}.