File: bivariate_statistics.qbk

[/
  Copyright 2018 Nick Thompson
  Copyright 2021 Matt Borland

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:bivariate_statistics Bivariate Statistics]

[heading Synopsis]

``
#include <boost/math/statistics/bivariate_statistics.hpp>

namespace boost{ namespace math{ namespace statistics {

    template<typename ExecutionPolicy, typename Container>
    auto covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto means_and_covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto means_and_covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto correlation_coefficient(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto correlation_coefficient(Container const & u, Container const & v);

}}}
``

[heading Description]

This file provides functions for computing bivariate statistics.
The functions are C++11 compatible, but require C++17 to use execution policies.
If no execution policy is passed, the default is `std::execution::seq`.

[heading Covariance]

Computes the population covariance of two datasets:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    double cov_uv = boost::math::statistics::covariance(u, v);

The implementation follows [@https://doi.org/10.1109/CLUSTR.2009.5289161 Bennett et al].
The parallel implementation follows [@https://dl.acm.org/doi/10.1145/3221269.3223036 Schubert et al].
The data is not modified.
The inputs must be real-valued; complex-valued inputs are not supported.

/Nota bene:/ If the input is an integer type, the output will be a double-precision type.
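The parallel approach cited above rests on combining partial results computed over disjoint chunks of the data. The following is a minimal sketch of that pairwise combination step, not the library's actual implementation. The names `CoAggregate`, `summarize_chunk`, `merge_aggregates` and `population_covariance` are hypothetical and exist only for this illustration; each chunk is summarized with a plain two-pass loop to keep the focus on the merge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one chunk of paired data (not a Boost type).
struct CoAggregate {
    std::size_t n = 0;      // number of (u, v) pairs in the chunk
    double mean_u = 0.0;    // mean of the u values
    double mean_v = 0.0;    // mean of the v values
    double comoment = 0.0;  // sum of (u_i - mean_u) * (v_i - mean_v)
};

// Summarize the pairs with indices [first, last).
// Two passes are used here for brevity: one for the means, one for the co-moment.
CoAggregate summarize_chunk(std::vector<double> const& u,
                            std::vector<double> const& v,
                            std::size_t first, std::size_t last) {
    assert(u.size() == v.size() && first <= last && last <= u.size());
    CoAggregate a;
    a.n = last - first;
    if (a.n == 0) { return a; }
    for (std::size_t i = first; i < last; ++i) {
        a.mean_u += u[i];
        a.mean_v += v[i];
    }
    a.mean_u /= static_cast<double>(a.n);
    a.mean_v /= static_cast<double>(a.n);
    for (std::size_t i = first; i < last; ++i) {
        a.comoment += (u[i] - a.mean_u) * (v[i] - a.mean_v);
    }
    return a;
}

// Combine the summaries of two disjoint chunks into the summary of their union.
CoAggregate merge_aggregates(CoAggregate const& a, CoAggregate const& b) {
    if (a.n == 0) { return b; }
    if (b.n == 0) { return a; }
    double const na = static_cast<double>(a.n);
    double const nb = static_cast<double>(b.n);
    double const n = na + nb;
    double const du = b.mean_u - a.mean_u;  // difference of the chunk means of u
    double const dv = b.mean_v - a.mean_v;  // difference of the chunk means of v
    CoAggregate r;
    r.n = a.n + b.n;
    r.mean_u = a.mean_u + du * nb / n;
    r.mean_v = a.mean_v + dv * nb / n;
    // The cross term corrects for the two chunks having different means.
    r.comoment = a.comoment + b.comoment + du * dv * na * nb / n;
    return r;
}

// Population covariance of the summarized data (divides by n, not n - 1).
double population_covariance(CoAggregate const& a) {
    assert(a.n > 0);
    return a.comoment / static_cast<double>(a.n);
}
```

For the data /u/ = /v/ = {1,2,3,4,5}, summarizing the index ranges [0, 2) and [2, 5) separately and merging the two summaries gives the same means and population covariance as a single summary over the whole range. Because the merge only needs the two summaries, the chunks can be processed independently.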

The algorithm used herein simultaneously generates the mean values of the input data /u/ and /v/.
For certain applications, it might be useful to get them in a single pass through the data.
As such, we provide `means_and_covariance`:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    auto [mu_u, mu_v, cov_uv] = boost::math::statistics::means_and_covariance(u, v);
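To illustrate why the means come for free, here is a rough sketch of a single-pass update of the kind described above. It is an illustration under stated assumptions rather than the library's code: `means_and_covariance_sketch` is a hypothetical name, the return type is fixed to `double` for simplicity, and the container is assumed to offer `size()`, `begin()` and `end()`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical single-pass routine: returns (mean of u, mean of v, population covariance).
template <typename Container>
std::tuple<double, double, double>
means_and_covariance_sketch(Container const& u, Container const& v) {
    assert(u.size() == v.size() && u.size() > 0);
    double mu_u = 0.0;      // running mean of u
    double mu_v = 0.0;      // running mean of v
    double comoment = 0.0;  // running sum of (u_i - mean_u) * (v_i - mean_v)
    std::size_t n = 0;
    auto it_v = v.begin();
    for (auto it_u = u.begin(); it_u != u.end(); ++it_u, ++it_v) {
        ++n;
        double const k = static_cast<double>(n);
        double const x = static_cast<double>(*it_u);  // integer inputs are promoted to double
        double const y = static_cast<double>(*it_v);
        double const dx = x - mu_u;   // deviation from the previous mean of u
        mu_u += dx / k;
        mu_v += (y - mu_v) / k;
        comoment += dx * (y - mu_v);  // previous-mean deviation of u times updated-mean deviation of v
    }
    return std::make_tuple(mu_u, mu_v, comoment / static_cast<double>(n));
}
```

Each element is visited once, and the two running means are by-products of the covariance update, which is why returning them costs no additional pass over the data.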

[heading Correlation Coefficient]

Computes the [@https://en.wikipedia.org/wiki/Pearson_correlation_coefficient Pearson correlation coefficient] of two datasets /u/ and /v/:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    double rho_uv = boost::math::statistics::correlation_coefficient(u, v);
    // rho_uv = 1.

The inputs must be real-valued; complex-valued inputs are not supported.

/Nota bene:/ If the input is an integer type, the output will be a double-precision type.

If either dataset is constant, its variance is zero and the correlation coefficient is an indeterminate form (0/0).
In this case the returned value is a `quiet_NaN()`.
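The constant-data case can be seen in a direct, two-pass computation of the Pearson coefficient. The sketch below is only an illustration of the definition and of the NaN convention described above; it is not how the library computes the result, and `correlation_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical two-pass Pearson correlation; returns NaN when either input is constant.
double correlation_sketch(std::vector<double> const& u, std::vector<double> const& v) {
    assert(u.size() == v.size() && !u.empty());
    double const n = static_cast<double>(u.size());

    // First pass: the two means.
    double mu_u = 0.0;
    double mu_v = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        mu_u += u[i];
        mu_v += v[i];
    }
    mu_u /= n;
    mu_v /= n;

    // Second pass: sums of squared deviations and of cross deviations.
    double suu = 0.0;
    double svv = 0.0;
    double suv = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        double const du = u[i] - mu_u;
        double const dv = v[i] - mu_v;
        suu += du * du;
        svv += dv * dv;
        suv += du * dv;
    }

    // A constant dataset has zero variance, so the ratio below would be 0/0.
    // Simplification: the exact comparison with zero assumes the mean of a
    // constant dataset is computed without rounding error.
    if (suu == 0.0 || svv == 0.0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return suv / std::sqrt(suu * svv);
}
```

With /u/ = /v/ = {1,2,3,4,5} the three sums are all 10 and the result is 1; reversing /v/ gives -1; replacing /v/ with {2,2,2,2,2} makes the second sum zero, so the sketch returns NaN. Since NaN compares unequal to everything, callers would test the result with `std::isnan`.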


[heading References]

* Bennett, Janine, et al. ['Numerically stable, single-pass, parallel statistics algorithms.] Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.
* Schubert, Erich; Gertz, Michael. ['Numerically stable parallel computation of (co-)variance.] Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018.

[endsect]
[/section:bivariate_statistics Bivariate Statistics]