.. currentmodule:: statsmodels.nonparametric

.. _nonparametric:


Nonparametric Methods :mod:`nonparametric`
==========================================

This section collects various methods in nonparametric statistics. These
include kernel density estimation for univariate and multivariate data,
kernel regression, and locally weighted scatterplot smoothing (lowess).

``sandbox.nonparametric`` contains additional functions that are work in
progress or do not yet have unit tests. We plan to add further nonparametric
density estimators here, especially ones based on kernels or orthogonal
polynomials, as well as smoothers and tools for nonparametric models and
methods used in other parts of statsmodels.


Kernel density estimation
-------------------------

The kernel density estimation (KDE) functionality is split between univariate
and multivariate estimation, which are implemented in quite different ways.

Univariate estimation (as provided by `KDEUnivariate`) uses the fast Fourier
transform (FFT), which makes it quite fast.  It should therefore be preferred
for *continuous, univariate* data when speed matters.  It supports several
kernels; bandwidth selection is limited to rules of thumb (Scott or Silverman).
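
As a minimal sketch of the univariate workflow, the following fits a Gaussian
kernel with the Silverman rule of thumb on simulated data.  The sample, the
seed and the variable names are illustrative assumptions; the FFT path is only
implemented for the Gaussian kernel (``kernel="gau"``).

.. code-block:: python

   import numpy as np
   from statsmodels.nonparametric.kde import KDEUnivariate

   # Simulated continuous, univariate sample (illustrative data only).
   rng = np.random.default_rng(0)
   data = rng.normal(loc=0.0, scale=1.0, size=500)

   kde = KDEUnivariate(data)
   # Gaussian kernel, Silverman rule-of-thumb bandwidth, FFT-based evaluation.
   kde.fit(kernel="gau", bw="silverman", fft=True)

   # After fitting, ``support`` is the evaluation grid and ``density`` holds
   # the estimated density on that grid.
   step = kde.support[1] - kde.support[0]
   total_mass = float(np.sum(kde.density) * step)
   print("bandwidth:", kde.bw)
   print("approximate integral of the density:", total_mass)

The Riemann sum over the grid should be close to one, which is a quick sanity
check that the estimate behaves like a density.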

Multivariate estimation (as provided by `KDEMultivariate`) uses product
kernels.  It supports least squares and maximum likelihood cross-validation
for bandwidth estimation, and it can handle mixed continuous, ordered and
unordered data.  The default kernels (Gaussian, Wang-Ryzin and
Aitchison-Aitken), however, cannot currently be changed.  Direct estimation
of the conditional density (:math:`P(X | Y) = P(X, Y) / P(Y)`) is supported
by `KDEMultivariateConditional`.

`KDEMultivariate` can do univariate estimation as well, but it is up to two
orders of magnitude slower than `KDEUnivariate`.
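
The mixed-data and conditional estimators can be sketched as follows.  The
simulated data, the seed and the variable names are assumptions made for
illustration.  The example uses ``bw="normal_reference"`` to keep the run
time short; ``"cv_ml"`` and ``"cv_ls"`` select the cross-validation methods
mentioned above.

.. code-block:: python

   import numpy as np
   from statsmodels.nonparametric.kernel_density import (
       KDEMultivariate,
       KDEMultivariateConditional,
   )

   # Illustrative data: one unordered (0/1) and one continuous variable.
   rng = np.random.default_rng(0)
   n = 200
   group = rng.integers(0, 2, size=n)
   y = rng.normal(loc=2.0 * group, scale=1.0)

   # Joint density of (y, group); var_type has one character per column:
   # "c" continuous, "u" unordered, "o" ordered.
   joint = KDEMultivariate(
       data=np.column_stack([y, group]),
       var_type="cu",
       bw="normal_reference",
   )
   joint_pdf = joint.pdf()  # evaluated at the training points

   # Conditional density of y given group.
   cond = KDEMultivariateConditional(
       endog=y,
       exog=group,
       dep_type="c",
       indep_type="u",
       bw="normal_reference",
   )
   cond_pdf = cond.pdf()

   print("joint bandwidths:", joint.bw)
   print("conditional bandwidths:", cond.bw)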


Kernel regression
-----------------

Kernel regression (as provided by `KernelReg`) is based on the same product
kernel approach as `KDEMultivariate`, and therefore has the same set of
features (mixed data, cross-validated bandwidth estimation, kernels) as
described above for `KDEMultivariate`.  Censored regression is provided by
`KernelCensoredReg`.
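
A minimal sketch of a local linear kernel regression with a least squares
cross-validated bandwidth is shown below.  The simulated sine curve, the noise
level and the variable names are illustrative assumptions.

.. code-block:: python

   import numpy as np
   from statsmodels.nonparametric.kernel_regression import KernelReg

   # Illustrative data: a noisy sine curve on [0, 2*pi].
   rng = np.random.default_rng(0)
   n = 100
   x = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=n))
   y = np.sin(x) + rng.normal(scale=0.2, size=n)

   # One continuous regressor ("c"), local linear estimator ("ll"),
   # bandwidth chosen by least squares cross-validation ("cv_ls").
   model = KernelReg(endog=y, exog=x, var_type="c", reg_type="ll", bw="cv_ls")

   # fit() returns the conditional mean and the marginal effects,
   # evaluated at the training points by default.
   fitted_mean, marginal_effects = model.fit()
   print("selected bandwidth:", model.bw)
   print("R-squared:", model.r_squared())

Passing an array-like instead of a string for ``bw`` uses a user-specified
bandwidth and skips the cross-validation step.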

Note that code for semi-parametric partial linear models and single index
models, based on `KernelReg`, can be found in the sandbox.


References
----------

* B.W. Silverman, "Density Estimation for Statistics and Data Analysis"
* J.S. Racine, "Nonparametric Econometrics: A Primer," Foundations and
  Trends in Econometrics, Vol. 3, No. 1, pp. 1-88, 2008.
* Q. Li and J.S. Racine, "Nonparametric econometrics: theory and practice",
  Princeton University Press, 2006.
* Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning:
  Data Mining, Inference, and Prediction", Springer, 2009.
* Racine, J., Li, Q. "Nonparametric Estimation of Distributions
  with Categorical and Continuous Data." Working Paper. (2000)
* Racine, J., Li, Q. "Kernel Estimation of Multivariate Conditional
  Distributions." Annals of Economics and Finance 5, 211-235 (2004)
* Liu, R., Yang, L. "Kernel estimation of multivariate
  cumulative distribution function." Journal of Nonparametric Statistics
  (2008)
* Li, R., Ju, G. "Nonparametric Estimation of Multivariate CDF
  with Categorical and Continuous Data." Working Paper
* Li, Q., Racine, J. "Cross-validated local linear nonparametric
  regression" Statistica Sinica 14(2004), pp. 485-512
* Racine, J. "Consistent Significance Testing for Nonparametric
  Regression." Journal of Business & Economic Statistics
* Racine, J., Hart, J., Li, Q., "Testing the Significance of
  Categorical Predictor Variables in Nonparametric Regression
  Models", 2006, Econometric Reviews 25, 523-544


Module Reference
----------------

.. module:: statsmodels.nonparametric
   :synopsis: Nonparametric estimation of densities and curves

The public functions and classes are

.. currentmodule:: statsmodels.nonparametric.smoothers_lowess
.. autosummary::
   :toctree: generated/

   lowess

.. currentmodule:: statsmodels.nonparametric.kde
.. autosummary::
   :toctree: generated/

   KDEUnivariate

.. currentmodule:: statsmodels.nonparametric.kernel_density
.. autosummary::
   :toctree: generated/

   KDEMultivariate
   KDEMultivariateConditional
   EstimatorSettings

.. currentmodule:: statsmodels.nonparametric.kernel_regression
.. autosummary::
   :toctree: generated/

   KernelReg
   KernelCensoredReg

Helper functions for kernel bandwidths:

.. currentmodule:: statsmodels.nonparametric.bandwidths
.. autosummary::
   :toctree: generated/

   bw_scott
   bw_silverman
   select_bandwidth
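
As a brief illustration, the bandwidth helpers can be called directly on a
one-dimensional sample.  The simulated data and variable names are
assumptions; ``select_bandwidth`` is called here with ``None`` as the kernel
argument, which is sufficient for the Scott and Silverman rules in this
sketch.

.. code-block:: python

   import numpy as np
   from statsmodels.nonparametric.bandwidths import (
       bw_scott,
       bw_silverman,
       select_bandwidth,
   )

   # Illustrative sample.
   rng = np.random.default_rng(0)
   x = rng.normal(size=200)

   h_scott = bw_scott(x)
   h_silverman = bw_silverman(x)

   # Dispatch by rule name; should agree with the direct call above.
   h_selected = select_bandwidth(x, "silverman", None)

   print("Scott:", h_scott)
   print("Silverman:", h_silverman)
   print("select_bandwidth('silverman'):", h_selected)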

Some example data generating processes for nonlinear functions are available
in :mod:`statsmodels.nonparametric.dgp_examples`.


Asymmetric Kernels
------------------

Asymmetric kernels like beta for the unit interval and gamma for positive
valued random variables avoid problems at the boundary of the support of the
distribution.

Statsmodels has preliminary support for estimating the density and the
cumulative distribution function using kernels for the unit interval
(``beta``) or for the positive real line (all other kernels).

Several of the kernels for the positive real line assume that the density at
the zero boundary is zero. The gamma kernel also allows a positive or
unbounded density at the zero boundary.

There are currently no defaults and no support for choosing the bandwidth.
The user has to provide the bandwidth.

The functions to compute kernel density and kernel cdf are

.. currentmodule:: statsmodels.nonparametric.kernels_asymmetric
.. autosummary::
   :toctree: generated/

   pdf_kernel_asym
   cdf_kernel_asym
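
The following sketch estimates a density and a cdf on the positive real line
with the gamma kernel.  The simulated sample, the evaluation grid and the
bandwidth value ``0.2`` are assumptions chosen only for illustration, since
the bandwidth has to be supplied by the user.

.. code-block:: python

   import numpy as np
   from statsmodels.nonparametric.kernels_asymmetric import (
       cdf_kernel_asym,
       pdf_kernel_asym,
   )

   # Illustrative positive-valued sample.
   rng = np.random.default_rng(0)
   sample = rng.gamma(shape=2.0, scale=1.0, size=200)

   # Points at which the estimates are evaluated.
   grid = np.linspace(0.05, 8.0, 40)

   # User-supplied bandwidth (hypothetical value, not a recommendation).
   bw = 0.2

   pdf_est = pdf_kernel_asym(grid, sample, bw, "gamma")
   cdf_est = cdf_kernel_asym(grid, sample, bw, "gamma")

   print("largest density estimate:", pdf_est.max())
   print("cdf estimate at the right end of the grid:", cdf_est[-1])

The kernel can be given by name, as here, or as one of the kernel functions
listed below.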

The available kernel functions for pdf and cdf are

.. autosummary::
   :toctree: generated/

   kernel_pdf_beta
   kernel_pdf_beta2
   kernel_pdf_bs
   kernel_pdf_gamma
   kernel_pdf_gamma2
   kernel_pdf_invgamma
   kernel_pdf_invgauss
   kernel_pdf_lognorm
   kernel_pdf_recipinvgauss
   kernel_pdf_weibull
   kernel_cdf_beta
   kernel_cdf_beta2
   kernel_cdf_bs
   kernel_cdf_gamma
   kernel_cdf_gamma2
   kernel_cdf_invgamma
   kernel_cdf_invgauss
   kernel_cdf_lognorm
   kernel_cdf_recipinvgauss
   kernel_cdf_weibull


``sandbox.nonparametric`` also contains additional, insufficiently tested
classes for testing functional form and for semi-linear and single index
models.