.. currentmodule:: statsmodels.nonparametric
.. _nonparametric:
Nonparametric Methods :mod:`nonparametric`
==========================================
This section collects various methods in nonparametric statistics. This
includes kernel density estimation for univariate and multivariate data,
kernel regression and locally weighted scatterplot smoothing (lowess).
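As a brief illustration of the smoothing side of the module, the following sketch applies ``lowess`` to noisy synthetic data. The data, the random seed and the choice ``frac=0.3`` are assumptions made for this example only, not recommended settings.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic data (illustrative assumption): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# frac is the fraction of observations used for each local fit.
# Note the argument order: the response (endog) comes first.
smoothed = lowess(y, x, frac=0.3)

# With the default return_sorted=True the result has two columns:
# the sorted x values and the corresponding fitted values.
x_sorted, y_fit = smoothed[:, 0], smoothed[:, 1]
```

A smaller ``frac`` follows the data more closely, while a larger one gives a smoother curve.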
``sandbox.nonparametric`` contains additional functions that are work in
progress or do not yet have unit tests. We plan to include here nonparametric
density estimators, especially ones based on kernels or orthogonal polynomials,
smoothers, and tools for nonparametric models and methods in other parts of
statsmodels.
Kernel density estimation
-------------------------
The kernel density estimation (KDE) functionality is split between univariate
and multivariate estimation, which are implemented in quite different ways.
Univariate estimation (as provided by `KDEUnivariate`) uses the fast Fourier
transform (FFT), which makes it quite fast. It should therefore be preferred
for *continuous, univariate* data if speed is important. It supports several
kernels; bandwidth selection is limited to rules of thumb (Scott or Silverman).
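A minimal sketch of univariate estimation with `KDEUnivariate` might look as follows. The sample, the seed and the kernel choices are assumptions for illustration. To our understanding the FFT path is implemented only for the Gaussian kernel, so the second fit passes ``fft=False``.

```python
import numpy as np
from statsmodels.nonparametric.kde import KDEUnivariate

# Synthetic sample (illustrative assumption): 500 standard normal draws.
rng = np.random.default_rng(0)
data = rng.normal(size=500)

# Gaussian kernel with Scott's rule of thumb, evaluated via FFT.
kde = KDEUnivariate(data)
kde.fit(kernel="gau", bw="scott", fft=True)
grid = kde.support      # evaluation grid chosen by fit()
density = kde.density   # estimated density on that grid

# A non-Gaussian kernel (Epanechnikov) with Silverman's rule.
# fft=False is used here because the FFT path is assumed to be
# available for the Gaussian kernel only.
kde_epa = KDEUnivariate(data)
kde_epa.fit(kernel="epa", bw="silverman", fft=False)

# Trapezoidal check that the FFT-based estimate integrates to about one.
integral = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))
```

After fitting, the selected bandwidth is available as ``kde.bw``.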
Multivariate estimation (as provided by `KDEMultivariate`) uses product
kernels. It supports least squares and maximum likelihood cross-validation
for bandwidth estimation, as well as estimating mixed continuous, ordered and
unordered data. However, the default kernels (Gaussian, Wang-Ryzin and
Aitchison-Aitken) cannot currently be changed. Direct estimation
of the conditional density (:math:`P(X | Y) = P(X, Y) / P(Y)`) is supported
by `KDEMultivariateConditional`.
`KDEMultivariate` can do univariate estimation as well, but is up to two orders
of magnitude slower than `KDEUnivariate`.
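The multivariate estimators described above can be sketched as follows, assuming small synthetic data so that cross-validation stays cheap. The variable names, sample size and seed are illustrative assumptions. ``var_type`` takes one character per variable: ``c`` for continuous, ``o`` for ordered and ``u`` for unordered.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import (
    KDEMultivariate,
    KDEMultivariateConditional,
)

# Synthetic mixed data (illustrative assumption).
rng = np.random.default_rng(0)
n = 100
x_cont = rng.normal(size=n)            # continuous variable
x_ord = rng.integers(0, 4, size=n)     # ordered discrete variable

# Joint density of one continuous and one ordered variable, with the
# bandwidths chosen by maximum likelihood cross-validation.
kde = KDEMultivariate(data=[x_cont, x_ord], var_type="co", bw="cv_ml")
joint_pdf = kde.pdf()                  # density at the training points

# Conditional density of y given x, here with the rule-of-thumb
# bandwidth to keep the example fast.
y = 0.5 * x_cont + rng.normal(scale=0.5, size=n)
cond = KDEMultivariateConditional(
    endog=[y], exog=[x_cont],
    dep_type="c", indep_type="c",
    bw="normal_reference",
)
cond_pdf = cond.pdf()                  # conditional density at the training points
```

The selected bandwidths, one per variable, are stored in ``kde.bw``. Cross-validation cost grows quickly with the sample size, which is one reason to prefer `KDEUnivariate` for purely univariate, continuous data.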
Kernel regression
-----------------
Kernel regression (as provided by `KernelReg`) is based on the same product
kernel approach as `KDEMultivariate`, and therefore has the same set of
features (mixed data, cross-validated bandwidth estimation, kernels) as
described above for `KDEMultivariate`. Censored regression is provided by
`KernelCensoredReg`.
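As a hedged sketch of how `KernelReg` can be used, the following fits a local linear regression with a least squares cross-validated bandwidth. The synthetic data, seed and sample size are assumptions for illustration only.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Synthetic data (illustrative assumption): noisy sine curve.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 2 * np.pi, size=n)
y = np.sin(x) + rng.normal(scale=0.2, size=n)

# One continuous regressor ("c"), local linear estimator ("ll"),
# bandwidth selected by least squares cross-validation ("cv_ls").
model = KernelReg(endog=y, exog=[x], var_type="c", reg_type="ll", bw="cv_ls")

# fit() returns the conditional mean and the marginal effects,
# evaluated at the training points when no data is passed.
mean, mfx = model.fit()
r2 = model.r_squared()
```

Passing ``reg_type="lc"`` instead selects the local constant (Nadaraya-Watson) estimator.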
Note that code for semi-parametric partial linear models and single index
models, based on `KernelReg`, can be found in the sandbox.
References
----------
* B.W. Silverman, "Density Estimation for Statistics and Data Analysis"
* J.S. Racine, "Nonparametric Econometrics: A Primer," Foundation and
Trends in Econometrics, Vol. 3, No. 1, pp. 1-88, 2008.
* Q. Li and J.S. Racine, "Nonparametric econometrics: theory and practice",
Princeton University Press, 2006.
* Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning:
Data Mining, Inference, and Prediction", Springer, 2009.
* Racine, J., Li, Q. "Nonparametric Estimation of Distributions
with Categorical and Continuous Data." Working Paper. (2000)
* Racine, J., Li, Q. "Kernel Estimation of Multivariate Conditional
Distributions." Annals of Economics and Finance 5, 211-235 (2004)
* Liu, R., Yang, L. "Kernel estimation of multivariate
cumulative distribution function." Journal of Nonparametric Statistics
(2008)
* Li, R., Ju, G. "Nonparametric Estimation of Multivariate CDF
with Categorical and Continuous Data." Working Paper
* Li, Q., Racine, J. "Cross-validated local linear nonparametric
regression" Statistica Sinica 14(2004), pp. 485-512
* Racine, J. "Consistent Significance Testing for Nonparametric
Regression" Journal of Business & Economic Statistics
* Racine, J., Hart, J., Li, Q., "Testing the Significance of
Categorical Predictor Variables in Nonparametric Regression
Models", 2006, Econometric Reviews 25, 523-544
Module Reference
----------------
.. module:: statsmodels.nonparametric
:synopsis: Nonparametric estimation of densities and curves
The public functions and classes are
.. currentmodule:: statsmodels.nonparametric.smoothers_lowess
.. autosummary::
:toctree: generated/
lowess
.. currentmodule:: statsmodels.nonparametric.kde
.. autosummary::
:toctree: generated/
KDEUnivariate
.. currentmodule:: statsmodels.nonparametric.kernel_density
.. autosummary::
:toctree: generated/
KDEMultivariate
KDEMultivariateConditional
EstimatorSettings
.. currentmodule:: statsmodels.nonparametric.kernel_regression
.. autosummary::
:toctree: generated/
KernelReg
KernelCensoredReg
Helper functions for kernel bandwidths
.. currentmodule:: statsmodels.nonparametric.bandwidths
.. autosummary::
:toctree: generated/
bw_scott
bw_silverman
select_bandwidth
Some example nonlinear functions are available in
:mod:`statsmodels.nonparametric.dgp_examples`.
Asymmetric Kernels
------------------
Asymmetric kernels like beta for the unit interval and gamma for positive
valued random variables avoid problems at the boundary of the support of the
distribution.
Statsmodels has preliminary support for estimating the density and cumulative
distribution function using kernels for the unit interval (``beta``) or for
the positive real line (all other kernels).
Several of the kernels for the positive real line assume that the density at
the zero boundary is zero. The gamma kernel also allows the case of positive
or unbounded density at the zero boundary.
There are currently no defaults and no support for choosing the bandwidth; the
user has to provide it.
The functions to compute kernel density and kernel cdf are
.. currentmodule:: statsmodels.nonparametric.kernels_asymmetric
.. autosummary::
:toctree: generated/
pdf_kernel_asym
cdf_kernel_asym
The available kernel functions for pdf and cdf are
.. autosummary::
:toctree: generated/
kernel_pdf_beta
kernel_pdf_beta2
kernel_pdf_bs
kernel_pdf_gamma
kernel_pdf_gamma2
kernel_pdf_invgamma
kernel_pdf_invgauss
kernel_pdf_lognorm
kernel_pdf_recipinvgauss
kernel_pdf_weibull
kernel_cdf_beta
kernel_cdf_beta2
kernel_cdf_bs
kernel_cdf_gamma
kernel_cdf_gamma2
kernel_cdf_invgamma
kernel_cdf_invgauss
kernel_cdf_lognorm
kernel_cdf_recipinvgauss
kernel_cdf_weibull
``sandbox.nonparametric`` contains additional, insufficiently tested classes
for testing functional form and for semi-linear and single index models.