#!/usr/bin/env python
# coding: utf-8
# DO NOT EDIT
# Autogenerated from the notebook regression_diagnostics.ipynb.
# Edit the notebook and then sync the output with this file.
#
# flake8: noqa
# DO NOT EDIT
# # Regression diagnostics
# This example file shows how to use a few of the ``statsmodels``
# regression diagnostic tests in a real-life context. You can learn about
# more tests and find out more information about the tests here on the
# [Regression Diagnostics
# page.](https://www.statsmodels.org/stable/diagnostic.html)
#
# Note that most of the tests described here only return a tuple of
# numbers, without any annotation. A full description of outputs is always
# included in the docstring and in the online ``statsmodels`` documentation.
# For presentation purposes, we use the ``lzip(name, test)`` construct to
# pretty-print short descriptions in the examples below.
# ## Estimate a regression model
from statsmodels.compat import lzip
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt
# Load data
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HistData/Guerry.csv"
dat = pd.read_csv(url)
# Fit regression model (using the natural log of one of the regressors)
results = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()
# Inspect the results
print(results.summary())
# ## Normality of the residuals
# Jarque-Bera test:
name = ["Jarque-Bera", "Chi^2 two-tail prob.", "Skew", "Kurtosis"]
test = sms.jarque_bera(results.resid)
lzip(name, test)
# Omnibus normality test:
name = ["Chi^2", "Two-tail probability"]
test = sms.omni_normtest(results.resid)
lzip(name, test)
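# As a further normality check, ``sms.normal_ad`` runs the
# Anderson-Darling test. The sketch below uses synthetic residuals so it
# is self-contained; the fitted ``results.resid`` above can be passed in
# the same way.
import numpy as np

rng = np.random.default_rng(12345)
synthetic_resid = rng.normal(size=200)  # stand-in for results.resid

# normal_ad returns the A^2 statistic and its p-value
a2_stat, a2_pval = sms.normal_ad(synthetic_resid)
lzip(["A^2 statistic", "p-value"], [a2_stat, a2_pval])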
# ## Influence tests
#
# Once created, an object of class ``OLSInfluence`` holds attributes and
# methods that allow users to assess the influence of each observation. For
# example, we can compute and extract the first few rows of DFbetas by:
from statsmodels.stats.outliers_influence import OLSInfluence
test_class = OLSInfluence(results)
test_class.dfbetas[:5, :]
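# The ``summary_frame`` method collects the per-observation diagnostics
# (Cook's distance, leverage, DFFITS, DFbetas) into one DataFrame. A
# small synthetic regression keeps this sketch self-contained; the
# ``test_class`` object above works the same way.
import statsmodels.api as sm

rng = np.random.default_rng(0)
X_demo = sm.add_constant(rng.normal(size=(50, 2)))
y_demo = X_demo @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

infl = OLSInfluence(sm.OLS(y_demo, X_demo).fit())
infl_frame = infl.summary_frame()  # one row of diagnostics per observation
infl_frame.head()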
# Explore other options by typing ``dir(test_class)``
#
# Useful information on leverage can also be plotted:
from statsmodels.graphics.regressionplots import plot_leverage_resid2
fig, ax = plt.subplots(figsize=(8, 6))
fig = plot_leverage_resid2(results, ax=ax)
# Other plotting options can be found on the [Graphics
# page.](https://www.statsmodels.org/stable/graphics.html)
# ## Multicollinearity
#
# Condition number:
np.linalg.cond(results.model.exog)
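# Variance inflation factors complement the condition number by
# attributing collinearity to individual regressors. The sketch below
# uses synthetic data with two nearly collinear columns for
# illustration; ``results.model.exog`` can be passed in the same way.
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
exog_demo = np.column_stack([np.ones(n), x1, x2])

# VIF for each column of the design matrix (large values flag collinearity)
vifs = [
    variance_inflation_factor(exog_demo, i) for i in range(exog_demo.shape[1])
]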
# ## Heteroskedasticity tests
#
# Breusch-Pagan test:
name = ["Lagrange multiplier statistic", "p-value", "f-value", "f p-value"]
test = sms.het_breuschpagan(results.resid, results.model.exog)
lzip(name, test)
# Goldfeld-Quandt test:
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(results.resid, results.model.exog)
lzip(name, test)
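# White's test (``sms.het_white``) is a further heteroskedasticity
# check: it regresses the squared residuals on the regressors, their
# squares, and cross-products. Sketched here on synthetic data whose
# error variance grows with the regressor.
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x_het = rng.uniform(1, 5, size=n)
X_het = sm.add_constant(x_het)
y_het = 1.0 + 2.0 * x_het + x_het * rng.normal(size=n)  # variance rises with x

res_het = sm.OLS(y_het, X_het).fit()
lm_stat, lm_pval, f_stat, f_pval = sms.het_white(res_het.resid,
                                                 res_het.model.exog)
lzip(["LM statistic", "LM p-value", "F statistic", "F p-value"],
     [lm_stat, lm_pval, f_stat, f_pval])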
# ## Linearity
#
# Harvey-Collier multiplier test for the null hypothesis that the linear
# specification is correct:
name = ["t value", "p value"]
test = sms.linear_harvey_collier(results)
lzip(name, test)
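# As a sanity check of the test's behavior, the sketch below applies
# ``linear_harvey_collier`` to synthetic data that are truly linear, so
# the null hypothesis holds by construction.
import statsmodels.api as sm

rng = np.random.default_rng(3)
x_lin = rng.normal(size=100)
X_lin = sm.add_constant(x_lin)
y_lin = 1.0 + 2.0 * x_lin + rng.normal(size=100)  # genuinely linear

res_lin = sm.OLS(y_lin, X_lin).fit()
hc_tval, hc_pval = sms.linear_harvey_collier(res_lin)
lzip(["t value", "p value"], [hc_tval, hc_pval])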