File: regression_diagnostics.py

package info (click to toggle)
statsmodels 0.14.6%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 49,956 kB
  • sloc: python: 254,365; f90: 612; sh: 560; javascript: 337; asm: 156; makefile: 145; ansic: 32; xml: 9
file content (109 lines) | stat: -rw-r--r-- 3,120 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
#!/usr/bin/env python
# coding: utf-8

# DO NOT EDIT
# Autogenerated from the notebook regression_diagnostics.ipynb.
# Edit the notebook and then sync the output with this file.
#
# flake8: noqa
# DO NOT EDIT

# # Regression diagnostics

# This example file shows how to use a few of the ``statsmodels``
# regression diagnostic tests in a real-life context. You can learn about
# more tests and find out more information about the tests here on the
# [Regression Diagnostics
# page.](https://www.statsmodels.org/stable/diagnostic.html)
#
# Note that most of the tests described here only return a tuple of
# numbers, without any annotation. A full description of outputs is always
# included in the docstring and in the online ``statsmodels`` documentation.
# For presentation purposes, we use the ``zip(name,test)`` construct to
# pretty-print short descriptions in the examples below.

# ## Estimate a regression model

from statsmodels.compat import lzip

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt

# Load data
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HistData/Guerry.csv"
dat = pd.read_csv(url)

# Fit regression model (using the natural log of one of the regressors)
results = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()

# Inspect the results
print(results.summary())

# ## Normality of the residuals

# Jarque-Bera test:

name = ["Jarque-Bera", "Chi^2 two-tail prob.", "Skew", "Kurtosis"]
test = sms.jarque_bera(results.resid)
lzip(name, test)

# Omni test:

name = ["Chi^2", "Two-tail probability"]
test = sms.omni_normtest(results.resid)
lzip(name, test)

# ## Influence tests
#
# Once created, an object of class ``OLSInfluence`` holds attributes and
# methods that allow users to assess the influence of each observation. For
# example, we can compute and extract the first few rows of DFbetas by:

from statsmodels.stats.outliers_influence import OLSInfluence

test_class = OLSInfluence(results)
test_class.dfbetas[:5, :]

# Explore other options by typing ``dir(influence_test)``
#
# Useful information on leverage can also be plotted:

from statsmodels.graphics.regressionplots import plot_leverage_resid2

fig, ax = plt.subplots(figsize=(8, 6))
fig = plot_leverage_resid2(results, ax=ax)

# Other plotting options can be found on the [Graphics
# page.](https://www.statsmodels.org/stable/graphics.html)

# ## Multicollinearity
#
# Condition number:

np.linalg.cond(results.model.exog)

# ## Heteroskedasticity tests
#
# Breush-Pagan test:

name = ["Lagrange multiplier statistic", "p-value", "f-value", "f p-value"]
test = sms.het_breuschpagan(results.resid, results.model.exog)
lzip(name, test)

# Goldfeld-Quandt test

name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(results.resid, results.model.exog)
lzip(name, test)

# ## Linearity
#
# Harvey-Collier multiplier test for Null hypothesis that the linear
# specification is correct:

name = ["t value", "p value"]
test = sms.linear_harvey_collier(results)
lzip(name, test)