#!/usr/bin/env python
# coding: utf-8

# DO NOT EDIT
# Autogenerated from the notebook statespace_sarimax_faq.ipynb.
# Edit the notebook and then sync the output with this file.
#
# flake8: noqa
# DO NOT EDIT

# # SARIMAX and ARIMA: Frequently Asked Questions (FAQ)
#
# This notebook contains explanations for frequently asked questions.
#
# * Comparing trends and exogenous variables in `SARIMAX`, `ARIMA` and
# `AutoReg`
# * Reconstructing residuals, fitted values and forecasts in `SARIMAX` and
# `ARIMA`
# * Initial residuals in `SARIMAX` and `ARIMA`

# ## Comparing trends and exogenous variables in `SARIMAX`, `ARIMA` and
# `AutoReg`
#
# `ARIMA` models are formally OLS regressions with ARMA errors.  A basic
# AR(1) in the OLS with ARMA errors form is described as
#
# $$
# \begin{align}
# Y_t & = \delta + \epsilon_t \\
# \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\
# \eta_t & \sim WN(0,\sigma^2) \\
# \end{align}
# $$
#
# In large samples, $\hat{\delta}\stackrel{p}{\rightarrow} E[Y]$.
#
# `SARIMAX` uses a different representation, so that the model when
# estimated using `SARIMAX` is
#
# $$
# \begin{align}
# Y_t & = \phi + \rho Y_{t-1} + \eta_t \\
# \eta_t & \sim WN(0,\sigma^2) \\
# \end{align}
# $$
#
#
# This is the same representation that is used when the model is estimated
# using OLS (`AutoReg`). In large samples,
# $\hat{\phi}\stackrel{p}{\rightarrow} E[Y](1-\rho)$.
#
# In the next cell, we simulate a large sample and verify that these
# relationships hold in practice.

import numpy as np
import pandas as pd

rng = np.random.default_rng(20210819)
eta = rng.standard_normal(5200)
rho = 0.8
beta = 10
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * epsilon[i - 1] + eta[i]
y = beta + epsilon
y = y[200:]

from statsmodels.tsa.api import SARIMAX, AutoReg
from statsmodels.tsa.arima.model import ARIMA

# The three models are specified and estimated in the next cell.  An AR(0)
# is included as a reference. The AR(0) is identical using all three
# estimators.

ar0_res = SARIMAX(y, order=(0, 0, 0), trend="c").fit()
sarimax_res = SARIMAX(y, order=(1, 0, 0), trend="c").fit()
arima_res = ARIMA(y, order=(1, 0, 0), trend="c").fit()
autoreg_res = AutoReg(y, 1, trend="c").fit()

# The table below contains the estimated parameter in the model, the
# estimated AR(1) coefficient, and the long-run mean which is either equal
# to the estimated parameters (AR(0) or `ARIMA`), or depends on the ratio of
# the intercept to 1 minus the AR(1) parameter.

intercept = [
    ar0_res.params[0],
    sarimax_res.params[0],
    arima_res.params[0],
    autoreg_res.params[0],
]
rho_hat = [0] + [r.params[1] for r in (sarimax_res, arima_res, autoreg_res)]
long_run = [
    ar0_res.params[0],
    sarimax_res.params[0] / (1 - sarimax_res.params[1]),
    arima_res.params[0],
    autoreg_res.params[0] / (1 - autoreg_res.params[1]),
]
cols = ["AR(0)", "SARIMAX", "ARIMA", "AutoReg"]
pd.DataFrame(
    [intercept, rho_hat, long_run],
    columns=cols,
    index=["delta-or-phi", "rho", "long-run mean"],
)
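
# The mapping between the two intercepts can also be checked without
# `statsmodels`. The cell below is a minimal sketch, assuming a freshly
# simulated AR(1): the seed, the helper functions and all `demo_` names are
# illustrative and not part of the original notebook. It estimates $\phi$
# and $\rho$ by plain least squares and compares $\hat\phi/(1-\hat\rho)$
# with the sample mean.

```python
import numpy as np


def simulate_ar1_with_mean(n, mean, rho, seed):
    """Simulate Y_t = mean + e_t with e_t = rho * e_{t-1} + eta_t."""
    local_rng = np.random.default_rng(seed)
    shocks = local_rng.standard_normal(n + 200)
    errors = shocks.copy()
    for i in range(1, errors.shape[0]):
        errors[i] = rho * errors[i - 1] + shocks[i]
    # Drop a 200-observation burn-in, as in the cells above
    return mean + errors[200:]


def ols_ar1(series):
    """Regress Y_t on a constant and Y_{t-1}; return (phi_hat, rho_hat)."""
    regressors = np.column_stack([np.ones(series.shape[0] - 1), series[:-1]])
    coefs, *_ = np.linalg.lstsq(regressors, series[1:], rcond=None)
    return coefs[0], coefs[1]


demo_y = simulate_ar1_with_mean(5000, mean=10.0, rho=0.8, seed=12345)
demo_phi_hat, demo_rho_hat = ols_ar1(demo_y)
# Long-run mean implied by the AutoReg/SARIMAX-style intercept
demo_long_run = demo_phi_hat / (1 - demo_rho_hat)
print(demo_phi_hat, demo_rho_hat, demo_long_run, demo_y.mean())
```

# The implied long-run mean and the sample mean should agree closely, while
# the raw intercept is roughly the mean scaled by $1-\hat\rho$.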

# ### Differences between trend and exog in `SARIMAX`
#
# When `SARIMAX` includes `exog` variables, then the `exog` are treated as
# OLS regressors, so that the model estimated is
#
# $$
# \begin{align}
# Y_t - X_t \beta & = \delta + \rho (Y_{t-1} - X_{t-1}\beta) + \eta_t \\
# \eta_t & \sim WN(0,\sigma^2) \\
# \end{align}
# $$
#
# In the next example, we omit the trend and instead include a column of
# ones, which produces a model that is equivalent, in large samples, to the
# case with no exogenous regressor and `trend="c"`. Here the estimated value
# of `const` matches the value estimated using `ARIMA`. This happens since
# both exog in `SARIMAX` and the trend in `ARIMA` are treated as linear
# regression models with ARMA errors.

sarimax_exog_res = SARIMAX(y, exog=np.ones_like(y), order=(1, 0, 0),
                           trend="n").fit()
print(sarimax_exog_res.summary())

# ### Using `exog` in `SARIMAX` and `ARIMA`
#
# While `exog` is treated the same way in both models, the intercept
# continues to differ.  Below we add an exogenous regressor to `y` and then
# fit the model using each estimator, starting with `SARIMAX` and `ARIMA`.
# The data generating process is now
#
# $$
# \begin{align}
# Y_t & = \delta + X_t \beta + \epsilon_t \\
# \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\
# \eta_t & \sim WN(0,\sigma^2) \\
# \end{align}
# $$
#

full_x = rng.standard_normal(eta.shape)
x = full_x[200:]
y += 3 * x

sarimax_exog_res = SARIMAX(y, exog=x, order=(1, 0, 0), trend="c").fit()
arima_exog_res = ARIMA(y, exog=x, order=(1, 0, 0), trend="c").fit()

# Examining the parameter tables, we see that the parameter estimates on
# `x1` are identical while the estimates of the `intercept` continue to
# differ due to the differences in the treatment of trends in these
# estimators.

# #### `SARIMAX`


def print_params(s):
    from io import StringIO

    return pd.read_csv(StringIO(s.tables[1].as_csv()), index_col=0)


print_params(sarimax_exog_res.summary())

# #### `ARIMA`

print_params(arima_exog_res.summary())

# ### `exog` in `AutoReg`
#
# When using `AutoReg` to estimate a model using OLS, the model differs
# from both `SARIMAX` and `ARIMA`. The `AutoReg` specification with
# exogenous variables is
#
# $$
# \begin{align}
# Y_t & = \phi + \rho Y_{t-1} + X_{t}\beta + \eta_t \\
# \eta_t & \sim WN(0,\sigma^2) \\
# \end{align}
# $$
#
# This specification is not equivalent to the specification estimated in
# `SARIMAX` and `ARIMA`. Here the difference is non-trivial, and naive
# estimation on the same time series results in different parameter values,
# even in large samples (and in the limit). Estimating this model changes
# the estimate of the AR(1) coefficient.

# #### `AutoReg`

autoreg_exog_res = AutoReg(y, 1, exog=x, trend="c").fit()
print_params(autoreg_exog_res.summary())
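
# To see how large the difference can be, the following sketch simulates
# the regression-with-AR(1)-errors DGP and estimates the `AutoReg`-style
# equation by plain least squares. It is not part of the original notebook:
# the helper function, the seed and all `demo_` names are illustrative, and
# it assumes i.i.d. standard normal $X_t$ and unit shock variance. Under
# those assumptions the population coefficient on $Y_{t-1}$ is
# $\rho\sigma^2_\epsilon/(\beta^2 + \sigma^2_\epsilon)$ with
# $\sigma^2_\epsilon = 1/(1-\rho^2)$, which is far below $\rho$.

```python
import numpy as np


def arx_ols_on_reg_ar_errors(n=20000, rho=0.8, beta=3.0, delta=10.0, seed=2024):
    """Fit Y_t on (1, Y_{t-1}, X_t) when the data follow
    Y_t = delta + X_t * beta + eps_t with AR(1) errors eps_t."""
    local_rng = np.random.default_rng(seed)
    shocks = local_rng.standard_normal(n)
    exog = local_rng.standard_normal(n)
    errors = shocks.copy()
    for i in range(1, n):
        errors[i] = rho * errors[i - 1] + shocks[i]
    series = delta + beta * exog + errors
    regressors = np.column_stack([np.ones(n - 1), series[:-1], exog[1:]])
    coefs, *_ = np.linalg.lstsq(regressors, series[1:], rcond=None)
    # (intercept, coefficient on Y_{t-1}, coefficient on X_t)
    return coefs


demo_arx_coefs = arx_ols_on_reg_ar_errors()
# Population value of the coefficient on Y_{t-1} under the assumptions above
demo_sigma2_eps = 1.0 / (1.0 - 0.8 ** 2)
demo_plim = 0.8 * demo_sigma2_eps / (3.0 ** 2 + demo_sigma2_eps)
print(demo_arx_coefs, demo_plim)
```

# In this setup the coefficient on $X_t$ stays close to 3 because $X_t$ is
# uncorrelated with $Y_{t-1}$, while the coefficient on $Y_{t-1}$ settles
# near the value in `demo_plim` rather than near $\rho = 0.8$.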

# The key difference can be seen by writing the model in lag operator
# notation.
#
# $$
# \begin{align}
# (1-\rho L ) Y_t & = \phi + X_{t}\beta + \eta_t \Rightarrow \\
# Y_t & = (1-\rho L )^{-1}\left(\phi + X_{t}\beta + \eta_t\right) \\
# Y_t & = \frac{\phi}{1-\rho} + \sum_{i=0}^{\infty} \rho^i
# \left(X_{t-i}\beta + \eta_{t-i}\right)
# \end{align}
# $$
#
# where it is assumed that $|\rho|<1$.  Here we see that $Y_t$ depends
# on all lagged values of $X_t$ and $\eta_t$.  This differs from the
# specification estimated by `SARIMAX` and `ARIMA`, which can be seen to be
#
# $$
# \begin{align}
# Y_t - X_t \beta & = \delta + \rho (Y_{t-1} - X_{t-1}\beta) + \eta_t \\
# \left(1-\rho L \right)\left(Y_t - X_t  \beta\right) & = \delta +  \eta_t
# \\
# Y_t - X_t  \beta & = \frac{\delta}{1-\rho} +  \left(1-\rho L
# \right)^{-1}\eta_t \\
# Y_t - X_t  \beta & = \frac{\delta}{1-\rho} +  \sum_{i=0}^\infty \rho^i
# \eta_{t-i} \\
# Y_t  & = \frac{\delta}{1-\rho} + X_t  \beta +  \sum_{i=0}^\infty \rho^i
# \eta_{t-i} \\
# \end{align}
# $$
#
# In this specification, $Y_t$ only depends on $X_t$ and no other lags.
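
# The two expansions can be illustrated with a noise-free impulse response.
# This is a minimal sketch rather than part of the original notebook: the
# function names and the values $\rho=0.8$, $\beta=3$ are illustrative, and
# intercepts and shocks are set to zero so that only the effect of a unit
# pulse in $X$ at time 0 remains.

```python
import numpy as np


def impulse_response_arx(rho, beta, horizon):
    """Response of Y to a unit pulse in X at time 0 when
    Y_t = rho * Y_{t-1} + X_t * beta."""
    pulse = np.zeros(horizon)
    pulse[0] = 1.0
    response = np.zeros(horizon)
    for i in range(horizon):
        lagged = response[i - 1] if i > 0 else 0.0
        response[i] = rho * lagged + beta * pulse[i]
    return response


def impulse_response_reg_ar_errors(beta, horizon):
    """Response of Y to the same pulse when Y_t = X_t * beta + eps_t.
    The AR(1) dynamics live in eps_t only, so rho plays no role here."""
    pulse = np.zeros(horizon)
    pulse[0] = 1.0
    return beta * pulse


demo_irf_arx = impulse_response_arx(0.8, 3.0, 6)
demo_irf_reg = impulse_response_reg_ar_errors(3.0, 6)
print(demo_irf_arx)
print(demo_irf_reg)
```

# The first response decays geometrically as $\beta\rho^i$, while the
# second is nonzero only in the period of the pulse.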

# ### Using the correct DGP with `AutoReg`
#
# Simulating the process that is estimated in `AutoReg` shows that the
# parameters are recovered from the true model.

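# `beta + eta` allocates a new array, so `eta` is not modified; y[0] only
# seeds the recursion and is discarded with the burn-in below
y = beta + eta
for i in range(1, eta.shape[0]):
    # ARX recursion: intercept beta * (1 - rho), AR(1) coefficient rho,
    # and coefficient 3 on the exogenous regressor
    y[i] = beta * (1 - rho) + rho * y[i - 1] + 3 * full_x[i] + eta[i]
# Drop the first 200 observations as burn-in
y = y[200:]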

# #### `AutoReg` with correct DGP

autoreg_alt_exog_res = AutoReg(y, 1, exog=x, trend="c").fit()
print_params(autoreg_alt_exog_res.summary())

# ## Reconstructing residuals, fitted values and forecasts in `SARIMAX`
# and `ARIMA`
#
# In models that contain only autoregressive terms, trends and exogenous
# variables, fitted values and forecasts can be easily reconstructed once
# the maximum lag length in the model has been reached.  In practice, this
# means after $(P+D)s+p+d$ periods. Earlier predictions and residuals are
# harder to reconstruct since the model builds the best prediction for
# $Y_t|Y_{t-1},Y_{t-2},...$.  When the number of lags of $Y$ is less than
# the autoregressive order, then the expression for the optimal prediction
# differs from the model.  For example, when predicting the very first
# value, $Y_1$, there is no information available from the history of $Y$,
# and so the best prediction is the unconditional mean. In the case of an
# AR(1), the second prediction will follow the model, so that when using
# `ARIMA`, the prediction is
#
# $$
# \hat{Y}_2 = \hat{\delta} + \hat{\rho} \left(Y_1 - \hat{\delta}\right)
# $$
#
# since `ARIMA` treats both exogenous and trend terms as regression with
# ARMA errors.
#
# This can be seen in the next set of cells.

arima_res = ARIMA(y, order=(1, 0, 0), trend="c").fit()
print_params(arima_res.summary())

arima_res.predict(0, 2)

delta_hat, rho_hat = arima_res.params[:2]
delta_hat + rho_hat * (y[0] - delta_hat)

# `SARIMAX` treats trend terms differently, and so the one-step forecast
# from a model estimated using `SARIMAX` is
#
# $$
# \hat{Y}_2 = \hat\delta + \hat\rho Y_1
# $$

sarima_res = SARIMAX(y, order=(1, 0, 0), trend="c").fit()
print_params(sarima_res.summary())

sarima_res.predict(0, 2)

delta_hat, rho_hat = sarima_res.params[:2]
delta_hat + rho_hat * y[0]
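
# The two prediction formulas give the same number once the intercepts are
# mapped into each other. The sketch below uses made-up values (the
# functions and all `demo_` names are illustrative, not part of the
# notebook) and assumes the `SARIMAX` intercept equals the `ARIMA` mean
# multiplied by $1-\rho$, as in the first section.

```python
def arima_one_step(delta, rho, y_prev):
    """ARIMA-style prediction: the mean is removed before applying rho."""
    return delta + rho * (y_prev - delta)


def sarimax_one_step(phi, rho, y_prev):
    """SARIMAX-style prediction: the intercept enters the AR recursion."""
    return phi + rho * y_prev


demo_delta, demo_rho, demo_y_prev = 10.0, 0.8, 11.5
# Implied SARIMAX intercept for the same process
demo_phi = demo_delta * (1 - demo_rho)
demo_pred_arima = arima_one_step(demo_delta, demo_rho, demo_y_prev)
demo_pred_sarimax = sarimax_one_step(demo_phi, demo_rho, demo_y_prev)
print(demo_pred_arima, demo_pred_sarimax)
```

# Both calls evaluate to 11.2 up to floating point rounding, so the
# difference between the estimators is in how the intercept is
# parameterized, not in the prediction itself.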

# ### Prediction with MA components
#
# When a model contains an MA component, the prediction is more complicated
# since errors are never directly observable.  The prediction is still
# $Y_t|Y_{t-1},Y_{t-2},...$, and when the MA component is invertible, then
# the optimal prediction can be represented as a $t$-lag AR process. When
# $t$ is large, this should be very close to the prediction as if the errors
# were observable. For short lags, this can differ markedly.
#
# In the next cell we simulate an MA(1) process, and fit an MA model.

rho = 0.8
beta = 10
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * eta[i - 1] + eta[i]
y = beta + epsilon
y = y[200:]

ma_res = ARIMA(y, order=(0, 0, 1), trend="c").fit()
print_params(ma_res.summary())

# We start by looking at predictions near the beginning of the sample
# corresponding to `y[1]`, ..., `y[5]`.

ma_res.predict(1, 5)

# and the corresponding residuals that are needed to produce the "direct"
# forecasts

ma_res.resid[:5]

# Using the model parameters, we can produce the "direct" forecasts using
# the MA(1) specification
#
# $$
# \hat Y_t = \hat\delta + \hat\rho \hat\epsilon_{t-1}
# $$
#
# We see that these are not especially close to the actual model
# predictions for the initial forecasts, but that the gap quickly reduces.

delta_hat, rho_hat = ma_res.params[:2]
direct = delta_hat + rho_hat * ma_res.resid[:5]
direct

# The difference is nearly a standard deviation for the first prediction
# but declines as the index increases.

ma_res.predict(1, 5) - direct
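
# One way to understand the gap is through the innovations algorithm for
# an MA(1), which gives the coefficient that the optimal finite-history
# prediction applies to the previous one-step prediction error. The sketch
# below is an illustration under the assumption of a known MA coefficient
# and a stationary start; the function and the `demo_` names are not part
# of the notebook, and it does not reproduce the `statsmodels` internals.

```python
import numpy as np


def ma1_effective_coefficients(theta, n_steps):
    """Coefficient applied to the last one-step prediction error when
    predicting an MA(1) from a finite history (innovations algorithm).

    Entry k is the coefficient used once k + 1 observations are available.
    The shock variance cancels, so it is normalized to 1.
    """
    coefs = np.empty(n_steps)
    # Relative variance of the first prediction error (no history available)
    rel_var = 1.0 + theta ** 2
    for k in range(n_steps):
        coefs[k] = theta / rel_var
        # Relative variance of the next one-step prediction error
        rel_var = 1.0 + theta ** 2 - theta ** 2 / rel_var
    return coefs


demo_coefs = ma1_effective_coefficients(0.8, 60)
print(demo_coefs[:5])
print(0.8 - demo_coefs[-1])
```

# The effective coefficient starts at $\theta/(1+\theta^2)$ and approaches
# $\theta$ geometrically, which is why the "direct" forecasts differ from
# the model predictions early in the sample and agree with them later on.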

# We next look at the end of the sample and the final three predictions.

t = y.shape[0]
ma_res.predict(t - 3, t - 1)

ma_res.resid[-4:-1]

direct = delta_hat + rho_hat * ma_res.resid[-4:-1]
direct

# The "direct" forecasts are identical to the model predictions. This
# happens since the effect of the short sample has disappeared by the end
# of the sample (in practice it is negligible by observation 100 or so, and
# numerically absent by around observation 160).

ma_res.predict(t - 3, t - 1) - direct

# The same principle applies in more complicated models that include
# multiple lags or seasonal terms - predictions in AR models are simple once
# the effective lag length has been reached, while predictions in models
# that contain MA components are only simple once enough observations are
# available that the influence of the initial conditions, which decays at a
# rate set by the largest inverse root of the MA lag polynomial, is
# negligible, so that the residuals are close to the true residuals.

# ### Prediction differences in `SARIMAX` and `ARIMA`
#
# The formulas used to make predictions from `SARIMAX` and `ARIMA` models
# differ in one key aspect - `ARIMA` treats all trend terms, e.g., the
# intercept or time trend, as part of the exogenous regressors.  For
# example, an AR(1) model with an intercept and linear time trend estimated
# using `ARIMA` has the specification
#
# $$
# \begin{align*}
# Y_t - \delta_0 - \delta_1 t & = \epsilon_t \\
# \epsilon_t & = \rho \epsilon_{t-1} + \eta_t
# \end{align*}
# $$
#
# When the same model is estimated using `SARIMAX`, the specification is
#
# $$
# \begin{align*}
# Y_t & = \epsilon_t \\
# \epsilon_t & =  \delta_0 + \delta_1 t  + \rho \epsilon_{t-1} + \eta_t
# \end{align*}
# $$
#
# The differences are more apparent when the model contains exogenous
# regressors, $X_t$.  The `ARIMA` specification is
#
# $$
# \begin{align*}
# Y_t - \delta_0 - \delta_1 t - X_t \beta & = \epsilon_t \\
# \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\
#            & = \rho \left(Y_{t-1} - \delta_0 - \delta_1 (t-1) - X_{t-1}
# \beta\right) + \eta_t
# \end{align*}
# $$
#
# while the `SARIMAX` specification is
#
# $$
# \begin{align*}
# Y_t & =  X_t \beta + \epsilon_t \\
# \epsilon_t & =  \delta_0 + \delta_1 t  + \rho \epsilon_{t-1} + \eta_t \\
#            & = \delta_0 + \delta_1 t  + \rho \left(Y_{t-1} -
# X_{t-1}\beta\right) + \eta_t
# \end{align*}
# $$
#
# The key difference between these two is that the intercept and the trend
# are effectively equivalent to exogenous regressors in `ARIMA` while they
# are more like standard ARMA terms in `SARIMAX`.
#
# The next cell simulates an ARX with a time trend using the specification
# in `ARIMA` and estimates the parameters using both estimators.

rho = 0.8
beta = 2
delta0 = 10
delta1 = 0.5
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * epsilon[i - 1] + eta[i]
t = np.arange(epsilon.shape[0])
y = delta0 + delta1 * t + beta * full_x + epsilon
y = y[200:]

start = np.array([110, delta1, beta, rho, 1])
arx_res = ARIMA(y, exog=x, order=(1, 0, 0), trend="ct").fit()
mod = SARIMAX(y, exog=x, order=(1, 0, 0), trend="ct")
start[:2] *= 1 - rho
sarimax_res = mod.fit(start_params=start, method="bfgs")

# The two estimators fit similarly, although there is a small difference
# in the log-likelihood.  This is a numerical issue and should not
# materially affect the predictions. Importantly the two trend parameters,
# `const` and `x1` (unfortunately named for the time trend), differ between
# the two.  The other parameters are effectively identical.

print(arx_res.summary())

print(sarimax_res.summary())
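
# The relationship between the two sets of trend parameters can be worked
# out by substituting the `ARIMA` specification into the AR(1) recursion.
# The sketch below is an illustration, not the `statsmodels` implementation:
# the function and the `demo_` names are hypothetical, exogenous regressors
# are omitted, and the intercept mapping depends on where the time index
# starts (here the recursion is written in terms of $t$ and $t-1$).

```python
import numpy as np


def arima_trend_to_sarimax(delta0, delta1, rho):
    """Map regression-style trend parameters (delta0 + delta1 * t with
    AR(1) errors) to the intercept and slope of the equivalent recursion
    Y_t = c0 + c1 * t + rho * Y_{t-1} + eta_t."""
    c0 = delta0 * (1 - rho) + rho * delta1
    c1 = delta1 * (1 - rho)
    return c0, c1


demo_c0, demo_c1 = arima_trend_to_sarimax(delta0=10.0, delta1=0.5, rho=0.8)

# Noise-free check: the deterministic path delta0 + delta1 * t must satisfy
# the recursion exactly
demo_time = np.arange(1, 50)
demo_path = 10.0 + 0.5 * demo_time
demo_rebuilt = demo_c0 + demo_c1 * demo_time[1:] + 0.8 * demo_path[:-1]
print(demo_c0, demo_c1, np.allclose(demo_rebuilt, demo_path[1:]))
```

# Both trend coefficients shrink by roughly the factor $1-\rho$ when moving
# to the recursion form, which is consistent with the two trend parameters
# differing between the summaries while the remaining parameters agree.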

# ## Initial residuals in `SARIMAX` and `ARIMA`
#
# Residuals for observations before the maximal model order, which depends
# on the AR, MA, Seasonal AR, Seasonal MA and differencing parameters, are
# not reliable and should not be used for performance assessment. In
# general, in an ARIMA with orders $(p,d,q)\times(P,D,Q,s)$, the number of
# initial residuals that are less well behaved is:
#
# $$
# \max((P+D)s+p+d,Qs+q)
# $$
#
# We can simulate some data from an ARIMA(1,0,0)(1,0,0,12) and examine the
# residuals.

import numpy as np
import pandas as pd

rho = 0.8
psi = -0.6
beta = 20
epsilon = eta.copy()
for i in range(13, eta.shape[0]):
    epsilon[i] = (rho * epsilon[i - 1] + psi * epsilon[i - 12] -
                  (rho * psi) * epsilon[i - 13] + eta[i])
y = beta + epsilon
y = y[200:]
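
# The recursion above comes from multiplying out the lag polynomial
# $(1-\rho L)(1-\psi L^{12})$. As a quick check, the sketch below expands
# the product numerically; the `demo_` names are illustrative and the
# values match the simulation above.

```python
import numpy as np

demo_rho, demo_psi, demo_s = 0.8, -0.6, 12
# Coefficients are stored in increasing powers of L
demo_ar_poly = np.array([1.0, -demo_rho])  # 1 - rho * L
demo_seasonal_poly = np.zeros(demo_s + 1)
demo_seasonal_poly[0] = 1.0
demo_seasonal_poly[demo_s] = -demo_psi  # 1 - psi * L^12
# Polynomial multiplication is a convolution of the coefficient vectors
demo_product = np.convolve(demo_ar_poly, demo_seasonal_poly)
# Nonzero lags and their coefficients in the expanded polynomial
demo_nonzero = {int(k): float(c) for k, c in enumerate(demo_product) if c != 0}
print(demo_nonzero)
```

# Only lags 0, 1, 12 and 13 carry nonzero coefficients. Moving the lagged
# terms to the right-hand side flips their signs, which gives the
# coefficients $\rho$, $\psi$ and $-\rho\psi$ used in the loop.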

# With a large sample, the parameter estimates are very close to the DGP
# parameters.

res = ARIMA(y, order=(1, 0, 0), trend="c", seasonal_order=(1, 0, 0, 12)).fit()
print(res.summary())

# We can first examine the initial 13 residuals by plotting against the
# actual shocks in the model.  While there is a correspondence, it is fairly
# weak and the correlation is much less than 1.

import matplotlib.pyplot as plt

plt.rc("figure", figsize=(10, 10))
plt.rc("font", size=14)

_ = plt.scatter(res.resid[:13], eta[200:200 + 13])

# Looking at the next 24 residuals and shocks, we see there is nearly
# perfect correlation. This is expected in large samples once the less
# accurate residuals are ignored.

_ = plt.scatter(res.resid[13:37], eta[200 + 13:200 + 37])
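
# The cutoff of 13 used in the two plots follows from the formula at the
# start of this section. A small helper makes the calculation explicit; the
# function name is hypothetical and it simply evaluates
# $\max((P+D)s+p+d,Qs+q)$.

```python
def n_unreliable_residuals(order, seasonal_order=(0, 0, 0, 0)):
    """Number of initial residuals to treat with caution for an ARIMA
    with orders (p, d, q) x (P, D, Q, s)."""
    p, d, q = order
    P, D, Q, s = seasonal_order
    return max((P + D) * s + p + d, Q * s + q)


# ARIMA(1,0,0)(1,0,0,12), as simulated above
print(n_unreliable_residuals((1, 0, 0), (1, 0, 0, 12)))
# ARIMA(1,1,0) without a seasonal component
print(n_unreliable_residuals((1, 1, 0)))
```

# The first call returns 13 and the second returns 2.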

# Next, we simulate an ARIMA(1,1,0), and include a time trend.

rng = np.random.default_rng(20210819)
eta = rng.standard_normal(5200)
rho = 0.8
beta = 20
epsilon = eta.copy()
for i in range(2, eta.shape[0]):
    epsilon[i] = (1 + rho) * epsilon[i - 1] - rho * epsilon[i - 2] + eta[i]
t = np.arange(epsilon.shape[0])
y = beta + 2 * t + epsilon
y = y[200:]
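
# The coefficients $1+\rho$ and $-\rho$ in the loop above are what make the
# first difference of the error an AR(1). The sketch below verifies this
# on a short, separately simulated series; the helper, the seed and the
# `demo_` names are illustrative and do not touch the variables used in the
# notebook.

```python
import numpy as np


def simulate_integrated_ar1(shocks, rho):
    """Build a level series whose first difference is an AR(1) in shocks."""
    level = shocks.copy()
    for i in range(2, shocks.shape[0]):
        level[i] = (1 + rho) * level[i - 1] - rho * level[i - 2] + shocks[i]
    return level


demo_shocks = np.random.default_rng(0).standard_normal(500)
demo_level = simulate_integrated_ar1(demo_shocks, 0.8)
# demo_diff[k] = demo_level[k + 1] - demo_level[k]
demo_diff = np.diff(demo_level)
# From index 2 of the level onward, the difference obeys the AR(1) recursion
demo_is_ar1 = np.allclose(demo_diff[1:], 0.8 * demo_diff[:-1] + demo_shocks[2:])
print(demo_is_ar1)
```

# The check prints `True`, confirming that differencing once leaves an
# AR(1) with coefficient $\rho$.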

# Again the parameter estimates are very close to the DGP parameters.

res = ARIMA(y, order=(1, 1, 0), trend="t").fit()
print(res.summary())

# The residuals are not accurate, and the first residual is approximately
# 500.  The others are closer, although in this model the first 2 should
# usually be ignored.

res.resid[:5]

# The reason why the first residual is so large is that the optimal
# prediction of this value is the mean of the difference, which is 1.77.
# Once the first value is known, the second value makes use of the first
# value in its prediction and the prediction is substantially closer to the
# truth.

res.predict(0, 5)

# It is worth noting that the results class contains two attributes that
# can be helpful in understanding which residuals are problematic,
# `loglikelihood_burn` and `nobs_diffuse`.

res.loglikelihood_burn, res.nobs_diffuse