File: theta-model.py

package info (click to toggle)
statsmodels 0.14.6%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 49,956 kB
  • sloc: python: 254,365; f90: 612; sh: 560; javascript: 337; asm: 156; makefile: 145; ansic: 32; xml: 9
file content (218 lines) | stat: -rw-r--r-- 6,930 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
#!/usr/bin/env python
# coding: utf-8

# DO NOT EDIT
# Autogenerated from the notebook theta-model.ipynb.
# Edit the notebook and then sync the output with this file.
#
# flake8: noqa
# DO NOT EDIT

# # The Theta Model
#
# The Theta model of Assimakopoulos & Nikolopoulos (2000) is a simple
# method for forecasting the involves fitting two $\theta$-lines,
# forecasting the lines using a Simple Exponential Smoother, and then
# combining the forecasts from the two lines to produce the final forecast.
# The model is implemented in steps:
#
#
# 1. Test for seasonality
# 2. Deseasonalize if seasonality detected
# 3. Estimate $\alpha$ by fitting a SES model to the data and $b_0$ by
# OLS.
# 4. Forecast the series
# 5. Reseasonalize if the data was deseasonalized.
#
# The seasonality test examines the ACF at the seasonal lag $m$.  If this
# lag is significantly different from zero then the data is deseasonalize
# using `statsmodels.tsa.seasonal_decompose` use either a multiplicative
# method (default) or additive.
#
# The parameters of the model are $b_0$ and $\alpha$ where $b_0$ is
# estimated from the OLS regression
#
# $$
# X_t = a_0 + b_0 (t-1) + \epsilon_t
# $$
#
# and $\alpha$ is the SES smoothing parameter in
#
# $$
# \tilde{X}_t = (1-\alpha) X_t + \alpha \tilde{X}_{t-1}
# $$
#
# The forecasts are then
#
# $$
#  \hat{X}_{T+h|T} = \frac{\theta-1}{\theta} \hat{b}_0
#                      \left[h - 1 + \frac{1}{\hat{\alpha}}
#                      - \frac{(1-\hat{\alpha})^T}{\hat{\alpha}} \right]
#                      + \tilde{X}_{T+h|T}
# $$
#
# Ultimately $\theta$ only plays a role in determining how much the trend
# is damped.  If $\theta$ is very large, then the forecast of the model is
# identical to that from an Integrated Moving Average with a drift,
#
# $$
# X_t = X_{t-1} + b_0 + (\alpha-1)\epsilon_{t-1} + \epsilon_t.
# $$
#
# Finally, the forecasts are reseasonalized if needed.
#
# This module is based on:
#
# * Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a
# decomposition
#   approach to forecasting. International journal of forecasting, 16(4),
# 521-530.
# * Hyndman, R. J., & Billah, B. (2003). Unmasking the Theta method.
#   International Journal of Forecasting, 19(2), 287-290.
# * Fioruci, J. A., Pellegrini, T. R., Louzada, F., & Petropoulos, F.
#   (2015). The optimized theta method. arXiv preprint arXiv:1503.03529.

# ## Imports
#
# We start with the standard set of imports and some tweaks to the default
# matplotlib style.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns

plt.rc("figure", figsize=(16, 8))
plt.rc("font", size=15)
plt.rc("lines", linewidth=3)
sns.set_style("darkgrid")

# ## Load some Data
#
# We will first look at housing starts using US data. This series is
# clearly seasonal but does not have a clear trend during the same.

reader = pdr.fred.FredReader(["HOUST"], start="1980-01-01", end="2020-04-01")
data = reader.read()
housing = data.HOUST
housing.index.freq = housing.index.inferred_freq
ax = housing.plot()

# We fit specify the model without any options and fit it. The summary
# shows that the data was deseasonalized using the multiplicative method.
# The drift is modest and negative, and the smoothing parameter is fairly
# low.

from statsmodels.tsa.forecasting.theta import ThetaModel

tm = ThetaModel(housing)
res = tm.fit()
print(res.summary())

# The model is first and foremost a forecasting method.  Forecasts are
# produced using the `forecast` method from fitted model. Below we produce a
# hedgehog plot by forecasting 2-years ahead every 2 years.
#
# **Note**: the default $\theta$ is 2.

forecasts = {"housing": housing}
for year in range(1995, 2020, 2):
    sub = housing[:str(year)]
    res = ThetaModel(sub).fit()
    fcast = res.forecast(24)
    forecasts[str(year)] = fcast
forecasts = pd.DataFrame(forecasts)
ax = forecasts["1995":].plot(legend=False)
children = ax.get_children()
children[0].set_linewidth(4)
children[0].set_alpha(0.3)
children[0].set_color("#000000")
ax.set_title("Housing Starts")
plt.tight_layout(pad=1.0)

# We could alternatively fit the log of the data.  Here it makes more
# sense to force the deseasonalizing to use the additive method, if needed.
# We also fit the model parameters using MLE.  This method fits the IMA
#
# $$ X_t = X_{t-1} + \gamma\epsilon_{t-1} + \epsilon_t $$
#
# where $\hat{\alpha}$ = $\min(\hat{\gamma}+1, 0.9998)$ using
# `statsmodels.tsa.SARIMAX`. The parameters are similar although the drift
# is closer to zero.

tm = ThetaModel(np.log(housing), method="additive")
res = tm.fit(use_mle=True)
print(res.summary())

# The forecast only depends on the forecast trend component,
# $$
# \hat{b}_0
#                      \left[h - 1 + \frac{1}{\hat{\alpha}}
#                      - \frac{(1-\hat{\alpha})^T}{\hat{\alpha}} \right],
# $$
#
# the forecast from the SES (which does not change with the horizon), and
# the seasonal. These three components are available using the
# `forecast_components`. This allows forecasts to be constructed using
# multiple choices of $\theta$ using the weight expression above.

res.forecast_components(12)

# ## Personal Consumption Expenditure
#
# We next look at personal consumption expenditure. This series has a
# clear seasonal component and a drift.

reader = pdr.fred.FredReader(["NA000349Q"],
                             start="1980-01-01",
                             end="2020-04-01")
pce = reader.read()
pce.columns = ["PCE"]
pce.index.freq = "QS-OCT"
_ = pce.plot()

# Since this series is always positive, we model the $\ln$.

mod = ThetaModel(np.log(pce))
res = mod.fit()
print(res.summary())

# Next we explore differenced in the forecast as $\theta$ changes. When
# $\theta$ is close to 1, the drift is nearly absent.  As $\theta$
# increases, the drift becomes more obvious.

forecasts = pd.DataFrame({
    "ln PCE": np.log(pce.PCE),
    "theta=1.2": res.forecast(12, theta=1.2),
    "theta=2": res.forecast(12),
    "theta=3": res.forecast(12, theta=3),
    "No damping": res.forecast(12, theta=np.inf),
})
_ = forecasts.tail(36).plot()
plt.title("Forecasts of ln PCE")
plt.tight_layout(pad=1.0)

# Finally, `plot_predict` can be used to visualize the predictions and
# prediction intervals which are constructed assuming the IMA is true.

ax = res.plot_predict(24, theta=2)

# We conclude be producing a hedgehog plot using 2-year non-overlapping
# samples.

ln_pce = np.log(pce.PCE)
forecasts = {"ln PCE": ln_pce}
for year in range(1995, 2020, 3):
    sub = ln_pce[:str(year)]
    res = ThetaModel(sub).fit()
    fcast = res.forecast(12)
    forecasts[str(year)] = fcast
forecasts = pd.DataFrame(forecasts)
ax = forecasts["1995":].plot(legend=False)
children = ax.get_children()
children[0].set_linewidth(4)
children[0].set_alpha(0.3)
children[0].set_color("#000000")
ax.set_title("ln PCE")
plt.tight_layout(pad=1.0)