File: arma_estimation.rst

package info (click to toggle)
openturns 1.24-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 66,204 kB
  • sloc: cpp: 256,662; python: 63,381; ansic: 4,414; javascript: 406; sh: 180; xml: 164; yacc: 123; makefile: 98; lex: 55
file content (168 lines) | stat: -rw-r--r-- 6,531 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
.. _arma_estimation:

ARMA process estimation
-----------------------

From the order :math:`(p,q)` or a range of orders :math:`(p,q) \in Ind_p \times Ind_q`,
where :math:`Ind_p = [p_1, p_2]`
and :math:`Ind_q = [q_1, q_2]`, the methods aims to find the *best* model
:math:`ARMA(p,q)` that fits the data and estimate the
corresponding coefficients. The *best* model is considered with
respect to the :math:`AIC_c` criteria (corrected Akaike Information
Criterion), defined by:

.. math::

      AICc = -2\log L_w + 2(p + q + 1)\frac{m}{m - p - q - 2}

where :math:`m` is half the number of points of the time grid of the
process sample (if the data are a process sample) or in a block of the
time series (if the data are a time series).

Two other criteria are computed for each order :math:`(p,q)`:

-  the AIC criterion:

   .. math::

      AIC = -2\log L_w + 2(p + q + 1)

-  and the BIC criterion:

   .. math::

      BIC = -2\log L_w + 2(p + q + 1)\log(p + q + 1)

The *BIC* criterion leads to a model that gives a better prediction;
the *AIC* criterion selects the best model that fits the given data;
the :math:`AIC_c` criterion improves the previous one by penalizing a
too high order that would artificially fit to the data.
For each order :math:`(p,q)`, the estimation of the coefficients
:math:`(a_k)_{1\leq k\leq p}`, :math:`(b_k)_{1\leq k\leq q}` and the
variance :math:`\sigma^2` is done using the Whittle estimator which is
based on the maximization of the likelihood function in the frequency
domain.
The principle is detailed hereafter for the case of a time series: in
the case of a process sample, the estimator is similar except for the
periodogram which is computed differently.
We consider a time series associated to the time grid
:math:`(t_0, \hdots, t_{n-1})` and a particular order :math:`(p,q)`.
The spectral density function of the
:math:`ARMA(p,q)` process writes:

.. math::
  :label: arma_spectrum

    f(\lambda, \vect{\theta}, \sigma^2) = \frac{\sigma^2}{2 \pi} \frac{|1 + b_1 \exp(-i \lambda) + \ldots
    + b_q \exp(-i q \lambda)|^2}{|1 + a_1 \exp(-i \lambda) + \ldots + a_p \exp(-i p \lambda)|^2} =
    \frac{\sigma^2}{2 \pi} g(\lambda, \vect{\theta})

where :math:`\vect{\theta} = (a_1, a_2,...,a_p,b_1,...,b_q)` and
:math:`\lambda` is the frequency value.

The Whittle log-likelihood writes:

.. math::
  :label: arma_likelihood

    \log L_w(\vect{\theta}, \sigma^2) = - \sum_{j=1}^{m} \log f(\lambda_j,  \vect{\theta}, \sigma^2) - \frac{1}{2 \pi} \sum_{j=1}^{m} \frac{I(\lambda_j)}{f(\lambda_j,  \vect{\theta}, \sigma^2)}

where:

-  :math:`I` is the non parametric estimate of the spectral density,
   expressed in the Fourier space (frequencies in :math:`[0,2\pi]`
   instead of :math:`[-f_{max}, f_{max}]`). By default the Welch estimator is used.

-  :math:`\lambda_j` is the :math:`j-th` Fourier frequency,
   :math:`\lambda_j = \frac{2 \pi j}{n}`, :math:`j=1 \ldots m` with
   :math:`m` the largest integer :math:`\leq \frac{n-1}{2}`.

We estimate the :math:`(p+q+1)` scalar coefficients by maximizing the
log-likelihood function. The corresponding equations lead to the
following relation:

.. math::
  :label: optSigma

    \sigma^{2*} = \frac{1}{m} \sum_{j=1}^{m} \frac{I(\lambda_j)}{g(\lambda_j, \vect{\theta}^{*})}

where :math:`\vect{\theta}^{*}` maximizes:

.. math::
  :label: lik2

    \log L_w(\vect{\theta}) = m (\log(2 \pi) - 1) - m \log\left( \frac{1}{m} \sum_{j=1}^{m} \frac{I(\lambda_j)}{g(\lambda_j, \vect{\theta})}\right) - \sum_{j=1}^{m} g(\lambda_j, \vect{\theta})

The Whitle estimation requires that:

-  the determinant of the eigenvalues of the companion matrix
   associated to the polynomial :math:`1 + a_1X + \dots + a_pX^p` are
   outside the unit disc,, which guarantees the stationarity of the
   process;

-  the determinant of the eigenvalues of the companion matrix
   associated to the polynomial :math:`1 + ba_1X + \dots + b_qX^q` are
   outside the unit disc, which guarantees the invertibility of the
   process.

Multivariate estimation
~~~~~~~~~~~~~~~~~~~~~~~

Let :math:`(t_i, \vect{X}_i)_{0\leq i \leq n-1}` be a multivariate
time series of dimension :math:`d` generated by an ARMA process
where :math:`(p,q)` are supposed to
be known. We assume that the white noise :math:`\varepsilon` is
distributed according to the normal distribution with zero mean and
with covariance matrix
:math:`\mat{\Sigma}_{\varepsilon} = \sigma^2\mat{Q}` where
:math:`|\mat{Q}| = 1` .
The normality of the white noise implies the normality of the process.
If we note :math:`\vect{W} = (\vect{X}_0, \hdots, \vect{X}_{n-1})`,
then :math:`\vect{W}` is normal with zero mean. Its covariance matrix
writes
:math:`\mathbb{E}(\vect{W}\vect{W}^t)= \sigma^2 \Sigma_{\vect{W}}`
which depends on the coefficients :math:`(\mat{A}_k, \mat{B}_l)` for
:math:`k = 1,\ldots,p` and :math:`l = 1,\ldots, q` and on the matrix
:math:`\mat{Q}`.

The likelihood of :math:`\vect{W}` writes:

.. math::
  :label: likelihoodMV

    L(\vect{\beta}, \sigma^2 | \vect{W}) = (2 \pi \sigma^2) ^{-\frac{d n}{2}} |\Sigma_{w}|^{-\frac{1}{2}} \exp\left(- (2\sigma^2)^{-1}  \vect{W}^t \Sigma_{\vect{W}}^{-1}  \vect{W} \right)

where
:math:`\vect{\beta} = (\mat{A}_{k}, \mat{B}_{l}, \mat{Q}),\ k = 1,\ldots,p`,
:math:`l = 1,\ldots, q` and where :math:`|.|` denotes the determinant.

The difficulty arises from the great size (:math:`dn \times dn`) of
:math:`\Sigma_{\vect{W}}` which is a dense matrix in the general case.
[mauricio1995]_ proposes an efficient algorithm to evaluate the likelihood
function. The main point is to use a change of variable that leads to a
block-diagonal sparse covariance matrix.

The multivariate Whittle estimation requires that:

-  the determinant of the eigenvalues of the companion matrix
   associated to the polynomial
   :math:`\mat{I} + \mat{A}_1\mat{X} + \dots + \mat{A}_p\mat{X}^p` are
   outside the unit disc, which guarantees the stationarity of the
   process;

-  the determinant of the eigenvalues of the companion matrix
   associated to the polynomial
   :math:`\mat{I} + \mat{B}_1\mat{X} + \dots + \mat{B}_q\mat{X}^q` are
   outside the unit disc, which guarantees the invertibility of the
   process.

.. topic:: API:

    - See :class:`~openturns.WhittleFactory`
    - See :class:`~openturns.WelchFactory`
    - See :class:`~openturns.ARMA`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/estimate_stochastic_processes/plot_estimate_arma`
    - See :doc:`/auto_data_analysis/estimate_stochastic_processes/plot_estimate_multivariate_arma`