\chapter{Multivariate models}
\label{chap:system}
By a multivariate model we mean one that includes more than one
dependent variable. Certain specific types of multivariate model for
time-series data are discussed elsewhere: chapter~\ref{chap:var} deals
with VARs and chapter~\ref{chap:vecm} with VECMs. Here we discuss two
general sorts of multivariate model, implemented in gretl via
the \cmd{system} command: SUR systems (Seemingly Unrelated
Regressions), in which all the regressors are taken to be exogenous
and interest centers on the covariance of the error term across
equations; and simultaneous systems, in which some regressors are
assumed to be endogenous.
In this chapter we give an account of the syntax and use of the
\texttt{system} command and its companions, \texttt{restrict}
and \texttt{estimate}; we also explain the options and accessors
available in connection with multivariate models.
\section{The system command}
\label{sec:sys-command}
The specification of a multivariate system takes the form of a block
of statements, starting with \texttt{system} and ending with
\texttt{end system}. Once a system is specified it can be estimated via
various methods, using the \texttt{estimate} command, with or without
restrictions, which may be imposed via the \texttt{restrict} command.
\subsection{Starting a system block}
The first line of a \texttt{system} block may be augmented in either
(or both) of two ways:
\begin{itemize}
\item An estimation method is specified for the system. This is done
by following \texttt{system} with an expression of the form
\texttt{method=}\textsl{estimator}, where \textsl{estimator} must be
one of \texttt{ols} (Ordinary Least Squares), \texttt{tsls}
(Two-Stage Least Squares), \texttt{sur} (Seemingly Unrelated
Regressions), \texttt{3sls} (Three-Stage Least Squares),
\texttt{liml} (Limited Information Maximum Likelihood) or
\texttt{fiml} (Full Information Maximum Likelihood). Two examples:
\begin{code}
system method=sur
system method=fiml
\end{code}
OLS, TSLS and LIML are, of course, single-equation methods rather than
true system estimators; they are included to facilitate comparisons.
\item The system is assigned a name. This is done by giving the name
first, followed by a back-arrow, ``\verb|<-|'', followed by
\texttt{system}. If the name contains spaces it must be enclosed in
double-quotes. Here are two examples:
\begin{code}
sys1 <- system
"System 1" <- system
\end{code}
Note, however, that this naming method is not available within a
user-defined function, only in the main body of a gretl script.
\end{itemize}
If the initial \texttt{system} line is augmented in the first way, the
effect is that the system is estimated as soon as its definition is
completed, using the specified method. The effect of the second option
is that the system can then be referenced by the assigned name for the
purposes of the \texttt{restrict} and \texttt{estimate} commands; in
the gretl GUI an additional effect is that an icon for the
system is added to the ``Session view''.
These two possibilities can be combined, as in
\begin{code}
mysys <- system method=3sls
\end{code}
In this example the system is estimated immediately via Three-Stage
Least Squares, and is also available for subsequent use under the
name \texttt{mysys}.
If the system is not named via the back-arrow mechanism, it is still
available for subsequent use via \texttt{restrict} and
\texttt{estimate}; in this case you should use the generic name
\verb|$system| to refer to the last-defined multivariate system.
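
The following sketch illustrates the naming mechanism. It assumes
artificial data created via \texttt{nulldata}; the series names and the
system name \texttt{sys1} are hypothetical, chosen purely for
illustration. The \texttt{equation} lines are explained in the next
subsection.
\begin{code}
nulldata 100
set seed 20240
series x1 = normal()
series x2 = normal()
series y1 = 1 + 0.5*x1 + normal()
series y2 = 2 - 0.3*x2 + normal()

# name the system; no method is given, so nothing is estimated yet
sys1 <- system
    equation y1 const x1
    equation y2 const x2
end system

# estimate the named system by two methods
estimate sys1 method=ols
estimate sys1 method=sur

# the last-defined system can also be reached via the generic name
estimate $system method=sur
\end{code}
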
\subsection{The body of a system block}
The most basic element in the body of a \texttt{system} block is the
\texttt{equation} statement, which is used to specify each equation
within the system. This takes the same form as the regression
specification for single-equation estimators, namely a list of series
with the dependent variable given first, followed by the regressors,
with the series given either by name or by ID number (order in the
dataset). A system block must contain at least two \texttt{equation}
statements, and for systems without endogenous regressors these
statements are all that is required. So, for example, a minimal
SUR specification might look like this:
\begin{code}
system method=sur
equation y1 const x1
equation y2 const x2
end system
\end{code}
For simultaneous systems it is necessary to determine which regressors
are endogenous and which exogenous. By default all regressors are
treated as exogenous, except that any variable that appears as the
dependent variable in one equation is automatically treated as
endogenous if it appears as a regressor elsewhere. However, an explicit
list of endogenous regressors may be supplied following the
\texttt{equation} lines: this takes the form of the keyword
\texttt{endog} followed by the names or ID numbers of the relevant
regressors.
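
As a minimal sketch of the \texttt{endog} keyword, the script below
simulates a hypothetical two-equation simultaneous system and estimates
it via TSLS. The data, coefficient values and series names are all
assumptions made for illustration. Here the \texttt{endog} line simply
states explicitly what gretl would infer by default, since \texttt{y1}
and \texttt{y2} each appear as a dependent variable.
\begin{code}
nulldata 200
set seed 4711
series x1 = normal()
series x2 = normal()
series u1 = normal()
series u2 = normal()

# structural equations:
#   y1 = 1 + 0.5*y2 + x1 + u1
#   y2 = 2 - 0.4*y1 + x2 + u2
# solve for y1 in terms of exogenous terms to generate the data
scalar d = 1 - 0.5*(-0.4)
series y1 = (1 + 0.5*2 + x1 + 0.5*x2 + u1 + 0.5*u2) / d
series y2 = 2 - 0.4*y1 + x2 + u2

system method=tsls
    equation y1 const y2 x1
    equation y2 const y1 x2
    endog y1 y2
end system
\end{code}
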
When estimation is via TSLS or 3SLS it is possible to specify a
particular set of instruments for each equation. This is done by
giving the \texttt{equation} lists in the format used with the
\cmd{tsls} command: first the dependent variable, then the
regressors, then a semicolon, then the instruments, as in
\begin{code}
system method=3sls
equation y1 const x11 x12 ; const x11 z1
equation y2 const x21 x22 ; const x21 z2
end system
\end{code}
An alternative way of specifying instruments is to insert an extra
line starting with \texttt{instr}, followed by the list of
variables acting as instruments. This is especially useful when
specifying the system with the \texttt{equations} keyword; see the
following subsection.
As in \cmd{tsls}, any regressors that are not also
listed as instruments are treated as endogenous, so in the example
above \texttt{x11} and \texttt{x21} are treated as exogenous while
\texttt{x12} and \texttt{x22} are endogenous, and instrumented by
\texttt{z1} and \texttt{z2} respectively.
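
The \texttt{instr} alternative might be sketched as follows, using
artificial data. All series names and coefficient values are
hypothetical. Note one difference from the per-equation format: with a
single \texttt{instr} line the same instrument list is shared by both
equations.
\begin{code}
nulldata 200
set seed 1234
series x11 = normal()
series x21 = normal()
series z1 = normal()
series z2 = normal()
series u1 = normal()
series u2 = normal()
# x12 and x22 are correlated with the errors, hence endogenous
series x12 = z1 + 0.5*u1 + normal()
series x22 = z2 + 0.5*u2 + normal()
series y1 = 1 + x11 + x12 + u1
series y2 = 1 + x21 + x22 + u2

system method=3sls
    equation y1 const x11 x12
    equation y2 const x21 x22
    # regressors not listed here (x12, x22) are treated as endogenous
    instr const x11 x21 z1 z2
end system
\end{code}
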
One more sort of statement is allowed in a \cmd{system} block: that
is, the keyword \texttt{identity} followed by an equation that defines
an accounting relationship, rather than a stochastic one, between
variables. For example,
\begin{code}
identity Y = C + I + G + X
\end{code}
There can be more than one \texttt{identity} in a system block. But
note that these statements are specific to estimation via FIML; they
are ignored for other estimators.
\subsection{Equation systems within functions}
It is also possible to define a multivariate system in a programmatic
way. This is useful if the precise specification of the system depends
on some input parameters that are not known in advance, but are given
when the script is actually run.\footnote{This feature was added in
version 1.9.7 of gretl.}
The relevant syntax is given by the \texttt{equations} keyword (note
the plural), which replaces the block of \texttt{equation} lines in
the standard form. An \texttt{equations} line requires two list
arguments. The first list must contain all series on the left-hand
side of the system; thus the number of elements in this first list
determines the number of equations in the system. The second list is a
``list of lists'', which is a special variant of the list data type.
That is, for each equation of the system you must provide a list of
right-hand side variables, and the lists for all equations must be
joined by assigning them to another list object; in that assignment,
they must be separated by a semicolon. Here is an example for a
two-equation system:
\begin{code}
list syslist = xlist1 ; xlist2
\end{code}
Therefore, specifying a system generically in this way just involves
building the necessary list arguments, as shown in the following
example:
\begin{code}
open denmark
list LHS = LRM LRY
list RHS1 = const LRM(-1) IBO(-1) IDE(-1)
list RHS2 = const LRY(-1) IBO(-1)
list RHS = RHS1 ; RHS2
system method=ols
equations LHS RHS
end system
\end{code}
As mentioned above, the option of assigning a specific name to a
system is not available within functions, but the generic identifier
\verb|$system| can be used to similar effect. The following example
shows how one can define a system, estimate it via two methods, apply
a restriction, then re-estimate it subject to the restriction.
\begin{code}
function void anonsys(series x, series y)
system
equation x const
equation y const
end system
estimate $system method=ols
estimate $system method=sur
restrict $system
b[1,1] - b[2,1] = 0
end restrict
estimate $system method=ols
end function
\end{code}
\section{Restriction and estimation}
\label{sec:sys-est}
The behavior of the \texttt{restrict} command is a little different
for multivariate systems as compared with single-equation models.
In the single-equation case, \texttt{restrict} refers to the
last-estimated model, and once the command is completed the
restriction is tested. In the multivariate case, you must give the
name of the system to which the restriction is to be applied (or
\verb|$system| to refer to the last-defined system), and the effect of
the command is just to attach the restriction to the system; testing
is not done until the next \texttt{estimate} command is given. In
addition, in the system case the default is to produce full estimates
of the restricted model; if you are not interested in the full
estimates and just want the test statistic you can append the
\verb|--quiet| option to \texttt{estimate}.
A given system restriction remains in force until it is replaced or
removed. To return a system to its unrestricted state you can give
an empty restrict block, as in
\begin{code}
restrict sysname
end restrict
\end{code}
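
The sequence of attaching, testing and removing a restriction can be
sketched as follows. The data are artificial, and the system name
\texttt{sys1}, the series names and the particular restriction (equal
slope coefficients across the two equations) are assumptions made for
illustration.
\begin{code}
nulldata 100
set seed 99
series x1 = normal()
series x2 = normal()
series y1 = 1 + 0.5*x1 + normal()
series y2 = 2 + 0.5*x2 + normal()

sys1 <- system
    equation y1 const x1
    equation y2 const x2
end system

# unrestricted estimates
estimate sys1 method=sur

# attach the restriction; nothing is tested at this point
restrict sys1
    b[1,2] - b[2,2] = 0
end restrict

# the test is carried out here; --quiet suppresses the full estimates
estimate sys1 method=sur --quiet

# an empty block returns the system to its unrestricted state
restrict sys1
end restrict
estimate sys1 method=sur
\end{code}
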
As illustrated above, you can use the \texttt{method} tag to specify
an estimation method with the \texttt{estimate} command. If the system
has already been estimated you can omit this tag and the previous
method is used again.
The \texttt{estimate} command is the main locus for options regarding
the details of estimation. The available options are as follows:
\begin{itemize}
\item If the estimation method is SUR or 3SLS and the \verb|--iterate|
flag is given, the estimator will be iterated. In the case of SUR,
if the procedure converges the results are maximum likelihood
estimates. Iteration of three-stage least squares, however, does not
in general converge on the full-information maximum likelihood
results. This flag is ignored for other estimators.
\item If the equation-by-equation estimators OLS or TSLS are chosen,
the default is to apply a degrees of freedom correction when
calculating standard errors. This can be suppressed using the
  \verb|--no-df-corr| flag. This flag has no effect with the other
  estimators, for which no degrees of freedom correction is applied
  in any case.
\item By default, the formula used in calculating the elements of the
cross-equation covariance matrix is
\[
\hat{\sigma}_{ij} = \frac{\hat{u}'_i \hat{u}_j}{T}
\]
where $T$ is the sample size and $\hat{u}_i$ is the vector of
residuals from equation $i$. But if the \verb|--geomean| flag is
given, a degrees of freedom correction is applied: the formula is
\[
\hat{\sigma}_{ij} = \frac{\hat{u}'_i \hat{u}_j}{\sqrt{(T-k_i)(T-k_j)}}
\]
where $k_i$ denotes the number of independent parameters in equation
$i$.
\item If an iterative method is specified, the \verb|--verbose| option
calls for printing of the details of the iterations.
\item When the system estimator is SUR or 3SLS the cross-equation
covariance matrix is initially estimated via OLS or TSLS,
respectively. In the case of a system subject to restrictions the
question arises: should the initial single-equation estimator be
restricted or unrestricted? The default is the former, but the
\verb|--unrestrict-init| flag can be used to select unrestricted
initialization. (Note that this is unlikely to make much difference
if the \verb|--iterate| option is given.)
\end{itemize}
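
To see the effect of the \verb|--geomean| correction, consider a
purely illustrative calculation (the numbers are made up): suppose
$T = 50$, $k_i = 3$, $k_j = 5$ and $\hat{u}'_i \hat{u}_j = 92$. The
default formula gives
\[
\hat{\sigma}_{ij} = \frac{92}{50} = 1.84
\]
while with \verb|--geomean| we get
\[
\hat{\sigma}_{ij} = \frac{92}{\sqrt{(50-3)(50-5)}}
  = \frac{92}{\sqrt{2115}} \approx 2.00
\]
For $i = j$ the corrected denominator reduces to $T - k_i$, the usual
degrees of freedom adjustment for a single-equation error variance.
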
\section{System accessors}
\label{sec:sys-access}
After system estimation various matrices may be retrieved for further
analysis. Let $g$ denote the number of equations in the system and
let $K$ denote the total number of estimated parameters ($K = \sum_i
k_i$). The accessors \verb|$uhat| and \verb|$yhat| get $T \times g$
matrices holding the residuals and fitted values respectively. The
accessor \verb|$coeff| gets the stacked $K$-vector of parameter
estimates; \verb|$vcv| gets the $K \times K$ variance matrix of the
parameter estimates; and \verb|$sigma| gets the $g \times g$
cross-equation covariance matrix, $\hat{\Sigma}$.
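
As a minimal sketch, the following script estimates a hypothetical
two-equation SUR system on artificial data and prints the dimensions of
the matrices just described. The series names and the matrix names
\texttt{U}, \texttt{F}, \texttt{b}, \texttt{V} and \texttt{S} are
assumptions made for illustration. With $T=100$, $g=2$ and $K=4$ the
dimensions reported should agree with the description above.
\begin{code}
nulldata 100
set seed 7
series x1 = normal()
series x2 = normal()
series y1 = 1 + 0.5*x1 + normal()
series y2 = 2 - 0.3*x2 + normal()

system method=sur
    equation y1 const x1
    equation y2 const x2
end system

matrix U = $uhat    # residuals, T x g
matrix F = $yhat    # fitted values, T x g
matrix b = $coeff   # stacked parameter estimates, K elements
matrix V = $vcv     # variance matrix of the estimates, K x K
matrix S = $sigma   # cross-equation covariance matrix, g x g

printf "uhat:  %d x %d\n", rows(U), cols(U)
printf "yhat:  %d x %d\n", rows(F), cols(F)
printf "coeff: %d x %d\n", rows(b), cols(b)
printf "vcv:   %d x %d\n", rows(V), cols(V)
printf "sigma: %d x %d\n", rows(S), cols(S)
\end{code}
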
A test statistic for the hypothesis that $\Sigma$ is diagonal can be
retrieved as \verb|$diagtest| and its p-value as
\verb|$diagpval|. This is the Breusch--Pagan test except when the
estimator is (unrestricted) iterated SUR, in which case it is a
Likelihood Ratio test. The Breusch--Pagan test is computed as
\[
\mbox{LM} = T \sum_{i=2}^g \sum_{j=1}^{i-1} r^2_{ij}
\]
where $r_{ij} = \hat{\sigma}_{ij} /
\sqrt{\hat{\sigma}_{ii}\hat{\sigma}_{jj}}$; the LR test is
\[
\mbox{LR} = T \left(\sum_{i=1}^g \log \hat{\sigma}^2_i -\log
|\hat{\Sigma}| \right)
\]
where $\hat{\sigma}^2_i$ is $\hat{u}'_i \hat{u}_i / T$ from the
individual OLS regressions. In both cases the test statistic is
distributed asymptotically as $\chi^2$ with $g(g-1)/2$ degrees of
freedom.
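
As an illustration of the Breusch--Pagan calculation, with made-up
numbers, take $g = 2$, $T = 100$, $\hat{\sigma}_{11} = 1.0$,
$\hat{\sigma}_{22} = 0.9$ and $\hat{\sigma}_{12} = 0.3$. Then
\[
r^2_{21} = \frac{0.3^2}{1.0 \times 0.9} = 0.1, \qquad
\mbox{LM} = 100 \times 0.1 = 10
\]
With $g(g-1)/2 = 1$ degree of freedom, this exceeds the 5 percent
critical value of the $\chi^2(1)$ distribution (3.84), so in this
hypothetical case the null of a diagonal $\Sigma$ would be rejected.
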
\subsection{Structural and reduced forms}
Systems of simultaneous equations can be represented in structural form
as
\[
\Gamma y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p}
+ B x_t + \epsilon_t
\]
where $y_t$ represents the vector of endogenous variables in period
$t$, $x_t$ denotes the vector of exogenous variables, and $p$ is the
maximum lag of the endogenous regressors. The structural-form
matrices can be retrieved as \verb|$sysGamma|, \verb|$sysA| and
\verb|$sysB| respectively. If $y_t$ is $m \times 1$ and $x_t$ is $n
\times 1$, then $\Gamma$ is $m \times m$ and $B$ is $m \times n$. If
the system contains no lags of the endogenous variables then the $A$
matrix is not defined, otherwise $A$ is the horizontal concatenation
of $A_1,\dots,A_p$, and is therefore $m \times mp$.
% $
From the structural form it is straightforward to obtain the reduced
form, namely,
\[
y_t = \Gamma^{-1} \left(\sum_{i=1}^p A_i y_{t-i}\right)
+ \Gamma^{-1} B x_t + v_t
\]
where $v_t \equiv \Gamma^{-1}\epsilon_t$. The reduced form is used by
gretl to generate forecasts in response to the \texttt{fcast}
command. This means that---in contrast to single-equation
estimation---the values produced via \texttt{fcast} for a static,
within-sample forecast will in general differ from the fitted values
retrieved via \verb|$yhat|. The fitted values for equation $i$
represent the expectation of $y_{ti}$ conditional on the
contemporaneous values of all the regressors, while the \texttt{fcast}
values are conditional on the exogenous and predetermined variables
only.
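
To make the distinction concrete, consider a hypothetical two-equation
system without lags (the coefficients $\gamma$, $\beta_1$ and $\beta_2$
are introduced here purely for illustration):
\[
y_{t1} = \gamma y_{t2} + \beta_1 x_{t1} + \epsilon_{t1}, \qquad
y_{t2} = \beta_2 x_{t2} + \epsilon_{t2}
\]
In the notation above we have
\[
\Gamma = \left[\begin{array}{cc} 1 & -\gamma \\ 0 & 1 \end{array}\right],
\quad
B = \left[\begin{array}{cc} \beta_1 & 0 \\ 0 & \beta_2 \end{array}\right],
\quad
\Gamma^{-1} B =
\left[\begin{array}{cc} \beta_1 & \gamma\beta_2 \\
  0 & \beta_2 \end{array}\right]
\]
so the reduced form for the first variable is
\[
y_{t1} = \beta_1 x_{t1} + \gamma \beta_2 x_{t2} + v_{t1}, \qquad
v_{t1} = \epsilon_{t1} + \gamma \epsilon_{t2}
\]
The fitted value for the first equation is
$\hat{\gamma} y_{t2} + \hat{\beta}_1 x_{t1}$, which uses the observed
value of $y_{t2}$, whereas the corresponding forecast is
$\hat{\beta}_1 x_{t1} + \hat{\gamma} \hat{\beta}_2 x_{t2}$, in which
$y_{t2}$ is replaced by its own prediction. The two differ by
$\hat{\gamma}$ times the residual from the second equation, so they
coincide only if that residual (or $\hat{\gamma}$) is zero.
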
The above account has to be qualified for the case where a system is
set up for estimation via TSLS or 3SLS using a specific list of
instruments per equation, as described in
section~\ref{sec:sys-command}. In that case it is possible to include
more endogenous regressors than explicit equations (although, of
course, there must be sufficient instruments to achieve
identification). In such systems endogenous regressors that have no
associated explicit equation are treated ``as if'' exogenous when
constructing the structural-form matrices. This means that forecasts
are conditional on the observed values of the ``extra'' endogenous
regressors rather than solely on the values of the exogenous and
predetermined variables.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: