\chapter{Nonparametric methods}
\label{chap:nonparam}

The main focus of gretl is on parametric estimation, but we offer a
selection of nonparametric methods. The most basic of these are:
%
\begin{itemize}
\item various tests for difference in distribution (Sign test,
  Wilcoxon rank-sum test, Wilcoxon signed-rank test);
\item the Runs test for randomness; and
\item nonparametric measures of association: Spearman's rho and
  Kendall's tau.
\end{itemize}

Details on the above can be found by consulting the help for the
commands \cmd{difftest}, \cmd{runs}, \cmd{corr} and \cmd{spearman}.
In the GUI program these items can be found under the \textsf{Tools}
menu, and under the \textsf{Robust estimation} item of the
\textsf{Model} menu.

In this chapter we concentrate on two relatively complex methods for
nonparametric curve-fitting and prediction, namely William
Cleveland's ``loess'' (also known as ``lowess'') and the
Nadaraya--Watson estimator.

\section{Locally weighted regression (loess)}
\label{sec:loess}

Loess \citep{cleveland79} is a nonparametric smoother employing
locally weighted polynomial regression. It is intended to yield an
approximation to $g(\cdot)$ when the dependent variable, $y$, can be
expressed as
\[
y_i = g(x_i) + \epsilon_i
\]
for some smooth function $g(\cdot)$.

Given a sample of $n$ observations on the variables $y$ and $x$, the
procedure is to run a weighted least squares regression (a polynomial
of order $d = 0$, 1 or 2 in $x$) localized to each data point, $i$. In
each such regression the sample consists of the $r$ nearest neighbors
(in the $x$ dimension) to the point $i$, with weights that are
inversely related to the distance $|x_i - x_k|$, $k=1,\dots,r$. The
predicted value $\hat{y}_i$ is then obtained by evaluating the
estimated polynomial at $x_i$. The most commonly used order is $d=1$.

A bandwidth parameter $0 < q \leq 1$ controls the proportion of the
total number of data points used in each regression; thus $r = qn$
(rounded up to an integer). Larger values of $q$ lead to a smoother
fitted series, smaller values to a series that tracks the actual data
more closely; $0.25 \leq q \leq 0.5$ is often a suitable range.

In gretl's implementation of loess the weighting scheme is
that given by Cleveland, namely,
\[
w_k(x_i) = W(h_i^{-1}(x_k - x_i))
\]
where $h_i$ is the distance between $x_i$ and its $r^{\rm th}$ nearest
neighbor, and $W(\cdot)$ is the tricube function,
\[
W(x) = \begin{cases}
 (1 - |x|^3)^3 & \mbox{for } |x| < 1 \\
  0            & \mbox{for } |x| \geq 1
\end{cases}
\]
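
Gretl applies these weights internally, but to make the formula
concrete, here is an illustrative hansl rendering (the function name
\texttt{tricube} is ours, not part of gretl):
\begin{code}
# illustrative only: the tricube weight function
function series tricube (series u)
    # the comparison yields a 0/1 series, so points with
    # |u| >= 1 receive zero weight
    return (abs(u) < 1) * (1 - abs(u)^3)^3
end function
\end{code}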

The local regression can be made robust via an adjustment based on the
residuals, $e_i = y_i - \hat{y}_i$. Robustness weights, $\delta_k$, are
defined by
\[
\delta_k = B(e_k / 6s)
\]
where $s$ is the median of the $|e_i|$ and $B(\cdot)$ is the
bisquare function,
\[
B(x) = \begin{cases}
 (1 - x^2)^2 & \mbox{for } |x| < 1 \\
  0          & \mbox{for } |x| \geq 1
\end{cases}
\]
The polynomial regression is then re-run using weight
$\delta_kw_k(x_i)$ at $(x_k, y_k)$.
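
Again purely for illustration, given the fitted series \texttt{yh},
the robustness weights could be computed in hansl as follows (the
variable names are ours):
\begin{code}
# illustrative only: bisquare robustness weights
series e = y - yh
scalar s = median(abs(e))    # median absolute residual
series u = e / (6*s)
series delta = (abs(u) < 1) * (1 - u^2)^2
\end{code}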

The \texttt{loess()} function in gretl takes up to five
arguments as follows: the $y$ series, the $x$ series, the order $d$,
the bandwidth $q$, and a Boolean switch to turn on the robust
adjustment. The last three arguments are optional: if they are omitted
the default values are $d=1$, $q=0.5$ and no robust adjustment. An
example of a full call to \texttt{loess()} is shown below; in this
case a quadratic in $x$ is specified, three quarters of the data
points will be used in each local regression, and robustness is turned
on:
\begin{code}
series yh = loess(y, x, 2, 0.75, 1)
\end{code}
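Conversely, the simplest acceptable call relies entirely on the
defaults noted above:
\begin{code}
series yh0 = loess(y, x)  # same as loess(y, x, 1, 0.5, 0)
\end{code}
Re-running such calls while varying the bandwidth argument between,
say, 0.25 and 0.75 is a simple way to see the smoothing effect of $q$
described earlier.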

An illustration of loess is provided in Listing~\ref{scr:loess-sine}:
we generate a series that has a deterministic sine wave component
overlaid with noise uniformly distributed on $(-1,1)$. Loess is then
used to retrieve a good approximation to the sine function.  The
resulting graph is shown in Figure~\ref{fig:loess-sine}.

\begin{script}[htbp]
  \caption{Loess script}
  \label{scr:loess-sine}
\begin{scode}
nulldata 120
series x = index
scalar n = $nobs
series y = sin(2*pi*x/n) + uniform(-1, 1)
series yh = loess(y, x, 2, 0.75, 0)
gnuplot y yh x --output=display --with-lines=yh
\end{scode}
\end{script}

\begin{figure}[htbp]
  \centering
  \includegraphics[scale=0.75]{figures/loess-sine}
  \caption{Loess: retrieving a sine wave}
  \label{fig:loess-sine}
\end{figure}

\section{The Nadaraya--Watson estimator}
\label{sec:nadarwat}

The Nadaraya--Watson nonparametric estimator \citep{nadaraya64,
  watson64} is an estimator for the conditional mean of a variable
$Y$, available in a sample of size $n$, for a given value of a
conditioning variable $X$, and is defined as
\[
  m(X) = \frac{ \sum_{j=1}^{n} y_j \cdot K_h(X - x_j)} {\sum_{j=1}^{n} K_h(X - x_j)}
\]
where $K_h(\cdot)$ is the so-called \emph{kernel function}, which is
usually some simple transform of a density function that depends on a
scalar called the \emph{bandwidth}. The one gretl uses is given
by
\[
  K_h(x) = \exp\left(-\frac{x^2}{2h}\right)
\]
for $|x| < \tau$ and zero otherwise. The scalar $\tau$, known as the
trim parameter, is used to prevent numerical problems when the kernel
function is evaluated too far from zero.
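
For concreteness, an illustrative hansl rendering of this kernel
might look as follows (the function name \texttt{Kh} is ours; the
trim parameter $\tau$ is discussed further below):
\begin{code}
# illustrative only: the trimmed Gaussian-type kernel
function series Kh (series x, scalar h, scalar tau)
    # zero outside the trimming interval |x| < tau
    return (abs(x) < tau) * exp(-x^2 / (2*h))
end function
\end{code}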

\begin{script}[htbp]
  \caption{Nadaraya--Watson example}
  \label{scr:nadarwat-ex}
\begin{scode}
# Nonparametric regression example: husband's age on wife's age
open mroz87.gdt

# initial value for the bandwidth
scalar h = $nobs^(-0.2)
# three increasingly smooth estimates
series m0 = nadarwat(HA, WA, h)
series m1 = nadarwat(HA, WA, h * 5)
series m2 = nadarwat(HA, WA, h * 10)

# produce the graph
dataset sortby WA
gnuplot HA m0 m1 m2 WA --output=display --with-lines=m0,m1,m2
\end{scode}
%$
\end{script}

\begin{figure}[htbp]
  \centering
  \includegraphics[scale=0.5]{figures/nadarwat-ex}
  \caption{Nadaraya--Watson example for several choices of the bandwidth parameter}
  \label{fig:nadarwat-ex}
\end{figure}

Listing~\ref{scr:nadarwat-ex} produces the graph shown in
Figure~\ref{fig:nadarwat-ex} (after some slight editing).

The choice of the bandwidth is up to the user: larger values of $h$
lead to a smoother $m(\cdot)$ function; smaller values make the
$m(\cdot)$ function follow the $y_i$ values more closely, so that the
function appears more ``jagged''. In fact, as $h \to \infty$, $m(x_i)
\to \bar{Y}$; conversely, as $h \to 0$, observations for which
$x_i \ne X$ are not taken into account at all when computing $m(X)$.
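
As a quick empirical check of the large-bandwidth limit, one might
compare a fit with a very large $h$ against the sample mean, reusing
the variables of Listing~\ref{scr:nadarwat-ex} (the value 100 is an
arbitrary ``large'' choice):
\begin{code}
series mbig = nadarwat(HA, WA, 100)
printf "max|m - mean| = %g\n", max(abs(mbig - mean(HA)))
\end{code}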

Moreover, the statistical properties of $m(\cdot)$ vary with $h$: its
variance is decreasing in $h$, while its squared bias is increasing in
$h$. It can be shown that choosing $h \sim n^{-1/5}$ minimizes the
RMSE, so that value is customarily taken as a reference point.
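For instance, with the $n = 753$ observations of the \texttt{mroz87}
dataset used in Listing~\ref{scr:nadarwat-ex}, this rule of thumb
gives $h = 753^{-1/5} \approx 0.266$, which is what the first line of
that script computes.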

Note that the kernel function has its tails ``trimmed''. The scalar
$\tau$, which controls the level at which trimming occurs, is set by
default to $4 \cdot h$; this setting, however, may be changed via the
\cmd{set} command. For example,
\begin{code}
  set nadarwat_trim 10
\end{code}
sets $\tau = 10 \cdot h$. This may at times produce more sensible
results in regions of $X$ with sparse support; however, you should be
aware that in those same cases machine precision (division by
numerical zero) may render your results spurious. The default is
relatively safe, but experimenting with larger values may be a sensible
strategy in some cases.

A common variant of the Nadaraya--Watson estimator is the so-called
``leave-one-out'' estimator, which omits the $i$-th observation when
evaluating $m(x_i)$. This makes the estimator more robust numerically
and its use is often advised for inference purposes. In formulae, the
leave-one-out estimator is
\[
m(x_i) = \frac{ \sum_{j \ne i} y_j \cdot K_h(x_i -
  x_j)} {\sum_{j \ne i} K_h(x_i - x_j)}
\]
To have gretl compute the leave-one-out estimator, just reverse the
sign of $h$: if we modified Listing~\ref{scr:nadarwat-ex} by
replacing
\begin{code}
  scalar h = $nobs^(-0.2)
\end{code}
with
\begin{code}
  scalar h = -($nobs^(-0.2))
\end{code}
the rest of the script could stay unchanged, the only difference
being the use of the leave-one-out estimator.

Although $X$ can in principle be any value, in typical usage of this
estimator one computes $m(X)$ for $X$ equal to one or more values
actually observed in the sample, that is, $m(x_i)$. If you need a
point estimate of $m(X)$ for some value of $X$ not present among the
valid observations of your dependent variable, you can add some
``fake'' observations to your dataset in which $y$ is missing and $x$
contains the values at which you want $m(x)$ evaluated. For example,
the following script evaluates $m(x)$ at regular intervals between
$-2.0$ and $2.0$:

\begin{code}
nulldata 120
set seed 120496

# first part of the sample: actual data
smpl 1 100
x = normal()
y = x^2 + sin(x) + normal()

# second part of the sample: fake x data
smpl 101 120
x = (obs-110) / 5

# compute the Nadaraya-Watson estimate
# with bandwidth equal to 0.4 (note that
# 100^(-0.2) = 0.398)
smpl full
m = nadarwat(y, x, 0.4)

# show m(x) for the fake x values only
smpl 101 120
print x m -o
\end{code}

and running it produces
\begin{code}
               x            m

101         -1.8     1.165934
102         -1.6     0.730221
103         -1.4     0.314705
104         -1.2     0.026057
105         -1.0    -0.131999
106         -0.8    -0.215445
107         -0.6    -0.269257
108         -0.4    -0.304451
109         -0.2    -0.306448
110          0.0    -0.238766
111          0.2    -0.038837
112          0.4     0.354660
113          0.6     0.908178
114          0.8     1.485178
115          1.0     2.000003
116          1.2     2.460100
117          1.4     2.905176
118          1.6     3.380874
119          1.8     3.927682
120          2.0     4.538364
\end{code}
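
As a rough sanity check, the true conditional mean at $x = 2.0$ is
$2^2 + \sin(2) \approx 4.909$, somewhat above the estimate 4.538; a
discrepancy of this sort is to be expected near the edge of the
support, where the kernel average draws only on points to one side.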

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: