\chapter{Graphs and plots}
\label{chap:graphs}

\section{Gnuplot graphs}
\label{gnuplot-graphs}

A separate program, \app{gnuplot}, is called to generate graphs.
Gnuplot is a very full-featured graphing program with myriad options.
It is available from \href{http://www.gnuplot.info/}{www.gnuplot.info}
(but note that a suitable copy of gnuplot is bundled with the packaged
versions of gretl for MS Windows and Mac OS X).  Gretl gives you
direct access, via a graphical interface, to a subset of gnuplot's
options and it tries to choose sensible values for you; it also allows
you to take complete control over graph details if you wish.

With a graph displayed, you can click on the graph window for a pop-up
menu with the following options.

\begin{itemize}
\item \textsf{Save as PNG}: Save the graph in Portable Network
  Graphics format (the same format that you see on screen).
\item \textsf{Save as postscript}: Save in encapsulated postscript
  (EPS) format.
\item \textsf{Save as Windows metafile}: Save in Enhanced Metafile
  (EMF) format.
\item \textsf{Save to session as icon}: The graph will appear in
  iconic form when you select ``Icon view'' from the View menu.
\item \textsf{Zoom}: Lets you select an area within the graph for
  closer inspection (not available for all graphs).
\item \textsf{Print}: (Current GTK or MS Windows only) lets you
  print the graph directly.
\item \textsf{Copy to clipboard}: MS Windows only, lets you paste the
  graph into Windows applications such as MS Word.
\item \textsf{Edit}: Opens a controller for the plot which lets you
  adjust many aspects of its appearance.
\item \textsf{Close}: Closes the graph window.
\end{itemize}


\subsection{Displaying data labels}
\label{plot-labels}

For simple X-Y scatter plots, some further options are available if
the dataset includes ``case markers'' (that is, labels identifying
each observation).\footnote{For an example of such a dataset, see the
  Ramanathan file \verb+data4-10+: this contains data on private
  school enrollment for the 50 states of the USA plus Washington, DC;
  the case markers are the two-letter codes for the states.} With a
scatter plot displayed, when you move the mouse pointer over a data
point its label is shown on the graph.  By default these labels are
transient: they do not appear in the printed or copied version of the
graph.  They can be removed by selecting ``Clear data labels'' from
the graph pop-up menu. If you want the labels to be affixed
permanently (so they will show up when the graph is printed or
copied), select the option ``Freeze data labels'' from the pop-up
menu; ``Clear data labels'' cancels this operation.  The other
label-related option, ``All data labels'', requests that case markers
be shown for all observations.  At present the display of case markers
is disabled for graphs containing more than 250 data points.


\subsection{GUI plot editor}
\label{plot-editor}

Selecting the \textsf{Edit} option in the graph pop-up menu opens
an editing dialog box, shown in Figure~\ref{fig-plot}.  Notice that
there are several tabs, allowing you to adjust many aspects of
a graph's appearance: font, title, axis scaling, line colors
and types, and so on.  You can also add lines or descriptive
labels to a graph (under the Lines and Labels tabs).  The
``Apply'' button applies your changes without closing the
editor; ``OK'' applies the changes and closes the dialog.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics[scale=0.6]{figures/plot_control}
  \end{center}
  \caption{gretl's gnuplot controller}
  \label{fig-plot}
\end{figure}


\subsection{Publication-quality graphics: advanced options}
\label{plot-advanced}

The GUI plot editor has two limitations.  First, it cannot represent
all the myriad options that \app{gnuplot} offers. Users who are
sufficiently familiar with \app{gnuplot} to know what they're missing
in the plot editor presumably don't need much help from gretl,
so long as they can get hold of the \app{gnuplot} command file that
gretl has put together.  Second, even if the plot editor meets
your needs, in terms of fine-tuning the graph you see on screen, a few
details may need further work in order to get optimal results for
publication.

Either way, the first step in advanced tweaking of a graph is to get
access to the graph command file.

\begin{itemize}
\item In the graph display window, right-click and choose ``Save to
  session as icon''.
\item If it's not already open, open the icon view window---either
  via the menu item View/Icon view, or by clicking the ``session icon
  view'' button on the main-window toolbar.
\item Right-click on the icon representing the newly added graph and
  select ``Edit plot commands'' from the pop-up menu.
\item You get a window displaying the plot file
  (Figure~\ref{fig:plot-edit}).
\end{itemize}

\begin{figure}[htbp]
  \centering
  \includegraphics[scale=0.6]{figures/plotedit}
  \caption{Plot commands editor}
  \label{fig:plot-edit}
\end{figure}

Here are the basic things you can do in this window.  Obviously, you
can edit the file you just opened.  You can also send it for
processing by gnuplot, by clicking the ``Execute'' (cogwheel)
icon in the toolbar.  Or you can use the ``Save as'' button to save
a copy for editing and processing as you wish.

Unless you're a gnuplot expert, most likely you'll only need to edit a
couple of lines at the top of the file, specifying a driver (plus
options) and an output file.  We offer here a brief summary of some
points that may be useful.

First, \app{gnuplot}'s output mode is set via the command \texttt{set
  term} followed by the name of a supported driver (``terminal'' in
gnuplot parlance) plus various possible options.  (The top line in the
plot commands window shows the \texttt{set term} line that gretl
used to make a PNG file, commented out.)  The graphic formats that are
most suitable for publication are PDF and EPS.  These are supported by
the gnuplot \texttt{term} types \texttt{pdf}, \texttt{pdfcairo} and
\texttt{postscript} (with the \texttt{eps} option).  The
\texttt{pdfcairo} driver has the virtue that it behaves in a very
similar manner to the PNG one, the output of which you see on screen.
This driver is provided by the version of gnuplot that is included in the
gretl packages for MS Windows and Mac OS X; if you're on Linux
it may or may not be supported.  If \texttt{pdfcairo} is not available,
the \texttt{pdf} terminal may be available; the \texttt{postscript}
terminal is almost certainly available.

Besides selecting a term type, if you want to get gnuplot to write the
actual output file you need to append a \texttt{set output} line
giving a filename.  Here are a few examples of the first two lines you
might type in the window editing your plot commands.  We'll make
these more ``realistic'' shortly.
%
\begin{code}
set term pdfcairo
set output 'mygraph.pdf'

set term pdf
set output 'mygraph.pdf'

set term postscript eps
set output 'mygraph.eps'
\end{code}

There are a couple of things worth remarking here.  First, you may
want to adjust the size of the graph, and second you may want to
change the font.  The default sizes produced by the above drivers are
5 inches by 3 inches for \texttt{pdfcairo} and \texttt{pdf}, and 5
inches by 3.5 inches for \texttt{postscript eps}.  In each case
you can change this by giving a size specification, which takes the
form \texttt{XX,YY} (examples below).  

You may ask, why bother changing the size in the gnuplot command file?
After all, PDF and EPS are both vector formats, so the graphs can be
scaled at will.  True, but a uniform scaling will also affect the font
size, which may end up looking wrong.  You can get optimal results by
experimenting with the \texttt{font} and \texttt{size} options to
\app{gnuplot}'s \texttt{set term} command.  Here are some examples
(comments follow below).
%
\begin{code}
# pdfcairo, regular size, slightly amended
set term pdfcairo font "Sans,6" size 5in,3.5in
# or small size
set term pdfcairo font "Sans,5" size 3in,2in

# pdf, regular size, slightly amended
set term pdf font "Helvetica,8" size 5in,3.5in
# or small
set term pdf font "Helvetica,6" size 3in,2in

# postscript, regular 
set term post eps solid font "Helvetica,16"
# or small
set term post eps solid font "Helvetica,12" size 3in,2in
\end{code}

On the first line we set a sans serif font for \texttt{pdfcairo} at a
suitable size for a 5 $\times$ 3.5 inch plot (which you may find looks
better than the rather ``letterboxy'' default of 5 $\times$ 3).  And
on the second we illustrate what you might do to get a smaller 3
$\times$ 2 inch plot. You can specify the plot size in centimeters
if you prefer, as in
\begin{code}
set term pdfcairo font "Sans,6" size 6cm,4cm
\end{code}

We then repeat the exercise for the \texttt{pdf} terminal.  Notice
that here we're specifying one of the 35 standard PostScript fonts,
namely Helvetica.  Unlike \texttt{pdfcairo}, the plain \texttt{pdf}
driver is unlikely to be able to find fonts other than these.

In the third pair of lines we illustrate options for the
\texttt{postscript} driver (which, as you see, can be abbreviated as
\texttt{post}).  Note that here we have added the option
\texttt{solid}.  Unlike most other drivers, this one uses dashed lines
unless you specify the \texttt{solid} option.  Also note that we've
(apparently) specified a much larger font in this case.  That's
because the \texttt{eps} option in effect tells the
\texttt{postscript} driver to work at half-size (among other things),
so we need to double the font size.

Table~\ref{tab:drivers} summarizes the basics for the three drivers we
have mentioned.

\begin{table}[htbp]
  \centering
  \begin{tabular}{lcc}
    Terminal & default size (inches) & suggested font \\ [6pt]
    \texttt{pdfcairo} & 5 $\times$ 3 &   Sans,6 \\
    \texttt{pdf}      & 5 $\times$ 3 &   Helvetica,8 \\
    \texttt{post eps} & 5 $\times$ 3.5 & Helvetica,16 \\
  \end{tabular}
  \caption{Drivers for publication-quality graphics}
  \label{tab:drivers}
\end{table}

To find out more about \app{gnuplot} visit
\href{http://www.gnuplot.info/}{www.gnuplot.info}. This site has
documentation for the current version of the program in various
formats.

\subsection{Additional tips}
\label{subsect-graph-tips}

To be written.  Line widths, enhanced text.  Show a ``before and
after'' example.  

\section{Plotting graphs from scripts}
\label{sec:plotenv}

When working with scripts, you may want to have a graph shown on
your display or saved to a file. In fact, if in your usual workflow
you find yourself creating similar graphs over and over again, you
might want to consider writing a script which automates
this process for you. \app{Gretl} gives you two main tools for doing
this: one is a command called \cmd{gnuplot}, whose main use is to
create standard plots quickly. The other one is the \cmd{plot} command
block, which has a more elaborate syntax but offers you more control
over the output.

\subsection{The \cmd{gnuplot} command}
\label{sec:gnuplot-cmd}

The \cmd{gnuplot} command is described at length in the \GCR\ and the
online help system. Here, we just summarize its main features:
basically, it consists of the \cmd{gnuplot} keyword, followed by a
list of items, telling the command \emph{what} you want plotted and a
list of options, telling it \emph{how} you want it plotted.

For example, the line
\begin{code}
gnuplot y1 y2 x   
\end{code}
will give you a basic XY plot of the two series \texttt{y1} and
\texttt{y2} on the vertical axis versus the series \texttt{x} on the
horizontal axis. In general, the argument to the \cmd{gnuplot}
command is a list of series, the last of which goes on the x-axis,
while all the others go on the y-axis. By default, the
\cmd{gnuplot} command gives you a scatterplot. If you just have one
variable on the y-axis, then \app{gretl} will also draw the OLS
fitted line, if the fit is good enough.\footnote{The technical
  condition for this is that the two-tailed $p$-value for the slope
  coefficient should be under 10\%.}
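
The condition in the footnote can be checked by hand.  The following
is only a sketch, not gretl's internal code: it assumes the AWM
dataset and the series \texttt{INFQ} and \texttt{URX} used later in
this chapter, and the scalar names \texttt{tstat} and \texttt{pv} are
our own.

\begin{scode}
open AWM.gdt --quiet
# regress the y-axis variable on a constant and the x-axis variable
ols INFQ const URX --quiet
# two-tailed p-value for the slope (the second coefficient)
scalar tstat = $coeff[2] / $stderr[2]
scalar pv = 2 * pvalue(t, $df, abs(tstat))
printf "two-tailed p-value for the slope: %.4f\n", pv
if pv < 0.10
    printf "a fitted line would be expected by default\n"
else
    printf "no fitted line would be expected by default\n"
endif
\end{scode}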

Several aspects of the behavior described above can be modified. You
do this by appending options to the command. Most options can be
broadly grouped in three categories:
\begin{enumerate}
\item Plot styles: we support points (the default choice), lines,
  lines and points together, and impulses (vertical lines). 
\item Algorithm for the fitted line: here you can choose between
  linear, quadratic and cubic interpolation, but also more exotic
  choices, such as semi-log, inverse or loess (non-parametric). Of
  course, you can also turn this feature off.
\item Input and output: you can choose whether you want your graph on
  your computer screen (and possibly use the built-in graphical widget
  to further customize it; see above, page \pageref{plot-editor}),
  or rather save it to a file. We support several graphical formats,
  including PNG and PDF, to make it easy to incorporate your
  plots into text documents.
\end{enumerate}

The following script uses the AWM dataset to exemplify some
traditional plots in macroeconomics:

\begin{scode}
open AWM.gdt --quiet

# --- consumption and income, different styles ------------

gnuplot PCR YER
gnuplot PCR YER --output=display
gnuplot PCR YER --output=display --time-series
gnuplot PCR YER --output=display --time-series --with-lines

# --- Phillips' curve, different fitted lines -------------

gnuplot INFQ URX --output=display
gnuplot INFQ URX --suppress-fitted --output=display
gnuplot INFQ URX --inverse-fit --output=display
gnuplot INFQ URX --loess-fit --output=display
\end{scode}

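In the first group, the plain \cmd{gnuplot PCR YER} command requests a
scatterplot of consumption (\texttt{PCR}) against income
(\texttt{YER}); adding \option{output=display} asks for the graph to
be shown on screen.  With \option{time-series} the two series are
plotted against time instead of against each other, and
\option{with-lines} draws them as lines rather than points.  In the
second group, inflation (\texttt{INFQ}) is plotted against the
unemployment rate (\texttt{URX}): first with the default treatment of
the fitted line, then with the fitted line suppressed, and finally
with an inverse fit and a loess fit in place of the linear one.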

For more detail, consult the \GCR.
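
As a further sketch, the lines below illustrate the remaining plot
styles and the saving of output to a file.  They assume that the
graphic format is inferred from the extension of the filename given
to \option{output}, and the filenames themselves are just
placeholders; please check the option names against the \GCR\ for
your version of gretl.

\begin{scode}
open AWM.gdt --quiet

# lines and points together, saved as PDF
gnuplot PCR YER --time-series --with-lp --output=cons_inc.pdf

# impulses (vertical lines) for a single series, shown on screen
gnuplot INFQ --time-series --with-impulses --output=display

# scatterplot with a quadratic fit, saved as PNG
gnuplot INFQ URX --quadratic-fit --output=phillips.png
\end{scode}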


\subsection{The \cmd{plot} command block}
\label{sec:plotblock}

The \cmd{plot} environment is a way to pass information to
\app{Gnuplot} in a more structured way, so that customization of basic
plots becomes easier. It has the following characteristics:

The block starts with the \cmd{plot} keyword, followed by a required
parameter: the name of a list, a single series or a matrix. This
parameter specifies the data to be plotted. The starting line may be
prefixed with the \verb|savename <-| apparatus to save a plot as an icon
in the GUI program. The block ends with \cmd{end plot}.

Inside the block you have zero or more lines of these types, identified 
by an initial keyword:
\begin{description}
\item[\normalfont \texttt{option}:] specify a single option (details below)
\item[\normalfont \texttt{options}:] specify multiple options on a single line; if
  more than one option is given on a line, the options should be
  separated by spaces.
\item[\normalfont \texttt{literal}:] a command to be passed to gnuplot literally 
\item[\normalfont \texttt{printf}:] a printf statement whose result will be passed
  to gnuplot literally; this allows the use of string variables
  without having to resort to \verb!@!-style string substitution.
\end{description}

The options available are basically those of the current \cmd{gnuplot} 
command, but with a few differences. For one thing you don't need the 
leading double-dash in an ``option'' (or ``options'') line. Besides that,
\begin{itemize}
\item You can't use the option \option{matrix=whatever} with \cmd{plot}:
  that possibility is handled by providing the name of a matrix on the
  initial \cmd{plot} line.
\item The \option{input=filename} option is not supported: use
  \cmd{gnuplot} for the case where you're supplying the entire plot
  specification yourself.
\item The several options pertaining to the presence and type of a
  fitted line are replaced in \cmd{plot} by a single option \cmd{fit} which
  requires a parameter. Supported values for the parameter are: none,
  linear, quadratic, cubic, inverse, semilog and loess. Example:
\begin{code}
  option fit=quadratic
\end{code}
\end{itemize}

As with \cmd{gnuplot}, the default is to show a linear fit in an X-Y
scatter if it's significant at the 10 percent level.

Here's a simple example, the plot specification from the ``bandplot''
package, which shows how to achieve the same result via the
\cmd{gnuplot} command and a \cmd{plot} block, respectively. The
latter occupies a few more lines but is clearer.

\begin{code}
   gnuplot 1 2 3 4 --with-lines --matrix=plotmat \
   --suppress-fitted --output=display \
   { set linetype 3 lc rgb "#0000ff"; set title "@title"; \
     set nokey; set xlabel "@xname"; }
\end{code}

\begin{code}
   plot plotmat
     options with-lines fit=none
     literal set linetype 3 lc rgb "#0000ff"
     literal set nokey
     printf "set title \"%s\"", title
     printf "set xlabel \"%s\"", xname
   end plot --output=display
\end{code}

Note that \option{output=display} is appended to \cmd{end plot}; also
note that if you give a matrix to \cmd{plot} it's assumed you want to
plot all the columns. In addition, if you give a single series and the
dataset is time series, it's assumed you want a time-series plot.

FIXME: provide an example with real data.
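
A minimal sketch with real data might look as follows.  It assumes
the AWM dataset and the series \texttt{PCR} and \texttt{YER} used in
section~\ref{sec:gnuplot-cmd}; the list name \texttt{L} and the
string \texttt{ttl} are arbitrary names chosen for illustration.

\begin{scode}
open AWM.gdt --quiet
list L = PCR YER
string ttl = "Consumption and income"

plot L
    # two options on one line, separated by a space
    options time-series with-lines
    # passed to gnuplot as is
    literal set linetype 1 lc rgb "#0000ff"
    # the title is built from a string variable
    printf "set title \"%s\"", ttl
end plot --output=display
\end{scode}

Writing the first line as \verb|myplot <- plot L| (with
\texttt{myplot} a name of your choice) would in addition save the
plot as an icon when the script is run in the GUI program.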

\section{Boxplots}
\label{sect-boxplots}

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{figures/boxplot_sample}
  \end{center}
  \caption{Sample boxplot}
  \label{fig-boxplot}
\end{figure}

These plots (after Tukey and Chambers) display the distribution of a
variable. The central box encloses the middle 50 percent of the data,
i.e.\ it is bounded by the first and third quartiles.  The
``whiskers'' extend from each end of the box for a range equal to
1.5 times the interquartile range. Observations outside that range are
considered outliers and represented via dots.\footnote{To give you an
  intuitive idea, if a variable is normally distributed, the chances
  of picking an outlier by this definition are slightly below 0.7\%.}
A line is drawn across the box at the median and a ``\texttt{+}'' sign
identifies the mean---see Figure~\ref{fig-boxplot}.
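
The figure quoted in the footnote can be verified with a short
calculation.  This is a sketch based on population quartiles (sample
quartiles will differ somewhat), and the symbols $Z$ and $\Phi$, for
a standard normal variate and its distribution function, are
introduced here for illustration only.  The quartiles of $Z$ lie at
$\pm 0.6745$, so the interquartile range is $1.349$ and the whiskers
end at
\[
  \pm\,(0.6745 + 1.5 \times 1.349) = \pm\, 2.698 .
\]
The probability of an observation falling beyond the whiskers is
therefore
\[
  P(|Z| > 2.698) = 2\,[1 - \Phi(2.698)] \approx 0.0070 ,
\]
that is, just under 0.7 percent.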

In the case of boxplots with confidence intervals, dotted lines show
the limits of an approximate 90 percent confidence interval for the
median.  This is obtained by the bootstrap method, which can take a
while if the data series is very long. For details on constructing
boxplots, see the entry for \cmd{boxplot} in the \GCR, or use the
\textsf{Help} button that appears when you select one of the boxplot
items under the menu item ``View, Graph specified vars'' in the main
gretl window.
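
In a script, a boxplot of this sort can be requested along the
following lines.  This is a sketch which assumes that the relevant
option of the \cmd{boxplot} command is named \option{notches}; check
the \GCR\ for your version of gretl.  The AWM series \texttt{URX} is
used simply as a convenient example.

\begin{scode}
open AWM.gdt --quiet
# plain boxplot
boxplot URX --output=display
# boxplot showing the bootstrap interval for the median
boxplot URX --notches --output=display
\end{scode}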

\subsection{Factorized boxplots}

A nice feature which is quite useful for data visualization is the
conditional, or factorized boxplot.  This type of plot allows you to
examine the distribution of a variable conditional on the value of
some discrete factor.

As an example, we'll use one of the datasets supplied with
\app{gretl}, that is \cmd{rac3d}, which contains an example taken from
\cite{cameron-trivedi13} on the health conditions of 5190 people. The
script below compares the unconditional (marginal) distribution of the
number of illnesses in the past 2 weeks with the distribution of the
same variable, conditional on age classes.

\begin{scode}
open rac3d.gdt
# unconditional boxplot
boxplot ILLNESS --output=display
# create a discrete variable for age class: 
# 0 = below 20, 1 = between 20 and 39, etc
# (AGE is recorded in years divided by 100, hence the 0.2)
series age_class = floor(AGE/0.2)
# conditional boxplot
boxplot ILLNESS age_class --factorized --output=display
\end{scode}

After running the code above, you should see two graphs similar to
Figure~\ref{fig:fact-boxplots}. By comparing the marginal plot to
the factorized one, the effect of age on the mean number of illnesses
is quite evident: by joining the green crosses you get what is
technically known as the conditional mean function, or regression
function if you prefer.
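
The conditional means marked by the crosses can also be computed
directly.  The following sketch restricts the sample to each age
class in turn; the matrix name \texttt{classes} is our own choice,
and this is just one of several ways to obtain the same numbers.

\begin{scode}
open rac3d.gdt --quiet
series age_class = floor(AGE/0.2)
# the distinct values taken by age_class, as a column vector
matrix classes = values(age_class)
loop i=1..rows(classes)
    # keep only the observations in the i-th age class
    smpl age_class == classes[i] --restrict --replace
    printf "age_class %g: mean of ILLNESS = %.3f\n", classes[i], mean(ILLNESS)
endloop
# restore the full sample
smpl --full
\end{scode}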

\begin{figure}[htbp]
  \centering
  \begin{tabular}{cc}
    \includegraphics[width=0.475\textwidth]{figures/uboxplot} & 
    \includegraphics[width=0.475\textwidth]{figures/fboxplot}
  \end{tabular}
  \caption{Conditional and unconditional distribution of illnesses}
  \label{fig:fact-boxplots}
\end{figure}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: