\chapter{Graphs and plots}
\label{chap:graphs}
\section{Gnuplot graphs}
\label{gnuplot-graphs}
A separate program, \app{gnuplot}, is called to generate graphs.
Gnuplot is a very full-featured graphing program with myriad options.
It is available from \href{http://www.gnuplot.info/}{www.gnuplot.info}
(but note that a suitable copy of gnuplot is bundled with the packaged
versions of gretl for MS Windows and Mac OS X). Gretl gives you
direct access, via a graphical interface, to a subset of gnuplot's
options and it tries to choose sensible values for you; it also allows
you to take complete control over graph details if you wish.
With a graph displayed, you can click on the graph window for a pop-up
menu with the following options.
\begin{itemize}
\item \textsf{Save as PNG}: Save the graph in Portable Network
Graphics format (the same format that you see on screen).
\item \textsf{Save as postscript}: Save in encapsulated postscript
(EPS) format.
\item \textsf{Save as Windows metafile}: Save in Enhanced Metafile
(EMF) format.
\item \textsf{Save to session as icon}: The graph will appear in
iconic form when you select ``Icon view'' from the View menu.
\item \textsf{Zoom}: Lets you select an area within the graph for
closer inspection (not available for all graphs).
\item \textsf{Print}: (Current GTK or MS Windows only) lets you
print the graph directly.
\item \textsf{Copy to clipboard}: MS Windows only, lets you paste the
graph into Windows applications such as MS Word.
\item \textsf{Edit}: Opens a controller for the plot which lets you
adjust many aspects of its appearance.
\item \textsf{Close}: Closes the graph window.
\end{itemize}
\subsection{Displaying data labels}
\label{plot-labels}
For simple X-Y scatter plots, some further options are available if
the dataset includes ``case markers'' (that is, labels identifying
each observation).\footnote{For an example of such a dataset, see the
Ramanathan file \verb+data4-10+: this contains data on private
school enrollment for the 50 states of the USA plus Washington, DC;
the case markers are the two-letter codes for the states.} With a
scatter plot displayed, when you move the mouse pointer over a data
point its label is shown on the graph. By default these labels are
transient: they do not appear in the printed or copied version of the
graph. They can be removed by selecting ``Clear data labels'' from
the graph pop-up menu. If you want the labels to be affixed
permanently (so they will show up when the graph is printed or
copied), select the option ``Freeze data labels'' from the pop-up
menu; ``Clear data labels'' cancels this operation. The other
label-related option, ``All data labels'', requests that case markers
be shown for all observations. At present the display of case markers
is disabled for graphs containing more than 250 data points.
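As a minimal sketch, the following script produces a scatter plot on
which the data-label features can be tried out. It assumes that the
Ramanathan file mentioned above is installed and that it contains
series named \texttt{ENROLL} and \texttt{CATHOL}; substitute other
names if your copy differs.
\begin{code}
# open a dataset that carries case markers (state codes)
open data4-10
# simple X-Y scatter shown on screen; move the mouse over a point
# to see its label
gnuplot ENROLL CATHOL --output=display
\end{code}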
\subsection{GUI plot editor}
\label{plot-editor}
Selecting the \textsf{Edit} option in the graph popup menu opens
an editing dialog box, shown in Figure~\ref{fig-plot}. Notice that
there are several tabs, allowing you to adjust many aspects of
a graph's appearance: font, title, axis scaling, line colors
and types, and so on. You can also add lines or descriptive
labels to a graph (under the Lines and Labels tabs). The
``Apply'' button applies your changes without closing the
editor; ``OK'' applies the changes and closes the dialog.
\begin{figure}[htbp]
\begin{center}
\includegraphics[scale=0.6]{figures/plot_control}
\end{center}
\caption{gretl's gnuplot controller}
\label{fig-plot}
\end{figure}
\subsection{Publication-quality graphics: advanced options}
\label{plot-advanced}
The GUI plot editor has two limitations. First, it cannot represent
all the myriad options that \app{gnuplot} offers. Users who are
sufficiently familiar with \app{gnuplot} to know what they're missing
in the plot editor presumably don't need much help from gretl,
so long as they can get hold of the \app{gnuplot} command file that
gretl has put together. Second, even if the plot editor meets
your needs, in terms of fine-tuning the graph you see on screen, a few
details may need further work in order to get optimal results for
publication.
Either way, the first step in advanced tweaking of a graph is to get
access to the graph command file.
\begin{itemize}
\item In the graph display window, right-click and choose ``Save to
session as icon''.
\item If it's not already open, open the icon view window---either
via the menu item View/Icon view, or by clicking the ``session icon
view'' button on the main-window toolbar.
\item Right-click on the icon representing the newly added graph and
select ``Edit plot commands'' from the pop-up menu.
\item You get a window displaying the plot file
(Figure~\ref{fig:plot-edit}).
\end{itemize}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.6]{figures/plotedit}
\caption{Plot commands editor}
\label{fig:plot-edit}
\end{figure}
Here are the basic things you can do in this window. Obviously, you
can edit the file you just opened. You can also send it for
processing by gnuplot, by clicking the ``Execute'' (cogwheel)
icon in the toolbar. Or you can use the ``Save as'' button to save
a copy for editing and processing as you wish.
Unless you're a gnuplot expert, most likely you'll only need to edit a
couple of lines at the top of the file, specifying a driver (plus
options) and an output file. We offer here a brief summary of some
points that may be useful.
First, \app{gnuplot}'s output mode is set via the command \texttt{set
term} followed by the name of a supported driver (``terminal'' in
gnuplot parlance) plus various possible options. (The top line in the
plot commands window shows the \texttt{set term} line that gretl
used to make a PNG file, commented out.) The graphic formats that are
most suitable for publication are PDF and EPS. These are supported by
the gnuplot \texttt{term} types \texttt{pdf}, \texttt{pdfcairo} and
\texttt{postscript} (with the \texttt{eps} option). The
\texttt{pdfcairo} driver has the virtue that it behaves in a very
similar manner to the PNG one, the output of which you see on screen.
This driver is provided by the version of gnuplot that is included in the
gretl packages for MS Windows and Mac OS X; if you're on Linux
it may or may not be supported. If \texttt{pdfcairo} is not available,
the \texttt{pdf} terminal may be available; the \texttt{postscript}
terminal is almost certainly available.
Besides selecting a term type, if you want to get gnuplot to write the
actual output file you need to append a \texttt{set output} line
giving a filename. Here are a few examples of the first two lines you
might type in the window editing your plot commands. We'll make
these more ``realistic'' shortly.
%
\begin{code}
set term pdfcairo
set output 'mygraph.pdf'

set term pdf
set output 'mygraph.pdf'

set term postscript eps
set output 'mygraph.eps'
\end{code}
There are a couple of things worth remarking here. First, you may
want to adjust the size of the graph, and second you may want to
change the font. The default sizes produced by the above drivers are
5 inches by 3 inches for \texttt{pdfcairo} and \texttt{pdf}, and 5
inches by 3.5 inches for \texttt{postscript eps}. In each case
you can change this by giving a size specification, which takes the
form \texttt{XX,YY} (examples below).
You may ask, why bother changing the size in the gnuplot command file?
After all, PDF and EPS are both vector formats, so the graphs can be
scaled at will. True, but a uniform scaling will also affect the font
size, which may end up looking wrong. You can get optimal results by
experimenting with the \texttt{font} and \texttt{size} options to
\app{gnuplot}'s \texttt{set term} command. Here are some examples
(comments follow below).
%
\begin{code}
# pdfcairo, regular size, slightly amended
set term pdfcairo font "Sans,6" size 5in,3.5in
# or small size
set term pdfcairo font "Sans,5" size 3in,2in
# pdf, regular size, slightly amended
set term pdf font "Helvetica,8" size 5in,3.5in
# or small
set term pdf font "Helvetica,6" size 3in,2in
# postscript, regular
set term post eps solid font "Helvetica,16"
# or small
set term post eps solid font "Helvetica,12" size 3in,2in
\end{code}
On the first line we set a sans serif font for \texttt{pdfcairo} at a
suitable size for a 5 $\times$ 3.5 inch plot (which you may find looks
better than the rather ``letterboxy'' default of 5 $\times$ 3). And
on the second we illustrate what you might do to get a smaller 3
$\times$ 2 inch plot. You can specify the plot size in centimeters
if you prefer, as in
\begin{code}
set term pdfcairo font "Sans,6" size 6cm,4cm
\end{code}
We then repeat the exercise for the \texttt{pdf} terminal. Notice
that here we're specifying one of the 35 standard PostScript fonts,
namely Helvetica. Unlike \texttt{pdfcairo}, the plain \texttt{pdf}
driver is unlikely to be able to find fonts other than these.
In the third pair of lines we illustrate options for the
\texttt{postscript} driver (which, as you see, can be abbreviated as
\texttt{post}). Note that here we have added the option
\texttt{solid}. Unlike most other drivers, this one uses dashed lines
unless you specify the \texttt{solid} option. Also note that we've
(apparently) specified a much larger font in this case. That's
because the \texttt{eps} option in effect tells the
\texttt{postscript} driver to work at half-size (among other things),
so we need to double the font size.
Table~\ref{tab:drivers} summarizes the basics for the three drivers we
have mentioned.
\begin{table}[htbp]
\centering
\begin{tabular}{lcc}
Terminal & default size (inches) & suggested font \\ [6pt]
\texttt{pdfcairo} & 5 $\times$ 3 & Sans,6 \\
\texttt{pdf} & 5 $\times$ 3 & Helvetica,8 \\
\texttt{post eps} & 5 $\times$ 3.5 & Helvetica,16 \\
\end{tabular}
\caption{Drivers for publication-quality graphics}
\label{tab:drivers}
\end{table}
To find out more about \app{gnuplot} visit
\href{http://www.gnuplot.info/}{www.gnuplot.info}. This site has
documentation for the current version of the program in various
formats.
\subsection{Additional tips}
\label{subsect-graph-tips}
To be written. Line widths, enhanced text. Show a ``before and
after'' example.
\section{Plotting graphs from scripts}
\label{sec:plotenv}
When working with scripts, you may want to have a graph shown on
your display or saved to a file. In fact, if in your usual workflow
you find yourself creating similar graphs over and over again, you
might want to consider writing a script which automates
this process for you. \app{Gretl} gives you two main tools for doing
this: one is a command called \cmd{gnuplot}, whose main use is to
create standard plots quickly. The other is the \cmd{plot} command
block, which has a more elaborate syntax but offers you more control
over the output.
\subsection{The \cmd{gnuplot} command}
\label{sec:gnuplot-cmd}
The \cmd{gnuplot} command is described at length in the \GCR\ and the
online help system. Here, we just summarize its main features:
basically, it consists of the \cmd{gnuplot} keyword, followed by a
list of items telling the command \emph{what} you want plotted, and a
list of options telling it \emph{how} you want it plotted.
For example, the line
\begin{code}
gnuplot y1 y2 x
\end{code}
will give you a basic XY plot of the two series \texttt{y1} and
\texttt{y2} on the vertical axis versus the series \texttt{x} on the
horizontal axis. In general, the argument to the \cmd{gnuplot}
command is a list of series, the last of which goes on the x-axis,
while all the others go on the y-axis. By default, the
\cmd{gnuplot} command gives you a scatterplot. If you have just one
variable on the y-axis, then \app{gretl} will also draw the OLS
fitted line, if the fit is good enough.\footnote{The technical
condition for this is that the two-tailed $p$-value for the slope
coefficient should be under 10\%.}
Several aspects of the behavior described above can be modified. You
do this by appending options to the command. Most options can be
broadly grouped in three categories:
\begin{enumerate}
\item Plot styles: we support points (the default choice), lines,
lines and points together, and impulses (vertical lines).
\item Algorithm for the fitted line: here you can choose between
linear, quadratic and cubic interpolation, but also more exotic
choices, such as semi-log, inverse or loess (non-parametric). Of
course, you can also turn this feature off.
\item Input and output: you can choose whether you want your graph on
your computer screen (and possibly use the built-in graphical widget
to further customize it; see above, page \pageref{plot-editor}),
or saved to a file. We support several graphical formats,
including PNG and PDF, to make it easy to incorporate your
plots into text documents.
\end{enumerate}
The following script uses the AWM dataset to exemplify some
traditional plots in macroeconomics:
\begin{scode}
open AWM.gdt --quiet
# --- consumption and income, different styles ------------
gnuplot PCR YER
gnuplot PCR YER --output=display
gnuplot PCR YER --output=display --time-series
gnuplot PCR YER --output=display --time-series --with-lines
# --- Phillips' curve, different fitted lines -------------
gnuplot INFQ URX --output=display
gnuplot INFQ URX --suppress-fitted --output=display
gnuplot INFQ URX --inverse-fit --output=display
gnuplot INFQ URX --loess-fit --output=display
\end{scode}
FIXME: comment on the above
For more detail, consult the \GCR.
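To save a plot to a file rather than displaying it, give a filename in
place of \texttt{display}. The following is a minimal sketch, on the
assumption that the output format is inferred from the filename
extension; the filenames themselves are purely illustrative.
\begin{scode}
open AWM.gdt --quiet
# Phillips curve without a fitted line, saved as PDF
gnuplot INFQ URX --suppress-fitted --output=phillips.pdf
# consumption and income plotted against time, saved as PNG
gnuplot PCR YER --time-series --with-lines --output=cons_income.png
\end{scode}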
\subsection{The \cmd{plot} command block}
\label{sec:plotblock}
The \cmd{plot} environment is a means of passing information to
\app{gnuplot} in a more structured manner, so that customization of basic
plots becomes easier. It has the following characteristics:
The block starts with the \cmd{plot} keyword, followed by a required
parameter: the name of a list, a single series or a matrix. This
parameter specifies the data to be plotted. The starting line may be
prefixed with the \verb|savename <-| apparatus to save a plot as an icon
in the GUI program. The block ends with \cmd{end plot}.
Inside the block you have zero or more lines of these types, identified
by an initial keyword:
\begin{description}
\item[\normalfont \texttt{option}:] specify a single option (details below)
\item[\normalfont \texttt{options}:] specify multiple options on a single line; if
more than one option is given on a line, the options should be
separated by spaces.
\item[\normalfont \texttt{literal}:] a command to be passed to gnuplot literally
\item[\normalfont \texttt{printf}:] a printf statement whose result will be passed
to gnuplot literally; this allows the use of string variables
without having to resort to \verb!@!-style string substitution.
\end{description}
The options available are basically those of the current \cmd{gnuplot}
command, but with a few differences. For one thing you don't need the
leading double-dash in an \texttt{option} (or \texttt{options}) line. Besides that,
\begin{itemize}
\item You can't use the option \option{matrix=whatever} with \cmd{plot}:
that possibility is handled by providing the name of a matrix on the
initial \cmd{plot} line.
\item The \option{input=filename} option is not supported: use
\cmd{gnuplot} for the case where you're supplying the entire plot
specification yourself.
\item The several options pertaining to the presence and type of a
fitted line are replaced in \cmd{plot} by a single option \cmd{fit} which
requires a parameter. Supported values for the parameter are: none,
linear, quadratic, cubic, inverse, semilog and loess. Example:
\begin{code}
option fit=quadratic
\end{code}
\end{itemize}
As with \cmd{gnuplot}, the default is to show a linear fit in an X-Y
scatter if it's significant at the 10 percent level.
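As an illustration of the \cmd{fit} option, here is a sketch of a
\cmd{plot} block intended to reproduce the inverse-fit Phillips curve
plot shown for the \cmd{gnuplot} command in the previous subsection.
It assumes the AWM dataset, and also assumes that (as with
\cmd{gnuplot}) the last member of the list goes on the x-axis. The
list name \texttt{PC} and the title are arbitrary.
\begin{code}
open AWM.gdt --quiet
# inflation on the y-axis, unemployment on the x-axis
list PC = INFQ URX
plot PC
    option fit=inverse
    literal set title "Inflation versus unemployment"
end plot --output=display
\end{code}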
Here's a simple example, the plot specification from the ``bandplot''
package, which shows how to achieve the same result via the
\cmd{gnuplot} command and a \cmd{plot} block, respectively. The
latter occupies a few more lines but is clearer.
\begin{code}
gnuplot 1 2 3 4 --with-lines --matrix=plotmat \
--suppress-fitted --output=display \
{ set linetype 3 lc rgb "#0000ff"; set title "@title"; \
set nokey; set xlabel "@xname"; }
\end{code}
\begin{code}
plot plotmat
options with-lines fit=none
literal set linetype 3 lc rgb "#0000ff"
literal set nokey
printf "set title \"%s\"", title
printf "set xlabel \"%s\"", xname
end plot --output=display
\end{code}
Note that \option{output=display} is appended to \cmd{end plot}; also
note that if you give a matrix to \cmd{plot} it's assumed you want to
plot all the columns. In addition, if you give a single series and the
dataset is time series, it's assumed you want a time-series plot.
FIXME: provide an example with real data.
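By way of a sketch with real data, the following block plots a single
series from the AWM dataset (assumed to be available, as in the
\cmd{gnuplot} examples above). Since the dataset is time series, a
time-series plot should result without any special option. The title
and the suppression of the key are arbitrary choices.
\begin{code}
open AWM.gdt --quiet
# a single series from a time-series dataset
plot URX
    option with-lines
    literal set title "Unemployment rate"
    literal set nokey
end plot --output=display
\end{code}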
\section{Boxplots}
\label{sect-boxplots}
\begin{figure}[htbp]
\begin{center}
\includegraphics{figures/boxplot_sample}
\end{center}
\caption{Sample boxplot}
\label{fig-boxplot}
\end{figure}
These plots (after Tukey and Chambers) display the distribution of a
variable. The central box encloses the middle 50 percent of the data,
i.e.\ it is bounded by the first and third quartiles. The
``whiskers'' extend from each end of the box for a range equal to
1.5 times the interquartile range. Observations outside that range are
considered outliers and represented via dots.\footnote{To give you an
intuitive idea, if a variable is normally distributed, the chances
of picking an outlier by this definition are slightly below 0.7\%.}
A line is drawn across the box at the median and a ``\texttt{+}'' sign
identifies the mean---see Figure~\ref{fig-boxplot}.
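The probability quoted in the footnote can be checked with a short
script. This is just a sketch of the arithmetic for the standard
normal case; the variable names are arbitrary.
\begin{code}
# third quartile of the standard normal distribution
scalar q3 = invcdf(N, 0.75)
# interquartile range (by symmetry, the first quartile is -q3)
scalar qrange = 2 * q3
# upper limit beyond which an observation counts as an outlier
scalar hi_lim = q3 + 1.5 * qrange
# probability of falling outside either limit
scalar p_out = 2 * pvalue(N, hi_lim)
printf "P(outlier) = %.4f\n", p_out
\end{code}
The limit works out at roughly 2.7 standard deviations from the mean,
and the resulting probability is a little under 0.007.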
In the case of boxplots with confidence intervals, dotted lines show
the limits of an approximate 90 percent confidence interval for the
median. This is obtained by the bootstrap method, which can take a
while if the data series is very long. For details on constructing
boxplots, see the entry for \cmd{boxplot} in the \GCR, or use the
\textsf{Help} button that appears when you select one of the boxplot
items under the menu item ``View, Graph specified vars'' in the main
gretl window.
\subsection{Factorized boxplots}
A nice feature which is quite useful for data visualization is the
conditional, or factorized boxplot. This type of plot allows you to
examine the distribution of a variable conditional on the value of
some discrete factor.
As an example, we'll use one of the datasets supplied with
\app{gretl}, that is \cmd{rac3d}, which contains an example taken from
\cite{cameron-trivedi13} on the health conditions of 5190 people. The
script below compares the unconditional (marginal) distribution of the
number of illnesses in the past 2 weeks with the distribution of the
same variable, conditional on age classes.
\begin{scode}
open rac3d.gdt
# unconditional boxplot
boxplot ILLNESS --output=display
# create a discrete variable for age class:
# 0 = below 20, 1 = between 20 and 39, etc
series age_class = floor(AGE/0.2)
# conditional boxplot
boxplot ILLNESS age_class --factorized --output=display
\end{scode}
After running the code above, you should see two graphs similar to
Figure~\ref{fig:fact-boxplots}. By comparing the marginal plot to
the factorized one, the effect of age on the mean number of illnesses
is quite evident: by joining the green crosses you get what is
technically known as the conditional mean function, or regression
function if you prefer.
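The conditional means marked by the crosses can also be computed
directly. The following is a sketch which rebuilds the
\texttt{age\_class} series as in the script above and then restricts
the sample to one class at a time; the matrix name \texttt{v} is
arbitrary.
\begin{code}
open rac3d.gdt --quiet
series age_class = floor(AGE/0.2)
# distinct values of the factor, in ascending order
matrix v = values(age_class)
loop i=1..rows(v)
    # restrict the sample to one age class at a time
    smpl age_class == v[i] --restrict --replace
    printf "age class %d: mean of ILLNESS = %.3f\n", v[i], mean(ILLNESS)
endloop
# restore the full sample
smpl --full
\end{code}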
\begin{figure}[htbp]
\centering
\begin{tabular}{cc}
\includegraphics[width=0.475\textwidth]{figures/uboxplot} &
\includegraphics[width=0.475\textwidth]{figures/fboxplot}
\end{tabular}
\caption{Conditional and unconditional distribution of illnesses}
\label{fig:fact-boxplots}
\end{figure}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: