File: data.tex

package info (click to toggle)
pyxplot 0.9.2-14
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 10,288 kB
  • sloc: ansic: 50,373; xml: 1,339; python: 570; sh: 318; makefile: 89
file content (911 lines) | stat: -rw-r--r-- 41,083 bytes parent folder | download | duplicates (6)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
% data.tex
%
% The documentation in this file is part of Pyxplot
% <http://www.pyxplot.org.uk>
%
% Copyright (C) 2006-2012 Dominic Ford <coders@pyxplot.org.uk>
%               2008-2012 Ross Church
%
% $Id: data.tex 1277 2012-07-27 23:04:20Z dcf21 $
%
% Pyxplot is free software; you can redistribute it and/or modify it under the
% terms of the GNU General Public License as published by the Free Software
% Foundation; either version 2 of the License, or (at your option) any later
% version.
%
% You should have received a copy of the GNU General Public License along with
% Pyxplot; if not, write to the Free Software Foundation, Inc., 51 Franklin
% Street, Fifth Floor, Boston, MA  02110-1301, USA

% ----------------------------------------------------------------------------

% LaTeX source for the Pyxplot Users' Guide

\chapter{Working with data}
\label{ch:numerics}

This chapter returns to Pyxplot's commands for acting on data stored in files.
Chapter~\ref{ch:first_steps} has already introduced the {\tt plot} command,
which draws graphs, but there are also commands for tabulating data to new
\datafile s, for computing histograms, for interpolating data, and for taking
Fourier transforms.

Section~\ref{sec:plot_datafiles} has already introduced the options which can
be used to select data from particular columns or which satisfy particular
criteria: {\tt using}, {\tt index}, {\tt every} and {\tt select}.  These
options are universal to all of Pyxplot's commands which operate on data sets.
In all cases, data sets can be read from files, sampled from functions, or
specified as a colon-separated list of vectors (see
Section~\ref{sec:vectorplot}).

This chapter begins by describing other common features of these commands,
before moving on to describe each command in turn. It leaves the details of the
{\tt plot} command, which was introduced in Chapter~\ref{ch:first_steps}, to be
described in full detail in Chapter~\ref{ch:plotting}.

\section{Input filters}
\label{sec:filters}

By default, Pyxplot expects \datafile s to be in a simple plaintext format
which is described in Section~\ref{sec:plot_datafiles}. However, input filters
provide a mechanism by which \datafile s in arbitrary formats can be read.

An input filter is specified to act on all \datafile s that match some
filename pattern. For example, a filter could be defined to act on all
\datafile s called {\tt *.txt} or {\tt *.dat}. The filter itself takes the form
of a program which is launched by Pyxplot whenever a matching \datafile\ is
read.  The program is passed the filename of the \datafile\ as a command line
argument immediately following any arguments specified in the filter's
definition. It is then expected to return the data contained in the file to
Pyxplot in plaintext format using its {\tt stdout} stream. Any errors which
such a program returns to {\tt stderr} are passed to the user as error
messages.

Pyxplot has five input filters built-in, as the {\tt show filters} command
reveals:

\begin{verbatim}
set filter "*.fits"   "/usr/local/lib/pyxplot/pyxplot_fitshelper"
set filter "*.gz"     "/bin/gunzip -c"
set filter "*.log"    "/usr/local/lib/pyxplot/pyxplot_timehelper"
set filter "ftp://*"  "/usr/bin/wget -O -"
set filter "http://*" "/usr/bin/wget -O -"
\end{verbatim}

The above set of filters allow Pyxplot to read data from gzipped plaintext
\datafile s, from \datafile s available over the web via HTTP\index{HTTP} or
FTP\index{FTP}, and data tables in FITS format\index{FITS format}.  A filter is
also provided for converting textual dates in log files into numbers
representing days, months and years. To add to this list of filters, it is
necessary to write a short program or shell script; the simple filters provided
in Pyxplot's source code for {\tt .log} and {.fits} files may provide a useful
model.

The filter can then be installed using the syntax

\begin{verbatim}
set filter <filenameWildcard> <shellCommand>
\end{verbatim}


\section{Reading data from a pipe}

Pyxplot usually reads data from files, or samples it from functions. However,
it is also possible to read data {\it piped}\index{pipes} into it from other
processes if these are directed to Pyxplot's standard input stream, {\tt
stdin}\index{stdin}.  To do this, the magic filename {\tt -} is used:

\begin{verbatim}
plot '-' with lines
\end{verbatim}

This makes it possible to call Pyxplot from within another program and pass it
data to plot without ever storing the data on disk. Whilst this facility has
great power, it should be used with caution.

It is often very tempting to write programs which both perform calculations and
plot the results immediately, but this can make it difficult to replot graphs
later. A few months after the event, when the need arises to replot the same
data in a different form or in a different style, remembering how to use a
sizable program can prove tricky -- especially if the person struggling to do
so is not you! But a simple \datafile\ is quite straightforward to plot time
and again.

\section{Including data within command scripts}

It is also possible to embed data directly within Pyxplot scripts, which may be
useful when a small number of markers are wanted at particular pre-defined
positions on a graph, or when it is desirable to roll a Pyxplot script and the
data it takes into a single file for easy storage or transmission. To do this,
one uses the magic filename {\tt \--\--} and terminates the data with the string
{\tt END}:

\begin{verbatim}
plot '--' with lines
0 0
1 1
2 0
3 1
END
print "More Pyxplot commands can be placed after END"
\end{verbatim}

\section{Special comment lines in \datafile s}
\label{sec:special_comments}
\index{comment lines!in datafiles}

Whilst most comment lines in \datafile s -- those lines which begin with a hash
character -- are ignored by Pyxplot, lines which begin with any of the
following four strings are parsed:
\begin{verbatim}
# Columns:
# ColumnUnits:

# Rows:
# RowUnits:
\end{verbatim}
The first pair of special comments affect the behaviour of Pyxplot when
plotting {\tt using columns}, while the second pair affect the behaviour of
Pyxplot when plotting {\tt using rows} (see
Section~\ref{sec:horizontal_datafiles}). Within each pair, the first may be
used to tell Pyxplot the names of each of the columns/rows in the \datafile,
while the second may be used to tell Pyxplot the physical units of the values
in each of the columns/rows. These special comments may appear multiple times
throughout a single \datafile\ to indicate changes to the format of the data.

For example, a \datafile\ prefixed with the lines
\begin{verbatim}
# Columns:      Time   Distance
# ColumnUnits:   s      10*m
\end{verbatim}
contains two columns of data, the first containing times measured in seconds
and the second containing distances measured in tens of metres. Note that
because the entries on each of these lines are whitespace-separated, spaces are
not allowed in column names or within units such as {\tt 10*m}. This \datafile\
could be plotted using any of the following forms equivalently
\begin{verbatim}
plot 'data' using Time:Distance
plot 'data' using $Time:$Distance
plot 'data' using 1:2
\end{verbatim}
and the axes of the graph would indicate the units of the data (see
Section~\ref{sec:set_axisunitstyle}).

\section{Tabulating functions and slicing \datafile s}
\label{sec:tabulate}

Pyxplot's \indcmdt{tabulate} is similar to its {\tt plot} command, but instead
of plotting a series of \datapoint s onto a graph, it writes them to a
\datafile. It can be used to produce text files containing samples of
functions, to rearrange/filter the columns in \datafile s, to produce a copy of
a \datafile\ using different physical units, and so on.

The following example would produce a \datafile\ called {\tt gamma.dat}
containing a list of values of the gamma function:

\begin{verbatim}
set output 'gamma.dat'
tabulate [1:5] gamma(x)
\end{verbatim}

One way to tabulate multiple functions into a common file is with the
\indmodt{using} modifier, as in the example

\begin{verbatim}
tabulate [0:2*pi] sin(x):cos(x):tan(x) using 1:2:3:4
\end{verbatim}

\noindent This tabulates the supplied functions horizontally alongside one
another in a series of columns. As many expressions may be supplied to the
\indmodt{using} modifier as columns are wanted.

Alternatively, if a series on functions or \datafile s are listed in a
comma-separated list (as is done in the {\tt plot} command to plot multiple
datasets), the functions are tabulated one after another in a series of index
blocks separated by double linefeeds (see Section~\ref{sec:plot_datafiles}):

\begin{verbatim}
tabulate [0:2*pi] sin(x), cos(x), tan(x)
\end{verbatim}

The \indcmdt{set samples} can be used to control the number of points that are
listed when tabulating functions, in the same way that it controls the number
of data points drawn by the {\tt plot} command:

\begin{verbatim}
set samples 200
\end{verbatim}

\noindent If the abscissa axis is set to be logarithmic then the functions are
evaluated at logarithmically-space points along the axis; otherwise, they are
samples at linearly-spaced points.

The \indmodt{select}, \indmodt{using} and \indmodt{every} modifiers operate in
the same manner in the {\tt tabulate} command as in the {\tt plot} command.
Thus the following example would write out the third, sixth and ninth columns
of the \datafile\ {\tt input.dat}, but only when the arcsine of the value in the
fourth column is positive:

\begin{verbatim}
set output 'filtered.dat'
tabulate 'input.dat' using 3:6:9 select (asin($4)>0)
\end{verbatim}

The numerical display format used for each column of the output file is
automatically chosen to preserve accuracy whilst simultaneously being as easily
human readable as possible.  Thus, columns which contain only integers are
displayed as such, and scientific notation is only used in columns which
contain very large or very small values.

If desired, however, a custom format may be specified using the {\tt with
format} modifier. This can be used both to specify text to appear in between
the columns of data, and to specify the format of the data itself using tokens
such as {\tt \%.5f}, as used by Pyxplot's string substitution operator ({\tt
\%}; see Section~\ref{sec:stringsubop}), and the {\tt sprintf} statement of
many other programming languages.

For example, to tabulate the values of $x^2$ to very many significant figures
with some additional text, one could use:

\begin{verbatim}
tabulate x**2 with format "x = %f ; x**2 = %27.20e"
\end{verbatim}

\noindent This might produce the following output:

\begin{verbatim}
x = 0.000000 ; x**2 =  0.00000000000000000000e+00
x = 0.833333 ; x**2 =  6.94444444444442421371e-01
x = 1.666667 ; x**2 =  2.77777777777778167589e+00
\end{verbatim}

There is flexibility as to how many substitution tokens appear in the format
specification.  If the number of tokens is fewer than the number of columns of
data, then the format is repeated until all the columns have been printed.
Thus, the command

\begin{verbatim}
tabulate x**2 with format "%.3f "
\end{verbatim}

\noindent might produce the output:

\begin{verbatim}
0.000 0.000
0.833 0.694
1.667 2.778
\end{verbatim}

\noindent Note that the space character at the end of the format is important
to ensure that there is a gap between the columns.

If formats are supplied for more columns than are present, then the final
columns are padded with {\tt nan} (not a number).

The data produced by the {\tt tabulate} command can be sorted in order of any
arbitrary metric by supplying an expression after the {\tt sortby} modifier.
The data are sorted in order from the lowest value of this expression to the
highest.


\section{Function fitting}
\label{sec:fit_command}

The \indcmdt{fit} can be used to fit arbitrary functional forms to \datapoint s
read from files. It can be used to produce best-fit lines\index{best fit
lines}\footnote{Another way of producing best-fit lines is to use the {\tt
interpolate} command; more details are given in
Section~\ref{sec:spline_command}} for datasets, or to determine gradients and
other mathematical properties of data by looking at the parameters associated
with the best-fitting functional form.

The following simple example fits a straight line to data in a file called {\tt
data.dat}:

\begin{verbatim}
f(x) = a*x+b
fit f() 'data.dat' index 1 using 2:3 via a,b
\end{verbatim}

\noindent The first line specifies the functional form which is to be used.
The coefficients of this function, {\tt a} and {\tt b}, which are to be
varied during the fitting process, are listed after the keyword \indkeyt{via}
in the {\tt fit} command.  The modifiers \indmodt{index}, \indmodt{every},
\indmodt{select} and \indmodt{using} have the same meanings as in the {\tt plot} command.

When a function of $n$ variables is being fit, at least $n+1$ columns (or rows
-- see Section~\ref{sec:horizontal_datafiles}) of data must be specified after
the {\tt using} modifier. By default, the first $n+1$ columns are used. These
correspond to the values of each of the $n$ arguments to the function, plus
finally the value which the output from the function is aiming to match.

If an additional column is specified, then this is taken to contain the
standard error in the value that the output from the function is aiming to
match, and can be used to weight the \datapoint s which are being used to
constrain the fit.

As an example, below we generate a \datafile\ containing samples of a square
wave using the {\tt tabulate} command and fit the first three terms of a
truncated Fourier series to it:

\begin{verbatim}
set samples 10
set output 'square.dat'
tabulate [-pi:pi] 1-2*heaviside(x)

f(x) = a1*sin(x) + a3*sin(3*x) + a5*sin(5*x)
fit f() 'square.dat' via a1, a3, a5
set xlabel '$x$' ; set ylabel '$y$'
plot 'square.dat' title 'data' with points pointsize 2, \
     f(x) title 'Fitted function' with lines
\end{verbatim}

\begin{center}
\includegraphics[width=8cm]{examples/eps/ex_fitting}
\end{center}

As the \indcmdt{fit} works, it displays statistics including the best fit
values of each of the fitting parameters, the uncertainties in each of them,
and the covariance matrix. These can be useful for analysing the security of
the fit achieved, but calculating the uncertainties in the best fit parameters
and the covariance matrix can be time consuming, especially when many
parameters are being fitted simultaneously. The optional word {\tt
withouterrors} can be included immediately before the filename of the input
\datafile\ to substantially speed up cases where this information is not
required.

By default, the starting values for each of the fitting parameters is
$1.0$. However, if the variables to be used in the fitting process are already
set before the {\tt fit} command is called, these initial values are used
instead. For example, the following would use the initial values
$\{a=100,b=50\}$:
\begin{verbatim}
f(x) = a*x+b
a = 100
b = 50
fit f() 'data.dat' index 1 using 2:3 via a,b
\end{verbatim}

\noindent If any of the fitting coefficients are not dimensionless -- that is,
they have physical units such as meters or seconds -- then an initial value
with the appropriate units {\it must} be specified.

A few points are worth noting:

\begin{itemize}
\item A series of ranges may be specified after the {\tt fit} command, using
the same syntax as in the {\tt plot} command, as described in
Section~\ref{sec:plot_ranges}. If ranges are specified then only \datapoint s
falling within these ranges are used in the fitting process; the ranges refer
to each of the $n$ variables of the fitted function in order:
\begin{verbatim}
fit [0:10] f() 'data.dat' via a
\end{verbatim}

\item As with all numerical fitting procedures, the {\tt fit} command comes
with caveats. It uses a generic fitting algorithm, and may not work well with
poorly behaved or ill-constrained problems. It works best when all of the
values it is attempting to fit are of order unity. For example, in a problem
where $a$ was of order $10^{10}$, the following might fail:
\begin{verbatim}
f(x) = a*x
fit f() 'data.dat' via a
\end{verbatim}
However, better results might be achieved if $a$ were artificially made of
order unity, as in the following script:
\begin{verbatim}
f(x) = 1e10*a*x
fit f() 'data.dat' via a
\end{verbatim}

\item For those interested in the mathematical details, the workings of the
{\tt fit} command are discussed in more detail in Appendix~\ref{ch:fit_maths}.

\end{itemize}

\section{Datafile interpolation}
\label{sec:spline_command}
\index{best fit lines}

The \indcmdt{interpolate} can be used to generate a special function within
Pyxplot's mathematical environment which interpolates a set of \datapoint s
supplied from a \datafile. As with other commands, data can also be supplied
from functions, or from a colon-separated list of vectors (see
Section~\ref{sec:vectorplot}). Either one- or two-dimensional interpolation is
possible. Two-dimensional interpolation is described in the next section.

In the case of one-dimensional interpolation, various different types of
interpolation are supported: linear interpolation, power law interpolation,
polynomial interpolation, cubic spline interpolation and akima spline
interpolation. Stepwise interpolation returns the value of the datapoint
nearest to the requested point in argument space. The use of polynomial
interpolation with large datasets is strongly discouraged, as polynomial fits
tend to show severe oscillations between \datapoint s.

Except in the case of stepwise interpolation, extrapolation is not permitted;
if an attempt is made to evaluate an interpolated function beyond the limits of
the \datapoint s which it interpolates, Pyxplot returns an error or value of
not-a-number.  This behaviour can be configured using the \indcmdt{set numeric
errors quiet} (see Section~\ref{sec:num_errs}).

The \indcmdt{interpolate} has similar syntax to the \indcmdt{fit}:

\begin{verbatim}
interpolate ( akima | linear | loglinear | polynomial |
              spline | stepwise |
              2d [( bmp_r | bmp_g | bmp_b )] )
            [<range specification>] <function name>"()"
            '<filename>'
            [ every <expression> {:<expression} ]
            [ index <value> ]
            [ select <expression> ]
            [ using <expression> {:<expression} ]
\end{verbatim}

A very common application of the \indcmdt{interpolate} is to perform arithmetic
functions such as addition or subtraction on datasets which are not sampled at
the same abscissa values. The following example would plot the difference
between two such datasets:

\begin{verbatim}
interpolate linear f() 'data1.dat'
interpolate linear g() 'data2.dat'
plot [min:max] f(x)-g(x)
\end{verbatim}

\noindent Note that it is advisable to supply a range to the {\tt plot} command
in this example: because the two datasets have been turned into continuous
functions, the {\tt plot} command has to guess a range over which to plot them,
unless one is explicitly supplied.

The \indcmdt{spline} is an alias for {\tt interpolate spline}; the following
two statements are equivalent:

\begin{verbatim}
spline f() 'data1.dat'
interpolate spline f() 'data1.dat'
\end{verbatim}

\example{ex:interpolation}{A demonstration of the {\tt linear}, {\tt spline} and {\tt akima} modes of interpolation}{
In this example, we demonstrate the {\tt linear}, {\tt spline} and {\tt akima}
modes of interpolation using an example \datafile\ with non-smooth data
generated using the {\tt tabulate} command (see Section~\ref{sec:tabulate}):
\nlscf
\noindent{\tt f(x) = 0}\newline
\noindent{\tt f(x)[0:1] = 0.5}\newline
\noindent{\tt f(x)[2:4] = cos((x-3)*pi/2)}\newline
\noindent{\tt set samples 20}\newline
\noindent{\tt tabulate [0:4] f(x)}
\nlscf
Having set three functions to interpolate these non-smooth data in different
ways, we plot them with a vertical offset of~$0.1$ between them for clarity.
The interpolated \datafile is plotted with points three times to show where
each of the interpolation functions is pinned.
\nlscf
\input{examples/tex/ex_interpolation_1.tex}
\nlscf
The resulting plot is shown below:
\nlscf
\centerline{\includegraphics[width=\textwidth]{examples/eps/ex_interpolation}}
}

\subsection{Two-dimensional interpolation}

In the case of two-dimensional interpolation, the type of interpolation to be
used is set using the {\tt interpolate} modifier to the \indcmdt{set samples},
and may be changed at any time after the interpolation function has been
created.  The options available are nearest neighbor interpolation -- which is
the two-dimensional equivalent of stepwise interpolation, inverse square
interpolation -- which returns a weighted average of the supplied \datapoint s,
using the inverse squares of their distances from the requested point in
argument space as weights, and Monaghan Lattanzio interpolation, which uses the
weighting function (Monaghan \& Lattanzio 1985)
\begin{eqnarray*}
w(x) & = 1 - \nicefrac{3}{2}v^2 + \nicefrac{3}{4}v^3 & \,\mathrm{for~}0\leq v\leq 1 \\
     & = \nicefrac{1}{4}(2-v)^3                      & \,\mathrm{for~}1\leq v\leq 2
\end{eqnarray*}
where $v=r/h$ for $h=\sqrt{A/n}$, $A$ is the product
$(x_\mathrm{max}-x_\mathrm{min})(y_\mathrm{max}-y_\mathrm{min})$ and $n$ is the
number of input datapoints. These are selected as follows:

\begin{verbatim}
set samples interpolate nearestNeighbor
set samples interpolate inverseSquare
set samples interpolate monaghanLattanzio
\end{verbatim}

The following example creates a function {\tt quad\-ra\-pole(x,y)} which interpolates a quadrapole:

\begin{verbatim}
set samples interpolate inverseSquare
interpolate 2d quadrapole() '--'
-1 -1  1
-1  1 -1
 1 -1 -1
 1  1  1
END
\end{verbatim}

Finally, data can be imported from graphical images in bitmap ({\tt .bmp})
format to produce a function of two arguments returning a value in the
range~$0\to1$ which represents the data in one of the image's three color
channels. The two arguments are the horizontal and vertical position within the
bitmap image, as measured in pixels. This is done using syntax of the form:

\begin{verbatim}
interpolate 2d bmp_b blue() 'myImg.bmp'
\end{verbatim}

\section{Fourier transforms}

The {\tt fft} and {\tt ifft} commands\indcmd{fft}\indcmd{ifft} take Fourier
transforms and inverse Fourier transforms respectively of data. As with other
commands, data can be supplied from a \datafile, from functions, or from a
colon-separated list of vectors (see Section~\ref{sec:vectorplot}). In each
case, a regular grid of abscissa values must be specified on which to take the
discrete Fourier transform, which can extend over an arbitrary number of
dimensions. The following example demonstrates the syntax of these commands as
applied to a two-dimensional top-hat function:

\begin{verbatim}
step(x,y) = tophat(x,0.2) * tophat(y,0.4)
fft  [  0: 1:0.01][  0: 1:0.01] f() of step()
ifft [-50:49:1   ][-50:49:1   ] g() of f()
\end{verbatim}

\noindent In the {\tt fft} command above, $N_x=100$~equally-spaced samples of
the function {\tt step}$(x,y)$ are taken between limits of $x_\mathrm{min}=0$
and $x_\mathrm{max}=1$ for each of $N_y=100$~equally-spaced values of $y$ on an
identical raster, giving a total of $10^4$ samples. These are converted into a
rectangular grid of $10^4$~samples of the Fourier transform {\tt
f}$(\omega_x,\omega_y)$ at

\begin{eqnarray}
\omega_x = \frac{j}{\Delta x} & \textrm{for} & -\frac{N_x}{2}\leq j <\frac{N_x}{2} \; \left(\textrm{equivalently, for} -\frac{N_x}{2\Delta x}\leq \omega_x <\frac{N_x}{2\Delta x} \right), \nonumber \\
\omega_y = \frac{k}{\Delta y} & \textrm{for} & -\frac{N_y}{2}\leq k <\frac{N_y}{2} \; \left(\textrm{equivalently, for} -\frac{N_y}{2\Delta y}\leq \omega_y <\frac{N_y}{2\Delta y} \right). \nonumber
\end{eqnarray}

\noindent where $\Delta x=x_\mathrm{max}-x_\mathrm{min}$ and $\Delta y$ is
analogously defined. These samples are interpolated stepwise, such that an
evaluation of the function {\tt f}$(\omega_x,\omega_y)$ for general inputs
yields the nearest sample, or zero outside the rectangular grid where samples
are available. In general, even the Fourier transforms of real functions are
complex, and their evaluation when complex arithmetic is not enabled (see
Section~\ref{sec:complex_numbers}) is likely to fail. For this reason, a
warning is issued if complex arithmetic is disabled when a Fourier transform
function is evaluated.

In the example above, we go on to convert this set of samples back into the
function with which we started by instructing the \indcmdt{ifft} to take
$N_x=100$~equally-spaced samples along the $\omega_x$-axis between
$\omega_{x,\mathrm{min}}=-{N_x}/{2\Delta x}$ and
$\omega_{x,\mathrm{max}}=(N_x-1)/{2\Delta x}$, with similar sampling along the
$\omega_y$-axis.

Taking the simpler example of a one-dimensional Fourier transform for clarity,
as might be calculated by the instructions
\begin{verbatim}
step(x) = tophat(x,0.2)
fft  [  0: 1:0.01] f() of step()
\end{verbatim}
the {\tt fft} and {\tt ifft} commands\indcmd{fft}\indcmd{ifft} compute,
respectively, discrete sets of samples $F_m$ and $I_n$ of the functions
$F(\omega_x)$ and $I(x)$, which are given by
\begin{displaymath}
F_j = \sum_{m=0}^{N-1} I_m e^{-2\pi ijm/N} \,\delta x,\;\textrm{for}\; -\frac{N}{2}\leq j <\frac{N}{2} ,
\end{displaymath}
\noindent and
\begin{displaymath}
I_j = \sum_{m=0}^{N-1} F_m e^{ 2\pi ijm/N} \,\delta \omega_x,\;\textrm{for}\; -\frac{N}{2}\leq j <\frac{N}{2} ,
\end{displaymath}
\noindent where:
\begin{tabular}{rcp{9cm}}
$I(x)$        & = & Function being Fourier transformed. \\
$F(\omega_x)$ & = & Fourier transform of $I()$. \\
$N$           & = & The number of values sampled along the abscissa axis. \\
$\delta x$    & = & Spacing of values sampled along the abscissa axis. \\
$\delta \omega_x$ & = & Spacing of abscissa values sampled along the $\omega_x$ axis. \\
$i$           & = & $\sqrt{-1}$. \\
\end{tabular}
\vspace{2mm}

It may be shown in the limit that $\delta x$ becomes small -- i.e.\ when the
number of samples taken becomes very large -- that these sums approximate the
integrals
\begin{equation}
F(\omega_x) = \int I(x) e^{-2\pi ix\omega_x} \,\mathrm{d}x ,
\end{equation}
\noindent and
\begin{equation}
I(x) = \int F(\omega_x) e^{ 2\pi ix\omega_x} \,\mathrm{d}\omega_x .
\end{equation}

Fourier transforms may also be taken of data stored in \datafile s using syntax
of the form
\begin{verbatim}
fft [-10:10:0.1] f() of 'datafile.dat'
\end{verbatim}

\noindent In such cases, the data read from the \datafile\ for an
$N$-dimensional FFT must be arranged in $N+1$ columns\footnote{The {\tt using},
{\tt every}, {\tt index} and {\tt select} modifiers can be used to arrange it
into this form.}, with the first $N$ containing the abscissa values for each of
the $N$ dimensions, and the final column containing the data to be Fourier
transformed. The abscissa values must strictly match those in the raster
specified in the {\tt fft} or {\tt ifft} command, and must be arranged strictly
in row-major order.

\example{ex:fourier}{The Fourier transform of a top-hat function}{
It is straightforward to show that the Fourier transform of a top-hat function
of unit width is the function $\sinc(x^\prime=\pi x)=\sin(x^\prime)/x^\prime$. If
\begin{displaymath}
f(x)=\left\{\begin{array}{l}1\;|x|\leq\nicefrac{1}{2}\\0\;|x|>\nicefrac{1}{2}\end{array}\right. ,
\end{displaymath}
the Fourier transform $F(\omega)$ of $f(x)$ is
\begin{eqnarray*}
F(\omega) & = & \int_0^\infty f(x) \exp \left(-2\pi ix\omega\right) \,\mathrm{d}x
            =   \int_{-\nicefrac{1}{2}}^{\nicefrac{1}{2}} \exp\left(-2\pi ix\omega\right) \,\mathrm{d}x
\\        & = & \frac{1}{2\pi\omega}\left[ \exp\left(\pi i\omega\right) - \exp\left(-\pi i\omega\right) \right]
            = \frac{\sin(\pi\omega)}{\pi\omega}
            = \sinc(\pi\omega).
\end{eqnarray*}
\nlnp
In this example, we demonstrate this numerically by taking the Fourier
transform of such a step function, and comparing the result against the
function {\tt sinc(x)} which is pre-defined within Pyxplot:
\nlscf
\noindent{\tt set numerics complex}\newline
\noindent{\tt step(x) = tophat(x,0.5)}\newline
\noindent{\tt fft [-1:1:0.01] f() of step()}\newline
\noindent{\tt plot [-10:10] Re(f(x)), sinc(pi*x)}
\nlscf
Note that the function {\tt Re(x)} is needed in the {\tt plot} statement here,
since although the Fourier transform of a symmetric function is in theory real,
in practice any numerical Fourier transform will yield a small imaginary
component at the level of the accuracy of the numerical method used. Although
the calculated numerical Fourier transform is defined throughout the range
$-50\leq x<50$, discretised with steps of size $\Updelta x=0.5$, we only plot
the central region in order to show clearly the stepping of the function:
\nlscf
\begin{center}
\includegraphics{examples/eps/ex_fft}
\end{center}
\nlscf
In the following steps, we take the square of the function $\sinc(\pi x)$ just
calculated, and then plot the numerical inverse Fourier transform of the
result:
\nlscf
\noindent{\tt g(x) = f(x)**2}\newline
\noindent{\tt ifft [-50:49.5:0.5] h(x) of g(x)}\newline
\noindent{\tt plot [-2:2] Re(h(x))}
\nlscf
\begin{center}
\includegraphics{examples/eps/ex_fft2}
\end{center}
\nlscf
As can be seen, the result is a triangle function. This is the result which
would be expected from the convolution theorem, which states that when the
Fourier transforms of two functions are multiplied together and then inverse
transformed, the result is the convolution of the two original functions. The
convolution of a top-hat function with itself is, indeed, a triangle function.
}

\subsection{Window functions}
\index{window functions}

A range of commonly-used window functions may automatically be applied to data
as it is read into the {\tt fft} and {\tt ifft} commands; these are listed
together with their algebraic forms in Table~\ref{tab:windowfuncs} and shown in
Figure~\ref{fig:windowfuncs}. In each case, the window functions are given for
sample number $n$, which ranges between $0$ and $N_x$. The window functions may
be invoked using the following syntax:

\begin{verbatim}
fft [...] <out>() of <in>() window <window_name>
\end{verbatim}

\noindent Where multi-dimensional FFTs are performed, window functions are
applied to each dimension in turn.  Other arbitrary window functions may be
implemented by pre-multiplying data before entry to the {\tt fft} and {\tt
ifft} commands.

\newlength{\wfgap}
\setlength{\wfgap}{30pt}

\begin{table}
\begin{center}
\begin{tabular}{|>{\columncolor{LightGrey}}l>{\columncolor{LightGrey}}l|}
\hline
{\bf Window Name} & {\bf Algebraic Definition} \\
\hline
Bartlett     & $\displaystyle w(n) = \left( \frac{2}{N_x-1} \right) \left( \frac{N_x-1}{2} - \left| n - \frac{N_x-1}{2} \right| \right)$ \vphantom{\rule{0pt}{20pt}}\\
BartlettHann & $\displaystyle w(n) = a_0 - a_1\left|\frac{n}{N_x-1}-\frac{1}{2}\right| - a_2\cos\left(\frac{2\pi n}{N_x-1}\right),\;\textrm{for}$ \vphantom{\rule{0pt}{\wfgap}}\\
             & $a_0=0.62,\; a_1=0.48,\; a_2=0.38.$ \vphantom{\rule{0pt}{20pt}}\\
Cosine       & $\displaystyle w(n) = \cos\left(\frac{\pi n}{N_x-1} - \frac{\pi}{2} \right)$ \vphantom{\rule{0pt}{\wfgap}}\\
Gauss        & $\displaystyle w(n) = \exp \left\{ -\frac{1}{2}\left[ \frac{n-(N_x-1)/2}{\sigma(N_x-1)/2} \right]^2 \right\},\;\textrm{for}\;\sigma=0.5$ \vphantom{\rule{0pt}{\wfgap}}\\
Hamming      & $\displaystyle w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N_x-1}\right)$ \vphantom{\rule{0pt}{\wfgap}}\\
Hann         & $\displaystyle w(n) = 0.5 \left[ 1 - \cos\left(\frac{2\pi n}{N_x-1}\right) \right]$ \vphantom{\rule{0pt}{\wfgap}}\\
Lanczos      & $\displaystyle w(n) = \mathrm{sinc}\left( \frac{2n}{N_x-1} - 1 \right)$ \vphantom{\rule{0pt}{\wfgap}}\\
Rectangular  & $\displaystyle w(n) = 1$ \vphantom{\rule{0pt}{\wfgap}}\\
Triangular   & $\displaystyle w(n) = \left( \frac{2}{N_x} \right) \left( \frac{N_x}{2} - \left| n - \frac{N_x-1}{2} \right| \right)$ \vphantom{\rule{0pt}{\wfgap}}\\
\hline
\end{tabular}
\end{center}
\caption{Window functions available in the {\tt fft} and {\tt ifft} commands.}
\label{tab:windowfuncs}
\end{table}

\begin{figure}
\begin{center}
\includegraphics{examples/eps/ex_windowfuncs}
\end{center}
\caption{Window functions available in the {\tt fft} and {\tt ifft} commands.}
\label{fig:windowfuncs}
\end{figure}

\section{Histograms}
\label{sec:histogram}

The \indcmdt{histogram} takes a single column of data and produces a function
that represents the frequency distribution of the supplied data values. The
output function consists of a series of discrete intervals which we term {\it
bins}. Within each interval the output function has a constant value,
determined such that the area under each interval -- i.e.\ the integral of the
function over each interval -- is equal to the number of datapoints found
within that interval.  The following simple example

\begin{verbatim}
histogram f() 'input.dat'
\end{verbatim}

\noindent produces a frequency distribution of the data values found in the
first column of the file {\tt input.dat}, which it stores in the function
$f(x)$. The value of this function at any given point is equal to the number of
items in the bin at that point, divided by the width of the bins used. If the
input datapoints are not dimensionless then the output frequency distribution
adopts appropriate units, thus a histogram of data with units of length has
units of one over length.

The number and arrangement of bins used by the \indcmdt{histogram} can be
controlled by means of various modifiers.  The \indmodt{binwidth} modifier sets
the width of the bins used. The \indmodt{binorigin} modifier controls where
their boundaries lie; the \indcmdt{histogram} selects a system of bins which,
if extended to infinity in both directions, would put a bin boundary at the
value specified in the {\tt binorigin} modifier. Thus, if {\tt binorigin 0.1}
were specified, together with a bin width of~20, bin boundaries might lie
at~$20.1$, $40.1$, $60.1$, and so on. Alternatively global defaults for the bin
width and the bin origin can be specified using the {\tt set binwidth} and {\tt
set binorigin} commands respectively. The example

\begin{verbatim}
histogram h() 'input.dat' binorigin 0.5 binwidth 2
\end{verbatim}

\noindent would bin data into bins between $0.5$ and $2.5$, between $2.5$ and
$4.5$, and so forth.

Alternatively the set of bins to be used can be controlled more precisely using
the \indmodt{bins} modifier, which allows an arbitrary set of bin boundaries to
be specified. The example

\begin{verbatim}
histogram g() 'input.dat' bins (1, 2, 4)
\end{verbatim}

\noindent would bin the data into two bins, $x=1\to 2$ and $x=2\to 4$.

A range can be supplied immediately following the {\tt histogram} command,
using the same syntax as in the {\tt plot} and {\tt fit} commands; if such a
range is supplied, only points that fall within that range will be binned.  In
the same way as in the {\tt plot} command, the \indmodt{index},
\indmodt{every}, \indmodt{using} and \indmodt{select} modifiers can be used to
specify which subsets of a \datafile\ should be used.

Two points about the {\tt histogram} command are worthy of note. First,
although histograms are similar to bar charts, they are not the same.  A bar
chart conventionally has the height of each bar equal to the number of points
that it represents, whereas a histogram is a continuous function in which the
area underneath each interval is equal to the number of points within it.
Thus, to produce a bar chart using the {\tt histogram} command, the end result
should be multiplied by the bin width used.

Second, if the function produced by the {\tt histogram} command is plotted
using the {\tt plot} command, samples are automatically taken not at evenly
spaced intervals along the abscissa axis, but at the centres of each bin. If
the \indpst{boxes} plot style is used, the box boundaries are conveniently
drawn to coincide with the bins into which the data were sorted.

\section{Random data generation}

Pyxplot has functions for generating random numbers from a variety of common
probability distributions. These functions are in the {\tt random} module:

\begin{itemize}
\item {\tt random.random()} -- returns a random real number between 0 and~1.
\indfun{random.random()}
\item {\tt random.binomial($p,n$)} -- returns a random sample from a binomial distribution with $n$ independent trials and a success probability $p$.
\indfun{random.binomial($p,n$)}
\item {\tt random.chisq($\nu$)} -- returns a random sample from a $\chi$-squared distribution with $\nu$ degrees of freedom.
\indfun{random.chisq($\nu$)}
\item {\tt random.gaussian($\sigma$)} -- returns a random sample from a Gaussian (normal) distribution of standard deviation $\sigma$ and centred on zero.
\indfun{random.gaussian($\sigma$)}
\item {\tt random.lognormal($\zeta,\sigma$)} -- returns a random sample from the log normal distribution centred on $\zeta$, and of width $\sigma$.
\indfun{random.lognormal($\zeta,\sigma$)}
\item {\tt random.poisson($n$)} -- returns a random integer from a Poisson distribution with mean $n$.
\indfun{random.poisson($n$)}
\item {\tt random.tdist($\nu$)} -- returns a random sample from a $t$-distribution with $\nu$ degrees of freedom.
\indfun{random.tdist($\nu$)}
\end{itemize}

\noindent These functions all rely on a common underlying random number
generator\footnote{The gsl library's default random number generator, {\tt
gsl\_\-rng\_\-default} is used. As of version~1.15, this maps to {\tt
gsl\_\-rng\_\-mt19937} with a default seed of zero. The various probability
distributions above are sampled using the functions {\tt
gsl\_\-ran\_\-binomial} and similar.}, whose seed may be set using the
\indcmdt{set seed}, which should be followed by any integer. The sequence of
random samples generated is always the same after setting any particular seed.

When Pyxplot starts, the seed is implicitly set to zero. {\bf This means that
Pyxplot always produces the same series of random numbers when restarted.} This
series can be reproduced by typing:
\begin{verbatim}
set seed 0
\end{verbatim}

\noindent For applications where this repeatability is undesirable, the
following command may help, using the system clock as a random seed:
\begin{verbatim}
set seed time.now().toUnix()
\end{verbatim}

\noindent This gives a different sequence of random numbers each second.
However, the user is advised to consider carefully whether this is sufficient
for the particular application being implemented.

\example{ex:random}{Using random numbers to estimate the value of $\pi$}{
Pyxplot's functions for generating random numbers are most commonly used for
adding noise to artificially-generated data. In this example, however, we use
them to implement a rather inefficient algorithm for estimating the value of
the mathematical constant $\pi$.  The algorithm works by spreading
randomly-placed samples in the square $\left\{ -1<x<1;\; -1<y<1\right\}$. The
number of these which lie within the circle of unit radius about the origin are
then counted. Since the square has an area of $4\,\mathrm{unit}^2$ and the
circle an area of $\pi\,\mathrm{unit}^2$, the fraction of the points which
lie within the unit circle equals the ratio of these two areas: $\pi/4$.
\nlnp
The following script performs this calculation using $N=5000$~randomly placed
samples. Firstly, the positions of the random samples are generated using the
{\tt random()} function, and written to a file called {\tt random.dat} using the
{\tt tabulate} command. Then, the {\tt foreach datum} command -- which will be
introduced in Section~\ref{sec:foreach_datum} -- is used to loop over these,
counting how many lie within the unit circle.
\nlscf
\input{examples/tex/ex_pi_estimation_1.tex}
\nlfcf
On the author's machine, this script returns a value of $3.1352$ when executed
using the random samples which are returned immediately after starting Pyxplot.
This method of estimating $\pi$ is well modelled as a Poisson process, and the
uncertainty in this result can be estimated from the Poisson distribution to be
$1/\sqrt{N}$. In this case, the uncertainty is $0.01$, in close agreement with
the deviation of the returned value of $3.1352$ from more accurate measures of
$\pi$.
\nlnp
With a little modification, this script can be adapted to produce a diagram of
the datapoints used in its calculation. Below is a modified version of the
second half of the script, which loops over the \datapoint s stored in the
\datafile\ {\tt random.dat}. It uses Pyxplot's vector graphics commands, which
will be introduced in Chapter~\ref{ch:vector_graphics}, to produce such a
diagram:

\nlscf
\input{examples/tex/ex_pi_estimation_2.tex}
\nlscf
The graphical output from this script is shown below. The number of datapoints
has been reduced to {\tt Nsamples}$=500$ for clarity:
\nlscf
\begin{center}
\includegraphics{examples/eps/ex_pi_estimation}
\end{center}
}