1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251
|
\chapter{Sorting and statistics}
\label{sorting}
The sorting system is a vital part of \FORM\ and one of the main reasons why
the speed of \FORM\ compares so favorably with other systems.
A good understanding of what happens during the sorting\index{sorting} of
expressions is essential if one wants to write efficient\index{efficient}
programs. In essence the sorting is done by a tree\index{tree sort} sort.
However due to the nature of mathematical expressions there is a
complication. When two terms are identical with the possible exception of
their coefficient, we will add their coefficients, put this new coefficient
in the place of the coefficient of the first term, and drop the second
term. If the new coefficient happens to be zero, both terms are dropped.
Hence the number of terms during the sort is not fixed. For a tree sort
this is not a major complication\index{complication}. What is more annoying
though is that the new coefficient may take more space inside the storage
than either of the old coefficients. Let us have a look now at what happens
in a \FORM\ program. Much can be seen from the statistics.
\begin{verbatim}
S x1,...,x4;
L F = (x1+...+x4)^4;
.end
Time = 0.01 sec Generated terms = 35
F Terms in output = 35
Bytes used = 628
\end{verbatim}
In this case the program generated 35 terms. Whenever a term is generated
and \FORM\ is done with it (no more statements will act on it), \FORM\
will write it into a buffer which is called the small buffer. Additionally
it stores a pointer to the location of this term inside the small buffer.
Next it will continue generating terms. This process will be stopped by
either of three conditions:
\begin{enumerate}
\item \FORM\ is finished generating terms.
\item The last generated term does not fit inside the space remaining in
the small buffer.
\item There is no space for a pointer to the last generated term inside the
array of pointers.
\end{enumerate}
In either of these three cases \FORM\ will sort the contents of the
small\index{small buffer} buffer\index{buffer!small}. This sorting is done
`by pointers' and hence it is important that the whole small buffer fits
inside the physical memory of the computer. If this would not be the case,
some very inefficient swapping of memory might be the result. During this
sorting \FORM\ may run into the problem that the coefficient of two combined
terms does not fit in the place of one of the two old coefficients. This
means that the combined term will need more space, but because the old
terms might be enclosed by other terms, this space may not be available
locally. To this end \FORM\ has some spare space in the small buffer which is
called the small\index{small extension} extension\index{extension!small}.
Actually the term SmallExtension\index{smallextension} is used for the
combination of the small buffer and its extra space. The extra space is at
least $1/6$ times the size of the small buffer, but typically it will be
about $1/3$ the size of the small buffer. In some exceptional cases (with
heavy use of a polynomial coefficient via the PolyFun\index{polyfun}
command) bigger sizes might be useful.
In the case that the new combined term needs more space than each of the
old terms, the new term is placed in the extension space. If, during the
sort, the extension space becomes exhausted, \FORM\ will make a
garbage\index{garbage collection}
collection of the entire extended small buffer. This will always result in
the extension space becoming empty again, because the notation of the terms
in \FORM\ is such the new combined term will at most occupy an amount
of space equal to the sum of the spaces of the original two terms. In older
versions of \FORM\ this garbage collection was executed by means of a
temporary disk file. In the new version it is done inside the memory by
temporarily allocating a new buffer. Anyway such garbage collections are
relatively rare.
In the above example, the sorting occurred because the generation of terms
was finished. Hence the sorted output is written away in such a way that it
can be used as input for a potential next module (or to be printed).
Hence let us change the size of the small buffer:
\begin{verbatim}
#: SmallSize 300
S x1,...,x4;
L F = (x1+...+x4)^4;
.end
Time = 0.00 sec Generated terms = 13
F 1 Terms left = 13
Bytes used = 236
Time = 0.00 sec Generated terms = 26
F 1 Terms left = 26
Bytes used = 476
Time = 0.00 sec Generated terms = 35
F 1 Terms left = 35
Bytes used = 632
Time = 0.00 sec Generated terms = 35
F Terms in output = 35
Bytes used = 628
\end{verbatim}
Now the size of the small buffer will be only 300 bytes. As a result the
13-th term does not fit. We can see this in the statistics: the 13-th term
has been generated and \FORM\ sorts the small buffer. The output of the 12
sorted terms is written to another buffer, called the
large\index{large buffer} buffer\index{buffer!large}. Inside the large
buffer the terms are lightly compressed. This compression is related to the
fact that in each `patch'\index{patch} the terms are already sorted and
hence we may not have to repeat the identical beginnings of each term.
Hence the amount of space used after this sort is less than the 300 bytes
of the small buffer, even though the 13-th term gave an overflow for these
300 bytes. The small buffer fills up again at the 26-th term and again it
is sorted and the results written to the large buffer. Finally, after 35
terms, the generation is finished. Hence the remains in the small buffer
are also sorted and written as a third `patch' into the large buffer. Then
the large buffer is sorted. For this a different sort technique is used. It
is assumed that the large buffer is not always residing inside the physical
memory. Hence parts of it may be swapped out temporarily. With the size of
current days memories this may not happen very often, unless one sets the
size of the buffer to something comparable to the memory size of the
computer and several programs are running at the same time. Anyway,
swapping will not affect the large buffer very much. \FORM\ will merge the
`patches' by going sequentially through them with a method called
`tree\index{tree of losers} of losers' in the book by Knuth\index{Knuth}
(the art of computer programming, vol. 3). Because it goes sequentially
through the patches, uses all the information it reads and never needs it
again, this method is indeed rather well resistant to swapping.
The next complication is of course when the large buffer is full. This can
be either because its byte space is full, or because the maximum number of
patches is exceeded. Because the sorting method uses quite a few variables
for each patch, there is a space allocated for them and hence there is a
maximum number of patches. If we set this to 2 (just for demonstration
purposes) we obtain:
\begin{verbatim}
#: SmallSize 200
#: LargePatches 2
S x1,...,x4;
L F = (x1+...+x4)^4;
.end
Time = 0.00 sec Generated terms = 9
F 1 Terms left = 9
Bytes used = 164
Time = 0.00 sec Generated terms = 17
F 1 Terms left = 17
Bytes used = 312
Time = 0.00 sec Generated terms = 26
F 1 Terms left = 26
Bytes used = 478
Time = 0.00 sec
F Terms active = 26
Bytes used = 474
Time = 0.00 sec Generated terms = 35
F 1 Terms left = 35
Bytes used = 630
Time = 0.00 sec
F Terms active = 35
Bytes used = 786
Time = 0.00 sec Generated terms = 35
F Terms in output = 35
Bytes used = 628
\end{verbatim}
We see that after the third small buffer has been sorted, the third patch
cannot be written to the large buffer. Hence the large buffer is sorted
(indicated by the special statistics involving the phrase `Terms active').
The result of this is written as a sorted patch to the
sort\index{sort file} file. This file is one of the
temporary\index{temporary files} files that \FORM\ can create. It has the
extension .sor\index{.sor extension}. Now the third patch can be written
into the --by now empty-- large buffer. At the end of term generation, the
last small buffer is sorted, its results written into the large buffer,
then that is sorted and its results written as the final patch into the
sort file. Then, finally the patches in the sort file are merged in a
method similar to the way the large buffer is sorted. This final sort is a
disk\index{disk to disk sort} to disk sort. Hence it can use the disk
rather intensely and the use of the CPU may drop temporarily, although it
is nothing so dramatic as when the computer is involved in heavy
inefficient swapping as can be the case with many other algebra programs.
Also, this is usually only a small fraction of the running time of the
program. The exception may be when \FORM\ is running several processes and
they are all using disk sorts simultaneously. In that case some file
systems may not be very good at handling the ensuing
traffic\index{traffic jam} jams.
Also the disk to disk sort will have a maximum number of patches that can
be sorted simultaneously. If this number is exceeded there will be one or
more extra stages\index{stages in the sorting} in the sorting, all of which
will be disk to disk sorts. It is advisable to tune the setup parameters in
such a way that one can prevent this, because it involves usually needless
use of resources. One can try to increase the parameter
FilePatches\index{filepatches}, but the problem is that \FORM\ uses a
caching\index{caching} system to buffer the inputs from the sort file. The
cache buffers have to have a size that is at least twice the maximum size
of a term. For each patch it needs a buffer and all buffers together should
fit inside the combination of the large buffer and the small extended
buffer. This puts an upper limit on the number of file patches.
Additionally this buffer (SortIOsize\index{sortiosize}) should not be very
small, because otherwise the disk IO operations are very inefficient. Hence
it helps often to increase the size of the small buffer and the large
buffer first. That gives fewer patches. Additionally it in turn can allow
for more file patches that are not too small.
One thing that one can see now is that if terms are to cancel or to add, it
is advantageous if this happens already in an early stage of the sorting.
This means that it is most efficient if these terms will end up in the
small buffer at the same time. This should explain the example given in the
section on brackets\index{brackets}. This way fewer terms are written to
the large buffer and/or the sort file, which means that less disk space
will be used.
The sizes of buffers involved can all be tuned to a given hardware. How
this is done is explained in the chapter on the setup\index{setup} \ref{setup}.
When \FORM\ is dealing with the arguments\index{arguments of functions} of
functions and if an argument is a multiterm subexpression, also such
subexpressions need to be sorted. In older versions of \FORM\ this was done
inside the at that moment remaining space of the small buffer and its
extension. The reason was that such subexpressions would be rather short
(they would have to fit inside a function argument and were hence limited
by the maximum size of a term) and buffer space was hard to come by in
computers with small memories. In the new version of \FORM\ other
subexpression sorts were added: the sorting in the term environment (see
\ref{substaterm}) and the sorting of \$-expressions. Both sorts do not have
the restriction of the maximum size of a term. They can result in
expressions that are arbitrarily long (although that might not give
efficient programs). Hence the sorting of subexpressions have now their own
buffers. And more than one such set may be needed if for instance the term
environment is used in a nested fashion. Of course the settings for the
buffers of this `subsort' are not quite as large as for the main buffers.
And the user can of course also influence their settings as explained in
the chapter on the setup \ref{setup}. This chapter gives also all default
values.
There is one restriction on the sorting of function arguments and
\$-expressions: They are not allowed to go into the stage4 sorting. Any
such attempt will result in an error message and the suggestion to raise
the size of the buffers for this type of sorting.
When \FORM\ is running in parallel mode (either \TFORM\ or \ParFORM) each worker
will need its own buffers. In \ParFORM\ in which the processors each control
their own memory, the size of each of these buffers are the same as for the
master process. In \TFORM\ with its shared memory the sizes that the user
selects for the sort buffers and the scratch file caches refer to the
buffers of the master thread. The workers each get basically buffers with
1/N times the size of the buffer of the master. They may be made a bit
bigger when potential conflicts with MaxTermSize occur.
|