File: float.tex

package info (click to toggle)
form 5.0.0-1
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 8,312 kB
  • sloc: ansic: 110,546; cpp: 20,395; sh: 5,874; makefile: 545
file content (305 lines) | stat: -rw-r--r-- 12,779 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305

\chapter{Floating point arithmetic}
\label{floatingpoint}

Starting with version 5.0, \FORM{} is equiped with arbitrary precision floating point 
arithmetic. The low level routines are handled by the GMP and MPFR libraries, 
which are available on most systems and if missing can be easily picked up 
from the internet. This chapter describes the commands, functions, and behaviour
of \FORM's floating point sytem.

\section{Initializing and closing the floating point system}
Before any floating-point operations can be performed, \FORM{} must activate the 
floating point system and set the working precision. This initialization allocates 
the internal data structures used by the GMP and MPFR libraries. The system remains 
active until the end of the program, or until it is explicitly closed.
The two statements that control these operations are:
\begin{description}
\item[\#StartFloat] This instruction initializes the floating 
point system and allocates the necessary internal arrays. 
It takes either one or two arguments:
\begin{verbatim}
    #StartFloat <precision> [,MZV=<maximumweight>]
\end{verbatim}
The first argument is mandatory and specifies the desired precision. It must 
be a positive integer followed by either a \texttt{b} (for precision in bits) 
or \texttt{d} (for precision in decimal digits).
\FORM{} will round to at least this precision. 
The second argument is optional and only needed when working with multiple 
zeta values (MZVs) or Euler sums. It specifies the maximum weight 
that will be used. The evaluation of the sums requires a 
number of auxiliary arrays that depend on this weight. The default weight is zero. 
\item[\#EndFloat] This instruction releases all arrays allocated for the 
floating point system. Note that if one would like to change the precision during a run, 
this is now possible with a new \texttt{\#StartFloat} instruction. 
\end{description}
Example programs that illustrate the use of these statements and the 
functionality of \FORM's floating point system are given below. 


\section{Conversion between rational and floating point coefficients}
A term in an expression can have a rational or floating point coefficient. 
The following statements convert between the two.
\begin{description}
\item[ToFloat] Converts rational coefficients to 
floating point numbers in the precision specified by \texttt{\#StartFloat}. 
From this point on, the coefficient will be floating point. 
\item[ToRational] Attempts to convert floating point coefficients to 
rational numbers. To this end it uses continued fractions as in
\begin{eqnarray}
	x \;\rightarrow\; n_0 + \frac{1}{\,n_1 + \frac{1}{\,n_2 + \frac{1}{\,n_3 + \cdots}}}\;,
	\nonumber
\end{eqnarray}
with $x$ a floating point number. The algorithm keeps track of the 
remaining precision and if $1/n_i$ is close to this precision it truncates 
the sequence at $n_{i-1}$. After that it works out the corresponding fraction. 
It could be that $x$ cannot be expressed as a fraction within the given precision. 
This can usually be seen by that the fractions are `rather wild', or that 
the result changes when the precision is increased. This statement can also 
be abbreviated as \texttt{ToRat}.
\end{description}

The above statements operate on ground level coefficient only. To convert numbers 
inside a function argument, one must use the \texttt{Argument} environment. 
For example: 
\begin{verbatim}
    CFunction f;
    #StartFloat 10d
    Local F = 0.1666666666*f(0.1428571429);
    ToRat;
    Print "<1> %t";
    Argument f;
        ToRat;
    EndArgument;
    Print "<2> %t";
    .end
<1>  + 1/6*f(1.428571429e-01)
<2>  + 1/6*f(1/7)
\end{verbatim}
The argument environment may be nested.
Similarly, the statements \texttt{Evaluate}, \texttt{StrictRounding} and \texttt{Chop} act at
the ground level. To have them act on function argument, one uses the \texttt{Argument} environment.
These statements are explained further below. 

\section{Evaluation of functions and symbols}
Before version 5.0, \FORM{} already reserved function names for many common mathematical
functions. These functions can now be evaluated numerically using:

\begin{description}
\item[Evaluate] This statement evaluates the mathematical functions and or symbols numerically:
\begin{verbatim}
Evaluate [function(s)],[symbol(s)];
\end{verbatim}
where the argument specifies the function(s) and/or symbol(s) to evaluate. 
More than one function and/or symbol may be listed. 
If this statement is used without arguments, all floating point functions and symbols that \FORM{} 
knows will be evaluated. Currently, the full list of functions that can be evaluated numerically reads 
\begin{verbatim}
sqrt_, ln_, eexp_, li2_, gamma_, agm_, 
sin_, cos_, tan_, asin_, acos_, atan_, atan2_, 
sinh_, cosh_, tanh_, asinh_, acosh_, atanh_,
mzv_, euler_, mzvhalf_,
\end{verbatim}
where the functions on the last line denote the multiple zeta values, Euler sums and 
harmonic polylogarithms of argument $1/2$ respectively.
The list of symbols/constants that can be evaluated is 
\begin{verbatim}
pi_, ee_, em_,
\end{verbatim}
where \texttt{ee\_}\index{ee\_} denotes the basis of the natural logarithm 
and \texttt{em\_}\index{em\_} the Euler-Mascheroni constant.

In addition, the functions \texttt{lin\_}, \texttt{hpl\_} and \texttt{mpl\_} are reserved function names, 
but currently have no numerical evaluation.
\end{description}


\section{Rounding behaviour}
\begin{description}
\item[StrictRounding] This statement rounds floating point numbers to a 
given precision:
\begin{verbatim}
	StrictRounding [<precision>];
\end{verbatim}
where \texttt{<precision>} is an optional argument that specifies the rounding 
precision in either digits or bits, using the same syntax as 
\texttt{\#startfloat}. If omitted, the default precision is used. 

Internally, the GMP and MPFR libraries may use extra precision beyond that set by 
\texttt{\#startfloat}. As a result, terms that print the same may still differ slightly 
due to this extra precision and therefore fail to merge. For example:
\begin{verbatim}
    #startfloat 6d
    CFunction f;
    Local F = f(1.0)+f(1.0000001);
    Print;
    .sort
\end{verbatim}
results in \texttt{F = f(1.0e+00)+f(1.0e+00);}. Although it may appear 
that the terms should merge, the extra precision maintained by GMP 
prevents this, even though it is not displayed at 6 digits of 
precision. Using the strictrounding statement, one can force rounding 
to exactly 6 digits. Indeed:
\begin{verbatim}
    Argument f;
        strictrounding;
    Argument;
    Print;
    .end
\end{verbatim}
results in \texttt{F = 2*f(1.0e+00);}.
Notice that rounding in bits may produce unexpected results when viewed 
in decimal digits. For example, the decimal number 1.1e-4 cannot be 
represented exactly in binary. Its binary representation, up to 20 bits of precision, is 
$1.1100110101011111101*2^{-14}$. When rounded to 5 bits, this becomes 
$1.1101*2^{-14}$, which in decimal digits appears as 
1.10626220703125e-04.
\item[Chop] This statement removes floating point numbers that are {\em smaller}
in absolute magnitude than a specified threshold. It takes one argument:
\begin{verbatim}
    Chop <delta>;
\end{verbatim}
All floating point numbers with absolute value {\em less} than \texttt{<delta>} are replaced by 0. 
Terms with no floating point coefficient are left untouched. The threshold \texttt{<delta>} 
can be a floating point number, integer, rational number, or power. Because 
statements in \FORM{} act term by term, it is often important to sort before invoking the 
chop statement. Otherwise, terms might be removed individually, while after 
sorting and combining, their combined floating point coefficient could exceed 
the specified chop threshold. 
\item[Format floatprecision] This instruction controls how many digits are 
displayed when printing floating point numbers. It only affects output 
formatting and does not influence the internal precision or accuracy of 
computations. It can be used in three ways: in case of 
\begin{verbatim}
    Format floatprecision;
\end{verbatim}
\FORM{} prints floats with the number of digits specified by the current 
\texttt{\#startfloat} instruction. With
\begin{verbatim}
    Format floatprecision <precision>;
\end{verbatim}
\FORM{} prints the number of digits specified by \texttt{<precision>}. 
The syntax is the same as for the precision in \texttt{\#startfloat}. 
If the requested precision exceeds the precision specified by 
\texttt{\#startfloat}, only the available digits are printed. Finally, with 
\begin{verbatim}
    Format floatprecision off;
\end{verbatim}
the floating point numbers are printed in raw internal format, see also section \ref{sec:float_raw}. 
\end{description}

\section{Examples}
The following example shows some work with Multiple Zeta Values (MZV's):
\begin{verbatim}
    #StartFloat 500b, MZV=15
    L F1 =
    	-mzv_(8,1,1,5)
    	+29056868/39414375*mzv_(2)^6*mzv_(3)
    	-47576/40425*mzv_(2)^5*mzv_(5)
    	-163291/18375*mzv_(2)^4*mzv_(7)
    	-4/105*mzv_(2)^3*mzv_(3)^3
    	-450797/11025*mzv_(2)^3*mzv_(9)
    	+7/5*mzv_(2)^2*mzv_(3)^2*mzv_(5)
    	+16/25*mzv_(2)^2*mzv_(3)*mzv_(5,3)
    	+454049/1400*mzv_(2)^2*mzv_(11)
    	-16/25*mzv_(2)^2*mzv_(5,3,3)
    	+3*mzv_(2)*mzv_(3)^2*mzv_(7)
    	+61/14*mzv_(2)*mzv_(3)*mzv_(5)^2
    	+2/7*mzv_(2)*mzv_(3)*mzv_(7,3)
    	+2172853/420*mzv_(2)*mzv_(13)
    	-2/7*mzv_(2)*mzv_(7,3,3)
    	+1/7*mzv_(2)*mzv_(5,5,3)
    	-33/4*mzv_(3)^2*mzv_(9)
    	-133/6*mzv_(3)*mzv_(5)*mzv_(7)
    	-25/9*mzv_(3)*mzv_(9,3)
    	-244/105*mzv_(5)^3
    	-359/105*mzv_(5)*mzv_(7,3)
    	+3/10*mzv_(7)*mzv_(5,3)
    	+89/18*mzv_(9,3,3)
    	+569/105*mzv_(7,3,5);
    L F2 = mzv_(15);
    Evaluate mzv_;
    Print +f;
    .sort

   F1 =
       9.1234206877960755900164875575406726239325002222490534540605137258846994\
       916348297032751308227224952419629422497720599224543719959652966613231560\
       6913926e+03;

   F2 =
       1.0000305882363070204935517285106450625876279487068581775065699328933322\
       671563422795730723343470175484943669684442492832530297757588781904321794\
       40477e+00;

    Skip F1,F2;
    L	X = F1/F2;
    Torational;
    Print +f;
    .end

   X =
      229903169/25200;

  0.08 sec out of 0.09 sec
\end{verbatim}
In the first module, \texttt{\#startfloat} initializes the floating point system with 
500 bits of precision and a maximum weight for the MZVs and Euler sums of 15. 
The \texttt{mzv\_} functions are then evaluated with the \texttt{Evaluate}
statement. In the second module we divide the 
numbers and convert the result to a rational. It is a good idea to try this 
with various precisions to see whether this is stable. With 60 bits the 
final answer would be
\begin{verbatim}
    145537942402475031/15952490572234;
\end{verbatim}
while at 150 bits we have already the same answer as with 500 bits. The 
fraction that is obtained by this program can be proven to be correct.


\section{Raw form}
\label{sec:float_raw}
Internally, floating point numbers are represented by the function \texttt{float\_}, 
i.e. \texttt{float\_(prec, size, exp, limbs)}. The integer arguments encode the 
internal representation of the floating point number as in the GMP library:
\begin{description}
\item[prec] The precision of the mantissa in limbs.
\item[size] The number of limbs currently in use.
\item[exp] The exponent, determining the location of the implied radix point.
\item[limbs] The limbs packed as the numerator of a \FORM{} rational. 
\end{description}
In a normalized term containing \texttt{float\_}, the rational coefficient must 
be either $1/1$ or $-1/1$, where the sign of the term is absorbed into the rational 
coefficient. 
Furthermore, the \texttt{float\_} is protected from the pattern matcher and from 
statements that act on functions -- such as \texttt{Transform}, \texttt{Argument}, 
\texttt{Normalize} etc.
The following program illustrates this:
%
\begin{verbatim}
    CFunction f;
    #StartFloat 10d
    Local F = 1.23456789 + f(1,2);
    Identify f?(?a) = f(10);
    Print "<1> %t";
    .sort
<1>  + 1.23456789e+00
<1>  + f(10)
    #EndFloat
    Normalize;
    Print "<2> %t";
    .sort
<2>  + float_(2,3,1,420101683733788795657820481376616399786)
<2>  + 10*f(1)
    #StartFloat 5d
    Print "<3> %t";
    .end
<3>  + 1.2346e+00
<3>  + 10*f(1)
\end{verbatim}
%
As shown, the \texttt{id}-statement does not effect the \texttt{float\_} function. 
Here we also see the use of the preprocessor statement \texttt{\#EndFloat} which closes 
the floating point system. After this statement, the \texttt{float\_} function becomes a 
regular function. Its protected status, however, persists so that \texttt{id}-statements 
or statements like \texttt{Normalize} still do not modify it.