[section:next_float Floating-Point Representation Distance (ULP),
   and Finding Adjacent Floating-Point Values]

[@http://en.wikipedia.org/wiki/Unit_in_the_last_place Unit of Least Precision or Unit in the Last Place]
is the gap between two different, but as close as possible, floating-point numbers.

Most decimal values, for example 0.1, cannot be exactly represented as floating-point values,
but will be stored as the
[@http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
closest representable floating-point].
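
As a small illustration of this rounding, the following sketch uses only the standard library.
The helper names `tenth_is_inexact` and `show_tenth` are hypothetical and exist only for this example.
If 0.1 were exactly representable, the nearest `float` and the nearest `double` would be the same number.

```cpp
#include <cstdio>

// Returns true when the float nearest to 0.1 differs from the double nearest
// to 0.1, which can only happen if 0.1 has no exact binary representation.
inline bool tenth_is_inexact()
{
   const float  f = 0.1F;   // nearest 32-bit value to 0.1
   const double d = 0.1;    // nearest 64-bit value to 0.1
   return static_cast<double>(f) != d;
}

// Prints both stored values to 17 significant digits.
inline void show_tenth()
{
   std::printf("float  0.1F = %.17g\n", static_cast<double>(0.1F));
   std::printf("double 0.1  = %.17g\n", 0.1);
}
```

By contrast, a value such as 0.5 is a power of two and is stored exactly in both types.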

Functions are provided for finding adjacent greater and lesser floating-point values,
and estimating the number of gaps between any two floating-point values.

The floating-point type (FPT) must have a fixed number of bits in the representation.
The number of bits may be set at runtime, but must be the same for all numbers.
Suitable types include __NTL_quad_float (fixed 128-bit representation),
__NTL_RR (arbitrary but fixed decimal digits, default 150) and the
__multiprecision types __cpp_dec_float and __cpp_bin_float, whose precision is fixed at runtime.
[*Not] suitable is a type that extends the representation to provide an exact representation
for any number, for example [@http://keithbriggs.info/xrc.html XRC eXact Real in C].

The accuracy of mathematical functions can be assessed and displayed in terms of __ULP,
often as a ulps plot or by binning the differences as a histogram.
Samples are evaluated using the implementation under test and compared with a 'known good'
reference value obtained using a more accurate method.  Other implementations, often using
arbitrary precision arithmetic, for example __WolframAlpha, are one source of reference
values.  The other method, used widely in Boost.Math special functions, is to carry out
the same algorithm, but using a higher precision type, typically a Boost.Multiprecision
type like `cpp_bin_float_quad` (128-bit, about 35 decimal digits of precision) or
`cpp_bin_float_50` (50 decimal digits of precision).

When the reference value is converted to a particular machine representation, say `double`,
for example by using a `static_cast`, the result is the nearest representation possible for the `double` type.
This value cannot be 'wrong' by more than half a __ulp; the size of one ulp can be obtained
using the Boost.Math function `ulp`.
(This holds unless the algorithm is fundamentally flawed, something that should be revealed by 'sanity'
checks using some independent sources.)

See some discussion and example plots by Cleve Moler of Mathworks
[@https://blogs.mathworks.com/cleve/2017/01/23/ulps-plots-reveal-math-function-accurary/
ulps plots reveal math-function accuracy].

[section:nextafter Finding the Next Representable Value in a Specific Direction (nextafter)]

[h4 Synopsis]

``
#include <boost/math/special_functions/next.hpp>
``

  namespace boost{ namespace math{

  template <class FPT>
  FPT nextafter(FPT val, FPT direction);

  }} // namespaces

[h4 Description - nextafter]

This is an implementation of the `nextafter` function included in the C99 standard.
(It is also effectively an implementation of the C99 `nexttoward` legacy function,
which differs only in having a `long double` direction,
and can generally serve in its place if required).

[note The C99 functions must use suffixes f and l to distinguish `float` and `long double` versions.
C++ uses the template mechanism instead.]

Returns the next representable value after /val/ in the direction of /direction/.  If
`val == direction` then returns /val/.  If /val/ is non-finite then returns the result of
a __domain_error.  If there is no such value in the direction of /direction/ then
returns the result of an __overflow_error.

[warning The template parameter FPT must be a floating-point type.
An integer type, for example, will produce an unhelpful error message.]

[tip Nearly always, you just want the next or prior representable value,
so instead use `float_next` or `float_prior` below.]

[h4 Examples - nextafter]

The two representations using a 32-bit float either side of unity are:
``
The nearest (exact) representation of 1.F is      1.00000000
nextafter(1.F, 999) is                            1.00000012
nextafter(1.F, -999) is                           0.99999994

The nearest (not exact) representation of 0.1F is 0.100000001
nextafter(0.1F, 10) is                            0.100000009
nextafter(0.1F, 0) is                             0.099999994
``

[endsect] [/section:nextafter Finding the Next Representable Value in a Specific Direction (nextafter)]

[section:float_next Finding the Next Greater Representable Value (float_next)]

[h4 Synopsis]

``
#include <boost/math/special_functions/next.hpp>
``

   namespace boost{ namespace math{

   template <class FPT>
   FPT float_next(FPT val);

   }} // namespaces

[h4 Description - float_next]

Returns the next representable value which is greater than /val/.
If /val/ is non-finite then returns the result of
a __domain_error.  If there is no such value greater than /val/ then
returns the result of an __overflow_error.

Has the same effect as

  nextafter(val, (std::numeric_limits<FPT>::max)());

[endsect] [/section:float_next Finding the Next Greater Representable Value (float_next)]

[section:float_prior Finding the Next Smaller Representable Value (float_prior)]

[h4 Synopsis]

``
#include <boost/math/special_functions/next.hpp>
``

   namespace boost{ namespace math{

   template <class FPT>
   FPT float_prior(FPT val);

   }} // namespaces


[h4 Description - float_prior]

Returns the next representable value which is less than /val/.
If /val/ is non-finite then returns the result of
a __domain_error.  If there is no such value less than /val/ then
returns the result of an __overflow_error.

Has the same effect as

  nextafter(val, -(std::numeric_limits<FPT>::max)());  // Note most negative value -max.

[endsect] [/section:float_prior Finding the Next Smaller Representable Value (float_prior)]

[section:float_distance Calculating the Representation Distance
   Between Two floating-point Values (ULP) float_distance]

Function `float_distance` finds the number of gaps/bits/ULP between any two floating-point values.
If the significands of floating-point numbers are viewed as integers,
then their difference is the number of ULP/gaps/bits by which they differ.

[h4 Synopsis]

``
#include <boost/math/special_functions/next.hpp>
``

   namespace boost{ namespace math{

   template <class FPT>
   FPT float_distance(FPT a, FPT b);

   }} // namespaces

[h4 Description - float_distance]

Returns the distance between /a/ and /b/: the result is always
a signed integer value (stored in floating-point type FPT)
representing the number of distinct representations between /a/ and /b/.

Note that

* `float_distance(a, a)` always returns 0.
* `float_distance(float_next(a), a)` always returns -1.
* `float_distance(float_prior(a), a)` always returns 1.

The function `float_distance` is equivalent to calculating the number
of ULP (Units in the Last Place) between /a/ and /b/ except that it
returns a signed value indicating whether `a > b` or not.

If the distance is too great then it may not be
exactly representable as an integer in type FPT,
but in practice this is unlikely to be an issue.

[endsect] [/section:float_distance Calculating the Representation Distance
   Between Two floating-point Values (ULP) float_distance]

[section:float_advance Advancing a floating-point Value by a Specific
Representation Distance (ULP) float_advance]

Function `float_advance` advances a floating-point number by a specified number
of ULP.

[h4 Synopsis]

``
#include <boost/math/special_functions/next.hpp>
``

   namespace boost{ namespace math{

   template <class FPT>
   FPT float_advance(FPT val, int distance);

   }} // namespaces

[h4 Description - float_advance]

Returns a floating-point number /r/ such that `float_distance(val, r) == distance`.

[endsect] [/section:float_advance]

[section:ulp Obtaining the Size of a Unit In the Last Place - ULP]

Function `ulp` gives the size of a unit-in-the-last-place for a specified floating-point value.

[h4 Synopsis]

``
#include <boost/math/special_functions/ulp.hpp>
``

   namespace boost{ namespace math{

   template <class FPT>
   FPT ulp(const FPT& x);

   template <class FPT, class Policy>
   FPT ulp(const FPT& x, const Policy&);

   }} // namespaces

[h4 Description - ulp]

Returns one [@http://en.wikipedia.org/wiki/Unit_in_the_last_place unit in the last place] of ['x].

Corner cases are handled as follows:

* If the argument is a NaN, then raises a __domain_error.
* If the argument is an infinity, then raises an __overflow_error.
* If the argument is zero then returns the smallest representable value: for example for type
`double` this would be either `std::numeric_limits<double>::min()` or `std::numeric_limits<double>::denorm_min()`
depending on whether denormals are supported (which have the values `2.2250738585072014e-308` and `4.9406564584124654e-324` respectively).
* If the result is too small to represent, then returns the smallest representable value.
* Always returns a positive value such that `ulp(x) == ulp(-x)`.

[*Important:]  The behavior of this function is aligned to that of [@http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#ulp%28double%29
Java's ulp function].  Please note,
however, that this function should only ever be used for rough and ready calculations as there are enough
corner cases to trap even careful programmers.  In particular:

* The function is asymmetrical, which is to say, given `u = ulp(x)` if `x > 0` then `x + u` is the
next floating-point value, but `x - u` is not necessarily the previous value.  Similarly, if
`x < 0` then `x - u` is the previous floating-point value, but `x + u` is not necessarily the next
value.  The corner cases occur at power of 2 boundaries.
* When the argument becomes very small, it may be that there is no floating-point value that
represents one ULP.  Whether this is the case or not depends not only on whether the hardware
may ['sometimes] support denormals (as signalled by `boost::math::detail::has_denorm_now<FPT>()`), but also whether these are
currently enabled at runtime (for example on SSE hardware, the DAZ or FTZ flags will disable denormal support).
In this situation, the `ulp` function may return a value that is many orders of magnitude too large.

In light of the issues above, we recommend that:

* To move between adjacent floating-point values always use __float_next, __float_prior or __nextafter (`std::nextafter`
is another candidate, but our experience is that this also often breaks depending on which optimizations and
hardware flags are in effect).
* To move several floating-point values away use __float_advance.
* To calculate the representation distance between two floats use __float_distance.

There is, nonetheless, one important use case for this function:

If it is known that the true result of some function is x[sub t] and the calculated result
is x[sub c], then the error measured in ulp is simply [^fabs(x[sub t] - x[sub c]) / ulp(x[sub t])].

[endsect] [/section ulp]

[endsect] [/ section:next_float Floating-Point Representation Distance (ULP),
   and Finding Adjacent Floating-Point Values]

[/
  Copyright 2008 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]