File: tutorial_numeric_limits.qbk

[/
  Copyright 2011 - 2020 John Maddock.
  Copyright 2013 - 2019 Paul A. Bristow.
  Copyright 2013 Christopher Kormanyos.

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:limits Numeric Limits]

Boost.Multiprecision tries hard to implement `std::numeric_limits` for all types
as far as possible and meaningful because experience with Boost.Math has shown that this aids portability.

The [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf C++ standard library]
defines `std::numeric_limits` in section 18.3.2.

This in turn refers to the C standard
[@http://www.open-std.org/jtc1/sc22/wg11/docs/n507.pdf SC22/WG11 N507 DRAFT INTERNATIONAL ISO/IEC STANDARD
 WD 10967-1]
Information technology Language independent arithmetic Part 1: Integer and floating-point arithmetic.

That C Standard in turn refers to

[@https://doi.org/10.1109/IEEESTD.1985.82928 IEEE754 IEEE Standard for Binary
Floating-Point Arithmetic]

There is a useful summary of `std::numeric_limits` at
[@http://www.cplusplus.com/reference/limits/numeric_limits/ C++ reference].

The chosen backend often determines how completely `std::numeric_limits` is available.

Compiler options, processor type, and definition of macros or assembler instructions to control denormal numbers will alter
the values in the tables given below.

[warning GMP's extendable floating-point `mpf_t` does not have a concept of overflow:
operations that lead to overflow eventually run out of resources
and terminate with stack overflow (often after several seconds).]

[section:constants std::numeric_limits<>  constants]

[h4 is_specialized]

`true` for all arithmetic types (integer, floating and fixed-point)
for which `std::numeric_limits<T>` is specialized.

A typical test is

  if (std::numeric_limits<T>::is_specialized == false)
  {
    std::cout << "type " << typeid(T).name()  << " is not specialized for std::numeric_limits!" << std::endl;
  // ...
  }

Typically `numeric_limits<T>::is_specialized` is `true` for all `T` where the compile-time constant
members of `numeric_limits` are indeed known at compile time, and don't vary at runtime.  For example
floating-point types with runtime-variable precision such as `mpfr_float` have no `numeric_limits`
specialization as it would be impossible to define all the members at compile time.  In contrast
the precision of a type such as `mpfr_float_50` is known at compile time, and so it ['does] have a
`numeric_limits` specialization.

Note that not all the `std::numeric_limits` member constants and functions are meaningful for all user-defined types (UDT),
such as the decimal and binary multiprecision types provided here.  More information on this is given in the sections below.
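As a minimal sketch, the `is_specialized` test can also be made at compile time. The helper name `has_numeric_limits` is hypothetical and is not part of Boost.Multiprecision or the standard library; only fundamental types and `std::string` are used, so no Boost headers are needed.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: true when std::numeric_limits<T> is specialized for T.
template <class T>
constexpr bool has_numeric_limits()
{
   return std::numeric_limits<T>::is_specialized;
}

// The check can be enforced at compile time for the fundamental types.
static_assert(has_numeric_limits<int>(), "int must have numeric_limits");
static_assert(has_numeric_limits<double>(), "double must have numeric_limits");
// Any other type uses the primary template, which reports false.
static_assert(!has_numeric_limits<std::string>(), "std::string has no numeric_limits");
```

A compile-time check of this kind fails early, rather than at runtime, if a type without a specialization is substituted.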

[h4 infinity]

For floating-point types, [infin] is defined wherever possible,
but clearly infinity is meaningless for __arbitrary_precision arithmetic backends,
and there is one floating-point type (GMP's `mpf_t`, see __mpf_float) which has no notion
of infinity or NaN at all.

A typical test whether infinity is implemented is

  if(std::numeric_limits<T>::has_infinity)
  {
     std::cout << std::numeric_limits<T>::infinity() << std::endl;
  }

and using tests like this is strongly recommended to improve portability.

[warning If the backend is switched to a type that does not support infinity (or similarly NaNs) then,
without checks like this, there will be trouble.]
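One possible way to guard against a backend without infinity is to fall back to the largest finite value. This is only a sketch: the helper name `infinity_or_max` is hypothetical, and whether `max()` is an acceptable substitute depends on the application.

```cpp
#include <limits>

// Hypothetical helper: returns infinity when T supports it,
// otherwise falls back to the largest finite value of T.
template <class T>
T infinity_or_max()
{
   return std::numeric_limits<T>::has_infinity
      ? std::numeric_limits<T>::infinity()
      : (std::numeric_limits<T>::max)();
}
```

For `int` this returns `INT_MAX`, while for an IEEE `double` it returns a value greater than every finite `double`.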

[h4 is_signed]

`std::numeric_limits<T>::is_signed == true` if the type `T` is signed.

For __fundamental binary types, the sign is held in a single bit,
but for other types (`cpp_dec_float` and `cpp_bin_float`)
it may be a separate storage element, usually `bool`.

[h4 is_exact]

`std::numeric_limits<T>::is_exact == true` if type T uses exact representations.

This is defined as `true` for all integer types and `false` for floating-point types.

[@http://stackoverflow.com/questions/14203654/stdnumeric-limitsis-exact-what-is-a-usable-definition A usable definition]
has been discussed.

ISO/IEC 10967-1, Language independent arithmetic, noted by the C++ Standard defines

  A floating-point type F shall be a finite subset of [real].

The important practical distinction is that all integers (up to `max()`) can be stored exactly.

[@http://en.wikipedia.org/wiki/Rational_number Rational]
types using two integer types are also exact.

Floating-point types [*cannot store all real values]
(those in the set of [real]) [*exactly].
For example, 0.5 can be stored exactly in a binary floating-point, but 0.1 cannot.
What is stored is the nearest representable real value, that is, rounded to nearest.
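The difference between values that are and are not exactly representable can be seen in a short sketch. It assumes that `double` is a binary (`radix == 2`) type such as IEEE binary64; the helper name `sum_equals` is hypothetical.

```cpp
#include <limits>

// Returns true when a + b, rounded to double, compares equal to expected.
// The volatile store forces the sum to be rounded to double precision.
inline bool sum_equals(double a, double b, double expected)
{
   volatile double sum = a + b;
   return sum == expected;
}
```

With these assumptions `sum_equals(0.5, 0.25, 0.75)` is true because all three values are sums of powers of two, but `sum_equals(0.1, 0.2, 0.3)` is false because none of the three decimal values is stored exactly.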

Fixed-point types (usually decimal) are also defined as exact, in that they only
store a [*fixed precision], so half cents or pennies (or less) cannot be stored.
The results of computations are rounded up or down,
just like the result of integer division stored as an integer result.

There are a number of proposals to
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3407.html
add Decimal floating-point Support to C++].

[@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2849.pdf Decimal TR].

And also
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3352.html
C++ Binary Fixed-Point Arithmetic].

[h4 is_bounded]

`std::numeric_limits<T>::is_bounded == true` if the set of values represented by the type `T` is finite.

This is `true` for all __fundamental_type integer, fixed and floating-point types,
and most multi-precision types.

It is only `false` for a few __arbitrary_precision types like `cpp_int`.

Rational and fixed-exponent representations are exact but not integer.

[h4 is_modulo]

`std::numeric_limits<T>::is_modulo` is defined as `true` if adding two positive values of type T
can yield a result less than either value.

`is_modulo == true` means that the type does not overflow, but, for example,
'wraps around' to zero, when adding one to the `max()` value.

For most __fundamental integer types, `std::numeric_limits<>::is_modulo` is `true`.

`bool` is the only exception.

The modulo behaviour is sometimes useful,
but also can be unexpected, and sometimes undesired, behaviour.

Overflow of signed integers can be especially unexpected,
possibly causing change of sign.

Boost.Multiprecision integer type `cpp_int` is not modulo
because, as an __arbitrary_precision type,
it expands to hold any value that the machine resources permit.

However fixed precision __cpp_int's may be modulo if they are unchecked
(i.e. they behave just like __fundamental integers), but not if they are checked
(overflow causes an exception to be raised).

__fundamental and multi-precision floating-point types are normally not modulo.

Where possible, overflow is to `std::numeric_limits<>::infinity()`,
provided `std::numeric_limits<>::has_infinity == true`.
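The wrap-around behaviour of a modulo type can be illustrated with unsigned integers, for which the C++ standard defines arithmetic modulo 2^N. The sketch is restricted to unsigned types because overflow of signed integers is undefined behaviour; the helper name `max_plus_one_wraps_to_zero` is hypothetical.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper: adds one to the maximum value of an unsigned type
// and reports whether the result wrapped around to zero.
template <class T>
bool max_plus_one_wraps_to_zero()
{
   static_assert(std::is_unsigned<T>::value, "only unsigned types are used here");
   // The cast undoes integer promotion for types narrower than int.
   const T result = static_cast<T>((std::numeric_limits<T>::max)() + T(1));
   return result == T(0);
}
```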

[h4 radix]

Constant `std::numeric_limits<T>::radix` returns either 2 (for __fundamental and binary types)
or 10 (for decimal types).

[h4 digits]

The number of `radix` digits that can be represented without change:

* for integer types, the number of [*non-sign bits] in the significand.
* for floating types, the number of [*radix digits] in the significand.

The values include any implicit bit, so for example, for the ubiquitous
`double` using 64 bits
([@http://en.wikipedia.org/wiki/Double_precision_floating-point_format IEEE binary64 ]),
`digits` == 53, even though there are only 52 actual bits of the significand stored in the representation.
The value of `digits` reflects the fact that there is one implicit bit which is always set to 1.

The Boost.Multiprecision binary types do not use an implicit bit, so the
`digits` member reflects exactly how many bits of precision were requested:

  typedef number<cpp_bin_float<53, digit_base_2> >   float64;
  typedef number<cpp_bin_float<113, digit_base_2> >  float128;
  // std::numeric_limits<float64>::digits  == 53
  // std::numeric_limits<float128>::digits == 113

For the most common case of `radix == 2`,
`std::numeric_limits<T>::digits` is the number of bits in the representation,
not counting any sign bit.

For a decimal integer type, when `radix == 10`, it is the number of decimal digits.

[h4 digits10]

Constant `std::numeric_limits<T>::digits10` returns the number of
decimal digits that can be represented without change or loss.

For example, `numeric_limits<unsigned char>::digits10` is 2.

This somewhat inscrutable definition means that an `unsigned char`
can hold decimal values `0..99`
without loss of precision or accuracy, usually from truncation.

Had the definition been 3 then that would imply it could hold 0..999,
but as we all know, an 8-bit `unsigned char` can only hold 0..255,
and an attempt to store 256 or more will involve loss or change.

For bounded integers, it is thus [*one less] than the number of decimal digits
you need to display the biggest integer `std::numeric_limits<T>::max()`.
This value can be used to predict the layout width required for

[digits10_1]

For example, `unsigned short` is often stored in 16 bits,
so the maximum value is 0xFFFF or 65535.

[digits10_2]
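The relationship between `digits10` and the number of digits in `max()` can be checked with a small sketch. The helper name `decimal_digits` is hypothetical; the relationship is assumed to hold for binary integer types, where `max()` is one less than a power of two.

```cpp
#include <limits>

// Counts the decimal digits needed to display a non-negative integer value.
template <class T>
int decimal_digits(T value)
{
   int count = 1;
   while (value >= T(10))
   {
      value = static_cast<T>(value / T(10));
      ++count;
   }
   return count;
}
```

For an 8-bit `unsigned char`, `decimal_digits` applied to `max()` (255) gives 3, which is `digits10 + 1`.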


For bounded floating-point types,
if we create a `double` with a value with `digits10` (usually 15) decimal digits,
`1e15` or `1000000000000000` :

[digits10_3]

and we can increment this value to `1000000000000001`
as expected and show the difference too.

But if we try to repeat this with more than `digits10` digits,

[digits10_4]

then we find that when we add one it has no effect,
and the display shows that there is a loss of precision. See
[@http://en.wikipedia.org/wiki/Loss_of_significance Loss of significance or cancellation error].

So `digits10` is the number of decimal digits [*guaranteed] to be correct.

For example, 'round-tripping' for `double`:

* If a decimal string with at most `digits10` (== 15) significant decimal digits
is converted to `double` and then converted back to the
same number of significant decimal digits,
then the final string will match the original 15 decimal digit string.
* If a `double` floating-point number is converted to a decimal string
with at least 17 decimal digits
and then converted back to `double`,
then the result will be binary identical to the original `double` value.

For most purposes, you will much more likely want
`std::numeric_limits<>::max_digits10`,
the number of decimal digits that ensure that a change of one least significant bit (__ULP)
produces a different decimal digits string.

For the most common `double` floating-point type, `max_digits10` is `digits10+2`,
but you should use C++11 `max_digits10`
where possible (see [link boost_multiprecision.tut.limits.constants.max_digits10 below]).

[h4:max_digits10 max_digits10]

`std::numeric_limits<T>::max_digits10` was added for floating-point
because `digits10` decimal digits are insufficient to show
a least significant bit (ULP) change giving puzzling displays like

  0.666666666666667 != 0.666666666666667

from failure to 'round-trip', for example:

[max_digits10_2]

If you wish to ensure that a change of one least significant bit (ULP)
produces a different decimal digits string,
then `max_digits10` is the precision to use.

For example:

[max_digits10_3]

will display [pi] to the maximum possible precision using a `double`.

[max_digits10_4]

For integer types, `max_digits10` is implementation-dependent:
the standard library defines it as 0 for the __fundamental integer types,
for which it is not meaningful, while a user-defined integer type may define a non-zero value.
The output field-width required for the maximum value of the type T
`std::numeric_limits<T>::max()` ['including a sign] is `digits10 + 2`.

So this will produce neat columns.

  std::cout << std::setw(std::numeric_limits<int>::digits10 + 2) ...

The extra two or three least-significant digits are 'noisy' and may be junk,
but if you want to 'round-trip' - printing a value out as a decimal digit string and reading it back in -
(most commonly during serialization and de-serialization)
you must use `os.precision(std::numeric_limits<T>::max_digits10)`.
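A round-trip through a stream can be sketched as follows. The helper name `round_trips` is hypothetical, and the sketch assumes that the standard library converts between `double` and decimal text with correct rounding, which the notes below show has not always been the case.

```cpp
#include <limits>
#include <sstream>

// Writes value as decimal text with the given precision, reads it back,
// and reports whether the original value was recovered exactly.
inline bool round_trips(double value, int precision)
{
   std::stringstream ss;
   ss.precision(precision);
   ss << value;
   double read_back = 0;
   ss >> read_back;
   return read_back == value;
}
```

For an IEEE `double` such as `2.0 / 3.0`, a precision of `max_digits10` is expected to recover the value, whereas a precision of `digits10` is expected to produce a different `double`.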

[note For Microsoft Visual Studio 2010,
`std::numeric_limits<float>::max_digits10` is wrongly defined as 8. It should be 9.
]

[note For Microsoft Visual Studio before 2013 and the default floating-point
format, a small range of double-precision floating-point values with a
significand of approximately 0.0001 to 0.004 and exponent values of 1010 to
1014 do not round-trip exactly being off by one least significant bit,
for probably every third value of the significand.

A workaround is using the scientific or exponential format `std::scientific`.

Other older compilers also fail to implement round-tripping entirely fault-free, for example, see
[@https://www.exploringbinary.com/incorrectly-rounded-conversions-in-gcc-and-glibc/  Incorrectly Rounded Conversions in GCC and GLIBC].

For more details see
[@https://www.exploringbinary.com/incorrect-round-trip-conversions-in-visual-c-plus-plus/ Incorrect Round-Trip Conversions in Visual C++],
and references therein
and
[@https://arxiv.org/pdf/1310.8121.pdf  Easy Accurate Reading and Writing of Floating-Point Numbers, Aubrey Jaffer (August 2018)].

Microsoft VS2017 and other recent compilers now use the
[@https://doi.org/10.1145/3192366.3192369 Ryu fast float-to-string conversion by Ulf Adams]
algorithm, claimed to be both exact and fast for 32 and 64-bit floating-point numbers.
] [/note]

[h4 round_style]

The rounding style determines how the result of floating-point operations
is treated when the result cannot be [*exactly represented] in the significand.
Various rounding modes may be provided:

* round to nearest up or down (default for floating-point types).
* round up (toward positive infinity).
* round down (toward negative infinity).
* round toward zero (integer types).
* no rounding (if decimal radix).
* rounding mode is not determinable.

For integer types, `std::numeric_limits<T>::round_style` is always towards zero, so

  std::numeric_limits<T>::round_style == std::round_toward_zero;

A decimal type, `cpp_dec_float` rounds in no particular direction,
which is to say it doesn't round at all.
And since there are several guard digits,
it's not really the same as truncation (round toward zero) either.

For floating-point types, it is normal to round to nearest.

  std::numeric_limits<T>::round_style == std::round_to_nearest;

See function `std::numeric_limits<T>::round_error` for the maximum error (in ULP)
that rounding can cause.
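When reporting the rounding style of a type, it can help to map the `std::float_round_style` enumerators to names. The sketch below uses the five enumerators defined by the C++ standard; the function name `round_style_name` is hypothetical.

```cpp
#include <limits>
#include <string>

// Maps a std::float_round_style value to a readable name.
inline std::string round_style_name(std::float_round_style style)
{
   switch (style)
   {
   case std::round_indeterminate:       return "round_indeterminate";
   case std::round_toward_zero:         return "round_toward_zero";
   case std::round_to_nearest:          return "round_to_nearest";
   case std::round_toward_infinity:     return "round_toward_infinity";
   case std::round_toward_neg_infinity: return "round_toward_neg_infinity";
   }
   return "unknown";
}
```

For example, `round_style_name(std::numeric_limits<int>::round_style)` yields `round_toward_zero`.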

[h4 has_denorm_loss]

`true` if a loss of precision is detected as a
[@http://en.wikipedia.org/wiki/Denormalization denormalization] loss,
rather than an inexact result.

Always `false` for integer types.

`false` for all types which do not have `has_denorm` == `std::denorm_present`.

[h4 denorm_style]

[@http://en.wikipedia.org/wiki/Denormal_number Denormalized values] are
representations with a variable number of exponent bits that can permit
gradual underflow, so that, if type T is `double`:

 std::numeric_limits<T>::denorm_min() < std::numeric_limits<T>::min()

A type may have any of the following `enum float_denorm_style` values:

* `std::denorm_absent`, if it does not allow denormalized values.
(Always used for all integer and exact types).
* `std::denorm_present`, if the floating-point type allows denormalized values.
* `std::denorm_indeterminate`, if indeterminate at compile time.

[h4 Tinyness before rounding]

`bool std::numeric_limits<T>::tinyness_before`

`true` if a type can determine that a value is too small
to be represented as a normalized value before rounding it.

Generally true for `is_iec559` floating-point __fundamental types,
but false for integer types.

Standard-compliant IEEE 754 floating-point implementations may detect the floating-point underflow at three predefined moments:

# After computation of a result with absolute value smaller than
`std::numeric_limits<T>::min()`,
such implementation detects ['tinyness before rounding] (e.g. UltraSparc).

# After rounding of the result to `std::numeric_limits<T>::digits` bits,
if the result is tiny, such implementation detects ['tinyness after rounding]
(e.g. SuperSparc).

# If the conversion of the rounded tiny result to subnormal form
resulted in the loss of precision, such implementation detects ['denorm loss].

[endsect] [/section:constants std::numeric_limits<> Constants]

[section:functions `std::numeric_limits<>` functions]

[h4:max_function `max` function]

Function `(std::numeric_limits<T>::max)()` returns the largest finite value
that can be represented by the type T.  If there is no such value (and
`numeric_limits<T>::is_bounded` is `false`) then returns `T()`.

For __fundamental types there is usually a corresponding MACRO value TYPE_MAX,
where TYPE is CHAR, INT, FLT etc.

Other types, including those provided by a typedef,
for example `INT64_MAX` for `int64_t`, may provide a macro definition.

To cater for situations where no `numeric_limits` specialization is available
(for example because the precision of the type varies at runtime),
packaged versions of this (and other functions) are provided using

  #include <boost/math/tools/precision.hpp>

  T largest = boost::math::tools::max_value<T>();

Of course, these simply use `(std::numeric_limits<T>::max)()` if available,
but otherwise 'do something sensible'.

[h4 lowest function]

Since C++11: `std::numeric_limits<T>::lowest()` is

* For integral types, the same as function `min()`.
* For floating-point types, generally the negative of `max()`
(but implementation-dependent).

[digits10_5]

[h4:min_function `min` function]

Function `(std::numeric_limits<T>::min)()` returns the minimum finite value
that can be represented by the type T.

For __fundamental types, there is usually a corresponding MACRO value TYPE_MIN,
where TYPE is CHAR, INT, FLT etc.

Other types, including those provided by a `typedef`,
for example, `INT64_MIN` for `int64_t`, may provide a macro definition.

For floating-point types,
it is more fully defined as the ['minimum positive normalized value].
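Because `min()` means different things for integer and floating-point types, generic code that needs the most negative finite value should not use it directly. The following sketch shows the distinction using a hypothetical helper `most_negative`; it assumes that the floating-point range is symmetric about zero, which is usual but implementation-dependent.

```cpp
#include <limits>

// Hypothetical helper: the most negative finite value of T, computed
// from min() for integer types and from -max() for other types.
template <class T>
T most_negative()
{
   return std::numeric_limits<T>::is_integer
      ? (std::numeric_limits<T>::min)()
      : -(std::numeric_limits<T>::max)();
}
```

Under these assumptions the result agrees with `std::numeric_limits<T>::lowest()`, while `min()` for `double` is a small positive value.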

See `std::numeric_limits<T>::denorm_min()` for the smallest denormalized value, provided

  std::numeric_limits<T>::has_denorm == std::denorm_present

To cater for situations where no `numeric_limits` specialization is available
(for example because the precision of the type varies at runtime),
packaged versions of this (and other functions) are provided using

  #include <boost/math/tools/precision.hpp>

  T smallest = boost::math::tools::min_value<T>();

Of course, these simply use `std::numeric_limits<T>::min()` if available.

[h4 denorm_min function]

Function `std::numeric_limits<T>::denorm_min()`
returns the smallest
[@http://en.wikipedia.org/wiki/Denormal_number denormalized value],
provided

  std::numeric_limits<T>::has_denorm == std::denorm_present

[denorm_min_1]

The exponent is effectively reduced from -308 to -324
(though it remains encoded as zero and leading zeros appear in the significand,
thereby losing precision until the significand reaches zero).
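Gradual underflow can be observed by comparing `denorm_min()` with `min()` and by halving the smallest normalized value. This is a sketch only: the helper names are hypothetical, and the results assume that the compiler and processor have not been set to flush denormalized values to zero.

```cpp
#include <limits>

// Hypothetical helper: true when T has values between zero and min(),
// that is, when denormalized values are available.
template <class T>
bool has_gradual_underflow()
{
   return std::numeric_limits<T>::denorm_min() < (std::numeric_limits<T>::min)();
}

// Halves the smallest normalized value of T. With gradual underflow
// the result is a non-zero denormalized value.
template <class T>
T half_of_min()
{
   volatile T smallest_normal = (std::numeric_limits<T>::min)();
   return smallest_normal / 2;
}
```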

[h4 round_error]

Function `std::numeric_limits<T>::round_error()` returns the maximum error
(in units of __ULP)
that can be caused by any basic arithmetic operation.

If

  round_style == std::round_indeterminate;

then the rounding style is indeterminable at compile time.

For floating-point types, when rounding is to nearest,
only half a bit is lost by rounding, and `round_error == 0.5`.
In contrast when rounding is towards zero, or plus/minus infinity,
we can lose up to one bit from rounding, and `round_error == 1`.

For integer types, rounding is always to zero, so at worst almost one bit can be rounded,
so `round_error == 1`.

`round_error()` can be used with `std::numeric_limits<T>::epsilon()` to estimate
the maximum potential error caused by rounding.  For typical floating-point types,
`round_error() = 1/2`, so half epsilon is the maximum potential error.

[round_error_1]

There are, of course, many occasions when much bigger loss of precision occurs,
for example, caused by
[@http://en.wikipedia.org/wiki/Loss_of_significance Loss of significance or cancellation error]
or very many iterations.

[h4:epsilon epsilon]

Function `std::numeric_limits<T>::epsilon()` is meaningful only for non-integral types.

It returns the difference between `1.0` and the next value representable
by the floating-point type T.
So it is a one least-significant-bit change in this floating-point value.

For `double` (`float64_t`) it is `2.2204460492503131e-016`
showing all possibly significant 17 decimal digits.

[epsilon_1]

We can explicitly increment by one bit using the function `boost::math::float_next()`
and the result is the same as adding `epsilon`.

[epsilon_2]

Adding any smaller value, like half `epsilon`,  will have no effect on this value.

[epsilon_3]

So this cancellation error leaves the values equal, despite adding half `epsilon`.
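The same behaviour can be sketched for the __fundamental types using only the standard library, with `std::nextafter` from `<cmath>` standing in for `boost::math::float_next()`. The helper names are hypothetical, and the default round-to-nearest mode is assumed.

```cpp
#include <cmath>
#include <limits>

// Distance from 1 to the next representable value above it.
template <class T>
T gap_above_one()
{
   return std::nextafter(T(1), T(2)) - T(1);
}

// Adds delta to 1 and reports whether the stored result differs from 1.
// The volatile store forces the sum to be rounded to the precision of T.
template <class T>
bool changes_one(T delta)
{
   volatile T sum = T(1) + delta;
   return sum != T(1);
}
```

Here `gap_above_one<double>()` is expected to equal `epsilon()`, adding `epsilon()` changes the value, and adding half of `epsilon()` does not.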

To achieve greater portability over platform and floating-point type,
Boost.Math and Boost.Multiprecision provide a package of functions that
'do something sensible' if the standard `numeric_limits` is not available.
To use these `#include <boost/math/tools/precision.hpp>`.

[epsilon_4]

[h5:FP_tolerance Tolerance for Floating-point Comparisons]

[@https://en.wikipedia.org/wiki/Machine_epsilon Machine epsilon [epsilon]]
is very useful to compute a tolerance when comparing floating-point values,
a much more difficult task than is commonly imagined.

The C++ standard specifies [@https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon  `std::numeric_limits<>::epsilon()`]
and Boost.Multiprecision implements this (where possible) for its program-defined types analogous to the
__fundamental floating-point types like `double` and `float`.

For more information than you probably want (but still need) see
[@http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html What Every Computer Scientist Should Know About Floating-Point Arithmetic]

The naive test comparing the absolute difference between two values and a tolerance
does not give useful results if the values are too large or too small.

So Boost.Test uses an algorithm first devised by Knuth
for reliably checking if floating-point values are close enough.

See Donald E. Knuth. The art of computer programming (vol II).
Copyright 1998 Addison-Wesley Longman, Inc., 0-201-89684-2.
Addison-Wesley Professional; 3rd edition. (The relevant equations are in paragraph 4.2.2, Eq. 36 and 37.)

See [@https://www.boost.org/doc/libs/release/libs/test/doc/html/boost_test/testing_tools/extended_comparison/floating_point/floating_points_comparison_theory.html Boost.Test floating_point comparison]
for more details.
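The idea of scaling the tolerance by the magnitude of the values can be sketched as follows. This is a simplified illustration, not the Boost.Test implementation: it does not guard against overflow, underflow, zero, infinity or NaN, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Simplified relative ("fraction") closeness test:
// the difference must be within tolerance relative to both values.
inline bool is_close_fraction(double a, double b, double tolerance)
{
   const double diff = std::fabs(a - b);
   // Scaling by the smaller magnitude makes the test hold relative to both a and b.
   return diff <= tolerance * (std::min)(std::fabs(a), std::fabs(b));
}

// The naive absolute test, shown for comparison.
inline bool is_close_absolute(double a, double b, double tolerance)
{
   return std::fabs(a - b) <= tolerance;
}
```

With a tolerance of `1e-9`, the absolute test rejects `1e20` and `1.0000000001e20` although they agree to about ten significant digits, and accepts `1e-20` and `2e-20` although they differ by a factor of two. The relative test gives the opposite, more useful, answers.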

See also:

[@http://adtmag.com/articles/2000/03/15/comparing-floats-how-to-determine-if-floating-quantities-are-close-enough-once-a-tolerance-has-been.aspx Alberto Squassabia, Comparing floats]

[@http://adtmag.com/articles/2000/03/16/comparing-floats-how-to-determine-if-floating-quantities-are-close-enough-once-a-tolerance-has-been.aspx Alberto Squassabia, Comparing floats code]

[@https://www.boost.org/doc/libs/release/libs/test/doc/html/boost_test/testing_tools/extended_comparison/floating_point.html Boost.Test Floating-Point_Comparison]

[tolerance_1]

used thus:

  BOOST_CHECK_CLOSE_FRACTION(expected, calculated, tolerance);

(There is also a version BOOST_CHECK_CLOSE using tolerance as a [*percentage] rather than a fraction;
usually the fraction version is simpler to use).

[tolerance_2]

[h4:infinity Infinity - positive and negative]

For floating-point types only, for which
`std::numeric_limits<T>::has_infinity == true`,
function `std::numeric_limits<T>::infinity()`
provides an implementation-defined representation for [infin].

The 'representation' is a particular bit pattern reserved for infinity.
For IEEE754 systems (for which `std::numeric_limits<T>::is_iec559 == true`)
[@http://en.wikipedia.org/wiki/IEEE_754-1985#Positive_and_negative_infinity positive and negative infinity]
are assigned bit patterns for all defined floating-point types.

Confusingly, the string resulting from outputting this representation is also
implementation-defined. And the string that can be input to generate the representation is also implementation-defined.

For example, the output is `1.#INF` on Microsoft systems, but `inf` on most *nix platforms.

This implementation-defined-ness has hampered use of infinity (and NaNs)
but __Boost_Math and __Boost_Multiprecision work hard to provide a sensible representation
for [*all] floating-point types, not just the __fundamental_types,
which, with the use of suitable facets to define the input and output strings, makes it possible
to use these useful features portably, including with __Boost_Serialization.

[h4 Not-A-Number NaN]

[h5 Quiet_NaN]

For floating-point types only, for which
`std::numeric_limits<T>::has_quiet_NaN == true`,
function `std::numeric_limits<T>::quiet_NaN()`
provides an implementation-defined representation for NaN.

[@http://en.wikipedia.org/wiki/NaN NaNs] are values to indicate that the
result of an assignment or computation is meaningless.
A typical example is `0/0` but there are many others.

NaNs may also be used to represent missing values: for example,
these could, by convention, be ignored in calculations of statistics like means.
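The missing-value convention can be sketched with `std::isnan` from `<cmath>`. The function name `mean_ignoring_nan` is hypothetical, and `double` is assumed to support quiet NaNs.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mean of the values that are not NaN; returns NaN if no values remain.
inline double mean_ignoring_nan(const std::vector<double>& values)
{
   double sum = 0;
   std::size_t count = 0;
   for (std::size_t i = 0; i < values.size(); ++i)
   {
      if (!std::isnan(values[i]))
      {
         sum += values[i];
         ++count;
      }
   }
   return count ? sum / count : std::numeric_limits<double>::quiet_NaN();
}
```

Without the `std::isnan` test, a single NaN in the input would propagate through the sum and make the whole result NaN.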

Many of the problems with a representation for
[@http://en.wikipedia.org/wiki/NaN Not-A-Number] have hampered portable use,
similar to those with infinity.

[nan_1]

But using Boost.Math and suitable facets can permit portable use
of both NaNs and positive and negative infinity.

[facet_1]

[h5 Signaling NaN]

For floating-point types only, for which
`std::numeric_limits<T>::has_signaling_NaN == true`,
function `std::numeric_limits<T>::signaling_NaN()`
provides an implementation-defined representation for NaN that causes a hardware trap.
It should be noted however, that at least one implementation of this function causes a hardware
trap to be triggered simply by calling `std::numeric_limits<T>::signaling_NaN()`, and not only
by using the value returned.

[endsect] [/section:functions std::numeric_limits<>  functions]

[/ Tables of values for numeric_limits for various __fundamental and cpp_bin_float types]
[include numeric_limits_32_tables.qbk]
[/include numeric_limits_64_tables.qbk]

[section:how_to_tell How to Determine the Kind of a Number From `std::numeric_limits`]

Based on the information above, one can see that different kinds of numbers can be
differentiated based on the information stored in `std::numeric_limits`.  This is
in addition to the traits class [link boost_multiprecision.ref.number.traits_class_support
number_category] provided by this library.

[h4 Integer Types]

For an integer type T, all of the following conditions hold:

   std::numeric_limits<T>::is_specialized == true
   std::numeric_limits<T>::is_integer == true
   std::numeric_limits<T>::is_exact == true
   std::numeric_limits<T>::min_exponent == 0
   std::numeric_limits<T>::max_exponent == 0
   std::numeric_limits<T>::min_exponent10 == 0
   std::numeric_limits<T>::max_exponent10 == 0

In addition the type is /signed/ if:

   std::numeric_limits<T>::is_signed == true

If the type is arbitrary precision then:

   std::numeric_limits<T>::is_bounded == false

Otherwise the type is bounded, and returns a non zero value
from:

   std::numeric_limits<T>::max()

and has:

   std::numeric_limits<T>::is_modulo == true

if the type implements modulo arithmetic on overflow.

[h4 Rational Types]

Rational types are just like integers except that:

   std::numeric_limits<T>::is_integer == false

[h4 Fixed Precision Types]

There appears to be no way to tell these apart from rational types, unless they set:

   std::numeric_limits<T>::is_exact == false

This is because these types are in essence a rational type with a fixed denominator.

[h4 floating-point Types]

For a floating-point type T, all of the following conditions hold:

   std::numeric_limits<T>::is_specialized == true
   std::numeric_limits<T>::is_integer == false
   std::numeric_limits<T>::is_exact == false
   std::numeric_limits<T>::min_exponent != 0
   std::numeric_limits<T>::max_exponent != 0
   std::numeric_limits<T>::min_exponent10 != 0
   std::numeric_limits<T>::max_exponent10 != 0

In addition the type is /signed/ if:

   std::numeric_limits<T>::is_signed == true

And the type may be decimal or binary depending on the value of:

   std::numeric_limits<T>::radix

In general, there are no arbitrary precision floating-point types, and so:

   std::numeric_limits<T>::is_bounded == true

[h4 Exact floating-point Types]

Exact floating-point types are a [@http://en.wikipedia.org/wiki/Field_%28mathematics%29 field]
composed of an arbitrary precision integer scaled by an exponent.  Such types
have no division operator and are the same as floating-point types except:

      std::numeric_limits<T>::is_exact == true
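The rules in this section can be combined into a single classifier, sketched below. The function name `kind_of_number` and the returned strings are hypothetical. As noted above, a fixed precision type that sets `is_exact == false` cannot be told apart from a floating-point type by these tests.

```cpp
#include <limits>
#include <string>

// Hypothetical classifier following the rules listed in this section.
template <class T>
std::string kind_of_number()
{
   typedef std::numeric_limits<T> limits;
   if (!limits::is_specialized) return "unknown";
   if (limits::is_integer)      return "integer";
   if (!limits::is_exact)       return "floating-point";
   // Exact but not integer: an exponent range marks an exact floating-point type.
   if (limits::max_exponent != 0) return "exact floating-point";
   return "rational or fixed precision";
}
```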

[h4 Complex Numbers]

For historical reasons, complex numbers do not specialize `std::numeric_limits`; instead you must
inspect `std::numeric_limits<typename T::value_type>`.
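One way to handle this in generic code is a small trait that selects the type whose limits should be inspected. The trait name `limits_type` is hypothetical, and the sketch covers only `std::complex`; other complex number types would need their own partial specializations.

```cpp
#include <complex>
#include <limits>

// Hypothetical trait: selects the type whose numeric_limits should be inspected.
template <class T>
struct limits_type
{
   typedef T type;
};

// For std::complex<T>, inspect the limits of the component type instead.
template <class T>
struct limits_type<std::complex<T> >
{
   typedef typename std::complex<T>::value_type type;
};
```

With this trait, `std::numeric_limits<limits_type<std::complex<double> >::type>::digits` gives the same value as `std::numeric_limits<double>::digits`.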

[endsect] [/section:how_to_tell How to Determine the Kind of a Number From `std::numeric_limits`]

[endsect] [/section:limits Numeric Limits]