Supported arithmetic
====================
Rounding directions
-------------------
Some of the classes of operators presented in the following sections are
templated by a rounding direction. This is the direction chosen when
converting a real number that cannot be exactly represented in the
destination format.
There are eleven directions:

``zr``
   toward zero
``aw``
   away from zero
``dn``
   toward minus infinity (down)
``up``
   toward plus infinity (up)
``od``
   to odd mantissas
``ne``
   to nearest, tie breaking to even mantissas
``no``
   to nearest, tie breaking to odd mantissas
``nz``
   to nearest, tie breaking toward zero
``na``
   to nearest, tie breaking away from zero
``nd``
   to nearest, tie breaking toward minus infinity
``nu``
   to nearest, tie breaking toward plus infinity

The rounding directions mandated by the IEEE-754 standard are ``ne``
(default mode, rounding to nearest), ``zr``, ``dn``, ``up``, and ``na``
(introduced for decimal arithmetic).
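
As an illustration of what the eleven directions mean, here is a minimal
Python sketch that rounds an exact rational to an integer, so the
"mantissa" is simply the integer result. The function name
``round_to_int`` and the constants ``DIRECTED`` and ``NEAREST`` are
hypothetical helpers introduced for this example; they are not part of
the tool.

```python
from fractions import Fraction
from math import ceil, floor

DIRECTED = ("zr", "aw", "dn", "up", "od")
NEAREST = ("ne", "no", "nz", "na", "nd", "nu")


def round_to_int(x, direction):
    """Round the exact rational x to an integer in the given direction."""
    if direction not in DIRECTED + NEAREST:
        raise ValueError(f"unknown rounding direction: {direction!r}")
    x = Fraction(x)
    lo, hi = floor(x), ceil(x)
    if lo == hi:
        return lo  # x is already an integer, no rounding needed
    # The two candidates, classified in every way a direction may need.
    toward_zero, away = (lo, hi) if x > 0 else (hi, lo)
    even, odd = (lo, hi) if lo % 2 == 0 else (hi, lo)
    pick = {
        "zr": toward_zero, "nz": toward_zero,
        "aw": away, "na": away,
        "dn": lo, "nd": lo,
        "up": hi, "nu": hi,
        "od": odd, "no": odd,
        "ne": even,
    }[direction]
    if direction in DIRECTED:
        return pick
    # Round-to-nearest: the tie-breaking rule only matters at the midpoint.
    midpoint = Fraction(lo + hi, 2)
    if x < midpoint:
        return lo
    if x > midpoint:
        return hi
    return pick
```

With this sketch, the tie 5/2 rounds to 2 under ``zr``, ``dn``, ``ne``,
``nz`` and ``nd``, and to 3 under the six other directions.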
Floating-point operators
------------------------
This class of operators covers all the formats whose number sets
are :math:`F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}`, with
:math:`m` and :math:`e` integers. In particular, IEEE-754
floating-point formats (with subnormal numbers) are part of this class,
if overflow issues are set aside. The first two parameters, the
precision :math:`p` and the minimum exponent :math:`d`, select a
particular format. The last parameter selects the rounding direction.

::

   float< precision, minimum_exponent, rounding_direction >(...)

Formats with no minimal exponent (and thus no underflow) are also
available:
::

   float< precision, rounding_direction >(...)

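
To make the role of the precision and minimum exponent concrete, here
is a minimal Python sketch of rounding an exact rational into
:math:`F(p,d)`. It is only an illustration under stated assumptions:
the names ``round_float`` and ``ROUNDERS`` are hypothetical, only four
of the eleven directions are covered (those that map directly onto
Python built-ins), and overflow is ignored, as in the definition above.

```python
from fractions import Fraction
from math import ceil, floor, trunc

# Integer rounding for a subset of the directions (assumption: the
# built-in round() on a Fraction breaks ties to even, i.e. "ne").
ROUNDERS = {"dn": floor, "up": ceil, "zr": trunc, "ne": round}


def round_float(x, precision, direction, min_exponent=None):
    """Round x into {m * 2**e : |m| < 2**precision, e >= min_exponent}.

    With min_exponent=None the exponent is unbounded below, which
    mirrors the two-parameter form without underflow.
    """
    x = Fraction(x)
    if x == 0:
        return x
    # k = floor(log2(|x|)), computed exactly from the bit lengths.
    k = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if Fraction(2) ** k > abs(x):
        k -= 1
    # Exponent that makes the mantissa use exactly `precision` bits...
    e = k - precision + 1
    # ...unless the format's minimum exponent forces a coarser grid.
    if min_exponent is not None:
        e = max(e, min_exponent)
    ulp = Fraction(2) ** e
    return ROUNDERS[direction](x / ulp) * ulp
```

For instance, ``round_float(Fraction(1, 10), 53, "ne", -1074)`` yields
the same value as the Python float ``0.1`` on platforms where Python
floats are IEEE-754 double precision.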
Having to remember the precision and minimum exponent parameters may be
a bit tedious, so an alternate syntax is provided: instead of these two
parameters, a name can be given to the ``float`` class.
::

   float< name, rounding_direction >(...)

There are four predefined formats:

``ieee_32``
   IEEE-754 single precision
``ieee_64``
   IEEE-754 double precision
``ieee_128``
   IEEE-754 quadruple precision
``x86_80``
   extended precision on x86-like processors

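
The section does not list the precision and minimum exponent behind
each name. The following Python sketch derives plausible values from
the usual IEEE-754 parameters, assuming that the minimum exponent
:math:`d` is the exponent of the smallest subnormal number, that is
:math:`e_{min} - p + 1`. The tables ``IEEE_PARAMS`` and ``FORMATS`` and
the function ``smallest_positive`` are hypothetical names used only for
this illustration, and the resulting values should be checked against
the tool before relying on them.

```python
from fractions import Fraction

# name -> (precision p in bits, exponent emin of the smallest normal
# number). Assumed values taken from the IEEE-754 format definitions
# and the x87 extended format, not from this document.
IEEE_PARAMS = {
    "ieee_32": (24, -126),
    "ieee_64": (53, -1022),
    "ieee_128": (113, -16382),
    "x86_80": (64, -16382),
}

# name -> (precision, minimum_exponent), with d = emin - p + 1.
FORMATS = {name: (p, emin - p + 1) for name, (p, emin) in IEEE_PARAMS.items()}


def smallest_positive(name):
    """Smallest positive number of the named format, as an exact Fraction."""
    _, d = FORMATS[name]
    return Fraction(2) ** d
```

Under these assumptions, ``FORMATS["ieee_32"]`` is ``(24, -149)`` and
``FORMATS["ieee_64"]`` is ``(53, -1074)``.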
Fixed-point operators
---------------------
This class of operators covers all the formats whose number sets
are :math:`F(e) = \{m \cdot 2^e\}`. The first parameter selects the
weight of the least significant bit. The second parameter selects the
rounding direction.
::

   fixed< lsb_weight, rounding_direction >(...)

Rounding to integer is a special case of fixed-point rounding, with
``lsb_weight`` set to 0 (the least significant bit then has weight
:math:`2^0 = 1`). A syntactic shortcut is provided.

::

   int< rounding_direction >(...)

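
The relation between the two operators can be sketched in Python as
follows. This is an illustration only: ``round_fixed``,
``round_to_integer`` and ``ROUNDERS`` are hypothetical names, and only
the four directions that map directly onto Python built-ins are
covered.

```python
from fractions import Fraction
from math import ceil, floor, trunc

# Integer rounding for a subset of the directions (assumption: the
# built-in round() on a Fraction breaks ties to even, i.e. "ne").
ROUNDERS = {"dn": floor, "up": ceil, "zr": trunc, "ne": round}


def round_fixed(x, lsb_weight, direction):
    """Round x to a multiple of 2**lsb_weight in the given direction."""
    ulp = Fraction(2) ** lsb_weight
    # Scale so that the grid becomes the integers, round, scale back.
    return ROUNDERS[direction](Fraction(x) / ulp) * ulp


def round_to_integer(x, direction):
    """Integer rounding is fixed-point rounding with an LSB weight of 0."""
    return round_fixed(x, 0, direction)
```

For example, ``round_fixed(Fraction(3, 8), -2, "ne")`` lands on the
tie between 1/4 and 1/2 and returns 1/2, whose mantissa 2 is even.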