File: arithmetic.rst

package info (click to toggle)
gappa 1.6.0-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 1,316 kB
sloc: cpp: 11,864; python: 59; makefile: 19; sh: 5
file content (111 lines) | stat: -rw-r--r-- 2,541 bytes
Supported arithmetic
====================

Rounding directions
-------------------

Some of the classes of operators presented in the following sections are
templated by a rounding direction. This is the direction chosen when
converting a real number that cannot be exactly represented in the
destination format.

There are eleven directions:

``zr``
   toward zero

``aw``
   away from zero

``dn``
   toward minus infinity (down)

``up``
   toward plus infinity

``od``
   to odd mantissas

``ne``
   to nearest, tie breaking to even mantissas

``no``
   to nearest, tie breaking to odd mantissas

``nz``
   to nearest, tie breaking toward zero

``na``
   to nearest, tie breaking away from zero

``nd``
   to nearest, tie breaking toward minus infinity

``nu``
   to nearest, tie breaking toward plus infinity

The rounding directions mandated by the IEEE-754 standard are ``ne``
(default mode, rounding to nearest), ``zr``, ``dn``, ``up``, and ``na``
(introduced for decimal arithmetic).

Floating-point operators
------------------------

This class of operators covers all the formats whose number sets
are :math:`F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}`. In
particular, IEEE-754 floating-point formats (with subnormal numbers) are
part of this class, if we set apart overflow issues. Both parameters
p and d select a particular format. The last parameter selects the
rounding direction.

::

   float< precision, minimum_exponent, rounding_direction >(...)

Formats with no minimal exponent (and thus no underflow) are also
available:

::

   float< precision, rounding_direction >(...)

Having to remember the precision and minimum exponent parameters may be
a bit tedious, so an alternate syntax is provided: instead of these two
parameters, a name can be given to the ``float`` class.

::

   float< name, rounding_direction >(...)

There are four predefined formats:

``ieee_32``
   IEEE-754 single precision

``ieee_64``
   IEEE-754 double precision

``ieee_128``
   IEEE-754 quadruple precision

``x86_80``
   extended precision on x86-like processors

Fixed-point operators
---------------------

This class of operators covers all the formats whose number sets
are :math:`F(e) = \{m \cdot 2^e\}`. The first parameter selects the
weight of the least significant bit. The second parameter selects the
rounding direction.

::

   fixed< lsb_weight, rounding_direction >(...)

Rounding to integer is a special case of fixed point rounding of weight
0. A syntactic shortcut is provided.

::

   int< rounding_direction >(...)