Supported arithmetic
====================

Rounding directions
-------------------

Some of the classes of operators presented in the following sections are
parameterized by a rounding direction. This is the direction chosen when
converting a real number that cannot be represented exactly in the
destination format.

There are eleven directions:

``zr``
   toward zero

``aw``
   away from zero

``dn``
   toward minus infinity (down)

``up``
   toward plus infinity (up)

``od``
   to odd mantissas

``ne``
   to nearest, tie breaking to even mantissas

``no``
   to nearest, tie breaking to odd mantissas

``nz``
   to nearest, tie breaking toward zero

``na``
   to nearest, tie breaking away from zero

``nd``
   to nearest, tie breaking toward minus infinity

``nu``
   to nearest, tie breaking toward plus infinity

The rounding directions mandated by the IEEE-754 standard are ``ne``
(default mode, rounding to nearest), ``zr``, ``dn``, ``up``, and ``na``
(introduced for decimal arithmetic).
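
As an illustration of how the eleven directions differ, the following is a minimal Python sketch that rounds a rational number to an integer (the simplest destination format). The function name ``round_to_int`` is hypothetical and is not part of Gappa; the code only models the definitions listed above.

```python
from fractions import Fraction
import math

def round_to_int(x, direction):
    """Round a rational x to an integer under a two-letter direction code.

    Illustrative model only (hypothetical helper, not Gappa's implementation).
    """
    x = Fraction(x)
    lo, hi = math.floor(x), math.ceil(x)
    if lo == hi:                          # x is already an integer
        return lo
    toward_zero = lo if x > 0 else hi
    away = hi if x > 0 else lo
    even = lo if lo % 2 == 0 else hi      # exactly one neighbour is even
    odd = hi if lo % 2 == 0 else lo
    directed = {"zr": toward_zero, "aw": away, "dn": lo, "up": hi, "od": odd}
    if direction in directed:
        return directed[direction]
    # Round-to-nearest modes: the second letter matters only on a tie.
    if x - lo < hi - x:
        return lo
    if x - lo > hi - x:
        return hi
    ties = {"ne": even, "no": odd, "nz": toward_zero,
            "na": away, "nd": lo, "nu": hi}
    return ties[direction]
```

For example, ``round_to_int(Fraction(5, 2), "ne")`` selects the even neighbour 2, whereas ``"na"`` selects 3; away from a tie, all six nearest modes agree.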

Floating-point operators
------------------------

This class of operators covers all the formats whose number sets
are :math:`F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}`, where m and e
are integers. In particular, IEEE-754 floating-point formats (with
subnormal numbers) belong to this class, provided overflow is set
aside. The parameters p and d together select a particular format. The
last parameter selects the rounding direction.

::

   float< precision, minimum_exponent, rounding_direction >(...)

Formats with no minimal exponent (and thus no underflow) are also
available:

::

   float< precision, rounding_direction >(...)
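
To make the number set concrete, here is a small Python sketch that rounds a rational number to the nearest element of :math:`F(p,d)`, breaking ties to even mantissas (the ``ne`` direction). The name ``round_float_ne`` and the convention that ``d=None`` means "no minimum exponent" are assumptions made for this illustration; they are not Gappa syntax.

```python
from fractions import Fraction

def round_float_ne(x, p, d=None):
    """Round rational x to the nearest m * 2**e with |m| < 2**p and e >= d.

    Ties go to the even mantissa. d=None models a format with no minimum
    exponent. Hypothetical helper, not Gappa's implementation.
    """
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    a = abs(x)
    # Find msb such that 2**msb <= |x| < 2**(msb + 1).
    msb = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** msb > a:
        msb -= 1
    e = msb - p + 1                 # exponent giving a p-bit mantissa
    if d is not None:
        e = max(e, d)               # clamp: fewer bits near zero (subnormals)
    ulp = Fraction(2) ** e
    m = round(x / ulp)              # round() on a Fraction breaks ties to even
    # If |m| reaches 2**p, the value equals 2**(p-1) * 2**(e+1), still in the set.
    return m * ulp
```

With ``p=53`` and ``d=-1074`` this should agree with the host's double-precision conversion, for instance ``round_float_ne(Fraction(1, 10), 53, -1074) == Fraction(0.1)``.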

Remembering the precision and minimum exponent parameters can be
tedious, so an alternative syntax is provided: instead of these two
parameters, a format name can be given to the ``float`` class.

::

   float< name, rounding_direction >(...)

There are four predefined formats:

``ieee_32``
   IEEE-754 single precision

``ieee_64``
   IEEE-754 double precision

``ieee_128``
   IEEE-754 quadruple precision

``x86_80``
   extended precision on x86-like processors
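
The text above does not list the parameters behind each name. The following Python sketch derives plausible ``(precision, minimum_exponent)`` pairs from the usual IEEE-754 and x87 parameters, taking the minimum exponent as the exponent of the smallest subnormal number. The table ``FORMATS`` and the helper ``min_exponent`` are assumptions for illustration and should be checked against the Gappa sources.

```python
import math
import sys

def min_exponent(p, emin):
    """Exponent d of the smallest subnormal: emin - (p - 1).

    emin is the exponent of the smallest normal number.
    """
    return emin - (p - 1)

# Assumed parameters (not taken from the text above):
# name -> (precision p, minimum exponent d)
FORMATS = {
    "ieee_32":  (24,  min_exponent(24,  -126)),
    "ieee_64":  (53,  min_exponent(53,  -1022)),
    "ieee_128": (113, min_exponent(113, -16382)),
    "x86_80":   (64,  min_exponent(64,  -16382)),
}

def host_double_matches():
    """Check the ieee_64 entry against the host's double-precision type."""
    p, d = FORMATS["ieee_64"]
    return (sys.float_info.mant_dig == p
            and math.ldexp(1.0, d) > 0.0         # 2**d is representable
            and math.ldexp(1.0, d - 1) == 0.0)   # 2**(d-1) underflows to zero
```

Under these assumptions, ``float< ieee_32, ne >`` would correspond to ``float< 24, -149, ne >``.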

Fixed-point operators
---------------------

This class of operators covers all the formats whose number sets
are :math:`F(e) = \{m \cdot 2^e\}`, where m is an integer. The first
parameter, e, selects the weight :math:`2^e` of the least significant
bit. The second parameter selects the rounding direction.

::

   fixed< lsb_weight, rounding_direction >(...)

Rounding to an integer is the special case of fixed-point rounding with
``lsb_weight`` set to 0. A syntactic shortcut is provided.

::

   int< rounding_direction >(...)
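
The relationship between the two operators can be sketched in Python as follows. The helpers ``round_fixed`` and ``round_int`` are hypothetical names used only for this illustration, and only four of the rounding directions are modelled.

```python
from fractions import Fraction
import math

def round_fixed(x, lsb_exp, direction="ne"):
    """Round rational x to a multiple of 2**lsb_exp.

    Illustrative model of fixed-point rounding (not Gappa's implementation);
    supports the directions dn, up, zr and ne only.
    """
    ulp = Fraction(2) ** lsb_exp
    q = Fraction(x) / ulp               # x expressed in units of the LSB
    if direction == "dn":
        m = math.floor(q)
    elif direction == "up":
        m = math.ceil(q)
    elif direction == "zr":
        m = math.trunc(q)
    elif direction == "ne":
        m = round(q)                    # Fraction ties round to even
    else:
        raise ValueError(f"direction not modelled: {direction}")
    return m * ulp

def round_int(x, direction="ne"):
    """Rounding to an integer is fixed-point rounding with exponent 0."""
    return round_fixed(x, 0, direction)
```

For instance, ``round_fixed(Fraction(1, 3), -4, "dn")`` gives 5/16, the largest multiple of 1/16 not exceeding 1/3.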