<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=0.4"/>
<meta name="google" content="notranslate" />
<link rel="canonical" href="https://sleef.org/additional.xhtml" />
<link rel="icon" href="favicon.png" />
<link rel="stylesheet" type="text/css" href="texlike.css"/>
<link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=Ubuntu" />
<link rel="stylesheet" type="text/css" href="sleef.css"/>
<title>SLEEF Documentation</title>
</head>
<body translate="no" class="notranslate">
<h1>SLEEF Documentation - Additional Notes</h1>
<h2>Table of contents</h2>
<ul class="none" style="font-family: arial, sans-serif; padding-left: 0.5cm;">
<li><a class="underlined" href="index.xhtml">Introduction</a></li>
<li><a class="underlined" href="compile.xhtml">Compiling and installing the library</a></li>
<li><a class="underlined" href="purec.xhtml">Math library reference</a></li>
<li><a class="underlined" href="dft.xhtml">DFT library reference</a></li>
<li><a class="underlined" href="misc.xhtml">Other tools included in the package</a></li>
<li><a class="underlined" href="benchmark.xhtml">Benchmark results</a></li>
<li> </li>
<li><a class="underlined" href="additional.xhtml">Additional notes</a></li>
<ul class="disc">
<li><a href="additional.xhtml#gnuabi">About the GNUABI version of the library</a></li>
<li><a href="additional.xhtml#dispatcher">How the dispatcher works</a></li>
<li><a href="additional.xhtml#ulp">ULP, gradual underflow and flush-to-zero mode</a></li>
<li><a href="additional.xhtml#sincospi">About sincospi</a></li>
<li><a href="additional.xhtml#logo">About the logo</a></li>
</ul>
</ul>
<h2 id="gnuabi">About the GNUABI version of the library</h2>
<p class="noindent">
The GNUABI version of the library (libsleefgnuabi.so) is built for
x86 and aarch64 architectures. It provides an API compatible
with <a class="underlined"
href="https://sourceware.org/glibc/wiki/libmvec">libmvec</a> in
glibc, and the API conforms to the <a class="underlined"
href="https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&amp;do=view&amp;target=VectorABI.txt">vector
ABI</a>. This library is built and installed by default, and code
generated by some compilers, for example when they vectorize loops
that call math functions, may call the functions in this library.
</p>
<h2 id="dispatcher">How the dispatcher works</h2>
<p class="noindent">
Fig. 7.1 shows simplified code for our dispatcher. There is only one
exported function, <b class="func">mainFunc</b>. When
<b class="func">mainFunc</b> is called for the first
time, <b class="func">dispatcherMain</b> is called internally,
since <i class="var">funcPtr</i> is initialized to a pointer to
<b class="func">dispatcherMain</b> (line 14). It then detects whether the
CPU supports SSE 4.1 (line 7), and
rewrites <i class="var">funcPtr</i> (line 10) to a pointer to the
function that uses SSE 4.1 or SSE 2, depending on the result of CPU
feature detection. When
<b class="func">mainFunc</b> is called for the second time, it does
not execute
<b class="func">dispatcherMain</b>. It just calls the function
whose pointer was stored in <i class="var">funcPtr</i> during
the execution of
<b class="func">dispatcherMain</b>.
</p>
<p>
Our dispatcher has several advantages. The first is that it does not
require any compiler-specific extension. The second is simplicity:
the dispatcher is only 18 lines of simple code, and since each
function has its own, completely separate dispatcher, there is little
room for bugs to get in.
</p>
<p>
The third advantage is low overhead. You might expect the overhead to
be one extra function call, including execution of a prologue and an
epilogue. However, modern compilers compile the call
in <b class="func">mainFunc</b> as a tail call, eliminating the
redundant prologue, epilogue and return instruction, so the actual
overhead is just one indirect jmp instruction. This is very fast
since the jump is not conditional.
</p>
<p>
The fourth advantage is thread safety. The only variable shared among
threads is <i class="var">funcPtr</i>, and it can hold only two
values: a pointer to <b class="func">dispatcherMain</b>, or a pointer
to either <b class="func">funcSSE2</b>
or <b class="func">funcSSE4</b>, depending on the availability of
extensions. Once <i class="var">funcPtr</i> is set to the
pointer to <b class="func">funcSSE2</b>
or <b class="func">funcSSE4</b>, it never changes again. If several
threads call <b class="func">mainFunc</b> for the first time
simultaneously, each of them runs <b class="func">dispatcherMain</b>
and stores the same value, so the code works correctly regardless of
how the threads interleave, assuming that a store to a pointer-sized
variable is atomic on the target CPU.
</p>
<pre class="code">
<code>static double (*funcPtr)(double arg);</code>
<code></code>
<code>static double dispatcherMain(double arg) {</code>
<code> double (*p)(double arg) = funcSSE2;</code>
<code></code>
<code>#if the compiler supports SSE4.1</code>
<code> if (SSE4.1 is available on the CPU) p = funcSSE4;</code>
<code>#endif</code>
<code></code>
<code> funcPtr = p;</code>
<code> return (*funcPtr)(arg);</code>
<code>}</code>
<code></code>
<code>static double (*funcPtr)(double arg) = dispatcherMain;</code>
<code></code>
<code>double mainFunc(double arg) {</code>
<code> return (*funcPtr)(arg);</code>
<code>}</code>
</pre>
<p style="text-align:center; margin-bottom: 1.0cm;">
Fig. 7.1: Simplified code of our dispatcher
</p>
<h2 id="ulp">ULP, gradual underflow and flush-to-zero mode</h2>
<p class="noindent">
ULP stands for "unit in the last place", which is sometimes used to
measure the accuracy of calculations. 1 ULP is basically the distance
between two adjacent floating point numbers, so it depends on the
exponent of the FP number. The accuracy of reputable math libraries
is usually between 0.5 and 1 ULP. Here, accuracy means the largest
error of calculation, which only occurs in the worst case. The SLEEF
math library provides multiple accuracy choices for some math
functions. Many functions have 3.5-ULP and 1-ULP versions, and the
3.5-ULP versions are significantly faster than the 1-ULP versions. If
you care more about execution speed than accuracy, it is advisable to
use the 3.5-ULP versions along with -ffast-math or "unsafe math
optimization" options for the compiler.
</p>
<p>
In the IEEE 754 standard, underflow does not happen abruptly when the
exponent reaches its minimum. Instead, denormal numbers, which have
less precision, are produced; this is sometimes called gradual
underflow. Some implementations that are not IEEE 754 conformant use
flush-to-zero mode instead, since it is easier to implement. In
flush-to-zero mode, numbers smaller than the smallest normalized
number cannot be represented and are replaced with zero. Because of
this, the accuracy of calculation may be affected in some cases. The
smallest positive normalized number is available as DBL_MIN for
double precision and FLT_MIN for single precision. The naming of
these macros is a little confusing, because DBL_MIN is not the
smallest positive double precision number: denormal numbers are
smaller.
</p>
<p>
Known maximum errors of the math functions in glibc are listed
on <a class="underlined"
href="http://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html">
this page.</a>
</p>
<h2 id="sincospi">About sincospi</h2>
<p class="noindent">
The sincospi series of functions evaluates <i class="math">sin(
π<i class="var">a</i> )</i> and <i class="math">cos(
π<i class="var">a</i> )</i> simultaneously. These functions were
added to SLEEF in version 3.0. There are three reasons I added
them.
</p>
<p>
The first reason is speed and accuracy. The C standards specify
trigonometric functions such as sin and cos, whose arguments are
given in radians. Evaluating these functions requires argument
reduction, which involves multiple-precision arithmetic
with <i class="math">π</i> and therefore many additions and
multiplications. This is slow, especially if accurate evaluation is
required. If the function is instead defined so that the argument is
implicitly multiplied by <i class="math">π</i>, this expensive
reduction is eliminated, because subtracting the nearest integer from
the argument is an exact operation in binary floating point. This
leads to faster and more accurate evaluation.
</p>
<p>
The second reason is that sincospi functions are handy for
implementing an FFT library. FFT libraries need to evaluate
trigonometric functions to generate the twiddle factors used in
the butterfly operations. Since the butterfly operations are
applied repeatedly, errors in the twiddle factors accumulate. Thus, we
want to make the errors in the twiddle factors as small as possible.
In an FFT of power-of-two size, the twiddle factors involve
<i class="math">sin( π<i class="var">m</i> /
2<sup><i class="var">n</i></sup> )</i>, where <i class="var">m</i>
and <i class="var">n</i> are integers. If we simply use the usual
trigonometric functions defined in the C standards, with the same
precision as that used for the butterfly operations, the arguments
already contain error before the functions are evaluated, since
π<i class="var">m</i> / 2<sup><i class="var">n</i></sup> cannot
be represented as a floating point value without error. On the
other hand, if we use a sincospi function, the argument
<i class="var">m</i> / 2<sup><i class="var">n</i></sup> can be
represented exactly as a radix-2 FP number. Thus, we can
calculate twiddle factors with better accuracy.
</p>
<p>
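The third reason is that sinpi is needed internally to implement the
gamma functions. For example, the reflection formula
<i class="math">Γ( <i class="var">x</i> ) Γ( 1 - <i class="var">x</i> ) =
π / sin( π<i class="var">x</i> )</i>, which is commonly used for
negative arguments, contains <i class="math">sin( π<i class="var">x</i> )</i>.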
</p>
<h2 id="logo">About the logo</h2>
<p>
The logo is a soup ladle: "sleef" means soup ladle in Dutch.
</p>
<br/>
<p style="text-align:center; margin-top:1cm;">
<a class="nothing" href="sleeflogo3.svg">
<img src="sleeflogo3.png" alt="logo" width="40%" height="40%" />
</a>
<br />
Fig. 7.2: SLEEF logo
</p>
<p class="footer">
Copyright © 2018 SLEEF Project.<br/>
SLEEF is open-source software and is distributed under the Boost Software License, Version 1.0.
</p>
<!--TEST-->
</body>
</html>