File: benchmark.xhtml

package info (click to toggle)
sleef 3.9.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 8,564 kB
  • sloc: ansic: 49,154; cpp: 6,095; makefile: 38
file content (237 lines) | stat: -rw-r--r-- 8,548 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=0.4"/>
<meta name="google" content="notranslate" />
<link rel="canonical" href="https://sleef.org/benchmark.xhtml" />
<link rel="icon" href="favicon.png" />
<link rel="stylesheet" type="text/css" href="texlike.css"/>
<link rel="stylesheet" type="text/css" href="sleef.css"/>
<title>SLEEF - Benchmark Results</title>
</head>
<body translate="no" class="notranslate">
<h1>SLEEF - Benchmark Results</h1>

<h2>Table of contents</h2>

<ul class="none" style="font-family: arial, sansserif; padding-left: 0.5cm;">
  <li><a class="underlined" href="index.xhtml">Introduction</a></li>
  <li><a class="underlined" href="compile.xhtml">Compiling and installing the library</a></li>
  <li><a class="underlined" href="purec.xhtml">Math library reference</a></li>
  <li><a class="underlined" href="quad.xhtml"> Quad-precision math library reference</a></li>
  <li><a class="underlined" href="dft.xhtml">DFT library reference</a></li>
  <li><a class="underlined" href="misc.xhtml">Other tools included in the package</a></li>
  <li>&nbsp;</li>
  <li><a class="underlined" href="benchmark.xhtml">Benchmark results</a></li>
    <ul class="disc">
      <li><a href="benchmark.xhtml#libm">Vectorized math library</a></li>
      <li><a href="benchmark.xhtml#dft">Discrete Fourier transform</a></li>
    </ul>
  <li>&nbsp;</li>
  <li><a class="underlined" href="additional.xhtml">Additional notes</a></li>
</ul>

<h2 id="libm">Vectorized math library</h2>

<p class="noindent" style="font-size:1em; margin-top:0.5cm;">
  The graphs below show comparison of the execution time between
  <a class="undecorated"
     href="https://github.com/shibatch/sleef">SLEEF</a>-3.2 compiled
  with GCC-7.2 and Intel SVML included in Intel C Compiler 18.0.1.
</p>

<p style="font-size:1em;">
  The execution time of each function is measured by executing each
  function 10^8 times and taking the average time. Each time a
  function is executed, a uniformly distributed random number is set
  to each element of the argument vector(each element is set a
  different value.)  The ranges of the random number for each
  function are shown below. Argument vectors are generated before
  the measurement, and the time to generate random argument vectors
  is not included in the execution time.
</p>

<br/>

<ul>
  <li>Trigonometric functions : [0, 6.28] and [0, 10^6] for
    double-precision functions. [0, 6.28] and [0, 30000] for
    single-precision functions.</li>
  <li>Log : [0, 10^300] and [0, 10^38] for double-precision
    functions and single-precision functions, respectively.</li>
  <li>Exp : [-700, 700] and [-100, 100] for double-precision
    functions and single-precision functions, respectively.</li>
  <li>Pow : [-30, 30] for both the first and the second
    arguments.</li>
  <li>Asin : [-1, 1]</li>
  <li>Atan : [-10, 10]</li>
  <li>Atan2 : [-10, 10] for both the first and the second
    arguments.</li>
</ul>

<br/>

<p style="font-size:1em;">
  The accuracy of SVML functions can be chosen by compiler options,
  not the function names. "-fimf-max-error=1.0" option is specified
  to icc to obtain the 1-ulp-accuracy results, and
  "-fimf-max-error=5.0" option is used for the 5-ulp-accuracy
  results.
</p>

<p style="font-size:1em;">
  Those results are measured on a PC with Intel Core i7-6700 CPU @
  3.40GHz with Turbo Boost turned off. The CPU should be always
  running at 3.4GHz during the measurement.
</p>

<p style="font-size:1.2em; margin-top:1.0cm;">
  <b style="color:#0050a0;">Click graphs to magnify</b>.
</p>

<p style="margin-bottom:1cm;">&nbsp;</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="trigdp.png">
    <img src="trigdp.png" alt="Performance graph for DP trigonometric functions" width="50%" height="50%"/>
  </a>
  <br />
  Fig. 6.1: Execution time of double precision trigonometric functions
</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="trigsp.png">
    <img src="trigsp.png" alt="Performance graph for SP trigonometric functions" width="50%" height="50%"/>
  </a>
  <br />
  Fig. 6.2: Execution time of single precision trigonometric functions
</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="nontrigdp.png">
    <img src="nontrigdp.png" alt="Performance graph for other DP functions" width="50%" height="50%"/>
  </a>
  <br />
  Fig. 6.3: Execution time of double precision log, exp, pow and inverse trigonometric functions
</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="nontrigsp.png">
    <img src="nontrigsp.png" alt="Performance graph for other SP functions" width="50%" height="50%"/>
  </a>
  <br />
  Fig. 6.4: Execution time of single precision log, exp, pow and inverse trigonometric functions
</p>

<p class="footer">
  Copyright &copy; 2010-2025 SLEEF Project, Naoki Shibata and contributors.<br/>
  SLEEF is open-source software and is distributed under the Boost Software License, Version 1.0.
</p>


<h2 id="dft">Discrete Fourier transform</h2>

<p>
Below is the result of performance comparison between SleefDFT and
FFTW3. The graphs show the performance of complex transform by both
libraries, with the following settings.
</p>

<br/>

<ul>
  <li>Compiler : gcc version 14.2.0 (Ubuntu 14.2.0-4ubuntu2~24.04)</li>
  <li>CPU : Ryzen 9 7950X (clock frequency fixed at 4.5GHz)</li>
  <li>SLEEF build option : -DSLEEF_BUILD_DFT=True -DSLEEFDFT_ENABLE_STREAM=True -DSLEEFDFT_MAXBUTWIDTH=7</li>
  <li>FFTW version 3.3.10-1ubuntu3</li>
</ul>

<br/>

<p>
The vertical axis represents the performance in Mflops calculated in
the way indicated in the FFTW web site. The horizontal axis represents
log2 of the size of transform. Execution plans were made with
SLEEF_MODE_MEASURE mode and FFTW_MEASURE mode, respectively.
</p>

<br/><br/>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="dftzen4dp.png">
    <img src="dftzen4dp.png" alt="Performance of transform in double precision on Ryzen 9 7950X" width="60%" height="60%"/>
  </a>
  <br />
  Fig. 6.5: Performance of transform in double precision on Ryzen 9 7950X
</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="dftzen4sp.png">
    <img src="dftzen4sp.png" alt="Performance of transform in single precision on Ryzen 9 7950X" width="60%" height="60%"/>
  </a>
  <br />
  Fig. 6.6: Performance of transform in single precision on Ryzen 9 7950X
</p>

<p>
Below is the result of comparison on M1 MacBook Pro 16-inch with the following settings.
</p>

<br/>

<ul>
  <li>OS : Ubuntu 24.04, Linux 6.8.0-1011-asahi-arm</li>
  <li>Compiler : Clang version 19.1.1 (1ubuntu1~24.04.2)</li>
  <li>FFTW version 3.3.10-1ubuntu3</li>
</ul>

<br/><br/>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="dftm1dp.png">
    <img src="dftm1dp.png" alt="Performance of transform in double precision on M1 Pro" width="60%" height="60%"/>
  </a>
  <br />
  Fig. 6.7: Performance of transform in double precision on M1 MacBook Pro
</p>

<p style="text-align:center; margin-bottom:2cm;">
  <a class="nothing" href="dftm1sp.png">
    <img src="dftm1sp.png" alt="Performance of transform in single precision on M1 Pro" width="60%" height="60%"/>
  </a>
  <br />
  Fig. 6.8: Performance of transform in single precision on M1 MacBook Pro
</p>

<br/>

<p>
When benchmarking on your own, please keep in mind that the <b style="color:#ff0000;">CPU clock must be fixed</b>.
</p>


<p class="footer">
  Copyright &copy; 2010-2025 SLEEF Project, Naoki Shibata and contributors.<br/>
  SLEEF is open-source software and is distributed under the Boost Software License, Version 1.0.
</p>

<script type="text/javascript">
var sc_project=13098265; 
var sc_invisible=1; 
var sc_security="518de45e"; 
</script>
<script type="text/javascript"
src="https://www.statcounter.com/counter/counter.js"
async="async"></script>
<noscript><div class="statcounter"><a title="Web Analytics"
href="https://statcounter.com/" target="_blank"><img
class="statcounter"
src="https://c.statcounter.com/13098265/0/518de45e/1/"
alt="Web Analytics"
referrerPolicy="no-referrer-when-downgrade"></img></a></div></noscript>


</body>
</html>