<!--startcut ==========================================================-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>GNU/Linux Benchmarking - Practical aspects</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0020F0"
ALINK="#FF0000">
<!--endcut ============================================================-->

<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P> 
<!--===================================================================-->

<center>
<H1>GNU/Linux Benchmarking - Practical Aspects</H1>

<H2>by 
<A HREF="mailto:andrewbalsa@usa.net">Andr&eacute; D. Balsa </A></H2>v0.4, 26 November 1997
</center>
<P><HR><P> 
<EM>This is the second article in a series of 4 articles on GNU/Linux Benchmarking, to be published by the Linux Gazette. The first article presented some basic benchmarking concepts and analyzed the Whetstone benchmark in more detail. The present article deals with practical issues in GNU/Linux benchmarking: what benchmarks already exist, where to find them, what they effectively measure and how to run them. And if you are not happy with the available benchmarks, some guidelines to write your own. Also, an application benchmark (Linux kernel 2.0.0 compilation) is analyzed in detail.</EM><HR>
<P>
<H2>1. <A HREF="./bench2.html#ss1">The DOs and DON'Ts of GNU/Linux benchmarking</A></H2>

<P>
<H2>2. <A HREF="./bench2.html#ss2">A roundup of benchmarks for Linux</A></H2>

<P>
<H2>3. <A HREF="./bench2.html#ss3">Devising or writing a new Linux benchmark</A></H2>

<P>
<H2>4. <A HREF="./bench2.html#ss4">An application benchmark: Linux 2.0.0 kernel compilation with gcc</A></H2>
<UL>
<LI><A HREF="./bench2.html#ss4.1">4.1 General benchmark features</A>
<LI><A HREF="./bench2.html#ss4.2">4.2 Benchmarking procedure</A>
<LI><A HREF="./bench2.html#ss4.3">4.3 Examining the results</A>
</UL>

<P>
<H2>5. <A HREF="./bench2.html#ss5">Next month</A></H2>

<P> <HR><P> 

<H2><A NAME="ss1">1. The DOs and DON'Ts of GNU/Linux benchmarking</A></H2>


<P>GNU/Linux is a great OS in terms of performance, and we can hope it will only get better over time. But that is a very vague statement: we need figures to prove it. What information can benchmarks effectively provide us with? What aspects of microcomputer performance can we measure under GNU/Linux?
<P>Kurt Fitzner reminded me of an old saying: <B><EM>"When performance is measured, performance increases."</EM></B>
<P>Let's list some general benchmarking rules (not necessarily in order of decreasing priority) that should be followed to obtain accurate and meaningful benchmarking data, resulting in real GNU/Linux performance gains:
<OL>
<LI>Use GPLed source code for the benchmarks, preferably easily available on the Net.</LI>
<LI>Use standard tools. Avoid benchmarking tools that have been optimized for a specific system/equipment/architecture.</LI>
<LI>Use Linux/Unix/Posix benchmarks. Mac, DOS and Windows benchmarks will not help much.</LI>
<LI>Don't quote your results to three decimal places. A resolution of 0.1% is more than adequate, and a precision of 1% is more than enough.</LI>
<LI>Report your results in standard format/metric/units/report forms.</LI>
<LI>Completely describe the configuration being tested.</LI>
<LI>Don't include irrelevant data.</LI>
<LI>If the variance in your results is significant, report it alongside the results and try to explain why this is so.</LI>
<LI>Comparative benchmarking is more informative. When doing comparative benchmarking, modify a single test variable at a time. Report results for each combination.</LI>
<LI>Decide beforehand what characteristic of a system you want to benchmark. Use the right tools to measure this characteristic.</LI>
<LI>Check your results. Repeat each benchmark once or twice before publicly reporting your results.</LI>
<LI>Don't set out to benchmark trying to prove that equipment A is better than equipment B; you may be in for a surprise...</LI>
<LI>Avoid benchmarking one-of-a-kind or proprietary equipment. This may be very interesting for experimental purposes, but the information resulting from such benchmarks is absolutely useless to other Linux users.</LI>
<LI>Share any meaningful information you may have come up with. If there is a lesson to be learned from the Linux style of development, it's that sharing information is paramount.</LI>
</OL>
<P> <HR><P> 
<H2><A NAME="ss2">2. A roundup of benchmarks for Linux</A></H2>


<P>These are some benchmarks I have collected over the Net. A few are Linux-specific, others are portable across a wide range of Unix-compatible systems, and some are even more generic.
<UL>
<LI><B>UnixBench</B>. A fundamental high-level Linux benchmark suite, UnixBench integrates CPU and file I/O tests, as well as system behaviour under various user loads. Originally written by staff members at BYTE magazine, it has been heavily modified by David C. Niemi.</LI>
<LI><B>BYTEmark</B> as modified by Uwe Mayer. A CPU benchmark suite, reporting CPU/cache/memory, integer and floating-point performance. Again, this test originated at BYTE magazine. Uwe did the port to Linux, and recently improved the reporting part of the test.</LI>
<LI><B>Xengine</B> by Kazuhiko Shutoh. This is a cute little X Window tool/toy that basically reports the speed at which a system will redraw a coloured bitmap on screen (a simulation of a four-cycle engine). I like it because it is unpretentious while at the same time providing a useful measure of X server performance. It will also run at any resolution and pixel depth.</LI>
<LI><B>Whetstone</B>. A floating point benchmark by Harold Curnow.</LI>
<LI><B>Xbench</B> by Claus Gittinger. Xbench generates the famous xstone rating for X server performance comparisons.</LI>
<LI><B>XMark93</B>. Like xbench, this is a script that uses X11's x11perf and computes an index (in Xmarks). It was written a few years later than xbench and IMHO provides a better metric for X server performance.</LI>
<LI><B>Webstone 2.01</B>. An excellent tool for Web server performance testing. Although Webstone is copyright by Silicon Graphics, its license allows free copying and examination of the source code.</LI>
<LI><B>Stream</B> by John D. McCalpin. This program is based on the concept of "machine balance" (sustainable memory bandwidth vs. FPU performance). This has been found to be a central bottleneck for computer architectures in scientific applications.</LI>
<LI><B>Cachebench</B> by Philip J. Mucci. By plotting memory access bandwidth vs. data size, this program will provide a wealth of benchmarking data on the memory subsystem (L1, L2 and main memory).</LI>
<LI><B>Bonnie</B> by Tim Bray. A high-level synthetic benchmark, bonnie is useful for file I/O throughput benchmarking.</LI>
<LI><B>Iozone</B> by Bill Norcott. Measures sequential file I/O throughput. The new 2.01 version supports raw devices and CD-ROM drives.</LI>
<LI><B>Netperf</B> is copyright Hewlett-Packard. This is a sophisticated tool for network performance analysis. Compared to ttcp and ping, it verges on overkill. Source code is freely available.</LI>
<LI><B>Ttcp</B>. A "classic" tool for network performance measurements, ttcp will measure the point-to-point <B>bandwidth</B> over a network connection.</LI>
<LI><B>Ping</B>. Another ubiquitous tool for network performance measurements, ping will measure the <B>latency</B> of a network connection.</LI>
<LI><B>Perlbench</B> by David Niemi. A small, portable benchmark written entirely in Perl.</LI>
<LI><B>Hdparm</B> by Mark Lord. Hdparm's -t and -T options can be used to measure disk-to-memory (disk reads) transfer rates. Hdparm allows setting various EIDE disk parameters and is very useful for EIDE driver tuning. Some commands can also be used with SCSI disks.</LI>
<LI><B>Dga</B> with the b option. This is a small demo program for XFree86's DGA extension, and I would never have looked at it were it not for Koen Gadeyne, who added the b command to dga. This command runs a small test of CPU/video memory bandwidth.</LI>
<LI><B>MDBNCH</B>. This is a large ANSI-standard FORTRAN 77 program used as an application benchmark, written by Furio Ercolessi. It accesses a large data set in a very irregular pattern, generating misses in both the L1 and L2 caches.</LI>
<LI><B>Doom</B> :-) Doom has a demo mode activated by running <CODE>doom -timedemo demo3</CODE>. Anton Ertl has set up a Web page listing results for various architectures/OS's.</LI>
</UL>
<P>All the benchmarks listed above are available by <B>ftp</B> or <B>http</B> from the 
<A HREF="http://www.tux.org/bench">Linux Benchmarking Project </A>server in the download directory: www.tux.org/pub/bench or from the Links page.

<P> <HR> <P> 
<H2><A NAME="ss3">3. Devising or writing a new Linux benchmark</A></H2>


<P>We saw last month that (nearly) all benchmarks are based on one of two simple algorithms, or on combinations/variations of these:
<OL>
<LI>Measuring the number of iterations of a given task executed over a fixed, predetermined time interval.</LI>
<LI>Measuring the time needed for the execution of a fixed, predetermined number of iterations of a given task.</LI>
</OL>

<P>We also saw that the Whetstone benchmark uses a combination of these two procedures to "calibrate" itself for optimum resolution, effectively providing a workaround for the low-resolution timer available on PC-type machines (a sketch of this calibration idea closes this section).
<P>Note that some newer benchmarks use new, exotic algorithms to estimate system performance, e.g. the Hint benchmark. I'll get back to Hint in a future article.
<P>Right now, let's see what algorithm 2 would look like:
<PRE>
initialize loop_count
start_time = time()
repeat
        benchmark_kernel()
        decrement loop_count
until loop_count = 0
duration = time() - start_time
report_results()
</PRE>

<P>Here, <EM>time()</EM> is a system library call which returns, for example, the elapsed wall-clock time since the last system boot. <EM>Benchmark_kernel()</EM> is obviously exercising the system feature or characteristic we are trying to measure.
<P>Even this trivial benchmarking algorithm makes some basic assumptions about the system being tested, and will report totally erroneous results if some precautions are not taken (see the C sketch after this list):
<OL>
<LI>If the benchmark kernel executes so quickly that the looping instructions account for a significant fraction of the processor clock cycles spent in each loop iteration, results will be skewed. Preferably, <EM>benchmark_kernel()</EM> should take &gt; 100 x the duration of the looping instructions.</LI>
<LI>Depending on system hardware, one will have to adjust <EM>loop_count</EM> so that the total benchmark duration is &gt; 100 x the clock resolution (for 1% benchmark precision) or &gt; 1000 x the clock resolution (for 0.1% precision). On PC hardware, the clock resolution is 10 ms.</LI>
<LI>We mentioned above that we used a straightforward wall-clock <EM>time()</EM> function. If the system load is high and our benchmark gets only 3% of the CPU time, we will get completely erroneous results! And of course on a multi-user, pre-emptive, multi-tasking OS like GNU/Linux, it's impossible to guarantee exclusive use of the CPU by our benchmark.</LI>
</OL>
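
<P>To make these points concrete, here is a minimal sketch in C of algorithm 2 (a hypothetical example, not code taken from any of the benchmarks discussed here). It uses <CODE>gettimeofday()</CODE> instead of <CODE>time()</CODE> for much better timer resolution, and a trivial floating-point kernel stands in for whatever you actually want to measure:
<PRE>
#include &lt;stdio.h&gt;
#include &lt;sys/time.h&gt;

#define LOOP_COUNT 1000000L  /* adjust so total duration is well above the clock resolution */

/* A trivial example kernel: a few FPU operations. The volatile "sink"
   discourages the compiler from optimizing the whole loop away. */
volatile double sink;
void benchmark_kernel(void)
{
    double x = 3.14159265, y = 1.41421356;
    sink = x * y + y / x - x;
}

int main(void)
{
    struct timeval start, stop;
    long i;
    double duration;

    gettimeofday(&amp;start, NULL);         /* start_time = time() */
    for (i = LOOP_COUNT; i &gt; 0; i--)    /* repeat ... until loop_count = 0 */
        benchmark_kernel();
    gettimeofday(&amp;stop, NULL);          /* duration = time() - start_time */

    duration = (stop.tv_sec - start.tv_sec)
             + (stop.tv_usec - start.tv_usec) / 1e6;
    printf("%ld iterations in %.2f s (%.0f loops/s)\n",
           LOOP_COUNT, duration, LOOP_COUNT / duration);
    return 0;
}
</PRE>
<P>Note that this still measures wall-clock time (precaution 3 above): run it on an otherwise idle system, or the results will reflect the system load as much as the kernel being timed.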

<P>You can substitute the benchmark "kernel" with whatever computing task interests you or comes closer to your specific benchmarking needs.
<P>Examples of such kernels would be:
<UL>
<LI>For FPU performance measurements: a sampling of FPU operations.</LI>
<LI>Various calculations using matrices and/or vectors (see the sketch after this list).</LI>
<LI>Any test accessing a peripheral, e.g. disk or serial I/O.</LI>
</UL>
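
<P>For instance, a matrix-vector multiplication kernel that could be dropped into the sketch above might look like this (again a hypothetical example; the array size is chosen so that the data set overflows the L1 cache and is on the order of a typical L2 cache):
<PRE>
#define N 256                 /* 256 x 256 doubles = 512 KB of data */
static double a[N][N], x[N], y[N];

/* y = A * x : exercises FPU multiply/add and strided memory accesses */
void benchmark_kernel(void)
{
    int i, j;
    for (i = 0; i &lt; N; i++) {
        double sum = 0.0;
        for (j = 0; j &lt; N; j++)
            sum += a[i][j] * x[j];
        y[i] = sum;
    }
}
</PRE>
<P>With a kernel this heavy, <CODE>LOOP_COUNT</CODE> would of course have to be reduced accordingly.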

<P>For good examples of actual C source code, see the UnixBench and Whetstone benchmark sources.
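
<P>Finally, the Whetstone-style self-calibration mentioned at the beginning of this section could be sketched as follows (a hypothetical outline, not Whetstone's actual code): keep doubling the iteration count until a run lasts long enough for the desired precision, then use that count for the real measurement.
<PRE>
#include &lt;sys/time.h&gt;

extern void benchmark_kernel(void);   /* any kernel, e.g. one of those above */

/* Run the kernel n times and return the elapsed wall-clock time in seconds. */
double time_kernel(long n)
{
    struct timeval t0, t1;
    long i;

    gettimeofday(&amp;t0, NULL);
    for (i = 0; i &lt; n; i++)
        benchmark_kernel();
    gettimeofday(&amp;t1, NULL);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

/* Double loop_count until the run lasts at least min_duration seconds,
   e.g. 10 s for 0.1% precision with a 10 ms clock resolution. */
long calibrate(double min_duration)
{
    long loop_count = 1000;

    while (time_kernel(loop_count) &lt; min_duration)
        loop_count *= 2;
    return loop_count;
}
</PRE>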

<P> <HR><P> 

<H2><A NAME="ss4">4. An application benchmark: Linux 2.0.0 kernel compilation with gcc</A></H2>


<P>The more one gets to use and know GNU/Linux, the more often one compiles the Linux kernel. Very quickly it becomes a habit: as soon as a new kernel version comes out, we download the tar.gz source file and recompile it a few times, fine-tuning the new features.
<P>This is the main reason for proposing kernel compilation as an application benchmark: it is a very common task for all GNU/Linux users. Note that the application that is being directly tested is not the Linux kernel itself, it's <B>gcc</B>. I guess most GNU/Linux users use gcc every day.
<P>The Linux kernel is being used here as a (large) standard data set. Since this is a large program (gcc) with a wide variety of instructions, processing a large data set (the Linux kernel) with a wide variety of data structures, we assume it will exercise a good subset of OS functions like file I/O, swapping, etc., and a good subset of the hardware too: CPU, memory, caches, hard disk, hard disk controller/driver combination, PCI or ISA I/O bus. Obviously this is not a test for X server performance, even if you launch the compilation from an xterm window! And the FPU is not exercised either (but we already tested our FPU with Whetstone, didn't we?). Now, I have noticed that test results are almost independent of hard disk performance, at least on the various systems I had available. The <B>real bottleneck</B> for this test is CPU/cache performance.
<P>Why specify the Linux kernel version 2.0.0 as our standard data set? Because it is widely available, as most GNU/Linux users have an old CD-ROM distribution with the Linux kernel 2.0.0 source, and also because it is quite near in terms of size and structure to present-day kernels. So it's not exactly an out-of-anybody's-hat data set: it's a typical real-world data set.
<P>Why not let users compile <B>any</B> Linux 2.x kernel and report results? Because then we wouldn't be able to compare results anymore. Aha, you say, but what about the different gcc and libc versions in the various systems being tested? Answer: they are part of your GNU/Linux system and so also get their performance measured by this benchmark, and this is exactly the behaviour we want from an application benchmark. Of course, gcc and libc versions must be reported, just like CPU type, hard disk, total RAM, etc (see the Linux Benchmarking Toolkit Report Form).
<H2><A NAME="ss4.1">4.1 General benchmark features</A></H2>


<P>Basically what goes on during a gcc kernel compilation (make zImage) is that: 
<OL>
<LI>Gcc is loaded into memory,</LI>
<LI>Gcc is fed, one after another, the various source files that make up the kernel, and finally</LI>
<LI>The linker is called to create the zImage file (a compressed image file of the Linux kernel).</LI>
</OL>

<P>Step 2 is where most of the time is spent.
<P>This test is quite stable between different runs. It is also relatively insensitive to small loads (e.g. it can be run in an xterm window) and completes in less than 15 minutes on most recent machines.

<H2><A NAME="ss4.2">4.2 Benchmarking procedure</A></H2>


<H3>Getting the source.</H3>


<P>Do I really have to tell you where to get the kernel 2.0.0 source? OK, then: ftp://sunsite.unc.edu/pub/Linux/kernel/source/2.0.x or any of its mirrors, or any recent GNU/Linux CD-ROM set with a copy of sunsite.unc.edu. Download the 2.0.0 kernel, then gunzip and untar it under a test directory (<CODE>tar zxvf linux-2.0.tar.gz</CODE> will do the trick).
<H3>Compiling and running</H3>


<P>Cd to the linux directory you just created and type <CODE>make config</CODE>. Press &lt;Enter&gt; to answer all questions with their default values. Now type <CODE>make dep ; make clean ; sync ; time make zImage</CODE>. Depending on your machine, you can go and have lunch or just an espresso. You can't (yet) blink and be done with it, even on a 600 MHz Alpha. By the way, if you are going to run this test on an Alpha, you will have to cross-compile the kernel targeting the i386 architecture so that your results are comparable to the more ubiquitous x86 machines.

<H2><A NAME="ss4.3">4.3 Examining the results</A></H2>


<H3>Example 1 </H3>


<P>This is what I get on my test GNU/Linux box:
<P><CODE>186.90user 19.30system 3:40.75elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k</CODE>
<P><CODE>0inputs+0outputs (147838major+170260minor)pagefaults 0swaps</CODE>
<P>The most important figure here is the total elapsed time: <B>3 min 41 s </B>(there is no need to report fractions of seconds).
<H3>Hardware setup description</H3>


<P>If you were to complain that the above benchmark is useless without a description of the machine being tested, you'd be 100% correct! So, here is the LBT Report Form for this machine:

<P>LINUX BENCHMARKING TOOLKIT REPORT FORM
<PRE>
CPU
===
Vendor: AMD
Model: K6-200
Core clock: 208 MHz (2.5 x 83 MHz)
Motherboard vendor: ASUS
Mbd. model: P55T2P4
Mbd. chipset: Intel HX
Bus type: PCI
Bus clock: 41.5 MHz
Cache total: 512 KB
Cache type/speed: Pipeline burst 6 ns
SMP (number of processors): 1

RAM
===
Total: 32 MB
Type: EDO SIMMs
Speed: 60 ns

Disk
====
Vendor: IBM
Model: IBM-DCAA-34430
Size: 4.3 GB
Interface: EIDE
Driver/Settings: Bus Master DMA mode 2

Video board
===========
Vendor: Generic S3
Model: Trio64-V2
Bus: PCI
Video RAM type: 60 ns EDO DRAM
Video RAM total: 2 MB
X server vendor: XFree86
X server version: 3.3
X server chipset choice: S3 accelerated
Resolution/vert. refresh rate: 1152x864 @ 70 Hz
Color depth: 16 bits

Kernel
======
Version: 2.0.29
Swap size: 64 MB

gcc
===
Version: 2.7.2.1
Options: -O2
libc version: 5.4.23

Test notes
==========
Very light system load.

RESULTS
=======
Linux kernel 2.0.0 Compilation Time: 3 m 41 s
Whetstone Double Precision (FPU) INDEX: N/A
UnixBench 4.10 system INDEX: N/A
Xengine: N/A
BYTEmark integer INDEX: N/A
BYTEmark memory INDEX: N/A

Comments
========
Just tested kernel 2.0.0 compilation.
</PRE>

<H3>General comments</H3>


<P>Again, you will want to compare your results to those obtained on different machines/configurations. You will find some results on my Web site about 6x86s/Linux, in the 
<A HREF="http://www.tux.org/~balsa/linux/cyrix/p0c.html">November News </A>page.
<P>This of course is pure GNU/Linux benchmarking, unless you want to go ahead and try to cross-compile the Linux kernel on a Windows95 box!? ;-)

<P> <HR><P> 
<H2><A NAME="ss5">5. Next month</A></H2>


<P>I expect that by next month you will have downloaded and tested a few benchmarks, or even started writing your own. So, in the next article: <EM>Collecting and Interpreting Linux Benchmarking Data</EM>
<UL>
<LI>Correct uses of Linux benchmarking data.</LI>
<LI>Architecture specific issues of Linux benchmarks.</LI>
<LI>Benchmarking Linux SMP systems.</LI>
<LI>Examples of more complex benchmarks: UnixBench and BYTEmark.</LI>
</UL>


<!--===================================================================-->
<P> <hr> <P> 
<center><H5>Copyright &copy; 1997, Andr&eacute; D. Balsa <BR> 
Published in Issue 23 of the Linux Gazette, December 1997</H5></center>

<!--===================================================================-->
<P> <hr> <P> 
<A HREF="./lg_toc23.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif" 
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../lg_frontpage.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="./gm.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="./cftp.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P> 
<!--startcut ==========================================================-->
</BODY>
</HTML>
<!--endcut ============================================================-->