________________________________________________________________________
PYBENCH - A Python Benchmark Suite
________________________________________________________________________
Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).
pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).
pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.
The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.
Micro-Manual
------------
Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.
It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for reliable benchmarking.
You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
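A typical save-and-compare session might look like the following
(the file name 'reference.pybench' is just an example); the first run
saves its results as a reference, the second run executes the suite
again and prints the comparison:

    python pybench.py -f reference.pybench
    python pybench.py -c reference.pybench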
If the differences are well below 10% for each test, then you have a
system that is well suited to benchmarking. If you get random
differences of more than 10%, or significant differences between the
minimum and average times, then you likely have background processes
running which cause the readings to become inconsistent. Examples
include: web browsers, email clients, RSS readers, music players,
backup programs, etc.
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.
This is the current output of pybench.py --help:
"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------
Synopsis:
pybench.py [option] files...
Options and default settings:
-n arg number of rounds (10)
-f arg save benchmark to file arg ()
-c arg compare benchmark with the one in file arg ()
-s arg show benchmark in file arg, then exit ()
-w arg set warp factor to arg (10)
-t arg run only tests with names matching arg ()
-C arg set the number of calibration runs to arg (20)
-d hide noise in comparisons (0)
-v verbose output (not recommended) (0)
--with-gc enable garbage collection (0)
--with-syscheck use default sys check interval (0)
--timer arg use given timer (time.time)
-h show this help text
--help show this help text
--debug enable debugging
--copyright show copyright
--examples show examples of usage
Version:
2.1
The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.
Available timers:
time.time
time.clock
systimes.processtime
Examples:
python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""
License
-------
See LICENSE file.
Sample output
-------------
"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time
Calibrating tests. Please wait...
Running 10 round(s) of the suite at warp factor 10:
* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.
-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------
Rounds: 10
Warp: 10
Timer: time.time
Machine Details:
Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
Processor: x86_64
Python:
Implementation: CPython
Executable: /usr/local/bin/python
Version: 3.0
Compiler: GCC 3.3.4 (pre 3.3.5 20040809)
Bits: 64bit
Build: Oct 1 2005 15:24:35 (#1)
Unicode: UCS2
Test minimum average operation overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls: 126ms 145ms 0.28us 0.274ms
BuiltinMethodLookup: 124ms 130ms 0.12us 0.316ms
CompareFloats: 109ms 110ms 0.09us 0.361ms
CompareFloatsIntegers: 100ms 104ms 0.12us 0.271ms
CompareIntegers: 137ms 138ms 0.08us 0.542ms
CompareInternedStrings: 124ms 127ms 0.08us 1.367ms
CompareLongs: 100ms 104ms 0.10us 0.316ms
CompareStrings: 111ms 115ms 0.12us 0.929ms
CompareUnicode: 108ms 128ms 0.17us 0.693ms
ConcatStrings: 142ms 155ms 0.31us 0.562ms
ConcatUnicode: 119ms 127ms 0.42us 0.384ms
CreateInstances: 123ms 128ms 1.14us 0.367ms
CreateNewInstances: 121ms 126ms 1.49us 0.335ms
CreateStringsWithConcat: 130ms 135ms 0.14us 0.916ms
CreateUnicodeWithConcat: 130ms 135ms 0.34us 0.361ms
DictCreation: 108ms 109ms 0.27us 0.361ms
DictWithFloatKeys: 149ms 153ms 0.17us 0.678ms
DictWithIntegerKeys: 124ms 126ms 0.11us 0.915ms
DictWithStringKeys: 114ms 117ms 0.10us 0.905ms
ForLoops: 110ms 111ms 4.46us 0.063ms
IfThenElse: 118ms 119ms 0.09us 0.685ms
ListSlicing: 116ms 120ms 8.59us 0.103ms
NestedForLoops: 125ms 137ms 0.09us 0.019ms
NormalClassAttribute: 124ms 136ms 0.11us 0.457ms
NormalInstanceAttribute: 110ms 117ms 0.10us 0.454ms
PythonFunctionCalls: 107ms 113ms 0.34us 0.271ms
PythonMethodCalls: 140ms 149ms 0.66us 0.141ms
Recursion: 156ms 166ms 3.32us 0.452ms
SecondImport: 112ms 118ms 1.18us 0.180ms
SecondPackageImport: 118ms 127ms 1.27us 0.180ms
SecondSubmoduleImport: 140ms 151ms 1.51us 0.180ms
SimpleComplexArithmetic: 128ms 139ms 0.16us 0.361ms
SimpleDictManipulation: 134ms 136ms 0.11us 0.452ms
SimpleFloatArithmetic: 110ms 113ms 0.09us 0.571ms
SimpleIntFloatArithmetic: 106ms 111ms 0.08us 0.548ms
SimpleIntegerArithmetic: 106ms 109ms 0.08us 0.544ms
SimpleListManipulation: 103ms 113ms 0.10us 0.587ms
SimpleLongArithmetic: 112ms 118ms 0.18us 0.271ms
SmallLists: 105ms 116ms 0.17us 0.366ms
SmallTuples: 108ms 128ms 0.24us 0.406ms
SpecialClassAttribute: 119ms 136ms 0.11us 0.453ms
SpecialInstanceAttribute: 143ms 155ms 0.13us 0.454ms
StringMappings: 115ms 121ms 0.48us 0.405ms
StringPredicates: 120ms 129ms 0.18us 2.064ms
StringSlicing: 111ms 127ms 0.23us 0.781ms
TryExcept: 125ms 126ms 0.06us 0.681ms
TryRaiseExcept: 133ms 137ms 2.14us 0.361ms
TupleSlicing: 117ms 120ms 0.46us 0.066ms
UnicodeMappings: 156ms 160ms 4.44us 0.429ms
UnicodePredicates: 117ms 121ms 0.22us 2.487ms
UnicodeProperties: 115ms 153ms 0.38us 2.070ms
UnicodeSlicing: 126ms 129ms 0.26us 0.689ms
-------------------------------------------------------------------------------
Totals: 6283ms 6673ms
"""
________________________________________________________________________
Writing New Tests
________________________________________________________________________
pybench tests are simple modules defining one or more pybench.Test
subclasses.
Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.
Here's an example:
------------------
from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds times, executing
            self.operations operations in each round.
        """
        # Init the test
        a = 1

        # Run test rounds
        for i in range(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.
        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operations)
        for i in range(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass
Registering a new test module
-----------------------------
To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
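For illustration, assuming the IntegerCounting example above is stored
in a module called MyTests.py (a hypothetical name), registering it
would amount to adding an import along these lines to Setup.py:

    # Hypothetical addition to pybench's Setup.py: importing the module
    # makes its pybench.Test subclasses visible to the symbol scan.
    from MyTests import *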
Breaking Comparability
----------------------
If a change is made to an individual test which means that it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
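As a sketch, if the work load of the IntegerCounting example above
were changed (say, hypothetically, from 20 to 25 operations per
round), the test would be updated along these lines:

    from pybench import Test

    class IntegerCounting(Test):

        # Bumped from 1.0 because the per-round work load changed;
        # comparisons against runs of version 1.0 will show "n/a".
        version = 1.1

        # Hypothetical new work load: 25 operations per round.
        operations = 25
        rounds = 100000

        # .test() and .calibrate() stay structured as before, now
        # executing 25 operations per round (omitted here).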
Version History
---------------
2.1: made some minor changes for compatibility with Python 3.0:
- replaced cmp with divmod and range with max in Calls.py
(cmp() no longer exists in 3.0, and range() returns a list in
Python 2.x but an iterator in Python 3.x)
2.0: rewrote parts of pybench which resulted in more repeatable
timings:
- made timer a parameter
- changed the platform default timer to use high-resolution
timers rather than process timers (which have a much lower
resolution)
- added option to select timer
- added process time timer (using systimes.py)
- changed to use min() as timing estimator (average
is still taken as well to provide an idea of the difference)
- garbage collection is turned off per default
- sys check interval is set to the highest possible value
- calibration is now a separate step and done using
a different strategy that allows measuring the test
overhead more accurately
- modified the tests to each give a run-time of between
100-200ms using warp 10
- changed default warp factor to 10 (from 20)
- compared results with timeit.py and confirmed measurements
- bumped all test versions to 2.0
- updated platform.py to the latest version
- changed the output format a bit to make it look
nicer
- refactored the APIs somewhat
1.3+: Steve Holden added the NewInstances test and the filtering
option during the NeedForSpeed sprint; this also triggered a long
discussion on how to improve benchmark timing and finally
resulted in the release of 2.0
1.3: initial checkin into the Python SVN repository
Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com