===========
Performance
===========

The SPMD programming model that ``ispc`` provides makes it easy to harness the
computational power available in SIMD vector units on modern CPUs, while
its basis in C makes it easy for programmers to adopt and use
productively.  This page summarizes the performance of ``ispc`` with the
workloads in the ``examples/`` directory of the ``ispc`` distribution.

These results were measured on an Apple iMac with a 4-core 3.4GHz
Intel® Core-i7 processor using the Intel® AVX instruction set.  Each figure
is a speedup relative to a reference C++ implementation compiled with
gcc 4.2.1, the version distributed with OS X 10.7.2.  (The reference
implementation is also included in the ``examples/`` directory.)

.. list-table:: Speedup of ``ispc`` on a variety of the workloads
   from the ``examples/`` directory of the ``ispc`` distribution, relative to
   a reference C++ implementation compiled with gcc 4.2.1.

  * - Workload
    - ``ispc``, 1 core
    - ``ispc``, 4 cores
  * - `AOBench`_ (512 x 512 resolution)
    - 6.19x
    - 28.06x
  * - `Binomial Options`_ (128k options)
    - 7.94x
    - 33.43x
  * - `Black-Scholes Options`_ (128k options)
    - 8.45x
    - 32.48x
  * - `Deferred Shading`_ (1280p)
    - 5.02x
    - 23.06x
  * - `Mandelbrot Set`_
    - 6.21x
    - 20.28x
  * - `Perlin Noise Function`_
    - 5.37x
    - n/a
  * - `Ray Tracer`_ (Sponza dataset)
    - 4.31x
    - 20.29x
  * - `3D Stencil`_
    - 4.05x
    - 15.53x
  * - `Volume Rendering`_
    - 3.60x
    - 17.53x


.. _AOBench: https://github.com/ispc/ispc/tree/main/examples/cpu/aobench
.. _Binomial Options: https://github.com/ispc/ispc/tree/main/examples/cpu/options
.. _Black-Scholes Options: https://github.com/ispc/ispc/tree/main/examples/cpu/options
.. _Deferred Shading: https://github.com/ispc/ispc/tree/main/examples/cpu/deferred
.. _Mandelbrot Set: https://github.com/ispc/ispc/tree/main/examples/cpu/mandelbrot_tasks
.. _Ray Tracer: https://github.com/ispc/ispc/tree/main/examples/cpu/rt
.. _Perlin Noise Function: https://github.com/ispc/ispc/tree/main/examples/cpu/noise
.. _3D Stencil: https://github.com/ispc/ispc/tree/main/examples/cpu/stencil
.. _Volume Rendering: https://github.com/ispc/ispc/tree/main/examples/cpu/volume_rendering
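
Dividing the 4-core figure by the 1-core figure for a workload gives the
additional speedup obtained by running on all four cores.  As a minimal
sketch of that arithmetic (the names ``speedups`` and ``scaling`` are
illustrative only and are not part of ``ispc``), using the figures copied
from the table above:

.. code-block:: python

   # (1-core speedup, 4-core speedup) for each workload, copied from the
   # table above; both are relative to the reference C++ implementation.
   # Perlin Noise Function is omitted because its 4-core entry is n/a.
   speedups = {
       "AOBench": (6.19, 28.06),
       "Binomial Options": (7.94, 33.43),
       "Black-Scholes Options": (8.45, 32.48),
       "Deferred Shading": (5.02, 23.06),
       "Mandelbrot Set": (6.21, 20.28),
       "Ray Tracer": (4.31, 20.29),
       "3D Stencil": (4.05, 15.53),
       "Volume Rendering": (3.60, 17.53),
   }


   def scaling(one_core, four_cores):
       """Return the extra speedup gained by going from 1 core to 4."""
       return four_cores / one_core


   for name, (s1, s4) in speedups.items():
       print(f"{name:24s} {scaling(s1, s4):.2f}x")

The same calculation does not apply to the 40-core results, which list
only a single column and were measured on a different system.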


The following table shows speedups for a number of the examples on a
2.40GHz, 40-core Intel® Xeon E7-8870 system with the Intel® SSE4
instruction set, running Microsoft Windows Server 2008 Enterprise.  Here,
the serial C/C++ baseline code was compiled with MSVC 2010.
 
.. list-table:: Performance of ``ispc`` with a variety of the workloads
   from the ``examples/`` directory of the ``ispc`` distribution, on a
   system with 40 CPU cores.

  * - Workload
    - ``ispc``, 40 cores
  * - AOBench (2048 x 2048 resolution)
    - 182.36x
  * - Binomial Options (2m options)
    - 63.85x
  * - Black-Scholes Options (2m options)
    - 83.97x
  * - Ray Tracer (Sponza dataset)
    - 195.67x
  * - Volume Rendering
    - 243.18x


Notices & Disclaimers
=====================

Performance varies by use, configuration and other factors. Learn more at
www.intel.com/PerformanceIndex.