1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
|
Performance of the Virtual Machine in NumExpr2.0
================================================
Numexpr 2.0 leverages a new virtual machine completely based on the new ndarray
iterator introduced in NumPy 1.6. This represents a nice combination of the
advantages of using the new iterator, while retaining the ability to avoid
copies in memory as well as the multi-threading capabilities of the previous
virtual machine (1.x series).
The increased performance of the new virtual machine can be seen in several
scenarios, like:
* *Broadcasting*. Expressions containing arrays that needs to be broadcasted,
will not need additional memory (i.e. they will be broadcasted on-the-fly).
* *Non-native dtypes*. These will be translated to native dtypes on-the-fly,
so there is not need to convert the whole arrays first.
* *Fortran-ordered arrays*. The new iterator will find the best path to
optimize operations on such arrays, without the need to transpose them first.
There is a drawback though: performance with small arrays suffers a bit because
of higher set-up times for the new virtual machine. See below for detailed
benchmarks.
Some benchmarks for best-case scenarios
---------------------------------------
Here you have some benchmarks of some scenarios where the new virtual machine
actually represents an advantage in terms of speed (also memory, but this is
not shown here). As you will see, the improvement is notable in many areas,
ranging from 3x to 6x faster operations.
Broadcasting
^^^^^^^^^^^^
>>> a = np.arange(1e3)
>>> b = np.arange(1e6).reshape(1e3, 1e3)
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
100 loops, best of 3: 16.4 ms per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
100 loops, best of 3: 5.2 ms per loop
Non-native types
^^^^^^^^^^^^^^^^
>>> a = np.arange(1e6, dtype=">f8")
>>> b = np.arange(1e6, dtype=">f8")
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
100 loops, best of 3: 17.2 ms per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
100 loops, best of 3: 6.32 ms per loop
Fortran-ordered arrays
^^^^^^^^^^^^^^^^^^^^^^
>>> a = np.arange(1e6).reshape(1e3, 1e3).copy('F')
>>> b = np.arange(1e6).reshape(1e3, 1e3).copy('F')
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
10 loops, best of 3: 32.8 ms per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
100 loops, best of 3: 5.62 ms per loop
Mix of 'non-native' arrays, Fortran-ordered, and using broadcasting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> a = np.arange(1e3, dtype='>f8').copy('F')
>>> b = np.arange(1e6, dtype='>f8').reshape(1e3, 1e3).copy('F')
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
10 loops, best of 3: 21.2 ms per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
100 loops, best of 3: 5.22 ms per loop
Longer setup-time
^^^^^^^^^^^^^^^^^
The only drawback of the new virtual machine is during the computation of
small arrays::
>>> a = np.arange(10)
>>> b = np.arange(10)
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
10000 loops, best of 3: 22.1 µs per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
10000 loops, best of 3: 30.6 µs per loop
i.e. the new virtual machine takes a bit more time to set-up (around 8 µs in
this machine). However, this should be not too important because for such a
small arrays NumPy is always a better option::
>>> timeit c = a*(b+1)
100000 loops, best of 3: 4.16 µs per loop
And for arrays large enough the difference is negligible::
>>> a = np.arange(1e6)
>>> b = np.arange(1e6)
>>> timeit ne.evaluate("a*(b+1)") # 1.4.2
100 loops, best of 3: 5.77 ms per loop
>>> timeit ne.evaluate("a*(b+1)") # 2.0
100 loops, best of 3: 5.77 ms per loop
Conclusion
----------
The new virtual machine introduced in numexpr 2.0 brings more performance in
many different scenarios (broadcast, non-native dtypes, Fortran-orderd arrays),
while it shows slightly worse performance for small arrays. However, as
numexpr is more geared to compute large arrays, the new virtual machine should
be good news for numexpr users in general.
|