Performance
===========
.. currentmodule:: numpy.random
Recommendation
--------------
The recommended generator for general use is `PCG64`, or its upgraded variant
`PCG64DXSM` for heavily parallel use cases. Both are statistically high quality,
full-featured, and fast on most platforms, but somewhat slow when compiled for
32-bit processes. See :ref:`upgrading-pcg64` for details on when heavy
parallelism would indicate using `PCG64DXSM`.
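As a minimal sketch of that choice, the snippet below wraps each of the two
bit generators in a `Generator`. The seed value is arbitrary and chosen only
for illustration.

.. code-block:: python

    from numpy.random import Generator, PCG64, PCG64DXSM

    seed = 12345  # arbitrary illustrative seed

    # General use: PCG64.
    rng = Generator(PCG64(seed))

    # Heavily parallel use: the DXSM variant is used the same way.
    rng_dxsm = Generator(PCG64DXSM(seed))

    x = rng.standard_normal(4)
    y = rng_dxsm.standard_normal(4)

Because both are passed to `Generator` in the same way, switching between them
is a one-line change.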
`Philox` is fairly slow, but it has very high statistical quality, and it is
easy to get an assuredly independent stream by using unique keys. If that is
the style you wish to use for parallel streams, or you are porting from
another system that uses that style, then `Philox` is your choice.
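The unique-key style could look like the following sketch. The base key and
the number of streams are arbitrary values picked for illustration; each
stream simply receives a different key.

.. code-block:: python

    import numpy as np
    from numpy.random import Generator, Philox

    base_key = 2**96 + 2**33 + 2**17 + 2**9  # arbitrary illustrative key
    n_streams = 4                            # arbitrary illustrative count

    # One generator per worker, each built from its own unique key.
    rngs = [Generator(Philox(key=base_key + i)) for i in range(n_streams)]

    # Each worker then draws from its own stream.
    draws = [rng.random(3) for rng in rngs]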
`SFC64` is statistically high quality and very fast. However, it
lacks jumpability. If you are not using that capability and want lots of speed,
even on 32-bit processes, this is your choice.
`MT19937` `fails some statistical tests`_ and is not especially
fast compared to modern PRNGs. For these reasons, we mostly do not recommend
using it on its own, only through the legacy `~.RandomState` for
reproducing old results. That said, it has a very long history as a default in
many systems.
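For the reproduction use case, a minimal sketch looks like this. The seed is
an arbitrary illustrative value; the point is only that the legacy
`~.RandomState` gives the same stream for the same seed.

.. code-block:: python

    import numpy as np
    from numpy.random import RandomState

    seed = 2005  # arbitrary illustrative seed

    # Legacy interface backed by MT19937, kept for reproducing old results.
    first = RandomState(seed).standard_normal(3)
    second = RandomState(seed).standard_normal(3)

    # Same seed, same values.
    same = np.array_equal(first, second)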
.. _`fails some statistical tests`: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
Timings
-------
The timings below are the time in nanoseconds to produce one random value from
a specific distribution. The original `MT19937` generator is much slower
because it needs two 32-bit values to match the output of the faster
generators.
Integer performance has a similar ordering.
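A rough way to take a comparable measurement on your own machine is sketched
below. This is not the benchmark script used for the tables; the helper name
``ns_per_value`` and the ``size``, ``repeat`` and ``number`` defaults are
assumptions made for illustration.

.. code-block:: python

    import timeit
    from numpy.random import Generator, MT19937, PCG64, PCG64DXSM, Philox, SFC64

    def ns_per_value(bit_generator_cls, size=100_000, repeat=5, number=10):
        """Best-of-`repeat` time, in ns, to draw one uniform double."""
        rng = Generator(bit_generator_cls())
        best = min(
            timeit.repeat(lambda: rng.random(size), repeat=repeat, number=number)
        )
        # `best` covers `number` calls that each produce `size` values.
        return best / (number * size) * 1e9

    timings = {
        cls.__name__: ns_per_value(cls)
        for cls in (MT19937, PCG64, PCG64DXSM, Philox, SFC64)
    }
    for name, ns in timings.items():
        print(f"{name:>10}: {ns:.2f} ns per uniform")

Absolute numbers will differ from the tables here, since they depend on the
processor, compiler, and NumPy build.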
The pattern is similar for other, more complex generators. The performance of
the legacy `RandomState` generator on normals is much lower than that of the
others since it uses the Box-Muller transform rather than the Ziggurat method.
The performance gap for Exponentials is also large due to the cost of
computing the log function to invert the CDF.
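To make the cost of inversion concrete: the standard exponential CDF is
:math:`F(x) = 1 - e^{-x}`, so inverting it at a uniform draw :math:`u` gives
:math:`x = -\log(1 - u)`, which needs one logarithm per value. The sketch below
illustrates that formula only; it is not the code that either generator runs,
and the seed and sample size are arbitrary.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)  # arbitrary illustrative seed
    u = rng.random(100_000)         # uniforms in [0, 1)

    # Inverse-CDF sampling: x = -log(1 - u), one log evaluation per value.
    x = -np.log1p(-u)

    # The sample mean should be close to 1, the mean of the distribution.
    sample_mean = x.mean()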
The column labeled MT19937 uses the same 32-bit generator as
`RandomState` but produces random variates using `Generator`.
.. csv-table::
    :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64,RandomState
    :widths: 14,14,14,14,14,14,14

    32-bit Unsigned Ints,3.3,1.9,2.0,3.3,1.8,3.1
    64-bit Unsigned Ints,5.6,3.2,2.9,4.9,2.5,5.5
    Uniforms,5.9,3.1,2.9,5.0,2.6,6.0
    Normals,13.9,10.8,10.5,12.0,8.3,56.8
    Exponentials,9.1,6.0,5.8,8.1,5.4,63.9
    Gammas,37.2,30.8,28.9,34.0,27.5,77.0
    Binomials,21.3,17.4,17.6,19.3,15.6,21.4
    Laplaces,73.2,72.3,76.1,73.0,72.3,82.5
    Poissons,111.7,103.4,100.5,109.4,90.7,115.2

The next table presents the performance in percentage relative to values
generated by the legacy generator, ``RandomState(MT19937())``. The overall
performance was computed using a geometric mean.
.. csv-table::
    :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64
    :widths: 14,14,14,14,14,14

    32-bit Unsigned Ints,96,162,160,96,175
    64-bit Unsigned Ints,97,171,188,113,218
    Uniforms,102,192,206,121,233
    Normals,409,526,541,471,684
    Exponentials,701,1071,1101,784,1179
    Gammas,207,250,266,227,281
    Binomials,100,123,122,111,138
    Laplaces,113,114,108,113,114
    Poissons,103,111,115,105,127
    Overall,159,219,225,174,251

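The Overall row can be checked by hand. The sketch below takes the PCG64
column from the table above and computes its geometric mean; the helper name
``geometric_mean`` is introduced here for illustration.

.. code-block:: python

    import math

    # PCG64 column of the table above (percent relative to RandomState).
    pcg64 = [162, 171, 192, 526, 1071, 250, 123, 114, 111]

    def geometric_mean(values):
        """Exponential of the mean of the logs, i.e. the n-th root of the product."""
        return math.exp(sum(math.log(v) for v in values) / len(values))

    overall = geometric_mean(pcg64)
    print(round(overall))  # prints 219, matching the Overall entry for PCG64

A geometric mean is used rather than an arithmetic one so that a single large
ratio, such as the one for Exponentials, does not dominate the summary.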
.. note::

    All timings were taken using Linux on an AMD Ryzen 9 3900X processor.

Performance on different Operating Systems
------------------------------------------
Performance differs across platforms due to differences in compilers and
hardware (e.g., register width). The default bit generator has been chosen
to perform well on 64-bit platforms. Performance on 32-bit operating systems
is very different.
The values reported are normalized relative to the speed of MT19937 in
each table. A value of 100 indicates performance equal to that of MT19937.
Higher values indicate better performance. These values cannot be compared
across tables.
64-bit Linux
~~~~~~~~~~~~
===================== ========= ======= =========== ======== =======
Distribution          MT19937   PCG64   PCG64DXSM   Philox   SFC64
===================== ========= ======= =========== ======== =======
32-bit Unsigned Ints  100       168     166         100      182
64-bit Unsigned Ints  100       176     193         116      224
Uniforms              100       188     202         118      228
Normals               100       128     132         115      167
Exponentials          100       152     157         111      168
Overall               100       161     168         112      192
===================== ========= ======= =========== ======== =======
64-bit Windows
~~~~~~~~~~~~~~
The relative performance on 64-bit Linux and 64-bit Windows is broadly similar
with the notable exception of the Philox generator.
===================== ========= ======= =========== ======== =======
Distribution          MT19937   PCG64   PCG64DXSM   Philox   SFC64
===================== ========= ======= =========== ======== =======
32-bit Unsigned Ints  100       155     131         29       150
64-bit Unsigned Ints  100       157     143         25       154
Uniforms              100       151     144         24       155
Normals               100       129     128         37       150
Exponentials          100       150     145         28       159
**Overall**           100       148     138         28       154
===================== ========= ======= =========== ======== =======
32-bit Windows
~~~~~~~~~~~~~~
The performance of 64-bit generators on 32-bit Windows is much lower than on 64-bit
operating systems due to register width. MT19937, the generator that has been
in NumPy since 2005, operates on 32-bit integers.
===================== ========= ======= =========== ======== =======
Distribution          MT19937   PCG64   PCG64DXSM   Philox   SFC64
===================== ========= ======= =========== ======== =======
32-bit Unsigned Ints  100       24      34          14       57
64-bit Unsigned Ints  100       21      32          14       74
Uniforms              100       21      34          16       73
Normals               100       36      57          28       101
Exponentials          100       28      44          20       88
**Overall**           100       25      39          18       77
===================== ========= ======= =========== ======== =======
.. note::

    Linux timings used Ubuntu 20.04 and GCC 9.3.0. Windows timings were made on
    Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual
    Studio 2019). All timings were produced on an AMD Ryzen 9 3900X processor.