1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357
|
/**
@mainpage Random123: a Library of Counter-Based Random Number Generators
The Random123 library is a collection of counter-based random
number generators (@ref CBRNG "CBRNGs") for CPUs (C and C++) and GPUs (CUDA and OpenCL), as described in
<a href="http://dl.acm.org/citation.cfm?doid=2063405"><i>Parallel Random Numbers: As Easy
as 1, 2, 3</i>, Salmon, Moraes, Dror & Shaw, SC11, Seattle, Washington, USA, 2011, ACM </a>.
They are intended for use in statistical
applications and Monte Carlo simulation
and have passed all of the rigorous
SmallCrush, Crush and BigCrush tests in the
<a href="http://www.iro.umontreal.ca/~simardr/testu01/tu01.html">
extensive TestU01 suite</a> of statistical tests for random number generators.
They are \b not suitable for use in cryptography or security
even though they are constructed using principles drawn from cryptography.
CBRNGs are as fast as, or faster than conventional RNGs, much
easier to parallelize, use minimal memory/cache resources, and
require very little code. On modern architectures, the
Random123 CBRNGs require a few cycles per byte of random data
returned and return random data in convenient sizes (arrays of
two or four elements, each element is an unsigned integer of 32
or 64 bits. The range of random numbers is the full
representable range of the 32 or 64 bit unsigned integer)
The \c <Random123/u01.h> header contains utility functions
to convert 32- and 64-bit unsigned integers to open or closed
ranges of single or double precision floating point numbers.
The Random123 library was written by John Salmon and Mark Moraes. It is
available from
<a href="http://deshawresearch.com/resources_random123.html">
http://deshawresearch.com/resources_random123.html.</a> Please see
the @ref LICENSE "license" for terms and conditions. Please
send
feedback, including bug reports, suggestions, patches, etc. to
random123@deshawresearch.com.
\section overview Overview
Unlike conventional RNGs, counter-based RNGs are
<b>stateless</b> functions (or function classes i.e. functors) whose
arguments are a \e counter, and a \e key
and returns a result of the same type as the counter.
result = CBRNGname(counter, key)
The returned result is a deterministic function of the key and counter,
i.e. a unique (counter, key) tuple will always produce the same
result. The result is highly sensitive to small changes in the inputs,
so that the sequence of values produced by simply
incrementing the counter (or key) is effectively indistinguishable from a
sequence of samples of a uniformly distributed random variable.
For all the CBRNGs in the Random123 library, the result and
counter are the same type, specifically an array of \e N words,
where words have a width of \e W bits, encapsulated in @ref
arrayNxW "r123arrayNxW" structs, or equivalently, for C++, in
the @ref r123::Array1x32 "ArrayNxW" typedefs in the r123
namespace. Keys are usually also arrayMxW types, but sometimes M is
a different size than the counter N (e.g. Philox keys are half the
number of elements as the counter, Threefry and ARS are the same number,
AES uses an opaque key type rather than an array) The N random
numbers returned in \c result.v[] are unsigned integers of
width W (32 or 64), and the range of the random numbers is the full
range of the unsigned integer of that width (i.e. 0 to 2^W-1)
In C++, all public names (classes, structs, typedefs, etc) are in the
\c r123 namespace. In C, the public names (functions, enums, structs,
typedefs) begin either with \c %r123 or with one of the RNG family names, e.g., \c
threefry, \c philox, \c ars, \c aesni. The RNG functions themselves have names like
\c philox4x32. C++ class names are capitalized, e.g., \c Threefry4x32.
\section families The different families of Random123 generators
Several families of CBRNGs are available in this version of the library:
<ul>
<li> @ref ThreefryNxW "Threefry" is a <b>non-cryptographic</b>
adaptation of the Threefish block cipher from the <a href="http://www.skein-hash.info/"> Skein Hash Function</a>.
See @ref r123::Threefry2x32, @ref r123::Threefry4x32, @ref r123::Threefry2x64, @ref r123::Threefry4x64.
<li> @ref PhiloxNxW "Philox" uses a Feistel network and integer multiplication.
See @ref r123::Philox2x32, @ref r123::Philox4x32, @ref r123::Philox2x64, @ref r123::Philox4x64.
The Nx64 forms are only available on hardware
that supports 64-bit multiplication producing a 128-bit result.
<li> @ref AESNI "AESNI" uses the Advanced Encryption Standard (AES) New Instruction,
available on certain modern x86 processors (some models of Intel Westmere and Sandy Bridge,
and AMD Interlagos, as of 2011). AESNI CBRNGs can operate on four 32bit words (internally converting
them to the 128bit SSE type needed by the AES-NI instructions, or on a single m128i "word",
which holds the SSE type.
See @ref r123::AESNI4x32, @ref r123::AESNI1xm128i.
<li> @ref AESNI "ARS" (Advanced Randomization System) is a \b non-cryptographic simplification of @ref AESNI "AESNI".
See @ref r123::ARS4x32_R, @ref r123::ARS1xm128i_R.
</ul>
\section install Installation and Testing
The Random123 library is implemented entirely in header files. Thus,
there is nothing to compile before using it and nothing to link after
you have <c>\#include</c>d it in your source files. Simply direct your C or
C++ compiler to find the header files in the \c include/ directory that
was unpacked from the distribution tar file and use the Random123
header files, types and functions in your application.
In addition to the \c include/ files which implement the library the
distribution also contains an \c examples/ directory. Users are <b>
STRONGLY ADVISED </b> to compile and run the tests in examples/ before using
Random123 in an application (see <c> @ref ExamplesREADME "examples/README"</c>).
Do not use the library if any tests fail. (It is not a failure for
a test to report that it cannot run because of missing
hardware capabilities like 64bit multiply,
SSE, AES-NI or compiler capabilities)
\section usage Usage
\subsection CxxAPI C++ API
A typical C++ use case might look like:
@code
#include <Random123/philox.h>
typedef r123::Philox4x32 RNG;
RNG rng;
RNG::ctr_type c={{}};
RNG::ukey_type uk={{}};
uk[0] = ???; // some user_supplied_seed
RNG::key_type k=uk;
for(...){
c[0] = ???; // some loop-dependent application variable
c[1] = ???; // another loop-dependent application variable
RNG::ctr_type r = rng(c, k);
// use the random values in r for some operation related to
// this iteration on objectid
}
@endcode
On each iteration,\c r contains an array of 4 32-bit random values that
will not be repeated by any other call to \c rng as long as \c c and \c k
are not reused.
In the example above, we use the @ref r123::Philox4x32, but any of the
other @ref CBRNG "CBRNGs" would serve equally well. Also note that
for most CBRNGs, the ukey_type and the key_type are identical; the code
could just as well ignore the ukey_type and directly construct the
key_type. However, for the @ref AESNI "AESNI" CBRNGs, the key_type is opaque, and
must be constructed from a ukey_type, as shown.
\subsection Capi The C API
In C, the example above could be written as:
@code
#include <Random123/philox.h>
philox4x32_ctr_t c={{}};
philox4x32_ukey_t uk={{}};
uk.v[0] = user_supplied_seed;
philox4x32_key_t k = philox4x32keyinit(uk);
for(...){
c.v[0] = ???; /* some loop-dependent application variable */
c.v[1] = ???; /* another loop-dependent application variable */
philox4x32_ctr_t r = philox4x32(c, k);
}
@endcode
In C, access to the contents of the counter and key is through
the fixed-size array member \c v.
\section cuda The CUDA platform
All relevant functions in the C and C++ APIs for Random123 are declared
as CUDA device functions if they are included in a CUDA kernel source file
and compiled with a CUDA compiler (nvcc). They can be used exactly
as described/documented for regular C or C++ programs. Note that
CUDA device functions and host functions share the same namespace, so
it is not currently possible to use Random123 functions in
both the host portion and the device portion of the same .cu source file.
To work around this, you must compile Random123-using host code in
a separate .c source file from your .cu device-resident code.
The Nx32 forms are faster than the Nx64 variants on current (2011)
32-bit GPU architectures.
It has been reported that Random123 uses 16 bytes of
static memory per thread. This is undesirable and not intentional,
but we do not have a workaround other than to suggest adjusting memory
allocation accordingly.
The
pi_cuda.cu and pi_cudapp.cu examples illustrate the use of CUDA.
\section opencl The OpenCL platform
The functions in the Random123 C API can all be used in
OpenCL kernels, just as in regular C functions.
As with CUDA, the Nx32 forms are faster than the Nx64 variants on current (2011)
32-bit GPU architectures.
The pi_opencl.c and pi_opencl_kernel.ocl examples illustrate the use
of OpenCL.
\section cplusplus0x C++0X \<random\> interface
In addition to the stateless ("pure/functional") C++ API above,
the Random123 package includes two C++ classes
that leverage the C++0X \<random\> API.
<ul>
<li>r123::MicroURNG provides an adapter class that provides a
more conventional interface compatible with the C++0X URNG
(uniform random number generator) API; the MicroURNG adapter can
be used with C++0x random number distributions and is
fast/lightweight enough that a new MicroURNG can be instantiated
with a unique key,counter tuple and used for each call to a
distribution, there is little or no overhead to creating
billions of unique MicroURNGs. This adapter retains one of the
key advantages of CBRNGs -- complete application control over
the RNG state.
<li>r123::Engine provides the C++0x Random Engine API. This can
also be used with any of the C++0X random distributions, but
sacrifices the application control over RNG state that is a
defining characteristic of CBRNGs.
</ul>
\section gsl The GNU Scientific Library (GSL) interface
In addition to the stateless ("pure/functional") C API above,
the Random123 package includes two C adapter interfaces
to the <a href="http://www.gnu.org/s/gsl/">GNU Scientific Library (GSL).</a>
<ul>
<li>The \ref GSL_MICRORNG macro allows the application to
define a GSL random number generator. It
can be used with GSL random distributions but still provides the
application with complete control over the RNG state (it is
analogous to the MicroURNG class, in that it uses shorter
periods, and is intended to be instantiated in large numbers for
a few calls to the random distribution).
<li>The \ref GSL_CBRNG macro allows the application to create a GSL
RNG with a completely conventional interface, sacrificing
application control over the internal RNG state.
</ul>
\section u01 Generating uniformly distributed and Gaussian distributed floats and doubles
The Random123 library provides generators for uniformly distributed
random \b integers. Often, applications want random \b real values or
samples from other distributions. The general problem of generating
samples from arbitrary distributions is beyond the scope of the Random123
library. One can, of course, use GSL or MicroURNG and the
distributions in the C++11 \<random\> library, but a few simple cases
are common enough that all that extra machinery seems like overkill.
We have included code in the examples/ directory which developers may
find useful.
<ul>
<li> examples/uniform.hpp - C++ functions that convert random integers to
random, uniformly distributed floating point values.
<li> examples/u01fixedpt.h - C functions that convert random integers to
random, uniformly distributed, equi-spaced, i.e., fixed point,
values.
<li> examples/ua.hpp - C++11 functions that convert r123arrays of
uniformly distributed integers into std::arrays of uniformly
distributed floating point types. The return type is std::array
because it is far easier, with template logic, to return a
std::array of the correct size than an r123array of the correct
size.
<li> examples/boxmuller.hpp - C++ functions that take two
uniformly distributed integers (32 or 64 bit) and
return a pair of Gaussian distributed floats or doubles.
</ul>
The Box-Muller method of generating Gaussian random variables is
particularly well suited to Random123 because it deterministically
consumes exactly two uniform randoms to generate exactly two gaussian
randoms. It uses math library functions: sincos, log and sqrt which
may be slow on some platforms, but which are surprisingly fast on
others. Notably, on GPUs, the lack of branching in the Box-Muller
method and hardware support for math functions overcomes the
transcendental function overhead, making it the fastest generator of
Gaussians that we are aware of.
\subsection Examples Tests and Benchmarks
The @ref ExamplesREADME "examples/" directory, contains tests, examples and benchmarks.
<ul>
<li> Unit tests for individual components and "known-answer-tests", which
should be run to ensure that these RNGs build correctly on desired platforms.
These help to provide assurance that the code is being compiled correctly.
<li> Complete, short programs estimate pi by counting the number of random
points that fall inside a circle inscribed in a square, demonstrating
the C, C++, AES, GSL, OpenCL, CUDA and C++0x APIs.
<li> Header files, including uniform.hpp, ufixed01.h, ua.hpp, and boxmuller.hpp containing code that
users may find useful but that are outside the scope of the Random123 library itself.
<li> Some highly abstracted timing harnesses are provided
which measure performance of a variety of generators in different
programming environments.
</ul>
\section portability Portability
Although we have done our best to make Random123 portable and standards conforming,
it is an unfortunate fact that there is no portable code. There is only
code that has been ported.
We have tested the Random123 library with the following infrastructure
<ul>
<li>Linux, gcc (multiple versions from 3.4.3 through 5.2), on x86_64.
<li>Linux, clang-2.9, 3.0, 3.1, 3.3 and 3.6 on x86_64.
<li>Linux, clang-3.0 and 3.1 with lib++ (2012-04-19 svn checkout) on x86_64.
<li>Linux, open64-4.2.4 on x86_64.
<li>Linux, Intel icc and icpc 12.0.2 on x86_64.
<li>Linux, OpenCL (NVIDIA SDK 4.0.17) on GTX480, M2090, GTX580 and GTX680 GPUs.
<li>Linux, OpenCL (AMD APP SDK 2.4 or 2.5), on x86_64 CPUs and Radeon HD6970 GPUs.
<li>Linux, OpenCL (Intel OpenCL 1.5), on x86_64 CPUs.
<li>Linux, NVIDIA CUDA 4.1.15, 4.2.6, 5.5.22 and 7.5.1. (NOTE: We recommend against the use of CUDA before 4.1)
<li>Linux, gcc-4.1.2 and 4.4.1 on x86.
<li>Solaris, both gcc-3.4.3 and Sun C/C++ 5.8, on x86_64.
<li>FreeBSD 8.2, gcc-4.2.1, on x86_64.
<li>MacOS X 5.8, gcc-4.0.1, on x86.
<li>MacOS X 5.8, llvm-2.9.1 on x86 (problems with catching C++ exceptions).
<li>Windows 7, Microsoft Visual Studio, version 10.0, Microsoft C/C++ compiler 16.00.
</ul>
Others have reported success on
<ul>
<li>MacOS, OpenCL on x86_64 CPUs
<li>Linux, gcc-4.7.2 on Powerpc64 (BlueGene/Q)
<li>Linux, Portland Group Compiler on Powerpc64 (BlueGene/Q)
<li>Linux, IBM xlc on Powerpc64 (BlueGene/Q)
</ul>
\section warnings Warnings
With some compilation options, the CUDA nvcc compiler warns about
unreachable code in array.h. The compiler doesn't recognize that the
code that is unreachable for some values of some macro parameters, is
actually reachable for other values of the parameters. It is possible
to disable that particular warning for a specific compilation unit by
adding -Wcudafe --diag_suppress=111 to the compilation command
line.
\section contributors Contributors
We welcome feedback to random123@deshawresearch.com about ports to other environments.
We are grateful for contributions from the following users:
<ul>
<li> Geoffrey Irving and Gabriel Rockefeller - BlueGene/Q and powerpc ports
<li> Yan Zhou - MacOS and clang ports
<li> David Lawrie - allowing 64-bit philox to compile for both host and device with CUDA
<li> Bogdan Opanchuk - pointing out the inconsistent rotation constants in the implementation of threefry2xW in version 1.07 and earlier.
</ul>
*/
|