Random Number Generation
========================

Dask's random number routines produce pseudorandom numbers by combining a
``BitGenerator``, which creates sequences of random bits, with a ``Generator``,
which uses those sequences to sample from different statistical distributions.

Since Dask version 2023.2.1, the ``Generator`` can be initialized with a number
of different ``BitGenerator`` classes. It exposes many different probability
distributions. The legacy ``RandomState`` random number routines are still
available, but they are frozen and will receive no further updates.

Differences with NumPy
----------------------

Dask follows the NumPy interface for random number generation with some
differences:

- Methods under ``dask.array.random`` take a ``chunks`` keyword.
- Dask tries to be backend agnostic. In other words, you can mostly use CuPy
  and NumPy interchangeably as a backend for random number generation. Any
  library providing a similar interface should also work with some effort.

Notes
-----

- **BitGenerators:** Objects that generate random sequences. They are
  provided by a backend library such as NumPy or CuPy, and the sequences they
  produce are typically unsigned integer words filled with either 32 or 64
  random bits.

- **Generators:** Objects that transform sequences of random bits from a
  ``BitGenerator`` into sequences of numbers that follow a specific probability
  distribution (such as uniform, Normal or Binomial) within a specified
  interval.

- Dask does not guarantee that the same random number generator is used across
  versions. This means that numbers generated by ``dask.array.random`` in a new
  version may differ from those generated by a previous version, even when the
  same seed and distribution are used. As better algorithms evolve, the bit
  stream may change.

- Dask does not guarantee parity in the generated numbers with any third party
  library. In particular, numbers generated by Dask and by NumPy or CuPy will
  differ even when given the same seed and ``BitGenerator``. This is because Dask
  spawns ``SeedSequence`` children to produce independent random number streams
  in parallel.

- Many of the ``RandomState`` methods are exported as functions in
  ``dask.array.random``. This usage is discouraged because it relies on a global
  ``RandomState`` instance, which is not advised on two counts:

  1. It uses global state, which means results will change as the code changes.
  2. It uses a ``RandomState`` rather than the more modern ``Generator``.

  For backward compatibility reasons, we cannot change this. Use
  ``dask.array.random.default_rng()`` to get a ``Generator`` and use its methods
  instead.

- ``Generator.integers`` is now the canonical way to generate integer random numbers
  from a discrete uniform distribution. The ``endpoint`` keyword can be used to
  specify open or closed intervals. This replaces both ``randint`` and
  ``random_integers``.

- ``Generator.random`` is now the canonical way to generate floating-point random
  numbers, which replaces ``random_sample``. The ``dask.array.random.random``
  method still uses ``RandomState`` for backwards compatibility and should be
  avoided for new code. Please use ``Generator.random`` instead.

Quick Start
-----------

Call ``default_rng`` to get a new instance of a ``Generator``, then call its
methods to obtain samples from different distributions. By default, ``Generator``
uses bits provided by PCG64, which has better statistical properties than the
legacy MT19937 used in ``RandomState``.

.. code-block:: python

    # Do this (new version)
    import dask.array as da
    rng = da.random.default_rng()
    vals = rng.standard_normal(10)
    more_vals = rng.standard_normal(10)

    # instead of this (legacy version)
    import dask.array as da
    vals = da.random.standard_normal(10)
    more_vals = da.random.standard_normal(10)

For further information, see the `NumPy docs <https://numpy.org/devdocs/reference/random/index.html>`_.