File: sampling_dau.rst

package info (click to toggle)
scipy 1.17.0-1exp2
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 235,340 kB
  • sloc: cpp: 506,914; python: 357,038; ansic: 215,028; javascript: 89,566; fortran: 19,308; cs: 3,081; f90: 1,150; sh: 860; makefile: 519; pascal: 284; lisp: 134; xml: 56; perl: 51
file content (132 lines) | stat: -rw-r--r-- 4,714 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132

.. _sampling-dau:

Discrete Alias Urn (DAU)
========================

.. currentmodule:: scipy.stats.sampling

* Required: probability vector (PV) or the PMF along with a finite domain
* Speed:

  * Set-up: slow (linear with the vector-length)
  * Sampling: very fast 


DAU samples from distributions with arbitrary but finite probability vectors
(PV) of length N. The algorithm is based on an ingenious method by A.J.
Walker and requires a table of size (at least) N. It needs one random number
and only one comparison for each generated random variate. The setup time for
constructing the tables is O(N).

    >>> import numpy as np
    >>> from scipy.stats.sampling import DiscreteAliasUrn
    >>> 
    >>> pv = [0.18, 0.02, 0.8]
    >>> urng = np.random.default_rng()
    >>> rng = DiscreteAliasUrn(pv, random_state=urng)
    >>> rng.rvs()
    0      # may vary

By default, the probability vector is indexed starting at 0. However, this
can be changed by passing a ``domain`` parameter. When ``domain`` is given
in combination with the PV, it has the effect of relocating the
distribution from ``(0, len(pv))`` to ``(domain[0]``, ``domain[0] + len(pv))``.
``domain[1]`` is ignored in this case.

   >>> rng = DiscreteAliasUrn(pv, domain=(10, 13), random_state=urng)
   >>> rng.rvs()
   12    # may vary

The method also works when no probability vector but a PMF is given.
In that case, a bounded (finite) domain must also be given either by
passing the ``domain`` parameter explicitly or by providing a ``support``
method in the distribution object:

    >>> class Distribution:
    ...     def __init__(self, c):
    ...         self.c = c
    ...     def pmf(self, x):
    ...         return x**self.c
    ...     def support(self):
    ...         return (0, 10)
    ... 
    >>> dist = Distribution(2)
    >>> rng = DiscreteAliasUrn(dist, random_state=urng)
    >>> rng.rvs()
    10    # may vary

.. plot::
    :alt: " "

    >>> import matplotlib.pyplot as plt
    >>> from scipy.stats.sampling import DiscreteAliasUrn
    >>> class Distribution:
    ...     def __init__(self, c):
    ...         self.c = c
    ...     def pmf(self, x):
    ...         return x**self.c
    ...     def support(self):
    ...         return (0, 10)
    ... 
    >>> dist = Distribution(2)
    >>> urng = np.random.default_rng()
    >>> rng = DiscreteAliasUrn(dist, random_state=urng)
    >>> rvs = rng.rvs(1000)
    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> x = np.arange(1, 11)
    >>> fx = dist.pmf(x)
    >>> fx = fx / fx.sum()
    >>> ax.plot(x, fx, 'bo', label='true distribution')
    >>> ax.vlines(x, 0, fx, lw=2)
    >>> ax.hist(rvs, bins=np.r_[x, 11]-0.5, density=True, alpha=0.5, color='r',
    ...         label='samples')
    >>> ax.set_xlabel('x')
    >>> ax.set_ylabel('PMF(x)')
    >>> ax.set_title('Discrete Alias Urn Samples')
    >>> plt.legend()
    >>> plt.show()

.. note:: As :class:`~DiscreteAliasUrn` expects PMF with signature
          ``def pmf(self, x: float) -> float``, it first vectorizes the
          PMF using ``np.vectorize`` and then evaluates it over all the
          points in the domain. But if the PMF is already vectorized,
          it is much faster to just evaluate it at each point in the domain
          and pass the obtained PV instead along with the domain.
          For example, ``pmf`` methods of SciPy's discrete distributions
          are vectorized and a PV can be obtained by doing:

          >>> from scipy.stats import binom
          >>> from scipy.stats.sampling import DiscreteAliasUrn
          >>> dist = binom(10, 0.2)  # distribution object
          >>> domain = dist.support()  # the domain of your distribution
          >>> x = np.arange(domain[0], domain[1] + 1)
          >>> pv = dist.pmf(x)
          >>> rng = DiscreteAliasUrn(pv, domain=domain)

          Domain is required here to relocate the distribution.

The performance can be slightly influenced by setting the size of the used
table which can be changed by passing a ``urn_factor`` parameter.

    >>> # use a table twice the length of PV.
    >>> urn_factor = 2
    >>> rng = DiscreteAliasUrn(pv, urn_factor=urn_factor, random_state=urng)
    >>> rng.rvs()
    2    # may vary

.. note:: It is recommended to keep this parameter under 2.

Please see [1]_ and [2]_ for more details on this method.


References
----------

.. [1] UNU.RAN reference manual, Section 5.8.2,
       "DAU - (Discrete) Alias-Urn method",
       http://statmath.wu.ac.at/software/unuran/doc/unuran.html#DAU
.. [2] A.J. Walker (1977). An efficient method for generating discrete
       random variables with general distributions, ACM Trans. Math.
       Software 3, pp. 253-256.