..
   Copyright (C) 2014 Jaakko Luttinen

   This file is licensed under the MIT License. See LICENSE for a text of the
   license.


.. testsetup::

   import numpy as np
   np.random.seed(1)


Linear state-space model
========================


Model
-----

.. currentmodule:: bayespy.nodes

In linear state-space models a sequence of :math:`M`-dimensional observations
:math:`\mathbf{Y}=(\mathbf{y}_1,\ldots,\mathbf{y}_N)` is assumed to be generated
from latent :math:`D`-dimensional states
:math:`\mathbf{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_N)` which follow a first-order
Markov process:

.. math::

   \begin{aligned}
   \mathbf{x}_{n} &= \mathbf{A}\mathbf{x}_{n-1} + \text{noise} \,,
   \\
   \mathbf{y}_{n} &= \mathbf{C}\mathbf{x}_{n} + \text{noise} \,,
   \end{aligned}

where the noise is Gaussian, :math:`\mathbf{A}` is the :math:`D\times D` state
dynamics matrix and :math:`\mathbf{C}` is the :math:`M\times D` loading
matrix. Usually, the latent space dimensionality :math:`D` is assumed to be much
smaller than the observation space dimensionality :math:`M` in order to model
the dependencies of high-dimensional observations efficiently.

In order to construct the model in BayesPy, first import relevant nodes:

>>> from bayespy.nodes import GaussianARD, GaussianMarkovChain, Gamma, Dot

The data vectors will be 30-dimensional:

>>> M = 30

There will be 400 data vectors:

>>> N = 400

Let us use a 10-dimensional latent space:

>>> D = 10

The state dynamics matrix :math:`\mathbf{A}` has an ARD prior:
    
>>> alpha = Gamma(1e-5,
...               1e-5,
...               plates=(D,),
...               name='alpha')
>>> A = GaussianARD(0,
...                 alpha,
...                 shape=(D,),
...                 plates=(D,),
...                 name='A')

Note that :math:`\mathbf{A}` is a :math:`D\times{}D` matrix.
However, in BayesPy it is modelled as a collection (``plates=(D,)``) of
:math:`D`-dimensional vectors (``shape=(D,)``) because this is how the variables
factorize in the posterior approximation of the state dynamics matrix in
:class:`GaussianMarkovChain`.  The latent states are constructed as

>>> X = GaussianMarkovChain(np.zeros(D),
...                         1e-3*np.identity(D),
...                         A,
...                         np.ones(D),
...                         n=N,
...                         name='X')

where the first two arguments are the mean and the precision matrix of the
initial state, the third argument is the state dynamics matrix, and the fourth
argument gives the diagonal elements of the precision matrix of the innovation
noise.  The node also needs the length of the chain, given as the keyword
argument ``n=N``.
Thus, the shape of this node is ``(N,D)``.
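
If in doubt, the plates of a node can be inspected directly.  For example, the
following quick check confirms that ``A`` is a collection of :math:`D` vectors
(the printed value assumes ``D = 10`` as above):

>>> A.plates
(10,)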

The linear mapping from the latent space to the observation space is modelled
with the loading matrix, which has an ARD prior:

>>> gamma = Gamma(1e-5,
...               1e-5,
...               plates=(D,),
...               name='gamma')
>>> C = GaussianARD(0,
...                 gamma,
...                 shape=(D,),
...                 plates=(M,1),
...                 name='C')

Note that the plates for ``C`` are ``(M,1)``, thus the full shape of the node is
``(M,1,D)``.  The unit plate axis is added so that ``C`` broadcasts with ``X``
when computing the dot product:

>>> F = Dot(C, 
...         X,
...         name='F')

This dot product is computed over the :math:`D`-dimensional latent space, thus
the result is an :math:`M\times{}N` matrix, which is represented with plates
``(M,N)`` in BayesPy:

>>> F.plates
(30, 400)
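
For comparison, the plates of ``C`` contain the unit plate axis that makes this
broadcasting work (the values correspond to ``M = 30``):

>>> C.plates
(30, 1)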

We also need to initialize either ``C`` or ``X`` randomly in order to find a
non-zero latent space, because both ``C`` and ``X`` are initialized to zero by
default due to their prior distributions.  We use random initialization for
``C``, which means that ``X`` must be updated once before ``C`` is updated:

>>> C.initialize_from_random()
    
The precision of the observation noise is given a gamma prior:

>>> tau = Gamma(1e-5,
...             1e-5,
...             name='tau')

The observations are noisy versions of the dot products:

>>> Y = GaussianARD(F,
...                 tau,
...                 name='Y')
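
This completes the model.  For reference, the generative model constructed
above can be summarized as follows (here :math:`\mathbf{a}_d` and
:math:`\mathbf{c}_m` denote the rows of :math:`\mathbf{A}` and
:math:`\mathbf{C}`, the second argument of :math:`\mathcal{N}` is a covariance
matrix, and :math:`\mathcal{G}(a,b)` is the gamma distribution with shape
:math:`a` and rate :math:`b`):

.. math::

   \begin{aligned}
   \mathbf{x}_1 &\sim \mathcal{N}(\mathbf{0}, 10^3 \mathbf{I}) \,,
   \\
   \mathbf{x}_n &\sim \mathcal{N}(\mathbf{A}\mathbf{x}_{n-1}, \mathbf{I}) \,,
   \qquad n = 2, \ldots, N \,,
   \\
   \mathbf{y}_n &\sim \mathcal{N}(\mathbf{C}\mathbf{x}_n, \tau^{-1}\mathbf{I}) \,,
   \\
   \mathbf{a}_d &\sim \mathcal{N}(\mathbf{0}, \operatorname{diag}(\boldsymbol{\alpha})^{-1}) \,,
   \\
   \mathbf{c}_m &\sim \mathcal{N}(\mathbf{0}, \operatorname{diag}(\boldsymbol{\gamma})^{-1}) \,,
   \\
   \alpha_d, \gamma_d, \tau &\sim \mathcal{G}(10^{-5}, 10^{-5}) \,.
   \end{aligned}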

The variational Bayesian inference engine is then constructed as:

>>> from bayespy.inference import VB
>>> Q = VB(X, C, gamma, A, alpha, tau, Y)

Note that ``X`` is given before ``C``, thus ``X`` is updated before ``C`` by
default.
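
The update order can also be controlled explicitly by passing the nodes to the
update method in the desired order.  A minimal sketch (not run here, because it
would advance the iteration counter):

.. code-block:: python

   # One VB sweep with an explicitly given update order
   Q.update(X, C, gamma, A, alpha, tau, repeat=1)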

Data
----

Now, let us generate some toy data for our model.  The true latent space is
four-dimensional, consisting of a noisy oscillator (two components), a random
walk component and a white noise component.

>>> w = 0.3
>>> a = np.array([[np.cos(w), -np.sin(w), 0, 0], 
...               [np.sin(w), np.cos(w),  0, 0], 
...               [0,         0,          1, 0],
...               [0,         0,          0, 0]])

The true linear mapping is just random:

>>> c = np.random.randn(M,4)

Then, generate the latent states and the observations using the model equations:

>>> x = np.empty((N,4))
>>> f = np.empty((M,N))
>>> y = np.empty((M,N))
>>> x[0] = 10*np.random.randn(4)
>>> f[:,0] = np.dot(c,x[0])
>>> y[:,0] = f[:,0] + 3*np.random.randn(M)
>>> for n in range(N-1):
...     x[n+1] = np.dot(a,x[n]) + [1,1,10,10]*np.random.randn(4)
...     f[:,n+1] = np.dot(c,x[n+1])
...     y[:,n+1] = f[:,n+1] + 3*np.random.randn(M)

We want to simulate missing values, thus we create a mask which randomly removes
80% of the data:

>>> from bayespy.utils import random
>>> mask = random.mask(M, N, p=0.2)
>>> Y.observe(y, mask=mask)
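
In the mask, ``True`` means that the corresponding element of ``y`` is
observed, so ``p=0.2`` keeps roughly 20% of the values.  Conceptually the mask
is just a boolean array; a rough NumPy equivalent would be:

.. code-block:: python

   # Roughly what random.mask(M, N, p=0.2) produces: a boolean (M, N) array in
   # which each element is True (observed) independently with probability 0.2
   mask = np.random.rand(M, N) < 0.2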


Inference
---------

As we did not define plotters for our nodes when creating the model, we set
them now for some of the nodes:

>>> import bayespy.plot as bpplt
>>> X.set_plotter(bpplt.FunctionPlotter(center=True, axis=-2))
>>> A.set_plotter(bpplt.HintonPlotter())
>>> C.set_plotter(bpplt.HintonPlotter())
>>> tau.set_plotter(bpplt.PDFPlotter(np.linspace(0.02, 0.5, num=1000)))

This enables plotting of the approximate posterior distributions during VB
learning.  The inference engine can be run using the :func:`VB.update` method:

>>> Q.update(repeat=10)
Iteration 1: loglike=-1.439704e+05 (... seconds)
...
Iteration 10: loglike=-1.051441e+04 (... seconds)

The iterations progress a bit slowly, so we will use parameter expansion to
speed them up.

Parameter expansion
+++++++++++++++++++

Section :ref:`sec-parameter-expansion` discusses parameter expansion for
state-space models to speed up inference.  It is based on rotating the latent
space such that the posterior in the observation space is not affected:

.. math::

   \mathbf{y}_n = \mathbf{C}\mathbf{x}_n =
   (\mathbf{C}\mathbf{R}^{-1}) (\mathbf{R}\mathbf{x}_n) \,.

Thus, the transformation is
:math:`\mathbf{C}\rightarrow\mathbf{C}\mathbf{R}^{-1}` and
:math:`\mathbf{X}\rightarrow\mathbf{R}\mathbf{X}`.  In order to keep the
dynamics of the latent states unaffected by the transformation, the state
dynamics matrix :math:`\mathbf{A}` must be transformed accordingly:

.. math::

   \mathbf{R}\mathbf{x}_n = \mathbf{R}\mathbf{A}\mathbf{R}^{-1}
   \mathbf{R}\mathbf{x}_{n-1} \,,

resulting in a transformation
:math:`\mathbf{A}\rightarrow\mathbf{R}\mathbf{A}\mathbf{R}^{-1}`.  For more
details, refer to :cite:`Luttinen:2013` and :cite:`Luttinen:2010`.  In BayesPy,
the transformations are available in
:mod:`bayespy.inference.vmp.transformations`:

>>> from bayespy.inference.vmp import transformations

The rotation of the loading matrix along with the ARD parameters is defined as:
    
>>> rotC = transformations.RotateGaussianARD(C, gamma)

For rotating ``X``, we first need to define the rotation of the state dynamics
matrix:

>>> rotA = transformations.RotateGaussianARD(A, alpha)

Now we can define the rotation of the latent states:

>>> rotX = transformations.RotateGaussianMarkovChain(X, rotA)

The optimal rotation for all these variables is found using a rotation
optimizer:

>>> R = transformations.RotationOptimizer(rotX, rotC, D)

Set the parameter expansion to be applied after each iteration:

>>> Q.callback = R.rotate

Now, run iterations until convergence:

>>> Q.update(repeat=1000)
Iteration 11: loglike=-1.010...e+04 (... seconds)
...
Iteration 58: loglike=-8.906...e+03 (... seconds)
Converged at iteration ...

..
    Iteration 60: loglike=-8.906259e+03 (... seconds)
    Converged at iteration 60.

Results
-------

Because we have set the plotters, we can plot those nodes as:

>>> Q.plot(X, A, C, tau)

.. plot::

   import numpy as np
   np.random.seed(1)
   from bayespy.nodes import GaussianARD, GaussianMarkovChain, Gamma, Dot
   M = 30
   N = 400
   D = 10
   alpha = Gamma(1e-5,
                 1e-5,
                 plates=(D,),
                 name='alpha')
   A = GaussianARD(0,
                   alpha,
                   shape=(D,),
                   plates=(D,),
                   name='A')
   X = GaussianMarkovChain(np.zeros(D),
                           1e-3*np.identity(D),
                           A,
                           np.ones(D),
                           n=N,
                           name='X')
   gamma = Gamma(1e-5,
                 1e-5,
                 plates=(D,),
                 name='gamma')
   C = GaussianARD(0,
                   gamma,
                   shape=(D,),
                   plates=(M,1),
                   name='C')
   F = Dot(C, 
           X,
           name='F')
   C.initialize_from_random()
   tau = Gamma(1e-5,
               1e-5,
               name='tau')
   Y = GaussianARD(F,
                   tau,
                   name='Y')
   from bayespy.inference import VB
   Q = VB(X, C, gamma, A, alpha, tau, Y)
   w = 0.3
   a = np.array([[np.cos(w), -np.sin(w), 0, 0], 
                 [np.sin(w), np.cos(w),  0, 0], 
                 [0,         0,          1, 0],
                 [0,         0,          0, 0]])
   c = np.random.randn(M,4)
   x = np.empty((N,4))
   f = np.empty((M,N))
   y = np.empty((M,N))
   x[0] = 10*np.random.randn(4)
   f[:,0] = np.dot(c,x[0])
   y[:,0] = f[:,0] + 3*np.random.randn(M)
   for n in range(N-1):
       x[n+1] = np.dot(a,x[n]) + [1,1,10,10]*np.random.randn(4)
       f[:,n+1] = np.dot(c,x[n+1])
       y[:,n+1] = f[:,n+1] + 3*np.random.randn(M)
   from bayespy.utils import random
   mask = random.mask(M, N, p=0.2)
   Y.observe(y, mask=mask)
   import bayespy.plot as bpplt
   X.set_plotter(bpplt.FunctionPlotter(center=True, axis=-2))
   A.set_plotter(bpplt.HintonPlotter())
   C.set_plotter(bpplt.HintonPlotter())
   tau.set_plotter(bpplt.PDFPlotter(np.linspace(0.02, 0.5, num=1000)))
   Q.update(repeat=10)
   from bayespy.inference.vmp import transformations
   rotC = transformations.RotateGaussianARD(C, gamma)
   rotA = transformations.RotateGaussianARD(A, alpha)
   rotX = transformations.RotateGaussianMarkovChain(X, rotA)
   R = transformations.RotationOptimizer(rotX, rotC, D)
   Q.callback = R.rotate
   Q.update(repeat=1000)
   Q.plot(X, A, C, tau)
   bpplt.pyplot.show()

There are clearly four effective components in ``X``: random walk (component
number 1), random oscillation (7 and 10), and white noise (9).  These dynamics
are also visible in the state dynamics matrix Hinton diagram.  Note that the
white noise component does not have any dynamics.  Also ``C`` shows only four
effective components.  The posterior of ``tau`` captures the true value
:math:`3^{-2}\approx0.111` accurately.  We can also plot predictions in the
observation space:

>>> bpplt.plot(F, center=True)

.. plot::

   import numpy as np
   np.random.seed(1)
   from bayespy.nodes import GaussianARD, GaussianMarkovChain, Gamma, Dot
   M = 30
   N = 400
   D = 10
   alpha = Gamma(1e-5,
                 1e-5,
                 plates=(D,),
                 name='alpha')
   A = GaussianARD(0,
                   alpha,
                   shape=(D,),
                   plates=(D,),
                   name='A')
   X = GaussianMarkovChain(np.zeros(D),
                           1e-3*np.identity(D),
                           A,
                           np.ones(D),
                           n=N,
                           name='X')
   gamma = Gamma(1e-5,
                 1e-5,
                 plates=(D,),
                 name='gamma')
   C = GaussianARD(0,
                   gamma,
                   shape=(D,),
                   plates=(M,1),
                   name='C')
   F = Dot(C, 
           X,
           name='F')
   C.initialize_from_random()
   tau = Gamma(1e-5,
               1e-5,
               name='tau')
   Y = GaussianARD(F,
                   tau,
                   name='Y')
   from bayespy.inference import VB
   Q = VB(X, C, gamma, A, alpha, tau, Y)
   w = 0.3
   a = np.array([[np.cos(w), -np.sin(w), 0, 0], 
                 [np.sin(w), np.cos(w),  0, 0], 
                 [0,         0,          1, 0],
                 [0,         0,          0, 0]])
   c = np.random.randn(M,4)
   x = np.empty((N,4))
   f = np.empty((M,N))
   y = np.empty((M,N))
   x[0] = 10*np.random.randn(4)
   f[:,0] = np.dot(c,x[0])
   y[:,0] = f[:,0] + 3*np.random.randn(M)
   for n in range(N-1):
       x[n+1] = np.dot(a,x[n]) + [1,1,10,10]*np.random.randn(4)
       f[:,n+1] = np.dot(c,x[n+1])
       y[:,n+1] = f[:,n+1] + 3*np.random.randn(M)
   from bayespy.utils import random
   mask = random.mask(M, N, p=0.2)
   Y.observe(y, mask=mask)
   import bayespy.plot as bpplt
   Q.update(repeat=10)
   from bayespy.inference.vmp import transformations
   rotC = transformations.RotateGaussianARD(C, gamma)
   rotA = transformations.RotateGaussianARD(A, alpha)
   rotX = transformations.RotateGaussianMarkovChain(X, rotA)
   R = transformations.RotationOptimizer(rotX, rotC, D)
   Q.callback = R.rotate
   Q.update(repeat=1000)
   bpplt.plot(F, center=True)
   bpplt.pyplot.show()

We can also measure the performance numerically by computing the
root-mean-square error (RMSE) on the missing values:

>>> from bayespy.utils import misc
>>> misc.rmse(y[~mask], F.get_moments()[0][~mask])
5.18...


This is relatively close to the standard deviation of the observation noise
(3), which roughly lower-bounds the achievable error here because the
predictions are compared to the noisy data.  Thus, the predictions are quite
good considering that only 20% of the data was used.
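
For comparison, one can also compute the error against the noiseless signal
``f`` or on the observed entries.  A sketch (outputs not shown here):

.. code-block:: python

   # RMSE of the reconstruction against the noiseless signal on the missing entries
   misc.rmse(f[~mask], F.get_moments()[0][~mask])
   # RMSE on the observed entries, for comparison
   misc.rmse(y[mask], F.get_moments()[0][mask])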