"""
Routines for Gaussian Mixture Models and learning with Expectation Maximization
===============================================================================
This module contains classes and functions to compute multivariate Gaussian
densities (diagonal and full covariance matrices), Gaussian mixtures, Gaussian
mixture models and an EM trainer.
More specifically, the module defines the following classes and functions:
- densities.gauss_den: function to compute a multivariate Gaussian pdf (see
  the sketch after this list)
- gauss_mix.GM: defines the GM (Gaussian Mixture) class. A Gaussian mixture
  can be created from its parameters (weights, means and variances), or from
  its meta parameters d (dimension of the Gaussians) and k (number of
  components in the mixture). A Gaussian mixture can then be sampled or
  plotted (if d > 1, plot confidence ellipsoids projected on 2 chosen
  dimensions; if d == 1, plot the pdf of each component and fill the zone of
  confidence for a given level)
- gmm_em.GMM: defines the GMM (Gaussian Mixture Model) class. This class is
  constructed from a GM model gm, and can be used to train gm. The GMM can be
  initialized by kmean or at random, can compute sufficient statistics, and
  can update its parameters from the sufficient statistics.
- kmean.kmean: implements a k-means algorithm (also sketched below). We
  cannot use scipy.cluster.vq.kmeans, since it does not give the membership
  of observations.
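A minimal sketch of gauss_den and kmean, since the example below only
exercises GM, GMM and EM. The signatures used here (gauss_den(data, mu, va)
returning the pdf at each row of data, kmean(data, init_code, niter)
returning the final code vectors and the membership of each observation) are
assumptions; check the densities and kmean modules for the exact interfaces.
>>> import numpy as np
>>> from densities import gauss_den
>>> from kmean import kmean
>>> x = np.random.randn(100, 2)
>>> # pdf of a zero mean, unit diagonal variance Gaussian at each row of x
>>> # (assumed signature, see note above)
>>> pdf = gauss_den(x, np.zeros(2), np.ones(2))
>>> # k-means started from the first 3 frames; label is assumed to hold the
>>> # membership of each observation (the reason kmean exists at all)
>>> code, label = kmean(x, x[:3], 10)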
Example of use:
---------------
>>> #++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> # Create an artificial 2 dimension, 3 clusters GM model, samples it
>>> #++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> d, k, mode = 2, 3, 'diag'
>>> w, mu, va = GM.gen_param(d, k, mode, spread = 1.5)
>>> gm = GM.fromvalues(w, mu, va)
>>>
>>> # Sample 1000 frames from the model
>>> data = gm.sample(1000)
>>>
>>> #++++++++++++++++++++++++
>>> # Learn the model with EM
>>> #++++++++++++++++++++++++
>>> # Init the model
>>> lgm = GM(d, k, mode)
>>> gmm = GMM(lgm, 'kmean')
>>>
>>> # The actual EM, with likelihood computation. The threshold
>>> # is compared to the (linearly approximated) derivative of the likelihood
>>> em = EM()
>>> like = em.train(data, gmm, maxiter = 30, thresh = 1e-8)
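Training updates the GM model lgm in place. Assuming GM exposes its
parameters under the same names used by GM.fromvalues (an assumption, not a
documented API), the learned model can then be inspected:
>>> # w, mu, va are assumed attribute names, mirroring GM.fromvalues
>>> w_hat, mu_hat, va_hat = lgm.w, lgm.mu, lgm.va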
Files example.py and example2.py show more capabilities of the toolbox,
including plotting (using matplotlib) and model selection using the Bayesian
Information Criterion (BIC), sketched below.
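A rough sketch of BIC-based model selection, kept here for orientation; see
example2.py for the real thing. It assumes that em.train returns the
successive total log-likelihoods of the data, and counts kc-1 weights, kc*d
means and kc*d diagonal variances as free parameters of a kc-component
diagonal model; both assumptions should be checked against the toolbox.
>>> import numpy as np
>>> n = data.shape[0]
>>> best_bic, best_k = np.inf, None
>>> for kc in range(1, 6):
...     cgmm = GMM(GM(d, kc, mode), 'kmean')
...     # like[-1] is assumed to be the final total log-likelihood
...     like = EM().train(data, cgmm, maxiter = 30, thresh = 1e-8)
...     nparams = (kc - 1) + kc * d + kc * d  # weights, means, diag variances
...     bic = -2 * like[-1] + nparams * np.log(n)
...     if bic < best_bic:
...         best_bic, best_k = bic, kc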
Bibliography:
- A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from
  Incomplete Data via the EM Algorithm", Journal of the Royal Statistical
  Society, Series B, 39(1):1--38, 1977.
- Stephen J. Roberts, Dirk Husmeier, Iead Rezek, and William Penny, "Bayesian
  Approaches to Gaussian Mixture Modelling", IEEE Transactions on Pattern
  Analysis and Machine Intelligence, 1998.
Copyright: David Cournapeau 2006
License: BSD-style (see LICENSE.txt in main source directory)
"""
version = '0.5.7dev'
depends = ['linalg', 'stats']
ignore = False