File: neutral_test.rst

package info (click to toggle)
python-cogent 1.4.1-1.2
  • links: PTS, VCS
  • area: non-free
  • in suites: squeeze
  • size: 13,260 kB
  • ctags: 20,087
  • sloc: python: 116,163; ansic: 732; makefile: 74; sh: 9
file content (156 lines) | stat: -rw-r--r-- 4,850 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
A test of the neutral theory
============================

.. sectionauthor:: Gavin Huttley

This file contains an example for performing a likelihood ratio test of neutrality. The test compares a model where the codon model parameter omega is constrained to be the same for all edges against one where each edge has its' own omega. From cogent import all the components we need.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.evolve.models import MG94GTR
    >>> from cogent.maths import stats

Get your alignment and tree.

.. doctest::

    >>> al = LoadSeqs("data/long_testseqs.fasta")
    >>> t = LoadTree("data/test.tree")

We use a Goldman Yang 1994 model.

.. doctest::

    >>> sm = MG94GTR()

Make the controller object

.. doctest::

    >>> lf = sm.makeLikelihoodFunction(t, digits=2, space=2)

Get the likelihood function object this object performs the actual likelihood calculation.

.. doctest::

    >>> lf.setAlignment(al)

By default, parameters other than branch lengths are treated as global in scope, so we don't need to do anything special here. We can influence how rigorous the optimisation will be, and switch between the global and local optimisers provided in the toolkit using arguments to the optimise method. The ``global_tolerance=1.0`` argument specifies conditions for an early break from simulated annealing which will be automatically followed by the Powell local optimiser. .. note:: the 'results' are of course nonsense.

.. doctest::

    >>> lf.optimise(global_tolerance = 1.0, show_progress=False)

View the resulting maximum-likelihood parameter values

.. doctest::

    >>> print lf
    Likelihood Function Table
    ===================================
     A/C   A/G   A/T   C/G   C/T  omega
    -----------------------------------
    1.02  3.36  0.73  0.95  3.71   0.90
    -----------------------------------
    =========================
         edge  parent  length
    -------------------------
        Human  edge.0    0.09
    HowlerMon  edge.0    0.12
       edge.0  edge.1    0.12
        Mouse  edge.1    0.84
       edge.1    root    0.06
    NineBande    root    0.28
     DogFaced    root    0.34
    -------------------------
    =============
    motif  mprobs
    -------------
        T    0.23
        C    0.19
        A    0.37
        G    0.21
    -------------

We'll get the lnL and number of free parameters for later use.

.. doctest::

    >>> null_lnL = lf.getLogLikelihood()
    >>> null_nfp = lf.getNumFreeParams()

Specify each edge has it's own omega by just modifying the existing ``lf``. This means the new function will start with the above values.

.. doctest::

    >>> lf.setParamRule("omega", is_independent = True)

Optimise the likelihood function, this time just using the local optimiser.

.. doctest::

    >>> lf.optimise(local = True, show_progress=False)

View the resulting maximum-likelihood parameter values.

.. doctest::

    >>> print lf
    Likelihood Function Table
    ============================
     A/C   A/G   A/T   C/G   C/T
    ----------------------------
    1.03  3.38  0.73  0.95  3.72
    ----------------------------
    ================================
         edge  parent  length  omega
    --------------------------------
        Human  edge.0    0.09   0.59
    HowlerMon  edge.0    0.12   0.96
       edge.0  edge.1    0.11   1.13
        Mouse  edge.1    0.83   0.92
       edge.1    root    0.06   0.39
    NineBande    root    0.28   1.28
     DogFaced    root    0.34   0.84
    --------------------------------
    =============
    motif  mprobs
    -------------
        T    0.23
        C    0.19
        A    0.37
        G    0.21
    -------------

Get out an annotated tree, it looks just like a tree, but has the maximum-likelihood parameter estimates attached to each tree edge. This object can be used for plotting, or to provide starting estimates to a related model.

.. doctest::

    >>> at = lf.getAnnotatedTree()

Getting the maximum likelihood estimates for post-processing out can be done in numerous ways. Here I use the ``getStatisticsAsDict`` method.

.. doctest::

    >>> sd = lf.getStatisticsAsDict(with_edge_names=True)

The lnL's from the two models are now used to calculate the likelihood ratio statistic (``LR``) it's degrees-of-freedom (``df``) and the probability (``P``) of observing the LR.

.. doctest::

    >>> LR = 2 * (lf.getLogLikelihood() - null_lnL)
    >>> df = lf.getNumFreeParams() - null_nfp
    >>> P = stats.chisqprob(LR, df)

Print this and look up a chi-sq with number of edges - 1 degrees of freedom.

.. doctest::

    >>> print "Likelihood ratio statistic = ", LR
    Likelihood ratio statistic =  8...
    >>> print "degrees-of-freedom = ", df
    degrees-of-freedom =  6
    >>> print "probability = ", P
    probability =  0.2...