File: empirical_protein_models.rst

package info (click to toggle)
python-cogent 1.4.1-1.2
  • links: PTS, VCS
  • area: non-free
  • in suites: squeeze
  • size: 13,260 kB
  • ctags: 20,087
  • sloc: python: 116,163; ansic: 732; makefile: 74; sh: 9
file content (93 lines) | stat: -rw-r--r-- 2,891 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
Use an empirical protein substitution model
===========================================

.. sectionauthor:: Gavin Huttley

This file contains an example of importing an empirically determined protein substitution matrix such as Dayhoff et al 1978 and using it to create a substitution model. The globin alignment is from the PAML distribution.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree, PROTEIN
    >>> from cogent.evolve.substitution_model import EmpiricalProteinMatrix
    >>> from cogent.parse.paml_matrix import PamlMatrixParser

Make a tree object.  In this case from a string.

.. doctest::

    >>> treestring="(((rabbit,rat),human),goat-cow,marsupial);"
    >>> t = LoadTree(treestring=treestring)

Import the alignment, explicitly setting the ``moltype`` to be protein

.. doctest::

    >>> al = LoadSeqs('data/abglobin_aa.phylip',
    ...                interleaved=True,
    ...                moltype=PROTEIN,
    ...                )

Open the file that contains the empirical matrix and parse the matrix and frequencies.

.. doctest::

    >>> matrix_file = open('data/dayhoff.dat')

The ``PamlMatrixParser`` will import the matrix and frequency from files designed for Yang's PAML package.  This format is the lower half of the matrix in three letter amino acid name order white space delineated followed by motif frequencies in the same order.

.. doctest::

    >>> empirical_matrix, empirical_frequencies = PamlMatrixParser(matrix_file)

Create an Empirical Protein Matrix Substitution model object.  This will take the unscaled empirical matrix and use it and the motif frequencies to create a scaled Q matrix.

.. doctest::

    >>> sm = EmpiricalProteinMatrix(empirical_matrix, empirical_frequencies)

Make a parameter controller, likelihood function object and optimise.

.. doctest::

    >>> lf = sm.makeLikelihoodFunction(t)
    >>> lf.setAlignment(al)
    >>> lf.optimise(show_progress = False)
    >>> print lf.getLogLikelihood()
    -1706...
    >>> print lf
    Likelihood Function Table
    =============================
         edge    parent    length
    -----------------------------
       rabbit    edge.0    0.0785
          rat    edge.0    0.1750
       edge.0    edge.1    0.0324
        human    edge.1    0.0545
       edge.1      root    0.0269
     goat-cow      root    0.0972
    marsupial      root    0.2424
    -----------------------------
    ===============
    motif    mprobs
    ---------------
        A    0.0871
        C    0.0335
        D    0.0469
        E    0.0495
        F    0.0398
        G    0.0886
        H    0.0336
        I    0.0369
        K    0.0805
        L    0.0854
        M    0.0148
        N    0.0404
        P    0.0507
        Q    0.0383
        R    0.0409
        S    0.0696
        T    0.0585
        V    0.0647
        W    0.0105
        Y    0.0299
    ---------------