File: seqsim_alignment_simulation.rst

package info (click to toggle)
python-cogent 1.5.3-2
  • links: PTS, VCS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 16,424 kB
  • ctags: 24,343
  • sloc: python: 134,200; makefile: 100; ansic: 17; sh: 10
file content (123 lines) | stat: -rw-r--r-- 3,550 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
Seqsim Simple Alignment Simulation Example
==========================================

.. sectionauthor:: Julia Goodrich

This is a very simple example of how to use the ``seqsim`` module to simulate
an alignment for a tree starting with a random sequence and substitution rate 
matrix (q). The rate matrix gives the rate constant of going from one character 
in the sequence to another character in the sequence, the Q matrix determines 
the rate of change of the sequence.

First we will perform the necessary imports:

* ``Rates`` is an object that stores the rate matrix data, it can also be used 
    to generate a random rate matrix given an ``Alphabet``.

* ``DnaUsage`` is a ``Usage`` object that stores the usage of each nucleotide.

* ``DnaPairs`` is an Alphabet it stores the DNA pairs (AA,AT,AC,...), it can
    be passed into the ``Rates`` object, defining the rate matrix pairs for DNA.

* ``DNA`` is a ``MolType`` object for DNA.

* ``RangeNode`` is the main ``seqsim`` Node object, it allows for the easy 
    evolution of sequences.

* ``DndParser`` is a parser for a newick format tree.

.. doctest::

    >>> from cogent.seqsim.usage import Rates, DnaUsage
    >>> from cogent.core.usage import DnaPairs
    >>> from cogent.core.moltype import DNA
    >>> from cogent.core.alignment import Alignment
    >>> from cogent.seqsim.tree import RangeNode
    >>> from cogent.parse.tree import DndParser
    
Now, lets specify a 4 taxon tree:

.. doctest::

    >>> t = DndParser('(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1);', 
    ... constructor = RangeNode)
    
To generate a random DNA sequence, we first specify nucleotide frequencies 
with the ``DnaUsage`` object. Then we create a random DNA sequence that is 
five bases long.

.. doctest::

    >>> u = DnaUsage({'A':0.5,'T':0.2,'C':0.15,'G':0.25})
    >>> s = DNA.ModelSeq(u.randomIndices(5))
    >>> q = Rates.random(DnaPairs)

Set q at the base of the tree and propagate it to all nodes in the tree,

.. doctest::

    >>> t.Q = q
    >>> t.propagateAttr('Q')

Set a P matrix (probability matrix) from every Q matrix on each node: 
    P(t) = e^(Qt),

.. doctest::

    >>> t.assignP()

Use ``evolve`` to evolve sequences for each tip, Note: must evolve sequence
data, not sequence object itself (for speed)

.. doctest::
    >>> t.evolve(s._data)

Build alignment,

.. doctest::

    >>> seqs = {}
    >>> for n in t.tips():
    ...     seqs[n.Name] = DNA.ModelSeq(n.Sequence)
    >>> aln = Alignment(seqs)

The result is a Cogent ``Alignment`` object, which can be used the same way as
any other alignment object.

``evolveSeqs`` can be used instead of evolve to evolve multiple sequences
according to the same tree (can model either different genes, or different rate
categories within a gene that you then combine, etc...),

.. doctest::

    >>> from numpy import concatenate

First you need to use ``assignPs`` to assign the proper P matricies given rates:

.. doctest::

    >>> t.assignPs([.5, .75, 1])

There needs to be the same number of random sequences as there are rate 
catigories so we create a list of 3 random sequences,

.. doctest::

    >>> s = [DNA.ModelSeq(u.randomIndices(5))._data for i in range(0,3)]

Then use ``evolveSeqs`` to evolve a sequence for every tip with every rate.

.. doctest::

    >>> t.evolveSeqs(s)

Now to concatenate the sequences,

.. doctest::

    >>> seqs = {}
    >>> for n in t.tips():
    ...     for s in n.Sequences:
    ...         seqs[n.Name] = DNA.ModelSeq(concatenate(tuple(n.Sequences)))
    >>> aln = Alignment(seqs)