File: building_alignments.rst

package info (click to toggle)
python-cogent 1.5.3-2
  • links: PTS, VCS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 16,424 kB
  • ctags: 24,343
  • sloc: python: 134,200; makefile: 100; ansic: 17; sh: 10
file content (138 lines) | stat: -rw-r--r-- 5,106 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
*******************
Building alignments
*******************

.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul

Using the cogent aligners
=========================

Running a pairwise Needleman-Wunsch-Alignment
---------------------------------------------

.. doctest::
    
    >>> from cogent.align.algorithm import nw_align
    >>> seq1 = 'AKSAMITNY'
    >>> seq2 = 'AKHSAMMIT'
    >>> print nw_align(seq1,seq2)
    ('AK-SAM-ITNY', 'AKHSAMMIT--')

Running a progressive aligner
-----------------------------

We import useful functions and then load the sequences to be aligned.

.. doctest::
    
    >>> from cogent import LoadSeqs, LoadTree, DNA
    >>> seqs = LoadSeqs('data/test2.fasta', aligned=False, moltype=DNA)

For nucleotides
^^^^^^^^^^^^^^^

We load a canned nucleotide substitution model and the progressive aligner ``TreeAlign`` function.

.. doctest::
    
    >>> from cogent.evolve.models import HKY85
    >>> from cogent.align.progressive import TreeAlign

We first align without providing a guide tree. The ``TreeAlign`` algorithm builds pairwise alignments and estimates the substitution model parameters and pairwise distances. The distances are used to build a neighbour joining tree and the median value of substitution model parameters are provided to the substitution model for the progressive alignment step.

.. doctest::
    
    >>> aln, tree = TreeAlign(HKY85(), seqs)
    Param Estimate Summary Stats: kappa
    ==============================
            Statistic        Value
    ------------------------------
                Count           10
                  Sum        1e+06
               Median        4.256
                 Mean        1e+05
    StandardDeviation    3.162e+05
             Variance        1e+11
    ------------------------------
    >>> aln
    5 x 60 text alignment: NineBande[-C-----GCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ...

We then align using a guide tree (pre-estimated) and specifying the ratio of transitions to transversions (kappa).

.. doctest::
    
    >>> tree = LoadTree(treestring='(((NineBande:0.0128202449453,Mouse:0.184732725695):0.0289459522137,DogFaced:0.0456427810916):0.0271363715538,Human:0.0341320714654,HowlerMon:0.0188456837006)root;')
    >>> params={'kappa': 4.0}
    >>> aln, tree = TreeAlign(HKY85(), seqs, tree=tree, param_vals=params)
    >>> aln
    5 x 60 text alignment: NineBande[-C-----GCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ...

For codons
^^^^^^^^^^

We load a canned codon substitution model and use a pre-defined tree and parameter estimates.

.. doctest::
    
    >>> from cogent.evolve.models import MG94HKY
    >>> tree = LoadTree(treestring='((NineBande:0.0575781680031,Mouse:0.594704139406):0.078919659556,DogFaced:0.142151930069,(HowlerMon:0.0619991555435,Human:0.10343006422):0.0792423439112)')
    >>> params={'kappa': 4.0, 'omega': 1.3}
    >>> aln, tree = TreeAlign(MG94HKY(), seqs, tree=tree, param_vals=params)
    >>> aln
    5 x 60 text alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ...

Building alignments with 3rd-party apps such as muscle or clustalw
==================================================================

See :ref:`alignment-controllers`.

Converting gaps from aa-seq alignment to nuc seq alignment
==========================================================

We load some unaligned DNA sequences and show their translation.

.. doctest::
    
    >>> from cogent import LoadSeqs, DNA, PROTEIN
    >>> seqs = [('hum', 'AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT'),
    ...         ('mus', 'AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG'),
    ...         ('rat', 'CTGAACAAGCAGCCACTTTCAAACAAGAAA')]
    >>> unaligned_DNA = LoadSeqs(data=seqs, moltype = DNA, aligned = False)
    >>> print unaligned_DNA.toFasta()
    >hum
    AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT
    >mus
    AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG
    >rat
    CTGAACAAGCAGCCACTTTCAAACAAGAAA
    >>> print unaligned_DNA.getTranslation()
    >hum
    KQIQESSENGSLAARQERQAQVNLT
    >mus
    KQIQESGESGSLAARQERQAQVNLT
    >rat
    LNKQPLSNKK
    <BLANKLINE>

We load an alignment of these protein sequences.

.. doctest::
    
    >>> aligned_aa_seqs = [('hum', 'KQIQESSENGSLAARQERQAQVNLT'),
    ...                    ('mus', 'KQIQESGESGSLAARQERQAQVNLT'),
    ...                    ('rat', 'LNKQ------PLS---------NKK')]
    >>> aligned_aa = LoadSeqs(data = aligned_aa_seqs, moltype = PROTEIN)

We then obtain an alignment of the DNA sequences from the alignment of their translation.

.. doctest::
    
    >>> aligned_DNA = aligned_aa.replaceSeqs(unaligned_DNA)
    >>> print aligned_DNA
    >hum
    AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT
    >mus
    AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG
    >rat
    CTGAACAAGCAG------------------CCACTTTCA---------------------------AACAAGAAA
    <BLANKLINE>