File: building_alignments.rst

package info (click to toggle)
python-cogent 2024.5.7a1%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 74,600 kB
  • sloc: python: 92,479; makefile: 117; sh: 16
file content (110 lines) | stat: -rw-r--r-- 3,534 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
.. jupyter-execute::
    :hide-code:

    import set_working_directory

*******************
Building alignments
*******************

.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul

Using a ``cogent3`` progressive aligner for nucleotides
=======================================================

We load a canned nucleotide substitution model and the progressive aligner ``tree_align`` function.

.. jupyter-execute::

    from cogent3 import load_unaligned_seqs, make_tree
    from cogent3.align.progressive import tree_align

We first align without providing a guide tree. The ``tree_align`` algorithm builds pairwise alignments and estimates the substitution model parameters and pairwise distances. The distances are used to build a neighbour joining tree and the median value of substitution model parameters are provided to the substitution model for the progressive alignment step.

.. jupyter-execute::

    seqs = load_unaligned_seqs("data/test2.fasta", moltype="dna")
    aln, tree = tree_align("HKY85", seqs, show_progress=False)
    aln

We then align using a guide tree (pre-estimated) and specifying the ratio of transitions to transversions (kappa).

.. jupyter-execute::

    tree = make_tree(
        "(((NineBande:0.013,Mouse:0.185):0.023,DogFaced:0.046):0.027,Human:0.034,HowlerMon:0.019)"
    )
    params = {"kappa": 4.0}
    aln, tree = tree_align(
        "HKY85", seqs, tree=tree, param_vals=params, show_progress=False
    )
    aln

Using a ``cogent3`` progressive aligner for codons
==================================================

We load a canned codon substitution model and use a pre-defined tree and parameter estimates.

.. jupyter-execute::

    from cogent3 import load_unaligned_seqs, make_tree
    from cogent3.align.progressive import tree_align

    seqs = load_unaligned_seqs("data/test2.fasta", moltype="dna")
    tree = make_tree(
        "((NineBande:0.058,Mouse:0.595):0.079,DogFaced:0.142,(HowlerMon:0.062,Human:0.103):0.079)"
    )
    params = {"kappa": 4.0, "omega": 1.3}
    aln, tree = tree_align(
        "MG94HKY", seqs, tree=tree, param_vals=params, show_progress=False
    )
    aln

Converting gaps from aa-seq alignment to nuc seq alignment
==========================================================

We load some unaligned DNA sequences and show their translation.

.. jupyter-execute::

    from cogent3 import make_unaligned_seqs
    from cogent3.align.progressive import tree_align
    from cogent3.evolve.models import get_model

    seqs = [
        (
            "hum",
            "AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT",
        ),
        (
            "mus",
            "AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG",
        ),
        ("rat", "CTGAACAAGCAGCCACTTTCAAACAAGAAA"),
    ]
    unaligned_DNA = make_unaligned_seqs(seqs, moltype="dna")
    unaligned_DNA

.. jupyter-execute::

    unaligned_DNA.get_translation()

We load an alignment of these protein sequences.

.. jupyter-execute::

    from cogent3 import make_aligned_seqs

    aligned_aa_seqs = [
        ("hum", "KQIQESSENGSLAARQERQAQVNLT"),
        ("mus", "KQIQESGESGSLAARQERQAQVNLT"),
        ("rat", "LNKQ------PLS---------NKK"),
    ]
    aligned_aa = make_aligned_seqs(aligned_aa_seqs, moltype="protein")

We then obtain an alignment of the DNA sequences from the alignment of their translation.

.. jupyter-execute::

    aligned_DNA = aligned_aa.replace_seqs(unaligned_DNA, aa_to_codon=True)
    aligned_DNA