.. jupyter-execute::
    :hide-code:

    import set_working_directory

********************
Building phylogenies
********************

Building A Phylogenetic Tree From Pairwise Distances
====================================================

Directly via ``alignment.quick_tree()``
---------------------------------------

Both the ``ArrayAlignment`` and ``Alignment`` classes support this.

.. jupyter-execute::

    from cogent3 import load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    tree = aln.quick_tree(calc="TN93", show_progress=False)
    tree = tree.balanced()  # purely for display
    print(tree.ascii_art())
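
The ``calc="TN93"`` argument selects the Tamura-Nei (1993) distance, which corrects the raw proportion of differing sites for multiple substitutions and allows purine and pyrimidine transitions to occur at different rates. As a rough illustration of that estimate, here is a minimal pure-Python sketch of the textbook TN93 formula for two aligned DNA sequences. It is not cogent3's implementation: ``tn93_distance`` is a hypothetical name, base frequencies are assumed to be pooled from both sequences, columns containing anything other than A, C, G or T are simply skipped, and saturated pairs (where a logarithm argument becomes non-positive) raise an error rather than being handled.

.. code-block:: python

    import math


    def tn93_distance(seq1, seq2):
        """Tamura-Nei (1993) distance between two aligned DNA sequences (sketch)."""
        # Keep only columns where both sequences have an unambiguous base.
        pairs = [
            (x, y)
            for x, y in zip(seq1.upper(), seq2.upper())
            if x in "ACGT" and y in "ACGT"
        ]
        n = len(pairs)
        if n == 0:
            raise ValueError("no comparable sites")
        # Base frequencies pooled over both sequences (an assumption of this sketch).
        counts = {base: 0 for base in "ACGT"}
        for x, y in pairs:
            counts[x] += 1
            counts[y] += 1
        pi = {base: c / (2 * n) for base, c in counts.items()}
        if min(pi.values()) == 0:
            raise ValueError("all four bases must be present")
        pi_r = pi["A"] + pi["G"]  # purines
        pi_y = pi["C"] + pi["T"]  # pyrimidines
        ag = pi["A"] * pi["G"]
        ct = pi["C"] * pi["T"]
        # Proportions of purine transitions, pyrimidine transitions, transversions.
        p1 = sum({x, y} == {"A", "G"} for x, y in pairs) / n
        p2 = sum({x, y} == {"C", "T"} for x, y in pairs) / n
        q = sum((x in "AG") != (y in "AG") for x, y in pairs) / n
        # math.log raises ValueError if an argument is <= 0 (saturated pair).
        term_r = -(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
        term_y = -(2 * ct / pi_y) * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
        term_q = (
            -2
            * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y)
            * math.log(1 - q / (2 * pi_r * pi_y))
        )
        return term_r + term_y + term_q


    seq1 = "ACGTACGTAC" * 3
    seq2 = "GTC" + seq1[3:]  # two transitions (A->G, C->T) and one transversion (G->C)
    print(tn93_distance(seq1, seq2))

For identical sequences the sketch returns zero; for the example pair it returns a value slightly larger than the raw proportion of differing sites (3/30 = 0.1), reflecting the correction for unobserved substitutions.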

The ``quick_tree()`` method also supports non-parametric bootstrapping. The number of resampled alignments is specified using the ``bootstrap`` argument. In the following, trees are estimated from 100 resampled alignments and merged into a single consensus topology using a weighted consensus tree algorithm.

.. jupyter-execute::

    tree = aln.quick_tree(calc="TN93", bootstrap=100, show_progress=False)
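
Conceptually, each bootstrap replicate is an alignment of the same length whose columns are drawn at random, with replacement, from the original. The sketch below illustrates only that resampling step, using plain Python strings; it is not how cogent3 implements ``bootstrap``, and the per-replicate tree estimation and the weighted consensus step are omitted. ``bootstrap_alignments`` and its ``seed`` parameter are hypothetical names.

.. code-block:: python

    import random


    def bootstrap_alignments(seqs, num_replicates, seed=None):
        """Yield bootstrap replicates of an alignment given as {name: aligned_string}."""
        rng = random.Random(seed)
        names = list(seqs)
        length = len(seqs[names[0]])
        if any(len(s) != length for s in seqs.values()):
            raise ValueError("sequences must all have the same (aligned) length")
        for _ in range(num_replicates):
            # Sample column indices uniformly, with replacement.
            columns = [rng.randrange(length) for _ in range(length)]
            yield {name: "".join(seqs[name][i] for i in columns) for name in names}


    toy = {"a": "ACGTTA", "b": "ACGTCA", "c": "TCGACA"}
    replicates = list(bootstrap_alignments(toy, num_replicates=100, seed=1))

Each replicate would then go through the same distance and tree-building steps, and the resulting trees would be summarised into a consensus.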

Using the ``DistanceMatrix`` object
-----------------------------------

.. jupyter-execute::

    from cogent3 import load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    dists = aln.distance_matrix(calc="TN93")
    tree = dists.quick_tree(show_progress=False)
    tree = tree.balanced()  # purely for display
    print(tree.ascii_art())

Explicitly via ``DistanceMatrix`` and ``cogent3.phylo.nj.nj()``
---------------------------------------------------------------

.. jupyter-execute::

    from cogent3 import load_aligned_seqs
    from cogent3.phylo import nj

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    dists = aln.distance_matrix(calc="TN93")
    tree = nj.nj(dists, show_progress=False)
    tree = tree.balanced()  # purely for display
    print(tree.ascii_art())

Directly from a pairwise distance ``dict``
------------------------------------------

.. jupyter-execute::

    from cogent3.phylo import nj

    dists = {("a", "b"): 2.7, ("c", "b"): 2.33, ("c", "a"): 0.73}
    tree = nj.nj(dists, show_progress=False)
    print(tree.ascii_art())
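
``nj.nj()`` accepts pairwise distances keyed by pairs of tip names, as in the ``dict`` above. To show what the standard neighbour-joining algorithm does with input of that shape, here is a minimal pure-Python sketch. It is not cogent3's code: ``neighbour_joining`` is a hypothetical name, it returns a Newick string and a dict of branch lengths rather than a cogent3 tree object, it leaves the final three nodes as an unrooted trifurcation, and internal nodes are labelled ``_node1``, ``_node2`` and so on, so tip names are assumed not to clash with those labels.

.. code-block:: python

    from itertools import combinations


    def neighbour_joining(dists):
        """Neighbour-joining sketch; dists maps (name1, name2) pairs to distances."""
        # frozenset keys make the lookup symmetric: ("a", "b") matches ("b", "a").
        d = {frozenset(pair): float(value) for pair, value in dists.items()}

        def dist(x, y):
            return d[frozenset((x, y))]

        nodes = sorted({name for pair in dists for name in pair})
        if len(nodes) < 3:
            raise ValueError("need at least 3 tips")
        labels = {name: name for name in nodes}  # node -> Newick fragment
        lengths = {}  # node -> length of the branch connecting it to its parent
        joined = 0
        while len(nodes) > 3:
            n = len(nodes)
            r = {x: sum(dist(x, y) for y in nodes if y != x) for x in nodes}
            # Join the pair minimising the Q criterion (first pair wins ties).
            i, j = min(
                combinations(nodes, 2),
                key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]],
            )
            dij = dist(i, j)
            lengths[i] = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
            lengths[j] = dij - lengths[i]
            joined += 1
            u = f"_node{joined}"
            # Distances from the new internal node to every remaining node.
            for k in nodes:
                if k not in (i, j):
                    d[frozenset((u, k))] = (dist(i, k) + dist(j, k) - dij) / 2
            labels[u] = f"({labels[i]}:{lengths[i]:g},{labels[j]}:{lengths[j]:g})"
            nodes = [k for k in nodes if k not in (i, j)] + [u]
        # Resolve the last three nodes as an unrooted trifurcation.
        x, y, z = nodes
        lengths[x] = (dist(x, y) + dist(x, z) - dist(y, z)) / 2
        lengths[y] = (dist(x, y) + dist(y, z) - dist(x, z)) / 2
        lengths[z] = (dist(x, z) + dist(y, z) - dist(x, y)) / 2
        newick = ",".join(f"{labels[v]}:{lengths[v]:g}" for v in (x, y, z))
        return f"({newick});", lengths


    # Distances that fit a tree exactly (an additive example).
    additive = {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }
    newick, lengths = neighbour_joining(additive)
    print(newick)  # (d:2,e:1,(c:4,(a:2,b:3):3):2);

Because these example distances are additive, neighbour joining recovers the generating tree and its branch lengths exactly; with real, noisy distances it returns an approximation.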

By Least-squares
================

We illustrate phylogeny reconstruction by least-squares using the F81 substitution model. We use the advanced stepwise addition algorithm to search tree space. Here ``a`` is the number of taxa for which all possible phylogenies are evaluated exhaustively. The remaining taxa are then added one at a time to each of the top ``k`` trees (ranked by the least-squares metric), and only the ``k`` best trees are kept at each iteration.

.. jupyter-execute::

    from cogent3.phylo.least_squares import WLS
    from cogent3.util.deserialise import deserialise_object

    dists = deserialise_object("data/dists_for_phylo.json")
    ls = WLS(dists)
    stat, tree = ls.trex(a=5, k=5, show_progress=False)
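
The ``a`` and ``k`` arguments trade thoroughness against run time, because the number of distinct unrooted, fully bifurcating trees grows as (2n - 5)!! for n tips. The arithmetic sketch below compares that total with a rough count of the trees scored by the stepwise scheme described above, under the simplifying assumption that every retained tree is extended by attaching the next tip to each of its branches. cogent3's exact bookkeeping may differ, and both function names are hypothetical.

.. code-block:: python

    def num_unrooted_topologies(n):
        """Number of unrooted, fully bifurcating trees on n labelled tips: (2n - 5)!!."""
        if n < 3:
            raise ValueError("need at least 3 tips")
        count = 1
        for m in range(4, n + 1):
            # A tree with m - 1 tips has 2m - 5 branches to which tip m can attach.
            count *= 2 * m - 5
        return count


    def stepwise_candidates(n, a, k):
        """Rough count of trees scored: exhaustive on a tips, then keep k per step."""
        if not 3 <= a <= n:
            raise ValueError("require 3 <= a <= n")
        total = num_unrooted_topologies(a)
        kept = min(k, total)
        for m in range(a + 1, n + 1):
            candidates = kept * (2 * m - 5)
            total += candidates
            kept = min(k, candidates)
        return total


    print(num_unrooted_topologies(10))  # 2027025
    print(stepwise_candidates(10, a=5, k=5))  # 290

Raising ``a`` or ``k`` makes the search more thorough but increases the number of trees that must be scored.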

Other optional arguments to the ``trex`` method include ``return_all``, which controls whether the ``k`` best trees from the final step are returned as a ``ScoredTreeCollection`` object, and ``order``, a series of tip names whose order defines the sequence in which tips are added during tree building (this allows the user to randomise the input order).
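
The least-squares metric compares the observed pairwise distances with the path lengths (patristic distances) implied by a tree, summing the squared differences, optionally weighted per pair. The sketch below evaluates that criterion for a tree whose branch lengths are already given, stored as a plain edge dictionary. It is an illustration under stated assumptions rather than cogent3's ``WLS`` class, which fits branch lengths for each candidate topology and may weight pairs differently; ``patristic_distances`` and ``least_squares_score`` are hypothetical names, and unit weights are assumed when none are supplied.

.. code-block:: python

    def patristic_distances(edges):
        """Path length between every ordered pair of nodes in an unrooted tree.

        edges maps (node1, node2) pairs to branch lengths.
        """
        neighbours = {}
        for (u, v), length in edges.items():
            neighbours.setdefault(u, []).append((v, length))
            neighbours.setdefault(v, []).append((u, length))
        paths = {}
        for start in neighbours:
            # Depth-first walk; in a tree, skipping the parent avoids revisits.
            stack = [(start, None, 0.0)]
            while stack:
                node, parent, total = stack.pop()
                paths[(start, node)] = total
                for nbr, length in neighbours[node]:
                    if nbr != parent:
                        stack.append((nbr, node, total + length))
        return paths


    def least_squares_score(observed, edges, weights=None):
        """Sum of weighted squared differences between observed and tree distances."""
        tree = patristic_distances(edges)
        score = 0.0
        for pair, d_obs in observed.items():
            w = 1.0 if weights is None else weights[pair]  # unit weights by default
            score += w * (d_obs - tree[pair]) ** 2
        return score


    observed = {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }
    # Tips a-e hang off internal nodes u, v and w.
    edges = {
        ("a", "u"): 2.0, ("b", "u"): 3.0, ("u", "v"): 3.0, ("c", "v"): 4.0,
        ("v", "w"): 2.0, ("d", "w"): 2.0, ("e", "w"): 1.0,
    }
    print(least_squares_score(observed, edges))  # 0.0, the tree fits exactly
    worse = {**edges, ("a", "u"): 3.0}  # lengthen one tip branch by 1
    print(least_squares_score(observed, worse))  # 4.0

Lengthening the branch to ``a`` by one unit misfits the four distances involving ``a`` by one unit each, so the score rises from 0 to 4; a tree search prefers topologies whose best-fitting branch lengths give the lowest such score.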

By ML
=====

We illustrate phylogeny reconstruction by maximum likelihood using the F81 substitution model. As above, we use the advanced stepwise addition algorithm to search tree space.

.. jupyter-execute::

    from cogent3 import load_aligned_seqs
    from cogent3.evolve.models import F81
    from cogent3.phylo.maximum_likelihood import ML

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    ml = ML(F81(), aln)
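
``F81()`` supplies the Felsenstein (1981) substitution model whose likelihood ``ML`` maximises over trees. Under F81 the probability that base i becomes base j along a branch of length t has a closed form, P_ij(t) = π_j + (δ_ij - π_j) exp(-βt), where β = 1 / (1 - Σ π_k²) scales the rate so that t is measured in expected substitutions per site. The following is a minimal sketch of that formula for a single branch, not cogent3's implementation; ``f81_transition_matrix`` is a hypothetical name, and base frequencies are assumed to be supplied in the order A, C, G, T.

.. code-block:: python

    import math


    def f81_transition_matrix(freqs, t):
        """F81 substitution probabilities; row i gives Pr(base j after time t | base i)."""
        if abs(sum(freqs) - 1.0) > 1e-9:
            raise ValueError("base frequencies must sum to 1")
        # Normalise the rate so that t is in expected substitutions per site.
        beta = 1.0 / (1.0 - sum(p * p for p in freqs))
        decay = math.exp(-beta * t)
        size = len(freqs)
        return [
            [(decay if i == j else 0.0) + (1.0 - decay) * freqs[j] for j in range(size)]
            for i in range(size)
        ]


    freqs = [0.3, 0.2, 0.2, 0.3]  # A, C, G, T (illustrative values)
    for row in f81_transition_matrix(freqs, t=0.1):
        print([round(p, 4) for p in row])

At t = 0 the matrix is the identity, and as t grows every row converges to the equilibrium frequencies; a likelihood calculation multiplies such per-branch probabilities over the tree for each alignment column.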