File: calc_genetic_distance.rst

package info (click to toggle)
python-cogent 2024.5.7a1%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 74,600 kB
  • sloc: python: 92,479; makefile: 117; sh: 16
file content (120 lines) | stat: -rw-r--r-- 3,736 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
.. jupyter-execute::
    :hide-code:

    import set_working_directory

****************************
Genetic distance calculation
****************************

Fast pairwise distance estimation
=================================

For a limited number of evolutionary models a fast implementation is
available.

.. jupyter-execute::

    from cogent3 import available_distances

    available_distances()

Computing genetic distances using the ``Alignment`` object
==========================================================

Abbreviations listed from ``available_distances()`` can be used as values for the ``distance_matrix(calc=<abbreviation>)``.

.. jupyter-execute::

    from cogent3 import load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    dists = aln.distance_matrix(calc="tn93", show_progress=False)
    dists

Using the distance calculator directly
======================================

.. jupyter-execute::

    from cogent3 import get_distance_calculator, load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta")
    dist_calc = get_distance_calculator("tn93", alignment=aln)
    dist_calc

.. jupyter-execute::

    dist_calc.run(show_progress=False)
    dists = dist_calc.get_pairwise_distances()
    dists

The distance calculation object can provide more information. For instance, the standard errors.

.. jupyter-execute::

    dist_calc.stderr

Likelihood based pairwise distance estimation
=============================================

The standard ``cogent3`` likelihood function can also be used to estimate distances. Because these require numerical optimisation they can be significantly slower than the fast estimation approach above.

The following will use the F81 nucleotide substitution model and perform numerical optimisation.

.. jupyter-execute::

    from cogent3 import get_model, load_aligned_seqs
    from cogent3.evolve import distance

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    d = distance.EstimateDistances(aln, submodel=get_model("F81"))
    d.run(show_progress=False)
    dists = d.get_pairwise_distances()
    dists

*****************************************************
Get the names of sequences with max pairwise distance
*****************************************************

Given a ``DistanceMatrix`` object, finding the sequences that have the maximum pairwise distance is achieved through the ``max_pair`` method. 

.. jupyter-execute::
    :raises:
    
    from cogent3 import load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    dists = aln.distance_matrix(calc="tn93", show_progress=False)
    dists.max_pair()
    
To find the maximum distance, index the ``DistanceMatrix`` with the result of ``max_pair``.

.. jupyter-execute::
    :raises:
    
    dists[dists.max_pair()]

*****************************************************
Get the names of sequences with min pairwise distance
*****************************************************

Given a ``DistanceMatrix`` object, finding the sequences that have the minimum pairwise distance is achieved through the ``min_pair`` method. 

.. note:: As the distance between a sequence and itself is zero, and this is not informative, ``min_pair`` will return the smallest distance not on the diagonal.

.. jupyter-execute::
    :raises:
    
    from cogent3 import load_aligned_seqs

    aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
    dists = aln.distance_matrix(calc="tn93", show_progress=False)
    dists.min_pair()

To find the minimum distance, index the ``DistanceMatrix`` with the result of ``min_pair``.

.. jupyter-execute::
    :raises:

    dists[dists.min_pair()]