File: calculate_pairwise_distances.rst

package info (click to toggle)
python-cogent 1.4.1-1.2
  • links: PTS, VCS
  • area: non-free
  • in suites: squeeze
  • size: 13,260 kB
  • ctags: 20,087
  • sloc: python: 116,163; ansic: 732; makefile: 74; sh: 9
file content (88 lines) | stat: -rw-r--r-- 2,973 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
.. _calculating-pairwise-distances:

Calculate pairwise distances between sequences
==============================================

.. sectionauthor:: Gavin Huttley

An example of how to calculate the pairwise distances for a set of sequences.

.. doctest::

    >>> from cogent import LoadSeqs
    >>> from cogent.phylo import distance

Import a substitution model (or create your own)

.. doctest::

    >>> from cogent.evolve.models import HKY85

Load my alignment

.. doctest::

    >>> al = LoadSeqs("data/long_testseqs.fasta")

Create a pairwise distances object with your alignment and substitution model

.. doctest::

    >>> d = distance.EstimateDistances(al, submodel= HKY85())

Printing ``d`` before execution shows its status.

.. doctest::

    >>> print d
    =========================================================================
    Seq1 \ Seq2       Human    HowlerMon       Mouse    NineBande    DogFaced
    -------------------------------------------------------------------------
          Human           *     Not Done    Not Done     Not Done    Not Done
      HowlerMon    Not Done            *    Not Done     Not Done    Not Done
          Mouse    Not Done     Not Done           *     Not Done    Not Done
      NineBande    Not Done     Not Done    Not Done            *    Not Done
       DogFaced    Not Done     Not Done    Not Done     Not Done           *
    -------------------------------------------------------------------------

Which in this case is to simply indicate nothing has been done.

.. doctest::

    >>> d.run(show_progress=False)
    >>> print d
    =====================================================================
    Seq1 \ Seq2     Human    HowlerMon     Mouse    NineBande    DogFaced
    ---------------------------------------------------------------------
          Human         *       0.0730    0.3363       0.1804      0.1972
      HowlerMon    0.0730            *    0.3487       0.1865      0.2078
          Mouse    0.3363       0.3487         *       0.3813      0.4022
      NineBande    0.1804       0.1865    0.3813            *      0.2019
       DogFaced    0.1972       0.2078    0.4022       0.2019           *
    ---------------------------------------------------------------------

Note that pairwise distances can be distributed for computation across multiple CPU's. In this case, when statistics (like distances) are requested only the master CPU returns data.

We'll write a phylip formatted distance matrix.

.. doctest::

    >>> d.writeToFile('dists_for_phylo.phylip', format="phylip")

We'll also save the distances to file in Python's pickle format.

.. doctest::

    >>> import cPickle
    >>> f = open('dists_for_phylo.pickle', "w")
    >>> cPickle.dump(d.getPairwiseDistances(), f)
    >>> f.close()

.. clean up

.. doctest::
    :hide:
    
    >>> import os
    >>> for file_name in 'dists_for_phylo.phylip', 'dists_for_phylo.pickle':
    ...     os.remove(file_name)