File: __init__.py

package info (click to toggle)
python-skbio 0.5.1-2
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 16,556 kB
  • ctags: 7,222
  • sloc: python: 42,085; ansic: 670; makefile: 180; sh: 10
file content (195 lines) | stat: -rw-r--r-- 5,879 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
r"""
Tree representations (:mod:`skbio.tree`)
========================================

.. currentmodule:: skbio.tree

This module provides functionality for working with trees, including
phylogenetic trees and hierarchies. Functionality is provided for constructing
trees, for traversing in multiple ways, comparisons, fetching subtrees, and
more. This module supports trees that are multifurcating and nodes that have
single descendants.

Classes
-------

.. autosummary::
   :toctree: generated/

    TreeNode

Phylogenetic Reconstruction
---------------------------

.. autosummary::
   :toctree: generated/

    nj

Utility Functions
-----------------

.. autosummary::
   :toctree: generated/

    majority_rule

Exceptions
----------

.. autosummary::
   :toctree: generated/

   TreeError
   NoLengthError
   DuplicateNodeError
   MissingNodeError
   NoParentError

Examples
--------

>>> from skbio import TreeNode
>>> from io import StringIO

A new tree can be constructed from a Newick string. Newick is a common format
used to represent tree objects within a file. Newick was part of the original
PHYLIP package from Joseph Felsenstein's group (defined `here
<http://goo.gl/fIY1Iq>`_), and is based around representing nesting with
parentheses. For instance, the following string describes a 3 taxon tree, with
one internal node:

    ((A, B)C, D)root;

Where A, B, and D are tips of the tree, and C is an internal node that covers
tips A and B.

Now let's construct a simple tree and dump an ASCII representation:

>>> tree = TreeNode.read(StringIO("((A, B)C, D)root;"))
>>> print(tree.is_root()) # is this the root of the tree?
True
>>> print(tree.is_tip()) # is this node a tip?
False
>>> print(tree.ascii_art())
                    /-A
          /C-------|
-root----|          \-B
         |
          \-D

There are a few common ways to traverse a tree, and depending on your use,
some methods are more appropriate than others. Wikipedia has a well written
page on tree `traversal methods <http://goo.gl/K4Ufl>`_, and will go into
further depth than what we'll cover here. We're only going to cover two of the
commonly used traversals here, preorder and postorder, but we will show
examples of two other common helper traversal methods to gather tips or
internal nodes.

The first traversal we'll cover is a preorder traversal in which you evaluate
from root to tips, looking at the left most child first. For instance:

>>> for node in tree.preorder():
...    print(node.name)
root
C
A
B
D

The next method we'll look at is a postorder traveral which will evaluate the
left subtree tips first before walking back up the tree:

>>> for node in tree.postorder():
...    print(node.name)
A
B
C
D
root

`TreeNode` provides two helper methods as well for iterating over just the tips
or for iterating over just the internal nodes.

>>> for node in tree.tips():
...    print("Node name: %s, Is a tip: %s" % (node.name, node.is_tip()))
Node name: A, Is a tip: True
Node name: B, Is a tip: True
Node name: D, Is a tip: True

>>> for node in tree.non_tips():
...    print("Node name: %s, Is a tip: %s" % (node.name, node.is_tip()))
Node name: C, Is a tip: False

Note, by default, `non_tips` will ignore `self` (which is the root in this
case).  You can pass the `include_self` flag to `non_tips` if you wish to
include `self`.

The `TreeNode` provides a few ways to compare trees. First, let's create two
similar trees and compare their topologies using `compare_subsets`. This
distance is the fraction of common clades present in the two trees, where a
distance of 0 means the trees contain identical clades, and a distance of 1
indicates the trees do not share any common clades:

>>> tree1 = TreeNode.read(StringIO("((A, B)C, (D, E)F, (G, H)I)root;"))
>>> tree2 = TreeNode.read(StringIO("((G, H)C, (D, E)F, (B, A)I)root;"))
>>> tree3 = TreeNode.read(StringIO("((D, B)C, (A, E)F, (G, H)I)root;"))
>>> print(tree1.compare_subsets(tree1))  # identity case
0.0
>>> print(tree1.compare_subsets(tree2))  # same tree but different clade order
0.0
>>> print(tree1.compare_subsets(tree3))  # only 1 of 3 common subsets
0.6666666666666667

We can additionally take into account branch length when computing distances
between trees. First, we're going to construct two new trees with described
branch length, note the difference in the Newick strings:

>>> tree1 = \
...     TreeNode.read(StringIO("((A:0.1, B:0.2)C:0.3, D:0.4, E:0.5)root;"))
>>> tree2 = \
...     TreeNode.read(StringIO("((A:0.4, B:0.8)C:0.3, D:0.1, E:0.5)root;"))

In these two trees, we've added on a description of length from the node to
its parent, so for instance:

>>> for node in tree1.postorder():
...     print(node.name, node.length)
A 0.1
B 0.2
C 0.3
D 0.4
E 0.5
root None

Now let's compare two trees using the distances computed pairwise between tips
in the trees. The distance computed, by default, is the correlation of all
pairwise tip-to-tip distances between trees:

>>> print(tree1.compare_tip_distances(tree1))  # identity case
0.0
>>> print(tree1.compare_tip_distances(tree2))
0.120492524415

"""

# ----------------------------------------------------------------------------
# Copyright (c) 2013--, scikit-bio development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file COPYING.txt, distributed with this software.
# ----------------------------------------------------------------------------

from skbio.util import TestRunner

from ._tree import TreeNode
from ._nj import nj
from ._majority_rule import majority_rule
from ._exception import (TreeError, NoLengthError, DuplicateNodeError,
                         MissingNodeError, NoParentError)

__all__ = ['TreeNode', 'nj', 'majority_rule', 'TreeError', 'NoLengthError',
           'DuplicateNodeError', 'MissingNodeError', 'NoParentError']

test = TestRunner(__file__).test