Run a Fast Unifrac community analysis
=====================================

.. sectionauthor:: Justin Kuczynski

Below is a simple example of using the ``fast_unifrac`` function.

First, we import the tools we need:

.. doctest::

    >>> from cogent.parse.tree import DndParser
    >>> from cogent.maths.unifrac.fast_unifrac import fast_unifrac
    >>> from cogent.maths.unifrac.fast_tree import UniFracTreeNode

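Next, we build a small example tree whose tips B, C, and D represent the
relationships among species B, C, and D. Passing ``UniFracTreeNode`` to
``DndParser`` makes the parser build the tree from that node class: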

.. doctest::

    >>> tree_str = "(B:0.2,(C:0.3,D:0.4)E:0.6)F;"
    >>> tr = DndParser(tree_str, UniFracTreeNode)
    >>> print tr.asciiArt() # doctest: +SKIP
              /-B
    -F-------|
             |          /-C
              \E-------|
                        \-D

Here is what the sample (rows) by sequence (columns) abundance matrix looks
like, with rows in the order sample1, sample2, sample3 and columns in the
order B, C, D::

    [10, 11, 0]
    [ 2,  0, 9]
    [ 2,  2, 2]

Here is the same data in the dict format that ``fast_unifrac`` expects, keyed
first by tip name and then by sample name:

.. doctest::

    >>> envs = {'B': {'sample1': 10, 'sample2': 2, 'sample3': 2},
    ...         'C': {'sample1': 11, 'sample2': 0, 'sample3': 2},
    ...         'D': {'sample1': 0, 'sample2': 9, 'sample3': 2}}
    
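When the abundances start out as a matrix like the one shown earlier, the nested dict can be built programmatically instead of typed by hand. The following is a minimal sketch in plain Python; ``matrix_to_envs`` and the variable names are hypothetical helpers for illustration and are not part of PyCogent.

```python
# Hypothetical helper (not part of PyCogent): turn a sample-by-sequence
# abundance matrix into the nested {tip: {sample: count}} dict.
def matrix_to_envs(matrix, sample_names, seq_names):
    envs = {}
    for sample, row in zip(sample_names, matrix):
        for seq, count in zip(seq_names, row):
            # Create the inner dict on first use, then record the count.
            envs.setdefault(seq, {})[sample] = count
    return envs

abundance = [[10, 11, 0],
             [2, 0, 9],
             [2, 2, 2]]
samples = ['sample1', 'sample2', 'sample3']   # row order
seqs = ['B', 'C', 'D']                        # column order; must match the tip names

envs_from_matrix = matrix_to_envs(abundance, samples, seqs)
```

Zero counts are kept in the result, mirroring the hand-written dict above.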

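Now we run UniFrac. The ``distance_matrix`` entry of the result is a tuple
holding the pairwise distance array and the list of sample names that label
its rows and columns::

    >>> res = fast_unifrac(tr, envs)
    >>> print res['distance_matrix'] # doctest: +SKIP
    (array([[ 0.        ,  0.46666667,  0.26666667],
           [ 0.46666667,  0.        ,  0.2       ],
           [ 0.26666667,  0.2       ,  0.        ]]),
           ['sample1', 'sample2', 'sample3'])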
    
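As a sanity check, the distances above can be reproduced by hand. The sketch below assumes the metric in use is unweighted UniFrac, that is, the fraction of observed branch length that leads to tips found in only one of the two samples. That assumption is consistent with the printed values, but the branch table and helper names are illustrative and do not come from PyCogent.

```python
# Each branch is (length, set of tips below it), read off the tree
# "(B:0.2,(C:0.3,D:0.4)E:0.6)F;". The root F has no branch length.
branches = [
    (0.2, {'B'}),
    (0.3, {'C'}),
    (0.4, {'D'}),
    (0.6, {'C', 'D'}),   # internal branch leading to node E
]

envs = {'B': {'sample1': 10, 'sample2': 2, 'sample3': 2},
        'C': {'sample1': 11, 'sample2': 0, 'sample3': 2},
        'D': {'sample1': 0, 'sample2': 9, 'sample3': 2}}

def tips_present(envs, sample):
    """Return the set of tips with a non-zero count in the given sample."""
    return {tip for tip, counts in envs.items() if counts.get(sample, 0) > 0}

def unweighted_unifrac(branches, tips_a, tips_b):
    """Branch length unique to one sample divided by total observed length."""
    unique = 0.0
    total = 0.0
    for length, tips in branches:
        in_a = bool(tips & tips_a)
        in_b = bool(tips & tips_b)
        if in_a or in_b:
            total += length
        if in_a != in_b:
            unique += length
    return unique / total if total else 0.0

samples = ['sample1', 'sample2', 'sample3']
present = {s: tips_present(envs, s) for s in samples}
dist = [[unweighted_unifrac(branches, present[a], present[b])
         for b in samples] for a in samples]
```

For example, sample1 contains tips B and C while sample2 contains B and D, so the branches to C (0.3) and D (0.4) are each unique to one sample. Out of 1.5 units of observed branch length this gives 0.7 / 1.5, or about 0.4667.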

The PCoA (principal coordinates analysis) results are misleading for such a
small dataset, but the distance matrix is accurate.