File: _upgma.py

# ----------------------------------------------------------------------------
# Copyright (c) 2013--, scikit-bio development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE.txt, distributed with this software.
# ----------------------------------------------------------------------------

from scipy.cluster.hierarchy import linkage
from skbio.tree import TreeNode
from ._utils import _check_dm


def upgma(dm, weighted=False):
    r"""Perform unweighted pair group method with arithmetic mean (UPGMA) or its
    weighted variant (WPGMA) for phylogenetic reconstruction.

    Parameters
    ----------
    dm : skbio.DistanceMatrix
        The input distance matrix.
    weighted : bool, optional
        If True, perform WPGMA instead of UPGMA. When two clusters are merged, WPGMA
        averages their distances to each other cluster with equal weight, regardless
        of how many taxa each cluster contains. Default is False.

    Returns
    -------
    TreeNode
        The reconstructed tree, with an estimated length assigned to each edge.

    See Also
    --------
    nj

    Notes
    -----
    UPGMA (unweighted pair group method with arithmetic mean) is a simple hierarchical
    clustering method that iteratively groups proximal taxa or taxon groups to form a
    tree structure. A weighted variant is known as WPGMA, and both variants are due to
    Sokal and Michener [1]_.

    This function wraps SciPy's :func:`~scipy.cluster.hierarchy.linkage` function, with
    the ``method`` parameter set as "average" (UPGMA) or "weighted" (WPGMA). It takes a
    scikit-bio DistanceMatrix object and returns a scikit-bio TreeNode object.

    UPGMA creates a rooted and ultrametric tree -- all tips will have the same height
    (distance from the root node).

    References
    ----------
    .. [1] Sokal, R.R., & Michener, C.D. (1958). A statistical method for
       evaluating systematic relationships. University of Kansas science
       bulletin, 38, 1409-1438.

    Examples
    --------
    Define a distance matrix object for the taxa a, b, and c.

    >>> from skbio import DistanceMatrix

    >>> data = [[0, 1, 2],
    ...         [1, 0, 3],
    ...         [2, 3, 0]]
    >>> ids = list('abc')
    >>> dm = DistanceMatrix(data, ids)

    Construct a tree using UPGMA.

    >>> tree = upgma(dm)
    >>> print(tree.ascii_art())
              /-c
    ---------|
             |          /-a
              \--------|
                        \-b

    The tree also has estimated edge values assigned to each edge.

    >>> print(tree)
    (c:1.25,(a:0.5,b:0.5):0.75);
    <BLANKLINE>
    """
    _check_dm(dm)

    # Select the SciPy linkage method: "average" is UPGMA, "weighted" is WPGMA.
    # A truthiness test is used so that falsy values such as 0 also select UPGMA.
    method = "weighted" if weighted else "average"

    # The input is already a condensed distance matrix, so ``metric`` is omitted;
    # SciPy only uses it when given raw observation vectors.
    linkage_matrix = linkage(dm.condensed_form(), method=method)

    # Construct the TreeNode from the linkage matrix
    tree = TreeNode.from_linkage_matrix(linkage_matrix, dm.ids)

    return tree
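

# Illustrative sketch only. It is not part of scikit-bio and is not how SciPy
# implements ``linkage``. It spells out, in pure Python, the averaging rule that
# separates UPGMA (size-weighted mean) from WPGMA (plain mean of the two merged
# clusters). The name ``pgma_sketch`` and its return value are hypothetical,
# chosen for this example, and ties between equally close pairs are broken
# arbitrarily.

```python
from itertools import combinations


def pgma_sketch(dist, ids, weighted=False):
    """Return (newick, merge_heights) for a small symmetric distance matrix."""
    # Each active cluster key maps to (newick_label, height, number_of_tips).
    clusters = {i: (name, 0.0, 1) for i, name in enumerate(ids)}
    # Distances between active clusters, keyed by the pair of cluster keys.
    d = {
        frozenset((i, j)): float(dist[i][j])
        for i, j in combinations(range(len(ids)), 2)
    }
    next_key = len(ids)
    heights = []
    while len(clusters) > 1:
        # Merge the closest pair; the new node sits at half their distance.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        height = d.pop(pair) / 2.0
        (li, hi, ni), (lj, hj, nj) = clusters.pop(i), clusters.pop(j)
        # Update the distance from the merged cluster to each remaining one.
        for k in clusters:
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            if weighted:
                new = (dik + djk) / 2.0  # WPGMA: ignore cluster sizes
            else:
                new = (ni * dik + nj * djk) / (ni + nj)  # UPGMA
            d[frozenset((next_key, k))] = new
        # Edge length is the parent's height minus the child's height.
        label = f"({li}:{height - hi:g},{lj}:{height - hj:g})"
        clusters[next_key] = (label, height, ni + nj)
        heights.append(height)
        next_key += 1
    ((label, _, _),) = clusters.values()
    return label + ";", heights


# The three-taxon matrix from the docstring example above.
newick, heights = pgma_sketch([[0, 1, 2], [1, 0, 3], [2, 3, 0]], "abc")
print(newick)  # (c:1.25,(a:0.5,b:0.5):0.75);

# A four-taxon matrix where the two variants disagree on the root height.
four = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 12], [6, 6, 12, 0]]
print(pgma_sketch(four, "abcd")[1])  # [1.0, 2.0, 4.0]
print(pgma_sketch(four, "abcd", weighted=True)[1])  # [1.0, 2.0, 4.5]
```

# In the four-taxon case the last merge joins a one-tip cluster (c) with a
# two-tip cluster (a, b). UPGMA weights their distances to d by 1 and 2, giving
# (12 + 2 * 6) / 3 = 8, while WPGMA takes (12 + 6) / 2 = 9, so the root heights
# differ (4.0 versus 4.5).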