% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cor_cophenetic.R
\name{cor_cophenetic}
\alias{cor_cophenetic}
\alias{cor_cophenetic.default}
\alias{cor_cophenetic.dendlist}
\title{Cophenetic correlation between two trees}
\usage{
cor_cophenetic(dend1, ...)

\method{cor_cophenetic}{default}(
  dend1,
  dend2,
  method_coef = c("pearson", "kendall", "spearman"),
  ...
)

\method{cor_cophenetic}{dendlist}(
  dend1,
  which = c(1L, 2L),
  method_coef = c("pearson", "kendall", "spearman"),
  ...
)
}
\arguments{
\item{dend1}{a tree (dendrogram/hclust/phylo, or dendlist)}

\item{...}{Ignored.}

\item{dend2}{Either a tree (dendrogram/hclust/phylo), or a \link{dist} object (for example, from the original data matrix).}

\item{method_coef}{a character string indicating which correlation coefficient
is to be computed. One of "pearson" (default), "kendall", or "spearman";
it can be abbreviated. Passed to \link{cor}.}

\item{which}{an integer vector of length 2, indicating
which two trees in a dendlist object should have
their cophenetic correlation calculated.}
}
\value{
The correlation between the cophenetic distance matrices of the two trees.
}
\description{
Cophenetic correlation coefficient for two trees.

Assumes the labels in the two trees fully match. If they do not,
first use \link{intersect_trees} to match them.
}
\details{
From \link{cophenetic}:
The cophenetic distance between two observations that have been clustered
is defined to be the intergroup dissimilarity at which the two observations
are first combined into a single cluster. Note that this distance has many
ties and restrictions.

cor_cophenetic calculates the correlation between the cophenetic distance
matrices of the two trees.
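
As a sketch for the default "pearson" coefficient, write \eqn{x_{ij}}{x_ij}
and \eqn{y_{ij}}{y_ij} for the cophenetic distances between observations i and j
in the first and second tree, and \eqn{\bar{x}}{mean(x)}, \eqn{\bar{y}}{mean(y)}
for their means over all pairs (this notation is introduced here for
illustration only). The value is then the ordinary Pearson correlation taken
over all pairs i < j:
\deqn{r = \frac{\sum_{i<j} (x_{ij} - \bar{x})(y_{ij} - \bar{y})}{\sqrt{\sum_{i<j} (x_{ij} - \bar{x})^2 \sum_{i<j} (y_{ij} - \bar{y})^2}}}{r = sum((x_ij - mean(x)) * (y_ij - mean(y))) / sqrt(sum((x_ij - mean(x))^2) * sum((y_ij - mean(y))^2))}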

The value ranges from -1 to 1, with values near 0 meaning that
the two trees are not statistically similar.
For an exact p-value one should resort to a permutation test. One such option
is to permute the labels of one tree many times and calculate
the distribution under the null hypothesis (keeping the tree topologies
constant).

Notice that this measure IS affected by the height of a branch.
}
\examples{

\dontrun{

set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- iris[ss, -5] \%>\%
  dist() \%>\%
  hclust("com")
hc2 <- iris[ss, -5] \%>\%
  dist() \%>\%
  hclust("single")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
#    cutree(dend1)

cophenetic(hc1)
cophenetic(hc2)
# notice how the dist matrices for the dendrograms have different orders:
cophenetic(dend1)
cophenetic(dend2)

cor(cophenetic(hc1), cophenetic(hc2)) # 0.874
cor(cophenetic(dend1), cophenetic(dend2)) # 0.16
# the difference is because the order of the distance table in the case of
# stats:::cophenetic.dendrogram will change between dendrograms!

# however, this is consistent (since I force-sort the rows/columns):
cor_cophenetic(hc1, hc2)
cor_cophenetic(dend1, dend2)
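
# dend2 may also be a dist object (see Arguments). A minimal sketch, assuming
# we want to compare hc1 with the distances it was built from; d_orig is an
# illustrative name, not part of dendextend:
d_orig <- dist(iris[ss, -5])
cor_cophenetic(hc1, d_orig)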

cor_cophenetic(dendlist(dend1, dend2))
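
# a sketch of the which argument: choose which two trees of a longer
# dendlist are compared (here the 1st and 3rd, i.e. dend1 against a copy
# of itself); dend_list3 is an illustrative name:
dend_list3 <- dendlist(dend1, dend2, dend1)
cor_cophenetic(dend_list3, which = c(1, 3))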

# we can also use different cor methods (almost the same result though):
cor_cophenetic(hc1, hc2, method_coef = "spearman") # 0.8456014
cor_cophenetic(dend1, dend2, method_coef = "spearman")
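
# A minimal sketch of the permutation test suggested in Details, assuming that
# shuffling the labels of one tree (topology and heights kept fixed) is an
# acceptable null model. n_perm, obs_cor, null_cors and p_value are
# illustrative names, not part of dendextend.
set.seed(1)
n_perm <- 100
obs_cor <- cor_cophenetic(dend1, dend2)
null_cors <- replicate(n_perm, {
  dend_perm <- dend2
  # reassign the same labels in a random order
  labels(dend_perm) <- sample(labels(dend2))
  cor_cophenetic(dend1, dend_perm)
})
# one-sided p-value, with +1 so that the estimate is never exactly zero
p_value <- (sum(null_cors >= obs_cor) + 1) / (n_perm + 1)
p_value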


# cophenetic correlation is about 10 times (!) faster than bakers_gamma cor:
library(microbenchmark)
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)

# but only because of the cutree for dendrogram. When allowing hclust cutree
# it is only about twice as fast:
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = TRUE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)
}

}
\references{
Sokal, R. R. and F. J. Rohlf. 1962. The comparison of dendrograms by
objective methods. Taxon, 11:33-40

Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy: The Principles
and Practice of Numerical Classification, p. 278 ff; Freeman, San Francisco.

\url{https://en.wikipedia.org/wiki/Cophenetic_correlation}
}
\seealso{
\link{cophenetic}, \link{cor_bakers_gamma}
}