% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bk_method.R
\name{Bk}
\alias{Bk}
\title{Bk - Calculating Fowlkes-Mallows Index for two dendrograms}
\usage{
Bk(tree1, tree2, k, warn = dendextend_options("warn"), ...)
}
\arguments{
\item{tree1}{a dendrogram/hclust/phylo object.}

\item{tree2}{a dendrogram/hclust/phylo object.}

\item{k}{an integer scalar or vector with the desired number
of clusters.
If missing, Bk is calculated for the default k range of
2:(nleaves-1).
There is no point in checking k=1 or k=n, since both give Bk=1.}

\item{warn}{logical (default from dendextend_options("warn") is FALSE).
Whether warnings are to be issued. It is safer to keep this at TRUE,
but to keep the noise down, the default is FALSE.}

\item{...}{Ignored (passed to FM_index_R).}
}
\value{
A list (of the same length as k) of Fowlkes-Mallows index values between
the two dendrograms, one for each value of k.
The name of each list item is the k for which it was calculated.
}
\description{
Bk calculates the Fowlkes-Mallows index for a series of k cuts
of two dendrograms.
}
\details{
From Wikipedia:

Fowlkes-Mallows index (see references) is an external evaluation method
that is used to determine the similarity between two clusterings
(clusters obtained after a clustering algorithm). This measure of similarity
could be either between two hierarchical clusterings or a clustering and
a benchmark classification. A higher value of the Fowlkes-Mallows index
indicates a greater similarity between the clusters and the benchmark
classifications.
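
As a sketch of the underlying computation for a single k (the symbols
TP, FP and FN are introduced here for illustration and are not names used
in this package's code): let TP be the number of pairs of items placed in
the same cluster in both trees, FP the number of pairs placed together
in tree1 only, and FN the number of pairs placed together in tree2 only.
The index can then be written as

\deqn{B_k = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}}{Bk = TP / sqrt((TP + FP) * (TP + FN))}

Under this formulation the index lies between 0 and 1, and equals 1 when
the two clusterings agree on every pair of items.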
}
\examples{

\dontrun{

set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "complete")
hc2 <- hclust(dist(iris[ss, -5]), "single")
tree1 <- as.dendrogram(hc1)
tree2 <- as.dendrogram(hc2)
#    cutree(tree1)

Bk(hc1, hc2, k = 3)
Bk(hc1, hc2, k = 2:10)
Bk(hc1, hc2)

Bk(tree1, tree2, k = 3)
Bk(tree1, tree2, k = 2:5)

system.time(Bk(hc1, hc2, k = 2:5)) # 0.01
system.time(Bk(hc1, hc2)) # 1.28
system.time(Bk(tree1, tree2, k = 2:5)) # 0.24 # after fixes.
system.time(Bk(tree1, tree2, k = 2:10)) # 0.31 # after fixes.
system.time(Bk(tree1, tree2)) # 7.85
Bk(tree1, tree2, k = 99:101)

y <- Bk(hc1, hc2, k = 2:10)
plot(unlist(y) ~ c(2:10), type = "b", ylim = c(0, 1))

# can take a few seconds
y <- Bk(hc1, hc2)
plot(unlist(y) ~ as.numeric(names(y)),
  main = "Bk plot", pch = 20,
  xlab = "k", ylab = "FM Index",
  type = "b", ylim = c(0, 1)
)
# Hypothesis testing is not covered here;
# see the Bk_plot function for that.
}
}
\references{
Fowlkes, E. B.; Mallows, C. L. (1 September 1983).
"A Method for Comparing Two Hierarchical Clusterings".
Journal of the American Statistical Association 78 (383): 553.

\url{https://en.wikipedia.org/wiki/Fowlkes-Mallows_index}
}
\seealso{
\link{FM_index}, \link{cor_bakers_gamma}, \link{Bk_plot}
}