% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bk_method.R
\name{FM_index_R}
\alias{FM_index_R}
\title{Calculating the Fowlkes-Mallows Index in R}
\usage{
FM_index_R(A1_clusters, A2_clusters, assume_sorted_vectors = FALSE,
  warn = dendextend_options("warn"), ...)
}
\arguments{
\item{A1_clusters}{a numeric vector of cluster assignments for the items of
group A1, with a names attribute giving the item name for each element.
Such vectors are often obtained by cutting a dendrogram into k clusters
(e.g., with \code{cutree}).}

\item{A2_clusters}{a numeric vector of cluster assignments for the items of
group A2, with a names attribute giving the item name for each element.
Such vectors are often obtained by cutting a dendrogram into k clusters
(e.g., with \code{cutree}).}

\item{assume_sorted_vectors}{logical (FALSE). Can the two cluster vectors
be assumed to be sorted, so that their items appear in the same order?
If FALSE (default), the vectors are sorted by their names attribute.}

\item{warn}{logical (default from \code{dendextend_options("warn")} is FALSE).
Should warnings be issued? It is safer to set this to TRUE,
but the default is FALSE to keep the noise down.}

\item{...}{Ignored.}
}
\value{
The Fowlkes-Mallows index between two vectors of cluster assignments.

The returned value carries the attributes E_FM and V_FM, holding its expected
value and variance under the null hypothesis of no relation.
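
As a sketch of where an expectation of this kind comes from, assume a
permutation null in which item labels are shuffled at random while the cluster
sizes of both clusterings stay fixed. Let \eqn{n} be the number of items,
\eqn{m_{ij}}{m_ij} the number of items in cluster \eqn{i} of the first
clustering and cluster \eqn{j} of the second, and
\eqn{B_k = T_k / \sqrt{P_k Q_k}}{B_k = T_k / sqrt(P_k * Q_k)} with
\eqn{T_k = \sum_i \sum_j m_{ij}^2 - n}{T_k = sum_ij m_ij^2 - n},
\eqn{P_k = \sum_i m_{i\cdot}^2 - n}{P_k = sum_i m_i.^2 - n} and
\eqn{Q_k = \sum_j m_{\cdot j}^2 - n}{Q_k = sum_j m_.j^2 - n}.
Under that null, two items grouped together by the first clustering are grouped
together by the second with probability \eqn{Q_k / (n(n-1))}, so
\eqn{E(T_k) = P_k Q_k / (n(n-1))}{E(T_k) = P_k * Q_k / (n * (n - 1))} and
\deqn{E(B_k) = \frac{\sqrt{P_k Q_k}}{n(n-1)}}{E(B_k) = sqrt(P_k * Q_k) / (n * (n - 1))}
E_FM should agree with this value if it is computed under the same null model;
the variance has a longer closed form (see the reference by Fowlkes and Mallows).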
}
\description{
Calculates the Fowlkes-Mallows index between two clusterings.

Unlike the \code{\link{FM_index_profdpm}} function, \code{FM_index_R}
also calculates the expected value and variance of the FM index
under the null hypothesis of no relation.
}
\details{
From Wikipedia:

The Fowlkes-Mallows index (see references) is an external evaluation method
used to measure the similarity between two clusterings (the clusters obtained
from a clustering algorithm). The comparison can be between two hierarchical
clusterings or between a clustering and a benchmark classification. A higher
value of the Fowlkes-Mallows index indicates greater similarity between the
clusters and the benchmark classifications.
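
The index can be sketched in terms of the matching matrix. For two clusterings
of the same \eqn{n} items, let \eqn{m_{ij}}{m_ij} be the number of items placed
in cluster \eqn{i} by the first clustering and in cluster \eqn{j} by the
second, with row and column totals \eqn{m_{i\cdot}}{m_i.} and
\eqn{m_{\cdot j}}{m_.j}. In the notation of Fowlkes and Mallows (1983), the
index for a cut into \eqn{k} clusters is
\deqn{B_k = \frac{T_k}{\sqrt{P_k Q_k}}}{B_k = T_k / sqrt(P_k * Q_k)}
with \eqn{T_k = \sum_i \sum_j m_{ij}^2 - n}{T_k = sum_ij m_ij^2 - n},
\eqn{P_k = \sum_i m_{i\cdot}^2 - n}{P_k = sum_i m_i.^2 - n} and
\eqn{Q_k = \sum_j m_{\cdot j}^2 - n}{Q_k = sum_j m_.j^2 - n}.
Because \eqn{T_k} is twice the number of item pairs grouped together by both
clusterings, and \eqn{P_k} and \eqn{Q_k} are twice the numbers of pairs grouped
together by the first and by the second clustering, this equals
\eqn{TP / \sqrt{(TP + FP)(TP + FN)}}{TP / sqrt((TP + FP) * (TP + FN))}, where
TP counts pairs grouped together by both clusterings, FP pairs grouped together
only by the first, and FN pairs grouped together only by the second.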
}
\examples{

\dontrun{

set.seed(23235)
ss <- TRUE # use all rows; alternatively a subset such as sample(1:150, 10)
hc1 <- hclust(dist(iris[ss,-5]), "com")
hc2 <- hclust(dist(iris[ss,-5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
#    cutree(dend1)   

FM_index_R(cutree(hc1, k = 3), cutree(hc1, k = 3)) # 1
set.seed(1341)
FM_index_R(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)), assume_sorted_vectors = TRUE) # 0.38037
FM_index_R(cutree(hc1, k = 3), sample(cutree(hc1, k = 3)), assume_sorted_vectors = FALSE) # 1 again: sorting by names undoes the shuffle
FM_index_R(cutree(hc1, k = 3), cutree(hc2, k = 3)) # 0.8059
FM_index_R(cutree(hc1, k = 30), cutree(hc2, k = 30)) # 0.4529

fo <- function(k) FM_index_R(cutree(hc1, k), cutree(hc2, k)) 
lapply(1:4, fo)
ks <- 1:150
plot(sapply(ks, fo)~ ks, type = "b", main = "Bk plot for the iris dataset")

clu_1 <- cutree(hc2, k = 100) # misleading: k = 100 is NOT well defined for this tree, yet a clustering is returned
clu_2 <- cutree(as.dendrogram(hc2), k = 100) # the dendrogram version returns a vector of NAs instead

FM_index_R(clu_1, clu_2) # NA
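
# A from-scratch sketch of the index through the matching matrix, useful as a
# cross-check. fm_by_hand() is a hypothetical helper written for this example
# (it is not part of dendextend); it assumes both vectors are already in the
# same item order and computes B_k = T_k / sqrt(P_k * Q_k) (Fowlkes and
# Mallows, 1983).
fm_by_hand <- function(a, b) {
  m <- table(a, b)            # matching matrix: m[i, j] = items in cluster i of a and j of b
  n <- length(a)
  Tk <- sum(m^2) - n          # twice the number of pairs grouped together by both
  Pk <- sum(rowSums(m)^2) - n # twice the number of pairs grouped together by a
  Qk <- sum(colSums(m)^2) - n # twice the number of pairs grouped together by b
  Tk / sqrt(Pk * Qk)
}
fm_by_hand(c(1, 1, 2, 2), c(1, 1, 2, 2)) # 1 (identical clusterings)
fm_by_hand(c(1, 1, 1, 2), c(1, 1, 2, 2)) # 1 / sqrt(6), about 0.408
fm_by_hand(c(1, 2, 1, 2), c(1, 1, 2, 2)) # 0 (no pair grouped together by both)
# With the trees above, fm_by_hand(cutree(hc1, k = 3), cutree(hc2, k = 3))
# can be compared with FM_index_R on the same input.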

}
}
\references{
Fowlkes, E. B. and Mallows, C. L. (1983).
"A Method for Comparing Two Hierarchical Clusterings".
\emph{Journal of the American Statistical Association}, 78(383), 553-569.

\url{http://en.wikipedia.org/wiki/Fowlkes-Mallows_index}
}
\seealso{
\code{\link{cor_bakers_gamma}}, \code{\link{FM_index_profdpm}}
}