File: clusterGraph.Rnw

package info (click to toggle)
r-bioc-graph 1.60.0-1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 2,076 kB
  • sloc: ansic: 842; makefile: 12; sh: 3
file content (130 lines) | stat: -rw-r--r-- 3,136 bytes parent folder | download | duplicates (8)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
%
% NOTE -- ONLY EDIT clusterGraph.Rnw!!!
% clusterGraph.tex file will get overwritten.
%
%\VignetteIndexEntry{clusterGraph and distGraph}
%\VignetteDepends{graph, stats}
%\VignetteKeywords{Graph, clustering, machine learning}
%\VignettePackage{graph}
\documentclass{article}

\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}


\begin{document}
\title{How To use the clusterGraph and distGraph classes}

\maketitle

\section*{Introduction}

Graphs can be used to help explore different clustering
algorithms. While they have not been extensively used in this regard
that is probably due to the lack of software rather than for any other
reasons. As we demonstrate below, one can easily and naturally compare
different aspects of clustering algorithms using these tools.


\section*{clusterGraph}

A \textit{clusterGraph} is a graph defined on a set of nodes that have
been clustered or grouped in some fashion.
The grouping must form a partition of the nodes.
In this graph all nodes within the same cluster are adjacent while
there are no edges between clusters.

Thus, each cluster is a complete graph but there are no between
cluster edges.

<<clustering>>=
library("graph")
library("cluster")
data(ruspini)

pm <- pam(ruspini, 4)

cG <- new("clusterGraph",
     clusters = split(names(pm$clustering), pm$clustering))


nodes(cG)
@

We now have a graph that we could perform various operations on.
For example, we could try a second clustering algorithm on the same
data and see if the two largely agree.

<<kmeans>>=
library(stats)

km = kmeans(ruspini, 4)
cG.km = new("clusterGraph",
      clusters=split(as.character(1:75), km$cluster))

inBoth = intersection(cG.km, cG)
@

The graph \Robject{inBoth} is of length \Sexpr{length(inBoth)}
indicating that there are that many distinct groups.
One could, compute various measures of correspondence between the two
clustering algorithms using the graph representation.


\section*{distGraph}

We use this same data to consider some potential uses for the
\Rclass{distGraph} class. Others have considered a similar structure
for exploring clustering algorithms.

%%FIXME: track down the Butte et al and the Shamir references

<<>>=

d1 = dist(ruspini)
dG = new("distGraph", Dist=d1)

rl = NULL
j=1
for(i in c(40, 30, 10, 5) ){
  nG = threshold(dG, i)
  rl[[j]] = connComp(nG)
  j=j+1
}
@

We can then examine the components of \Robject{rl} to see how the
graph is being reduced.
<<howmany>>=

sapply(rl, length)

@

<<somecomps, echo=FALSE, results = hide>>=

 dr = range(d1)
 rl.lens = sapply(rl[[4]], length)

@
We see that when we remove all distances that are bigger than 5 units
(the range of distances was from \Sexpr{round(dr[1], 3)}
to \Sexpr{round(dr[2],3)}) there are still only \Sexpr{length(rl[[4]])}
connected components - one of which is of size
\Sexpr{max(rl.lens)}.

\end{document}