File: specc.Rd

package info (click to toggle)
r-cran-kernlab 0.9-33-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,216 kB
  • sloc: cpp: 6,011; ansic: 935; makefile: 2
file content (153 lines) | stat: -rw-r--r-- 6,283 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
\name{specc}
\alias{specc}
\alias{specc,matrix-method}
\alias{specc,formula-method}
\alias{specc,list-method}
\alias{specc,kernelMatrix-method}
\alias{show,specc-method}
\title{Spectral Clustering}
\description{
A spectral clustering algorithm. Clustering is performed by
embedding the data into the subspace of the eigenvectors 
of an affinity matrix.
}
\usage{
\S4method{specc}{formula}(x, data = NULL, na.action = na.omit, ...)

\S4method{specc}{matrix}(x, centers,
      kernel = "rbfdot", kpar = "automatic", 
      nystrom.red = FALSE, nystrom.sample = dim(x)[1]/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)

\S4method{specc}{kernelMatrix}(x, centers, nystrom.red = FALSE, iterations = 200, ...)

\S4method{specc}{list}(x, centers,
      kernel = "stringdot", kpar = list(length=4, lambda=0.5),
      nystrom.red = FALSE, nystrom.sample = length(x)/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)
}

\arguments{
 \item{x}{the matrix of data to be clustered, or a symbolic
    description of the model to be fit, or a kernel Matrix of class
    \code{kernelMatrix}, or a list of character vectors.}

  \item{data}{an optional data frame containing the variables in the model.
    By default the variables are taken from the environment which
    `specc' is called from.}

\item{centers}{Either the number of clusters or a set of initial cluster
    centers. If the first, a random set of rows in the eigenvectors
    matrix are chosen as the initial centers.}

\item{kernel}{the kernel function used in computing the affinity matrix. 
    This parameter can be set to any function, of class kernel, which computes a dot product between two
    vector arguments. kernlab provides the most popular kernel functions
    which can be used by setting the kernel parameter to the following
    strings:
    \itemize{
      \item \code{rbfdot} Radial Basis kernel function "Gaussian"
      \item \code{polydot} Polynomial kernel function
      \item \code{vanilladot} Linear kernel function
      \item \code{tanhdot} Hyperbolic tangent kernel function
      \item \code{laplacedot} Laplacian kernel function
      \item \code{besseldot} Bessel kernel function
      \item \code{anovadot} ANOVA RBF kernel function
      \item \code{splinedot} Spline kernel 
      \item \code{stringdot} String kernel
    }
    The kernel parameter can also be set to a user defined function of
    class kernel by passing the function name as an argument.
  }

  \item{kpar}{a character string or the list of hyper-parameters (kernel parameters).
    The default character string \code{"automatic"} uses a heuristic to determine a
    suitable value for the width parameter of the RBF kernel.
    The second option \code{"local"} (local scaling) uses a more advanced heuristic
     and sets a width parameter for every point in the data set. This is
    particularly useful when the data incorporates multiple scales.
    A list can also be used containing the parameters to be used with the
    kernel function. Valid parameters for existing kernels are :
    \itemize{
      \item \code{sigma} inverse kernel width for the Radial Basis
      kernel function "rbfdot" and the Laplacian kernel "laplacedot".
      \item \code{degree, scale, offset} for the Polynomial kernel "polydot"
      \item \code{scale, offset} for the Hyperbolic tangent kernel
      function "tanhdot"
      \item \code{sigma, order, degree} for the Bessel kernel "besseldot". 
      \item \code{sigma, degree} for the ANOVA kernel "anovadot".
      \item \code{length, lambda, normalized} for the "stringdot" kernel
       where length is the length of the strings considered, lambda the
       decay factor and normalized a logical parameter determining if the
       kernel evaluations should be normalized.
    }
    
    Hyper-parameters for user defined kernels can be passed through the
    kpar parameter as well.}

  \item{nystrom.red}{use nystrom method to calculate eigenvectors. When
    \code{TRUE} a sample of the dataset is used to calculate the
    eigenvalues, thus only a \eqn{n x m} matrix where \eqn{n} the sample size
    is stored in memory (default: \code{FALSE}}

  \item{nystrom.sample}{number of data points to use for estimating the
    eigenvalues when using the nystrom method. (default : dim(x)[1]/6)}

  \item{mod.sample}{proportion of data to use when estimating sigma (default: 0.75)}	

  \item{iterations}{the maximum number of iterations allowed. }

  \item{na.action}{the action to perform on NA}

  \item{\dots}{additional parameters}
    
}
\details{
  Spectral clustering  works by embedding the data points of the
  partitioning problem into the
  subspace of the \eqn{k} largest eigenvectors of a normalized affinity/kernel matrix.
Using a simple clustering method like \code{kmeans} on the embedded points usually
leads to good performance. It can be shown that spectral clustering methods boil down to 
 graph partitioning.\cr
The data can be passed to the \code{specc} function in a \code{matrix} or a
\code{data.frame}, in addition \code{specc} also supports input in the form of a
kernel matrix of class \code{kernelMatrix} or as a list of character
vectors where a string kernel has to be used.}
\value{
 An S4 object of class \code{specc} which extends the class \code{vector}
 containing integers indicating the cluster to which
 each point is allocated. The following slots contain useful information
 
  \item{centers}{A matrix of cluster centers.}
  \item{size}{The number of point in each cluster}
  \item{withinss}{The within-cluster sum of squares for each cluster}
  \item{kernelf}{The kernel function used}
}
\references{
  Andrew Y. Ng, Michael I. Jordan, Yair Weiss\cr
  \emph{On Spectral Clustering: Analysis and an Algorithm}\cr
  Neural Information Processing Symposium 2001\cr
  \url{https://papers.neurips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf}

}
\author{Alexandros Karatzoglou \cr \email{alexandros.karatzoglou@ci.tuwien.ac.at}
}


\seealso{\code{\link{kkmeans}}, \code{\link{kpca}}, \code{\link{kcca}} }
\examples{
## Cluster the spirals data set.
data(spirals)

sc <- specc(spirals, centers=2)

sc
centers(sc)
size(sc)
withinss(sc)

plot(spirals, col=sc)

}
\keyword{cluster}