File: specc.Rd

package info (click to toggle)
r-cran-kernlab 0.9-33-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 2,216 kB
sloc: cpp: 6,011; ansic: 935; makefile: 2
file content (153 lines) | stat: -rw-r--r-- 6,283 bytes
parent folder | download | duplicates (2)
\name{specc}
\alias{specc}
\alias{specc,matrix-method}
\alias{specc,formula-method}
\alias{specc,list-method}
\alias{specc,kernelMatrix-method}
\alias{show,specc-method}
\title{Spectral Clustering}
\description{
A spectral clustering algorithm. Clustering is performed by
embedding the data into the subspace of the eigenvectors 
of an affinity matrix.
}
\usage{
\S4method{specc}{formula}(x, data = NULL, na.action = na.omit, ...)

\S4method{specc}{matrix}(x, centers,
      kernel = "rbfdot", kpar = "automatic", 
      nystrom.red = FALSE, nystrom.sample = dim(x)[1]/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)

\S4method{specc}{kernelMatrix}(x, centers, nystrom.red = FALSE, iterations = 200, ...)

\S4method{specc}{list}(x, centers,
      kernel = "stringdot", kpar = list(length=4, lambda=0.5),
      nystrom.red = FALSE, nystrom.sample = length(x)/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)
}

\arguments{
 \item{x}{the matrix of data to be clustered, or a symbolic
    description of the model to be fit, or a kernel Matrix of class
    \code{kernelMatrix}, or a list of character vectors.}

  \item{data}{an optional data frame containing the variables in the model.
    By default the variables are taken from the environment which
    `specc' is called from.}

\item{centers}{Either the number of clusters or a set of initial cluster
    centers. If the first, a random set of rows in the eigenvectors
    matrix are chosen as the initial centers.}

\item{kernel}{the kernel function used in computing the affinity matrix. 
    This parameter can be set to any function, of class kernel, which computes a dot product between two
    vector arguments. kernlab provides the most popular kernel functions
    which can be used by setting the kernel parameter to the following
    strings:
    \itemize{
      \item \code{rbfdot} Radial Basis kernel function "Gaussian"
      \item \code{polydot} Polynomial kernel function
      \item \code{vanilladot} Linear kernel function
      \item \code{tanhdot} Hyperbolic tangent kernel function
      \item \code{laplacedot} Laplacian kernel function
      \item \code{besseldot} Bessel kernel function
      \item \code{anovadot} ANOVA RBF kernel function
      \item \code{splinedot} Spline kernel 
      \item \code{stringdot} String kernel
    }
    The kernel parameter can also be set to a user defined function of
    class kernel by passing the function name as an argument.
  }

  \item{kpar}{a character string or the list of hyper-parameters (kernel parameters).
    The default character string \code{"automatic"} uses a heuristic to determine a
    suitable value for the width parameter of the RBF kernel.
    The second option \code{"local"} (local scaling) uses a more advanced heuristic
     and sets a width parameter for every point in the data set. This is
    particularly useful when the data incorporates multiple scales.
    A list can also be used containing the parameters to be used with the
    kernel function. Valid parameters for existing kernels are :
    \itemize{
      \item \code{sigma} inverse kernel width for the Radial Basis
      kernel function "rbfdot" and the Laplacian kernel "laplacedot".
      \item \code{degree, scale, offset} for the Polynomial kernel "polydot"
      \item \code{scale, offset} for the Hyperbolic tangent kernel
      function "tanhdot"
      \item \code{sigma, order, degree} for the Bessel kernel "besseldot". 
      \item \code{sigma, degree} for the ANOVA kernel "anovadot".
      \item \code{length, lambda, normalized} for the "stringdot" kernel
       where length is the length of the strings considered, lambda the
       decay factor and normalized a logical parameter determining if the
       kernel evaluations should be normalized.
    }
    
    Hyper-parameters for user defined kernels can be passed through the
    kpar parameter as well.}

  \item{nystrom.red}{use nystrom method to calculate eigenvectors. When
    \code{TRUE} a sample of the dataset is used to calculate the
    eigenvalues, thus only a \eqn{n x m} matrix where \eqn{n} the sample size
    is stored in memory (default: \code{FALSE}}

  \item{nystrom.sample}{number of data points to use for estimating the
    eigenvalues when using the nystrom method. (default : dim(x)[1]/6)}

  \item{mod.sample}{proportion of data to use when estimating sigma (default: 0.75)}	

  \item{iterations}{the maximum number of iterations allowed. }

  \item{na.action}{the action to perform on NA}

  \item{\dots}{additional parameters}
    
}
\details{
  Spectral clustering  works by embedding the data points of the
  partitioning problem into the
  subspace of the \eqn{k} largest eigenvectors of a normalized affinity/kernel matrix.
Using a simple clustering method like \code{kmeans} on the embedded points usually
leads to good performance. It can be shown that spectral clustering methods boil down to 
 graph partitioning.\cr
The data can be passed to the \code{specc} function in a \code{matrix} or a
\code{data.frame}, in addition \code{specc} also supports input in the form of a
kernel matrix of class \code{kernelMatrix} or as a list of character
vectors where a string kernel has to be used.}
\value{
 An S4 object of class \code{specc} which extends the class \code{vector}
 containing integers indicating the cluster to which
 each point is allocated. The following slots contain useful information
 
  \item{centers}{A matrix of cluster centers.}
  \item{size}{The number of point in each cluster}
  \item{withinss}{The within-cluster sum of squares for each cluster}
  \item{kernelf}{The kernel function used}
}
\references{
  Andrew Y. Ng, Michael I. Jordan, Yair Weiss\cr
  \emph{On Spectral Clustering: Analysis and an Algorithm}\cr
  Neural Information Processing Symposium 2001\cr
  \url{https://papers.neurips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf}

}
\author{Alexandros Karatzoglou \cr \email{alexandros.karatzoglou@ci.tuwien.ac.at}
}


\seealso{\code{\link{kkmeans}}, \code{\link{kpca}}, \code{\link{kcca}} }
\examples{
## Cluster the spirals data set.
data(spirals)

sc <- specc(spirals, centers=2)

sc
centers(sc)
size(sc)
withinss(sc)

plot(spirals, col=sc)

}
\keyword{cluster}