File: quickSubCluster.Rd

package info (click to toggle)
r-bioc-scran 1.18.5%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 1,856 kB
  • sloc: cpp: 960; sh: 13; makefile: 2
file content (122 lines) | stat: -rw-r--r-- 5,810 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/quickSubCluster.R
\name{quickSubCluster}
\alias{quickSubCluster}
\alias{quickSubCluster,ANY-method}
\alias{quickSubCluster,SummarizedExperiment-method}
\alias{quickSubCluster,SingleCellExperiment-method}
\title{Quick and dirty subclustering}
\usage{
quickSubCluster(x, ...)

\S4method{quickSubCluster}{ANY}(x, normalize = TRUE, ...)

\S4method{quickSubCluster}{SummarizedExperiment}(x, ...)

\S4method{quickSubCluster}{SingleCellExperiment}(
  x,
  groups,
  normalize = TRUE,
  prepFUN = NULL,
  min.ncells = 50,
  clusterFUN = NULL,
  BLUSPARAM = NNGraphParam(),
  format = "\%s.\%s",
  assay.type = "counts",
  simplify = FALSE
)
}
\arguments{
\item{x}{A matrix of counts or log-normalized expression values (if \code{normalize=FALSE}),
where each row corresponds to a gene and each column corresponds to a cell.

Alternatively, a \linkS4class{SummarizedExperiment} or \linkS4class{SingleCellExperiment} object containing such a matrix.}

\item{...}{For the generic, further arguments to pass to specific methods.

For the ANY and SummarizedExperiment methods, further arguments to pass to the SingleCellExperiment method.}

\item{normalize}{Logical scalar indicating whether each subset of \code{x} should be log-transformed prior to further analysis.}

\item{groups}{A vector of group assignments for all cells, usually corresponding to cluster identities.}

\item{prepFUN}{A function that accepts a single \linkS4class{SingleCellExperiment} object and returns another \linkS4class{SingleCellExperiment} containing any additional elements required for clustering (e.g., PCA results).}

\item{min.ncells}{An integer scalar specifying the minimum number of cells in a group to be considered for subclustering.}

\item{clusterFUN}{A function that accepts a single \linkS4class{SingleCellExperiment} object and returns a vector of cluster assignments for each cell in that object.}

\item{BLUSPARAM}{A \linkS4class{BlusterParam} object that is used to specify the clustering via \code{\link{clusterRows}}.
Only used when \code{clusterFUN=NULL}.}

\item{format}{A string to be passed to \code{\link{sprintf}}, specifying how the subclusters should be named with respect to the parent level in \code{groups} and the level returned by \code{clusterFUN}.}

\item{assay.type}{String or integer scalar specifying the relevant assay.}

\item{simplify}{Logical scalar indicating whether just the subcluster assignments should be returned.}
}
\value{
By default, a named \linkS4class{List} of \linkS4class{SingleCellExperiment} objects.
Each object corresponds to a level of \code{groups} and contains a \code{"subcluster"} column metadata field with the subcluster identities for each cell.
The \code{\link{metadata}} of the List also contains \code{index}, a list of integer vectors specifying the cells in \code{x} in each returned SingleCellExperiment object; 
and \code{subcluster}, a character vector of subcluster identities (see next).

If \code{simplify=TRUE}, the character vector of subcluster identities is returned.
This is of length equal to \code{ncol(x)} and each entry follows the format defined in \code{format}.
(Unless the number of cells in the parent cluster is less than \code{min.cells}, in which case the parent cluster's name is used.)
}
\description{
Performs a quick subclustering for all cells within each group.
}
\details{
\code{quickSubCluster} is a simple convenience function that loops over all levels of \code{groups} to perform subclustering.
It subsets \code{x} to retain all cells in one level and then runs \code{prepFUN} and \code{clusterFUN} to cluster them.
Levels with fewer than \code{min.ncells} are not subclustered and have \code{"subcluster"} set to the name of the level.

The distinction between \code{prepFUN} and \code{clusterFUN} is that the former's calculations are preserved in the output.
For example, we would put the PCA in \code{prepFUN} so that the PCs are returned in the \code{\link{reducedDims}} for later use.
In contrast, \code{clusterFUN} is only used to obtain the subcluster assignments so any intermediate objects are lost.

By default, \code{prepFUN} will run \code{\link{modelGeneVar}}, take the top 10% of genes with large biological components with \code{\link{getTopHVGs}}, and then run \code{\link{denoisePCA}} to perform the PCA.
\code{clusterFUN} will then perform clustering on the PC matrix with \code{\link{clusterRows}} and \code{BLUSPARAM}.
Either or both of these functions can be replaced with custom functions.

% We use denoisePCA+modelGeneVar by default here, because we hope that each parent cluster is reasonably homogeneous.
% This allows us to assume that the trend is actually a good estimate of the technical noise.
% We don't use the other modelGeneVar*'s to avoid making assumptions about the available of spike-ins, UMI data, etc.

The default behavior of this function is the same as running \code{\link{quickCluster}} on each subset with default parameters except for \code{min.size=0}.
}
\examples{
library(scuttle)
sce <- mockSCE(ncells=200)

# Lowering min.size for this small demo:
clusters <- quickCluster(sce, min.size=50)

# Getting subclusters:
out <- quickSubCluster(sce, clusters)

# Defining custom prep functions:
out2 <- quickSubCluster(sce, clusters, 
    prepFUN=function(x) {
        dec <- modelGeneVarWithSpikes(x, "Spikes")
        top <- getTopHVGs(dec, prop=0.2)
        scater::runPCA(x, subset_row=top, ncomponents=25)
    }
)

# Defining custom cluster functions:
out3 <- quickSubCluster(sce, clusters, 
    clusterFUN=function(x) {
        kmeans(reducedDim(x, "PCA"), sqrt(ncol(x)))$cluster
    }
)

}
\seealso{
\code{\link{quickCluster}}, for a related function to quickly obtain clusters.
}
\author{
Aaron Lun
}