% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summarizeAssayByGroup.R
\name{summarizeAssayByGroup}
\alias{summarizeAssayByGroup}
\alias{summarizeAssayByGroup,ANY-method}
\alias{summarizeAssayByGroup,SummarizedExperiment-method}
\title{Summarize an assay by group}
\usage{
summarizeAssayByGroup(x, ...)
\S4method{summarizeAssayByGroup}{ANY}(
x,
ids,
subset.row = NULL,
subset.col = NULL,
statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
store.number = "ncells",
threshold = 0,
BPPARAM = SerialParam()
)
\S4method{summarizeAssayByGroup}{SummarizedExperiment}(x, ..., assay.type = "counts")
}
\arguments{
\item{x}{A numeric matrix containing features in rows and cells in columns.
Alternatively, a \linkS4class{SummarizedExperiment} object containing such a matrix.}
\item{...}{For the generic, further arguments to be passed to specific methods.
For the SummarizedExperiment method, further arguments to be passed to the ANY method.}
\item{ids}{A factor (or vector coercible into a factor) specifying the group to which each cell in \code{x} belongs.
Alternatively, a \linkS4class{DataFrame} of such vectors or factors,
in which case each unique combination of levels defines a group.}
\item{subset.row}{An integer, logical or character vector specifying the features to use.
If \code{NULL}, defaults to all features.}
\item{subset.col}{An integer, logical or character vector specifying the cells to use.
Defaults to all cells with non-\code{NA} entries of \code{ids}.}
\item{statistics}{Character vector specifying the type of statistics to be computed, see Details.}
\item{store.number}{String specifying the field of the output \code{\link{colData}} to store the number of cells in each group.
If \code{NULL}, nothing is stored.}
\item{threshold}{A numeric scalar specifying the threshold above which a gene is considered to be detected.}
\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying whether summation should be parallelized.}
\item{assay.type}{A string or integer scalar specifying the assay of \code{x} to be summarized.}
}
\value{
A SummarizedExperiment is returned with one column per group defined by \code{ids}.
Each assay corresponds to one of the requested \code{statistics}, and each entry contains that statistic computed across all cells in a given group (column) for a given feature (row).
Columns are ordered by \code{levels(ids)} and the number of cells per level is reported in the column metadata field named by \code{store.number} (\code{"ncells"} by default).
For DataFrame \code{ids}, each column corresponds to a unique combination of levels (recorded in the \code{\link{colData}}).
}
\description{
From an assay matrix, compute summary statistics for groups of cells.
A typical example would be to compute various summary statistics for clusters.
}
\details{
This function provides a convenient way to compute summary statistics, such as sums or averages, of expression values across multiple columns for each feature.
A typical application would be to sum counts across all cells in each cluster to obtain \dQuote{pseudo-bulk} samples for further analyses,
e.g., differential expression analyses between conditions.
For each feature, the chosen assay can be aggregated by:
\itemize{
\item \code{"sum"}, the sum of all values in each group.
This makes the most sense for raw counts, to allow models to account for the mean-variance relationship.
\item \code{"mean"}, the mean of all values in each group.
This makes the most sense for normalized and/or transformed assays.
\item \code{"median"}, the median of all values in each group.
This makes the most sense for normalized and/or transformed assays,
usually generated from large counts where discreteness is less of an issue.
\item \code{"num.detected"} and \code{"prop.detected"}, the number and proportion of values in each group that are above \code{threshold} (i.e., non-zero with the default of 0).
This makes the most sense for raw counts or sparsity-preserving transformations.
}
Any \code{NA} values in \code{ids} are implicitly ignored and will not be considered during summation.
This may be useful for removing undesirable cells by setting their entries in \code{ids} to \code{NA}.
Alternatively, we can explicitly select the cells of interest with \code{subset.col}.
If \code{ids} is a factor and contains unused levels, they will not be represented as columns in the output.
}
\examples{
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)
out <- summarizeAssayByGroup(example_sce, ids)
out
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce,
DataFrame(label=ids, batch=batches))
head(out2)
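
# A minimal sketch of requesting specific statistics, reusing 'example_sce'
# and 'ids' from above. It assumes that each requested statistic is stored
# as an assay named after it, and that the cell counts sit in the default
# "ncells" field; 'out3', 'ids2' and 'out4' are illustrative names only.
out3 <- summarizeAssayByGroup(example_sce, ids,
    statistics=c("sum", "prop.detected"))
assayNames(out3)
head(assay(out3, "sum"))
out3$ncells

# Cells with NA entries in 'ids' are ignored, so the per-group cell counts
# should add up to the number of cells with a non-NA identifier.
ids2 <- ids
ids2[1:10] <- NA
out4 <- summarizeAssayByGroup(example_sce, ids2, statistics="sum")
sum(out4$ncells)
sum(!is.na(ids2))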
}
\seealso{
\code{\link{aggregateAcrossCells}}, which also combines information in the \code{\link{colData}} of \code{x}.
}
\author{
Aaron Lun
}