% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summarizeAssayByGroup.R
\name{summarizeAssayByGroup}
\alias{summarizeAssayByGroup}
\alias{summarizeAssayByGroup,ANY-method}
\alias{summarizeAssayByGroup,SummarizedExperiment-method}
\title{Summarize an assay by group}
\usage{
summarizeAssayByGroup(x, ...)

\S4method{summarizeAssayByGroup}{ANY}(
  x,
  ids,
  subset.row = NULL,
  subset.col = NULL,
  statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
  store.number = "ncells",
  threshold = 0,
  BPPARAM = SerialParam()
)

\S4method{summarizeAssayByGroup}{SummarizedExperiment}(x, ..., assay.type = "counts")
}
\arguments{
\item{x}{A numeric matrix containing features in rows and cells in columns.
Alternatively, a \linkS4class{SummarizedExperiment} object containing such a matrix.}

\item{...}{For the generics, further arguments to be passed to specific methods.

For the SummarizedExperiment method, further arguments to be passed to the ANY method.}

\item{ids}{A factor (or vector coercible into a factor) specifying the group to which each cell in \code{x} belongs.
Alternatively, a \linkS4class{DataFrame} of such vectors or factors, 
in which case each unique combination of levels defines a group.}

\item{subset.row}{An integer, logical or character vector specifying the features to use.
If \code{NULL}, defaults to all features.}

\item{subset.col}{An integer, logical or character vector specifying the cells to use.
Defaults to all cells with non-\code{NA} entries of \code{ids}.}

\item{statistics}{Character vector specifying the statistics to compute; see Details.}

\item{store.number}{String specifying the name of the field in the output \code{\link{colData}} in which to store the number of cells in each group.
If \code{NULL}, nothing is stored.}

\item{threshold}{A numeric scalar specifying the threshold above which a gene is considered to be detected.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying whether and how the calculations should be parallelized.}

\item{assay.type}{A string or integer scalar specifying the assay of \code{x} to be summarized.}
}
\value{
A \linkS4class{SummarizedExperiment} is returned with one column per group defined by \code{ids}.
It contains one assay for each entry of \code{statistics}, named after that statistic;
each entry of an assay contains that statistic computed across all cells in a given group (column) for a given feature (row).
Columns are ordered by \code{levels(ids)}, and the number of cells in each group is stored in the \code{\link{colData}} field named by \code{store.number} (\code{"ncells"} by default).
For DataFrame \code{ids}, each column corresponds to a unique combination of levels (recorded in the \code{\link{colData}}).
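
As a minimal base-R sketch of the grouping rule for DataFrame \code{ids}, assuming a plain \code{data.frame} in place of a \code{DataFrame} and using hypothetical per-cell annotations, each observed label/batch combination defines one group. It only illustrates which groups arise; it does not reproduce the column order or contents of the actual output.

```r
# Hypothetical per-cell annotations for five cells.
combos <- data.frame(label = c("A", "A", "B", "B", "A"),
                     batch = c(1, 2, 1, 1, 1))

# One group per observed label/batch combination; drop = TRUE discards
# combinations that no cell has (here, label "B" with batch 2).
group <- interaction(combos, drop = TRUE)
nlevels(group)  # 3

# The distinct combinations, analogous to what is recorded in the colData.
unique(combos)
```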
}
\description{
From an assay matrix, compute summary statistics for groups of cells.
A typical example would be to compute various summary statistics for clusters.
}
\details{
These methods provide a convenient way to summarize expression values, e.g., by summing or averaging, across multiple columns for each feature.
A typical application would be to sum counts across all cells in each cluster to obtain \dQuote{pseudo-bulk} samples for further analyses, 
e.g., differential expression analyses between conditions.

For each feature, the chosen assay can be aggregated by:
\itemize{
\item \code{"sum"}, the sum of all values in each group.
This makes the most sense for raw counts, to allow models to account for the mean-variance relationship.
\item \code{"mean"}, the mean of all values in each group.
This makes the most sense for normalized and/or transformed assays.
\item \code{"median"}, the median of all values in each group.
This makes the most sense for normalized and/or transformed assays, 
usually generated from large counts where discreteness is less of an issue.
\item \code{"num.detected"} and \code{"prop.detected"}, the number and proportion of values in each group that are greater than \code{threshold} (i.e., non-zero with the default \code{threshold=0}).
This makes the most sense for raw counts or sparsity-preserving transformations.
}
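
To make these definitions concrete, the following is a base-R sketch of each statistic on a small hypothetical count matrix, assuming (as for the \code{threshold} argument) that a value counts as detected when it is strictly greater than \code{threshold}. It only illustrates the semantics using \code{rowsum}, \code{sweep} and \code{apply}; it is not how \code{summarizeAssayByGroup} is implemented.

```r
# Hypothetical toy counts: 3 features (rows) by 6 cells (columns).
x <- matrix(c(0, 2, 5, 1, 0, 3,
              4, 0, 0, 6, 2, 1,
              1, 1, 1, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
ids <- factor(c("A", "A", "B", "B", "B", "A"))
threshold <- 0  # assumed: "detected" means strictly above this value

# Number of cells per group, in levels(ids) order.
ncells <- as.vector(table(ids))

# "sum": rowsum() aggregates rows, so transpose in and out to group columns.
sums <- t(rowsum(t(x), ids))

# "mean": divide each group's sums by that group's size.
means <- sweep(sums, 2, ncells, "/")

# "median": per-feature median within each group's columns.
medians <- sapply(levels(ids), function(g)
    apply(x[, ids == g, drop = FALSE], 1, median))

# "num.detected" and "prop.detected": count values above the threshold.
num.detected <- t(rowsum(t((x > threshold) * 1), ids))
prop.detected <- sweep(num.detected, 2, ncells, "/")
```

Each result is a feature-by-group matrix, mirroring the shape of the assays in the returned object.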

Any \code{NA} values in \code{ids} are implicitly ignored and will not be considered when computing any of the statistics.
This may be useful for removing undesirable cells by setting their entries in \code{ids} to \code{NA}.
Alternatively, we can explicitly select the cells of interest with \code{subset.col}.
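
The equivalence of the two routes can be sketched in base R with hypothetical toy data: dropping cells whose \code{ids} entry is \code{NA} gives the same group sums as explicitly selecting the remaining cells, which is what \code{subset.col} does. The vector \code{chosen} below is an illustrative stand-in for that argument, not part of the API.

```r
# Hypothetical toy data: 2 features (rows) by 6 cells (columns).
x <- matrix(as.numeric(1:12), nrow = 2, dimnames = list(c("g1", "g2"), NULL))
ids <- c("A", "B", NA, "A", "B", NA)  # cells 3 and 6 are to be discarded

# Implicit route: drop cells whose group is NA before summing.
keep <- !is.na(ids)
sums.na <- t(rowsum(t(x[, keep, drop = FALSE]), ids[keep]))

# Explicit route: select the cells of interest (analogous to subset.col).
chosen <- c(1, 2, 4, 5)
sums.sub <- t(rowsum(t(x[, chosen, drop = FALSE]), ids[chosen]))

stopifnot(identical(sums.na, sums.sub))
sums.na
```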

If \code{ids} is a factor and contains unused levels, they will not be represented as columns in the output.
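
A small base-R sketch with hypothetical data shows the intended behavior: groups follow the order of \code{levels(ids)} rather than alphabetical order, and a declared but unused level produces no column. Here \code{rowsum} and \code{droplevels} merely illustrate the rule; they are not claimed to be what \code{summarizeAssayByGroup} uses internally.

```r
# Hypothetical toy data: 2 features (rows) by 3 cells (columns).
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
# Level "C" is declared, but no cell belongs to it.
ids <- factor(c("B", "A", "B"), levels = c("C", "B", "A"))

# Group sums: one column per used level, in level order.
sums <- t(rowsum(t(x), ids))
colnames(sums)           # "B" "A"
levels(droplevels(ids))  # "B" "A"
```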
}
\examples{
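example_sce <- mockSCE()
# Assign each cell to one of five arbitrary groups.
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Compute the default set of statistics for each group.
out <- summarizeAssayByGroup(example_sce, ids)
out

# Groups can also be defined by combinations of multiple factors.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce,
    DataFrame(label=ids, batch=batches))
head(out2)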
}
\seealso{
\code{\link{aggregateAcrossCells}}, which also combines information in the \code{\link{colData}} of \code{x}.
}
\author{
Aaron Lun
}