% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sumCountsAcrossCells.R
\name{sumCountsAcrossCells}
\alias{sumCountsAcrossCells}
\alias{sumCountsAcrossCells,ANY-method}
\alias{sumCountsAcrossCells,SummarizedExperiment-method}
\title{Sum expression across groups of cells}
\usage{
sumCountsAcrossCells(x, ...)

\S4method{sumCountsAcrossCells}{ANY}(
  x,
  ids,
  subset.row = NULL,
  subset.col = NULL,
  store.number = "ncells",
  average = FALSE,
  BPPARAM = SerialParam(),
  subset_row = NULL,
  subset_col = NULL,
  store_number = NULL
)

\S4method{sumCountsAcrossCells}{SummarizedExperiment}(x, ..., assay.type = "counts", exprs_values = NULL)
}
\arguments{
\item{x}{A numeric matrix of expression values (usually counts) containing features in rows and cells in columns.
Alternatively, a \linkS4class{SummarizedExperiment} object containing such a matrix.}

\item{...}{For the generic, further arguments to be passed to specific methods.

For the SummarizedExperiment method, further arguments to be passed to the ANY method.}

\item{ids}{A factor (or a vector coercible into a factor) specifying the group to which each cell in \code{x} belongs.
Alternatively, a \linkS4class{DataFrame} of such vectors or factors,
in which case each unique combination of levels defines a group.}

\item{subset.row}{An integer, logical or character vector specifying the features to use.
If \code{NULL}, defaults to all features.
For the \linkS4class{SingleCellExperiment} method, this argument will not affect alternative Experiments,
where aggregation is always performed for all features (or not at all, depending on \code{use_alt_exps}).}

\item{subset.col}{An integer, logical or character vector specifying the cells to use.
Defaults to all cells with non-\code{NA} entries of \code{ids}.}

\item{store.number}{String specifying the field of the output \code{\link{colData}} in which to store the number of cells in each group.
If \code{NULL}, nothing is stored.}

\item{average}{Logical scalar indicating whether the average should be computed instead of the sum.
Alternatively, a string containing \code{"mean"}, \code{"median"} or \code{"none"}, specifying the type of average.
(\code{"none"} is equivalent to \code{FALSE}.)}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying whether summation should be parallelized.}

\item{subset_row, subset_col, exprs_values, store_number}{Soft-deprecated equivalents to the arguments described above.}

\item{assay.type}{A string or integer scalar specifying the assay of \code{x} containing the matrix of counts
(or any other expression quantity that can be meaningfully summed).}
}
\value{
A SummarizedExperiment is returned with one column per level of \code{ids}.
Each entry of the assay contains the sum or average across all cells in a given group (column) for a given feature (row).
Columns are ordered by \code{levels(ids)} and the number of cells per level is reported in the column metadata field named by \code{store.number} (\code{"ncells"} by default).
For DataFrame \code{ids}, each column corresponds to a unique combination of levels (recorded in the \code{\link{colData}}).
}
\description{
Sum counts or average expression values for each feature across groups of cells.
This function is deprecated; use \code{\link{summarizeAssayByGroup}} instead.
}
\details{
This function provides a convenient method for summing or averaging expression values across multiple columns for each feature.
A typical application would be to sum counts across all cells in each cluster to obtain \dQuote{pseudo-bulk} samples for further analyses, e.g., differential expression analyses between conditions.

The behaviour of \code{sumCountsAcrossCells} is equivalent to that of \code{\link{colsum}}.
However, this function can operate on any matrix representation in \code{x};
can do so in a parallelized manner for large matrices without resorting to block processing;
and can natively support combinations of multiple factors in \code{ids}.

Any \code{NA} values in \code{ids} are implicitly ignored and will not be considered during summation.
This may be useful for removing undesirable cells by setting their entries in \code{ids} to \code{NA}.
Alternatively, we can explicitly select the cells of interest with \code{subset.col}.

Setting \code{average=TRUE} will compute the average in each group rather than the sum.
This is particularly useful if \code{x} contains expression values that have already been normalized in some manner,
as computing the average avoids another round of normalization to account for differences in the size of each group.
The same effect is obtained by setting \code{average="mean"},
while setting \code{average="median"} will instead compute the median across all cells in each group.
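
As a sketch of the computation (the notation here is illustrative and is not taken from the package), let \eqn{x_{ij}} be the value for feature \eqn{i} in cell \eqn{j},
and let \eqn{C_g} be the set of retained cells assigned to group \eqn{g}, i.e., excluding cells with \code{NA} entries in \code{ids} or outside \code{subset.col}.
The summed value for feature \eqn{i} in group \eqn{g} is then
\deqn{s_{ig} = \sum_{j \in C_g} x_{ij},}{s[i,g] = sum of x[i,j] over all j in C_g,}
while the mean is \eqn{s_{ig} / |C_g|}, where \eqn{|C_g|} is the number of cells in the group.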
}
\examples{
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

out <- sumCountsAcrossCells(example_sce, ids)
head(out)

batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- sumCountsAcrossCells(example_sce, 
      DataFrame(label=ids, batch=batches))
head(out2)
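
# A further sketch, using only the arguments documented above.
# Setting average=TRUE should report the per-group mean instead of the sum.
out3 <- sumCountsAcrossCells(example_sce, ids, average=TRUE)
head(out3)

# Cells with NA in 'ids' are ignored, so they should not be counted in 'ncells'.
# (Blanking out the first 10 cells is an arbitrary choice for illustration.)
ids.na <- ids
ids.na[1:10] <- NA
out4 <- sumCountsAcrossCells(example_sce, ids.na)
out4$ncells

# Alternatively, the cells of interest can be selected explicitly with subset.col.
out5 <- sumCountsAcrossCells(example_sce, ids, subset.col=11:ncol(example_sce))
out5$ncells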
}
\seealso{
\code{\link{aggregateAcrossCells}}, which also combines information in the \code{colData}.

\code{\link{numDetectedAcrossCells}}, which computes the number of cells with detected expression in each group.
}
\author{
Aaron Lun
}