% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregateAcrossCells.R
\name{aggregateAcrossCells}
\alias{aggregateAcrossCells}
\alias{aggregateAcrossCells,SummarizedExperiment-method}
\alias{aggregateAcrossCells,SingleCellExperiment-method}
\title{Aggregate data across groups of cells}
\usage{
aggregateAcrossCells(x, ...)

\S4method{aggregateAcrossCells}{SummarizedExperiment}(
  x,
  ids,
  ...,
  statistics = NULL,
  average = NULL,
  suffix = FALSE,
  subset.row = NULL,
  subset.col = NULL,
  store.number = "ncells",
  coldata.merge = NULL,
  use.assay.type = "counts",
  subset_row = NULL,
  subset_col = NULL,
  store_number = "ncells",
  coldata_merge = NULL,
  use_exprs_values = NULL
)

\S4method{aggregateAcrossCells}{SingleCellExperiment}(
  x,
  ids,
  ...,
  subset.row = NULL,
  subset.col = NULL,
  use.altexps = TRUE,
  use.dimred = TRUE,
  dimred.stats = NULL,
  suffix = FALSE,
  subset_row = NULL,
  subset_col = NULL,
  use_altexps = NULL,
  use_dimred = NULL
)
}
\arguments{
\item{x}{A \linkS4class{SingleCellExperiment} or \linkS4class{SummarizedExperiment}
containing one or more matrices of expression values to be aggregated;
possibly along with \code{\link{colData}}, \code{\link{reducedDims}} and \code{\link{altExps}} elements.}

\item{...}{For the generic, further arguments to be passed to specific methods.

For the SummarizedExperiment method, further arguments to be passed to \code{\link{summarizeAssayByGroup}}.

For the SingleCellExperiment method, arguments to be passed to the SummarizedExperiment method.}

\item{ids}{A factor (or vector coercible into a factor) specifying the group to which each cell in \code{x} belongs.
Alternatively, a \linkS4class{DataFrame} of such vectors or factors, 
in which case each unique combination of levels defines a group.}

\item{statistics}{Character vector specifying the type of statistics to be computed; see \code{?\link{summarizeAssayByGroup}}.
If not specified, defaults to \code{"sum"}.}

\item{average}{Deprecated; specifies whether to compute the average. Use \code{statistics="mean"} instead.
Only used if \code{statistics=NULL}.}

\item{suffix}{Logical scalar indicating whether to always suffix the assay name with the statistic type.}

\item{subset.row}{An integer, logical or character vector specifying the features to use.
If \code{NULL}, defaults to all features.
For the \linkS4class{SingleCellExperiment} method, this argument will not affect alternative Experiments,
where aggregation is always performed for all features (or not at all, depending on \code{use.altexps}).}

\item{subset.col}{An integer, logical or character vector specifying the cells to use.
Defaults to all cells with non-\code{NA} entries of \code{ids}.}

\item{store.number}{String specifying the field of the output \code{\link{colData}} to store the number of cells in each group.
If \code{NULL}, nothing is stored.}

\item{coldata.merge}{A named list of functions specifying how each column metadata field should be aggregated.
Each function should be named according to the name of the column in \code{\link{colData}} to which it applies.
Alternatively, a single function can be supplied; see below for more details.}

\item{use.assay.type}{A character or integer vector specifying the assay(s) of \code{x} containing count matrices.}

\item{subset_row, subset_col, store_number, use_exprs_values, use_altexps, use_dimred, coldata_merge}{Soft-deprecated equivalents to the arguments described above.}

\item{use.altexps}{Logical scalar indicating whether aggregation should be performed for alternative experiments. 
Alternatively, a character or integer vector specifying the alternative experiments to be aggregated.}

\item{use.dimred}{Logical scalar indicating whether aggregation should be performed for dimensionality reduction results.
Alternatively, a character or integer vector specifying the dimensionality reduction results to be aggregated.}

\item{dimred.stats}{A character vector specifying how the reduced dimensions should be aggregated by group.
This can be one or more of \code{"mean"} and \code{"median"}.}
}
\value{
A SummarizedExperiment of the same class as \code{x} is returned, containing summed/averaged matrices
generated by \code{\link{summarizeAssayByGroup}} on all assays in \code{use.assay.type}.
Column metadata are also aggregated according to the rules in \code{coldata.merge}; see below.

For the SingleCellExperiment method, 
the output also contains aggregated values for the reduced dimensions and alternative Experiments.
}
\description{
Sum counts or average expression values for each feature across groups of cells,
while also aggregating values in the \code{\link{colData}} and other fields in a SummarizedExperiment.
}
\details{
This function summarizes the assay values in \code{x} for each group in \code{ids} using \code{\link{summarizeAssayByGroup}}
while also aggregating metadata across cells in a \dQuote{sensible} manner.
This makes it useful for obtaining an aggregated \linkS4class{SummarizedExperiment} during an analysis session;
in contrast, \code{\link{summarizeAssayByGroup}} is more lightweight and is better for use inside other functions.

Aggregation of the \code{\link{colData}} is controlled using functions in \code{coldata.merge}.
This can either be:
\itemize{
\item A function that takes a subset of entries for any given column metadata field and returns a single value.
This can be set to, e.g., \code{\link{sum}} or \code{\link{median}} for numeric covariates,
or a function that takes the most abundant level for categorical factors.
\item A named list of such functions, where each function is applied to the column metadata field after which it is named.
Any field that does not have an entry in \code{coldata.merge} is \dQuote{unspecified} and handled as described below.
A list element can also be set to \code{FALSE}, in which case no aggregation is performed for the corresponding field.
\item \code{NULL}, in which case all fields are considered to be unspecified.
\item \code{FALSE}, in which case no aggregation of column metadata is performed.
}
For any unspecified field, we check whether all cells of a group have the same value.
If so, that value is reported; otherwise, an \code{NA} is reported for the offending group.

By default, each matrix of aggregated values is returned with the same name as the original per-cell matrix from which it was derived.
If \code{statistics} is of length greater than 1 or \code{suffix=TRUE},
the names of all aggregated matrices are suffixed with their type of aggregate statistic.

If \code{ids} is a \linkS4class{DataFrame}, the combination of levels corresponding to each column is also reported in the column metadata.
Otherwise, the level corresponding to each column is reported in the \code{ids} column metadata field as well as in the column names.
}
\section{Dealing with SingleCellExperiments}{

If \code{x} is a \linkS4class{SingleCellExperiment}, aggregation is repeated for each entry of \code{\link{altExps}}.
This is done by calling \code{aggregateAcrossCells} on that entry with the same arguments used for the main Experiment;
as such, any column metadata in those entries will also be aggregated following the rules in \code{coldata.merge}.
The exception is \code{subset.row}, which is not applied to the alternative Experiments as the feature sets are different.

If \code{x} is a \linkS4class{SingleCellExperiment}, each entry of \code{\link{reducedDims}} is averaged across cells.
This assumes that the average of low-dimensional coordinates has some meaning for a group of cells but the sum does not.
We can explicitly specify computation of the \code{"mean"} or \code{"median"} (or both) with \code{dimred.stats}.
If \code{dimred.stats} is of length greater than 1 or \code{suffix=TRUE},
the name of each matrix in the output \code{\link{reducedDims}} is suffixed with the type of average.

Users can tune the behavior of the function for these additional fields with \code{use.altexps} and \code{use.dimred}.
Note that if the alternative experiments themselves are \linkS4class{SingleCellExperiment}s,
any further nested alternative experiment or reduced dimensions will always be aggregated
regardless of the value of \code{use.altexps} or \code{use.dimred}.
}

\examples{
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)
out <- aggregateAcrossCells(example_sce, ids)
out
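
# A minimal sketch of requesting more than one statistic per group.
# As described in the Details, the aggregated assay names are then
# suffixed with the type of statistic; inspect them with assayNames().
out_stats <- aggregateAcrossCells(example_sce, ids,
     statistics=c("sum", "mean"))
assayNames(out_stats)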

batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- aggregateAcrossCells(example_sce, 
      DataFrame(label=ids, batch=batches))
out2

# Using another column metadata merge strategy.
example_sce$stuff <- runif(ncol(example_sce))
out3 <- aggregateAcrossCells(example_sce, ids, 
     coldata.merge=list(stuff=sum))
out3
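
# A sketch of merging a categorical field by its most abundant level.
# 'most_common' is a hypothetical helper written for this example only,
# and we assume that mockSCE() supplies a 'Cell_Cycle' column in colData.
most_common <- function(x) names(which.max(table(x)))
out_cat <- aggregateAcrossCells(example_sce, ids,
     coldata.merge=list(Cell_Cycle=most_common))
colData(out_cat)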
}
\seealso{
\code{\link{summarizeAssayByGroup}}, which does the heavy lifting at the assay level.
}
\author{
Aaron Lun
}