File: perCellQCMetrics.Rd

r-bioc-scuttle 1.16.0+dfsg-3
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/perCellQCMetrics.R
\name{perCellQCMetrics}
\alias{perCellQCMetrics}
\alias{perCellQCMetrics,ANY-method}
\alias{perCellQCMetrics,SummarizedExperiment-method}
\alias{perCellQCMetrics,SingleCellExperiment-method}
\title{Per-cell quality control metrics}
\usage{
perCellQCMetrics(x, ...)

\S4method{perCellQCMetrics}{ANY}(
  x,
  subsets = NULL,
  percent.top = integer(0),
  threshold = 0,
  BPPARAM = SerialParam(),
  flatten = TRUE,
  percent_top = NULL,
  detection_limit = NULL
)

\S4method{perCellQCMetrics}{SummarizedExperiment}(x, ..., assay.type = "counts", exprs_values = NULL)

\S4method{perCellQCMetrics}{SingleCellExperiment}(
  x,
  subsets = NULL,
  percent.top = integer(0),
  ...,
  flatten = TRUE,
  assay.type = "counts",
  use.altexps = NULL,
  percent_top = NULL,
  exprs_values = NULL,
  use_altexps = NULL
)
}
\arguments{
\item{x}{A numeric matrix of counts with cells in columns and features in rows.

Alternatively, a \linkS4class{SummarizedExperiment} or \linkS4class{SingleCellExperiment} object containing such a matrix.}

\item{...}{For the generic, further arguments to pass to specific methods.

For the SummarizedExperiment and SingleCellExperiment methods, further arguments to pass to the ANY method.}

\item{subsets}{A named list containing one or more vectors 
(a character vector of feature names, a logical vector, or a numeric vector of indices),
used to identify interesting feature subsets such as ERCC spike-in transcripts or mitochondrial genes.}

\item{percent.top}{An integer vector specifying the size(s) of the top set of high-abundance genes.
Used to compute the percentage of library size occupied by the most highly expressed genes in each cell.}

\item{threshold}{A numeric scalar specifying the threshold above which a gene is considered to be detected.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object specifying how parallelization should be performed.}

\item{flatten}{Logical scalar indicating whether the nested \linkS4class{DataFrame}s in the output should be flattened.}

\item{percent_top, detection_limit, exprs_values, use_altexps}{Soft-deprecated equivalents of \code{percent.top}, \code{threshold}, \code{assay.type} and \code{use.altexps}, respectively.}

\item{assay.type}{A string or integer scalar indicating which assay of \code{x} contains the count matrix.}

\item{use.altexps}{Logical scalar indicating whether QC statistics should be computed for the alternative Experiments in \code{x}.
If \code{TRUE}, statistics are computed for all alternative Experiments.
If \code{FALSE}, alternative Experiments are not used.

Alternatively, an integer or character vector specifying the alternative Experiments for which QC statistics should be computed.

Alternatively \code{NULL} (the default), in which case only the alternative Experiments that contain the specified \code{assay.type} are used.}
}
\value{
A \linkS4class{DataFrame} of QC statistics where each row corresponds to a column in \code{x}.
This contains the following fields:
\itemize{
\item \code{sum}: numeric, the sum of counts for each cell.
\item \code{detected}: numeric, the number of features in each cell with counts above \code{threshold}.
}
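
As a rough illustration of these two definitions, the base R sketch below computes them directly from a small, made-up count matrix.
The matrix \code{counts} and the other variable names are hypothetical, and the sketch does not reproduce the package's parallelized implementation.
\preformatted{
# Hypothetical toy data: 4 features (rows) by 3 cells (columns).
counts <- matrix(c(0, 2, 5, 1,
                   3, 0, 0, 0,
                   1, 1, 1, 7),
    nrow=4, dimnames=list(paste0("Gene", 1:4), paste0("Cell", 1:3)))
threshold <- 0

# 'sum': the total count (library size) of each cell.
cell.sum <- colSums(counts)

# 'detected': the number of features in each cell with counts above 'threshold'.
cell.detected <- colSums(counts > threshold)

cell.sum       # Cell1 = 8, Cell2 = 3, Cell3 = 10
cell.detected  # Cell1 = 3, Cell2 = 1, Cell3 = 4
}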

If \code{flatten=FALSE}, the DataFrame may also contain the following nested columns, depending on the arguments and the class of \code{x}:
\itemize{
\item \code{percent.top}: numeric matrix, the percentage of each cell's counts assigned to its most highly expressed genes.
Each column of the matrix corresponds to an entry of \code{percent.top}, sorted in increasing order.
\item \code{subsets}: A nested DataFrame containing statistics for each subset, see Details.
\item \code{altexps}: A nested DataFrame containing statistics for each alternative experiment, see Details.
This is only returned for the SingleCellExperiment method.
\item \code{total}: numeric, the total sum of counts for each cell across main and alternative Experiments.
This is only returned for the SingleCellExperiment method.
}

If \code{flatten=TRUE}, nested matrices and DataFrames are flattened to remove the hierarchical structure from the output DataFrame.
}
\description{
Compute per-cell quality control metrics for a count matrix or a \linkS4class{SingleCellExperiment}.
}
\details{
This function computes QC metrics that are useful for identifying and removing potentially problematic cells.
The most common per-cell metrics are the sum of counts (i.e., the library size) and the number of detected features.
The percentage of counts in the top features also provides a measure of library complexity.
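
The percentage of counts in the top features can be illustrated with the hypothetical helper \code{top_percent} below, a base R sketch that is not part of \pkg{scuttle}.
For each cell, it sorts the counts in decreasing order and reports the share of the library size held by the \code{n} largest values, for each \code{n} in \code{percent.top}, with columns ordered by increasing \code{n}.
\preformatted{
# Hypothetical helper, not exported by scuttle: for each cell, the percentage
# of its counts held by its 'n' largest features, for each 'n' in 'percent.top'.
top_percent <- function(counts, percent.top) {
    percent.top <- sort(percent.top)  # columns in increasing order of 'n'
    out <- vapply(seq_len(ncol(counts)), function(j) {
        sorted <- sort(counts[, j], decreasing=TRUE)
        unname(100 * cumsum(sorted)[percent.top] / sum(sorted))
    }, numeric(length(percent.top)))
    # vapply() returns one column per cell; reshape to one row per cell.
    matrix(out, ncol=length(percent.top), byrow=TRUE,
        dimnames=list(colnames(counts), percent.top))
}

counts <- matrix(c(0, 2, 5, 1,
                   3, 0, 0, 0,
                   1, 1, 1, 7), nrow=4)
res <- top_percent(counts, percent.top=c(2, 1))
res  # columns "1" and "2"; the first cell gives 62.5 and 87.5
}
In this sketch, a cell with zero total counts yields \code{NaN}, and a value of \code{n} larger than the number of features yields \code{NA}.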

If \code{subsets} is specified, these statistics are also computed for each subset of features.
This is useful for investigating gene sets of interest, e.g., mitochondrial genes or Y chromosome genes.
These statistics are stored as nested \linkS4class{DataFrame}s in the \code{subsets} field of the output.
For example, if the input \code{subsets} contained \code{"Mito"} and \code{"Sex"}, the output would look like:
\preformatted{  output 
  |-- sum
  |-- detected
  |-- percent.top
  +-- subsets
      |-- Mito
      |   |-- sum
      |   |-- detected
      |   +-- percent
      +-- Sex 
          |-- sum
          |-- detected
          +-- percent
}
Here, the \code{percent} field contains the percentage of each cell's count \code{sum} assigned to each subset. 
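
To make the subset fields concrete, the following base R sketch computes \code{sum}, \code{detected} and \code{percent} for one subset, using plain lists and data frames instead of nested \linkS4class{DataFrame}s.
The subset name \code{Mito}, its membership and the toy matrix are illustrative assumptions; a logical vector is used here to define the subset.
\preformatted{
# Hypothetical toy data: 4 features by 3 cells, with features 1-2
# treated as a made-up "Mito" subset.
counts <- matrix(c(0, 2, 5, 1,
                   3, 0, 0, 0,
                   1, 1, 1, 7), nrow=4)
subsets <- list(Mito=c(TRUE, TRUE, FALSE, FALSE))
threshold <- 0

sub.stats <- lapply(subsets, function(keep) {
    sub <- counts[keep, , drop=FALSE]
    data.frame(
        sum=colSums(sub),
        detected=colSums(sub > threshold),
        # Denominator: each cell's count sum over all features in 'counts'.
        percent=100 * colSums(sub) / colSums(counts)
    )
})
sub.stats$Mito  # percent is 25, 100 and 20 for the three cells
}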

If \code{use.altexps=TRUE}, the same statistics are computed for each alternative experiment in \code{x}.
This can also be an integer or character vector specifying the alternative Experiments to use.
These statistics are also stored as nested \linkS4class{DataFrame}s, this time in the \code{altexps} field of the output.
For example, if \code{x} contained the alternative Experiments \code{"Spike"} and \code{"Ab"}, the output would look like:
\preformatted{  output 
  |-- sum
  |-- detected
  |-- percent.top
  |-- altexps
  |   |-- Spike
  |   |   |-- sum
  |   |   |-- detected
  |   |   +-- percent
  |   +-- Ab
  |       |-- sum
  |       |-- detected
  |       +-- percent
  +-- total 
}
The \code{total} field contains the total sum of counts for each cell across the main and alternative Experiments.
The \code{percent} field contains, for each cell, the percentage of the \code{total} count that is assigned to each alternative Experiment.

Note that the denominator for \code{altexps$...$percent} is not the same as the denominator for \code{subsets$...$percent}.
For example, if \code{subsets} contains a set of mitochondrial genes, the mitochondrial percentage would be computed as a fraction of the total endogenous coverage,
while the \code{altexps} percentage would be computed as a fraction of the total coverage across all (endogenous and artificial) features.
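
The difference between the two denominators can be seen in the base R sketch below, which uses made-up counts for a main experiment and a single spike-in alternative Experiment.
The variable names and the mitochondrial subset are hypothetical; the sketch only restates the definitions above and does not call \pkg{scuttle}.
\preformatted{
# Hypothetical toy data: main-experiment counts for 4 genes and an
# alternative Experiment with 2 spike-in features, for the same 3 cells.
main <- matrix(c(0, 2, 5, 1,
                 3, 0, 0, 0,
                 1, 1, 1, 7), nrow=4)
spike <- matrix(c(2, 0,
                  1, 2,
                  5, 5), nrow=2)
mito <- c(TRUE, TRUE, FALSE, FALSE)  # made-up mitochondrial subset

# 'total': counts summed across the main and alternative Experiments.
total <- colSums(main) + colSums(spike)

# altexps percentage: the denominator is 'total'.
spike.percent <- 100 * colSums(spike) / total

# subsets percentage: the denominator is the main-experiment 'sum' only.
mito.percent <- 100 * colSums(main[mito, , drop=FALSE]) / colSums(main)
}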

If \code{flatten=TRUE}, the nested DataFrames are flattened by concatenating the column names with underscores.
This means that, say, the \code{subsets$Mito$sum} nested field becomes the top-level \code{subsets_Mito_sum} field.
A flattened structure is more convenient for end-users performing interactive analyses,
but less convenient for programmatic access, as field names must then be constructed by pasting strings together.
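
The naming scheme can be illustrated with the hypothetical base R helper \code{flatten_names} below, which recursively joins field names with underscores.
It operates on ordinary nested lists and data frames rather than \linkS4class{DataFrame}s, and is only a sketch of the naming convention, not the code that \pkg{scuttle} uses internally.
\preformatted{
# Hypothetical helper (not part of scuttle): flatten a nested list of
# per-cell statistics by joining the names of each level with "_".
flatten_names <- function(x, prefix=NULL) {
    out <- list()
    for (field in names(x)) {
        name <- if (is.null(prefix)) field else paste(prefix, field, sep="_")
        if (is.list(x[[field]])) {
            # Data frames are lists, so nested tables are flattened too.
            out <- c(out, flatten_names(x[[field]], name))
        } else {
            out[[name]] <- x[[field]]
        }
    }
    out
}

nested <- list(
    sum=c(8, 3, 10),
    detected=c(3, 1, 4),
    subsets=list(Mito=data.frame(sum=c(2, 3, 2), detected=c(1, 1, 2),
        percent=c(25, 100, 20)))
)
flat <- flatten_names(nested)
names(flat)
# "sum" "detected" "subsets_Mito_sum" "subsets_Mito_detected" "subsets_Mito_percent"
}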
}
\examples{
example_sce <- mockSCE()
stats <- perCellQCMetrics(example_sce)
stats

# With subsets.
stats2 <- perCellQCMetrics(example_sce, subsets=list(Mito=1:10), 
    flatten=FALSE)
stats2$subsets

# With alternative Experiments.
pretend.spike <- ifelse(seq_len(nrow(example_sce)) < 10, "Spike", "Gene")
alt_sce <- splitAltExps(example_sce, pretend.spike)
stats3 <- perCellQCMetrics(alt_sce, flatten=FALSE)
stats3$altexps


}
\seealso{
\code{\link{addPerCellQCMetrics}}, to add the QC metrics to the column metadata.
}
\author{
Aaron Lun
}