% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/geometricSizeFactors.R
\name{geometricSizeFactors}
\alias{geometricSizeFactors}
\alias{geometricSizeFactors,ANY-method}
\alias{geometricSizeFactors,SummarizedExperiment-method}
\alias{computeGeometricFactors}
\title{Compute geometric size factors}
\usage{
geometricSizeFactors(x, ...)

\S4method{geometricSizeFactors}{ANY}(
  x,
  subset.row = NULL,
  pseudo.count = 1,
  BPPARAM = SerialParam()
)

\S4method{geometricSizeFactors}{SummarizedExperiment}(x, ..., assay.type = "counts")

computeGeometricFactors(x, ...)
}
\arguments{
\item{x}{For \code{geometricSizeFactors}, a numeric matrix of counts with one row per feature and one column per cell.
Alternatively, a \linkS4class{SummarizedExperiment} or \linkS4class{SingleCellExperiment} containing such counts.

For \code{computeGeometricFactors}, only a \linkS4class{SingleCellExperiment} containing a count matrix is accepted.}

\item{...}{For the \code{geometricSizeFactors} generic, arguments to pass to specific methods.
For the SummarizedExperiment method, further arguments to pass to the ANY method.

For \code{computeGeometricFactors}, further arguments to pass to \code{geometricSizeFactors}.}

\item{subset.row}{A vector specifying the subset of rows of \code{x} from which the size factors should be computed.
If \code{NULL} (the default), all rows are used.}

\item{pseudo.count}{Numeric scalar specifying the pseudo-count to add during log-transformation.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object indicating how calculations are to be parallelized.
Only relevant when \code{x} is a \linkS4class{DelayedArray} object.}

\item{assay.type}{String or integer scalar indicating the assay of \code{x} containing the counts.}
}
\value{
For \code{geometricSizeFactors}, a numeric vector of size factors is returned for all methods.

For \code{computeGeometricFactors}, \code{x} is returned containing the size factors in \code{\link{sizeFactors}(x)}.
}
\description{
Define per-cell size factors from the geometric mean of counts per cell.
}
\details{
The geometric mean provides an alternative measure of the average coverage per cell,
in contrast to the library size factors (i.e., the arithmetic mean) computed by \code{\link{librarySizeFactors}}.
The main advantage of the geometric mean is that it is more robust to large outliers, because the log-transform increases only slowly at large values.
In the normalization context, this translates to greater resistance to composition biases from a few strongly upregulated genes.

On the other hand, the geometric mean is a poor estimator of the relative bias at low or zero counts.
This is because the scaling of the coverage applies to the expectation of the raw counts, 
so the geometric mean only becomes an accurate estimator if the mean of the logs approaches the log of the mean (usually at high counts).
The arbitrary pseudo-count also has a bigger influence at low counts.
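
The argument above can be sketched as follows, using notation introduced here purely for illustration (it is not taken from the package source).
Let \eqn{Y_{gc}}{Y_gc} be the count for feature \eqn{g} in cell \eqn{c}, with expectation \eqn{s_c \mu_g}{s_c * mu_g} for a cell-specific scaling \eqn{s_c}, and let \eqn{p} be the pseudo-count.
By Jensen's inequality, the expected log-count never exceeds the log of the expected count:
\deqn{E[\log(Y_{gc} + p)] \le \log(s_c \mu_g + p).}{E[log(Y_gc + p)] <= log(s_c * mu_g + p).}
When \eqn{s_c \mu_g}{s_c * mu_g} is large, the gap in this inequality is typically small and \eqn{p} is negligible, so that, averaging over \eqn{G} features,
\deqn{\frac{1}{G} \sum_{g=1}^{G} E[\log(Y_{gc} + p)] \approx \log s_c + \frac{1}{G} \sum_{g=1}^{G} \log \mu_g.}{mean_g E[log(Y_gc + p)] ~ log(s_c) + mean_g log(mu_g).}
The second term does not depend on the cell, so the exponentiated mean of the logs is approximately proportional to \eqn{s_c}.
At low counts neither approximation holds, and the estimate depends on both the size of the Jensen gap and the choice of \eqn{p}.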

As such, the geometric mean is only well-suited for deeply sequenced features, e.g., antibody-derived tags.
}
\examples{
example_sce <- mockSCE()
summary(geometricSizeFactors(example_sce))
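
# Illustrative sketch in base R of the robustness argument in the Details.
# This is NOT the package's internal code: it assumes that a geometric size
# factor is the exponentiated mean of log(count + pseudo-count), scaled to
# unit mean across cells. The matrix 'toy' and all names below are hypothetical.
toy <- cbind(
    cell1 = c(10, 20, 30, 40),
    cell2 = c(10, 20, 30, 4000) # one strongly upregulated feature
)
pseudo <- 1

# Geometric mean per cell, computed on the log scale.
geo <- exp(colMeans(log(toy + pseudo)))
geo.sf <- geo / mean(geo)

# Library sizes (proportional to the arithmetic mean) for comparison.
lib <- colSums(toy)
lib.sf <- lib / mean(lib)

# The outlier inflates the cell2/cell1 ratio far less for the geometric mean.
geo.sf[["cell2"]] / geo.sf[["cell1"]]
lib.sf[["cell2"]] / lib.sf[["cell1"]]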
}
\seealso{
\code{\link{normalizeCounts}} and \code{\link{logNormCounts}}, where these size factors can be used for normalization.

\code{\link{librarySizeFactors}} and \code{\link{medianSizeFactors}},
for two other simple methods of computing size factors.
}
\author{
Aaron Lun
}