1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
|
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methods.R
\name{estimateSizeFactors,DESeqDataSet-method}
\alias{estimateSizeFactors,DESeqDataSet-method}
\title{Estimate the size factors for a \code{\link{DESeqDataSet}}}
\usage{
\S4method{estimateSizeFactors}{DESeqDataSet}(
object,
type = c("ratio", "poscounts", "iterate"),
locfunc = stats::median,
geoMeans,
controlGenes,
normMatrix,
quiet = FALSE
)
}
\arguments{
\item{object}{a DESeqDataSet}
\item{type}{Method for estimation: either "ratio", "poscounts", or "iterate".
"ratio" uses the standard median ratio method introduced in DESeq. The size factor is the
median ratio of the sample over a "pseudosample": for each gene, the geometric mean
of all samples.
"poscounts" and "iterate" offer alternative estimators, which can be
used even when all genes contain a sample with a zero (a problem for the
default method, as the geometric mean becomes zero, and the ratio undefined).
The "poscounts" estimator deals with a gene with some zeros, by calculating a
modified geometric mean by taking the n-th root of the product of the non-zero counts.
This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples.
The "iterate" estimator iterates between estimating the dispersion with a design of ~1, and
finding a size factor vector by numerically optimizing the likelihood
of the ~1 model.}
\item{locfunc}{a function to compute a location for a sample. By default, the
median is used. However, especially for low counts, the
\code{\link[genefilter]{shorth}} function from the genefilter package may give better results.}
\item{geoMeans}{by default this is not provided and the
geometric means of the counts are calculated within the function.
A vector of geometric means from another count matrix can be provided
for a "frozen" size factor calculation. The size factors will be
scaled to have a geometric mean of 1 when supplying \code{geoMeans}.}
\item{controlGenes}{optional, numeric or logical index vector specifying those genes to
use for size factor estimation (e.g. housekeeping or spike-in genes)}
\item{normMatrix}{optional, a matrix of normalization factors which do not yet
control for library size. Note that this argument should not be used (and
will be ignored) if the \code{dds} object was created using \code{tximport}.
In this case, the information in \code{assays(dds)[["avgTxLength"]]}
is automatically used to create appropriate normalization factors.
Providing \code{normMatrix} will estimate size factors on the
count matrix divided by \code{normMatrix} and store the product of the
size factors and \code{normMatrix} as \code{\link{normalizationFactors}}.
It is recommended to divide out the row-wise geometric mean of
\code{normMatrix} so the rows roughly are centered on 1.}
\item{quiet}{whether to print messages}
}
\value{
The DESeqDataSet passed as parameters, with the size factors filled
in.
}
\description{
This function estimates the size factors using the
"median ratio method" described by Equation 5 in Anders and Huber (2010).
The estimated size factors can be accessed using the accessor function \code{\link{sizeFactors}}.
Alternative library size estimators can also be supplied
using the assignment function \code{\link{sizeFactors<-}}.
}
\details{
Typically, the function is called with the idiom:
\code{dds <- estimateSizeFactors(dds)}
See \code{\link{DESeq}} for a description of the use of size factors in the GLM.
One should call this function after \code{\link{DESeqDataSet}}
unless size factors are manually specified with \code{\link{sizeFactors}}.
Alternatively, gene-specific normalization factors for each sample can be provided using
\code{\link{normalizationFactors}} which will always preempt \code{\link{sizeFactors}}
in calculations.
Internally, the function calls \code{\link{estimateSizeFactorsForMatrix}},
which provides more details on the calculation.
}
\examples{
dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
dds <- estimateSizeFactors(dds, controlGenes=1:200)
m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]
geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds,geoMeans=geoMeans)
sizeFactors(dds)
}
\references{
Reference for the median ratio method:
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data.
Genome Biology 2010, 11:106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106}
}
\seealso{
\code{\link{estimateSizeFactorsForMatrix}}
}
\author{
Simon Anders
}
|