File: estimateSizeFactors.Rd

package info (click to toggle)
r-bioc-deseq2 1.46.0%2Bdfsg-2
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 1,748 kB
  • sloc: cpp: 413; makefile: 2
file content (114 lines) | stat: -rw-r--r-- 4,651 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methods.R
\name{estimateSizeFactors,DESeqDataSet-method}
\alias{estimateSizeFactors,DESeqDataSet-method}
\title{Estimate the size factors for a \code{\link{DESeqDataSet}}}
\usage{
\S4method{estimateSizeFactors}{DESeqDataSet}(
  object,
  type = c("ratio", "poscounts", "iterate"),
  locfunc = stats::median,
  geoMeans,
  controlGenes,
  normMatrix,
  quiet = FALSE
)
}
\arguments{
\item{object}{a DESeqDataSet}

\item{type}{Method for estimation: either "ratio", "poscounts", or "iterate".
"ratio" uses the standard median ratio method introduced in DESeq. The size factor is the
median ratio of the sample over a "pseudosample": for each gene, the geometric mean
of all samples.
"poscounts" and "iterate" offer alternative estimators, which can be
used even when all genes contain a sample with a zero (a problem for the
default method, as the geometric mean becomes zero, and the ratio undefined).
The "poscounts" estimator deals with a gene with some zeros, by calculating a
modified geometric mean by taking the n-th root of the product of the non-zero counts.
This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples.
The "iterate" estimator iterates between estimating the dispersion with a design of ~1, and
finding a size factor vector by numerically optimizing the likelihood
of the ~1 model.}

\item{locfunc}{a function to compute a location for a sample. By default, the
median is used. However, especially for low counts, the
\code{\link[genefilter]{shorth}} function from the genefilter package may give better results.}

\item{geoMeans}{by default this is not provided and the
geometric means of the counts are calculated within the function.
A vector of geometric means from another count matrix can be provided
for a "frozen" size factor calculation. The size factors will be 
scaled to have a geometric mean of 1 when supplying \code{geoMeans}.}

\item{controlGenes}{optional, numeric or logical index vector specifying those genes to
use for size factor estimation (e.g. housekeeping or spike-in genes)}

\item{normMatrix}{optional, a matrix of normalization factors which do not yet
control for library size. Note that this argument should not be used (and
will be ignored) if the \code{dds} object was created using \code{tximport}.
In this case, the information in \code{assays(dds)[["avgTxLength"]]}
is automatically used to create appropriate normalization factors.
Providing \code{normMatrix} will estimate size factors on the
count matrix divided by \code{normMatrix} and store the product of the
size factors and \code{normMatrix} as \code{\link{normalizationFactors}}.
It is recommended to divide out the row-wise geometric mean of
\code{normMatrix} so the rows roughly are centered on 1.}

\item{quiet}{whether to print messages}
}
\value{
The DESeqDataSet passed as parameters, with the size factors filled
in.
}
\description{
This function estimates the size factors using the
"median ratio method" described by Equation 5 in Anders and Huber (2010).
The estimated size factors can be accessed using the accessor function \code{\link{sizeFactors}}.
Alternative library size estimators can also be supplied
using the assignment function \code{\link{sizeFactors<-}}.
}
\details{
Typically, the function is called with the idiom:

\code{dds <- estimateSizeFactors(dds)}

See \code{\link{DESeq}} for a description of the use of size factors in the GLM.
One should call this function after \code{\link{DESeqDataSet}}
unless size factors are manually specified with \code{\link{sizeFactors}}.
Alternatively, gene-specific normalization factors for each sample can be provided using
\code{\link{normalizationFactors}} which will always preempt \code{\link{sizeFactors}}
in calculations.

Internally, the function calls \code{\link{estimateSizeFactorsForMatrix}}, 
which provides more details on the calculation.
}
\examples{

dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

dds <- estimateSizeFactors(dds, controlGenes=1:200)

m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]

geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds,geoMeans=geoMeans)
sizeFactors(dds)

}
\references{
Reference for the median ratio method:

Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data.
Genome Biology 2010, 11:106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106}
}
\seealso{
\code{\link{estimateSizeFactorsForMatrix}}
}
\author{
Simon Anders
}