File: glm_gp_impl.Rd

Package: r-bioc-glmgampoi 1.2.0+dfsg-6 (Debian bullseye, main)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/glm_gp_impl.R
\name{glm_gp_impl}
\alias{glm_gp_impl}
\title{Internal Function to Fit a Gamma-Poisson GLM}
\usage{
glm_gp_impl(
  Y,
  model_matrix,
  offset = 0,
  size_factors = c("normed_sum", "deconvolution", "poscounts"),
  overdispersion = TRUE,
  overdispersion_shrinkage = TRUE,
  do_cox_reid_adjustment = TRUE,
  subsample = FALSE,
  verbose = FALSE
)
}
\arguments{
\item{Y}{any matrix-like object (e.g. \code{matrix()}, \code{DelayedArray()}, \code{HDF5Matrix()}) with
one column per sample and one row per gene.}

\item{model_matrix}{a numeric matrix with one row per sample and one column per coefficient
that specifies the experimental design. It can be produced using \code{stats::model.matrix()}.}

\item{offset}{Constant offset in the model in addition to \code{log(size_factors)}. It can
be a single number, a vector of length \code{ncol(Y)}, or a matrix with dimensions
\code{dim(Y)}. Note that if \code{Y} is a \link{DelayedArray} or \link{HDF5Matrix},
\code{offset} must be one as well. Default: \code{0}.}

\item{size_factors}{in large-scale experiments, each sample typically has a different size
(for example, a different sequencing depth). A size factor is an internal mechanism of GLMs to
correct for this effect.\cr
\code{size_factors} is either a numeric vector with positive entries and one entry per column of \code{Y}
that specifies the size factors to use,
or a string that specifies the method used to estimate the size factors
(one of \code{c("normed_sum", "deconvolution", "poscounts")}).
Note that \code{"normed_sum"} and \code{"poscounts"} are fairly
simple methods and can lead to suboptimal results. For the best performance, I recommend using
\code{size_factors = "deconvolution"}, which calls \code{scran::calculateSumFactors()}. However, you need
to separately install the \code{scran} package from Bioconductor for this method to work.
Also note that \code{size_factors = 1} and \code{size_factors = FALSE} are equivalent. If only a single gene is given,
no size factor is estimated (i.e., \code{size_factors = 1}). Default: \code{"normed_sum"}.}

\item{overdispersion}{the simplest count model is the Poisson model. However, the Poisson model
assumes that \eqn{variance = mean}. For many applications this is too rigid; the Gamma-Poisson model
allows a more flexible mean-variance relation (\eqn{variance = mean + mean^2 * overdispersion}). \cr
\code{overdispersion} can either be
\itemize{
\item a single boolean that indicates if an overdispersion is estimated for each gene.
\item a numeric vector of length \code{nrow(Y)} fixing the overdispersion to those values.
\item the string \code{"global"} to indicate that one overdispersion is fit across all genes.
}
Note that \code{overdispersion = 0} and \code{overdispersion = FALSE} are equivalent and both reduce
the Gamma-Poisson to the classical Poisson model. Default: \code{TRUE}.}

\item{overdispersion_shrinkage}{the overdispersion can be difficult to estimate with few replicates. To
improve the overdispersion estimates, we can share information across genes and shrink each individual
overdispersion estimate towards a global overdispersion estimate. However, empirical studies show that
the overdispersion varies with the mean expression level (lower expression level => higher
overdispersion). If \code{overdispersion_shrinkage = TRUE}, a median trend of the overdispersion versus the expression level is
fit and used to estimate the variances of a quasi-Gamma-Poisson model (Lund et al. 2012). Default: \code{TRUE}.}

\item{do_cox_reid_adjustment}{the classical maximum likelihood estimator of the \code{overdispersion} is biased
towards small values. McCarthy \emph{et al.} (2012) showed that it is preferable to optimize the Cox-Reid
adjusted profile likelihood.\cr
\code{do_cox_reid_adjustment} can be either \code{TRUE} or \code{FALSE} to indicate if the adjustment is
added during the optimization of the \code{overdispersion} parameter. Default: \code{TRUE}.}

\item{subsample}{the estimation of the overdispersion is the slowest step when fitting
a Gamma-Poisson GLM. For datasets with many samples, the estimation can be considerably sped up
without losing much precision by fitting the overdispersion only on a random subset of the samples.
Default: \code{FALSE}, which means that the data are not subsampled. If set to \code{TRUE}, at most 1,000 samples
are considered. If set to a number, it specifies how many samples are considered
for each gene to estimate the overdispersion.}

\item{verbose}{a boolean that indicates if information about the individual steps is printed
while fitting the GLM. Default: \code{FALSE}.}
}
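The base-R sketch below illustrates two quantities described in the arguments: size factors obtained by normalizing the column sums of \code{Y}, and the Gamma-Poisson mean-variance relation. It assumes that \code{"normed_sum"} rescales the column sums to a geometric mean of 1 and that every column has a positive total; the helpers \code{normed_sum_size_factors()} and \code{gp_variance()} are hypothetical and not part of glmGamPoi, whose actual implementation may differ.

```r
# Hypothetical helper (not a glmGamPoi function): size factors from column sums,
# assuming they are rescaled to have a geometric mean of 1.
normed_sum_size_factors <- function(Y) {
  sf <- colSums(Y)
  if (any(sf <= 0)) stop("every column of Y needs a positive total count")
  sf / exp(mean(log(sf)))
}

# Gamma-Poisson mean-variance relation: variance = mean + mean^2 * overdispersion.
# With overdispersion = 0 it reduces to the Poisson relation variance = mean.
gp_variance <- function(mu, overdispersion) {
  mu + mu^2 * overdispersion
}

# Two genes (rows) measured in four samples (columns)
Y <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8), nrow = 2, byrow = TRUE)
sf <- normed_sum_size_factors(Y)
sf / sf[1]                                        # 1 2 3 4, proportional to the column sums
gp_variance(mu = 10, overdispersion = c(0, 0.5))  # 10 60
```

When the design contains an intercept, a common scale factor on all size factors is absorbed by the intercept, so the normalization to a geometric mean of 1 mainly keeps the values easy to interpret.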
\value{
a list with four elements
\itemize{
\item \code{Beta} the coefficient matrix
\item \code{overdispersion} the vector with the estimated overdispersions
\item \code{Mu} a matrix with the corresponding means for each gene
and sample
\item \code{size_factors} a vector with the size factor for each
sample
}
}
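To make the relation between the returned elements concrete, the sketch below recomputes a mean matrix from \code{Beta}, the model matrix, the size factors and the offset. It assumes the usual log-link model log(Mu[g, j]) = model_matrix[j, ] %*% Beta[g, ] + log(size_factors[j]) + offset, with \code{Beta} stored as one row per gene and one column per coefficient; \code{reconstruct_mu()} is a hypothetical helper, not a glmGamPoi function.

```r
# Hypothetical helper (not a glmGamPoi function). Assumed model:
# log(Mu[g, j]) = model_matrix[j, ] %*% Beta[g, ] + log(size_factors[j]) + offset
reconstruct_mu <- function(Beta, model_matrix, size_factors, offset = 0) {
  # Linear predictor with one row per gene and one column per sample
  eta <- Beta %*% t(model_matrix)
  # Add the log size factor of each sample to its column
  eta <- sweep(eta, 2, log(size_factors), "+")
  if (is.null(dim(offset)) && length(offset) == ncol(eta) && length(offset) > 1) {
    eta <- sweep(eta, 2, offset, "+")  # one offset per sample
  } else {
    eta <- eta + offset                # a single number or a genes x samples matrix
  }
  exp(eta)
}

design <- data.frame(group = factor(c("a", "a", "b", "b")))
X <- model.matrix(~ group, data = design)
# One gene: baseline mean 10 in group "a", twofold higher in group "b"
Beta <- matrix(c(log(10), log(2)), nrow = 1)
Mu <- reconstruct_mu(Beta, X, size_factors = c(1, 2, 1, 2))
Mu  # values: 10 20 20 40
```

Passing \code{offset = log(c(1, 2, 1, 2))} together with \code{size_factors = rep(1, 4)} gives the same means, which illustrates that the offset enters the model on the same log scale as the size factors.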
\description{
Internal Function to Fit a Gamma-Poisson GLM
}
\seealso{
\code{\link[=glm_gp]{glm_gp()}} and \code{\link[=overdispersion_mle]{overdispersion_mle()}}
}
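As a rough illustration of what the overdispersion estimation involves, the sketch below computes a plain maximum likelihood estimate of one gene's overdispersion when its means are treated as known, using base R's \code{dnbinom()} (whose \code{size} parameter equals 1 / overdispersion) and \code{optimize()}. It omits the Cox-Reid adjustment, shrinkage and subsampling, and \code{overdispersion_mle_sketch()} is a hypothetical helper, not the implementation behind \code{overdispersion_mle()}.

```r
# Hypothetical helper (not glmGamPoi's overdispersion_mle()): plain maximum
# likelihood estimate of a single gene's overdispersion with known means.
# No Cox-Reid adjustment, shrinkage or subsampling is applied.
overdispersion_mle_sketch <- function(y, mu, log_interval = c(-10, 5)) {
  # Gamma-Poisson log-likelihood as a function of log(overdispersion);
  # dnbinom() parameterizes the distribution by size = 1 / overdispersion
  loglik <- function(log_theta) {
    sum(dnbinom(y, size = 1 / exp(log_theta), mu = mu, log = TRUE))
  }
  opt <- optimize(loglik, interval = log_interval, maximum = TRUE)
  exp(opt$maximum)
}

set.seed(1)
mu <- rep(10, 5000)
# Simulated counts with true overdispersion 0.5, i.e. variance = 10 + 10^2 * 0.5 = 60
y <- rnbinom(length(mu), size = 1 / 0.5, mu = mu)
theta_hat <- overdispersion_mle_sketch(y, mu)
theta_hat  # close to 0.5
```

Because this plain estimator is biased towards small values when there are few samples, as noted for \code{do_cox_reid_adjustment}, its output should not be expected to match glmGamPoi's default estimates on small datasets.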
\keyword{internal}