File: cAIC.Rd

package info (click to toggle)
r-cran-caic4 1.0-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, forky, sid, trixie
  • size: 356 kB
  • sloc: makefile: 2
file content (188 lines) | stat: -rw-r--r-- 8,067 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cAIC.R
\name{cAIC}
\alias{cAIC}
\title{Conditional Akaike Information for 'lme4' and 'lme'}
\usage{
cAIC(object, method = NULL, B = NULL, sigma.penalty = 1, analytic = TRUE)
}
\arguments{
\item{object}{An object of class merMod either fitted by
\code{\link[lme4]{lmer}} or \code{\link[lme4]{glmer}} of the lme4-package
or an \code{\link[nlme]{lme}} object fro the nlme-package.
Also objects returned form a \code{\link[gamm4]{gamm4}} call are possible.}

\item{method}{Either \code{"conditionalBootstrap"} for the estimation of the
degrees of freedom with the help of conditional Bootstrap or
\code{"steinian"} for analytical representations based on Stein type
formulas. The default is \code{NULL}. In this case the method is choosen
automatically based on the \code{family} argument of the
\code{(g)lmer}-object. For \code{"gaussian"} and \code{"poisson"} this is
the Steinian type estimator, for all others it is the conditional Bootstrap.
For models from the nlme package, only \code{\link[nlme]{lme}} objects, i.e.,
with gaussian response are supported.}

\item{B}{Number of Bootstrap replications. The default is \code{NULL}. Then
B is the minimum of 100 and the length of the response vector.}

\item{sigma.penalty}{An integer value for additional penalization in the analytic 
Gaussian calculation to account for estimated variance components in the residual (co-)variance. 
Per default \code{sigma.penalty} is equal \code{1}, corresponding to a diagonal error 
covariance matrix with only one estimated parameter (sigma). If 
all variance components are known, the value should be set to \code{0}. 
For individual weights (individual variances), this value should be set
to the number of estimated weights. For \code{\link[nlme]{lme}} objects
the penalty term is automatically set by extracting the number of estimated
variance components.}

\item{analytic}{FALSE if the numeric hessian of the (restricted) marginal
log-likelihood from the lmer optimization procedure should be used.
Otherwise (default) TRUE, i.e.  use a analytical version that has to be
computed. Only used for the analytical version of Gaussian responses.}
}
\value{
A \code{cAIC} object, which is a list consisting of: 
1. the conditional log likelihood, i.e. the log likelihood with the random 
effects as penalized parameters; 2. the estimated degrees of freedom; 
3. a list element that is either \code{NULL}
if no new model was fitted otherwise the new (reduced) model, see details;
4. a boolean variable indicating whether a new model was fitted or not; 5.
the estimator of the conditional Akaike information, i.e. minus twice the
log likelihood plus twice the degrees of freedom.
}
\description{
Estimates the conditional Akaike information for models that were fitted in
'lme4' or with 'lme'. Currently all distributions are supported for 'lme4' models, 
based on parametric conditional bootstrap. 
For the Gaussian distribution (from a \code{\link[lme4]{lmer}} or \code{\link[nlme]{lme}}
call) and the Poisson distribution analytical estimators for the degrees of
freedom are available, based on Stein type formulas. Also the conditional
Akaike information for generalized additive models based on a fit via the
'gamm4' or \code{\link[mgcv]{gamm}} calls from the 'mgcv' package can be estimated.
A hands-on tutorial for the package can be found at \url{https://arxiv.org/abs/1803.05664}.
}
\details{
For \code{method = "steinian"} and an object of class \code{merMod} computed
the analytic representation of the corrected conditional AIC in Greven and
Kneib (2010). This is based on a the Stein formula and uses implicit
differentiation to calculate the derivative of the random effects covariance
parameters w.r.t.  the data. The code is adapted form the one provided in
the supplementary material of the paper by Greven and Kneib (2010). The
supplied \code{\link[lme4]{merMod}} model needs to be checked if a random
effects covariance parameter has an optimum on the boundary, i.e. is zero.
And if so the model needs to be refitted with the according random effect
terms omitted. This is also done by the function and the refitted model is
also returned. Notice that the \code{boundary.tol} argument in
\code{\link[lme4]{lmerControl}} has an impact on whether a parameter is
estimated to lie on the boundary of the parameter space. For estimated error
variance the degrees of freedom are increased by one per default. 
\code{sigma.penalty} can be set manually for \code{\link[lme4]{merMod}} models 
if no (0) or more than one variance component (>1) has been estimated. For 
\code{\link[nlme]{lme}} objects this value is automatically defined.

If the object is of class \code{\link[lme4]{merMod}} and has \code{family =
"poisson"} there is also an analytic representation of the conditional AIC
based on the Chen-Stein formula, see for instance Saefken et. al (2014). For
the calculation the model needs to be refitted for each observed response
variable minus the number of response variables that are exactly zero. The
calculation therefore takes longer then for models with Gaussian responses.
Due to the speed and stability of 'lme4' this is still possible, also for
larger datasets.

If the model has Bernoulli distributed responses and \code{method =
"steinian"}, \code{\link{cAIC}} calculates the degrees of freedom based on a
proposed estimator by Efron (2004). This estimator is asymptotically
unbiased if the estimated conditional mean is consistent. The calculation
needs as many model refits as there are data points.

Another more general method for the estimation of the degrees of freedom is
the conditional bootstrap. This is proposed in Efron (2004). For the B
boostrap samples the degrees of freedom are estimated by \deqn{\frac{1}{B -
1}\sum_{i=1}^n\theta_i(z_i)(z_i-\bar{z}),} where \eqn{\theta_i(z_i)} is the
i-th element of the estimated natural parameter.

For models with no random effects, i.e. (g)lms, the \code{\link{cAIC}}
function returns the AIC of the model with scale parameter estimated by REML.
}
\section{WARNINGS }{
 Currently the cAIC can only be estimated for
\code{family} equal to \code{"gaussian"}, \code{"poisson"} and
\code{"binomial"}. Neither negative binomial nor gamma distributed responses
are available. 
Weighted Gaussian models are not yet implemented.
}

\examples{

### Three application examples
b <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
cAIC(b)

b2 <- lmer(Reaction ~ (1 | Days) + (1 | Subject), sleepstudy)
cAIC(b2)

b2ML <- lmer(Reaction ~ (1 + Days | Subject), sleepstudy, REML = FALSE)
cAIC(b2ML)

### Demonstration of boundary case
\dontrun{
set.seed(2017-1-1)
n <- 50
beta <- 2
x <- rnorm(n)
eta <- x*beta
id <- gl(5,10)
epsvar <- 1
data <- data.frame(x = x, id = id)
y_wo_bi <- eta + rnorm(n, 0, sd = epsvar) 

# use a very small RE variance
ranvar <- 0.05
nrExperiments <- 100

sim <- sapply(1:nrExperiments, function(j){

b_i <- scale(rnorm(5, 0, ranvar), scale = FALSE)
y <- y_wo_bi + model.matrix(~ -1 + id) \%*\% b_i
data$y <- y

mixedmod <- lmer(y ~ x + (1 | id), data = data)
linmod <- lm(y ~ x, data = data)

c(cAIC(mixedmod)$caic, cAIC(linmod)$caic)
})

rownames(sim) <- c("mixed model", "linear model")

boxplot(t(sim))


}



}
\references{
Saefken, B., Ruegamer, D., Kneib, T. and Greven, S. (2021):
Conditional Model Selection in Mixed-Effects Models with cAIC4.
<doi:10.18637/jss.v099.i08>

Saefken, B., Kneib T., van Waveren C.-S. and Greven, S. (2014) A
unifying approach to the estimation of the conditional Akaike information in
generalized linear mixed models. Electronic Journal Statistics Vol. 8,
201-225.

Greven, S. and Kneib T. (2010) On the behaviour of marginal and conditional
AIC in linear mixed models. Biometrika 97(4), 773-789.

Efron , B. (2004) The estimation of prediction error. J. Amer. Statist. Ass.
99(467), 619-632.
}
\seealso{
\code{\link[lme4]{lme4-package}}, \code{\link[lme4]{lmer}},
\code{\link[lme4]{glmer}}
}
\author{
Benjamin Saefken, David Ruegamer
}
\keyword{regression}