File: simulate.merMod.Rd

package info (click to toggle)
lme4 2.0-1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 6,860 kB
  • sloc: cpp: 2,543; makefile: 2
file content (153 lines) | stat: -rw-r--r-- 8,021 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
\name{simulate.merMod}
\title{Simulate Responses From \code{\linkS4class{merMod}} Object}
\alias{simulate.merMod}
\alias{.simulateFun}
\description{
  Simulate responses from a \code{"merMod"} fitted model object, i.e.,
  from the model represented by it.
}
\usage{
\method{simulate}{merMod}(object, nsim = 1, seed = NULL,
	 use.u = FALSE, re.form = NA,
	 newdata=NULL, newparams=NULL, family=NULL, cluster.rand=rnorm,
	 allow.new.levels = FALSE, na.action = na.pass, \dots)

.simulateFun(object, nsim = 1, seed = NULL, use.u = FALSE,
             re.form = NA,
             newdata=NULL, newparams=NULL,
             formula=NULL, family=NULL, 
             cluster.rand=rnorm,
             weights=NULL, offset=NULL,
             allow.new.levels = FALSE, na.action = na.pass,
             cond.sim = TRUE, \dots)
}
\arguments{
  \item{object}{(for \code{simulate.merMod}) a fitted model object or
    (for \code{simulate.formula}) a (one-sided) mixed model formula, as
    described for \code{\link{lmer}}.}
  \item{nsim}{positive integer scalar - the number of responses to simulate.}
  \item{seed}{an optional seed to be used in \code{\link{set.seed}}
    immediately before the simulation so as to generate a reproducible sample.}
  \item{use.u}{(logical) if \code{TRUE}, generate a simulation
    conditional on the current random-effects estimates; if \code{FALSE}
    generate new Normally distributed random-effects values. (Redundant
    with \code{re.form}, which is preferred: \code{TRUE} corresponds to
    \code{re.form = NULL} (condition on all random effects), while
    \code{FALSE} corresponds to \code{re.form = ~0} (condition on none
    of the random effects).)}
  \item{re.form}{formula for random effects to condition on.  If
    \code{NULL}, condition on all random effects; if \code{NA} or \code{~0},
    condition on no random effects.  See Details.}
  \item{newdata}{data frame for which to evaluate predictions.}
  \item{newparams}{new parameters to use in evaluating predictions,
    specified as in the \code{start} parameter for \code{\link{lmer}} or
    \code{\link{glmer}} -- a list with components \code{theta} and
    \code{beta} and (for LMMs or GLMMs that estimate a scale parameter)
    \code{sigma}}
  \item{formula}{a (one-sided) mixed model formula, as described for
    \code{\link{lmer}}.}
  \item{family}{a GLM family, as in \code{\link{glmer}}.}
  \item{cluster.rand}{Function that generates \emph{standardized} random cluster effects.  The function takes one
    argument, the number of random values to generate.}
  \item{weights}{prior weights, as in \code{\link{lmer}} or
    \code{\link{glmer}}.}
  \item{offset}{offset, as in \code{\link{glmer}}.}
  \item{allow.new.levels}{(logical) if FALSE (default), then any new
    levels (or \code{NA} values) detected in \code{newdata} will trigger an
    error; if TRUE, then the prediction will use the unconditional
    (population-level) values for data with previously unobserved levels
    (or \code{NA}s).}
  \item{na.action}{what to do with \code{NA} values in new data: see
    \code{\link{na.fail}}}
  \item{cond.sim}{(experimental) simulate the conditional
    distribution?  if \code{FALSE}, simulate only random effects; do not
    simulate from the conditional distribution, rather return the
    predicted group-level values}
  \item{\dots}{optional additional arguments (none are used in
    \code{.simulateFormula})}
}
\seealso{
  \code{\link{bootMer}} for \dQuote{simulestimate}, i.e., where each
  simulation is followed by refitting the model.
}
\details{
  \itemize{
    \item{ordinarily \code{simulate} is used to generate new
      values from an existing, fitted model (\code{merMod} object):
      however, if \code{formula}, \code{newdata}, and \code{newparams} are
      specified, \code{simulate} generates the appropriate model
      structure to simulate from. \code{formula} must be a
      \emph{one-sided} formula (i.e. with an empty left-hand side);
      in general, if \code{f} is a two-sided
      formula, \code{f[-2]} can be used to drop the LHS.}

    \item{The \code{re.form} argument allows the user to specify how the
      random effects are incorporated in the simulation.  All of the
      random effects terms included in \code{re.form} will be
      \emph{conditioned on} - that is, the conditional modes of those
      random effects will be included in the deterministic part of the
      simulation. (If new levels are used (and \code{allow.new.levels}
      is \code{TRUE}), the conditional modes for these levels will be
      set to the population mode, i.e. values of zero will be used for
      the random effects.)  Conversely, the random effect terms that are
      \emph{not} included in \code{re.form} will be \emph{simulated
      from} - that is, new values will be chosen for each group based on
      the estimated random-effects variances.

      The default behaviour (using \code{re.form=NA}) is to condition on
      none of the random effects, simulating new values for all of the
      random effects.

    }
    \item{For Gaussian fits, \code{sigma} specifies the residual
      standard deviation; for Gamma fits, it specifies the square 
      root of the dispersion parameter. The shape
      parameter can be calculated by \code{1/sigma(fitted_model)^2}.  
      For negative binomial fits, the overdispersion parameter is 
      specified via the family, e.g. 
      \code{simulate(..., family=negative.binomial(theta=1.5))}.
    }
    \item{For binomial models, \code{simulate.formula} looks for the
      binomial size first in the \code{weights} argument (if it's supplied),
      second from the left-hand side of the formula (if the formula has been
      specified in success/failure form), and defaults to 1 if neither of
      those have been supplied.
      Simulated responses will be given as proportions, unless the supplied
      formula has a matrix-valued left-hand side, in which case they will be
      given in matrix form. If a left-hand side is given, variables in that
      expression must be available in \code{newdata}.
    }
    \item{For negative binomial models, use the \code{negative.binomial}
      family (from the \CRANpkg{MASS} package)
      and specify the overdispersion parameter via the
      \code{theta} (sic) parameter of the family function, e.g.
      \code{simulate(...,family=negative.binomial(theta=1))} to simulate
      from a geometric distribution (negative binomial with
      overdispersion parameter 1).
    }
    \item{\code{cluster.rand} allows one to test effects of departures from normality for the distribution of cluster random effects, for example by using a heavy-tailed distribution or a mixture distribution.  One can also use a truncated normal to investigate true or false flagging rates; in that case the generated effects will not have a mean of 0 and standard deviation of 1; they are still considered standardized in that the simulation will multiply them by the estimated standard deviation of the cluster random effects.}
}
}
\examples{
## test whether fitted models are consistent with the
##  observed number of zeros in CBPP data set:
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gg <- simulate(gm1,1000)
zeros <- sapply(gg,function(x) sum(x[,"incidence"]==0))
plot(table(zeros))
abline(v=sum(cbpp$incidence==0),col=2)
##
## simulate from a non-fitted model; in this case we are just
## replicating the previous model, but starting from scratch
params <- list(theta=0.5,beta=c(2,-1,-2,-3))
simdat <- with(cbpp,expand.grid(herd=levels(herd),period=factor(1:4)))
simdat$size <- 15
simdat$incidence <- sample(0:1,size=nrow(simdat),replace=TRUE)
form <- formula(gm1)[-2]  ## RHS of equation only
simulate(form,newdata=simdat,family=binomial,
    newparams=params)
## simulate from negative binomial distribution instead
simulate(form,newdata=simdat,family=negative.binomial(theta=2.5),
    newparams=params)
}