File: posterior_predict.stanreg.Rd

r-cran-rstanarm 2.21.1-1
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/posterior_predict.R
\name{posterior_predict.stanreg}
\alias{posterior_predict.stanreg}
\alias{posterior_predict}
\alias{posterior_predict.stanmvreg}
\title{Draw from posterior predictive distribution}
\usage{
\method{posterior_predict}{stanreg}(
  object,
  newdata = NULL,
  draws = NULL,
  re.form = NULL,
  fun = NULL,
  seed = NULL,
  offset = NULL,
  ...
)

\method{posterior_predict}{stanmvreg}(
  object,
  m = 1,
  newdata = NULL,
  draws = NULL,
  re.form = NULL,
  fun = NULL,
  seed = NULL,
  ...
)
}
\arguments{
\item{object}{A fitted model object returned by one of the 
\pkg{rstanarm} modeling functions. See \code{\link{stanreg-objects}}.}

\item{newdata}{Optionally, a data frame in which to look for variables with
which to predict. If omitted, the model matrix is used. If \code{newdata}
is provided and any variables were transformed (e.g. rescaled) in the data
used to fit the model, then these variables must also be transformed in
\code{newdata}. This only applies if variables were transformed before
passing the data to one of the modeling functions and \emph{not} if
transformations were specified inside the model formula. Also see the Note
section below about using the \code{newdata} argument with binomial
models.}

\item{draws}{An integer indicating the number of draws to return. The default
and maximum number of draws is the size of the posterior sample.}

\item{re.form}{If \code{object} contains \code{\link[=stan_glmer]{group-level}}
parameters, a formula indicating which group-level parameters to
condition on when making predictions. \code{re.form} is specified in the
same form as for \code{\link[lme4]{predict.merMod}}. The default,
\code{NULL}, indicates that all estimated group-level parameters are
conditioned on. To refrain from conditioning on any group-level parameters,
specify \code{NA} or \code{~0}. The \code{newdata} argument may include new
\emph{levels} of the grouping factors that were specified when the model
was estimated, in which case the resulting posterior predictions
marginalize over the relevant variables.}

\item{fun}{An optional function to apply to the results. \code{fun} is found
by a call to \code{\link{match.fun}} and so can be specified as a function
object, a string naming a function, etc.}

\item{seed}{An optional \code{\link[=set.seed]{seed}} to use.}

\item{offset}{A vector of offsets. Only required if \code{newdata} is
specified and an \code{offset} argument was specified when fitting the
model.}

\item{...}{For \code{stanmvreg} objects, argument \code{m} can be specified
to indicate the submodel for which you wish to obtain predictions.}

\item{m}{Integer specifying the number (or a string giving the name) of the
submodel.}
}
\value{
A \code{draws} by \code{nrow(newdata)} matrix of simulations from the
  posterior predictive distribution. Each row of the matrix is a vector of 
  predictions generated using a single draw of the model parameters from the 
  posterior distribution. The returned matrix will also have class
  \code{"ppd"} to indicate it contains draws from the posterior predictive
  distribution.
}
\description{
The posterior predictive distribution is the distribution of the outcome
implied by the model after using the observed data to update our beliefs
about the unknown parameters in the model. Simulating data from the posterior
predictive distribution using the observed predictors is useful for checking
the fit of the model. Drawing from the posterior predictive distribution at
interesting values of the predictors also lets us visualize how a
manipulation of a predictor affects (a function of) the outcome(s). With new
observations of predictor variables we can use the posterior predictive
distribution to generate predicted outcomes.
}
\note{
For binomial models with a number of trials greater than one (i.e., not
  Bernoulli models), if \code{newdata} is specified then it must include all
  variables needed for computing the number of binomial trials to use for the
  predictions. For example, if the left-hand side of the model formula is
  \code{cbind(successes, failures)} then both \code{successes} and
  \code{failures} must be in \code{newdata}. The particular values of
  \code{successes} and \code{failures} in \code{newdata} do not matter so
  long as their sum is the desired number of trials. If the left-hand side of
  the model formula were \code{cbind(successes, trials - successes)} then
  both \code{trials} and \code{successes} would need to be in \code{newdata},
  probably with \code{successes} set to \code{0} and \code{trials} specifying
  the number of trials. See the Examples section below and the
  \emph{How to Use the rstanarm Package} vignette for examples.

For models estimated with \code{\link{stan_clogit}}, the number of 
  successes per stratum is ostensibly fixed by the research design. Thus, when
  doing posterior prediction with new data, the \code{data.frame} passed to
  the \code{newdata} argument must contain an outcome variable and a stratifying
  factor, both with the same names as in the original \code{data.frame}. The
  posterior predictions will then condition on this outcome in the new data.
}
\examples{
if (!exists("example_model")) example(example_model)
yrep <- posterior_predict(example_model)
table(yrep)
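
# A minimal sketch (an illustrative addition, not from the package authors)
# of the 'draws' and 'seed' arguments described above. Reusing the same seed
# should reproduce the same simulations.
yrep_a <- posterior_predict(example_model, draws = 100, seed = 123)
yrep_b <- posterior_predict(example_model, draws = 100, seed = 123)
dim(yrep_a)                # draws by number of observations
identical(yrep_a, yrep_b)  # should be TRUE because the seed is shared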

\donttest{
# Using newdata
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
dat <- data.frame(counts, treatment, outcome)
fit3 <- stan_glm(
  counts ~ outcome + treatment, 
  data = dat,
  family = poisson(link="log"),
  prior = normal(0, 1, autoscale = FALSE), 
  prior_intercept = normal(0, 5, autoscale = FALSE),
  refresh = 0
)
nd <- data.frame(treatment = factor(rep(1,3)), outcome = factor(1:3))
ytilde <- posterior_predict(fit3, nd, draws = 500)
print(dim(ytilde))  # 500 by 3 matrix (draws by nrow(nd))

ytilde <- data.frame(
  count = c(ytilde),
  outcome = rep(nd$outcome, each = 500)
)
ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) +
  ggplot2::geom_boxplot() +
  ggplot2::ylab("predicted count")


# Using newdata with a binomial model.
# example_model is binomial so we need to set
# the number of trials to use for prediction.
# This could be a different number for each
# row of newdata or the same for all rows.
# Here we'll use the same value for all.
nd <- lme4::cbpp
print(formula(example_model))  # cbind(incidence, size - incidence) ~ ...
nd$size <- max(nd$size) + 1L   # number of trials
nd$incidence <- 0  # set to 0 so size - incidence = number of trials
ytilde <- posterior_predict(example_model, newdata = nd)
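
# A sketch of the re.form argument (an illustrative addition): passing
# re.form = NA requests predictions that do not condition on the
# estimated herd-level parameters in example_model.
ytilde_nore <- posterior_predict(example_model, newdata = nd, re.form = NA)
print(dim(ytilde_nore))  # draws by nrow(nd)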


# Using fun argument to transform predictions
mtcars2 <- mtcars
mtcars2$log_mpg <- log(mtcars2$mpg)
fit <- stan_glm(log_mpg ~ wt, data = mtcars2, refresh = 0)
ytilde <- posterior_predict(fit, fun = exp)
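
# A sketch for multivariate models (an illustrative addition, assuming the
# pbcLong example data shipped with rstanarm): 'm' selects the submodel.
# chains and iter are kept small only to limit run time.
fit_mv <- stan_mvmer(
  formula = list(
    logBili ~ year + (1 | id),
    albumin ~ sex + year + (year | id)),
  data = pbcLong,
  chains = 1, cores = 1, seed = 12345, iter = 1000
)
ytilde_m2 <- posterior_predict(fit_mv, m = 2)  # second submodel (albumin)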
}

}
\seealso{
\code{\link{pp_check}} for graphical posterior predictive checks.
  Examples of posterior predictive checking can also be found in the
  \pkg{rstanarm} vignettes and demos.

\code{\link{predictive_error}} and \code{\link{predictive_interval}}.
}