1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389
|
---
title: "Plotting MCMC draws using the bayesplot package"
author: "Jonah Gabry"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 3
params:
EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
vignette: >
%\VignetteIndexEntry{Plotting MCMC draws}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, child="children/SETTINGS-knitr.txt"}
```
```{r pkgs, include=FALSE}
library("ggplot2")
library("rstanarm")
```
## Introduction
This vignette focuses on plotting parameter estimates from MCMC draws. MCMC
diagnostic plots are covered in the separate vignette
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html),
and graphical posterior predictive model checking is covered in the vignette
[_Graphical posterior predictive checks_](https://mc-stan.org/bayesplot/articles/graphical-ppcs.html).
### Setup
In addition to __bayesplot__ we'll load the following packages:
* __ggplot2__, in case we want to customize the ggplot objects created by __bayesplot__
* __rstanarm__, for fitting the example models used throughout the vignette
```{r, eval=FALSE}
library("bayesplot")
library("ggplot2")
library("rstanarm")
```
### Example model
The **bayesplot** package provides various plotting functions for visualizing
Markov chain Monte Carlo (MCMC) draws from the posterior distribution of the
parameters of a Bayesian model. In this vignette we demonstrate a few of
these functions. Example usage of the functions not demonstrated here can be
found in the package documentation.
For demonstration we will use draws obtained using the `stan_glm` function in
the **rstanarm** package (Gabry and Goodrich, 2017), but MCMC draws from using
any package can be used with the functions in the **bayesplot** package. See,
for example, **brms**, which, like **rstanarm**, calls the **rstan** package
internally to use [Stan](https://mc-stan.org/)'s MCMC sampler.
```{r mtcars}
head(mtcars) # see help("mtcars")
```
```{r, eval=FALSE}
# linear regression model using stan_glm
# using '~ .' to include all variables
fit <- stan_glm(mpg ~ ., data = mtcars, seed = 1111)
print(fit)
```
```{r stan_glm, include=FALSE}
fit <- stan_glm(mpg ~ ., data = mtcars, QR = TRUE, seed = 1111)
```
```{r print-fit, echo=FALSE}
print(fit)
```
To use the posterior draws with the functions in the **bayesplot** package
we'll extract them from the fitted model object:
```{r get-draws}
posterior <- as.array(fit)
dim(posterior)
dimnames(posterior)
```
We've used `as.array` above (as opposed to `as.matrix`) because it keeps the
Markov chains separate (`stan_glm` runs four chains by default). Most of the
plots don't actually need the chains to be separate, but for a few of the
plots we make in this vignette we'll want to show the chains individually.
<br>
## Posterior uncertainty intervals
For models fit using MCMC we can compute posterior uncertainty intervals
(sometimes called "credible intervals") in various ways. **bayesplot** currently
provides plots of central intervals based on quantiles, although additional
options may be provided in future releases (e.g., HDIs, which can be useful in
particular cases).
**Documentation:**
* `help("MCMC-intervals")`
* [mc-stan.org/bayesplot/reference/MCMC-intervals](https://mc-stan.org/bayesplot/reference/MCMC-intervals.html)
------
#### mcmc_intervals, mcmc_areas
Central posterior uncertainty intervals can be plotted using the
`mcmc_intervals` function.
```{r mcmc_intervals}
color_scheme_set("red")
mcmc_intervals(posterior, pars = c("cyl", "drat", "am", "sigma"))
```
The default is to show 50% intervals (the thick segments) and 90% intervals
(the thinner outer lines). These defaults can be changed using the `prob`
and `prob_outer` arguments, respectively. The points in the above plot
are posterior medians. The `point_est` argument can be used to select posterior
means instead or to omit the point estimates.
To show the uncertainty intervals as shaded areas under the estimated posterior
density curves we can use the `mcmc_areas` function.
```{r mcmc_areas}
mcmc_areas(
posterior,
pars = c("cyl", "drat", "am", "sigma"),
prob = 0.8, # 80% intervals
prob_outer = 0.99, # 99%
point_est = "mean"
)
```
<br>
## Univariate marginal posterior distributions
**bayesplot** provides functions for looking at histograms or kernel density
estimates of marginal posterior distributions, either with all Markov chains
combined or with the chains separate.
**Documentation:**
* `help("MCMC-distributions")`
* [mc-stan.org/bayesplot/reference/MCMC-distributions](https://mc-stan.org/bayesplot/reference/MCMC-distributions.html)
------
#### mcmc_hist
The `mcmc_hist` function plots marginal posterior distributions
(combining all chains):
```{r mcmc_hist, message=FALSE}
color_scheme_set("green")
mcmc_hist(posterior, pars = c("wt", "sigma"))
```
If we want to plot `log(sigma)` rather than `sigma` we can either
transform the draws in advance or use the `transformations` argument.
```{r mcmc_hist-transform, message=FALSE}
color_scheme_set("blue")
mcmc_hist(posterior, pars = c("wt", "sigma"),
transformations = list("sigma" = "log"))
```
<!-- ```{r mcmc_hist-transform, message=FALSE, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("blue") -->
<!-- mcmc_hist(posterior, pars = c("wt", "sigma"), -->
<!-- transformations = list("sigma" = "log")) -->
<!-- ``` -->
Most of the other functions for plotting MCMC draws also have a
`transformations` argument.
#### mcmc_hist_by_chain
To view separate histograms of each of the four Markov chains we can use
`mcmc_hist_by_chain`, which plots each chain in a separate facet in the plot.
```{r mcmc_hist_by_chain, message=FALSE}
color_scheme_set("brightblue")
mcmc_hist_by_chain(posterior, pars = c("wt", "sigma"))
```
#### mcmc_dens
The `mcmc_dens` function is similar to `mcmc_hist` but plots kernel
density estimates instead of histograms.
```{r mcmc_dens, message=FALSE}
color_scheme_set("purple")
mcmc_dens(posterior, pars = c("wt", "sigma"))
```
#### mcmc_dens_overlay
Like `mcmc_hist_by_chain`, the `mcmc_dens_overlay` function separates the Markov
chains. But instead of plotting each chain individually, the density estimates
are overlaid.
```{r mcmc_dens_overlay, message=FALSE}
mcmc_dens_overlay(posterior, pars = c("wt", "sigma"))
```
#### mcmc_violin
The `mcmc_violin` function plots the density estimates of each chain as violins
and draws horizontal line segments at user-specified quantiles.
```{r mcmc_violin}
color_scheme_set("teal")
mcmc_violin(posterior, pars = c("wt", "sigma"), probs = c(0.1, 0.5, 0.9))
```
<br>
## Bivariate plots
Various functions are available for plotting bivariate marginal posterior
distributions. Some of these functions also take optional arguments
for adding MCMC diagnostic information to the plots. That additional functionality
is discussed in the separate [_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette.
**Documentation:**
* `help("MCMC-scatterplots")`
* [mc-stan.org/bayesplot/reference/MCMC-scatterplots](https://mc-stan.org/bayesplot/reference/MCMC-scatterplots.html)
------
#### mcmc_scatter
The `mcmc_scatter` function creates a simple scatterplot of two parameters.
```{r mcmc_scatter}
color_scheme_set("gray")
mcmc_scatter(posterior, pars = c("(Intercept)", "wt"),
size = 1.5, alpha = 0.5)
```
#### mcmc_hex
The `mcmc_hex` function creates a similar plot but using hexagonal binning,
which can be useful to avoid overplotting.
```{r mcmc_hex}
# requires hexbin package
if (requireNamespace("hexbin", quietly = TRUE)) {
mcmc_hex(posterior, pars = c("(Intercept)", "wt"))
}
```
#### mcmc_pairs
In addition to `mcmc_scatter` and `mcmc_hex`, __bayesplot__ now provides
an `mcmc_pairs` function for creating pairs plots with more than two parameters.
```{r mcmc_pairs, message=FALSE}
color_scheme_set("pink")
mcmc_pairs(posterior, pars = c("(Intercept)", "wt", "sigma"),
off_diag_args = list(size = 1.5))
```
<!-- ```{r mcmc_pairs, message=FALSE, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("pink") -->
<!-- mcmc_pairs(posterior, pars = c("(Intercept)", "wt", "sigma"), -->
<!-- off_diag_args = list(size = 1.5)) -->
<!-- ``` -->
The univariate marginal posteriors are shown along the diagonal as histograms,
but this can be changed to densities by setting `diag_fun="dens"`. Bivariate
plots are displayed above and below the diagonal as scatterplots, but it is also
possible to use hex plots by setting `off_diag_fun="hex"`. By default,
`mcmc_pairs` shows some of the Markov chains (half, if an even number of chains)
above the diagonal and the others below. There are many more options for
controlling how the draws should be split between the plots above and below the
diagonal (see the documentation for the `condition` argument), but they are more
useful when MCMC diagnostic information is included. This is discussed in the
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette.
<br>
## Trace plots
Trace plots are time series plots of Markov chains. In this vignette
we show the standard trace plots that **bayesplot** can make. For models
fit using any Stan interface (or Hamiltonian Monte Carlo in general), the
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette provides an example of also adding information about divergences
to trace plots.
**Documentation:**
* `help("MCMC-traces")`
* [mc-stan.org/bayesplot/reference/MCMC-traces](https://mc-stan.org/bayesplot/reference/MCMC-traces.html)
-------
#### mcmc_trace
The `mcmc_trace` function creates standard trace plots:
```{r mcmc_trace}
color_scheme_set("blue")
mcmc_trace(posterior, pars = c("wt", "sigma"))
```
If it's hard to see the difference between the chains we can
change to a mixed color scheme, for example:
```{r change-scheme}
color_scheme_set("mix-blue-red")
mcmc_trace(posterior, pars = c("wt", "sigma"),
facet_args = list(ncol = 1, strip.position = "left"))
```
The code above also illustrates the use of the `facet_args` argument, which is a
list of parameters passed to `facet_wrap` in __ggplot2__. Specifying `ncol=1`
means the trace plots will be stacked in a single column rather than placed side
by side, and `strip.position="left"` moves the facet labels to the y-axis
(instead of above each facet).
The [`"viridis"` color scheme](https://CRAN.R-project.org/package=viridis) is
also useful for trace plots because it is comprised of very distinct colors:
```{r viridis-scheme, eval=FALSE}
color_scheme_set("viridis")
mcmc_trace(posterior, pars = "(Intercept)")
```
<!-- ```{r viridis-scheme, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("viridis") -->
<!-- mcmc_trace(posterior, pars = "(Intercept)") -->
<!-- ``` -->
#### mcmc_trace_highlight
The `mcmc_trace_highlight` function uses points instead of lines and reduces the
opacity of all but a single chain (which is specified using the `highlight`
argument).
```{r mcmc_trace_highlight}
mcmc_trace_highlight(posterior, pars = "sigma", highlight = 3)
```
<br>
## References
Gabry, J., and Goodrich, B. (2017). rstanarm: Bayesian Applied Regression
Modeling via Stan. R package version 2.15.3.
https://mc-stan.org/rstanarm/, https://CRAN.R-project.org/package=rstanarm
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M. and Gelman, A. (2019),
Visualization in Bayesian workflow. _J. R. Stat. Soc. A_, 182: 389-402.
\doi:10.1111/rssa.12378. ([journal version](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12378),
[arXiv preprint](https://arxiv.org/abs/1709.01449),
[code on GitHub](https://github.com/jgabry/bayes-vis-paper))
<a id="gabry2019"></a>
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and
Rubin, D. B. (2013). *Bayesian Data Analysis*. Chapman & Hall/CRC Press, London,
third edition.
Stan Development Team. (2017). *Stan Modeling Language Users
Guide and Reference Manual*. https://mc-stan.org/users/documentation/
|