File: plotting-mcmc-draws.Rmd

package info (click to toggle)
r-cran-bayesplot 1.11.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 7,080 kB
  • sloc: sh: 13; makefile: 2
file content (389 lines) | stat: -rw-r--r-- 12,572 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
---
title: "Plotting MCMC draws using the bayesplot package"
author: "Jonah Gabry"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
params:
  EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
vignette: >
  %\VignetteIndexEntry{Plotting MCMC draws}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, child="children/SETTINGS-knitr.txt"}
```
```{r pkgs, include=FALSE}
library("ggplot2")
library("rstanarm")
```


## Introduction

This vignette focuses on plotting parameter estimates from MCMC draws. MCMC 
diagnostic plots are covered in the separate vignette
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html), 
and graphical posterior predictive model checking is covered in the vignette
[_Graphical posterior predictive checks_](https://mc-stan.org/bayesplot/articles/graphical-ppcs.html).

### Setup

In addition to __bayesplot__ we'll load the following packages: 

* __ggplot2__, in case we want to customize the ggplot objects created by __bayesplot__
* __rstanarm__, for fitting the example models used throughout the vignette

```{r, eval=FALSE}
library("bayesplot")
library("ggplot2")
library("rstanarm")      
```

### Example model

The **bayesplot** package provides various plotting functions for visualizing 
Markov chain Monte Carlo (MCMC) draws from the posterior distribution of the 
parameters of a Bayesian model. In this vignette we demonstrate a few of 
these functions. Example usage of the functions not demonstrated here can be
found in the package documentation.

For demonstration we will use draws obtained using the `stan_glm` function in
the **rstanarm** package (Gabry and Goodrich, 2017), but MCMC draws from using
any package can be used with the functions in the **bayesplot** package. See,
for example, **brms**, which, like **rstanarm**, calls the **rstan** package
internally to use [Stan](https://mc-stan.org/)'s MCMC sampler.

```{r mtcars}
head(mtcars) # see help("mtcars")
```

```{r, eval=FALSE}
# linear regression model using stan_glm
# using '~ .' to include all variables
fit <- stan_glm(mpg ~ ., data = mtcars, seed = 1111)
print(fit)
```

```{r stan_glm, include=FALSE}
fit <- stan_glm(mpg ~ ., data = mtcars, QR = TRUE, seed = 1111)
```

```{r print-fit, echo=FALSE}
print(fit)
```

To use the posterior draws with the functions in the **bayesplot** package
we'll extract them from the fitted model object: 

```{r get-draws}
posterior <- as.array(fit)
dim(posterior)
dimnames(posterior)
```

We've used `as.array` above (as opposed to `as.matrix`) because it keeps the 
Markov chains separate (`stan_glm` runs four chains by default). Most of the
plots don't actually need the chains to be separate, but for a few of the 
plots we make in this vignette we'll want to show the chains individually.


<br>

## Posterior uncertainty intervals

For models fit using MCMC we can compute posterior uncertainty intervals
(sometimes called "credible intervals") in various ways. **bayesplot** currently
provides plots of central intervals based on quantiles, although additional
options may be provided in future releases (e.g., HDIs, which can be useful in
particular cases).

**Documentation:**

* `help("MCMC-intervals")` 
* [mc-stan.org/bayesplot/reference/MCMC-intervals](https://mc-stan.org/bayesplot/reference/MCMC-intervals.html)

------

#### mcmc_intervals, mcmc_areas

Central posterior uncertainty intervals can be plotted using the
`mcmc_intervals` function.

```{r mcmc_intervals}
color_scheme_set("red")
mcmc_intervals(posterior, pars = c("cyl", "drat", "am", "sigma"))
```

The default is to show 50% intervals (the thick segments) and 90% intervals 
(the thinner outer lines). These defaults can be changed using the `prob`
and `prob_outer` arguments, respectively. The points in the above plot 
are posterior medians. The `point_est` argument can be used to select posterior
means instead or to omit the point estimates.

To show the uncertainty intervals as shaded areas under the estimated posterior
density curves we can use the `mcmc_areas` function.

```{r mcmc_areas}
mcmc_areas(
  posterior, 
  pars = c("cyl", "drat", "am", "sigma"),
  prob = 0.8, # 80% intervals
  prob_outer = 0.99, # 99%
  point_est = "mean"
)
```


<br>

## Univariate marginal posterior distributions

**bayesplot** provides functions for looking at histograms or kernel density
estimates of marginal posterior distributions, either with all Markov chains
combined or with the chains separate.

**Documentation:**

* `help("MCMC-distributions")` 
* [mc-stan.org/bayesplot/reference/MCMC-distributions](https://mc-stan.org/bayesplot/reference/MCMC-distributions.html)

------

#### mcmc_hist

The `mcmc_hist` function plots marginal posterior distributions 
(combining all chains):

```{r mcmc_hist, message=FALSE}
color_scheme_set("green")
mcmc_hist(posterior, pars = c("wt", "sigma"))
```

If we want to plot `log(sigma)` rather than `sigma` we can either 
transform the draws in advance or use the `transformations` argument.

```{r mcmc_hist-transform, message=FALSE}
color_scheme_set("blue")
mcmc_hist(posterior, pars = c("wt", "sigma"),
          transformations = list("sigma" = "log"))
```

<!-- ```{r mcmc_hist-transform, message=FALSE, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("blue") -->
<!-- mcmc_hist(posterior, pars = c("wt", "sigma"),  -->
<!--           transformations = list("sigma" = "log")) -->
<!-- ``` -->

Most of the other functions for plotting MCMC draws also have a
`transformations` argument.


#### mcmc_hist_by_chain

To view separate histograms of each of the four Markov chains we can use
`mcmc_hist_by_chain`, which plots each chain in a separate facet in the plot.

```{r mcmc_hist_by_chain, message=FALSE}
color_scheme_set("brightblue")
mcmc_hist_by_chain(posterior, pars = c("wt", "sigma"))
```

#### mcmc_dens

The `mcmc_dens` function is similar to `mcmc_hist` but plots kernel 
density estimates instead of histograms. 

```{r mcmc_dens, message=FALSE}
color_scheme_set("purple")
mcmc_dens(posterior, pars = c("wt", "sigma"))
```


#### mcmc_dens_overlay

Like `mcmc_hist_by_chain`, the `mcmc_dens_overlay` function separates the Markov
chains. But instead of plotting each chain individually, the density estimates
are overlaid.

```{r mcmc_dens_overlay, message=FALSE}
mcmc_dens_overlay(posterior, pars = c("wt", "sigma"))
```

#### mcmc_violin

The `mcmc_violin` function plots the density estimates of each chain as violins
and draws horizontal line segments at user-specified quantiles.

```{r mcmc_violin}
color_scheme_set("teal")
mcmc_violin(posterior, pars = c("wt", "sigma"), probs = c(0.1, 0.5, 0.9))
```



<br>

## Bivariate plots

Various functions are available for plotting bivariate marginal posterior 
distributions. Some of these functions also take optional arguments 
for adding MCMC diagnostic information to the plots. That additional functionality 
is discussed in the separate [_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette.

**Documentation:**

* `help("MCMC-scatterplots")` 
* [mc-stan.org/bayesplot/reference/MCMC-scatterplots](https://mc-stan.org/bayesplot/reference/MCMC-scatterplots.html)

------

#### mcmc_scatter

The `mcmc_scatter` function creates a simple scatterplot of two parameters.

```{r mcmc_scatter}
color_scheme_set("gray")
mcmc_scatter(posterior, pars = c("(Intercept)", "wt"), 
             size = 1.5, alpha = 0.5)
```

#### mcmc_hex

The `mcmc_hex` function creates a similar plot but using hexagonal binning,
which can be useful to avoid overplotting.

```{r mcmc_hex}
# requires hexbin package
if (requireNamespace("hexbin", quietly = TRUE)) {
  mcmc_hex(posterior, pars = c("(Intercept)", "wt"))
}
```

#### mcmc_pairs

In addition to `mcmc_scatter` and `mcmc_hex`, __bayesplot__ now provides
an `mcmc_pairs` function for creating pairs plots with more than two parameters.

```{r mcmc_pairs, message=FALSE}
color_scheme_set("pink")
mcmc_pairs(posterior, pars = c("(Intercept)", "wt", "sigma"),
           off_diag_args = list(size = 1.5))
```

<!-- ```{r mcmc_pairs, message=FALSE, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("pink") -->
<!-- mcmc_pairs(posterior, pars = c("(Intercept)", "wt", "sigma"), -->
<!--            off_diag_args = list(size = 1.5)) -->
<!-- ``` -->

The univariate marginal posteriors are shown along the diagonal as histograms, 
but this can be changed to densities by setting `diag_fun="dens"`. Bivariate
plots are displayed above and below the diagonal as scatterplots, but it is also
possible to use hex plots by setting `off_diag_fun="hex"`. By default,
`mcmc_pairs` shows some of the Markov chains (half, if an even number of chains)
above the diagonal and the others below. There are many more options for 
controlling how the draws should be split between the plots above and below the
diagonal (see the documentation for the `condition` argument), but they are more
useful when MCMC diagnostic information is included. This is discussed in the 
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette.


<br>

## Trace plots 

Trace plots are time series plots of Markov chains. In this vignette 
we show the standard trace plots that **bayesplot** can make. For models
fit using any Stan interface (or Hamiltonian Monte Carlo in general), the 
[_Visual MCMC diagnostics_](https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html)
vignette provides an example of also adding information about divergences
to trace plots.

**Documentation:**

* `help("MCMC-traces")` 
* [mc-stan.org/bayesplot/reference/MCMC-traces](https://mc-stan.org/bayesplot/reference/MCMC-traces.html)

-------

#### mcmc_trace

The `mcmc_trace` function creates standard trace plots:

```{r mcmc_trace}
color_scheme_set("blue")
mcmc_trace(posterior, pars = c("wt", "sigma"))
```

If it's hard to see the difference between the chains we can 
change to a mixed color scheme, for example:

```{r change-scheme}
color_scheme_set("mix-blue-red")
mcmc_trace(posterior, pars = c("wt", "sigma"), 
           facet_args = list(ncol = 1, strip.position = "left"))
```

The code above also illustrates the use of the `facet_args` argument, which is a
list of parameters passed to `facet_wrap` in __ggplot2__. Specifying `ncol=1`
means the trace plots will be stacked in a single column rather than placed side
by side, and `strip.position="left"` moves the facet labels to the y-axis
(instead of above each facet).

The [`"viridis"` color scheme](https://CRAN.R-project.org/package=viridis) is
also useful for trace plots because it is comprised of very distinct colors:

```{r viridis-scheme, eval=FALSE}
color_scheme_set("viridis")
mcmc_trace(posterior, pars = "(Intercept)")
```

<!-- ```{r viridis-scheme, eval=FALSE} -->
<!-- # not evaluated to reduce vignette size for CRAN -->
<!-- # full version available at mc-stan.org/bayesplot/articles -->
<!-- color_scheme_set("viridis") -->
<!-- mcmc_trace(posterior, pars = "(Intercept)") -->
<!-- ``` -->

#### mcmc_trace_highlight

The `mcmc_trace_highlight` function uses points instead of lines and reduces the
opacity of all but a single chain (which is specified using the `highlight`
argument).

```{r mcmc_trace_highlight}
mcmc_trace_highlight(posterior, pars = "sigma", highlight = 3)
```


<br>

## References

Gabry, J., and Goodrich, B. (2017). rstanarm: Bayesian Applied Regression 
Modeling via Stan. R package version 2.15.3. 
https://mc-stan.org/rstanarm/, https://CRAN.R-project.org/package=rstanarm

Gabry, J., Simpson, D., Vehtari, A., Betancourt, M. and Gelman, A. (2019),
Visualization in Bayesian workflow. _J. R. Stat. Soc. A_, 182: 389-402. 
\doi:10.1111/rssa.12378. ([journal version](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12378), 
[arXiv preprint](https://arxiv.org/abs/1709.01449), 
[code on GitHub](https://github.com/jgabry/bayes-vis-paper))
<a id="gabry2019"></a>

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and
Rubin, D. B. (2013). *Bayesian Data Analysis*. Chapman & Hall/CRC Press, London,
third edition.

Stan Development Team. (2017). *Stan Modeling Language Users
Guide and Reference Manual*. https://mc-stan.org/users/documentation/