1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
|
---
title: "Plotting Estimates (Fixed Effects) of Regression Models"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Plotting Estimates (Fixed Effects) of Regression Models}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", dev = "png", fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE)
if (!requireNamespace("sjlabelled", quietly = TRUE) ||
!requireNamespace("sjmisc", quietly = TRUE) ||
!requireNamespace("haven", quietly = TRUE) ||
!requireNamespace("ggplot2", quietly = TRUE)) {
knitr::opts_chunk$set(eval = FALSE)
} else {
knitr::opts_chunk$set(eval = TRUE)
library(sjPlot)
}
```
This document describes how to plot estimates as forest plots (or dot whisker plots) of various regression models, using the `plot_model()` function. `plot_model()` is a generic plot-function, which accepts many model-objects, like `lm`, `glm`, `lme`, `lmerMod` etc.
`plot_model()` allows to create various plot tyes, which can be defined via the `type`-argument. The default is `type = "fe"`, which means that fixed effects (model coefficients) are plotted. For mixed effects models, only fixed effects are plotted by default as well.
```{r results='hide'}
library(sjPlot)
library(sjlabelled)
library(sjmisc)
library(ggplot2)
data(efc)
theme_set(theme_sjplot())
```
## Fitting a logistic regression model
First, we fit a model that will be used in the following examples. The examples work in the same way for any other model as well.
```{r results='hide'}
# create binary response
y <- ifelse(efc$neg_c_7 < median(na.omit(efc$neg_c_7)), 0, 1)
# create data frame for fitting model
df <- data.frame(
y = to_factor(y),
sex = to_factor(efc$c161sex),
dep = to_factor(efc$e42dep),
barthel = efc$barthtot,
education = to_factor(efc$c172code)
)
# set variable label for response
set_label(df$y) <- "High Negative Impact"
# fit model
m1 <- glm(y ~., data = df, family = binomial(link = "logit"))
```
## Plotting estimates of generalized linear models
The simplest function call is just passing the model object as argument. By default, estimates are sorted in descending order, with the highest effect at the top.
```{r}
plot_model(m1)
```
The "neutral" line, i.e. the vertical intercept that indicates no effect (x-axis position 1 for most glm's and position 0 for most linear models), is drawn slightly thicker than the other grid lines. You can change the line color with the `vline.color`-argument.
```{r}
plot_model(m1, vline.color = "red")
```
## Sorting estimates
By default, the estimates are sorted in the same order as they were introduced into the model. Use `sort.est = TRUE` to sort estimates in descending order, from highest to lowest value.
```{r}
plot_model(m1, sort.est = TRUE)
```
Another way to sort estimates is to use the `order.terms`-argument. This is a numeric vector, indicating the order of estimates in the plot. In the summary, we see that "sex2" is the first term, followed by the three dependency-categories (position 2-4), the Barthel-Index (5) and two levels for intermediate and high level of education (6 and 7).
```{r}
summary(m1)
```
Now we want the educational levels (6 and 7) first, than gender (1), followed by dependency (2-4)and finally the Barthel-Index (5). Use this order as numeric vector for the `order.terms`-argument.
```{r}
plot_model(m1, order.terms = c(6, 7, 1, 2, 3, 4, 5))
```
## Estimates on the untransformed scale
By default, `plot_model()` automatically exponentiates coefficients, if appropriate (e.g. for models with log or logit link). You can explicitley prevent transformation by setting the `transform`-argument to `NULL`, or apply any transformation by using a character vector with the function name.
```{r}
plot_model(m1, transform = NULL)
plot_model(m1, transform = "plogis")
```
## Showing value labels
By default, just the dots and error bars are plotted. Use `show.values = TRUE` to show the value labels with the estimates values, and use `show.p = FALSE` to suppress the asterisks that indicate the significance level of the p-values. Use `value.offset` to adjust the relative positioning of value labels to the dots and lines.
```{r}
plot_model(m1, show.values = TRUE, value.offset = .3)
```
## Labelling the plot
As seen in the above examples, by default, the plotting-functions of **sjPlot** retrieve value and variable labels if the data is _labelled_, using the [sjlabelled-package](https://cran.r-project.org/package=sjlabelled). If the data is not labelled, the variable names are used. In such cases, use the arguments `title`, `axis.labels` and `axis.title` to annotate the plot title and axes. If you want variable names instead of labels, even for labelled data, use `""` as argument-value, e.g. `axis.labels = ""`, or set `auto.label` to `FALSE`.
Furthermore, `plot_model()` applies case-conversion to all labels by default, using the [snakecase-package](https://cran.r-project.org/package=snakecase). This converts labels into human-readable versions. Use `case = NULL` to turn case-conversion off, or refer to the package-vignette of the **snakecase**-package for further options.
```{r}
data(iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
# variable names as labels, but made "human readable"
# separating dots are removed
plot_model(m2)
# to use variable names even for labelled data
plot_model(m1, axis.labels = "", title = "my own title")
```
## Pick or remove specific terms from plot
Use `terms` resp. `rm.terms` to select specific terms that should (not) be plotted.
```{r}
# keep only coefficients sex2, dep2 and dep3
plot_model(m1, terms = c("sex2", "dep2", "dep3"))
# remove coefficients sex2, dep2 and dep3
plot_model(m1, rm.terms = c("sex2", "dep2", "dep3"))
```
## Standardized estimates
For linear models, you can also plot standardized beta coefficients, using `type = "std"` or `type = "std2"`. These two options differ in the way how coefficients are standardized. `type = "std2"` plots standardized beta values, however, standardization follows Gelman's (2008) suggestion, rescaling the estimates by dividing them by two standard deviations instead of just one.
```{r}
plot_model(m2, type = "std")
```
## Bayesian models (fitted with Stan)
`plot_model()` also supports stan-models fitted with the **rstanarm** or **brms** packages. However, there are a few differences compared to the previous plot examples.
First, of course, there are no _confidence intervals_, but _uncertainty intervals_ - high density intervals, to be precise.
Second, there's not just one interval range, but an _inner_ and _outer_ probability. By default, the inner probability is fixed to `.5` (50%), while the outer probability is specified via `ci.lvl` (which defaults to `.89` (89%) for Bayesian models). However, you can also use the arguments `prob.inner` and `prob.outer` to define the intervals boundaries.
Third, the point estimate is by default the _median_, but can also be another value, like mean. This can be specified with the `bpe`-argument.
```{r results='hide'}
if (require("rstanarm", quietly = TRUE)) {
# make sure we apply a nice theme
library(ggplot2)
theme_set(theme_sjplot())
data(mtcars)
m <- stan_glm(mpg ~ wt + am + cyl + gear, data = mtcars, chains = 1)
# default model
plot_model(m)
# same model, with mean point estimate, dot-style for point estimate
# and different inner/outer probabilities of the HDI
plot_model(
m,
bpe = "mean",
bpe.style = "dot",
prob.inner = .4,
prob.outer = .8
)
}
```
## Tweaking plot appearance
There are several options to customize the plot appearance:
* The `colors`-argument either takes the name of a valid [colorbrewer palette](https://colorbrewer2.org/) (see also the related [vignette](custplot.html)), `"bw"` or `"gs"` for black/white or greyscaled colors, or a string with a color name.
* `value.offset` and `value.size` adjust the positioning and size of value labels, if shown.
* `dot.size` and `line.size` change the size of dots and error bars.
* `vline.color` changes the neutral "intercept" line.
* `width`, `alpha` and `scale` are passed down to certain ggplot-geoms, like `geom_errorbar()` or `geom_density_ridges()`.
```{r}
plot_model(
m1,
colors = "Accent",
show.values = TRUE,
value.offset = .4,
value.size = 4,
dot.size = 3,
line.size = 1.5,
vline.color = "blue",
width = 1.5
)
```
# References
Gelman A (2008) _Scaling regression inputs by dividing by two standard deviations._ Statistics in Medicine 27: 2865–2873.
|