---
title: "Summary of Regression Models as HTML Table"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Summary of Regression Models as HTML Table}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE)
if (!requireNamespace("sjlabelled", quietly = TRUE) ||
    !requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("lme4", quietly = TRUE) ||
    !requireNamespace("pscl", quietly = TRUE) ||
    !requireNamespace("glmmTMB", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  knitr::opts_chunk$set(eval = TRUE)
  library(sjPlot)
}
```
`tab_model()` is the counterpart to `plot_model()`. Instead of creating plots, `tab_model()` creates HTML tables that are displayed either in your IDE's viewer pane, in a web browser or in a knitr-markdown document (like this vignette).

HTML is the only output format; you can't (directly) create LaTeX or PDF output from `tab_model()` and the related table functions. However, the tables can easily be exported to Microsoft Word or LibreOffice Writer.

This vignette shows how to create tables from regression models with `tab_model()`. There's a dedicated vignette that demonstrates how to change the [table layout and appearance with CSS](table_css.html).

**Note!** Due to the custom CSS, the layout of the table inside a knitr-document differs from the output in the viewer-pane and web browser!
```{r}
# load package
library(sjPlot)
library(sjmisc)
library(sjlabelled)
# sample data
data("efc")
efc <- as_factor(efc, c161sex, c172code)
```
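The export mentioned in the introduction can be sketched as follows. This is a minimal example, assuming that the object returned by `tab_model()` carries the complete HTML page in its `page.complete` element; the model `m_export` and the file name are hypothetical and only serve as illustration. The resulting file can then be opened in a web browser or a word processor. `tab_model()` also has a `file` argument for saving the output.

```{r}
library(sjPlot)

# hypothetical example model, fitted to a built-in data set
m_export <- lm(mpg ~ wt + hp, data = mtcars)

# assign the result instead of printing it, so no viewer is opened
tab <- tab_model(m_export)

# write the complete HTML page to a file (here: in a temporary directory)
out_file <- file.path(tempdir(), "model_table.html")
writeLines(tab$page.complete, out_file)
```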
## A simple HTML table from regression results
First, we fit two linear models to demonstrate the `tab_model()`-function.
```{r, results='hide'}
m1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
m2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + e17age, data = efc)
```
The simplest way of producing the table output is by passing the fitted model as an argument. By default, estimates, confidence intervals (_CI_) and p-values (_p_) are reported. As a summary, the number of observations as well as the R-squared values are shown.
```{r}
tab_model(m1)
```
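The quantities in the default table correspond to values that can also be extracted with base R. The following is only a rough orientation, not a description of how `tab_model()` computes them internally; the model `m_base` is a hypothetical example.

```{r}
# hypothetical example model, fitted to a built-in data set
m_base <- lm(mpg ~ wt + hp, data = mtcars)

coef(summary(m_base))          # estimates, standard errors, t-statistics, p-values
confint(m_base, level = 0.95)  # confidence intervals
nobs(m_base)                   # number of observations
summary(m_base)$r.squared      # R-squared
summary(m_base)$adj.r.squared  # adjusted R-squared
```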
## Automatic labelling
As the **sjPlot** package supports [labelled data](https://strengejacke.github.io/sjlabelled/), the coefficients in the table are already labelled in this example. The name of the dependent variable is used as the main column header for each model. For non-labelled data, the coefficient names are shown.
```{r}
data(mtcars)
m.mtcars <- lm(mpg ~ cyl + hp + wt, data = mtcars)
tab_model(m.mtcars)
```
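The labels shown for the `efc` models are stored as attributes of the data. As a small sketch, they can be inspected with `get_label()` and `get_labels()` from **sjlabelled**. The names `tmp_env` and `efc_copy` are hypothetical; a separate copy of the data is loaded so that the `efc` object used in the other examples stays untouched.

```{r}
library(sjlabelled)

# load a fresh copy of the sample data into its own environment
tmp_env <- new.env()
data("efc", package = "sjPlot", envir = tmp_env)
efc_copy <- tmp_env$efc

# variable labels
get_label(efc_copy$c160age)
get_label(efc_copy$barthtot)

# value labels of a categorical variable
get_labels(efc_copy$c172code)
```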
If factors are involved and `auto.label = TRUE`, "pretty" parameter names are used (see [`format_parameters()`](https://easystats.github.io/parameters/reference/format_parameters.html)).
```{r}
set.seed(2)
dat <- data.frame(
  y = runif(100, 0, 100),
  drug = as.factor(sample(c("nonsense", "useful", "placebo"), 100, TRUE)),
  group = as.factor(sample(c("control", "treatment"), 100, TRUE))
)
pretty_names <- lm(y ~ drug * group, data = dat)
tab_model(pretty_names)
```
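To see what "pretty" names look like without building a table, `format_parameters()` from the **parameters** package can be called directly. This is a small sketch with a hypothetical model `m_iris`, fitted to a built-in data set.

```{r}
# hypothetical example model with a factor-by-numeric interaction
m_iris <- lm(Sepal.Length ~ Species * Petal.Width, data = iris)

# raw coefficient names
names(coef(m_iris))

# "pretty" names, one per coefficient
parameters::format_parameters(m_iris)
```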
### Turn off automatic labelling
To turn off automatic labelling, use `auto.label = FALSE`, or provide an empty character vector for `pred.labels` and `dv.labels`.
```{r}
tab_model(m1, auto.label = FALSE)
```
Same for models with non-labelled data and factors.
```{r}
tab_model(pretty_names, auto.label = FALSE)
```
## More than one model
`tab_model()` can print multiple models at once, which are then printed side-by-side. Identical coefficients are matched in a row.
```{r}
tab_model(m1, m2)
```
## Generalized linear models
For generalized linear models, the output is slightly adapted. Instead of _Estimates_, the column is named _Odds Ratios_, _Incidence Rate Ratios_ etc., depending on the model. In this case, the coefficients are automatically converted (exponentiated). Furthermore, pseudo R-squared statistics are shown in the summary.
```{r}
m3 <- glm(
  tot_sc_e ~ c160age + c12hour + c161sex + c172code,
  data = efc,
  family = poisson(link = "log")
)
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m4 <- glm(
  neg_c_7d ~ c161sex + barthtot + c172code,
  data = efc,
  family = binomial(link = "logit")
)
tab_model(m3, m4)
```
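The conversion described above is a plain exponentiation of the coefficients. A minimal base R sketch, using a hypothetical model `m_pois` fitted to a built-in data set, looks like this. The Wald intervals from `confint.default()` are an assumption made for illustration; `tab_model()` may compute its confidence intervals differently.

```{r}
# hypothetical Poisson model, fitted to a built-in data set
m_pois <- glm(
  breaks ~ wool + tension,
  data = warpbreaks,
  family = poisson(link = "log")
)

# coefficients on the linear (log) scale
coef(m_pois)

# exponentiated coefficients: incidence rate ratios
exp(coef(m_pois))

# Wald confidence intervals, exponentiated in the same way
exp(confint.default(m_pois))
```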
### Untransformed estimates on the linear scale
To show the estimates on the linear scale, use `transform = NULL`.
```{r}
tab_model(m3, m4, transform = NULL, auto.label = FALSE)
```
## More complex models
Other models, like hurdle- or zero-inflated models, also work with `tab_model()`. In this case, the zero inflation model is indicated in the table. Use `show.zeroinf = FALSE` to hide this part from the table.
```{r}
library(pscl)
data("bioChemists")
m5 <- zeroinfl(art ~ fem + mar + kid5 + ment | kid5 + phd + ment, data = bioChemists)
tab_model(m5)
```
You can combine models of any type in one table.
```{r}
tab_model(m1, m3, m5, auto.label = FALSE, show.ci = FALSE)
```
## Show or hide further columns
`tab_model()` has several arguments that show or hide specific columns of the output:
* `show.est` to show/hide the column with model estimates.
* `show.ci` to show/hide the column with confidence intervals.
* `show.se` to show/hide the column with standard errors.
* `show.std` to show/hide the column with standardized estimates (and their standard errors).
* `show.p` to show/hide the column with p-values.
* `show.stat` to show/hide the column with the coefficients' test statistics.
* `show.df` for linear mixed models: when p-values are based on degrees of freedom with Kenward-Roger approximation, these degrees of freedom are shown.
### Adding columns
In the following example, standard errors, standardized coefficients and test statistics are also shown.
```{r}
tab_model(m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE)
```
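As a reminder of what a standardized coefficient is, the sketch below uses one common definition for linear models with numeric predictors: refit the model with z-standardized variables, or equivalently rescale each raw slope by `sd(x) / sd(y)`. The models `m_raw` and `m_std` are hypothetical examples, and the method used by `tab_model()` may differ, for instance for factors.

```{r}
m_raw <- lm(mpg ~ wt + hp, data = mtcars)

# refit the model with all variables z-standardized
m_std <- lm(scale(mpg) ~ scale(wt) + scale(hp), data = mtcars)

# equivalent: rescale each raw slope by sd(x) / sd(y)
b <- coef(m_raw)[c("wt", "hp")]
b_rescaled <- b * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

b_rescaled
coef(m_std)[-1]  # slopes of the refitted model, for comparison
```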
### Removing columns
In the following example, default columns are removed.
```{r}
tab_model(m3, m4, show.ci = FALSE, show.p = FALSE, auto.label = FALSE)
```
### Removing and sorting columns
Another way to remove columns, which also allows you to reorder them, is the `col.order` argument. This is a character vector, where each element indicates a column in the output. The value `"est"`, for instance, indicates the estimates, while `"std.est"` is the column for standardized estimates, and so on.

By default, `col.order` contains all possible columns. All columns that should be shown (see previous tables, for example using `show.se = TRUE` to show standard errors, or `show.std = TRUE` to show standardized estimates) are then printed by default. Columns that are _excluded_ from `col.order` are _not shown_, no matter if the `show*`-arguments are `TRUE` or `FALSE`. So if `show.se = TRUE`, but `col.order` does not contain the element `"se"`, standard errors are not shown. Conversely, if `show.est = FALSE`, the column with estimates is not shown even if `col.order` _does include_ the element `"est"`.

In summary, `col.order` can be used to _exclude_ columns from the table and to change the order of columns.
```{r}
tab_model(
  m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE,
  col.order = c("p", "stat", "est", "std.se", "se", "std.est")
)
```
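The interplay between the `show.*` arguments and `col.order` boils down to one rule: a column appears only if it is switched on and listed in `col.order`, and the order follows `col.order`. The helper `visible_columns()` below is hypothetical and not part of **sjPlot**; it merely mimics this rule for illustration.

```{r}
# hypothetical helper, NOT part of sjPlot: mimics the rule described above
visible_columns <- function(col.order, show_flags) {
  # keep the order given in `col.order`, drop columns that are switched off
  col.order[col.order %in% names(show_flags)[show_flags]]
}

show_flags <- c(est = TRUE, se = TRUE, ci = FALSE, p = TRUE)

# "se" is switched on but not listed, "ci" is listed but switched off
visible_columns(col.order = c("p", "est", "ci"), show_flags = show_flags)
```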
### Collapsing columns
With `collapse.ci` and `collapse.se`, the columns for confidence intervals and standard errors can be collapsed into one column together with the estimates. Sometimes this table layout is required.
```{r}
tab_model(m1, collapse.ci = TRUE)
```
## Defining own labels
There are different options to change the labels of the column headers or coefficients, e.g. with:
* `pred.labels` to change the names of the coefficients in the _Predictors_ column. Note that the length of `pred.labels` must exactly match the number of predictors in the _Predictors_ column.
* `dv.labels` to change the names of the model columns, which are by default labelled with the variable labels or names of the dependent variables.
* Furthermore, there are various `string.*`-arguments to change the names of the column headings.
```{r}
tab_model(
  m1, m2,
  pred.labels = c("Intercept", "Age (Carer)", "Hours per Week", "Gender (Carer)",
                  "Education: middle (Carer)", "Education: high (Carer)",
                  "Age (Older Person)"),
  dv.labels = c("First Model", "M2"),
  string.pred = "Coefficient",
  string.ci = "Conf. Int (95%)",
  string.p = "P-Value"
)
```
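When several models are combined, it helps to count the rows of the _Predictors_ column before writing an unnamed `pred.labels` vector. The sketch below assumes that the rows follow the coefficients in order of first appearance across the models, as in the example above; the models `m_a` and `m_b` are hypothetical.

```{r}
m_a <- lm(mpg ~ wt + hp, data = mtcars)
m_b <- lm(qsec ~ wt + hp + disp, data = mtcars)

# coefficients of both models, in order of first appearance
predictors <- union(names(coef(m_a)), names(coef(m_b)))
predictors

# number of labels an unnamed `pred.labels` vector would need
length(predictors)
```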
## Including reference level of categorical predictors
By default, for categorical predictors, the variable names and the categories for regression coefficients are shown in the table output.
```{r}
library(glmmTMB)
data("Salamanders")
model <- glm(
  count ~ spp + Wtemp + mined + cover,
  family = poisson(),
  data = Salamanders
)
tab_model(model)
```
You can include the reference level for categorical predictors by setting `show.reflvl = TRUE`.
```{r}
tab_model(model, show.reflvl = TRUE)
```
To show variable names, categories and include the reference level, also set `prefix.labels = "varname"`.
```{r}
tab_model(model, show.reflvl = TRUE, prefix.labels = "varname")
```
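Which category counts as the reference level is decided when the model is fitted, not by `tab_model()`. With R's default treatment contrasts, it is the first level of the factor. A short base R sketch, with the hypothetical objects `m_ref`, `wb` and `m_ref2`:

```{r}
# the first level ("L") is the reference level
levels(warpbreaks$tension)

m_ref <- glm(breaks ~ tension, data = warpbreaks, family = poisson())
names(coef(m_ref))  # no coefficient for the reference level

# choose another reference level before fitting the model
wb <- warpbreaks
wb$tension <- relevel(wb$tension, ref = "M")
m_ref2 <- glm(breaks ~ tension, data = wb, family = poisson())
names(coef(m_ref2))
```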
## Style of p-values
You can change the style of how p-values are displayed with the argument `p.style`. With `p.style = "stars"`, the p-values are indicated as `*` in the table.
```{r}
tab_model(m1, m2, p.style = "stars")
```
Another option would be scientific notation, using `p.style = "scientific"`, which also can be combined with `digits.p`.
```{r}
tab_model(m1, m2, p.style = "scientific", digits.p = 2)
```
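Both styles are simple transformations of the numeric p-values. The base R sketch below illustrates the idea, assuming the conventional thresholds 0.05, 0.01 and 0.001 for the asterisks; the vector `p` is made up for illustration, and the code is not taken from `tab_model()`.

```{r}
# made-up p-values
p <- c(0.2, 0.03, 0.004, 0.0001)

# asterisks: "*" below 0.05, "**" below 0.01, "***" below 0.001
stars <- cut(
  p,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("***", "**", "*", ""),
  right = FALSE,
  include.lowest = TRUE
)
as.character(stars)

# scientific notation with two decimal places in the mantissa
formatC(p, format = "e", digits = 2)
```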
### Automatic matching for named vectors
Another way to easily assign labels is to use _named vectors_. In this case, it doesn't matter if `pred.labels` has more labels than coefficients in the model(s), or in which order the labels are passed to `tab_model()`. The only requirement is that the names of the labels equal the coefficient names as they appear in the `summary()`-output.
```{r}
# example, coefficients are "c161sex2" or "c172code3"
summary(m1)
pl <- c(
  `(Intercept)` = "Intercept",
  e17age = "Age (Older Person)",
  c160age = "Age (Carer)",
  c12hour = "Hours per Week",
  barthtot = "Barthel-Index",
  c161sex2 = "Gender (Carer)",
  c172code2 = "Education: middle (Carer)",
  c172code3 = "Education: high (Carer)",
  a_non_used_label = "We don't care"
)
tab_model(
  m1, m2, m3, m4,
  pred.labels = pl,
  dv.labels = c("Model1", "Model2", "Model3", "Model4"),
  show.ci = FALSE,
  show.p = FALSE,
  transform = NULL
)
```
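Matching by name can be pictured as a lookup in a named vector. The base R sketch below shows the idea with a hypothetical model `m_named` and a hypothetical label vector `lookup`; it illustrates the principle and is not the code `tab_model()` uses.

```{r}
m_named <- lm(mpg ~ wt + hp, data = mtcars)

# hypothetical labels: different order, plus one label without a match
lookup <- c(
  hp = "Horsepower",
  unused = "Not in the model",
  wt = "Weight",
  `(Intercept)` = "Intercept"
)

# pick the labels by coefficient name; order and extra entries don't matter
matched <- unname(lookup[names(coef(m_named))])
matched
```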
## Keep or remove coefficients from the table
Using the `terms`- or `rm.terms`-argument allows us to explicitly show or remove specific coefficients from the table output.
```{r}
tab_model(m1, terms = c("c160age", "c12hour"))
```
Note that the names of terms to keep or remove should match the coefficient names. For categorical predictors, one example would be:
```{r}
tab_model(m1, rm.terms = c("c172code2", "c161sex2"))
```
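If you are unsure how a coefficient is named, `names(coef())` lists the names that `terms` and `rm.terms` have to match. A small sketch with a hypothetical model `m_terms`:

```{r}
m_terms <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# coefficient names of the model
coef_names <- names(coef(m_terms))
coef_names

# all coefficients that belong to the categorical predictor
factor_terms <- grep("^factor\\(cyl\\)", coef_names, value = TRUE)
factor_terms
```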