File: tab_model_estimates.Rmd

---
title: "Summary of Regression Models as HTML Table"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Summary of Regression Models as HTML Table}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE)

if (!requireNamespace("sjlabelled", quietly = TRUE) ||
    !requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("lme4", quietly = TRUE) ||
    !requireNamespace("pscl", quietly = TRUE) ||
    !requireNamespace("glmmTMB", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  knitr::opts_chunk$set(eval = TRUE)
  library(sjPlot)
}

```

`tab_model()` is the counterpart to `plot_model()`. Instead of creating plots, `tab_model()` creates HTML tables that are displayed in your IDE's viewer pane, in a web browser or in a knitr-markdown document (like this vignette).

HTML is the only output format; you can't (directly) create LaTeX or PDF output from `tab_model()` and related table functions. However, the tables can easily be exported to Microsoft Word or LibreOffice Writer.
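As a rough sketch of the export route: `tab_model()` accepts a `file` argument that names a destination for the HTML output. The model, the data set (`mtcars`) and the file name `model_table.html` below are placeholders chosen for illustration, and the chunk is not evaluated, so building this vignette does not write any files.

```{r, eval = FALSE}
library(sjPlot)

# illustrative model on a built-in data set (not part of the examples below)
fit_export <- lm(mpg ~ wt + hp, data = mtcars)

# save the table as an HTML file instead of only displaying it;
# "model_table.html" is a hypothetical file name
tab_model(fit_export, file = "model_table.html")
```

A word processor that can read HTML should then be able to open the saved file.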

This vignette shows how to create tables from regression models with `tab_model()`. There's a dedicated vignette that demonstrates how to change the [table layout and appearance with CSS](table_css.html).

**Note!** Due to the custom CSS, the layout of the table inside a knitr-document differs from the output in the viewer-pane and web browser!

```{r}
# load package
library(sjPlot)
library(sjmisc)
library(sjlabelled)

# sample data
data("efc")
efc <- as_factor(efc, c161sex, c172code)
```

## A simple HTML table from regression results

First, we fit two linear models to demonstrate the `tab_model()`-function.

```{r, results='hide'}
m1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
m2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + e17age, data = efc)
``` 

The simplest way of producing the table output is to pass the fitted model as an argument. By default, estimates, confidence intervals (_CI_) and p-values (_p_) are reported. The summary below the coefficients shows the number of observations and the R-squared values.

```{r}
tab_model(m1)
```

## Automatic labelling

Because the **sjPlot** package supports [labelled data](https://strengejacke.github.io/sjlabelled/), the coefficients in the table are already labelled in this example. The name of the dependent variable is used as the main column header for each model. For non-labelled data, the coefficient names are shown.

```{r}
data(mtcars)
m.mtcars <- lm(mpg ~ cyl + hp + wt, data = mtcars)
tab_model(m.mtcars)
```

If factors are involved and `auto.label = TRUE`, "pretty" parameter names are used (see [`format_parameters()`](https://easystats.github.io/parameters/reference/format_parameters.html)).

```{r}
set.seed(2)
dat <- data.frame(
  y = runif(100, 0, 100),
  drug = as.factor(sample(c("nonsense", "useful", "placebo"), 100, TRUE)),
  group = as.factor(sample(c("control", "treatment"), 100, TRUE))
)

pretty_names <- lm(y ~ drug * group, data = dat)
tab_model(pretty_names)
```

### Turn off automatic labelling

To turn off automatic labelling, use `auto.label = FALSE`, or provide an empty character vector for `pred.labels` and `dv.labels`.

```{r}
tab_model(m1, auto.label = FALSE)
```

The same applies to models with non-labelled data and factors.

```{r}
tab_model(pretty_names, auto.label = FALSE)
```

## More than one model

`tab_model()` can print multiple models at once, which are then shown side by side. Identical coefficients are matched in the same row.

```{r}
tab_model(m1, m2)
```

## Generalized linear models

For generalized linear models, the output is slightly adapted. Instead of _Estimates_, the column is named _Odds Ratios_, _Incidence Rate Ratios_ etc., depending on the model. In this case, the coefficients are automatically transformed (exponentiated). Furthermore, pseudo R-squared statistics are shown in the summary.

```{r}
m3 <- glm(
  tot_sc_e ~ c160age + c12hour + c161sex + c172code, 
  data = efc,
  family = poisson(link = "log")
)

efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m4 <- glm(
  neg_c_7d ~ c161sex + barthtot + c172code,
  data = efc,
  family = binomial(link = "logit")
)

tab_model(m3, m4)
``` 
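The exponentiation described above can be reproduced by hand. The following sketch fits a small logistic regression on `mtcars` (an illustrative stand-in, not one of the `efc` models) and compares the coefficients on the link scale with their exponentiated counterparts, which should correspond to the values in the _Odds Ratios_ column.

```{r}
library(sjPlot)

# illustrative logistic regression on a built-in data set
fit_logit <- glm(vs ~ mpg, data = mtcars, family = binomial(link = "logit"))

# coefficients on the link (log-odds) scale, as reported by summary()
coef(fit_logit)

# exponentiated coefficients, i.e. odds ratios
exp(coef(fit_logit))

# table with the transformed estimates
tab_model(fit_logit)
```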

### Untransformed estimates on the linear scale

To show the estimates on the linear scale, use `transform = NULL`.

```{r}
tab_model(m3, m4, transform = NULL, auto.label = FALSE)
``` 

## More complex models

Other models, like hurdle- or zero-inflated models, also work with `tab_model()`. In this case, the zero inflation model is indicated in the table. Use `show.zeroinf = FALSE` to hide this part from the table.

```{r}
library(pscl)
data("bioChemists")
m5 <- zeroinfl(art ~ fem + mar + kid5 + ment | kid5 + phd + ment, data = bioChemists)

tab_model(m5)
```

You can combine any models in one table.

```{r}
tab_model(m1, m3, m5, auto.label = FALSE, show.ci = FALSE)
```

## Show or hide further columns

`tab_model()` has several arguments that allow you to show or hide specific columns in the output:

* `show.est` to show/hide the column with model estimates.
* `show.ci` to show/hide the column with confidence intervals.
* `show.se` to show/hide the column with standard errors.
* `show.std` to show/hide the column with standardized estimates (and their standard errors).
* `show.p` to show/hide the column with p-values.
* `show.stat` to show/hide the column with the coefficients' test statistics.
* `show.df` to show/hide the column with degrees of freedom. For linear mixed models whose p-values are based on degrees of freedom with the Kenward-Roger approximation, these degrees of freedom are shown.

### Adding columns

In the following example, standard errors, standardized coefficients and test statistics are also shown.

```{r}
tab_model(m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE)
``` 

### Removing columns

In the following example, default columns are removed.

```{r}
tab_model(m3, m4, show.ci = FALSE, show.p = FALSE, auto.label = FALSE)
``` 

### Removing and sorting columns

Another way to remove columns, which also allows you to reorder them, is the `col.order` argument. This is a character vector in which each element indicates a column in the output. The value `"est"`, for instance, indicates the estimates, while `"std.est"` is the column for standardized estimates, and so on.

By default, `col.order` contains all possible columns. All columns that should be shown (see previous tables, for example using `show.se = TRUE` to show standard errors, or `show.std = TRUE` to show standardized estimates) are then printed by default. Columns that are _excluded_ from `col.order` are _not shown_, no matter whether the `show.*` arguments are `TRUE` or `FALSE`. So if `show.se = TRUE`, but `col.order` does not contain the element `"se"`, standard errors are not shown. Conversely, including an element in `col.order` does not force a column to appear: if `show.est = FALSE`, the column with estimates is not shown even if `col.order` _does include_ the element `"est"`.

In summary, `col.order` can be used to _exclude_ columns from the table and to change the order of columns.

```{r}
tab_model(
  m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE,
  col.order = c("p", "stat", "est", "std.se", "se", "std.est")
)
``` 
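To illustrate the exclusion rule described above, the following sketch requests standard errors with `show.se = TRUE` but leaves `"se"` out of `col.order`. The model on `mtcars` is only a stand-in for illustration. Following the rule, the table should contain the estimates, confidence intervals and p-values, but no standard errors.

```{r}
library(sjPlot)

# illustrative model on a built-in data set
fit_cols <- lm(mpg ~ wt + hp, data = mtcars)

# "se" is missing from col.order, so the standard errors requested
# via show.se = TRUE are expected to be dropped from the table
tab_model(
  fit_cols,
  show.se = TRUE,
  col.order = c("est", "ci", "p")
)
```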

### Collapsing columns

With `collapse.ci` and `collapse.se`, the columns for confidence intervals and standard errors can be collapsed into one column together with the estimates. Sometimes this table layout is required.

```{r}
tab_model(m1, collapse.ci = TRUE)
``` 
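`collapse.se` can be tried along the same lines. This sketch uses a model on `mtcars` purely for illustration and assumes that the standard errors have to be requested with `show.se = TRUE` before they can be collapsed into the estimates column. The confidence intervals are hidden to keep the table compact.

```{r}
library(sjPlot)

# illustrative model on a built-in data set
fit_collapse <- lm(mpg ~ wt + hp, data = mtcars)

# request standard errors and merge them into the column with estimates
tab_model(fit_collapse, show.se = TRUE, show.ci = FALSE, collapse.se = TRUE)
```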

## Defining own labels

There are different options to change the labels of the column headers or coefficients, e.g. with:

* `pred.labels` to change the names of the coefficients in the _Predictors_ column. Note that the length of `pred.labels` must exactly match the number of predictors in the _Predictors_ column.
* `dv.labels` to change the names of the model columns, which are labelled with the variable labels / names from the dependent variables.
* Furthermore, there are various `string.*` arguments to change the names of the column headings.

```{r}
tab_model(
  m1, m2, 
  pred.labels = c("Intercept", "Age (Carer)", "Hours per Week", "Gender (Carer)",
                  "Education: middle (Carer)", "Education: high (Carer)", 
                  "Age (Older Person)"),
  dv.labels = c("First Model", "M2"),
  string.pred = "Coefficient",
  string.ci = "Conf. Int (95%)",
  string.p = "P-Value"
)
``` 

## Including reference level of categorical predictors

By default, for categorical predictors, the variable names and the categories for regression coefficients are shown in the table output. 

```{r}
library(glmmTMB)
data("Salamanders")
model <- glm(
  count ~ spp + Wtemp + mined + cover,
  family = poisson(),
  data = Salamanders
)

tab_model(model)
```

You can include the reference level for categorical predictors by setting `show.reflvl = TRUE`.

```{r}
tab_model(model, show.reflvl = TRUE)
```

To show variable names, categories and include the reference level, also set `prefix.labels = "varname"`.

```{r}
tab_model(model, show.reflvl = TRUE, prefix.labels = "varname")
```

## Style of p-values

You can change the style of how p-values are displayed with the argument `p.style`. With `p.style = "stars"`, the p-values are indicated as `*` in the table. 

```{r}
tab_model(m1, m2, p.style = "stars")
``` 

Another option would be scientific notation, using `p.style = "scientific"`, which also can be combined with `digits.p`.

```{r}
tab_model(m1, m2, p.style = "scientific", digits.p = 2)
``` 


### Automatic matching for named vectors

Another way to easily assign labels is to use _named vectors_. In this case, it doesn't matter if `pred.labels` has more labels than there are coefficients in the model(s), or in which order the labels are passed to `tab_model()`. The only requirement is that the names of the labels equal the coefficient names as they appear in the `summary()` output.

```{r}
# example, coefficients are "c161sex2" or "c172code3"
summary(m1)

pl <- c(
  `(Intercept)` = "Intercept",
  e17age = "Age (Older Person)",
  c160age = "Age (Carer)", 
  c12hour = "Hours per Week", 
  barthtot = "Barthel-Index",
  c161sex2 = "Gender (Carer)",
  c172code2 = "Education: middle (Carer)", 
  c172code3 = "Education: high (Carer)",
  a_non_used_label = "We don't care"
)
 
tab_model(
  m1, m2, m3, m4, 
  pred.labels = pl, 
  dv.labels = c("Model1", "Model2", "Model3", "Model4"),
  show.ci = FALSE, 
  show.p = FALSE, 
  transform = NULL
)
``` 
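The names needed for such a vector can be read directly from the fitted model. A minimal sketch, using a copy of `mtcars` (called `cars_lab` here for illustration) rather than the `efc` data: `names(coef())` returns the coefficient names as they appear in the `summary()` output, and these become the names of the label vector. The labels themselves are made up for this example.

```{r}
library(sjPlot)

# copy of a built-in data set, with one predictor converted to a factor
cars_lab <- mtcars
cars_lab$cyl <- factor(cars_lab$cyl)

fit_lab <- lm(mpg ~ wt + cyl, data = cars_lab)

# coefficient names, identical to the row names in the summary() output
names(coef(fit_lab))

# labels keyed by coefficient name; the order does not matter
label_map <- c(
  cyl8 = "8 cylinders",
  cyl6 = "6 cylinders",
  wt = "Weight",
  `(Intercept)` = "Intercept"
)

tab_model(fit_lab, pred.labels = label_map)
```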

## Keep or remove coefficients from the table

Using the `terms`- or `rm.terms`-argument allows us to explicitly show or remove specific coefficients from the table output.

```{r}
tab_model(m1, terms = c("c160age", "c12hour"))
``` 

Note that the names of the terms to keep or remove must match the coefficient names. For categorical predictors, one example would be:

```{r}
tab_model(m1, rm.terms = c("c172code2", "c161sex2"))
```
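When a factor has many levels, typing each coefficient name by hand is tedious. One possible approach, sketched here on a copy of `mtcars` (an illustrative stand-in, not part of the `efc` examples), is to collect the matching names from `names(coef())` with `grep()` and pass the result to `rm.terms`.

```{r}
library(sjPlot)

# copy of a built-in data set, with one predictor converted to a factor
cars_rm <- mtcars
cars_rm$cyl <- factor(cars_rm$cyl)

fit_rm <- lm(mpg ~ wt + hp + cyl, data = cars_rm)

# all coefficient names that belong to the factor "cyl"
cyl_terms <- grep("^cyl", names(coef(fit_rm)), value = TRUE)
cyl_terms

# remove these coefficients from the table, keep everything else
tab_model(fit_rm, rm.terms = cyl_terms)
```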