---
title: "Converting Between r, d, and Odds Ratios"
output: 
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, effect size, rules of thumb, guidelines, conversion]
vignette: >
  \usepackage[utf8]{inputenc}
  %\VignetteIndexEntry{Converting Between r, d, and Odds Ratios}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: bibliography.bib
---

```{r message=FALSE, warning=FALSE, include=FALSE}
library(knitr)
options(knitr.kable.NA = "")
knitr::opts_chunk$set(comment = ">")
options(digits = 3)

.eval_if_requireNamespace <- function(...) {
  pkgs <- c(...)
  knitr::opts_chunk$get("eval") && all(sapply(pkgs, requireNamespace, quietly = TRUE))
}
```

The `effectsize` package contains functions to convert among indices of effect
size. This can be useful for meta-analyses, or for any comparison between
different types of statistical analyses.

# Converting Between *d* and *r*

The most basic conversion is between *r* values, a measure of standardized
association between two continuous measures, and *d* values (such as Cohen's
*d*), a measure of standardized differences between two groups / conditions.

Let's look at some (simulated) data:

```{r}
library(effectsize)
data("hardlyworking")
head(hardlyworking)
```

We can compute Cohen's *d* between the two groups:

```{r}
cohens_d(salary ~ is_senior, data = hardlyworking)
```

But we can also compute a point-biserial correlation, which is Pearson's *r* when treating the 2-level `is_senior` variable as a numeric binary variable:

```{r, warning=FALSE, eval=.eval_if_requireNamespace("correlation")}
correlation::cor_test(hardlyworking, "salary", "is_senior")
```

But what if we only have summary statistics?
Say we only have $d = -0.72$ and want to know what *r* would have been.
We can approximate *r* using the following formula
[@borenstein2009converting]:

$$
r \approx \frac{d}{\sqrt{d^2 + 4}}
$$

And indeed, if we use `d_to_r()`, we get a pretty decent approximation:

```{r}
d_to_r(-0.72)
```

(This also works the other way: `r_to_d(0.34)` gives
`r round(r_to_d(0.34), 3)`.)

As we can see, these are rough approximations, but they can be useful when we
don't have the raw data on hand.
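The conversion formulas are simple enough to sketch directly in base R (the helper names below are hypothetical, shown for illustration only; in practice, use `d_to_r()` and `r_to_d()`):

```r
# Hypothetical base-R helpers mirroring the approximation above
d_to_r_approx <- function(d) d / sqrt(d^2 + 4)

# Inverting the same formula gives the reverse conversion
r_to_d_approx <- function(r) 2 * r / sqrt(1 - r^2)

d_to_r_approx(-0.72) # roughly -0.34
r_to_d_approx(d_to_r_approx(-0.72)) # round-trips back to -0.72
```

Since the second function is the algebraic inverse of the first, converting in one direction and back recovers the original value exactly.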

## In multiple regression

Although not exactly a classic Cohen's *d*, we can also approximate a partial-*d*
value (that is, the standardized difference between two groups / conditions,
with the variance from other predictors partialled out). For example:

```{r}
fit <- lm(salary ~ is_senior + xtra_hours, data = hardlyworking)

parameters::model_parameters(fit)

# A couple of ways to get partial-d:
1683.65 / sigma(fit)
t_to_d(5.31, df_error = 497)[[1]]
```
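The `t_to_d()` conversion used above is based on the standard approximation $d \approx 2t / \sqrt{df}$ for independent samples; as a base-R sketch (the helper name is hypothetical):

```r
# Hypothetical base-R sketch of the t-to-d approximation:
# d = 2 * t / sqrt(df_error)
t_to_d_approx <- function(t, df_error) 2 * t / sqrt(df_error)

t_to_d_approx(5.31, 497) # roughly 0.48
```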

We can convert these partial-*d* values to *r* values, which in this case
represent the *partial* correlation:

```{r, eval=.eval_if_requireNamespace("correlation")}
t_to_r(5.31, df_error = 497)

correlation::correlation(hardlyworking[, c("salary", "xtra_hours", "is_senior")],
  include_factors = TRUE,
  partial = TRUE
)[2, ]

# all close to:
d_to_r(0.47)
```

# Converting Between *OR* and *d*

In binomial regression (more specifically, logistic regression), odds ratios
(OR) are themselves measures of effect size: they indicate the expected
multiplicative change in the odds of some event.

In some fields, it is common to dichotomize outcomes in order to analyze them
with logistic models. For example, if the outcome is the count of white blood
cells, it can be more useful (medically) to predict the crossing of a threshold
rather than the raw count itself. So where some scientists might analyze such
data with a *t*-test and present Cohen's *d*, others might analyze it with a
logistic regression model on the dichotomized outcome and present an OR. The
question can then be asked: given such an OR, what would Cohen's *d* have been?

Fortunately, there is a formula to approximate this [@sanchez2003effect]:

$$
d = \log(OR) \times \frac{\sqrt{3}}{\pi}
$$

which is implemented in the `oddsratio_to_d()` function.
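The formula itself reduces to one line of base R (the helper name below is hypothetical; unlike it, `oddsratio_to_d()` also handles ORs on the raw scale via its `log` argument):

```r
# Hypothetical sketch: d = log(OR) * sqrt(3) / pi
logodds_to_d_approx <- function(log_or) log_or * sqrt(3) / pi

logodds_to_d_approx(-1.22) # roughly -0.67
```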

Let's give it a try:

```{r}
# 1. Set a threshold
thresh <- 22500

# 2. Dichotomize the outcome (here, TRUE when salary is below the threshold)
hardlyworking$salary_high <- hardlyworking$salary < thresh

# 3. Fit a logistic regression:
fit <- glm(salary_high ~ is_senior,
  data = hardlyworking,
  family = binomial()
)

parameters::model_parameters(fit)

# Convert log(OR) (the coefficient) to d
oddsratio_to_d(-1.22, log = TRUE)
```

# References