---
title: "Effect Size from Test Statistics"
output:
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, effect size, standardization, cohen d]
vignette: >
  %\VignetteIndexEntry{Effect Size from Test Statistics}
  \usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
bibliography: bibliography.bib
---
```{r message=FALSE, warning=FALSE, include=FALSE}
library(knitr)
library(effectsize)

knitr::opts_chunk$set(comment = ">")
options(digits = 2)
options(knitr.kable.NA = "")
set.seed(747)

# Helper: evaluate a chunk only if all required packages are available
.eval_if_requireNamespace <- function(...) {
  pkgs <- c(...)
  knitr::opts_chunk$get("eval") && all(sapply(pkgs, requireNamespace, quietly = TRUE))
}
```
# Introduction
In many real-world applications there are no straightforward ways of obtaining
standardized effect sizes. However, it is possible to get approximations of most
of the effect size indices ($d$, $r$, $\eta^2_p$...) with the use of test
statistics. These conversions are based on the idea that **test statistics are a
function of effect size and sample size**. Thus, information about the sample
size (or, more often, the degrees of freedom) can be used to reverse-engineer
indices of effect size from test statistics. This idea and these functions also
power our [***Effect Sizes From Test Statistics*** *shiny app*](https://easystats4u.shinyapps.io/statistic2effectsize/).
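For a quick demonstration of this idea, here is a minimal sketch using the
`npk` data from base R (this dataset is used here only for illustration, and
nowhere else in this vignette): in a one-way ANOVA, the proportion of variance
explained computed directly from the sums of squares is exactly recovered from
the $F$-statistic and the degrees of freedom alone.

```{r}
m <- aov(yield ~ N, data = npk)
(tab <- anova(m))

# Proportion of variance explained, directly from the sums of squares:
tab["N", "Sum Sq"] / sum(tab[, "Sum Sq"])

# ... and reverse-engineered from the F statistic and the dfs alone:
effectsize::F_to_eta2(
  f = tab["N", "F value"],
  df = tab["N", "Df"],
  df_error = tab["Residuals", "Df"]
)
```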
The measures discussed here are, in one way or another, ***signal to noise
ratios***, with the "noise" representing the unaccounted variance in the outcome
variable^[Note that for generalized linear models (Poisson, Logistic...), where
the outcome is never on an arbitrary scale, estimates themselves **are** indices
of effect size! Thus this vignette is relevant only to general linear models.].
The indices are:
- Percent variance explained ($\eta^2_p$, $\omega^2_p$, $\epsilon^2_p$).
- Measure of association ($r$).
- Measure of difference ($d$).
## (Partial) Percent Variance Explained
These measures represent the ratio of $Signal^2 / (Signal^2 + Noise^2)$, with
the "noise" having all other "signals" partialled out (be they of other fixed or
random effects). The most popular of these indices is $\eta^2_p$ (partial Eta
squared, which is analogous to $R^2$).
The conversion of the $F$- or $t$-statistic is based on
@friedman1982simplified.
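In its simplest form, the conversion amounts to:

$$
\eta^2_p = \frac{F \times df_{num}}{F \times df_{num} + df_{den}}
$$

or, for a $t$-statistic (recall that $F = t^2$ when $df_{num} = 1$):

$$
\eta^2_p = \frac{t^2}{t^2 + df_{error}}
$$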
Let's look at an example:

```{r, eval = .eval_if_requireNamespace("afex"), message=FALSE}
library(afex)

data(md_12.1)

aov_fit <- aov_car(rt ~ angle * noise + Error(id / (angle * noise)),
  data = md_12.1,
  anova_table = list(correction = "none", es = "pes")
)

aov_fit
```
Let's compare the $\eta^2_p$ values (the `pes` column) obtained here with the
ones recovered by `F_to_eta2()`:
```{r}
library(effectsize)

options(es.use_symbols = TRUE) # get nice symbols when printing! (On Windows, requires R >= 4.2.0)

F_to_eta2(
  f = c(40.72, 33.77, 45.31),
  df = c(2, 1, 2),
  df_error = c(18, 9, 18)
)
```
**They are identical!**^[Note that these are *partial* percent variance
explained, and so their sum can be larger than 1.] (except for the fact that
`F_to_eta2()` also provides confidence intervals^[Confidence intervals for all
indices are estimated using the non-centrality parameter method; these methods
search for the best non-centrality parameter of the non-central $F$/$t$
distribution for the desired tail-probabilities, and then convert these ncps to
the corresponding effect sizes.] :)
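To make the non-centrality parameter (ncp) method a bit more concrete, here is
a minimal sketch of a two-sided interval around $\eta^2_p$ for the interaction
term above. This is an illustration only; the search strategy, interval type,
and the ncp-to-$\eta^2_p$ mapping used internally by `effectsize` may differ.

```{r}
f <- 45.31; df <- 2; df_error <- 18

# Find the ncp of the non-central F distribution for which the observed F
# falls at a given tail probability:
ncp_at <- function(p) {
  uniroot(function(ncp) pf(f, df, df_error, ncp = ncp) - p, c(0, 1e4))$root
}

# Map an ncp back to eta²_p, assuming lambda = N * eta² / (1 - eta²),
# with N = df + df_error + 1 (an approximation for this design):
eta2_of <- function(ncp) ncp / (ncp + df + df_error + 1)

c(eta2_of(ncp_at(0.975)), eta2_of(ncp_at(0.025))) # lower, upper
```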
In this case we were able to easily obtain the effect size (thanks to `afex`!),
but in other cases it might not be as easy, and using estimates based on test
statistics offers a good approximation.
For example:
### In Simple Effect and Contrast Analysis

```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
library(emmeans)

joint_tests(aov_fit, by = "noise")

F_to_eta2(
  f = c(5, 79),
  df = 2,
  df_error = 29
)
```
We can also use `t_to_eta2()` for contrast analysis:
```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_eta2(
  t = c(-6.2, -8.2, -3.3),
  df_error = 9
)
```
### In Linear Mixed Models
```{r, eval = .eval_if_requireNamespace("lmerTest"), message=FALSE}
library(lmerTest)
fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
anova(fit_lmm)
F_to_eta2(45.8, 1, 17)
```
We can also use `t_to_eta2()` for the slope of `Days` (which in this case gives
the same result, since $F = t^2$ when the numerator $df$ is 1).
```{r, eval = .eval_if_requireNamespace("lmerTest")}
parameters::model_parameters(fit_lmm, effects = "fixed", ci_method = "satterthwaite")
t_to_eta2(6.77, df_error = 17)
```
### Bias-Corrected Indices

Alongside $\eta^2_p$ there are also the less biased $\omega^2_p$ (Omega) and
$\epsilon^2_p$ (Epsilon; sometimes called $\text{Adj. }\eta^2_p$, which is
equivalent to $R^2_{adj}$) [@albers2018power; @mordkoff2019simple].
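In terms of the test statistic, these corrections essentially subtract 1 from
$F$ in the numerator:

$$
\epsilon^2_p = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den}},
\qquad
\omega^2_p = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den} + 1}
$$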
```{r}
F_to_eta2(45.8, 1, 17)
F_to_epsilon2(45.8, 1, 17)
F_to_omega2(45.8, 1, 17)
```
## Measure of Association
Similar to $\eta^2_p$, $r$ is a signal to noise ratio, and is in fact equal to
$\sqrt{\eta^2_p}$ (so it's really a *partial* $r$). It is often used instead of
$\eta^2_p$ when discussing the *strength* of association (but I suspect people
use it instead of $\eta^2_p$ because it gives a bigger number, which looks
better).
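In terms of the $t$-statistic, this conversion is essentially:

$$
r = \frac{t}{\sqrt{t^2 + df_{error}}}
$$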
### For Slopes
```{r, eval = .eval_if_requireNamespace("lmerTest")}
parameters::model_parameters(fit_lmm, effects = "fixed", ci_method = "satterthwaite")
t_to_r(6.77, df_error = 17)
```
In a fixed-effect linear model, this returns the **partial** correlation.
Compare:
```{r}
fit_lm <- lm(rating ~ complaints + critical, data = attitude)

parameters::model_parameters(fit_lm)

t_to_r(
  t = c(7.46, 0.01),
  df_error = 27
)
```
to:
```{r, eval=.eval_if_requireNamespace("correlation")}
correlation::correlation(attitude,
  select = "rating",
  select2 = c("complaints", "critical"),
  partial = TRUE
)
```
### In Contrast Analysis

This measure is also sometimes used in contrast analysis, where it is called the
point-biserial correlation, $r_{pb}$ [@cohen1965some; @rosnow2000contrasts]:
```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_r(
  t = c(-5.7, -8.9, -3.2),
  df_error = 18
)
```
## Measures of Difference

These indices represent the ratio of $Signal / Noise$, where the "signal" is
the difference between two means. This is akin to Cohen's $d$, and is a close
approximation of it when comparing two groups of equal size [@wolf1986meta;
@rosnow2000contrasts].

These can be useful in contrast analyses.
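In terms of the $t$-statistic, the approximations are essentially:

$$
d \approx \frac{2 \times t}{\sqrt{df_{error}}} \text{ (between-subjects)},
\qquad
d \approx \frac{t}{\sqrt{df_{error}}} \text{ (within-subjects)}
$$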
### Between-Subject Contrasts
```{r, eval = .eval_if_requireNamespace("emmeans")}
m <- lm(breaks ~ tension, data = warpbreaks)

em_tension <- emmeans(m, ~tension)
pairs(em_tension)

t_to_d(
  t = c(2.53, 3.72, 1.20),
  df_error = 51
)
```
However, these are merely approximations of a *true* Cohen's *d*. It is advised
to directly estimate Cohen's *d* whenever possible. For example, here with
`emmeans::eff_size()`:
```{r, eval = .eval_if_requireNamespace("emmeans")}
eff_size(em_tension, sigma = sigma(m), edf = df.residual(m))
```
### Within-Subject Contrasts
```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_d(
  t = c(-5.7, -5.9, -3.2),
  df_error = 18,
  paired = TRUE
)
```
(Note that `paired = TRUE` is set so as not to overestimate the size of the
effect [@rosenthal1991meta; @rosnow2000contrasts].)
# References