---
title: "Effect Sizes for ANOVAs"
output:
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, effect size, ANOVA]
vignette: >
  \usepackage[utf8]{inputenc}
  %\VignetteIndexEntry{Effect Sizes for ANOVAs}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
bibliography: bibliography.bib
---
```{r message=FALSE, warning=FALSE, include=FALSE}
library(knitr)
options(knitr.kable.NA = "")
options(digits = 2)
knitr::opts_chunk$set(comment = ">", warning = FALSE)
set.seed(1)

.eval_if_requireNamespace <- function(...) {
  pkgs <- c(...)
  knitr::opts_chunk$get("eval") && all(sapply(pkgs, requireNamespace, quietly = TRUE))
}
knitr::opts_chunk$set(eval = .eval_if_requireNamespace("effectsize", "afex"))
```
## Eta<sup>2</sup>
In the context of ANOVA-like tests, it is common to report ANOVA-like effect sizes. These effect sizes represent the amount of variance explained by each of the model's terms, where each term can be represented by 1 *or more* parameters.
For example, in the following case, the parameters for the `treatment` term represent specific contrasts between the factor's levels (treatment groups) - the difference between each level and the reference level (`treatment == 'control'`).
```{r}
data(obk.long, package = "afex")
# modify the data slightly for the demonstration:
obk.long <- obk.long[1:240 %% 3 == 0, ]
obk.long$id <- seq_len(nrow(obk.long))
m <- lm(value ~ treatment, data = obk.long)
parameters::model_parameters(m)
```
But we can also ask about the overall effect of `treatment` - how much of the
variation in our dependent variable `value` can be predicted by (or explained
by) the variation between the `treatment` groups. Such a question can be
answered with an ANOVA test:
```{r}
parameters::model_parameters(anova(m))
```
As we can see, the variance in `value` (the *sums-of-squares*, or *SS*) has been split into pieces:

- The part associated with `treatment`.
- The unexplained part (the residual-*SS*).

We can now ask what percent of the total variance in `value` is associated with `treatment`. This measure is called Eta-squared (written as $\eta^2$):
$$
\eta^2 = \frac{SS_{effect}}{SS_{total}} = \frac{72.23}{72.23 + 250.96} = 0.22
$$
and can be accessed via the `eta_squared()` function:
```{r}
library(effectsize)
options(es.use_symbols = TRUE) # get nice symbols when printing! (On Windows, requires R >= 4.2.0)
eta_squared(m, partial = FALSE)
```
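
To see where this number comes from, here is a quick by-hand computation from the ANOVA table's sums-of-squares:

```{r}
# Eta squared by hand: SS_treatment / SS_total
ss <- anova(m)[["Sum Sq"]]
ss[1] / sum(ss)
```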
### Adding More Terms
When we add more terms to our model, we can ask two different questions about
the percent of variance explained by a predictor - how much variance is
accounted by the predictor in *total*, and how much is accounted when
*controlling* for any other predictors. The latter question is answered by the
*partial*-Eta squared ($\eta^2_p$), which is the percent of the **partial**
variance (after accounting for other predictors in the model) associated with a
term:
$$
\eta^2_p = \frac{SS_{effect}}{SS_{effect} + SS_{error}}
$$
which can also be accessed via the `eta_squared()` function:
```{r}
m <- lm(value ~ gender + phase + treatment, data = obk.long)
eta_squared(m, partial = FALSE)
eta_squared(m) # partial = TRUE by default
```
*(`phase` is a repeated-measures variable, but for simplicity it is not modeled as such.)*
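
As a sanity check, here is $\eta^2_p$ for `treatment` computed by hand from the ANOVA table above:

```{r}
# Partial eta squared by hand:
# SS_treatment / (SS_treatment + SS_residuals)
ss <- anova(m)[["Sum Sq"]]
ss[3] / (ss[3] + ss[4])
```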
In the calculation above, the *SS*s were computed sequentially: the *SS* for `phase` is computed after controlling for `gender`, and the *SS* for `treatment` is computed after controlling for both `gender` and `phase`. This method of sequential *SS* is also known as a *type-I* test. If this is what you want, that's great; however, in many fields (and in other statistical programs) it is common to use "simultaneous" sums of squares (*type-II* or *type-III* tests), where each *SS* is computed while controlling for all other predictors, regardless of order. This can be done with `car::Anova(type = ...)`:
```{r, eval=.eval_if_requireNamespace("car")}
eta_squared(car::Anova(m, type = 2), partial = FALSE)
eta_squared(car::Anova(m, type = 3)) # partial = TRUE by default
```
$\eta^2_p$ will always be equal to or larger than $\eta^2$. The idea is to simulate the effect size in a design where only the term of interest was manipulated. This terminology assumes some causal relationship between the predictor and the outcome, reflecting the experimental world from which these analyses and measures hail; however, $\eta^2_p$ can also simply be seen as a **signal-to-noise ratio**, as it only uses the term's *SS* and the error-term's *SS*.^[In repeated-measures designs, the term-specific residual-*SS* is used for the computation of the effect size.]

(Note that in a one-way fixed-effects design, $\eta^2 = \eta^2_p$.)
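
We can quickly confirm this equality in a one-way version of our model:

```{r}
# In a one-way design, eta squared and its partial version coincide
m_oneway <- lm(value ~ treatment, data = obk.long)
eta_squared(m_oneway, partial = FALSE)
eta_squared(m_oneway) # partial = TRUE by default
```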
### Adding Interactions
Type II and type III tests treat interactions differently.

Without going into the weeds here, keep in mind that **when using type III SS, it is important to center all of the predictors**; for numeric variables this can be done by mean-centering; for factors this can be done by using orthogonal coding (such as `contr.sum` for *effects-coding*) for the dummy variables (and *not* treatment coding, which is the default in R). This unfortunately makes parameter interpretation harder, but *only* when this is done do the *SS*s associated with each lower-order term (or lower-order interaction) represent the *SS* of the **main effect** (with treatment coding they represent the *SS* of the simple effects).
```{r, eval=.eval_if_requireNamespace("car")}
# compare
m_interaction1 <- lm(value ~ treatment * gender, data = obk.long)
# to:
m_interaction2 <- lm(
  value ~ treatment * gender,
  data = obk.long,
  contrasts = list(
    treatment = "contr.sum",
    gender = "contr.sum"
  )
)
eta_squared(car::Anova(m_interaction1, type = 3))
eta_squared(car::Anova(m_interaction2, type = 3))
```
If all of this type-III-effects-coding seems like a hassle, you can use the `afex` package, which takes care of all of this behind the scenes:
```{r}
library(afex)
m_afex <- aov_car(value ~ treatment * gender + Error(id), data = obk.long)
eta_squared(m_afex)
```
## Other Measures of Effect Size
### Unbiased Effect Sizes
These effect sizes are unbiased estimators of the population's $\eta^2$:
- **Omega Squared** ($\omega^2$)
- **Epsilon Squared** ($\epsilon^2$), also referred to as *Adjusted Eta Squared*.
```{r}
omega_squared(m_afex)
epsilon_squared(m_afex)
```
Both $\omega^2$ and $\epsilon^2$ (and their partial counterparts, $\omega^2_p$ & $\epsilon^2_p$) are unbiased estimators of the population's $\eta^2$ (or $\eta^2_p$, respectively), which is especially important in small samples.
Though $\omega^2$ is the more popular choice [@albers2018power], $\epsilon^2$ is
analogous to adjusted-$R^2$ [@allen2017statistics, p. 382], and has been found
to be less biased [@carroll1975sampling].
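
The analogy to adjusted-$R^2$ is easy to see: in a single-term model, the (non-partial) $\epsilon^2$ is numerically identical to the model's adjusted-$R^2$. A minimal sketch:

```{r}
# In a single-term model, epsilon squared equals adjusted R-squared
m_simple <- lm(value ~ treatment, data = obk.long)
epsilon_squared(m_simple, partial = FALSE)
summary(m_simple)$adj.r.squared
```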
### Generalized Eta<sup>2</sup>
*Partial* Eta squared aims at estimating the effect size in a design where only the term of interest was manipulated, assuming all other terms have also been manipulated. However, not all predictors can be manipulated - some can only be observed. For such cases, we can use *generalized* Eta squared ($\eta^2_G$), which, like $\eta^2_p$, estimates the effect size in a design where only the term of interest was manipulated, while accounting for the fact that some terms cannot be manipulated (and so their variance would still be present in such a design).
```{r}
eta_squared(m_afex, generalized = "gender")
```
$\eta^2_G$ is useful in repeated-measures designs, as it can estimate what a
*within-subject* effect size would have been had that predictor been manipulated
*between-subjects* [@olejnik2003generalized].
### Cohen's *f*
Finally, we have the forgotten child - Cohen's $f$. Cohen's $f$ is a transformation of $\eta^2_p$: it is the square root of the ratio between the term-*SS* and the error-*SS*.

$$\text{Cohen's } f_p = \sqrt{\frac{\eta^2_p}{1-\eta^2_p}} = \sqrt{\frac{SS_{effect}}{SS_{error}}}$$
It can take on values between zero, when the population means are all equal, and
an indefinitely large number as the means are further and further apart. It is
analogous to Cohen's $d$ when there are only two groups.
```{r}
cohens_f(m_afex)
```
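
Since Cohen's $f$ is a direct transformation of $\eta^2_p$, we can recover these values from the `eta_squared()` output (a sketch, assuming its `Eta2_partial` column):

```{r}
# Recover Cohen's f from partial eta squared
pes <- eta_squared(m_afex)
sqrt(pes$Eta2_partial / (1 - pes$Eta2_partial))
```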
## When Sum-of-Squares are Hard to Come By

Until now we've discussed effect sizes in fixed-effect linear models and repeated-measures ANOVAs - cases where the *SS*s are readily available, and so the various effect sizes presented can easily be estimated. However, this is not always the case.
For example, in linear mixed models (LMM/HLM/MLM), the estimation of all required *SS*s is not straightforward. However, we can still *approximate* these effect sizes (only their partial versions) based on the **test-statistic approximation method** (learn more in the [*Effect Size from Test Statistics* vignette](https://easystats.github.io/effectsize/articles/from_test_statistics.html)).
```{r, eval=.eval_if_requireNamespace("lmerTest", "lme4")}
library(lmerTest)
fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
anova(fit_lmm) # note: type-3 tests, with Satterthwaite-approximated df
F_to_eta2(45.8, df = 1, df_error = 17)
```
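
The same test-statistic approximation has companion converters for the unbiased estimators:

```{r}
# Approximate the unbiased counterparts from the same F statistic
F_to_epsilon2(45.8, df = 1, df_error = 17)
F_to_omega2(45.8, df = 1, df_error = 17)
```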
Or directly with `eta_squared()` and co.:
```{r, eval=.eval_if_requireNamespace("lmerTest", "lme4")}
eta_squared(fit_lmm)
epsilon_squared(fit_lmm)
omega_squared(fit_lmm)
```
Another case where *SS*s are not available is when using Bayesian models...
## For Bayesian Models
An alternative route to obtaining effect sizes of explained variance is via the use of the ***posterior predictive distribution*** (*PPD*). The PPD is the
Bayesian expected distribution of possible unobserved values. Thus, after
observing some data, we can estimate not just the expected mean values (the
conditional marginal means), but also the full *distribution* of data around
these values [@gelman2014bayesian, chapter 7].
By sampling from the PPD, we can decompose the sample into the various *SS*s
needed for the computation of explained variance measures. By repeatedly
sampling from the PPD, we can generate a posterior distribution of explained
variance estimates. But note that **these estimates are conditioned not only on
the location-parameters of the model, but also on the scale-parameters of the
model!** So it is vital to [validate the
PPD](https://mc-stan.org/docs/2_23/stan-users-guide/meta-models-part.html#meta-models.part/)
before using it to estimate explained variance measures.
Let's fit our model:
```{r, eval = .eval_if_requireNamespace("rstanarm", "bayestestR", "car")}
library(rstanarm)
m_bayes <- stan_glm(value ~ gender + phase + treatment,
  data = obk.long, family = gaussian(),
  refresh = 0
)
```
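
As noted above, it is vital to first validate the PPD; a minimal visual check with `rstanarm`'s `pp_check()` (one of several ways to do this):

```{r, eval = .eval_if_requireNamespace("rstanarm", "bayestestR", "car")}
# Compare the observed outcome to draws from the PPD
pp_check(m_bayes)
```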
We can use `eta_squared_posterior()` to get the posterior distribution of $\eta^2$ or $\eta^2_p$ for each effect. As with an ANOVA table, we must make sure to use the right effects-coding and *SS*-type:
```{r, eval = .eval_if_requireNamespace("rstanarm", "bayestestR", "car")}
pes_posterior <- eta_squared_posterior(m_bayes,
  draws = 500, # how many samples from the PPD?
  partial = TRUE, # partial eta squared
  # type 3 SS
  ss_function = car::Anova, type = 3,
  verbose = FALSE
)

head(pes_posterior)

bayestestR::describe_posterior(pes_posterior,
  rope_range = c(0, 0.1), test = "rope"
)
```
Compare to:
```{r, eval = .eval_if_requireNamespace("rstanarm", "bayestestR", "car")}
m_ML <- lm(value ~ gender + phase + treatment, data = obk.long)
eta_squared(car::Anova(m_ML, type = 3))
```
## For Ordinal Outcomes

When our outcome is not a numeric variable, the effect sizes described above cannot be used - measures based on sums-of-squares are ill-suited for such outcomes. Instead, we must use effect sizes for *ordinal* ANOVAs.
In `R`, there are two functions for running *ordinal* one-way ANOVAs:
`kruskal.test()` for differences between independent groups, and
`friedman.test()` for differences between dependent groups.
For the one-way ordinal ANOVA, the Rank-Epsilon-Squared ($E^2_R$) and Rank-Eta-Squared ($\eta^2_H$) are measures of association similar to their non-rank counterparts: values range from 0 (no relative superiority of any of the groups) to 1 (complete separation, with no overlap in ranks between the groups).
```{r}
group_data <- list(
  g1 = c(2.9, 3.0, 2.5, 2.6, 3.2), # normal subjects
  g2 = c(3.8, 2.7, 4.0, 2.4), # with obstructive airway disease
  g3 = c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
)
kruskal.test(group_data)
rank_epsilon_squared(group_data)
rank_eta_squared(group_data)
```
For an ordinal repeated-measures one-way ANOVA, Kendall's *W* is a measure of agreement on the effect of condition between various "blocks" (the subjects); it is more often conceptualized as a measure of the reliability of the ratings / scores of observations (or "groups") across "raters" ("blocks").
```{r}
# Subjects are ROWS; conditions are COLUMNS
(ReactionTimes <- matrix(
  c(
    398, 338, 520,
    325, 388, 555,
    393, 363, 561,
    367, 433, 470,
    286, 492, 536,
    362, 475, 496,
    253, 334, 610
  ),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    paste0("Subject", 1:7),
    c("Congruent", "Neutral", "Incongruent")
  )
))
friedman.test(ReactionTimes)
kendalls_w(ReactionTimes)
```
## References