---
title: "Support Functions for Model Extensions"
output: 
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, effect size, ANOVA, standardization, standardized coefficients]
vignette: >
  \usepackage[utf8]{inputenc}
  %\VignetteIndexEntry{Support Functions for Model Extensions}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: bibliography.bib
---

```{r message=FALSE, warning=FALSE, include=FALSE}
library(knitr)
knitr::opts_chunk$set(
  comment = ">",
  warning = FALSE,
  message = FALSE
)
options(digits = 2)
options(knitr.kable.NA = "")

set.seed(333)
```

```{r}
library(effectsize)
```

## Supporting ANOVA Effect Sizes

To add support for your model, create a new `.anova_es()` method. This function should generally do 3 things:

1. Build a data frame with all the required information.
2. Pass the data frame to one of the 3 helper functions described below (`.es_aov_simple()`, `.es_aov_strata()`, or `.es_aov_table()`).
3. Set some attributes on the output.

### Simple ANOVA tables

The input data frame must have these columns:

- `Parameter` (char) - The name of the parameter or, more often, the term.
- `Sum_Squares` (num) - The sum of squares.
- `df` (num) - The degrees of freedom associated with the `Sum_Squares`.
- `Mean_Square_residuals` (num; *optional*) - If *not* present, it is calculated as `Sum_Squares / df`.

(Any other column is ignored.)

And exactly *1* row where `Parameter` is `Residuals`.

Optionally, one of the rows can have a `(Intercept)` value for `Parameter`.
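
Before handing a table on, it can help to check it against the requirements listed above. The following is a minimal sketch in base R; `check_simple_aov_table()` and `toy_table` are hypothetical names used for illustration and are not part of `{effectsize}`. The residual row is matched on the label `Residuals`, as used in the example tables in this document.

```{r}
# Hypothetical helper (not part of effectsize): checks the documented
# requirements for a "simple" ANOVA table and returns it invisibly.
check_simple_aov_table <- function(tab) {
  required <- c("Parameter", "Sum_Squares", "df")
  missing_cols <- setdiff(required, colnames(tab))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  if (!is.numeric(tab[["Sum_Squares"]]) || !is.numeric(tab[["df"]])) {
    stop("`Sum_Squares` and `df` must be numeric")
  }

  # Exactly one residual row is required
  n_resid <- sum(tab[["Parameter"]] == "Residuals")
  if (n_resid != 1) {
    stop("Expected exactly 1 `Residuals` row, found ", n_resid)
  }

  invisible(tab)
}

toy_table <- data.frame(
  Parameter = c("A", "Residuals"),
  Sum_Squares = c(40, 100),
  df = c(1, 50)
)

check_simple_aov_table(toy_table)
```

Failing early with an explicit message is a design choice for this sketch; a method could equally well repair the table (for example, by renaming a residual row) instead of stopping.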

An example of a minimally valid data frame:

```{r}
min_aov <- data.frame(
  Parameter = c("(Intercept)", "A", "B", "Residuals"),
  Sum_Squares = c(30, 40, 10, 100),
  df = c(1, 1, 2, 50)
)
```

Pass the data frame to `.es_aov_simple()`:

```{r}
.es_aov_simple(
  min_aov,
  type = "eta", partial = TRUE, generalized = FALSE,
  include_intercept = FALSE,
  ci = 0.95, alternative = "greater",
  verbose = TRUE
)
```

The output is a data frame with the columns: `Parameter`, the effect size, and (optionally) `CI` + `CI_low` + `CI_high`.

It has the following attributes: `generalized`, `ci`, `alternative`, `anova_type` (`NA` or `NULL`), `approximate`.

You can then set the `anova_type` attribute to {1, 2, 3, or `NA`} and return the output.
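
As a sanity check on the point estimates, partial eta squared can be computed by hand from such a table using the textbook definition, $SS_{effect} / (SS_{effect} + SS_{residual})$. The sketch below uses base R only and repeats the numbers of the minimal example; `by_hand` and `effect_rows` are hypothetical names, and the code illustrates the definition rather than the package's internal implementation.

```{r}
# Same numbers as the minimal example table
by_hand <- data.frame(
  Parameter = c("(Intercept)", "A", "B", "Residuals"),
  Sum_Squares = c(30, 40, 10, 100),
  df = c(1, 1, 2, 50)
)

is_resid <- by_hand$Parameter == "Residuals"
ss_resid <- by_hand$Sum_Squares[is_resid]
ms_resid <- ss_resid / by_hand$df[is_resid] # residual mean square

# Keep only the model terms (drop the intercept and the residual row)
effect_rows <- by_hand[!is_resid & by_hand$Parameter != "(Intercept)", ]

# Partial eta squared: SS_effect / (SS_effect + SS_residual)
eta2_partial <- effect_rows$Sum_Squares / (effect_rows$Sum_Squares + ss_resid)
names(eta2_partial) <- effect_rows$Parameter
eta2_partial
```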

### ANOVA Tables with Multiple Error Strata

(e.g., `aovlist` models.)

The input data frame must have these columns:

- `Group` (char) - The strata
- `Parameter` (char)
- `Sum_Squares` (num)
- `df` (num)
- `Mean_Square_residuals` (num; *optional*)

And exactly *1* row ***per `Group`*** where `Parameter` is `Residuals`.

Optionally, one of the rows can have a `(Intercept)` value for `Parameter`.

An example of a minimally valid data frame:

```{r}
min_aovlist <- data.frame(
  Group = c("S", "S", "S:A", "S:A"),
  Parameter = c("(Intercept)", "Residuals", "A", "Residuals"),
  Sum_Squares = c(34, 21, 34, 400),
  df = c(1, 12, 4, 30)
)
```
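
In practice, a table of this shape is usually assembled from a fitted model rather than typed by hand. The following sketch shows one way this might be done for an `aov()` fit with an `Error()` term, using the `npk` data set that ships with R. The names `strata_fit`, `strata_summary`, and `strata_table` are hypothetical, and stripping the `Error: ` prefix to obtain the `Group` labels is an assumption made for this illustration; it is not necessarily how `{effectsize}` builds its own tables.

```{r}
# Fit a model with two error strata (between blocks, and within)
strata_fit <- aov(yield ~ N * P * K + Error(block), data = npk)

# summary() returns one ANOVA table per error stratum
strata_summary <- summary(strata_fit)

tables <- lapply(names(strata_summary), function(g) {
  tab <- as.data.frame(strata_summary[[g]][[1]])
  data.frame(
    Group = sub("^Error: ", "", g),
    # Row names are padded with spaces for printing
    Parameter = trimws(rownames(tab)),
    Sum_Squares = tab[["Sum Sq"]],
    df = tab[["Df"]]
  )
})

strata_table <- do.call(rbind, tables)

# Each stratum should contribute exactly one residual row
tapply(strata_table$Parameter == "Residuals", strata_table$Group, sum)
```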

Pass the data frame to `.es_aov_strata()`, and pass the names of the predictors (including the stratifying variables) to the `DV_names` argument:

```{r}
.es_aov_strata(
  min_aovlist,
  DV_names = c("S", "A"),
  type = "omega", partial = TRUE, generalized = FALSE,
  ci = 0.95, alternative = "greater",
  verbose = TRUE,
  include_intercept = TRUE
)
```

The output is a data frame with the columns: `Group`, `Parameter`, the effect size, and (optionally) `CI` + `CI_low` + `CI_high`.

It has the following attributes: `generalized`, `ci`, `alternative`, `approximate`.

You can then set the `anova_type` attribute to {1, 2, 3, or `NA`} and return the output.


### Approximate Effect Sizes

When *sums of squares* cannot be extracted, we can still get *approximate* effect sizes based on the `F_to_eta2()` family of functions.

The input data frame must have these columns:

- `Parameter` (char)
- `F` (num) - The *F* test statistic.
- `df` (num) - The effect degrees of freedom.
    - The table can have a `t` column instead, in which case `df` is set to 1 and `F` is `t^2`.
- `df_error` (num) - The error degrees of freedom.

Optionally, one of the rows can have `(Intercept)` as the `Parameter`.

An example of a minimally valid data frame:

```{r}
min_anova <- data.frame(
  Parameter = c("(Intercept)", "A", "B"),
  F = c(4, 7, 0.7),
  df = c(1, 1, 2),
  df_error = 34
)
```

Pass the table to `.es_aov_table()`:

```{r}
.es_aov_table(
  min_anova,
  type = "eta", partial = TRUE, generalized = FALSE,
  include_intercept = FALSE,
  ci = 0.95, alternative = "greater",
  verbose = TRUE
)
```

The output is a data frame with the columns: `Parameter`, the effect size, and (optionally) `CI` + `CI_low` + `CI_high`.

It has the following attributes: `generalized`, `ci`, `alternative`, `approximate`.

You can then set the `anova_type` attribute to {1, 2, 3, or `NA`}, optionally set the `approximate` attribute, and return the output.
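
For reference, the conversion behind these approximate values can be written out by hand: partial eta squared is recovered from a test statistic as $(F \times df) / (F \times df + df_{error})$. The sketch below uses the `A` and `B` rows of the example table and compares the result with the exported `F_to_eta2()` function (point estimate only, with `ci = NULL`). The variable names are hypothetical, and the value of `t_stat` is made up for illustration.

```{r}
library(effectsize)

f_stat <- c(7, 0.7)
df_effect <- c(1, 2)
df_error <- 34

# Partial eta squared recovered from F
eta2_from_f <- (f_stat * df_effect) / (f_stat * df_effect + df_error)

# A t statistic is treated as F = t^2 with df = 1
t_stat <- 2.5
eta2_from_t <- t_stat^2 / (t_stat^2 + df_error)

# Compare with the package's conversion function
from_pkg <- F_to_eta2(f_stat, df_effect, df_error, ci = NULL)
all.equal(eta2_from_f, unname(from_pkg[[1]]))
```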

### *Example*

Let's fit a simple linear model and change its class:

```{r}
mod <- lm(mpg ~ factor(cyl) + am, mtcars)

class(mod) <- "superMODEL"
```

We now need a new `.anova_es.superMODEL` function:

```{r}
.anova_es.superMODEL <- function(model, ...) {
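  # Get the sequential (type 1) ANOVA table as a plain data frame
  anov <- suppressWarnings(stats:::anova.lm(model))
  anov <- as.data.frame(anov)

  # Clean up: add the term names and rename the first two columns
  # (`Df`, `Sum Sq`) to the names `.es_aov_simple()` expects
  anov[["Parameter"]] <- rownames(anov)
  colnames(anov)[1:2] <- c("df", "Sum_Squares")

  # Pass the table on, forwarding all other arguments
  out <- .es_aov_simple(anov, ...)

  # Set attribute: `anova.lm()` gives type 1 sums of squares
  attr(out, "anova_type") <- 1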

  out
}
```

```{r, echo=FALSE}
# This is for: https://github.com/easystats/easystats/issues/348
.anova_es.superMODEL <<- .anova_es.superMODEL
```


And... that's it! Our new `superMODEL` class of models is fully supported!

```{r}
eta_squared(mod)

eta_squared(mod, partial = FALSE)

omega_squared(mod)

# Etc...
```


<!-- ## Supporting Model Re-Fitting with Standardized Data -->

<!-- `effectsize::standardize.default()` should support your model if you have methods for: -->

<!-- 1. `{insight}` functions. -->
<!-- 2. An `update()` method that can take the model and a data frame via the `data = ` argument. -->

<!-- Or you can make your own `standardize.my_class()` function, DIY-style (possibly using `datawizard::standardize.data.frame()` or `datawizard::standardize.numeric()`). This function should return a fitted model of the same class as the input model. -->

<!-- ## Supporting Standardized Parameters -->

<!-- `standardize_parameters.default()` offers a few methods of parameter standardization: -->

<!-- - For `method = "refit"` all you need is to have `effectsize::standardize()` support (see above) as well as `parameters::model_parameters()`.   -->
<!-- - ***API for post-hoc methods coming soon...***   -->

<!-- `standardize_parameters.default()` should support your model if it is already supported by `{parameters}` and `{insight}`. -->

<!-- - For `method = "refit"`, to have `effectsize::standardize()` support (see above). -->
<!-- - For the post-hoc methods, you will need to have a method for `standardize_info()` (or use the default method). See next section. -->

<!-- Or you can make your own `standardize_parameters.my_class()` and/or `standardize_info.my_class()` functions. -->

<!-- ## Extracting Post-Hoc Standardization Information (`standardize_info`) -->

<!-- The `standardize_info()` function computes the standardized units needed for standardization; In order to standardize some slope $b_{xi}$, we need to multiply it by a scaling factor: -->

<!-- $$b^*_{xi} = \frac{\text{Deviation}_{xi}}{\text{Deviation}_{y}}\times b_{xi}$$ -->

<!-- These "deviations" are univariate scaling factors of the response and the specific parameter (using its corresponding feature in the design matrix). Most often these are a single standard deviation (*SD*), but depending on the `robust` and `two_sd` arguments, these can also be two *MAD*s, etc. -->

<!-- Let's look at an example: -->

<!-- ```{r} -->
<!-- m <- lm(mpg ~ factor(cyl) * am, data = mtcars) -->

<!-- standardize_info(m) -->
<!-- ``` -->

<!-- - The first 4 columns (`Parameter`, `Type`, `Link`, `Secondary_Parameter`) are taken from `parameters::parameters_type()`.   -->
<!-- - The `EffectSize_Type` column is not used here, but is used in the `{report}` package.   -->
<!-- - `Deviation_Response_Basic` and `Deviation_Response_Smart` correspond to the $\text{Deviation}_{y}$ scalar using two different methods of post-hoc standardization (see `standardize_parameters()` docs for more details).   -->
<!--     - Note that when the response is not standardized (either due to `standardize_parameters(include_response = FALSE)` or because the model uses a non-continuous response), both methods are fixed at **1** (i.e., no standardization with respect to the outcome).   -->
<!-- - `Deviation_Basic` and `Deviation_Smart` correspond to the $\text{Deviation}_{xi}$ scalar using two different methods of post-hoc standardization. -->

<!-- This information is then used by the `standardize_parameters()` to standardize the parameters. -->
    

# References