File: selections.Rmd

## Tips for saving recipes and filtering columns

When creating variable selections:

* If you are using column filtering steps, such as `step_corr()`, avoid hardcoding specific variable names in downstream steps, since those columns may be removed by the filter. Instead, use [dplyr::any_of()] or [dplyr::all_of()], depending on how a missing column should be handled.

   * [dplyr::any_of()] is tolerant of removed columns: it selects the columns that are present and skips the rest.
   * [dplyr::all_of()] fails unless all of the columns are present in the data.

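* For both of these functions, if you are going to save the recipe as a binary object (for example, with `saveRDS()`) to use in another R session, avoid referring to a vector in your workspace by name. Inject its values with `!!` instead, so the recipe does not depend on an object that may not exist in the new session.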

   * Preferred: `any_of(!!var_names)`
   * Avoid: `any_of(var_names)` 
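
The difference between the two helpers is not specific to recipes. As a minimal sketch using `dplyr::select()` directly (the `wanted` and `trimmed` objects are illustrative), dropping `wt` stands in for a column removed by a filter step:

```{r}
library(dplyr)

wanted <- c("hp", "drat", "wt")

# Drop `wt`, as a filter step such as step_corr() might
trimmed <- select(mtcars, -wt)

# any_of() keeps the columns that exist and skips `wt`
names(select(trimmed, any_of(wanted)))
# returns c("hp", "drat")

# all_of() errors because `wt` is missing; try() reports the error
# and lets the rest of the chunk run
try(select(trimmed, all_of(wanted)))
```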
   
Some examples:

```{r, error=TRUE}
library(recipes)

# some_vars is c("hp", "drat", "wt")
some_vars <- names(mtcars)[4:6]

# No filter steps; fine if the recipe will not be saved
rec_1 <-
  recipe(mpg ~ ., data = mtcars) %>% 
  step_log(all_of(some_vars)) %>% 
  prep()

# No filter steps; `!!!` splices the names into the step, so the recipe can be saved
rec_2 <-
  recipe(mpg ~ ., data = mtcars) %>% 
  step_log(!!!some_vars) %>% 
  prep()

# This fails because step_rm() has already removed `wt` when step_log() is prepped
recipe(mpg ~ ., data = mtcars)  %>% 
  step_rm(wt) %>% 
  step_log(!!!some_vars) %>% 
  prep()

# Best when using filter steps (handled by any_of())
# and when saving the recipe (handled by `!!`)
rec_4 <- 
  recipe(mpg ~ ., data = mtcars) %>% 
  step_rm(wt) %>% 
  step_log(any_of(!!some_vars)) %>% 
  # equal to step_log(any_of(c("hp", "drat", "wt")))
  prep()
```
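
The examples above use `!!` but do not show what goes wrong without it. The sketch below (object names are illustrative) builds the same step both ways and then deletes the workspace vector with `rm()`, as a rough stand-in for loading the recipe in a session where that vector was never defined:

```{r}
library(recipes)

log_vars <- c("hp", "drat")

# Stores the expression any_of(log_vars); `log_vars` is only
# looked up when the recipe is prepped
rec_by_name <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(any_of(log_vars))

# `!!` injects the values now, so the step stores any_of(c("hp", "drat"))
rec_inlined <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(any_of(!!log_vars))

# Stand-in for a new session in which `log_vars` does not exist
rm(log_vars)

# Works: the column names are part of the recipe itself
rec_inlined_prepped <- prep(rec_inlined)

# Fails: `log_vars` can no longer be found
try(prep(rec_by_name))
```

Using `rm()` instead of an actual `saveRDS()`/`readRDS()` round trip keeps the sketch self-contained within a single session.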