1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
|
#' General Variable Filter
#'
#' `step_rm` creates a *specification* of a recipe step
#' that will remove variables based on their name, type, or role.
#'
#' @inheritParams step_center
#' @param ... One or more selector functions to choose which
#' variables that will be evaluated by the filtering bake. See
#' [selections()] for more details. For the `tidy`
#' method, these are not currently used.
#' @param role Not used by this step since no new variables are
#' created.
#' @param removals A character string that contains the names of
#' columns that should be removed. These values are not determined
#' until [prep.recipe()] is called.
#' @return An updated version of `recipe` with the new step
#' added to the sequence of existing steps (if any). For the
#' `tidy` method, a tibble with columns `terms` which
#' is the columns that will be removed.
#' @keywords datagen
#' @concept preprocessing
#' @concept variable_filters
#' @export
#' @examples
#' library(modeldata)
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#' data = biomass_tr)
#'
#' library(dplyr)
#' smaller_set <- rec %>%
#' step_rm(contains("gen"))
#'
#' smaller_set <- prep(smaller_set, training = biomass_tr)
#'
#' filtered_te <- bake(smaller_set, biomass_te)
#' filtered_te
#'
#' tidy(smaller_set, number = 1)
step_rm <- function(recipe,
...,
role = NA,
trained = FALSE,
removals = NULL,
skip = FALSE,
id = rand_id("rm")) {
add_step(recipe,
step_rm_new(
terms = ellipse_check(...),
role = role,
trained = trained,
removals = removals,
skip = skip,
id = id
))
}
step_rm_new <- function(terms, role, trained, removals, skip, id) {
step(
subclass = "rm",
terms = terms,
role = role,
trained = trained,
removals = removals,
skip = skip,
id = id
)
}
#' @export
prep.step_rm <- function(x, training, info = NULL, ...) {
col_names <- eval_select_recipes(x$terms, training, info)
step_rm_new(
terms = x$terms,
role = x$role,
trained = TRUE,
removals = col_names,
skip = x$skip,
id = x$id
)
}
#' @export
bake.step_rm <- function(object, new_data, ...) {
if (length(object$removals) > 0)
new_data <- new_data[, !(colnames(new_data) %in% object$removals)]
as_tibble(new_data)
}
print.step_rm <-
function(x, width = max(20, options()$width - 22), ...) {
if (x$trained) {
if (length(x$removals) > 0) {
cat("Variables removed ")
cat(format_ch_vec(x$removals, width = width))
} else
cat("No variables were removed")
} else {
cat("Delete terms ", sep = "")
cat(format_selectors(x$terms, width = width))
}
if (x$trained)
cat(" [trained]\n")
else
cat("\n")
invisible(x)
}
#' @rdname step_rm
#' @param x A `step_rm` object.
#' @export
tidy.step_rm <- tidy_filter
|