1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
|
#' Scaling Numeric Data to a Specific Range
#'
#' `step_range` creates a *specification* of a recipe
#' step that will normalize numeric data to be within a pre-defined
#' range of values.
#'
#' @inheritParams step_center
#' @param min A single numeric value for the smallest value in the
#' range.
#' @param max A single numeric value for the largest value in the
#' range.
#' @param clipping A single logical value for determining whether
#' application of transformation onto new data should be forced
#' to be inside `min` and `max`. Defaults to TRUE.
#' @param ranges A character vector of variables that will be
#' normalized. Note that this is ignored until the values are
#' determined by [prep()]. Setting this value will
#' be ineffective.
#' @template step-return
#' @family normalization steps
#' @export
#' @details When a new data point is outside of the ranges seen in
#' the training set, the new values are truncated at `min` or
#' `max`.
#'
#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble with columns
#' `terms` (the selectors or variables selected), `min`, and `max` is
#' returned.
#'
#' @template case-weights-not-supported
#'
#' @examplesIf rlang::is_installed("modeldata")
#' data(biomass, package = "modeldata")
#'
#' biomass_tr <- biomass[biomass$dataset == "Training", ]
#' biomass_te <- biomass[biomass$dataset == "Testing", ]
#'
#' rec <- recipe(
#' HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#' data = biomass_tr
#' )
#'
#' ranged_trans <- rec %>%
#' step_range(carbon, hydrogen)
#'
#' ranged_obj <- prep(ranged_trans, training = biomass_tr)
#'
#' transformed_te <- bake(ranged_obj, biomass_te)
#'
#' biomass_te[1:10, names(transformed_te)]
#' transformed_te
#'
#' tidy(ranged_trans, number = 1)
#' tidy(ranged_obj, number = 1)
step_range <-
function(recipe,
...,
role = NA,
trained = FALSE,
min = 0,
max = 1,
clipping = TRUE,
ranges = NULL,
skip = FALSE,
id = rand_id("range")) {
add_step(
recipe,
step_range_new(
terms = enquos(...),
role = role,
trained = trained,
min = min,
max = max,
clipping = clipping,
ranges = ranges,
skip = skip,
id = id
)
)
}
step_range_new <-
function(terms, role, trained, min, max, clipping, ranges, skip, id) {
step(
subclass = "range",
terms = terms,
role = role,
trained = trained,
min = min,
max = max,
clipping = clipping,
ranges = ranges,
skip = skip,
id = id
)
}
#' @export
prep.step_range <- function(x, training, info = NULL, ...) {
col_names <- recipes_eval_select(x$terms, training, info)
check_type(training[, col_names], types = c("double", "integer"))
mins <-
vapply(training[, col_names], min, c(min = 0), na.rm = TRUE)
maxs <-
vapply(training[, col_names], max, c(max = 0), na.rm = TRUE)
step_range_new(
terms = x$terms,
role = x$role,
trained = TRUE,
min = x$min,
max = x$max,
clipping = x$clipping,
ranges = rbind(mins, maxs),
skip = x$skip,
id = x$id
)
}
#' @export
bake.step_range <- function(object, new_data, ...) {
check_new_data(colnames(object$ranges), object, new_data)
for (column in colnames(object$ranges)) {
min <- object$ranges["mins", column]
max <- object$ranges["maxs", column]
new_data[[column]] <- (new_data[[column]] - min) *
(object$max - object$min) / (max - min) + object$min
if (object$clipping) {
new_data[[column]] <- pmax(new_data[[column]], object$min)
new_data[[column]] <- pmin(new_data[[column]], object$max)
}
}
new_data
}
print.step_range <-
function(x, width = max(20, options()$width - 30), ...) {
title <- glue::glue("Range scaling to [{x$min},{x$max}] for ")
print_step(colnames(x$ranges), x$terms, x$trained, title, width)
invisible(x)
}
#' @rdname tidy.recipe
#' @export
tidy.step_range <- function(x, ...) {
if (is_trained(x)) {
res <- tibble(
terms = colnames(x$ranges) %||% character(),
min = unname(x$ranges["mins", ]),
max = unname(x$ranges["maxs", ])
)
} else {
term_names <- sel2char(x$terms)
res <- tibble(
terms = term_names,
min = na_dbl,
max = na_dbl
)
}
res$id <- x$id
res
}
|