1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146
|
#' Forge prediction-ready data
#'
#' @description
#'
#' `forge()` applies the transformations requested by the specific `blueprint`
#' on a set of `new_data`. This `new_data` contains new predictors
#' (and potentially outcomes) that will be used to generate predictions.
#'
#' All blueprints have consistent return values with the others, but each is
#' unique enough to have its own help page. Click through below to learn
#' how to use each one in conjunction with `forge()`.
#'
#' * XY Method - [default_xy_blueprint()]
#'
#' * Formula Method - [default_formula_blueprint()]
#'
#' * Recipes Method - [default_recipe_blueprint()]
#'
#' @details
#'
#' If the outcomes are present in `new_data`, they can optionally be processed
#' and returned in the `outcomes` slot of the returned list by setting
#' `outcomes = TRUE`. This is very useful when doing cross validation where
#' you need to preprocess the outcomes of a test set before computing
#' performance.
#'
#' @param new_data A data frame or matrix of predictors to process. If
#' `outcomes = TRUE`, this should also contain the outcomes to process.
#'
#' @param blueprint A preprocessing `blueprint`.
#'
#' @param outcomes A logical. Should the outcomes be processed and returned
#' as well?
#'
#' @param ... Not used.
#'
#' @return
#'
#' A named list with 3 elements:
#'
#' - `predictors`: A tibble containing the preprocessed
#' `new_data` predictors.
#'
#' - `outcomes`: If `outcomes = TRUE`, a tibble containing the preprocessed
#' outcomes found in `new_data`. Otherwise, `NULL`.
#'
#' - `extras`: Either `NULL` if the blueprint returns no extra information,
#' or a named list containing the extra information.
#'
#' @examples
#' # See the blueprint specific documentation linked above
#' # for various ways to call forge with different
#' # blueprints.
#'
#' train <- iris[1:100, ]
#' test <- iris[101:150, ]
#'
#' # Formula
#' processed <- mold(
#' log(Sepal.Width) ~ Species,
#' train,
#' blueprint = default_formula_blueprint(indicators = "none")
#' )
#'
#' forge(test, processed$blueprint, outcomes = TRUE)
#' @export
forge <- function(new_data, blueprint, ..., outcomes = FALSE) {
UseMethod("forge")
}
#' @export
forge.default <- function(new_data, blueprint, ..., outcomes = FALSE) {
glubort("The class of `new_data`, '{class1(new_data)}', is not recognized.")
}
#' @export
forge.data.frame <- function(new_data, blueprint, ..., outcomes = FALSE) {
validate_empty_dots(...)
validate_is_blueprint(blueprint)
run_forge(
blueprint,
new_data = new_data,
outcomes = outcomes
)
}
#' @export
forge.matrix <- forge.data.frame
# ------------------------------------------------------------------------------
#' `forge()` according to a blueprint
#'
#' @description
#' This is a developer facing function that is _only_ used if you are creating
#' your own blueprint subclass. It is called from [forge()] and dispatches off
#' the S3 class of the `blueprint`. This gives you an opportunity to forge the
#' new data in a way that is specific to your blueprint.
#'
#' `run_forge()` is always called from `forge()` with the same arguments, unlike
#' [run_mold()], because there aren't different interfaces for calling
#' `forge()`. `run_forge()` is always called as:
#'
#' `run_forge(blueprint, new_data = new_data, outcomes = outcomes)`
#'
#' If you write a blueprint subclass for [new_xy_blueprint()],
#' [new_recipe_blueprint()], [new_formula_blueprint()], or [new_blueprint()],
#' then your `run_forge()` method signature must match this.
#'
#' @inheritParams forge
#'
#' @return
#' `run_forge()` methods return the object that is then immediately returned
#' from `forge()`. See the return value section of [forge()] to understand what
#' the structure of the return value should look like.
#'
#' @name run-forge
#' @order 1
#' @export
#' @examples
#' bp <- default_xy_blueprint()
#'
#' outcomes <- mtcars["mpg"]
#' predictors <- mtcars
#' predictors$mpg <- NULL
#'
#' mold <- run_mold(bp, x = predictors, y = outcomes)
#'
#' run_forge(mold$blueprint, new_data = predictors)
run_forge <- function(blueprint,
new_data,
...,
outcomes = FALSE) {
UseMethod("run_forge")
}
#' @export
run_forge.default <- function(blueprint,
new_data,
...,
outcomes = FALSE) {
class <- class(blueprint)[[1L]]
message <- glue("No `run_forge()` method provided for an object of type <{class}>.")
abort(message)
}
|