File: default_recipe_blueprint.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/blueprint-recipe-default.R, R/mold.R
\name{default_recipe_blueprint}
\alias{default_recipe_blueprint}
\alias{mold.recipe}
\title{Default recipe blueprint}
\usage{
default_recipe_blueprint(
  intercept = FALSE,
  allow_novel_levels = FALSE,
  fresh = TRUE,
  composition = "tibble"
)

\method{mold}{recipe}(x, data, ..., blueprint = NULL)
}
\arguments{
\item{intercept}{A logical. Should an intercept be included in the
processed data? This information is used by the \code{process} function
in the \code{mold} and \code{forge} function list.}

\item{allow_novel_levels}{A logical. Should novel factor levels be allowed at
prediction time? This information is used by the \code{clean} function in the
\code{forge} function list, and is passed on to \code{\link[=scream]{scream()}}.}

\item{fresh}{A logical. Should already-trained operations be re-trained when
\code{prep()} is called?}

\item{composition}{Either "tibble", "matrix", or "dgCMatrix" for the format
of the processed predictors. If "matrix" or "dgCMatrix" are chosen, all of
the predictors must be numeric after the preprocessing method has been
applied; otherwise an error is thrown.}

\item{x}{An unprepped recipe created from \code{\link[recipes:recipe]{recipes::recipe()}}.}

\item{data}{A data frame or matrix containing the outcomes and predictors.}

\item{...}{Not used.}

\item{blueprint}{A preprocessing \code{blueprint}. If left as \code{NULL}, then a
\code{\link[=default_recipe_blueprint]{default_recipe_blueprint()}} is used.}
}
\value{
For \code{default_recipe_blueprint()}, a recipe blueprint.
}
\description{
This page holds the details for the recipe preprocessing blueprint. This
is the blueprint used by default from \code{mold()} if \code{x} is a recipe.
}
\section{Mold}{


When \code{mold()} is used with the default recipe blueprint:
\itemize{
\item It calls \code{\link[recipes:prep]{recipes::prep()}} to prep the recipe.
\item It calls \code{\link[recipes:juice]{recipes::juice()}} to extract the outcomes and predictors. These
are returned as tibbles.
\item If \code{intercept = TRUE}, adds an intercept column to the predictors.
}
}

\section{Forge}{


When \code{forge()} is used with the default recipe blueprint:
\itemize{
\item It calls \code{\link[=shrink]{shrink()}} to trim \code{new_data} to only the required columns and
coerce \code{new_data} to a tibble.
\item It calls \code{\link[=scream]{scream()}} to perform validation on the structure of the columns
of \code{new_data}.
\item It calls \code{\link[recipes:bake]{recipes::bake()}} on the \code{new_data} using the prepped recipe
used during training.
\item It adds an intercept column onto \code{new_data} if \code{intercept = TRUE}.
}
}

\examples{
library(recipes)

# ---------------------------------------------------------------------------
# Setup

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# Recipes example

# Create a recipe that logs a predictor
rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) \%>\%
  step_log(Sepal.Length)

processed <- mold(rec, train)

# Sepal.Length has been logged
processed$predictors

processed$outcomes

# The underlying blueprint is a prepped recipe
processed$blueprint$recipe

# Call forge() with the blueprint and the test data
# to have it preprocess the test data in the same way
forge(test, processed$blueprint)

# Use `outcomes = TRUE` to also extract the preprocessed outcome!
# This logged the Sepal.Length column of `new_data`
forge(test, processed$blueprint, outcomes = TRUE)
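
# A minimal sketch of the first two steps that `forge()` performs, run by
# hand. It assumes the prototype of the original predictors is stored in
# `processed$blueprint$ptypes$predictors`; that location is an assumption
# about the blueprint's layout, not something this page documents.
# `ptype`, `test_shrunk`, and `test_validated` are illustrative names.
ptype <- processed$blueprint$ptypes$predictors

# `shrink()` keeps only the required columns and returns a tibble
test_shrunk <- shrink(test, ptype)
names(test_shrunk)

# `scream()` validates the column structure against the prototype
test_validated <- scream(test_shrunk, ptype)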

# ---------------------------------------------------------------------------
# With an intercept

# You can add an intercept with `intercept = TRUE`
processed <- mold(rec, train, blueprint = default_recipe_blueprint(intercept = TRUE))

processed$predictors

# But you also could have used a recipe step
rec2 <- step_intercept(rec)

mold(rec2, iris)$predictors

# ---------------------------------------------------------------------------
# Matrix output for predictors

# You can change the `composition` of the predictor data set
bp <- default_recipe_blueprint(composition = "dgCMatrix")
processed <- mold(rec, train, blueprint = bp)
class(processed$predictors)
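
# A dense base R matrix is also available with `composition = "matrix"`.
# This is a small sketch reusing `rec` and `train` from above; `bp_matrix`
# and `processed_matrix` are illustrative names. It works here because both
# predictors are numeric after preprocessing.
bp_matrix <- default_recipe_blueprint(composition = "matrix")
processed_matrix <- mold(rec, train, blueprint = bp_matrix)
is.matrix(processed_matrix$predictors)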

\dontshow{if (utils::packageVersion("recipes") >= "0.2.0.9002") (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
# ---------------------------------------------------------------------------
# Non standard roles

# If you have custom recipe roles, they are assumed to be required at
# `bake()` time when passing in `new_data`. This is an assumption that both
# recipes and hardhat make, meaning that those roles are required at
# `forge()` time as well.
rec_roles <- recipe(train) \%>\%
  update_role(Sepal.Width, new_role = "predictor") \%>\%
  update_role(Species, new_role = "outcome") \%>\%
  update_role(Sepal.Length, new_role = "id") \%>\%
  update_role(Petal.Length, new_role = "important")

processed_roles <- mold(rec_roles, train)

# The custom roles will be in the `mold()` result in case you need
# them for modeling.
processed_roles$extras

# And they are in the `forge()` result
forge(test, processed_roles$blueprint)$extras

# If you remove a column with a custom role from the test data, then you
# won't be able to `forge()` even though this recipe technically didn't
# use that column in any steps
test2 <- test
test2$Petal.Length <- NULL
try(forge(test2, processed_roles$blueprint))

# Most of the time, if you find yourself in the above scenario, we
# suggest that you remove `Petal.Length` from the data that is supplied to
# the recipe. If that isn't an option, you can declare that the column
# isn't required at `bake()` time by using `update_role_requirements()`
rec_roles <- update_role_requirements(rec_roles, "important", bake = FALSE)
processed_roles <- mold(rec_roles, train)
forge(test2, processed_roles$blueprint)
\dontshow{\}) # examplesIf}
}