% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/recipe.R
\name{prep}
\alias{prep}
\alias{prep.recipe}
\title{Train a Data Recipe}
\usage{
prep(x, ...)

\method{prep}{recipe}(
  x,
  training = NULL,
  fresh = FALSE,
  verbose = FALSE,
  retain = TRUE,
  log_changes = FALSE,
  strings_as_factors = TRUE,
  ...
)
}
\arguments{
\item{x}{An object to be trained; for the \code{recipe} method, a recipe.}

\item{...}{further arguments passed to or from other methods (not currently
used).}

\item{training}{A data frame or tibble that will be used to estimate
parameters for preprocessing.}

\item{fresh}{A logical indicating whether already trained operations should be
re-trained. If \code{TRUE}, you should pass a data set to the
\code{training} argument.}

\item{verbose}{A logical that controls whether progress is reported as operations
are executed.}

\item{retain}{A logical: should the \emph{preprocessed} training set be saved
into the \code{template} slot of the recipe after training? This is a good
idea if you want to add more steps later but want to avoid re-training
the existing steps. Also, it is advisable to use \code{retain = TRUE}
if any steps use the option \code{skip = FALSE}. \strong{Note} that this can make
the final recipe size large. When \code{verbose = TRUE}, a message is written
with the approximate object size in memory but may be an underestimate
since it does not take environments into account.}

\item{log_changes}{A logical for printing a summary for each step regarding
which (if any) columns were added or removed during training.}

\item{strings_as_factors}{A logical: should character columns be converted to
factors? This affects the preprocessed training set (when
\code{retain = TRUE}) as well as the results of \code{bake.recipe}.}
}
\value{
A recipe whose step objects have been updated with the required
quantities (e.g., parameter estimates, model objects, etc.). Also, the
\code{term_info} object is likely to be modified as the operations are
executed.
}
\description{
For a recipe with at least one preprocessing operation, estimate the required
parameters from a training set so that they can later be applied to other
data sets.
}
\details{
Given a data set, this function estimates the quantities and statistics
required by the operations in the recipe.

\code{\link[=prep]{prep()}} returns an updated recipe with the estimates.

Note that missing data are handled within the individual steps; there is no
global \code{na.rm} option at the recipe level or in \code{\link[=prep]{prep()}}.

Also, if a recipe has been trained using \code{\link[=prep]{prep()}} and then steps
are added, \code{\link[=prep]{prep()}} will estimate only the new operations. If
\code{fresh = TRUE}, all of the operations will be (re)estimated.

As the steps are executed, the \code{training} set is updated. For example,
if the first step is to center the data and the second is to scale the
data, the step for scaling is given the centered data.
}
\examples{
data(ames, package = "modeldata")

library(dplyr)

ames <- mutate(ames, Sale_Price = log10(Sale_Price))

ames_rec <-
  recipe(
    Sale_Price ~ Longitude + Latitude + Neighborhood + Year_Built + Central_Air,
    data = ames
  ) \%>\%
  step_other(Neighborhood, threshold = 0.05) \%>\%
  step_dummy(all_nominal()) \%>\%
  step_interact(~ starts_with("Central_Air"):Year_Built) \%>\%
  step_ns(Longitude, Latitude, deg_free = 5)

prep(ames_rec, verbose = TRUE)
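
# A minimal sketch of incremental training as described in Details. The
# object names (ames_trained, ames_more, ...) are illustrative only, and
# centering Year_Built is just an arbitrary extra step for the example.
ames_trained <- prep(ames_rec)

# Add a step to the already trained recipe
ames_more <- step_center(ames_trained, Year_Built)

# With the default fresh = FALSE, only the new step should be estimated;
# this relies on the first prep() call having used retain = TRUE
ames_more_trained <- prep(ames_more, verbose = TRUE)

# With fresh = TRUE, all steps are (re)estimated, so a training set is supplied
ames_refit <- prep(ames_more, training = ames, fresh = TRUE)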

prep(ames_rec, log_changes = TRUE)
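
# A small sketch of sequential execution as described in Details: the
# scaling step is trained on the output of the centering step. The recipe
# seq_rec is a hypothetical example built only for this illustration.
seq_rec <- recipe(Sale_Price ~ Longitude + Latitude, data = ames)
seq_rec <- step_center(seq_rec, all_predictors())
seq_rec <- step_scale(seq_rec, all_predictors())

seq_trained <- prep(seq_rec, training = ames)

# Inspect the estimates stored in each trained step
tidy(seq_trained, number = 1)
tidy(seq_trained, number = 2)

# Because retain = TRUE is the default, the preprocessed training set
# can be extracted from the trained recipe
seq_processed <- juice(seq_trained)
colMeans(seq_processed[, c("Longitude", "Latitude")])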
}
\author{
Max Kuhn
}
\concept{model_specification}
\concept{preprocessing}
\keyword{datagen}