% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/novel.R
\name{step_novel}
\alias{step_novel}
\alias{tidy.step_novel}
\title{Simple Value Assignments for Novel Factor Levels}
\usage{
step_novel(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  new_level = "new",
  objects = NULL,
  skip = FALSE,
  id = rand_id("novel")
)

\method{tidy}{step_novel}(x, ...)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the
sequence of operations for this recipe.}
\item{...}{One or more selector functions to choose which
variables will be affected by the step. These variables
should be character or factor types. See \code{\link[=selections]{selections()}} for more
details. For the \code{tidy} method, these are not currently used.}
\item{role}{Not used by this step since no new variables are
created.}
\item{trained}{A logical to indicate if the quantities for
preprocessing have been estimated.}
\item{new_level}{A single character value that will be assigned
to new factor levels.}
\item{objects}{A list of objects that contain the information
on factor levels that will be determined by \code{\link[=prep.recipe]{prep.recipe()}}.}
\item{skip}{A logical. Should the step be skipped when the
recipe is baked by \code{\link[=bake.recipe]{bake.recipe()}}? While all operations are baked
when \code{\link[=prep.recipe]{prep.recipe()}} is run, some operations may not be able to be
conducted on new data (e.g. processing the outcome variable(s)).
Care should be taken when using \code{skip = TRUE} as it may affect
the computations for subsequent operations.}
\item{id}{A character string that is unique to this step to identify it.}
\item{x}{A \code{step_novel} object.}
}
\value{
An updated version of \code{recipe} with the new step
added to the sequence of existing steps (if any). For the
\code{tidy} method, a tibble with columns \code{terms} (the
columns that will be affected) and \code{value} (the factor
level that is used for the new value).
}
\description{
\code{step_novel} creates a \emph{specification} of a recipe
step that will assign a previously unseen factor level to a
new value.
}
\details{
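The selected variables are adjusted to have a new
level (given by \code{new_level}) that is placed in the last
position. When the recipe is prepared, no training data points
are associated with this new level, since all of the training
data have been seen. When new data are baked, any value that was
not present in the training set is assigned to this new level.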
Note that if the original columns are character, they will be
converted to factors by this step.
Missing values will remain missing.
If \code{new_level} is already in the data given to \code{prep}, an error
is thrown.
}
\examples{
library(recipes)
library(modeldata)
data(okc)

okc_tr <- okc[1:30000, ]
okc_te <- okc[30001:30006, ]

# Add diet values that do not occur in the training set
okc_te$diet[3] <- "cannibalism"
okc_te$diet[4] <- "vampirism"

rec <- recipe(~ diet + location, data = okc_tr)

rec <- rec \%>\%
  step_novel(diet, location)
rec <- prep(rec, training = okc_tr)

# The unseen diets are assigned to the default new level, "new"
processed <- bake(rec, okc_te)
tibble(old = okc_te$diet, new = processed$diet)
tidy(rec, number = 1)
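
# A minimal base-R sketch of the idea described in Details. This is an
# illustration under stated assumptions, not the recipes implementation;
# assign_novel() is a hypothetical helper. The training levels are kept,
# new_level is appended as the last level, values not seen during
# training are mapped to it, and missing values stay missing.
assign_novel <- function(x, train_levels, new_level = "new") {
  # Mirror the documented error when new_level is already a known level
  if (!is.na(match(new_level, train_levels))) {
    stop("new_level is already one of the training levels")
  }
  x <- as.character(x)
  # Non-missing values that are not among the training levels
  unseen <- !is.na(x) & is.na(match(x, train_levels))
  x[unseen] <- new_level
  factor(x, levels = c(train_levels, new_level))
}
assign_novel(c("a", "b", "z", NA), train_levels = c("a", "b"))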
}
\seealso{
\code{\link[=step_factor2string]{step_factor2string()}}, \code{\link[=step_string2factor]{step_string2factor()}},
\code{\link[=dummy_names]{dummy_names()}}, \code{\link[=step_regex]{step_regex()}}, \code{\link[=step_count]{step_count()}},
\code{\link[=step_ordinalscore]{step_ordinalscore()}}, \code{\link[=step_unorder]{step_unorder()}}, \code{\link[=step_other]{step_other()}}
}
\concept{factors}
\concept{preprocessing}
\keyword{datagen}