File: step_select.Rd

package info (click to toggle)
r-cran-recipes 1.0.4%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 3,636 kB
  • sloc: sh: 37; makefile: 2
file content (123 lines) | stat: -rw-r--r-- 3,761 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/select.R
\name{step_select}
\alias{step_select}
\title{Select variables using dplyr}
\usage{
step_select(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  skip = FALSE,
  id = rand_id("select")
)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the
sequence of operations for this recipe.}

\item{...}{One or more selector functions to choose variables
for this step. See \code{\link[=selections]{selections()}} for more details.}

\item{role}{For model terms selected by this step, what analysis
role should they be assigned?}

\item{trained}{A logical to indicate if the quantities for
preprocessing have been estimated.}

\item{skip}{A logical. Should the step be skipped when the
recipe is baked by \code{\link[=bake]{bake()}}? While all operations are baked
when \code{\link[=prep]{prep()}} is run, some operations may not be able to be
conducted on new data (e.g. processing the outcome variable(s)).
Care should be taken when using \code{skip = TRUE} as it may affect
the computations for subsequent operations.}

\item{id}{A character string that is unique to this step to identify it.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of any existing operations.
}
\description{
\code{step_select()} creates a \emph{specification} of a recipe step
that will select variables using \code{\link[dplyr:select]{dplyr::select()}}.
}
\details{
When an object in the user's global environment is
referenced in the expression defining the new variable(s),
it is a good idea to use quasiquotation (e.g. \verb{!!}) to embed
the value of the object in the expression (to be portable
between sessions). See the examples.

This step can potentially remove columns from the data set. This may
cause issues for subsequent steps in your recipe if the missing columns are
specifically referenced by name. To avoid this, see the advice in the
\emph{Tips for saving recipes and filtering columns} section of \link{selections}.
}
\section{Tidying}{
When you \code{\link[=tidy.recipe]{tidy()}} this step, a tibble with column
\code{terms} which contains the \code{select} expressions as character strings
(and are not reparsable) is returned.
}

\section{Case weights}{


The underlying operation does not allow for case weights.
}

\examples{
library(dplyr)

iris_tbl <- as_tibble(iris)
iris_train <- slice(iris_tbl, 1:75)
iris_test <- slice(iris_tbl, 76:150)

dplyr_train <- select(iris_train, Species, starts_with("Sepal"))
dplyr_test <- select(iris_test, Species, starts_with("Sepal"))

rec <- recipe(~., data = iris_train) \%>\%
  step_select(Species, starts_with("Sepal")) \%>\%
  prep(training = iris_train)

rec_train <- bake(rec, new_data = NULL)
all.equal(dplyr_train, rec_train)

rec_test <- bake(rec, iris_test)
all.equal(dplyr_test, rec_test)

# Local variables
sepal_vars <- c("Sepal.Width", "Sepal.Length")

qq_rec <-
  recipe(~., data = iris_train) \%>\%
  # fine for interactive usage
  step_select(Species, all_of(sepal_vars)) \%>\%
  # best approach for saving a recipe to disk
  step_select(Species, all_of(!!sepal_vars))

# Note that `sepal_vars` is inlined in the second approach
qq_rec
}
\seealso{
Other variable filter steps: 
\code{\link{step_corr}()},
\code{\link{step_filter_missing}()},
\code{\link{step_lincomb}()},
\code{\link{step_nzv}()},
\code{\link{step_rm}()},
\code{\link{step_zv}()}

Other dplyr steps: 
\code{\link{step_arrange}()},
\code{\link{step_filter}()},
\code{\link{step_mutate_at}()},
\code{\link{step_mutate}()},
\code{\link{step_rename_at}()},
\code{\link{step_rename}()},
\code{\link{step_sample}()},
\code{\link{step_slice}()}
}
\concept{dplyr steps}
\concept{variable filter steps}