% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/helpers.R
\name{language}
\alias{language}
\alias{select_helpers}
\title{Selection language}
\description{
\subsection{Overview of selection features:}{
tidyselect implements a DSL (domain-specific language) for selecting
variables. It provides the following helpers:
\itemize{
\item \code{var1:var10}: variables lying between \code{var1} on the left and \code{var10} on the right.
}
\itemize{
\item \code{\link[=starts_with]{starts_with("a")}}: names that start with \code{"a"}.
\item \code{\link[=ends_with]{ends_with("z")}}: names that end with \code{"z"}.
\item \code{\link[=contains]{contains("b")}}: names that contain \code{"b"}.
\item \code{\link[=matches]{matches("x.y")}}: names that match regular expression \code{x.y}.
\item \code{\link[=num_range]{num_range("x", 1:4)}}: names following the pattern \code{x1}, \code{x2}, ..., \code{x4}.
\item \code{\link[=all_of]{all_of(vars)}}/\code{\link[=any_of]{any_of(vars)}}:
match names stored in the character vector \code{vars}. \code{all_of(vars)} will
error if any of the variables is missing; \code{any_of(vars)} will match just the
variables that exist.
\item \code{\link[=everything]{everything()}}: all variables.
\item \code{\link[=last_col]{last_col()}}: furthest column on the right.
\item \code{\link[=where]{where(is.numeric)}}: all variables where
\code{is.numeric()} returns \code{TRUE}.
}
It also provides operators for combining those selections:
\itemize{
\item \code{!selection}: only variables that don't match \code{selection}.
\item \code{selection1 & selection2}: only variables included in both \code{selection1} and \code{selection2}.
\item \code{selection1 | selection2}: all variables that match either \code{selection1} or \code{selection2}.
}
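The difference between \code{all_of()} and \code{any_of()} listed above can be sketched as follows. This is a minimal illustration, assuming dplyr is installed; the vector \code{vars} and the column name \code{"not_a_column"} are made up for the example:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)
# `vars` is a hypothetical character vector; "not_a_column" is not in iris
vars <- c("Sepal.Length", "Petal.Length", "not_a_column")
# any_of() is lenient: it keeps only the names that exist
picked <- select(iris, any_of(vars))
names(picked)
#> [1] "Sepal.Length" "Petal.Length"
# all_of() is strict: a missing name is an error, caught here for display
strict <- tryCatch(
  select(iris, all_of(vars)),
  error = function(e) "all_of() failed"
)
strict
#> [1] "all_of() failed"
}\if{html}{\out{</div>}}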
When writing code inside packages, you can substitute \code{"var"} for \code{var} to avoid \verb{R CMD check} notes.
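As a rough sketch of what this looks like in package code, the function below selects columns by string rather than by bare name. The name \code{select_size_cols()} is hypothetical and only serves the illustration:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)
# `select_size_cols()` is a hypothetical helper, not part of tidyselect
select_size_cols <- function(data) {
  # Strings instead of bare names, so R CMD check sees no unbound
  # global variables in the function body
  select(data, "height", "mass")
}
size_cols <- select_size_cols(starwars)
names(size_cols)
#> [1] "height" "mass"
}\if{html}{\out{</div>}}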
}
}
\section{Simple examples}{
Here we show the usage for the basic selection operators. See the
specific help pages to learn about helpers like \code{\link[=starts_with]{starts_with()}}.
The selection language can be used in functions like
\code{dplyr::select()} or \code{tidyr::pivot_longer()}. Let's first attach
the tidyverse:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidyverse)
# For better printing
iris <- as_tibble(iris)
}\if{html}{\out{</div>}}
Select variables by name:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{starwars \%>\% select(height)
#> # A tibble: 87 x 1
#> height
#> <int>
#> 1 172
#> 2 167
#> 3 96
#> 4 202
#> # ... with 83 more rows
iris \%>\% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#> Sepal.Width Petal.Length Petal.Width Species name value
#> <dbl> <dbl> <dbl> <fct> <chr> <dbl>
#> 1 3.5 1.4 0.2 setosa Sepal.Length 5.1
#> 2 3 1.4 0.2 setosa Sepal.Length 4.9
#> 3 3.2 1.3 0.2 setosa Sepal.Length 4.7
#> 4 3.1 1.5 0.2 setosa Sepal.Length 4.6
#> # ... with 146 more rows
}\if{html}{\out{</div>}}
Select multiple variables by separating them with commas. Note how
the order of columns is determined by the order of inputs:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{starwars \%>\% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#> homeworld height mass
#> <chr> <int> <dbl>
#> 1 Tatooine 172 77
#> 2 Tatooine 167 75
#> 3 Naboo 96 32
#> 4 Tatooine 202 136
#> # ... with 83 more rows
}\if{html}{\out{</div>}}
Functions like \code{tidyr::pivot_longer()} take their selection in a single
argument rather than through dots (\code{...}). In this case use \code{c()} to select multiple variables:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{iris \%>\% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#> Sepal.Width Petal.Width Species name value
#> <dbl> <dbl> <fct> <chr> <dbl>
#> 1 3.5 0.2 setosa Sepal.Length 5.1
#> 2 3.5 0.2 setosa Petal.Length 1.4
#> 3 3 0.2 setosa Sepal.Length 4.9
#> 4 3 0.2 setosa Petal.Length 1.4
#> # ... with 296 more rows
}\if{html}{\out{</div>}}
\subsection{Operators:}{
The \code{:} operator selects a range of consecutive variables:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{starwars \%>\% select(name:mass)
#> # A tibble: 87 x 3
#> name height mass
#> <chr> <int> <dbl>
#> 1 Luke Skywalker 172 77
#> 2 C-3PO 167 75
#> 3 R2-D2 96 32
#> 4 Darth Vader 202 136
#> # ... with 83 more rows
}\if{html}{\out{</div>}}
The \code{!} operator negates a selection:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{starwars \%>\% select(!(name:mass))
#> # A tibble: 87 x 11
#> hair_color skin_c~1 eye_c~2 birth~3 sex gender homew~4 species films vehic~5
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <lis> <list>
#> 1 blond fair blue 19 male mascu~ Tatooi~ Human <chr> <chr>
#> 2 <NA> gold yellow 112 none mascu~ Tatooi~ Droid <chr> <chr>
#> 3 <NA> white, ~ red 33 none mascu~ Naboo Droid <chr> <chr>
#> 4 none white yellow 41.9 male mascu~ Tatooi~ Human <chr> <chr>
#> # ... with 83 more rows, 1 more variable: starships <list>, and abbreviated
#> # variable names 1: skin_color, 2: eye_color, 3: birth_year, 4: homeworld,
#> # 5: vehicles
iris \%>\% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#> Sepal.Width Petal.Width Species
#> <dbl> <dbl> <fct>
#> 1 3.5 0.2 setosa
#> 2 3 0.2 setosa
#> 3 3.2 0.2 setosa
#> 4 3.1 0.2 setosa
#> # ... with 146 more rows
iris \%>\% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#> Sepal.Length Petal.Length Species
#> <dbl> <dbl> <fct>
#> 1 5.1 1.4 setosa
#> 2 4.9 1.4 setosa
#> 3 4.7 1.3 setosa
#> 4 4.6 1.5 setosa
#> # ... with 146 more rows
}\if{html}{\out{</div>}}
\code{&} and \code{|} take the intersection or the union of two selections:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{iris \%>\% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Width
#> <dbl>
#> 1 0.2
#> 2 0.2
#> 3 0.2
#> 4 0.2
#> # ... with 146 more rows
iris \%>\% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#> Petal.Length Petal.Width Sepal.Width
#> <dbl> <dbl> <dbl>
#> 1 1.4 0.2 3.5
#> 2 1.4 0.2 3
#> 3 1.3 0.2 3.2
#> 4 1.5 0.2 3.1
#> # ... with 146 more rows
}\if{html}{\out{</div>}}
To take the difference between two selections, combine the \code{&} and
\code{!} operators:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{iris \%>\% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Length
#> <dbl>
#> 1 1.4
#> 2 1.4
#> 3 1.3
#> 4 1.5
#> # ... with 146 more rows
}\if{html}{\out{</div>}}
}
}
\section{Details}{
The order of selected columns is determined by the inputs.
\itemize{
\item \code{all_of(c("foo", "bar"))} selects \code{"foo"} first.
\item \code{c(starts_with("c"), starts_with("d"))} selects all columns
starting with \code{"c"} first, then all columns starting with \code{"d"}.
}
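The two ordering rules above can be checked on \code{iris}. This is a small sketch, assuming dplyr is installed; the object names \code{by_name} and \code{by_prefix} are chosen for the example:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(dplyr)
# Columns come back in the order of the character vector,
# not in their order in the data
by_name <- select(iris, all_of(c("Species", "Sepal.Width")))
names(by_name)
#> [1] "Species"     "Sepal.Width"
# All "Petal" columns come first, then all "Sepal" columns
by_prefix <- select(iris, c(starts_with("Petal"), starts_with("Sepal")))
names(by_prefix)
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"
}\if{html}{\out{</div>}}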
}