File: across.Rd

package info (click to toggle)
r-cran-dplyr 1.1.4-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 4,292 kB
  • sloc: cpp: 1,403; sh: 17; makefile: 7
file content (233 lines) | stat: -rw-r--r-- 8,774 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/across.R
\name{across}
\alias{across}
\alias{if_any}
\alias{if_all}
\title{Apply a function (or functions) across multiple columns}
\usage{
across(.cols, .fns, ..., .names = NULL, .unpack = FALSE)

if_any(.cols, .fns, ..., .names = NULL)

if_all(.cols, .fns, ..., .names = NULL)
}
\arguments{
\item{.cols}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> Columns to transform.
You can't select grouping columns because they are already automatically
handled by the verb (i.e. \code{\link[=summarise]{summarise()}} or \code{\link[=mutate]{mutate()}}).}

\item{.fns}{Functions to apply to each of the selected columns.
Possible values are:
\itemize{
\item A function, e.g. \code{mean}.
\item A purrr-style lambda, e.g. \code{~ mean(.x, na.rm = TRUE)}
\item A named list of functions or lambdas, e.g.
\verb{list(mean = mean, n_miss = ~ sum(is.na(.x))}. Each function is applied
to each column, and the output is named by combining the function name
and the column name using the glue specification in \code{.names}.
}

Within these functions you can use \code{\link[=cur_column]{cur_column()}} and \code{\link[=cur_group]{cur_group()}}
to access the current column and grouping keys respectively.}

\item{...}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

Additional arguments for the function calls in \code{.fns} are no longer
accepted in \code{...} because it's not clear when they should be evaluated:
once per \code{across()} or once per group? Instead supply additional arguments
directly in \code{.fns} by using a lambda. For example, instead of
\code{across(a:b, mean, na.rm = TRUE)} write
\code{across(a:b, ~ mean(.x, na.rm = TRUE))}.}

\item{.names}{A glue specification that describes how to name the output
columns. This can use \code{{.col}} to stand for the selected column name, and
\code{{.fn}} to stand for the name of the function being applied. The default
(\code{NULL}) is equivalent to \code{"{.col}"} for the single function case and
\code{"{.col}_{.fn}"} for the case where a list is used for \code{.fns}.}

\item{.unpack}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

Optionally \link[tidyr:pack]{unpack} data frames returned by functions in
\code{.fns}, which expands the df-columns out into individual columns, retaining
the number of rows in the data frame.
\itemize{
\item If \code{FALSE}, the default, no unpacking is done.
\item If \code{TRUE}, unpacking is done with a default glue specification of
\code{"{outer}_{inner}"}.
\item Otherwise, a single glue specification can be supplied to describe how to
name the unpacked columns. This can use \code{{outer}} to refer to the name
originally generated by \code{.names}, and \code{{inner}} to refer to the names of
the data frame you are unpacking.
}}
}
\value{
\code{across()} typically returns a tibble with one column for each column in
\code{.cols} and each function in \code{.fns}. If \code{.unpack} is used, more columns may
be returned depending on how the results of \code{.fns} are unpacked.

\code{if_any()} and \code{if_all()} return a logical vector.
}
\description{
\code{across()} makes it easy to apply the same transformation to multiple
columns, allowing you to use \code{\link[=select]{select()}} semantics inside in "data-masking"
functions like \code{\link[=summarise]{summarise()}} and \code{\link[=mutate]{mutate()}}. See \code{vignette("colwise")} for
more details.

\code{if_any()} and \code{if_all()} apply the same
predicate function to a selection of columns and combine the
results into a single logical vector: \code{if_any()} is \code{TRUE} when
the predicate is \code{TRUE} for \emph{any} of the selected columns, \code{if_all()}
is \code{TRUE} when the predicate is \code{TRUE} for \emph{all} selected columns.

If you just need to select columns without applying a transformation to each
of them, then you probably want to use \code{\link[=pick]{pick()}} instead.

\code{across()} supersedes the family of "scoped variants" like
\code{summarise_at()}, \code{summarise_if()}, and \code{summarise_all()}.
}
\section{Timing of evaluation}{

R code in dplyr verbs is generally evaluated once per group.
Inside \code{across()} however, code is evaluated once for each
combination of columns and groups. If the evaluation timing is
important, for example if you're generating random variables, think
about when it should happen and place your code in consequence.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{gdf <-
  tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) \%>\%
  group_by(g)

set.seed(1)

# Outside: 1 normal variate
n <- rnorm(1)
gdf \%>\% mutate(across(v1:v2, ~ .x + n))
#> # A tibble: 4 x 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  9.37  19.4
#> 2     1 10.4   20.4
#> 3     2 11.4   21.4
#> 4     3 12.4   22.4

# Inside a verb: 3 normal variates (ngroup)
gdf \%>\% mutate(n = rnorm(1), across(v1:v2, ~ .x + n))
#> # A tibble: 4 x 4
#> # Groups:   g [3]
#>       g    v1    v2      n
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1  10.2  20.2  0.184
#> 2     1  11.2  21.2  0.184
#> 3     2  11.2  21.2 -0.836
#> 4     3  14.6  24.6  1.60

# Inside `across()`: 6 normal variates (ncol * ngroup)
gdf \%>\% mutate(across(v1:v2, ~ .x + rnorm(1)))
#> # A tibble: 4 x 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  10.3  20.7
#> 2     1  11.3  21.7
#> 3     2  11.2  22.6
#> 4     3  13.5  22.7
}\if{html}{\out{</div>}}
}

\examples{
# For better printing
iris <- as_tibble(iris)

# across() -----------------------------------------------------------------
# Different ways to select the same set of columns
# See <https://tidyselect.r-lib.org/articles/syntax.html> for details
iris \%>\%
  mutate(across(c(Sepal.Length, Sepal.Width), round))
iris \%>\%
  mutate(across(c(1, 2), round))
iris \%>\%
  mutate(across(1:Sepal.Width, round))
iris \%>\%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))

# Using an external vector of names
cols <- c("Sepal.Length", "Petal.Width")
iris \%>\%
  mutate(across(all_of(cols), round))

# If the external vector is named, the output columns will be named according
# to those names
names(cols) <- tolower(cols)
iris \%>\%
  mutate(across(all_of(cols), round))

# A purrr-style formula
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))

# A named list of functions
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))

# Use the .names argument to control the output names
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd), .names = "{.col}.{.fn}"))

# If a named external vector is used for column selection, .names will use
# those names when constructing the output names
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(all_of(cols), mean, .names = "mean_{.col}"))

# When the list is not named, .fn is replaced by the function's position
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}"))

# When the functions in .fns return a data frame, you typically get a
# "packed" data frame back
quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
  tibble(quantile = probs, value = quantile(x, probs))
}

iris \%>\%
  reframe(across(starts_with("Sepal"), quantile_df))

# Use .unpack to automatically expand these packed data frames into their
# individual columns
iris \%>\%
  reframe(across(starts_with("Sepal"), quantile_df, .unpack = TRUE))

# .unpack can utilize a glue specification if you don't like the defaults
iris \%>\%
  reframe(across(starts_with("Sepal"), quantile_df, .unpack = "{outer}.{inner}"))

# This is also useful inside mutate(), for example, with a multi-lag helper
multilag <- function(x, lags = 1:3) {
  names(lags) <- as.character(lags)
  purrr::map_dfr(lags, lag, x = x)
}

iris \%>\%
  group_by(Species) \%>\%
  mutate(across(starts_with("Sepal"), multilag, .unpack = TRUE)) \%>\%
  select(Species, starts_with("Sepal"))

# if_any() and if_all() ----------------------------------------------------
iris \%>\%
  filter(if_any(ends_with("Width"), ~ . > 4))
iris \%>\%
  filter(if_all(ends_with("Width"), ~ . > 2))

}
\seealso{
\code{\link[=c_across]{c_across()}} for a function that returns a vector
}