File: across.Rd

package info (click to toggle)
r-cran-dplyr 1.0.10-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 3,476 kB
  • sloc: cpp: 1,266; sh: 16; makefile: 9
file content (174 lines) | stat: -rw-r--r-- 6,152 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/across.R
\name{across}
\alias{across}
\alias{if_any}
\alias{if_all}
\title{Apply a function (or functions) across multiple columns}
\usage{
across(.cols = everything(), .fns = NULL, ..., .names = NULL)

if_any(.cols = everything(), .fns = NULL, ..., .names = NULL)

if_all(.cols = everything(), .fns = NULL, ..., .names = NULL)
}
\arguments{
\item{.cols, cols}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> Columns to transform.
Because \code{across()} is used within functions like \code{summarise()} and
\code{mutate()}, you can't select or compute upon grouping variables.}

\item{.fns}{Functions to apply to each of the selected columns.
Possible values are:
\itemize{
\item A function, e.g. \code{mean}.
\item A purrr-style lambda, e.g. \code{~ mean(.x, na.rm = TRUE)}
\item A list of functions/lambdas, e.g.
\verb{list(mean = mean, n_miss = ~ sum(is.na(.x))}
\item \code{NULL}: the default value, returns the selected columns in a data
frame without applying a transformation. This is useful for when you want to
use a function that takes a data frame.
}

Within these functions you can use \code{\link[=cur_column]{cur_column()}} and \code{\link[=cur_group]{cur_group()}}
to access the current column and grouping keys respectively.}

\item{...}{Additional arguments for the function calls in \code{.fns}. Using these
\code{...} is strongly discouraged because of issues of timing of evaluation.}

\item{.names}{A glue specification that describes how to name the output
columns. This can use \code{{.col}} to stand for the selected column name, and
\code{{.fn}} to stand for the name of the function being applied. The default
(\code{NULL}) is equivalent to \code{"{.col}"} for the single function case and
\code{"{.col}_{.fn}"} for the case where a list is used for \code{.fns}.}
}
\value{
\code{across()} returns a tibble with one column for each column in \code{.cols} and each function in \code{.fns}.

\code{if_any()} and \code{if_all()} return a logical vector.
}
\description{
\code{across()} makes it easy to apply the same transformation to multiple
columns, allowing you to use \code{\link[=select]{select()}} semantics inside in "data-masking"
functions like \code{\link[=summarise]{summarise()}} and \code{\link[=mutate]{mutate()}}. See \code{vignette("colwise")} for
more details.

\code{if_any()} and \code{if_all()} apply the same
predicate function to a selection of columns and combine the
results into a single logical vector: \code{if_any()} is \code{TRUE} when
the predicate is \code{TRUE} for \emph{any} of the selected columns, \code{if_all()}
is \code{TRUE} when the predicate is \code{TRUE} for \emph{all} selected columns.

\code{across()} supersedes the family of "scoped variants" like
\code{summarise_at()}, \code{summarise_if()}, and \code{summarise_all()}.
}
\section{Timing of evaluation}{

R code in dplyr verbs is generally evaluated once per group.
Inside \code{across()} however, code is evaluated once for each
combination of columns and groups. If the evaluation timing is
important, for example if you're generating random variables, think
about when it should happen and place your code in consequence.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{gdf <-
  tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) \%>\%
  group_by(g)

set.seed(1)

# Outside: 1 normal variate
n <- rnorm(1)
gdf \%>\% mutate(across(v1:v2, ~ .x + n))
#> # A tibble: 4 x 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  9.37  19.4
#> 2     1 10.4   20.4
#> 3     2 11.4   21.4
#> 4     3 12.4   22.4

# Inside a verb: 3 normal variates (ngroup)
gdf \%>\% mutate(n = rnorm(1), across(v1:v2, ~ .x + n))
#> # A tibble: 4 x 4
#> # Groups:   g [3]
#>       g    v1    v2      n
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1  10.2  20.2  0.184
#> 2     1  11.2  21.2  0.184
#> 3     2  11.2  21.2 -0.836
#> 4     3  14.6  24.6  1.60

# Inside `across()`: 6 normal variates (ncol * ngroup)
gdf \%>\% mutate(across(v1:v2, ~ .x + rnorm(1)))
#> # A tibble: 4 x 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  10.3  20.7
#> 2     1  11.3  21.7
#> 3     2  11.2  22.6
#> 4     3  13.5  22.7
}\if{html}{\out{</div>}}
}

\examples{
# across() -----------------------------------------------------------------
# Different ways to select the same set of columns
# See <https://tidyselect.r-lib.org/articles/syntax.html> for details
iris \%>\%
  as_tibble() \%>\%
  mutate(across(c(Sepal.Length, Sepal.Width), round))
iris \%>\%
  as_tibble() \%>\%
  mutate(across(c(1, 2), round))
iris \%>\%
  as_tibble() \%>\%
  mutate(across(1:Sepal.Width, round))
iris \%>\%
  as_tibble() \%>\%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))

# A purrr-style formula
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))

# A named list of functions
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))

# Use the .names argument to control the output names
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd), .names = "{.col}.{.fn}"))

# When the list is not named, .fn is replaced by the function's position
iris \%>\%
  group_by(Species) \%>\%
  summarise(across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}"))

# across() returns a data frame, which can be used as input of another function
df <- data.frame(
  x1  = c(1, 2, NA),
  x2  = c(4, NA, 6),
  y   = c("a", "b", "c")
)
df \%>\%
  mutate(x_complete = complete.cases(across(starts_with("x"))))
df \%>\%
  filter(complete.cases(across(starts_with("x"))))

# if_any() and if_all() ----------------------------------------------------
iris \%>\%
  filter(if_any(ends_with("Width"), ~ . > 4))
iris \%>\%
  filter(if_all(ends_with("Width"), ~ . > 2))

}
\seealso{
\code{\link[=c_across]{c_across()}} for a function that returns a vector
}