% File: group_by.Rd (r-cran-dplyr 1.1.4-4)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group-by.R
\name{group_by}
\alias{group_by}
\alias{ungroup}
\title{Group by one or more variables}
\usage{
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{In \code{group_by()}, variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use
a separate \code{mutate()} step before the \code{group_by()}.
Computations are not allowed in \code{nest_by()}.
In \code{ungroup()}, variables to remove from the grouping.}

\item{.add}{When \code{FALSE}, the default, \code{group_by()} will
override existing groups. To add to the existing groups, use
\code{.add = TRUE}.

This argument was previously called \code{add}, but that prevented
creating a new grouping variable called \code{add}, and conflicted with
our naming conventions.}

\item{.drop}{Drop groups formed by factor levels that don't appear in the
data? The default is \code{TRUE} except when \code{.data} has been previously
grouped with \code{.drop = FALSE}. See \code{\link[=group_by_drop_default]{group_by_drop_default()}} for details.}

\item{x}{A \code{\link[=tbl]{tbl()}}}
}
\value{
A grouped data frame with class \code{\link{grouped_df}},
unless the combination of \code{...} and \code{.add} yields an empty set of
grouping columns, in which case a tibble will be returned.
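
For instance, calling \code{group_by()} with no grouping variables (and the default \code{.add = FALSE}) yields an empty set of grouping columns. A minimal sketch using the built-in \code{mtcars} data; the name \code{res} is only illustrative:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# No grouping variables, so the result is a plain (ungrouped) tibble
res <- group_by(mtcars)
is_grouped_df(res)
group_vars(res)
}\if{html}{\out{</div>}}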
}
\description{
Most data operations are done on groups defined by variables.
\code{group_by()} takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". \code{ungroup()} removes grouping.
}
\section{Methods}{

These functions are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:
\itemize{
\item \code{group_by()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("group_by")}.
\item \code{ungroup()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("ungroup")}.
}
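
To check which methods are registered in your own session, one option is base R's \code{utils::methods()}. This is a sketch; the list it prints depends on which packages (for example dbplyr or dtplyr) are loaded:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# S3 methods currently registered for each generic
group_by_methods <- utils::methods("group_by")
ungroup_methods <- utils::methods("ungroup")
group_by_methods
ungroup_methods
}\if{html}{\out{</div>}}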
}

\section{Ordering}{

Currently, \code{group_by()} internally orders the groups in ascending order. This
results in ordered output from functions that aggregate groups, such as
\code{\link[=summarise]{summarise()}}.
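
As a small illustration with the built-in \code{mtcars} data (whose first row has \code{cyl = 6}), the summary rows come back sorted by \code{cyl}:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# Groups are ordered ascending, so rows appear as cyl = 4, 6, 8
cyl_counts <- summarise(group_by(mtcars, cyl), n = n())
cyl_counts
}\if{html}{\out{</div>}}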

When used as grouping columns, character vectors are ordered in the C locale
for performance and reproducibility across R sessions. If the resulting
ordering of your grouped operation matters and is dependent on the locale,
you should follow up the grouped operation with an explicit call to
\code{\link[=arrange]{arrange()}} and set the \code{.locale} argument. For example:

\if{html}{\out{<div class="sourceCode">}}\preformatted{data \%>\%
  group_by(chr) \%>\%
  summarise(avg = mean(x)) \%>\%
  arrange(chr, .locale = "en")
}\if{html}{\out{</div>}}

This is often useful as a preliminary step before generating content intended
for humans, such as an HTML table.
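
Because the C locale compares strings byte by byte, uppercase ASCII letters sort before all lowercase ones. A minimal sketch with made-up data in a hypothetical \code{dat} tibble:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# Hypothetical data with mixed-case grouping keys
dat <- tibble(chr = c("b", "A", "a", "B"), x = 1:4)

# Grouping keys come out as "A", "B", "a", "b" (C locale order)
by_chr <- summarise(group_by(dat, chr), total = sum(x))
by_chr
}\if{html}{\out{</div>}}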
\subsection{Legacy behavior}{

Prior to dplyr 1.1.0, character vector grouping columns were ordered in the
system locale. If you need to temporarily revert to this behavior, you can
set the global option \code{dplyr.legacy_locale} to \code{TRUE}, but this should be
used sparingly and you should expect this option to be removed in a future
version of dplyr. It is better to update existing code to explicitly call
\code{arrange(.locale = )} instead. Note that setting \code{dplyr.legacy_locale} will
also force calls to \code{\link[=arrange]{arrange()}} to use the system locale.
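
A sketch of scoping the option to a single computation with base R's \code{options()}, which returns the previous value so it can be restored afterwards. The resulting order depends on your system locale, so no particular output is implied; \code{dat} is a hypothetical example tibble:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

dat <- tibble(chr = c("b", "A", "a", "B"))

# Temporarily opt back into system-locale ordering, then restore the old setting
old <- options(dplyr.legacy_locale = TRUE)
legacy <- summarise(group_by(dat, chr), n = n())
options(old)

legacy
}\if{html}{\out{</div>}}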
}
}

\examples{
by_cyl <- mtcars \%>\% group_by(cyl)

# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl

# It changes how it acts with the other dplyr verbs:
by_cyl \%>\% summarise(
  disp = mean(disp),
  hp = mean(hp)
)
by_cyl \%>\% filter(disp == max(disp))

# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars \%>\% group_by(vs, am)
by_vs <- by_vs_am \%>\% summarise(n = n())
by_vs
by_vs \%>\% summarise(n = sum(n))

# To remove grouping, use ungroup()
by_vs \%>\%
  ungroup() \%>\%
  summarise(n = sum(n))
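
# ungroup() can also remove only some of the grouping variables
vs_only <- ungroup(group_by(mtcars, vs, am), am)
group_vars(vs_only)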

# By default, group_by() overrides existing grouping
by_cyl \%>\%
  group_by(vs, am) \%>\%
  group_vars()

# Use .add = TRUE to append to the existing grouping instead
by_cyl \%>\%
  group_by(vs, am, .add = TRUE) \%>\%
  group_vars()

# You can group by expressions: this is shorthand
# for a mutate() followed by a group_by()
mtcars \%>\%
  group_by(vsam = vs + am)

# The implicit mutate() step is always performed on the
# ungrouped data. Here we get 3 groups:
mtcars \%>\%
  group_by(vs) \%>\%
  group_by(hp_cut = cut(hp, 3))

# If you want it to be performed by groups,
# you have to use an explicit mutate() call.
# Here we get 3 groups per value of vs
mtcars \%>\%
  group_by(vs) \%>\%
  mutate(hp_cut = cut(hp, 3)) \%>\%
  group_by(hp_cut)

# When factors are involved and .drop = FALSE, groups can be empty
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)
tbl \%>\%
  group_by(y, .drop = FALSE) \%>\%
  group_rows()

}
\seealso{
Other grouping functions: 
\code{\link{group_map}()},
\code{\link{group_nest}()},
\code{\link{group_split}()},
\code{\link{group_trim}()}
}
\concept{grouping functions}