File: group_by.Rd

package info (click to toggle)
r-cran-dplyr 1.1.4-4
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 4,292 kB
sloc: cpp: 1,403; sh: 17; makefile: 7
file content (164 lines) | stat: -rw-r--r-- 5,521 bytes
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group-by.R
\name{group_by}
\alias{group_by}
\alias{ungroup}
\title{Group by one or more variables}
\usage{
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{In \code{group_by()}, variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use
a separate \code{mutate()} step before the \code{group_by()}.
Computations are not allowed in \code{nest_by()}.
In \code{ungroup()}, variables to remove from the grouping.}

\item{.add}{When \code{FALSE}, the default, \code{group_by()} will
override existing groups. To add to the existing groups, use
\code{.add = TRUE}.

This argument was previously called \code{add}, but that prevented
creating a new grouping variable called \code{add}, and conflicts with
our naming conventions.}

\item{.drop}{Drop groups formed by factor levels that don't appear in the
data? The default is \code{TRUE} except when \code{.data} has been previously
grouped with \code{.drop = FALSE}. See \code{\link[=group_by_drop_default]{group_by_drop_default()}} for details.}

\item{x}{A \code{\link[=tbl]{tbl()}}}
}
\value{
A grouped data frame with class \code{\link{grouped_df}},
unless the combination of \code{...} and \code{add} yields a empty set of
grouping columns, in which case a tibble will be returned.
}
\description{
Most data operations are done on groups defined by variables.
\code{group_by()} takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". \code{ungroup()} removes grouping.
}
\section{Methods}{

These function are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:
\itemize{
\item \code{group_by()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("group_by")}.
\item \code{ungroup()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("ungroup")}.
}
}

\section{Ordering}{

Currently, \code{group_by()} internally orders the groups in ascending order. This
results in ordered output from functions that aggregate groups, such as
\code{\link[=summarise]{summarise()}}.

When used as grouping columns, character vectors are ordered in the C locale
for performance and reproducibility across R sessions. If the resulting
ordering of your grouped operation matters and is dependent on the locale,
you should follow up the grouped operation with an explicit call to
\code{\link[=arrange]{arrange()}} and set the \code{.locale} argument. For example:

\if{html}{\out{<div class="sourceCode">}}\preformatted{data \%>\%
  group_by(chr) \%>\%
  summarise(avg = mean(x)) \%>\%
  arrange(chr, .locale = "en")
}\if{html}{\out{</div>}}

This is often useful as a preliminary step before generating content intended
for humans, such as an HTML table.
\subsection{Legacy behavior}{

Prior to dplyr 1.1.0, character vector grouping columns were ordered in the
system locale. If you need to temporarily revert to this behavior, you can
set the global option \code{dplyr.legacy_locale} to \code{TRUE}, but this should be
used sparingly and you should expect this option to be removed in a future
version of dplyr. It is better to update existing code to explicitly call
\code{arrange(.locale = )} instead. Note that setting \code{dplyr.legacy_locale} will
also force calls to \code{\link[=arrange]{arrange()}} to use the system locale.
}
}

\examples{
by_cyl <- mtcars \%>\% group_by(cyl)

# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl

# It changes how it acts with the other dplyr verbs:
by_cyl \%>\% summarise(
  disp = mean(disp),
  hp = mean(hp)
)
by_cyl \%>\% filter(disp == max(disp))

# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars \%>\% group_by(vs, am)
by_vs <- by_vs_am \%>\% summarise(n = n())
by_vs
by_vs \%>\% summarise(n = sum(n))

# To removing grouping, use ungroup
by_vs \%>\%
  ungroup() \%>\%
  summarise(n = sum(n))

# By default, group_by() overrides existing grouping
by_cyl \%>\%
  group_by(vs, am) \%>\%
  group_vars()

# Use add = TRUE to instead append
by_cyl \%>\%
  group_by(vs, am, .add = TRUE) \%>\%
  group_vars()

# You can group by expressions: this is a short-hand
# for a mutate() followed by a group_by()
mtcars \%>\%
  group_by(vsam = vs + am)

# The implicit mutate() step is always performed on the
# ungrouped data. Here we get 3 groups:
mtcars \%>\%
  group_by(vs) \%>\%
  group_by(hp_cut = cut(hp, 3))

# If you want it to be performed by groups,
# you have to use an explicit mutate() call.
# Here we get 3 groups per value of vs
mtcars \%>\%
  group_by(vs) \%>\%
  mutate(hp_cut = cut(hp, 3)) \%>\%
  group_by(hp_cut)

# when factors are involved and .drop = FALSE, groups can be empty
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each  = 5), levels = c("a", "b", "c"))
)
tbl \%>\%
  group_by(y, .drop = FALSE) \%>\%
  group_rows()

}
\seealso{
Other grouping functions: 
\code{\link{group_map}()},
\code{\link{group_nest}()},
\code{\link{group_split}()},
\code{\link{group_trim}()}
}
\concept{grouping functions}