1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
|
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summarise.R
\name{summarise}
\alias{summarise}
\alias{summarize}
\title{Summarise each group down to one row}
\usage{
summarise(.data, ..., .by = NULL, .groups = NULL)
summarize(.data, ..., .by = NULL, .groups = NULL)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs of
summary functions. The name will be the name of the variable in the result.
The value can be:
\itemize{
\item A vector of length 1, e.g. \code{min(x)}, \code{n()}, or \code{sum(is.na(y))}.
\item A data frame, to add multiple columns from a single expression.
}
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} Returning values with size 0 or >1 was
deprecated as of 1.1.0. Please use \code{\link[=reframe]{reframe()}} for this instead.}
\item{.by}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to \code{\link[=group_by]{group_by()}}. For
details and examples, see \link[=dplyr_by]{?dplyr_by}.}
\item{.groups}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}} Grouping structure of the
result.
\itemize{
\item "drop_last": dropping the last level of grouping. This was the
only supported option before version 1.0.0.
\item "drop": All levels of grouping are dropped.
\item "keep": Same grouping structure as \code{.data}.
\item "rowwise": Each row is its own group.
}
When \code{.groups} is not specified, it is chosen
based on the number of rows of the results:
\itemize{
\item If all the results have 1 row, you get "drop_last".
\item If the number of rows varies, you get "keep" (note that returning a
variable number of rows was deprecated in favor of \code{\link[=reframe]{reframe()}}, which
also unconditionally drops all levels of grouping).
}
In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to \code{FALSE},
or when \code{summarise()} is called from a function in a package.}
}
\value{
An object \emph{usually} of the same type as \code{.data}.
\itemize{
\item The rows come from the underlying \code{\link[=group_keys]{group_keys()}}.
\item The columns are a combination of the grouping keys and the summary
expressions that you provide.
\item The grouping structure is controlled by the \verb{.groups=} argument, the
output may be another \link{grouped_df}, a \link{tibble} or a \link{rowwise} data frame.
\item Data frame attributes are \strong{not} preserved, because \code{summarise()}
fundamentally creates a new data frame.
}
}
\description{
\code{summarise()} creates a new data frame. It returns one row for each
combination of grouping variables; if there are no grouping variables, the
output will have a single row summarising all observations in the input. It
will contain one column for each grouping variable and one column for each of
the summary statistics that you have specified.
\code{summarise()} and \code{summarize()} are synonyms.
}
\section{Useful functions}{
\itemize{
\item Center: \code{\link[=mean]{mean()}}, \code{\link[=median]{median()}}
\item Spread: \code{\link[=sd]{sd()}}, \code{\link[=IQR]{IQR()}}, \code{\link[=mad]{mad()}}
\item Range: \code{\link[=min]{min()}}, \code{\link[=max]{max()}},
\item Position: \code{\link[=first]{first()}}, \code{\link[=last]{last()}}, \code{\link[=nth]{nth()}},
\item Count: \code{\link[=n]{n()}}, \code{\link[=n_distinct]{n_distinct()}}
\item Logical: \code{\link[=any]{any()}}, \code{\link[=all]{all()}}
}
}
\section{Backend variations}{
The data frame backend supports creating a variable and using it in the
same summary. This means that previously created summary variables can be
further transformed or combined within the summary, as in \code{\link[=mutate]{mutate()}}.
However, it also means that summary variables with the same names as previous
variables overwrite them, making those variables unavailable to later summary
variables.
This behaviour may not be supported in other backends. To avoid unexpected
results, consider using new names for your summary variables, especially when
creating multiple summaries.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("summarise")}.
}
\examples{
# A summary applied to ungrouped tbl returns a single row
mtcars \%>\%
summarise(mean = mean(disp), n = n())
# Usually, you'll want to group first
mtcars \%>\%
group_by(cyl) \%>\%
summarise(mean = mean(disp), n = n())
# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars \%>\%
group_by(cyl, vs) \%>\%
summarise(cyl_n = n()) \%>\%
group_vars()
# BEWARE: reusing variables may lead to unexpected results
mtcars \%>\%
group_by(cyl) \%>\%
summarise(disp = mean(disp), sd = sd(disp))
# Refer to column names stored as strings with the `.data` pronoun:
var <- "mass"
summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE))
# Learn more in ?rlang::args_data_masking
# In dplyr 1.1.0, returning multiple rows per group was deprecated in favor
# of `reframe()`, which never messages and always returns an ungrouped
# result:
mtcars \%>\%
group_by(cyl) \%>\%
summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
# ->
mtcars \%>\%
group_by(cyl) \%>\%
reframe(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{mutate}()},
\code{\link{reframe}()},
\code{\link{rename}()},
\code{\link{select}()},
\code{\link{slice}()}
}
\concept{single table verbs}
|