1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
|
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mutate.R
\name{mutate}
\alias{mutate}
\alias{mutate.data.frame}
\title{Create, modify, and delete columns}
\usage{
mutate(.data, ...)
\method{mutate}{data.frame}(
.data,
...,
.by = NULL,
.keep = c("all", "used", "unused", "none"),
.before = NULL,
.after = NULL
)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs.
The name gives the name of the column in the output.
The value can be:
\itemize{
\item A vector of length 1, which will be recycled to the correct length.
\item A vector the same length as the current group (or the whole data frame
if ungrouped).
\item \code{NULL}, to remove the column.
\item A data frame or tibble, to create multiple columns in the output.
}}
\item{.by}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to \code{\link[=group_by]{group_by()}}. For
details and examples, see \link[=dplyr_by]{?dplyr_by}.}
\item{.keep}{Control which columns from \code{.data} are retained in the output. Grouping
columns and columns created by \code{...} are always kept.
\itemize{
\item \code{"all"} retains all columns from \code{.data}. This is the default.
\item \code{"used"} retains only the columns used in \code{...} to create new
columns. This is useful for checking your work, as it displays inputs
and outputs side-by-side.
\item \code{"unused"} retains only the columns \emph{not} used in \code{...} to create new
columns. This is useful if you generate new columns, but no longer need
the columns used to generate them.
\item \code{"none"} doesn't retain any extra columns from \code{.data}. Only the grouping
variables and columns created by \code{...} are kept.
}}
\item{.before, .after}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, control where new columns
should appear (the default is to add to the right hand side). See
\code{\link[=relocate]{relocate()}} for more details.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Columns from \code{.data} will be preserved according to the \code{.keep} argument.
\item Existing columns that are modified by \code{...} will always be returned in
their original location.
\item New columns created through \code{...} will be placed according to the
\code{.before} and \code{.after} arguments.
\item The number of rows is not affected.
\item Columns given the value \code{NULL} will be removed.
\item Groups will be recomputed if a grouping variable is mutated.
\item Data frame attributes are preserved.
}
}
\description{
\code{mutate()} creates new columns that are functions of existing variables.
It can also modify (if the name is the same as an existing
column) and delete columns (by setting their value to \code{NULL}).
}
\section{Useful mutate functions}{
\itemize{
\item \code{\link{+}}, \code{\link{-}}, \code{\link[=log]{log()}}, etc., for their usual mathematical meanings
\item \code{\link[=lead]{lead()}}, \code{\link[=lag]{lag()}}
\item \code{\link[=dense_rank]{dense_rank()}}, \code{\link[=min_rank]{min_rank()}}, \code{\link[=percent_rank]{percent_rank()}}, \code{\link[=row_number]{row_number()}},
\code{\link[=cume_dist]{cume_dist()}}, \code{\link[=ntile]{ntile()}}
\item \code{\link[=cumsum]{cumsum()}}, \code{\link[=cummean]{cummean()}}, \code{\link[=cummin]{cummin()}}, \code{\link[=cummax]{cummax()}}, \code{\link[=cumany]{cumany()}}, \code{\link[=cumall]{cumall()}}
\item \code{\link[=na_if]{na_if()}}, \code{\link[=coalesce]{coalesce()}}
\item \code{\link[=if_else]{if_else()}}, \code{\link[=recode]{recode()}}, \code{\link[=case_when]{case_when()}}
}
}
\section{Grouped tibbles}{
Because mutating expressions are computed within groups, they may
yield different results on grouped tibbles. This will be the case
as soon as an aggregating, lagging, or ranking function is
involved. Compare this ungrouped mutate:
\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars \%>\%
select(name, mass, species) \%>\%
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{</div>}}
With the grouped equivalent:
\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars \%>\%
select(name, mass, species) \%>\%
group_by(species) \%>\%
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{</div>}}
The former normalises \code{mass} by the global average whereas the
latter normalises by the averages within species levels.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("mutate")}.
}
\examples{
# Newly created variables are available immediately
starwars \%>\%
select(name, mass) \%>\%
mutate(
mass2 = mass * 2,
mass2_squared = mass2 * mass2
)
# As well as adding new variables, you can use mutate() to
# remove variables and modify existing variables.
starwars \%>\%
select(name, height, mass, homeworld) \%>\%
mutate(
mass = NULL,
height = height * 0.0328084 # convert to feet
)
# Use across() with mutate() to apply a transformation
# to multiple columns in a tibble.
starwars \%>\%
select(name, homeworld, species) \%>\%
mutate(across(!name, as.factor))
# see more in ?across
# Window functions are useful for grouped mutates:
starwars \%>\%
select(name, mass, homeworld) \%>\%
group_by(homeworld) \%>\%
mutate(rank = min_rank(desc(mass)))
# see `vignette("window-functions")` for more details
# By default, new columns are placed on the far right.
df <- tibble(x = 1, y = 2)
df \%>\% mutate(z = x + y)
df \%>\% mutate(z = x + y, .before = 1)
df \%>\% mutate(z = x + y, .after = x)
# By default, mutate() keeps all columns from the input data.
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df \%>\% mutate(z = x + y, .keep = "all") # the default
df \%>\% mutate(z = x + y, .keep = "used")
df \%>\% mutate(z = x + y, .keep = "unused")
df \%>\% mutate(z = x + y, .keep = "none")
# Grouping ----------------------------------------
# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars \%>\%
select(name, mass, species) \%>\%
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
# Whereas this normalises `mass` by the averages within species
# levels:
starwars \%>\%
select(name, mass, species) \%>\%
group_by(species) \%>\%
mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
# Indirection ----------------------------------------
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
# Learn more in ?rlang::args_data_masking
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{reframe}()},
\code{\link{rename}()},
\code{\link{select}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}
|