File: mutate.Rd

package info (click to toggle)
r-cran-dplyr 1.1.4-4
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 4,292 kB
sloc: cpp: 1,403; sh: 17; makefile: 7
file content (204 lines) | stat: -rw-r--r-- 7,670 bytes
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mutate.R
\name{mutate}
\alias{mutate}
\alias{mutate.data.frame}
\title{Create, modify, and delete columns}
\usage{
mutate(.data, ...)

\method{mutate}{data.frame}(
  .data,
  ...,
  .by = NULL,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs.
The name gives the name of the column in the output.

The value can be:
\itemize{
\item A vector of length 1, which will be recycled to the correct length.
\item A vector the same length as the current group (or the whole data frame
if ungrouped).
\item \code{NULL}, to remove the column.
\item A data frame or tibble, to create multiple columns in the output.
}}

\item{.by}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to \code{\link[=group_by]{group_by()}}. For
details and examples, see \link[=dplyr_by]{?dplyr_by}.}

\item{.keep}{Control which columns from \code{.data} are retained in the output. Grouping
columns and columns created by \code{...} are always kept.
\itemize{
\item \code{"all"} retains all columns from \code{.data}. This is the default.
\item \code{"used"} retains only the columns used in \code{...} to create new
columns. This is useful for checking your work, as it displays inputs
and outputs side-by-side.
\item \code{"unused"} retains only the columns \emph{not} used in \code{...} to create new
columns. This is useful if you generate new columns, but no longer need
the columns used to generate them.
\item \code{"none"} doesn't retain any extra columns from \code{.data}. Only the grouping
variables and columns created by \code{...} are kept.
}}

\item{.before, .after}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, control where new columns
should appear (the default is to add to the right hand side). See
\code{\link[=relocate]{relocate()}} for more details.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Columns from \code{.data} will be preserved according to the \code{.keep} argument.
\item Existing columns that are modified by \code{...} will always be returned in
their original location.
\item New columns created through \code{...} will be placed according to the
\code{.before} and \code{.after} arguments.
\item The number of rows is not affected.
\item Columns given the value \code{NULL} will be removed.
\item Groups will be recomputed if a grouping variable is mutated.
\item Data frame attributes are preserved.
}
}
\description{
\code{mutate()} creates new columns that are functions of existing variables.
It can also modify (if the name is the same as an existing
column) and delete columns (by setting their value to \code{NULL}).
}
\section{Useful mutate functions}{

\itemize{
\item \code{\link{+}}, \code{\link{-}}, \code{\link[=log]{log()}}, etc., for their usual mathematical meanings
\item \code{\link[=lead]{lead()}}, \code{\link[=lag]{lag()}}
\item \code{\link[=dense_rank]{dense_rank()}}, \code{\link[=min_rank]{min_rank()}}, \code{\link[=percent_rank]{percent_rank()}}, \code{\link[=row_number]{row_number()}},
\code{\link[=cume_dist]{cume_dist()}}, \code{\link[=ntile]{ntile()}}
\item \code{\link[=cumsum]{cumsum()}}, \code{\link[=cummean]{cummean()}}, \code{\link[=cummin]{cummin()}}, \code{\link[=cummax]{cummax()}}, \code{\link[=cumany]{cumany()}}, \code{\link[=cumall]{cumall()}}
\item \code{\link[=na_if]{na_if()}}, \code{\link[=coalesce]{coalesce()}}
\item \code{\link[=if_else]{if_else()}}, \code{\link[=recode]{recode()}}, \code{\link[=case_when]{case_when()}}
}
}

\section{Grouped tibbles}{


Because mutating expressions are computed within groups, they may
yield different results on grouped tibbles. This will be the case
as soon as an aggregating, lagging, or ranking function is
involved. Compare this ungrouped mutate:

\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars \%>\%
  select(name, mass, species) \%>\%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{</div>}}

With the grouped equivalent:

\if{html}{\out{<div class="sourceCode">}}\preformatted{starwars \%>\%
  select(name, mass, species) \%>\%
  group_by(species) \%>\%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
}\if{html}{\out{</div>}}

The former normalises \code{mass} by the global average whereas the
latter normalises by the averages within species levels.
}

\section{Methods}{

This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("mutate")}.
}

\examples{
# Newly created variables are available immediately
starwars \%>\%
  select(name, mass) \%>\%
  mutate(
    mass2 = mass * 2,
    mass2_squared = mass2 * mass2
  )

# As well as adding new variables, you can use mutate() to
# remove variables and modify existing variables.
starwars \%>\%
  select(name, height, mass, homeworld) \%>\%
  mutate(
    mass = NULL,
    height = height * 0.0328084 # convert to feet
  )

# Use across() with mutate() to apply a transformation
# to multiple columns in a tibble.
starwars \%>\%
  select(name, homeworld, species) \%>\%
  mutate(across(!name, as.factor))
# see more in ?across

# Window functions are useful for grouped mutates:
starwars \%>\%
  select(name, mass, homeworld) \%>\%
  group_by(homeworld) \%>\%
  mutate(rank = min_rank(desc(mass)))
# see `vignette("window-functions")` for more details

# By default, new columns are placed on the far right.
df <- tibble(x = 1, y = 2)
df \%>\% mutate(z = x + y)
df \%>\% mutate(z = x + y, .before = 1)
df \%>\% mutate(z = x + y, .after = x)

# By default, mutate() keeps all columns from the input data.
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df \%>\% mutate(z = x + y, .keep = "all") # the default
df \%>\% mutate(z = x + y, .keep = "used")
df \%>\% mutate(z = x + y, .keep = "unused")
df \%>\% mutate(z = x + y, .keep = "none")

# Grouping ----------------------------------------
# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars \%>\%
  select(name, mass, species) \%>\%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

# Whereas this normalises `mass` by the averages within species
# levels:
starwars \%>\%
  select(name, mass, species) \%>\%
  group_by(species) \%>\%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

# Indirection ----------------------------------------
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
# Learn more in ?rlang::args_data_masking
}
\seealso{
Other single table verbs: 
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{reframe}()},
\code{\link{rename}()},
\code{\link{select}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}