% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/expand.R
\name{expand}
\alias{expand}
\alias{crossing}
\alias{nesting}
\title{Expand data frame to include all possible combinations of values}
\usage{
expand(data, ..., .name_repair = "check_unique")
crossing(..., .name_repair = "check_unique")
nesting(..., .name_repair = "check_unique")
}
\arguments{
\item{data}{A data frame.}
\item{...}{<\code{\link[=tidyr_data_masking]{data-masking}}> Specification of columns
to expand or complete. Columns can be atomic vectors or lists.
\itemize{
\item To find all unique combinations of \code{x}, \code{y} and \code{z}, including those not
present in the data, supply each variable as a separate argument:
\code{expand(df, x, y, z)} or \code{complete(df, x, y, z)}.
\item To find only the combinations that occur in the
data, use \code{nesting}: \code{expand(df, nesting(x, y, z))}.
\item You can combine the two forms. For example,
\code{expand(df, nesting(school_id, student_id), date)} would produce
a row for each present school-student combination for all possible
dates.
}
When used with factors, \code{\link[=expand]{expand()}} and \code{\link[=complete]{complete()}} use the full set of
levels, not just those that appear in the data. If you want to use only the
values seen in the data, use \code{forcats::fct_drop()}.
When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so use expressions like
\code{year = 2010:2020} or \code{year = full_seq(year, 1)}.}
\item{.name_repair}{Treatment of problematic column names:
\itemize{
\item \code{"minimal"}: No name repair or checks, beyond basic existence.
\item \code{"unique"}: Make sure names are unique and not empty.
\item \code{"check_unique"}: (the default) No name repair, but check that names are
\code{unique}.
\item \code{"universal"}: Make the names \code{unique} and syntactic.
\item A function: Apply custom name repair (e.g., \code{.name_repair = make.names}
for names in the style of base R).
\item A purrr-style anonymous function; see \code{\link[rlang:as_function]{rlang::as_function()}}.
}
This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}.
See there for more details on these terms and the strategies used
to enforce them.}
}
\description{
\code{expand()} generates all combinations of variables found in a dataset.
It is paired with \code{nesting()} and \code{crossing()} helpers. \code{crossing()}
is a wrapper around \code{\link[=expand_grid]{expand_grid()}} that de-duplicates and sorts its inputs;
\code{nesting()} is a helper that only finds combinations already present in the
data.
\code{expand()} is often useful in conjunction with joins:
\itemize{
\item Use it with \code{right_join()} to convert implicit missing values to
explicit missing values (e.g., fill in gaps in your data frame).
\item Use it with \code{anti_join()} to figure out which combinations are missing
(e.g., identify gaps in your data frame).
}
}
\section{Grouped data frames}{
With grouped data frames created by \code{\link[dplyr:group_by]{dplyr::group_by()}}, \code{expand()} operates
\emph{within} each group. Because of this, you cannot expand on a grouping column.
}
\examples{
# Finding combinations ------------------------------------------------------
fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2011, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)
# All combinations, including factor levels that are not used
fruits \%>\% expand(type)
fruits \%>\% expand(size)
fruits \%>\% expand(type, size)
fruits \%>\% expand(type, size, year)
# Only combinations that already appear in the data
fruits \%>\% expand(nesting(type))
fruits \%>\% expand(nesting(size))
fruits \%>\% expand(nesting(type, size))
fruits \%>\% expand(nesting(type, size, year))
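# A hedged sketch, not part of the upstream examples: combine `nesting()`
# with a bare column so that observed pairs are crossed with every observed
# value of a third variable. The `visits` data frame and its values are
# hypothetical; calls are namespaced so the snippet runs on its own.
visits <- data.frame(
  school_id = c(1, 1, 2),
  student_id = c(10, 11, 10),
  date = c(1, 2, 2)
)
# One row per observed school-student pair for each observed date
visits_grid <- tidyr::expand(visits, tidyr::nesting(school_id, student_id), date)
visits_grid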
# Other uses ----------------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables
fruits \%>\% expand(type, size, full_seq(year, 1))
fruits \%>\% expand(type, size, 2010:2013)
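# A hedged sketch, not part of the upstream examples: restrict a factor to
# the levels seen in the data. Base R `droplevels()` stands in here for
# `forcats::fct_drop()`, and the `sizes` data frame is hypothetical.
sizes <- data.frame(
  type = c("apple", "orange", "apple"),
  size = factor(c("XS", "S", "S"), levels = c("XS", "S", "M", "L"))
)
# Uses every level of `size`, including the unused "M" and "L"
all_levels <- tidyr::expand(sizes, type, size)
# Uses only the levels that occur in the data
seen_levels <- tidyr::expand(sizes, type, size = droplevels(size))
seen_levels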
# Use `anti_join()` to determine which observations are missing
all <- fruits \%>\% expand(type, size, year)
all
all \%>\% dplyr::anti_join(fruits)
# Use with `right_join()` to fill in missing rows (like `complete()`)
fruits \%>\% dplyr::right_join(all)
# Use with `group_by()` to expand within each group
fruits \%>\%
  dplyr::group_by(type) \%>\%
  expand(year, size)
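# A hedged sketch, not part of the upstream examples: `crossing()` takes
# vectors directly rather than a data frame, then de-duplicates and sorts
# them. The input values are hypothetical.
grid <- tidyr::crossing(x = c(2, 1, 2), y = c("b", "a"))
grid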
}
\seealso{
\code{\link[=complete]{complete()}} to expand a data frame and fill in the missing rows
in one step. \code{\link[=expand_grid]{expand_grid()}} to take vectors as input rather
than a data frame.
}