% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/expand.R
\name{expand}
\alias{expand}
\alias{crossing}
\alias{nesting}
\title{Expand data frame to include all possible combinations of values}
\usage{
expand(data, ..., .name_repair = "check_unique")

crossing(..., .name_repair = "check_unique")

nesting(..., .name_repair = "check_unique")
}
\arguments{
\item{data}{A data frame.}

\item{...}{<\code{\link[=tidyr_data_masking]{data-masking}}> Specification of columns
to expand or complete. Columns can be atomic vectors or lists.
\itemize{
\item To find all unique combinations of \code{x}, \code{y} and \code{z}, including those not
present in the data, supply each variable as a separate argument:
\code{expand(df, x, y, z)} or \code{complete(df, x, y, z)}.
\item To find only the combinations that occur in the
data, use \code{nesting}: \code{expand(df, nesting(x, y, z))}.
\item You can combine the two forms. For example,
\code{expand(df, nesting(school_id, student_id), date)} produces one row
per date for each school-student combination that is present in the
data.
}

When used with factors, \code{\link[=expand]{expand()}} and \code{\link[=complete]{complete()}} use the full set of
levels, not just those that appear in the data. If you want to use only the
values seen in the data, use \code{forcats::fct_drop()}.

When used with continuous variables, you may need to fill in values
that do not appear in the data. To do so, use expressions like
\code{year = 2010:2020} or \code{year = full_seq(year, 1)}.}

\item{.name_repair}{Treatment of problematic column names:
\itemize{
\item \code{"minimal"}: No name repair or checks, beyond basic existence.
\item \code{"unique"}: Make sure names are unique and not empty.
\item \code{"check_unique"}: (the default) No name repair, but check that names are
\code{unique}.
\item \code{"universal"}: Make the names \code{unique} and syntactic.
\item A function: Apply custom name repair (e.g., \code{.name_repair = make.names}
for names in the style of base R).
\item A purrr-style anonymous function: See \code{\link[rlang:as_function]{rlang::as_function()}}.
}

This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}.
See there for more details on these terms and the strategies used
to enforce them.}
}
\description{
\code{expand()} generates all combinations of the variables found in a dataset.
It is paired with the \code{nesting()} and \code{crossing()} helpers. \code{crossing()}
is a wrapper around \code{\link[=expand_grid]{expand_grid()}} that de-duplicates and sorts its inputs;
\code{nesting()} is a helper that only finds combinations already present in the
data.

\code{expand()} is often useful in conjunction with joins:
\itemize{
\item Use it with \code{right_join()} to convert implicit missing values to
explicit missing values (e.g., fill in gaps in your data frame).
\item Use it with \code{anti_join()} to figure out which combinations are missing
(e.g., identify gaps in your data frame).
}
}
\section{Grouped data frames}{

With grouped data frames created by \code{\link[dplyr:group_by]{dplyr::group_by()}}, \code{expand()} operates
\emph{within} each group. Because of this, you cannot expand on a grouping column.
}

\examples{
# Finding combinations ------------------------------------------------------
fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2011, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

# All combinations, including factor levels that are not used
fruits \%>\% expand(type)
fruits \%>\% expand(size)
fruits \%>\% expand(type, size)
fruits \%>\% expand(type, size, year)
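
# A sketch of restricting a factor to the levels seen in the data. It assumes
# base R's `droplevels()` as a stand-in for `forcats::fct_drop()`, so that the
# example does not depend on forcats being installed. The unused level "L"
# should no longer take part in the expansion.
observed_sizes <- expand(fruits, type, size = droplevels(size))
observed_sizes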

# Only combinations that already appear in the data
fruits \%>\% expand(nesting(type))
fruits \%>\% expand(nesting(size))
fruits \%>\% expand(nesting(type, size))
fruits \%>\% expand(nesting(type, size, year))
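
# Combining both forms (a sketch): keep only the type-size pairs that are
# present in the data, then cross those pairs with every observed year.
pairs_by_year <- expand(fruits, nesting(type, size), year)
pairs_by_year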

# Other uses ----------------------------------------------------------------
# Use `full_seq()` or an explicit range to fill in values of continuous
# variables; naming the argument keeps the output column called `year`
fruits \%>\% expand(type, size, year = full_seq(year, 1))
fruits \%>\% expand(type, size, year = 2010:2013)

# Use `anti_join()` to determine which observations are missing
all <- fruits \%>\% expand(type, size, year)
all
all \%>\% dplyr::anti_join(fruits, by = c("type", "size", "year"))

# Use with `right_join()` to fill in missing rows (like `complete()`)
fruits \%>\% dplyr::right_join(all, by = c("type", "size", "year"))

# Use with `group_by()` to expand within each group
fruits \%>\%
  dplyr::group_by(type) \%>\%
  expand(year, size)
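
# `crossing()` takes vectors directly rather than a data frame. A sketch with
# made-up inputs `x` and `y`: duplicates are dropped and values are sorted
# before the combinations are formed.
vector_combos <- crossing(x = c(2, 1, 1), y = c("b", "a"))
vector_combos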
}
\seealso{
\code{\link[=complete]{complete()}} to expand a data frame and fill in the missing rows
in one step. \code{\link[=expand_grid]{expand_grid()}} to expand vectors rather than a
data frame.
}