% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pivot-wide.R
\name{pivot_wider_spec}
\alias{pivot_wider_spec}
\alias{build_wider_spec}
\title{Pivot data from long to wide using a spec}
\usage{
pivot_wider_spec(
  data,
  spec,
  ...,
  names_repair = "check_unique",
  id_cols = NULL,
  id_expand = FALSE,
  values_fill = NULL,
  values_fn = NULL,
  unused_fn = NULL,
  error_call = current_env()
)

build_wider_spec(
  data,
  ...,
  names_from = name,
  values_from = value,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_vary = "fastest",
  names_expand = FALSE,
  error_call = current_env()
)
}
\arguments{
\item{data}{A data frame to pivot.}
\item{spec}{A specification data frame. This is useful for more complex
pivots because it gives you greater control over how metadata stored in the
columns becomes column names in the result.
Must be a data frame containing character \code{.name} and \code{.value} columns.
Additional columns in \code{spec} should be named to match columns in the
long format of the dataset and contain values corresponding to columns
pivoted from the wide format.
The special \code{.seq} variable is used to disambiguate rows internally;
it is automatically removed after pivoting.}
\item{...}{These dots are for future extensions and must be empty.}
\item{names_repair}{What happens if the output has invalid column names?
The default, \code{"check_unique"}, is to error if the columns are duplicated.
Use \code{"minimal"} to allow duplicates in the output, or \code{"unique"} to
de-duplicate them by adding numeric suffixes. See \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}
for more options.}
\item{id_cols}{<\code{\link[=tidyr_tidy_select]{tidy-select}}> A set of columns that
uniquely identifies each observation. Defaults to all columns in \code{data}
except for the columns specified in \code{spec$.value} and the columns of the
\code{spec} that aren't named \code{.name} or \code{.value}. Typically used when you have
redundant variables, i.e. variables whose values are perfectly correlated
with existing variables.}
\item{id_expand}{Should the values in the \code{id_cols} columns be expanded by
\code{\link[=expand]{expand()}} before pivoting? This results in more rows: the output will
contain a complete expansion of all possible values in \code{id_cols}. Implicit
factor levels that aren't represented in the data will become explicit.
Additionally, the row values corresponding to the expanded \code{id_cols} will
be sorted.}
\item{values_fill}{Optionally, a (scalar) value that specifies what each
\code{value} should be filled in with when missing.
This can be a named list if you want to apply different fill values to
different value columns.}
\item{values_fn}{Optionally, a function applied to the value in each cell
in the output. You will typically use this when the combination of
\code{id_cols} and \code{names_from} columns does not uniquely identify an
observation.
This can be a named list if you want to apply different aggregations
to different \code{values_from} columns.}
\item{unused_fn}{Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by \code{id_cols},
\code{names_from}, or \code{values_from}).
The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations
to different unused columns.
\code{id_cols} must be supplied for \code{unused_fn} to be useful, since otherwise
all unspecified columns will be considered \code{id_cols}.
This is similar to grouping by the \code{id_cols} then summarizing the
unused columns using \code{unused_fn}.}
\item{error_call}{The execution environment of a currently
running function, e.g. \code{caller_env()}. The function will be
mentioned in error messages as the source of the error. See the
\code{call} argument of \code{\link[rlang:abort]{abort()}} for more information.}
\item{names_from, values_from}{<\code{\link[=tidyr_tidy_select]{tidy-select}}> A pair of
arguments describing which column (or columns) to get the name of the
output column (\code{names_from}), and which column (or columns) to get the
cell values from (\code{values_from}).
If \code{values_from} contains multiple columns, the name of each value column
will be added to the front of the output column name.}
\item{names_prefix}{String added to the start of every variable name. This is
particularly useful if \code{names_from} is a numeric vector and you want to
create syntactic variable names.}
\item{names_sep}{If \code{names_from} or \code{values_from} contains multiple
variables, this will be used to join their values together into a single
string to use as a column name.}
\item{names_glue}{Instead of \code{names_sep} and \code{names_prefix}, you can supply
a glue specification that uses the \code{names_from} columns (and special
\code{.value}) to create custom column names.}
\item{names_sort}{Should the column names be sorted? If \code{FALSE}, the default,
column names are ordered by first appearance.}
\item{names_vary}{When \code{names_from} identifies a column (or columns) with
multiple unique values, and multiple \code{values_from} columns are provided,
in what order should the resulting column names be combined?
\itemize{
\item \code{"fastest"} varies \code{names_from} values fastest, resulting in a column
naming scheme of the form: \verb{value1_name1, value1_name2, value2_name1, value2_name2}. This is the default.
\item \code{"slowest"} varies \code{names_from} values slowest, resulting in a column
naming scheme of the form: \verb{value1_name1, value2_name1, value1_name2, value2_name2}.
}}
\item{names_expand}{Should the values in the \code{names_from} columns be expanded
by \code{\link[=expand]{expand()}} before pivoting? This results in more columns: the output
will contain column names corresponding to a complete expansion of all
possible values in \code{names_from}. Implicit factor levels that aren't
represented in the data will become explicit. Additionally, the column
names will be sorted, identical to what \code{names_sort} would produce.}
}
\description{
This is a low level interface to pivoting, inspired by the cdata package,
that allows you to describe pivoting with a data frame.
}
\examples{
# See vignette("pivot") for examples and explanation
us_rent_income
spec1 <- us_rent_income \%>\%
  build_wider_spec(names_from = variable, values_from = c(estimate, moe))
spec1

us_rent_income \%>\%
  pivot_wider_spec(spec1)

# Is equivalent to
us_rent_income \%>\%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))
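
# A minimal sketch of how `names_vary` affects the spec (the object name
# `spec_slowest` is only illustrative). As described for `names_vary`,
# "slowest" should keep the columns for each `variable` value next to each
# other instead of grouping them by value column.
spec_slowest <- build_wider_spec(
  us_rent_income,
  names_from = variable,
  values_from = c(estimate, moe),
  names_vary = "slowest"
)
spec_slowest
pivot_wider_spec(us_rent_income, spec_slowest)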

# `pivot_wider_spec()` provides more control over column names and output
# format. Instead of creating columns with `estimate_` and `moe_` prefixes,
# keep the original variable name for estimates and attach `_moe` as a suffix.
spec2 <- tibble(
  .name = c("income", "rent", "income_moe", "rent_moe"),
  .value = c("estimate", "estimate", "moe", "moe"),
  variable = c("income", "rent", "income", "rent")
)

us_rent_income \%>\%
  pivot_wider_spec(spec2)
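
# A hedged sketch: when the desired names follow a regular pattern, the spec
# can be generated with `names_glue` instead of being written by hand. The
# object name `spec_glue` is only illustrative. This pattern puts the value
# column name last, so it is similar to, but not the same as, the
# hand-written spec above.
spec_glue <- build_wider_spec(
  us_rent_income,
  names_from = variable,
  values_from = c(estimate, moe),
  names_glue = "{variable}_{.value}"
)
spec_glue
pivot_wider_spec(us_rent_income, spec_glue)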
}
\keyword{internal}