% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-pivot-wider.R
\name{pivot_wider.tbl_lazy}
\alias{pivot_wider.tbl_lazy}
\title{Pivot data from long to wide}
\usage{
\method{pivot_wider}{tbl_lazy}(
data,
id_cols = NULL,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL,
...
)
}
\arguments{
\item{data}{A lazy data frame backed by a database query.}
\item{id_cols}{A set of columns that uniquely identifies each observation.}
\item{names_from, values_from}{A pair of
arguments describing which column (or columns) to get the name of the
output column (\code{names_from}), and which column (or columns) to get the
cell values from (\code{values_from}).
If \code{values_from} contains multiple columns, the name of each value column
is added to the front of the output column name.}
\item{names_prefix}{String added to the start of every variable name.}
\item{names_sep}{If \code{names_from} or \code{values_from} contains multiple
variables, this will be used to join their values together into a single
string to use as a column name.}
\item{names_glue}{Instead of \code{names_sep} and \code{names_prefix}, you can supply
a glue specification that uses the \code{names_from} columns (and special
\code{.value}) to create custom column names.}
\item{names_sort}{Should the column names be sorted? If \code{FALSE}, the default,
column names are ordered by first appearance.}
\item{names_vary}{When \code{names_from} identifies a column (or columns) with
multiple unique values, and multiple \code{values_from} columns are provided,
in what order should the resulting column names be combined?
\itemize{
\item \code{"fastest"} varies \code{names_from} values fastest, resulting in a column
naming scheme of the form: \verb{value1_name1, value1_name2, value2_name1, value2_name2}. This is the default.
\item \code{"slowest"} varies \code{names_from} values slowest, resulting in a column
naming scheme of the form: \verb{value1_name1, value2_name1, value1_name2, value2_name2}.
}}
\item{names_expand}{Should the values in the \code{names_from} columns be expanded
by \code{\link[=expand]{expand()}} before pivoting? This results in more columns: the output
will contain column names corresponding to a complete expansion of all
possible values in \code{names_from}. Additionally, the column names will be
sorted, identical to what \code{names_sort} would produce.}
\item{names_repair}{What happens if the output has invalid column names?}
\item{values_fill}{Optionally, a (scalar) value that specifies what each
\code{value} should be filled in with when missing.}
\item{values_fn}{A function applied to the \code{value} in each cell of the
output; the default is \code{max()}. Unlike for local data frames, it must not be
\code{NULL}.}
\item{unused_fn}{Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by \code{id_cols},
\code{names_from}, or \code{values_from}).
The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations
to different unused columns.
\code{id_cols} must be supplied for \code{unused_fn} to be useful, since otherwise
all unspecified columns will be considered \code{id_cols}.
This is similar to grouping by the \code{id_cols} then summarizing the
unused columns using \code{unused_fn}.}
\item{...}{Unused; included for compatibility with generic.}
}
\description{
\code{pivot_wider()} "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
\code{pivot_longer()}.
Learn more in \code{vignette("pivot", "tidyr")}.
Note that \code{pivot_wider()} is not and cannot be lazy because we need to look
at the data to figure out what the new column names will be.
}
\details{
The main difference from \code{pivot_wider()} for local data frames is that
\code{values_fn} must not be \code{NULL}. By default it is \code{max()}, which yields
the same results as for local data frames if the combination of \code{id_cols}
and \code{value} column uniquely identifies an observation.
Note also that, unlike for local data frames, you do not get a warning if an
observation is not uniquely identified.
The translation to SQL code works roughly as follows:
\enumerate{
\item Get the unique keys in the \code{names_from} column.
\item For each key value, generate an expression of the form:
\if{html}{\out{<div class="sourceCode sql">}}\preformatted{values_fn(
CASE WHEN (`names from column` = `key value`)
THEN (`value column`)
END
) AS `output column`
}\if{html}{\out{</div>}}
\item Group data by id columns.
\item Summarise the grouped data with the expressions from step 2.
}
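As a rough illustration (a hand-written sketch, not the code dbplyr itself
generates), steps 2 to 4 resemble the following grouped summary. It assumes
that step 1 found the keys \code{"x"} and \code{"y"} in a column named \code{key}, that
the values live in a column named \code{value}, and that \code{id} is the only id
column; these names are illustrative only.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Assumed input: one row per (id, key) pair
memdb_frame(id = 1, key = c("x", "y"), value = 1:2) \%>\%
  # Step 3: group by the id columns
  dplyr::group_by(id) \%>\%
  # Steps 2 and 4: one aggregated conditional expression per key
  dplyr::summarise(
    x = max(ifelse(key == "x", value, NA), na.rm = TRUE),
    y = max(ifelse(key == "y", value, NA), na.rm = TRUE)
  )
}\if{html}{\out{</div>}}

Each \code{ifelse()} call is translated to a \code{CASE WHEN} expression, so rows
belonging to other keys contribute \code{NULL} and are ignored by the aggregate.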
}
\examples{
\dontshow{if (rlang::is_installed("tidyr", version = "1.0.0")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
memdb_frame(
id = 1,
key = c("x", "y"),
value = 1:2
) \%>\%
tidyr::pivot_wider(
id_cols = id,
names_from = key,
values_from = value
)
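
# A further sketch (not part of the original example): when several rows share
# the same id and key, values_fn decides how they are combined. Here a formula
# sums them instead of taking the default maximum, and dplyr::show_query()
# prints the SQL that the translation produces for the in-memory database.
# The exact SQL text depends on the backend and package versions.
memdb_frame(
  id = 1,
  key = c("x", "x", "y"),
  value = 1:3
) \%>\%
  tidyr::pivot_wider(
    id_cols = id,
    names_from = key,
    values_from = value,
    values_fn = ~ sum(.x, na.rm = TRUE)
  ) \%>\%
  dplyr::show_query()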
\dontshow{\}) # examplesIf}
}