File: pivot_wider.tbl_lazy.Rd

package info (click to toggle)
r-cran-dbplyr 2.3.0%2Bdfsg-1
links: PTS, VCS
area: main
in suites: bookworm
size: 2,376 kB
sloc: sh: 13; makefile: 2
file content (138 lines) | stat: -rw-r--r-- 5,234 bytes
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-pivot-wider.R
\name{pivot_wider.tbl_lazy}
\alias{pivot_wider.tbl_lazy}
\title{Pivot data from long to wide}
\usage{
\method{pivot_wider}{tbl_lazy}(
  data,
  id_cols = NULL,
  names_from = name,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_vary = "fastest",
  names_expand = FALSE,
  names_repair = "check_unique",
  values_from = value,
  values_fill = NULL,
  values_fn = ~max(.x, na.rm = TRUE),
  unused_fn = NULL,
  ...
)
}
\arguments{
\item{data}{A lazy data frame backed by a database query.}

\item{id_cols}{A set of columns that uniquely identifies each observation.}

\item{names_from, values_from}{A pair of
arguments describing which column (or columns) to get the name of the
output column (\code{names_from}), and which column (or columns) to get the
cell values from (\code{values_from}).

If \code{values_from} contains multiple values, the value will be added to the
front of the output column.}

\item{names_prefix}{String added to the start of every variable name.}

\item{names_sep}{If \code{names_from} or \code{values_from} contains multiple
variables, this will be used to join their values together into a single
string to use as a column name.}

\item{names_glue}{Instead of \code{names_sep} and \code{names_prefix}, you can supply
a glue specification that uses the \code{names_from} columns (and special
\code{.value}) to create custom column names.}

\item{names_sort}{Should the column names be sorted? If \code{FALSE}, the default,
column names are ordered by first appearance.}

\item{names_vary}{When \code{names_from} identifies a column (or columns) with
multiple unique values, and multiple \code{values_from} columns are provided,
in what order should the resulting column names be combined?
\itemize{
\item \code{"fastest"} varies \code{names_from} values fastest, resulting in a column
naming scheme of the form: \verb{value1_name1, value1_name2, value2_name1, value2_name2}. This is the default.
\item \code{"slowest"} varies \code{names_from} values slowest, resulting in a column
naming scheme of the form: \verb{value1_name1, value2_name1, value1_name2, value2_name2}.
}}

\item{names_expand}{Should the values in the \code{names_from} columns be expanded
by \code{\link[=expand]{expand()}} before pivoting? This results in more columns, the output
will contain column names corresponding to a complete expansion of all
possible values in \code{names_from}. Additionally, the column names will be
sorted, identical to what \code{names_sort} would produce.}

\item{names_repair}{What happens if the output has invalid column names?}

\item{values_fill}{Optionally, a (scalar) value that specifies what each
\code{value} should be filled in with when missing.}

\item{values_fn}{A function, the default is \code{max()}, applied to the \code{value}
in each cell in the output. In contrast to local data frames it must not be
\code{NULL}.}

\item{unused_fn}{Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by \code{id_cols},
\code{names_from}, or \code{values_from}).

The default drops all unused columns from the result.

This can be a named list if you want to apply different aggregations
to different unused columns.

\code{id_cols} must be supplied for \code{unused_fn} to be useful, since otherwise
all unspecified columns will be considered \code{id_cols}.

This is similar to grouping by the \code{id_cols} then summarizing the
unused columns using \code{unused_fn}.}

\item{...}{Unused; included for compatibility with generic.}
}
\description{
\code{pivot_wider()} "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
\code{pivot_longer()}.
Learn more in \code{vignette("pivot", "tidyr")}.

Note that \code{pivot_wider()} is not and cannot be lazy because we need to look
at the data to figure out what the new column names will be.
}
\details{
The big difference to \code{pivot_wider()} for local data frames is that
\code{values_fn} must not be \code{NULL}. By default it is \code{max()} which yields
the same results as for local data frames if the combination of \code{id_cols}
and \code{value} column uniquely identify an observation.
Mind that you also do not get a warning if an observation is not uniquely
identified.

The translation to SQL code basically works as follows:
\enumerate{
\item Get unique keys in \code{names_from} column.
\item For each key value generate an expression of the form:

\if{html}{\out{<div class="sourceCode sql">}}\preformatted{value_fn(
  CASE WHEN (`names from column` == `key value`)
  THEN (`value column`)
  END
) AS `output column`
}\if{html}{\out{</div>}}
\item Group data by id columns.
\item Summarise the grouped data with the expressions from step 2.
}
}
\examples{
\dontshow{if (rlang::is_installed("tidyr", version = "1.0.0")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
memdb_frame(
  id = 1,
  key = c("x", "y"),
  value = 1:2
) \%>\%
  tidyr::pivot_wider(
    id_cols = id,
    names_from = key,
    values_from = value
  )
\dontshow{\}) # examplesIf}
}