File: complete.Rd

package info (click to toggle)
r-cran-tidyr 1.3.1-1
links: PTS, VCS
area: main
in suites: sid, trixie
size: 2,720 kB
sloc: cpp: 268; sh: 9; makefile: 2
file content (98 lines) | stat: -rw-r--r-- 3,638 bytes
parent folder | download | duplicates (2)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/complete.R
\name{complete}
\alias{complete}
\title{Complete a data frame with missing combinations of data}
\usage{
complete(data, ..., fill = list(), explicit = TRUE)
}
\arguments{
\item{data}{A data frame.}

\item{...}{<\code{\link[=tidyr_data_masking]{data-masking}}> Specification of columns
to expand or complete. Columns can be atomic vectors or lists.
\itemize{
\item To find all unique combinations of \code{x}, \code{y} and \code{z}, including those not
present in the data, supply each variable as a separate argument:
\code{expand(df, x, y, z)} or \code{complete(df, x, y, z)}.
\item To find only the combinations that occur in the
data, use \code{nesting}: \code{expand(df, nesting(x, y, z))}.
\item You can combine the two forms. For example,
\code{expand(df, nesting(school_id, student_id), date)} would produce
a row for each present school-student combination for all possible
dates.
}

When used with factors, \code{\link[=expand]{expand()}} and \code{\link[=complete]{complete()}} use the full set of
levels, not just those that appear in the data. If you want to use only the
values seen in the data, use \code{forcats::fct_drop()}.

When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so use expressions like
\code{year = 2010:2020} or \code{year = full_seq(year,1)}.}

\item{fill}{A named list that for each variable supplies a single value to
use instead of \code{NA} for missing combinations.}

\item{explicit}{Should both implicit (newly created) and explicit
(pre-existing) missing values be filled by \code{fill}? By default, this is
\code{TRUE}, but if set to \code{FALSE} this will limit the fill to only implicit
missing values.}
}
\description{
Turns implicit missing values into explicit missing values. This is a wrapper
around \code{\link[=expand]{expand()}}, \code{\link[dplyr:mutate-joins]{dplyr::full_join()}} and \code{\link[=replace_na]{replace_na()}} that's useful for
completing missing combinations of data.
}
\section{Grouped data frames}{

With grouped data frames created by \code{\link[dplyr:group_by]{dplyr::group_by()}}, \code{complete()}
operates \emph{within} each group. Because of this, you cannot complete a grouping
column.
}

\examples{
df <- tibble(
  group = c(1:2, 1, 2),
  item_id = c(1:2, 2, 3),
  item_name = c("a", "a", "b", "b"),
  value1 = c(1, NA, 3, 4),
  value2 = 4:7
)
df

# Combinations --------------------------------------------------------------
# Generate all possible combinations of `group`, `item_id`, and `item_name`
# (whether or not they appear in the data)
df \%>\% complete(group, item_id, item_name)

# Cross all possible `group` values with the unique pairs of
# `(item_id, item_name)` that already exist in the data
df \%>\% complete(group, nesting(item_id, item_name))

# Within each `group`, generate all possible combinations of
# `item_id` and `item_name` that occur in that group
df \%>\%
  dplyr::group_by(group) \%>\%
  complete(item_id, item_name)

# Supplying values for new rows ---------------------------------------------
# Use `fill` to replace NAs with some value. By default, affects both new
# (implicit) and pre-existing (explicit) missing values.
df \%>\%
  complete(
    group,
    nesting(item_id, item_name),
    fill = list(value1 = 0, value2 = 99)
  )

# Limit the fill to only the newly created (i.e. previously implicit)
# missing values with `explicit = FALSE`
df \%>\%
  complete(
    group,
    nesting(item_id, item_name),
    fill = list(value1 = 0, value2 = 99),
    explicit = FALSE
  )
}