File: chop.Rd

r-cran-tidyr 1.3.1-1
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/chop.R
\name{chop}
\alias{chop}
\alias{unchop}
\title{Chop and unchop}
\usage{
chop(data, cols, ..., error_call = current_env())

unchop(
  data,
  cols,
  ...,
  keep_empty = FALSE,
  ptype = NULL,
  error_call = current_env()
)
}
\arguments{
\item{data}{A data frame.}

\item{cols}{<\code{\link[=tidyr_tidy_select]{tidy-select}}> Columns to chop or unchop.

For \code{unchop()}, each column should be a list-column containing generalised
vectors (e.g. any mix of \code{NULL}s, atomic vectors, S3 vectors, lists,
or data frames).}

\item{...}{These dots are for future extensions and must be empty.}

\item{error_call}{The execution environment of a currently
running function, e.g. \code{caller_env()}. The function will be
mentioned in error messages as the source of the error. See the
\code{call} argument of \code{\link[rlang:abort]{abort()}} for more information.}

\item{keep_empty}{By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
size-0 element (like \code{NULL} or an empty data frame or vector), then that
entire row will be dropped from the output. If you want to preserve all
rows, use \code{keep_empty = TRUE} to replace size-0 elements with a single row
of missing values.}

\item{ptype}{Optionally, a named list of column name-prototype pairs to
coerce \code{cols} to, overriding the default that will be guessed from
combining the individual values. Alternatively, a single empty ptype
can be supplied, which will be applied to all \code{cols}.}
}
\description{
Chopping and unchopping preserve the width of a data frame, changing its
length. \code{chop()} makes \code{df} shorter by converting rows within each group
into list-columns. \code{unchop()} makes \code{df} longer by expanding list-columns
so that each element of the list-column gets its own row in the output.
\code{chop()} and \code{unchop()} are building blocks for more complicated functions
(like \code{\link[=unnest]{unnest()}}, \code{\link[=unnest_longer]{unnest_longer()}}, and \code{\link[=unnest_wider]{unnest_wider()}}) and are generally
more suitable for programming than interactive data analysis.
}
\details{
Generally, unchopping is more useful than chopping because it simplifies
a complex data structure, and \code{\link[=nest]{nest()}}ing is usually more appropriate
than \code{chop()}ing since it better preserves the connections between
observations.

\code{chop()} creates list-columns of class \code{\link[vctrs:list_of]{vctrs::list_of()}} to ensure
consistent behaviour when the chopped data frame is emptied. For
instance, this helps recover the original column types after
a chop-and-unchop round trip. Because \verb{<list_of>} keeps track of
the type of its elements, \code{unchop()} is able to reconstitute the
correct vector type even for empty list-columns.
}
\examples{
# Chop ----------------------------------------------------------------------
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-chopped variables
df \%>\% chop(c(y, z))
# Compare with nest(), which keeps y and z together in one data frame per group
df \%>\% nest(data = c(y, z))

# Unchop --------------------------------------------------------------------
df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
df \%>\% unchop(y)
df \%>\% unchop(y, keep_empty = TRUE)

# unchop will error if the types are not compatible:
df <- tibble(x = 1:2, y = list("1", 1:3))
try(df \%>\% unchop(y))
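
# A minimal sketch of the ptype argument (not part of the original examples):
# the values below are integers, but supplying a prototype coerces y to
# double. This assumes the values can be cast to the requested type.
df <- tibble(x = 1:2, y = list(1L, 2:3))
unchop(df, y, ptype = list(y = double()))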

# Unchopping a list-col of data frames must generate a df-col because
# unchop leaves the column names unchanged
df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2)))
df \%>\% unchop(y)
df \%>\% unchop(y, keep_empty = TRUE)
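
# A sketch of the round trip described in Details (not part of the original
# examples): chop() stores y as a <list_of>, so the element type should
# still be known after every row of the chopped data frame is removed
df <- tibble(x = c(1, 1, 2), y = c("a", "b", "c"))
empty <- chop(df, y)[0, ]
unchop(empty, y)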
}