File: slice.Rd

package info (click to toggle)
r-cran-dplyr 1.1.4-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 4,292 kB
  • sloc: cpp: 1,403; sh: 17; makefile: 7
file content (210 lines) | stat: -rw-r--r-- 7,773 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/slice.R
\name{slice}
\alias{slice}
\alias{slice_head}
\alias{slice_tail}
\alias{slice_min}
\alias{slice_max}
\alias{slice_sample}
\title{Subset rows using their positions}
\usage{
slice(.data, ..., .by = NULL, .preserve = FALSE)

slice_head(.data, ..., n, prop, by = NULL)

slice_tail(.data, ..., n, prop, by = NULL)

slice_min(
  .data,
  order_by,
  ...,
  n,
  prop,
  by = NULL,
  with_ties = TRUE,
  na_rm = FALSE
)

slice_max(
  .data,
  order_by,
  ...,
  n,
  prop,
  by = NULL,
  with_ties = TRUE,
  na_rm = FALSE
)

slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}

\item{...}{For \code{slice()}: <\code{\link[rlang:args_data_masking]{data-masking}}>
Integer row values.

Provide either positive values to keep, or negative values to drop.
The values provided must be either all positive or all negative.
Indices beyond the number of rows in the input are silently ignored.

For \verb{slice_*()}, these arguments are passed on to methods.}

\item{.by, by}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

<\code{\link[=dplyr_tidy_select]{tidy-select}}> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to \code{\link[=group_by]{group_by()}}. For
details and examples, see \link[=dplyr_by]{?dplyr_by}.}

\item{.preserve}{Relevant when the \code{.data} input is grouped.
If \code{.preserve = FALSE} (the default), the grouping structure
is recalculated based on the resulting data, otherwise the grouping is kept as is.}

\item{n, prop}{Provide either \code{n}, the number of rows, or \code{prop}, the
proportion of rows to select. If neither are supplied, \code{n = 1} will be
used. If \code{n} is greater than the number of rows in the group
(or \code{prop > 1}), the result will be silently truncated to the group size.
\code{prop} will be rounded towards zero to generate an integer number of
rows.

A negative value of \code{n} or \code{prop} will be subtracted from the group
size. For example, \code{n = -2} with a group of 5 rows will select 5 - 2 = 3
rows; \code{prop = -0.25} with 8 rows will select 8 * (1 - 0.25) = 6 rows.}

\item{order_by}{<\code{\link[rlang:args_data_masking]{data-masking}}> Variable or
function of variables to order by. To order by multiple variables, wrap
them in a data frame or tibble.}

\item{with_ties}{Should ties be kept together? The default, \code{TRUE},
may return more rows than you request. Use \code{FALSE} to ignore ties,
and return the first \code{n} rows.}

\item{na_rm}{Should missing values in \code{order_by} be removed from the result?
If \code{FALSE}, \code{NA} values are sorted to the end (like in \code{\link[=arrange]{arrange()}}), so
they will only be included if there are insufficient non-missing values to
reach \code{n}/\code{prop}.}

\item{weight_by}{<\code{\link[rlang:args_data_masking]{data-masking}}> Sampling
weights. This must evaluate to a vector of non-negative numbers the same
length as the input. Weights are automatically standardised to sum to 1.}

\item{replace}{Should sampling be performed with (\code{TRUE}) or without
(\code{FALSE}, the default) replacement.}
}
\value{
An object of the same type as \code{.data}. The output has the following
properties:
\itemize{
\item Each row may appear 0, 1, or many times in the output.
\item Columns are not modified.
\item Groups are not modified.
\item Data frame attributes are preserved.
}
}
\description{
\code{slice()} lets you index rows by their (integer) locations. It allows you
to select, remove, and duplicate rows. It is accompanied by a number of
helpers for common use cases:
\itemize{
\item \code{slice_head()} and \code{slice_tail()} select the first or last rows.
\item \code{slice_sample()} randomly selects rows.
\item \code{slice_min()} and \code{slice_max()} select rows with the smallest or largest
values of a variable.
}

If \code{.data} is a \link{grouped_df}, the operation will be performed on each group,
so that (e.g.) \code{slice_head(df, n = 5)} will select the first five rows in
each group.
}
\details{
Slice does not work with relational databases because they have no
intrinsic notion of row order. If you want to perform the equivalent
operation, use \code{\link[=filter]{filter()}} and \code{\link[=row_number]{row_number()}}.
}
\section{Methods}{

These function are \strong{generic}s, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:
\itemize{
\item \code{slice()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice")}.
\item \code{slice_head()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_head")}.
\item \code{slice_tail()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_tail")}.
\item \code{slice_min()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_min")}.
\item \code{slice_max()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_max")}.
\item \code{slice_sample()}: \Sexpr[stage=render,results=rd]{dplyr:::methods_rd("slice_sample")}.
}
}

\examples{
# Similar to head(mtcars, 1):
mtcars \%>\% slice(1L)
# Similar to tail(mtcars, 1):
mtcars \%>\% slice(n())
mtcars \%>\% slice(5:n())
# Rows can be dropped with negative indices:
slice(mtcars, -(1:4))

# First and last rows based on existing order
mtcars \%>\% slice_head(n = 5)
mtcars \%>\% slice_tail(n = 5)

# Rows with minimum and maximum values of a variable
mtcars \%>\% slice_min(mpg, n = 5)
mtcars \%>\% slice_max(mpg, n = 5)

# slice_min() and slice_max() may return more rows than requested
# in the presence of ties.
mtcars \%>\% slice_min(cyl, n = 1)
# Use with_ties = FALSE to return exactly n matches
mtcars \%>\% slice_min(cyl, n = 1, with_ties = FALSE)
# Or use additional variables to break the tie:
mtcars \%>\% slice_min(tibble(cyl, mpg), n = 1)

# slice_sample() allows you to random select with or without replacement
mtcars \%>\% slice_sample(n = 5)
mtcars \%>\% slice_sample(n = 5, replace = TRUE)

# you can optionally weight by a variable - this code weights by the
# physical weight of the cars, so heavy cars are more likely to get
# selected
mtcars \%>\% slice_sample(weight_by = wt, n = 5)

# Group wise operation ----------------------------------------
df <- tibble(
  group = rep(c("a", "b", "c"), c(1, 2, 4)),
  x = runif(7)
)

# All slice helpers operate per group, silently truncating to the group
# size, so the following code works without error
df \%>\% group_by(group) \%>\% slice_head(n = 2)

# When specifying the proportion of rows to include non-integer sizes
# are rounded down, so group a gets 0 rows
df \%>\% group_by(group) \%>\% slice_head(prop = 0.5)

# Filter equivalents --------------------------------------------
# slice() expressions can often be written to use `filter()` and
# `row_number()`, which can also be translated to SQL. For many databases,
# you'll need to supply an explicit variable to use to compute the row number.
filter(mtcars, row_number() == 1L)
filter(mtcars, row_number() == n())
filter(mtcars, between(row_number(), 5, n()))
}
\seealso{
Other single table verbs: 
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{mutate}()},
\code{\link{reframe}()},
\code{\link{rename}()},
\code{\link{select}()},
\code{\link{summarise}()}
}
\concept{single table verbs}