File: rsample-dplyr.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/compat-dplyr.R
\name{rsample-dplyr}
\alias{rsample-dplyr}
\title{Compatibility with dplyr}
\description{
rsample should be fully compatible with dplyr 1.0.0.

With older versions of dplyr, there is partial support for the following
verbs: \code{mutate()}, \code{arrange()}, \code{filter()}, \code{rename()}, \code{select()}, and
\code{slice()}. We strongly recommend updating to dplyr 1.0.0 if possible to
get more complete integration with dplyr.
}
\section{Version Specific Behavior}{


rsample behaves somewhat differently depending on whether you have
dplyr >= 1.0.0 (new) or dplyr < 1.0.0 (old). Additionally, version
0.0.7 of rsample (new) introduced some changes to how rsample objects
work with dplyr, even on old dplyr. Most of these changes affect the
return value of a dplyr verb, determining whether it is a tibble
or an rsample rset subclass.

The tables below attempt to capture most of these changes. The examples
are not exhaustive and may miss some edge cases.
\subsection{Joins}{

The following applies to all of the dplyr joins, such as \code{left_join()},
\code{right_join()}, \code{full_join()}, and \code{inner_join()}.

Joins that alter the rows of the original rset object:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{join(rset, tbl)} \tab error \tab error \tab tibble \cr
}


The idea here is that, if there are fewer rows in the result, the result should
not be an rset object. For example, you can't have a 10-fold CV object
without 10 rows.

Joins that keep the rows of the original rset object:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{join(rset, tbl)} \tab error \tab error \tab rset \cr
}


Following the same logic, if the original rset object (defined by the
\code{splits} column and the \code{id} column(s)) is left intact, the result
should be an rset.
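
A minimal sketch of both join cases, assuming rsample >= 0.0.7 and
dplyr >= 1.0.0. The object names (\code{folds}, \code{all_ids}, \code{one_id})
and the \code{note} column are illustrative only and are not part of rsample.
Under the tables above, the first join is expected to keep the rset class and
the second is expected to return a bare tibble.

\preformatted{
library(rsample)
library(dplyr)

set.seed(1)
folds <- vfold_cv(mtcars, v = 10)

# Lookup table with one row per fold id: every row of `folds` is matched
# exactly once, so the rows of the rset are left intact.
all_ids <- tibble(id = folds$id, note = "kept")
kept <- left_join(folds, all_ids, by = "id")
inherits(kept, "rset")

# Lookup table matching a single fold: inner_join() drops the other rows,
# so the result can no longer represent a 10-fold CV object.
one_id <- tibble(id = folds$id[1], note = "only this fold")
dropped <- inner_join(folds, one_id, by = "id")
inherits(dropped, "rset")
}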
}

\subsection{Row Subsetting}{

As mentioned above, row subsetting should result in a tibble if any rows are
removed or added. Simply reordering rows still results in a valid rset with
new rsample.

Cases where rows are removed or added:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{rset[ind,]} \tab tibble \tab tibble \tab tibble \cr
   \code{slice(rset)} \tab rset \tab tibble \tab tibble \cr
   \code{filter(rset)} \tab rset \tab tibble \tab tibble \cr
}


Cases where all rows are kept, but are possibly reordered:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{rset[ind,]} \tab tibble \tab rset \tab rset \cr
   \code{slice(rset)} \tab rset \tab rset \tab rset \cr
   \code{filter(rset)} \tab rset \tab rset \tab rset \cr
   \code{arrange(rset)} \tab rset \tab rset \tab rset \cr
}
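
The two row-subsetting cases can be sketched as follows, again assuming
rsample >= 0.0.7 and dplyr >= 1.0.0. The object names are illustrative.
\code{class()} is called on each result so the returned type can be inspected
directly.

\preformatted{
library(rsample)
library(dplyr)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# All rows kept, only reordered: the rset structure is still valid.
reordered <- arrange(folds, desc(id))
reversed <- folds[rev(seq_len(nrow(folds))), ]
class(reordered)
class(reversed)

# Rows removed: the result is no longer a complete 5-fold CV object.
fewer <- filter(folds, id != folds$id[1])
first_two <- folds[1:2, ]
class(fewer)
class(first_two)
}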

}

\subsection{Column Subsetting}{

When the \code{splits} column or any \code{id} columns are dropped or renamed,
the result should no longer be considered a valid rset.

Cases when the required columns are removed or renamed:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{rset[,ind]} \tab tibble \tab tibble \tab tibble \cr
   \code{select(rset)} \tab rset \tab tibble \tab tibble \cr
   \code{rename(rset)} \tab tibble \tab tibble \tab tibble \cr
}


Cases when no required columns are affected:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{rset[,ind]} \tab tibble \tab rset \tab rset \cr
   \code{select(rset)} \tab rset \tab rset \tab rset \cr
   \code{rename(rset)} \tab rset \tab rset \tab rset \cr
}
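
A short sketch of column subsetting, assuming rsample >= 0.0.7 and
dplyr >= 1.0.0. The extra column \code{n} and the object names are
hypothetical and exist only to give \code{select()} and \code{rename()} a
non-required column to act on.

\preformatted{
library(rsample)
library(dplyr)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# Add a non-required column so there is something safe to drop or rename.
folds_n <- mutate(folds, n = row_number())

# Only the extra column is touched: `splits` and `id` stay intact.
dropped_extra <- select(folds_n, -n)
renamed_extra <- rename(folds_n, index = n)
class(dropped_extra)
class(renamed_extra)

# A required column is dropped or renamed: the rset structure is lost.
no_id <- select(folds, splits)
renamed_id <- rename(folds, fold = id)
class(no_id)
class(renamed_id)
}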

}

\subsection{Other Column Operations}{

Cases when the required columns are altered:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{mutate(rset)} \tab rset \tab tibble \tab tibble \cr
}


Cases when no required columns are affected:\tabular{lccc}{
   operation \tab old rsample + old dplyr \tab new rsample + old dplyr \tab new rsample + new dplyr \cr
   \code{mutate(rset)} \tab rset \tab rset \tab rset \cr
}
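
As an illustration of the two \code{mutate()} cases, the sketch below assumes
rsample >= 0.0.7 and dplyr >= 1.0.0. The column name \code{fold_number} and
the object names are illustrative.

\preformatted{
library(rsample)
library(dplyr)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# A new column is added and no required column is altered.
extra <- mutate(folds, fold_number = row_number())
class(extra)

# The required `id` column is overwritten, so the original rset
# structure is no longer intact.
altered <- mutate(folds, id = toupper(id))
class(altered)
}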

}
}