% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/make_groups.R
\name{make_groups}
\alias{make_groups}
\title{Make groupings for grouped rsplits}
\usage{
make_groups(
  data,
  group,
  v,
  balance = c("groups", "observations", "prop"),
  strata = NULL,
  ...
)
}
\arguments{
\item{data}{A data frame.}

\item{group}{A variable in \code{data} (single character or name) used to
group observations: rows with the same value are kept together in either the
analysis or the assessment set within a fold.}

\item{v}{The number of partitions of the data set.}

\item{balance}{If \code{v} is less than the number of unique groups, how should
groups be combined into folds? Should be one of
\code{"groups"}, \code{"observations"}, \code{"prop"}.}

\item{strata}{A variable in \code{data} (single character or name) used to conduct
stratified sampling. When not \code{NULL}, each resample is created within the
stratification variable. Numeric \code{strata} are binned into quartiles.}

\item{...}{Arguments passed to balance functions.}
}
\description{
This function powers grouped resampling by splitting the data based upon
a grouping variable and returning the assessment set indices for each
split.
}
\details{
Not all \code{balance} options are accepted -- or make sense -- for all resampling
functions. For instance, \code{balance = "prop"} assigns groups to folds at
random, meaning that any given observation is not guaranteed to be in one
(and only one) assessment set. V-fold cross-validation requires exactly that
guarantee, so \code{balance = "prop"} can't be used with
\code{\link[=group_vfold_cv]{group_vfold_cv()}} and isn't offered as an option for that function.

Similarly, \code{\link[=group_mc_cv]{group_mc_cv()}} and its derivatives don't assign data to one (and
only one) assessment set, but rather allow each observation to be in an
assessment set zero or more times. As a result, those functions don't have
a \code{balance} argument, and under the hood always specify \code{balance = "prop"}
when they call \code{\link[=make_groups]{make_groups()}}.
}
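\examples{
# A minimal sketch of the contrast described in the details, using the
# exported wrappers rather than calling make_groups() directly. The toy
# data, the seed, and the values of v, prop and times are illustrative
# assumptions, not taken from the package sources.
library(rsample)

set.seed(123)
toy <- data.frame(
  id = rep(letters[1:10], each = 5),  # 10 groups of 5 rows each
  x = rnorm(50)
)

# Grouped V-fold CV: each group is held out in exactly one assessment set.
folds <- group_vfold_cv(toy, group = id, v = 5, balance = "groups")
held_out <- lapply(folds$splits, function(s) unique(assessment(s)$id))
table(unlist(held_out))

# Grouped Monte Carlo CV: there is no balance argument, and a group may be
# held out zero or more times across the resamples.
mc <- group_mc_cv(toy, group = id, prop = 0.7, times = 5)
mc_held_out <- lapply(mc$splits, function(s) unique(assessment(s)$id))
table(factor(unlist(mc_held_out), levels = unique(toy$id)))
}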
\keyword{internal}