% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vfold.R
\name{group_vfold_cv}
\alias{group_vfold_cv}
\title{Group V-Fold Cross-Validation}
\usage{
group_vfold_cv(
  data,
  group = NULL,
  v = NULL,
  repeats = 1,
  balance = c("groups", "observations"),
  ...,
  strata = NULL,
  pool = 0.1
)
}
\arguments{
\item{data}{A data frame.}

\item{group}{A variable in \code{data} (single character or name) used to
group observations. All rows that share a value of this variable are assigned
together to either the analysis set or the assessment set within a fold.}

\item{v}{The number of partitions of the data set. If left as \code{NULL} (the
default), \code{v} will be set to the number of unique values in the grouping
variable, creating "leave-one-group-out" splits.}

\item{repeats}{The number of times to repeat the V-fold partitioning.}

\item{balance}{If \code{v} is less than the number of unique groups, how should
groups be combined into folds? Should be one of
\code{"groups"}, which will assign roughly the same number of groups to each
fold, or \code{"observations"}, which will assign roughly the same number of
observations to each fold.}

\item{...}{These dots are for future extensions and must be empty.}

\item{strata}{A variable in \code{data} (single character or name) used to conduct
stratified sampling. When not \code{NULL}, each resample is created within the
stratification variable. Numeric \code{strata} are binned into quartiles.}

\item{pool}{A proportion of data used to determine if a particular group is
too small and should be pooled into another group. We do not recommend
decreasing this argument below its default of 0.1 because of the dangers
of stratifying groups that are too small.}
}
\value{
A tibble with classes \code{group_vfold_cv},
\code{rset}, \code{tbl_df}, \code{tbl}, and \code{data.frame}.
The results include a column for the data split objects and an
identification variable.
}
\description{
Group V-fold cross-validation creates splits of the data based
on a grouping variable, each value of which may be associated with
more than one row. The function can create as many splits as
there are unique values of the grouping variable, or it can
create a smaller set of splits in which more than one group is left
out at a time. A common use of this kind of resampling is data with
repeated measures of the same subject, where all rows for a subject
should fall on the same side of a split.
\examples{
\dontshow{if (rlang::is_installed("modeldata")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
data(ames, package = "modeldata")

set.seed(123)
group_vfold_cv(ames, group = Neighborhood, v = 5)
group_vfold_cv(
  ames,
  group = Neighborhood,
  v = 5,
  balance = "observations"
)
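
# A minimal sketch, not part of the upstream examples: compare the
# assessment-set sizes produced by the two `balance` options. The object
# names below are illustrative only.
set.seed(123)
by_groups <- group_vfold_cv(ames, group = Neighborhood, v = 5)
by_obs <- group_vfold_cv(
  ames,
  group = Neighborhood,
  v = 5,
  balance = "observations"
)
# Number of rows held out in each fold under each option
sizes_by_groups <- vapply(by_groups$splits, function(s) nrow(assessment(s)), integer(1))
sizes_by_obs <- vapply(by_obs$splits, function(s) nrow(assessment(s)), integer(1))
sizes_by_groups
sizes_by_obs
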
group_vfold_cv(ames, group = Neighborhood, v = 5, repeats = 2)

# Leave-one-group-out CV
group_vfold_cv(ames, group = Neighborhood)
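
# A minimal sketch, not part of the upstream examples: check that a
# leave-one-group-out split keeps the held-out neighborhood out of the
# analysis set. The object names below are illustrative only.
logo <- group_vfold_cv(ames, group = Neighborhood)
# One fold per unique neighborhood
nrow(logo) == length(unique(ames$Neighborhood))
first_split <- logo$splits[[1]]
held_out <- unique(as.character(assessment(first_split)$Neighborhood))
kept <- unique(as.character(analysis(first_split)$Neighborhood))
held_out
# Count of neighborhoods that appear on both sides of the split
length(intersect(held_out, kept))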

library(dplyr)
data(Sacramento, package = "modeldata")

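# Bin each city's mean price into quartiles to use as strata.
# mutate() keeps one row per city; a multi-row summarize() is deprecated
# in dplyr >= 1.1.0.
city_strata <- Sacramento \%>\%
  group_by(city) \%>\%
  summarize(strata = mean(price)) \%>\%
  mutate(strata = cut(strata, quantile(strata), include.lowest = TRUE))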

sacramento_data <- Sacramento \%>\%
  full_join(city_strata, by = "city")

group_vfold_cv(sacramento_data, city, strata = strata)
\dontshow{\}) # examplesIf}
}