File: nested_cv.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nest.R
\name{nested_cv}
\alias{nested_cv}
\title{Nested or Double Resampling}
\usage{
nested_cv(data, outside, inside)
}
\arguments{
\item{data}{A data frame.}

\item{outside}{The initial resampling specification. This can be an already
created object or an expression that creates a new object (see the examples
below). If an already created object is used, the \code{data} argument does
not need to be specified and, if it is given, will be ignored.}

\item{inside}{An expression for the type of resampling to be conducted
within the initial procedure.}
}
\value{
A tibble with \code{nested_cv} class and any other classes that the
outer resampling process normally contains. The results include a
column for the outer data split objects, one or more \code{id} columns,
and a column of nested tibbles called \code{inner_resamples} with the
additional resamples.
}
\description{
\code{nested_cv} can be used to take the results of one resampling procedure
and conduct further resamples within each split. Any type of resampling
used in \code{rsample} can be used.
}
\details{
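It is a bad idea to use bootstrapping as the outer resampling procedure. A
bootstrap analysis set contains replicates of the same rows, so the inner
resampling can place copies of one data point in both its analysis and
assessment sets (see the example below).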
}
\examples{
## Using expressions for the resampling procedures:
nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))

## Using an existing object:
folds <- vfold_cv(mtcars)
nested_cv(mtcars, folds, inside = bootstraps(times = 5))
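
## A sketch of how the result might be inspected (the name `nested` is
## introduced here for illustration only). Each row holds one outer split,
## and `inner_resamples` holds a tibble of resamples for that split.
nested <- nested_cv(mtcars, folds, inside = bootstraps(times = 5))
class(nested)
nrow(nested)
nested$inner_resamples[[1]]

## The inner resamples are built from the outer analysis set, so a bootstrap
## analysis set has as many rows as the outer analysis set it came from:
nrow(analysis(nested$splits[[1]]))
nrow(analysis(nested$inner_resamples[[1]]$splits[[1]]))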

## The dangers of outer bootstraps:
set.seed(2222)
bad_idea <- nested_cv(mtcars,
  outside = bootstraps(times = 5),
  inside = vfold_cv(v = 3)
)

first_outer_split <- bad_idea$splits[[1]]
outer_analysis <- as.data.frame(first_outer_split)
sum(grepl("Volvo 142E", rownames(outer_analysis)))

## For the 3-fold CV used inside of each bootstrap, how are the replicated
## `Volvo 142E` data partitioned?
first_inner_split <- bad_idea$inner_resamples[[1]]$splits[[1]]
inner_analysis <- as.data.frame(first_inner_split)
inner_assess <- as.data.frame(first_inner_split, data = "assessment")

sum(grepl("Volvo 142E", rownames(inner_analysis)))
sum(grepl("Volvo 142E", rownames(inner_assess)))
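
## A possible follow-up check, sketched under the assumption that replicated
## rows differ only by the numeric suffix that R appends to duplicated row
## names (the helper `car_names` is hypothetical, not part of rsample):
car_names <- function(x) unique(sub("[.][0-9]+$", "", rownames(x)))

## Cars that occur in both the inner analysis and inner assessment sets:
intersect(car_names(inner_analysis), car_names(inner_assess))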
}