% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/boot.R
\name{bootstraps}
\alias{bootstraps}
\title{Bootstrap Sampling}
\usage{
bootstraps(
  data,
  times = 25,
  strata = NULL,
  breaks = 4,
  pool = 0.1,
  apparent = FALSE,
  ...
)
}
\arguments{
\item{data}{A data frame.}
\item{times}{The number of bootstrap samples.}
\item{strata}{A variable in \code{data} (single character or name) used to conduct
stratified sampling. When not \code{NULL}, each resample is created within the
stratification variable. Numeric \code{strata} are binned into quartiles.}
\item{breaks}{A single number giving the number of bins desired to stratify a
numeric stratification variable.}
\item{pool}{A proportion of data used to determine if a particular group is
too small and should be pooled into another group. We do not recommend
decreasing this argument below its default of 0.1 because of the dangers
of stratifying groups that are too small.}
\item{apparent}{A logical. Should an extra resample be added where the
analysis and holdout subsets are both the entire data set? This is required
for some estimators used by the \code{summary} function that rely on the
apparent error rate.}
\item{...}{These dots are for future extensions and must be empty.}
}
\value{
A tibble with classes \code{bootstraps}, \code{rset}, \code{tbl_df}, \code{tbl}, and
\code{data.frame}. The results include a column for the data split objects and a
column called \code{id} that has a character string with the resample identifier.
}
\description{
A bootstrap sample is a sample of the same size as the original data
set, drawn with replacement. This results in analysis samples that
have multiple replicates of some of the original rows of the data. The
assessment set is defined as the rows of the original data that were not
included in the bootstrap sample. This is often referred to as the
"out-of-bag" (OOB) sample.
}
\details{
The argument \code{apparent} enables the option of an additional
"resample" where the analysis and assessment data sets are the same as the
original data set. This can be required for some types of analysis of the
bootstrap results.

With a \code{strata} argument, the random sampling is conducted
\emph{within the stratification variable}. This can help ensure that the
resamples have proportions equivalent to those in the original data set. For
a categorical variable, sampling is conducted separately within each class.
For a numeric stratification variable, \code{strata} is binned into quartiles
by default (see \code{breaks}), which are then used to stratify. Strata below
10\% of the total (see \code{pool}) are
pooled together; see \code{\link[=make_strata]{make_strata()}} for more details.
}
\examples{
\dontshow{if (rlang::is_installed("modeldata")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
bootstraps(mtcars, times = 2)
bootstraps(mtcars, times = 2, apparent = TRUE)
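
# A minimal sketch, not part of the original examples: inspect the analysis
# set and the out-of-bag (assessment) set of a single bootstrap split.
# The names `boot_small` and `split1` are illustrative only; analysis() and
# assessment() are the rsample accessors for the two parts of a split.
set.seed(13)
boot_small <- bootstraps(mtcars, times = 1)
split1 <- boot_small$splits[[1]]
# The analysis set is drawn with replacement, so it has as many rows as mtcars
nrow(analysis(split1))
# The assessment set holds the rows that were never drawn into the sample
nrow(assessment(split1))
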
library(purrr)
library(modeldata)
data(wa_churn)
set.seed(13)
resample1 <- bootstraps(wa_churn, times = 3)
map_dbl(
  resample1$splits,
  function(x) {
    dat <- as.data.frame(x)$churn
    mean(dat == "Yes")
  }
)
set.seed(13)
resample2 <- bootstraps(wa_churn, strata = churn, times = 3)
map_dbl(
  resample2$splits,
  function(x) {
    dat <- as.data.frame(x)$churn
    mean(dat == "Yes")
  }
)
set.seed(13)
resample3 <- bootstraps(wa_churn, strata = tenure, breaks = 6, times = 3)
map_dbl(
  resample3$splits,
  function(x) {
    dat <- as.data.frame(x)$churn
    mean(dat == "Yes")
  }
)
\dontshow{\}) # examplesIf}
}