% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rolling_origin.R
\name{rolling_origin}
\alias{rolling_origin}
\title{Rolling Origin Forecast Resampling}
\usage{
rolling_origin(
data,
initial = 5,
assess = 1,
cumulative = TRUE,
skip = 0,
lag = 0,
...
)
}
\arguments{
\item{data}{A data frame.}
\item{initial}{The number of samples used for analysis/modeling in the
initial resample.}
\item{assess}{The number of samples used for each assessment resample.}
\item{cumulative}{A logical. Should the analysis resample grow beyond the
size specified by \code{initial} at each resample?}
\item{skip}{An integer indicating how many (if any) \emph{additional} resamples
to skip to thin the total number of data points in the analysis resample.
See the example below.}
\item{lag}{A value to include a lag between the assessment
and analysis set. This is useful if lagged predictors will be used
during training and testing.}
\item{...}{These dots are for future extensions and must be empty.}
}
\value{
A tibble with classes \code{rolling_origin}, \code{rset}, \code{tbl_df}, \code{tbl},
and \code{data.frame}. The results include a column for the data split objects
and a column called \code{id} that has a character string with the resample
identifier.
}
\description{
This resampling method is useful when the data set has a strong time
component. The resamples are not random and contain data points that are
consecutive values. The function assumes that the original data set is
sorted in time order.
}
\details{
The main options, \code{initial} and \code{assess}, control the number of
data points from the original data that are in the analysis and assessment
set, respectively. When \code{cumulative = TRUE}, the analysis set will grow as
resampling continues while the assessment set size will always remain
static.

\code{skip} enables the function to not use every data point in the resamples.
When \code{skip = 0}, the resampling data sets will increment by one position.
Suppose that the rows of a data set are consecutive days. Using \code{skip = 6}
will make the analysis data set operate on \emph{weeks} instead of days. The
assessment set size is not affected by this option.
}
\examples{
\dontshow{if (rlang::is_installed("modeldata")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
set.seed(1131)
ex_data <- data.frame(row = 1:20, some_var = rnorm(20))
dim(rolling_origin(ex_data))
dim(rolling_origin(ex_data, skip = 2))
dim(rolling_origin(ex_data, skip = 2, cumulative = FALSE))
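# A minimal sketch (not part of the original examples) of how `skip`,
# `cumulative`, and `lag` change which rows land in each split. It reuses
# `ex_data` from above; `roll_skip`, `roll_fixed`, and `roll_lag` are
# illustrative names only.
roll_skip <- rolling_origin(ex_data, initial = 5, assess = 1, skip = 2)
# With cumulative = TRUE the analysis set grows from one split to the next,
# while the assessment set keeps the size given by `assess`.
analysis(roll_skip$splits[[1]])$row
analysis(roll_skip$splits[[2]])$row
assessment(roll_skip$splits[[2]])$row
# With cumulative = FALSE the analysis window keeps the size given by
# `initial` and slides forward instead of growing.
roll_fixed <- rolling_origin(
  ex_data, initial = 5, assess = 1, skip = 2, cumulative = FALSE
)
analysis(roll_fixed$splits[[2]])$row
# With lag = 2 the assessment set should start two rows earlier, so that it
# overlaps the end of the analysis set and lagged predictors can be computed.
roll_lag <- rolling_origin(ex_data, initial = 5, assess = 1, lag = 2)
assessment(roll_lag$splits[[1]])$row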
# You can also roll over calendar periods by first nesting by that period,
# which is especially useful for irregular series where a fixed window
# is not useful. This example slides over 5 years at a time.
library(dplyr)
library(tidyr)
data(drinks, package = "modeldata")
drinks_annual <- drinks \%>\%
mutate(year = as.POSIXlt(date)$year + 1900) \%>\%
nest(data = c(-year))
multi_year_roll <- rolling_origin(drinks_annual, cumulative = FALSE)
analysis(multi_year_roll$splits[[1]])
assessment(multi_year_roll$splits[[1]])
\dontshow{\}) # examplesIf}
}
\seealso{
\code{\link[=sliding_window]{sliding_window()}}, \code{\link[=sliding_index]{sliding_index()}}, and \code{\link[=sliding_period]{sliding_period()}} for additional
time-based resampling functions.
}