File: froll.Rd

\name{roll}
\alias{roll}
\alias{froll}
\alias{rolling}
\alias{sliding}
\alias{moving}
\alias{rollmean}
\alias{frollmean}
\alias{rollsum}
\alias{frollsum}
\alias{rollapply}
\alias{frollapply}
\title{Rolling functions}
\description{
  Fast rolling functions to calculate aggregates on a sliding window. Function names and arguments are experimental.
}
\usage{
frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right",
  "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left",
  "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, \dots, fill=NA, align=c("right", "left", "center"))
}
\arguments{
  \item{x}{ vector, list, data.frame or data.table of numeric or logical columns. }
  \item{n}{ integer vector giving the rolling window size; for adaptive
    rolling functions, also a list of integer vectors. }
  \item{fill}{ numeric, value to pad by. Defaults to \code{NA}. }
  \item{algo}{ character, default \code{"fast"}. When set to \code{"exact"},
    a slower algorithm is used. It suffers less from floating point
    rounding error, performs an extra pass to apply rounding error
    correction, and carefully handles all non-finite values. It uses
    multiple cores when available. See details for more information. }
  \item{align}{ character, define if rolling window covers preceding rows
    (\code{"right"}), following rows (\code{"left"}) or centered
    (\code{"center"}). Defaults to \code{"right"}. }
  \item{na.rm}{ logical. Should missing values be removed when
    calculating window? Defaults to \code{FALSE}. For details on handling
    other non-finite values, see details below. }
  \item{hasNA}{ logical. If it is known that \code{x} contains \code{NA}
    then setting to \code{TRUE} will speed up computation. Defaults to \code{NA}. }
  \item{adaptive}{ logical, should adaptive rolling function be
    calculated, default \code{FALSE}. See details below. }
  \item{FUN}{ the function to be applied in rolling fashion; see Details for restrictions. }
  \item{\dots}{ extra arguments passed to \code{FUN} in \code{frollapply}. }
}
\details{
  \code{froll*} functions accept vectors, lists, data.frames or
  data.tables. They always return a list, except when the input is a
  \code{vector} and \code{length(n)==1}, in which case a \code{vector}
  is returned for convenience. This lets rolling functions be used
  conveniently within data.table syntax.

  Argument \code{n} accepts multiple values, so that rolling functions can
  be applied over multiple window sizes. If \code{adaptive=TRUE}, it
  expects a list. Each list element must be an integer vector of window
  sizes, one for every observation in each column.

  When \code{algo="fast"}, an \emph{on-line} algorithm is used, and
  any \code{NaN, +Inf, -Inf} is treated as \code{NA}.
  Setting \code{algo="exact"} makes rolling functions use a more
  compute-intensive algorithm that suffers less from floating point
  rounding error. It also handles \code{NaN, +Inf, -Inf} consistently with
  base R. For some functions (like \emph{mean}), it additionally
  makes an extra pass to perform floating point error correction. Error
  corrections might not be truly exact on some platforms (like Windows)
  when using multiple threads.

  Adaptive rolling functions are special cases where each single
  observation has its own corresponding rolling window width. Due to the
  logic of adaptive rolling functions, the following restrictions apply:
  \itemize{
    \item{ \code{align} must be \code{"right"}. }
    \item{ if list of vectors is passed to \code{x}, then all
      list vectors must have equal length. }
  }

  When multiple columns or multiple window widths are provided, they
  are run in parallel, except when \code{algo="exact"}, which already
  runs in parallel.

  \code{frollapply} computes rolling aggregates using arbitrary R functions.
  The input \code{x} (first argument) to the function \code{FUN}
  is coerced to \emph{numeric} beforehand and \code{FUN}
  has to return a scalar \emph{numeric} value. Checks for that are made only
  during the first iteration when \code{FUN} is evaluated. Edge cases can be
  found in examples below. Any R function is supported, but it is not optimized
  using our own C implementation -- hence, for example, using \code{frollapply}
  to compute a rolling average is inefficient. It is also always single-threaded
  because there is no thread-safe API to R's C \code{eval}. Nevertheless we've
  seen the computation speed up vis-a-vis versions implemented in base R.
}
\value{
  A list except when the input is a \code{vector} and
  \code{length(n)==1} in which case a \code{vector} is returned.
}
\note{
  Users coming from \code{zoo}, the most popular package for rolling
  functions, should expect the following differences in the
  \code{data.table} implementation.
  \itemize{
    \item{ rolling function will always return result of the same length
      as input. }
    \item{ \code{fill} defaults to \code{NA}. }
    \item{ \code{fill} accepts only constant values. It does not support
      \emph{na.locf} or other functions. }
    \item{ \code{align} defaults to \code{"right"}. }
    \item{ \code{na.rm} is respected, and other functions are not needed
      when input contains \code{NA}. }
    \item{ integer and logical input is always coerced to double. }
    \item{ when \code{adaptive=FALSE} (default), then \code{n} must be a
      numeric vector. List is not accepted. }
    \item{ when \code{adaptive=TRUE}, then \code{n} must be a vector of
      length equal to \code{nrow(x)}, or a list of such vectors. }
    \item{ \code{partial} window feature is not supported, although it can
      be accomplished by using \code{adaptive=TRUE}, see examples. }
  }

  Be aware that rolling functions operate on the physical order of input.
  If the intent is to roll values in a vector by a logical window, for
  example an hour, or a day, one has to ensure that there are no gaps in
  input. For details see \href{https://github.com/Rdatatable/data.table/issues/3241}{issue #3241}.
}
\examples{
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available
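
# align and fill: a minimal sketch on a made-up vector (v is illustrative only)
v = c(1, 2, 3, 4, 5)
frollmean(v, 3)                  # window covers preceding rows: NA NA 2 3 4
frollmean(v, 3, align="left")    # window covers following rows: 2 3 4 NA NA
frollmean(v, 3, align="center")  # window is centered: NA 2 3 4 NA
frollmean(v, 3, fill=0)          # incomplete windows padded with 0: 0 0 2 3 4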

# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)

# frollsum
frollsum(d, 3:4)
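
# na.rm: a small sketch on a made-up vector (xna is illustrative only)
xna = c(1, NA, 3, 4, 5)
frollsum(xna, 2)               # any window containing NA gives NA: NA NA NA 7 9
frollsum(xna, 2, na.rm=TRUE)   # NA is dropped within each window: NA 1 3 7 9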

# frollapply
frollapply(d, 3:4, sum)
f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean))
anserr = list(
  fastma = ans2-ans1,
  froll_fast = ans3-ans1,
  froll_exact = ans4-ans1,
  frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff

# frollapply corner cases
f = function(x) head(x, 2)     ## FUN returns non length 1
try(frollapply(1:5, 3, f))
f = function(x) {              ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN)     ## FUN returns integer
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA)       ## FUN returns logical
as.logical(frollapply(x, 3, anyNA))
options(datatable.verbose=FALSE)
f = function(x) {             ## FUN returns character
  if (sum(x)>5) "big" else "small"
}
try(frollapply(1:5, 3, f))
f = function(x) {             ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))
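
# rolling by a logical window (for example a day) on input with gaps:
# a sketch assuming made-up daily data with one missing day; the names
# g, days and full are illustrative only, and this is one possible approach
g = data.table(day=as.Date("2020-01-01") + c(0, 1, 3, 4), v=c(1, 2, 4, 5))
frollmean(g$v, 2)   # rolls over physical rows, so 2020-01-02 is paired with 2020-01-04
# fill the gap with an NA row first, so that row order matches the calendar
days = data.table(day=seq(min(g$day), max(g$day), by="day"))
full = g[days, on="day"]
full[, roll := frollmean(v, 2, na.rm=TRUE)]
full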
}
\seealso{
  \code{\link{shift}}, \code{\link{data.table}}
}
\references{
  \href{https://en.wikipedia.org/wiki/Round-off_error}{Round-off error}
}
\keyword{ data }