1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
|
#' Cache the value of an R expression to an RDS file
#'
#' Save the value of an expression to a cache file (of the RDS format). Next
#' time the value is loaded from the file if it exists.
#'
#' Note that the \code{file} argument does not provide the full cache filename.
#' The actual name of the cache file is of the form \file{BASENAME_HASH.rds},
#' where \file{BASENAME} is the base name provided via the \file{file} argument
#' (e.g., if \code{file = 'foo.rds'}, \code{BASENAME} would be \file{foo}), and
#' \file{HASH} is the MD5 hash (also called the \sQuote{checksum}) calculated
#' from the R code provided to the \code{expr} argument and the value of the
#' \code{hash} argument, which means when the code or the \code{hash} argument
#' changes, the \file{HASH} string may also change, and the old cache will be
#' invalidated (if it exists). If you want to find the cache file, look for
#' \file{.rds} files that contain 32 hexadecimal digits (consisting of 0-9 and
#' a-z) at the end of the filename.
#'
#' The possible ways to invalidate the cache are: 1) change the code in
#' \code{expr} argument; 2) delete the cache file manually or automatically
#' through the argument \code{rerun = TRUE}; and 3) change the value of the
#' \code{hash} argument. The first two ways should be obvious. For the third
#' way, it makes it possible to automatically invalidate the cache based on
#' changes in certain R objects. For example, when you run \code{cache_rds({ x +
#' y })}, you may want to invalidate the cache to rerun \code{{ x + y }} when
#' the value of \code{x} or \code{y} has been changed, and you can tell
#' \code{cache_rds()} to do so by \code{cache_rds({ x + y }, hash = list(x,
#' y))}. The value of the argument \code{hash} is expected to be a list, but it
#' can also take a special value, \code{"auto"}, which means
#' \code{cache_rds(expr)} will try to automatically figure out the global
#' variables in \code{expr}, return a list of their values, and use this list as
#' the actual value of \code{hash}. This behavior is most likely to be what you
#' really want: if the code in \code{expr} uses an external global variable, you
#' may want to invalidate the cache if the value of the global variable has
#' changed. Here a \dQuote{global variable} means a variable not created locally
#' in \code{expr}, e.g., for \code{cache_rds({ x <- 1; x + y })}, \code{x} is a
#' local variable, and \code{y} is (most likely to be) a global variable, so
#' changes in \code{y} should invalidate the cache. However, you know your own
#' code the best. If you want to be completely sure when to invalidate the
#' cache, you can always provide a list of objects explicitly rather than
#' relying on \code{hash = "auto"}.
#'
#' By default (the argument \code{clean = TRUE}), old cache files will be
#' automatically cleaned up. Sometimes you may want to use \code{clean = FALSE}
#' (set the R global option \code{options(xfun.cache_rds.clean = FALSE)} if you
#' want \code{FALSE} to be the default). For example, you may not have decided
#' which version of code to use, and you can keep the cache of both versions
#' with \code{clean = FALSE}, so when you switch between the two versions of
#' code, it will still be fast to run the code.
#' @param expr An R expression.
#' @param rerun Whether to delete the RDS file, rerun the expression, and save
#' the result again (i.e., invalidate the cache if it exists).
#' @param file The \emph{base} (see Details) cache filename under the directory
#' specified by the \code{dir} argument. If not specified and this function is
#' called inside a code chunk of a \pkg{knitr} document (e.g., an R Markdown
#' document), the default is the current chunk label plus the extension
#' \file{.rds}.
#' @param dir The path of the RDS file is partially determined by
#' \code{paste0(dir, file)}. If not specified and the \pkg{knitr} package is
#' available, the default value of \code{dir} is the \pkg{knitr} chunk option
#' \code{cache.path} (so if you are compiling a \pkg{knitr} document, you do
#' not need to provide this \code{dir} argument explicitly), otherwise the
#' default is \file{cache/}. If you do not want to provide a \code{dir} but
#' simply a valid path to the \code{file} argument, you may use \code{dir =
#' ""}.
#' @param hash A \code{list} object that contributes to the MD5 hash of the
#' cache filename (see Details). It can also take a special character value
#' \code{"auto"}. Other types of objects are ignored.
#' @param clean Whether to clean up the old cache files automatically when
#' \code{expr} has changed.
#' @param ... Other arguments to be passed to \code{\link{saveRDS}()}.
#' @note Changes in the code in the \code{expr} argument do not necessarily
#' always invalidate the cache, if the changed code is \code{\link{parse}d} to
#' the same expression as the previous version of the code. For example, if
#' you have run \code{cache_rds({Sys.sleep(5);1+1})} before, running
#' \code{cache_rds({ Sys.sleep( 5 ) ; 1 + 1 })} will use the cache, because
#' the two expressions are essentially the same (they only differ in white
#' spaces). Usually you can add/delete white spaces or comments to your code
#' in \code{expr} without invalidating the cache. See the package vignette
#' \code{vignette('xfun', package = 'xfun')} for more examples.
#'
#' When this function is called in a code chunk of a \pkg{knitr} document, you
#' may not want to provide the filename or directory of the cache file,
#' because they have reasonable defaults.
#'
#' Side-effects (such as plots or printed output) will not be cached. The
#' cache only stores the last value of the expression in \code{expr}.
#' @return If the cache file does not exist, run the expression and save the
#' result to the file, otherwise read the cache file and return the value.
#' @export
#' @examples
#' f = tempfile() # the cache file
#' compute = function(...) {
#' res = xfun::cache_rds({
#' Sys.sleep(1)
#' 1:10
#' }, file = f, dir = '', ...)
#' res
#' }
#' compute() # takes one second
#' compute() # returns 1:10 immediately
#' compute() # fast again
#' compute(rerun = TRUE) # one second to rerun
#' compute()
#' file.remove(f)
cache_rds = function(
expr = {}, rerun = FALSE, file = 'cache.rds', dir = 'cache/',
hash = NULL, clean = getOption('xfun.cache_rds.clean', TRUE), ...
) {
if (loadable('knitr')) {
if (missing(file) && !is.null(lab <- knitr::opts_current$get('label')))
file = paste0(lab, '.rds')
if (missing(dir) && !is.null(d <- knitr::opts_current$get('cache.path')))
dir = d
}
path = paste0(dir, file)
if (!grepl(r <- '([.]rds)$', path)) path = paste0(path, '.rds')
code = deparse(substitute(expr))
md5 = md5sum_obj(code)
if (identical(hash, 'auto')) hash = global_vars(code, parent.frame(2))
if (is.list(hash)) md5 = md5sum_obj(c(md5, md5sum_obj(hash)))
path = sub(r, paste0('_', md5, '\\1'), path)
if (rerun) unlink(path)
if (clean) clean_cache(path)
if (file_exists(path)) readRDS(path) else {
obj = expr # lazy evaluation
dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
saveRDS(obj, path, ...)
obj
}
}
# write an object to a file and return the md5 sum
md5sum_obj = function(x) {
f = tempfile(); on.exit(unlink(f), add = TRUE)
if (is.character(x)) writeLines(x, f) else saveRDS(x, f)
tools::md5sum(f)
}
# clean up old cache files (those with the same base names as the new cache
# file, e.g., if the new file is FOO_0123abc...z.rds, then FOO_9876def...x.rds
# should be deleted)
clean_cache = function(path) {
olds = list.files(dirname(path), '_[0-9a-f]{32}[.]rds$', full.names = TRUE)
olds = c(olds, path) # `path` may not exist; make sure it is in target paths
base = basename(olds)
keep = basename(path) == base # keep this file (will cache to this file)
base = substr(base, 1, nchar(base) - 37) # 37 = 1 (_) + 32 (md5 sum) + 4 (.rds)
unlink(olds[(base == base[keep][1]) & !keep])
}
# analyze code and find out global variables
find_globals = function(code) {
fun = eval(parse_only(c('function(){', code, '}')))
setdiff(codetools::findGlobals(fun), known_globals)
}
known_globals = c(
'{', '[', '(', ':', '<-', '=', '+', '-', '*', '/', '%%', '%/%', '%*%', '%o%', '%in%'
)
# return a list of values of global variables in code
global_vars = function(code, env) {
if (length(vars <- find_globals(code)) > 0) mget(vars, env)
}
|