% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cache-disk.R
\name{cache_disk}
\alias{cache_disk}
\title{Create a disk cache object}
\usage{
cache_disk(
dir = NULL,
max_size = 1024 * 1024^2,
max_age = Inf,
max_n = Inf,
evict = c("lru", "fifo"),
destroy_on_finalize = FALSE,
read_fn = NULL,
write_fn = NULL,
extension = ".rds",
missing = key_missing(),
prune_rate = 20,
warn_ref_objects = FALSE,
logfile = NULL
)
}
\arguments{
\item{dir}{Directory to store files for the cache. If \code{NULL} (the default) it
will create and use a temporary directory.}
\item{max_size}{Maximum size of the cache, in bytes. If the cache exceeds
this size, cached objects will be removed according to the value of
\code{evict}. Use \code{Inf} for no size limit. The default is 1 gigabyte.}
\item{max_age}{Maximum age of files in cache before they are evicted, in
seconds. Use \code{Inf} for no age limit.}
\item{max_n}{Maximum number of objects in the cache. If the number of objects
exceeds this value, then cached objects will be removed according to the
value of \code{evict}. Use \code{Inf} for no limit on the number of items.}
\item{evict}{The eviction policy to use to decide which objects are removed
when a cache pruning occurs. Currently, \code{"lru"} and \code{"fifo"} are supported.}
\item{destroy_on_finalize}{If \code{TRUE}, then when the cache_disk object is
garbage collected, the cache directory and all objects inside of it will be
deleted from disk. If \code{FALSE} (the default), it will do nothing when
finalized.}
\item{read_fn}{The function used to read the values from disk. If \code{NULL}
(the default) it will use \code{readRDS}.}
\item{write_fn}{The function used to write the values to disk. If \code{NULL}
(the default) it will use \code{saveRDS}.}
\item{extension}{The file extension to use for files on disk.}
\item{missing}{A value to return when \code{get(key)} is called but the key is not
present in the cache. The default is a \code{\link[=key_missing]{key_missing()}} object. It is
actually an expression that is evaluated each time there is a cache miss.
See section Missing keys for more information.}
\item{prune_rate}{How often to prune the cache, expressed as a number of
calls to \code{set()}. See section Cache pruning for more information.}
\item{warn_ref_objects}{Should a warning be emitted when a reference object
is stored in the cache? This can be useful because serializing and
deserializing a reference object (such as an environment or an external
pointer) can lead to unexpected behavior.}
\item{logfile}{An optional filename or connection object to which logging
information will be written. To log to the console, use \code{stderr()} or
\code{stdout()}.}
}
\value{
A disk caching object, with class \code{cache_disk}.
}
\description{
A disk cache object is a key-value store that saves the values as files in a
directory on disk. Objects can be stored and retrieved using the \code{get()} and
\code{set()} methods. Objects are automatically pruned from the cache according to
the parameters \code{max_size}, \code{max_age}, \code{max_n}, and \code{evict}.
}
\section{Missing keys}{
The \code{missing} parameter controls what happens when \code{get()} is called with a
key that is not in the cache (a cache miss). The default behavior is to
return a \code{\link[=key_missing]{key_missing()}} object. This is a \emph{sentinel value} that indicates
that the key was not present in the cache. You can test if the returned
value represents a missing key by using the \code{\link[=is.key_missing]{is.key_missing()}} function.
You can also have \code{get()} return a different sentinel value, like \code{NULL}.
If you want to throw an error on a cache miss, you can do so by providing
an expression for \code{missing}, as in \code{missing = stop("Missing key")}.
When the cache is created, you can supply a value for \code{missing}, which sets
the default value to be returned for missing values. It can also be
overridden when \code{get()} is called, by supplying a \code{missing} argument. For
example, if you use \code{cache$get("mykey", missing = NULL)}, it will return
\code{NULL} if the key is not in the cache.
The \code{missing} parameter is actually an expression which is evaluated each
time there is a cache miss. A quosure (from the rlang package) can be used.
If the expression throws an error, the code that calls \code{get()} should be
wrapped with \code{\link[=tryCatch]{tryCatch()}} to handle missing keys gracefully.
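The behaviors described above can be sketched as follows. This is a minimal illustration, assuming the cachem package (which provides \code{cache_disk()}) is installed. The key name \code{"absent"} and the \code{"fallback"} value are hypothetical.

```r
library(cachem)

# A disk cache in a temporary directory (the default when dir = NULL).
cache <- cache_disk()

# Default behavior: a cache miss returns a key_missing() sentinel.
value <- cache$get("absent")
is_miss <- is.key_missing(value)

# Override the sentinel for a single call.
null_on_miss <- cache$get("absent", missing = NULL)

# Throw an error on a miss and handle it at the call site.
result <- tryCatch(
  cache$get("absent", missing = stop("Missing key")),
  error = function(e) "fallback"
)
```

Passing \code{missing} to \code{get()} affects only that call. The default set when the cache was created is unchanged.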
}
\section{Cache pruning}{
Cache pruning occurs when \code{set()} is called, or it can be invoked manually
by calling \code{prune()}.
The disk cache will throttle the pruning so that it does not happen on
every call to \code{set()}, because the filesystem operations for checking the
status of files can be slow. Instead, it will prune once in every
\code{prune_rate} calls to \code{set()}, or if at least 5 seconds have elapsed since
the last prune occurred, whichever is first.
When a pruning occurs, if there are any objects that are older than
\code{max_age}, they will be removed.
The \code{max_size} and \code{max_n} parameters are applied to the cache as a whole,
in contrast to \code{max_age}, which is applied to each object individually.
If the number of objects in the cache exceeds \code{max_n}, then objects will be
removed from the cache according to the eviction policy, which is set with
the \code{evict} parameter. Objects will be removed so that the number of items
is \code{max_n}.
If the size of the objects in the cache exceeds \code{max_size}, then objects
will be removed from the cache according to the eviction policy, so that the
total size remains under \code{max_size}. Note that the size is calculated
using the size of the files, not the amount of disk space used by the files.
These two values can differ because files are stored in blocks on disk. For
example, if the block size is 4096 bytes, then a file that is one byte in
size will take 4096 bytes on disk.
Another time that objects can be removed from the cache is when \code{get()} is
called. If the target object is older than \code{max_age}, it will be removed
and the cache will report it as a missing value.
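As a rough sketch of the \code{max_n} rule, the following assumes the cachem package is installed and uses hypothetical keys \code{"a"}, \code{"b"}, and \code{"c"}. The pauses are there only so that file mtimes differ on filesystems with coarse time resolution. \code{prune()} is called explicitly because pruning inside \code{set()} is throttled.

```r
library(cachem)

# Keep at most two objects; evict in first-in-first-out order.
cache <- cache_disk(max_n = 2, evict = "fifo")

cache$set("a", 1)
Sys.sleep(1.1)  # space out writes so the files get distinct mtimes
cache$set("b", 2)
Sys.sleep(1.1)
cache$set("c", 3)

# Pruning from set() is throttled, so trigger it manually.
cache$prune()

n_left <- cache$size()
keys_left <- sort(cache$keys())
```

After the prune, the oldest object (\code{"a"}) should be gone and the two newest should remain.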
}
\section{Eviction policies}{
If \code{max_n} or \code{max_size} are used, then objects will be removed from the
cache according to an eviction policy. The available eviction policies are:
\describe{
\item{\code{"lru"}}{
Least Recently Used. The least recently used objects will be removed.
This uses the filesystem's mtime property. When "lru" is used, each time
\code{get()} is called, it will update the file's mtime using
\code{\link[=Sys.setFileTime]{Sys.setFileTime()}}. Note that on some platforms, the resolution of
\code{\link[=Sys.setFileTime]{Sys.setFileTime()}} may be low (one or two seconds).
}
\item{\code{"fifo"}}{
First-in-first-out. The oldest objects will be removed.
}
}
Both of these policies use files' mtime. Note that some filesystems (notably
FAT) have poor mtime resolution. (atime is not used because support for atime
is worse than mtime.)
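To illustrate how \code{"lru"} differs from \code{"fifo"}, here is a small sketch, assuming the cachem package is installed and using hypothetical keys. Reading \code{"a"} refreshes its mtime, so the untouched \code{"b"} becomes the eviction candidate. The pauses allow for filesystems with low mtime resolution.

```r
library(cachem)

cache <- cache_disk(max_n = 2, evict = "lru")

cache$set("a", 1)
Sys.sleep(1.1)
cache$set("b", 2)
Sys.sleep(1.1)
cache$get("a")   # under "lru", this updates the mtime of the file for "a"
Sys.sleep(1.1)
cache$set("c", 3)

cache$prune()    # explicit prune; pruning from set() is throttled
keys_left <- sort(cache$keys())
```

With \code{evict = "fifo"} the same sequence would instead drop \code{"a"}, because \code{get()} would not refresh its mtime.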
}
\section{Sharing among multiple processes}{
The directory for a cache_disk can be shared among multiple R processes. To
do this, each R process should have a cache_disk object that uses the same
directory. Each cache_disk will do pruning independently of the others, so
if they have different pruning parameters, then one cache_disk may remove
cached objects before another cache_disk would do so.
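A minimal sketch of this setup, assuming the cachem package is installed. Two cache objects in a single session stand in for two separate R processes, and the directory name \code{shared-cache} is hypothetical.

```r
library(cachem)

# Hypothetical shared location; in practice each process would use this path.
shared_dir <- file.path(tempdir(), "shared-cache")

# Each "process" creates its own cache_disk object on the same directory.
# The pruning parameters differ, so each object prunes by its own rules.
writer <- cache_disk(dir = shared_dir)
reader <- cache_disk(dir = shared_dir, max_age = 60)

writer$set("result", 3.14)
shared_value <- reader$get("result")
```

Here \code{reader} treats objects older than 60 seconds as expired, while \code{writer} keeps them indefinitely.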
Even though it is possible for multiple processes to share a cache_disk
directory, this should not be done on networked file systems, because the
slow performance of networked file systems can cause problems. If you need
a high-performance shared cache, you can use one built on a database like
Redis, SQLite, MySQL, or similar.
When multiple processes share a cache directory, there are some potential
race conditions. For example, if your code calls \code{exists(key)} to check if
an object is in the cache, and then calls \code{get(key)}, the object may be
removed from the cache in between those two calls, and \code{get(key)} will
throw an error. Instead of calling the two functions, it is better to
simply call \code{get(key)}, and check that the returned object is not a
\code{key_missing()} object, using \code{is.key_missing()}. This effectively tests
for existence and gets the object in one operation.
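The recommended pattern might look like the following sketch, which assumes the cachem package is installed. The key \code{"mykey"} and the recomputed value are placeholders.

```r
library(cachem)

cache <- cache_disk()
cache$set("mykey", 42)

# Racy when the directory is shared: the object can be pruned by another
# process between the two calls.
# if (cache$exists("mykey")) value <- cache$get("mykey")

# Preferred: a single get(), then test the result for the sentinel.
value <- cache$get("mykey")
if (is.key_missing(value)) {
  value <- 42                 # placeholder for recomputing the value
  cache$set("mykey", value)
}
```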
It is also possible for one process to prune objects at the same time
that another process is trying to prune objects. If this happens, you may
see a warning from \code{file.remove()} failing to remove a file that has
already been deleted.
}
\section{Methods}{
A disk cache object has the following methods:
\describe{
\item{\code{get(key, missing)}}{
Returns the value associated with \code{key}. If the key is not in the
cache, then it evaluates the expression specified by \code{missing} and
returns the value. If \code{missing} is specified here, then it will
override the default that was set when the \code{cache_disk} object was
created. See section Missing keys for more information.
}
\item{\code{set(key, value)}}{
Stores the \code{key}-\code{value} pair in the cache.
}
\item{\code{exists(key)}}{
Returns \code{TRUE} if the cache contains the key, otherwise
\code{FALSE}.
}
\item{\code{remove(key)}}{
Removes \code{key} from the cache, if it exists in the cache. If the key is
not in the cache, this does nothing.
}
\item{\code{size()}}{
Returns the number of items currently in the cache.
}
\item{\code{keys()}}{
Returns a character vector of all keys currently in the cache.
}
\item{\code{reset()}}{
Clears all objects from the cache.
}
\item{\code{destroy()}}{
Clears all objects in the cache, and removes the cache directory from
disk.
}
\item{\code{prune()}}{
Prunes the cache, using the parameters specified by \code{max_size},
\code{max_age}, \code{max_n}, and \code{evict}.
}
}
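A short walk through the methods listed above, assuming the cachem package is installed. The keys and values are hypothetical.

```r
library(cachem)

cache <- cache_disk()

# Store two objects and read one back.
cache$set("x", 1:3)
cache$set("y", "hello")
y_value <- cache$get("y")

# Inspect the cache.
has_x <- cache$exists("x")
n_items <- cache$size()
all_keys <- sort(cache$keys())

# Remove a single key, then clear everything.
cache$remove("x")
has_x_after <- cache$exists("x")
cache$reset()
n_after_reset <- cache$size()

# Delete the cache directory from disk when the cache is no longer needed.
cache$destroy()
```

\code{destroy()} is called last because it removes the cache directory from disk.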
}