DiskCache Cache Benchmarks
==========================

Accurately measuring performance is difficult. The benchmarks on this page are
synthetic in the sense that they were designed to stress getting, setting, and
deleting items repeatedly. Measurements in production systems are much harder
to reproduce reliably, so take the following data with a `grain of salt`_.
Performance is a stated feature of :doc:`DiskCache <index>`, so we would be
remiss not to produce this page with comparisons.

The source for all benchmarks can be found under the "tests" directory in the
source code repository. Measurements are reported by percentile: median, 90th
percentile, 99th percentile, and maximum, along with total time and miss
rate. The average is not reported as it is less useful in response-time
scenarios. Each process in the benchmark executes 100,000 operations with ten
times as many sets as deletes and ten times as many gets as sets.

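That operation mix can be sketched with the standard library alone. This is an
illustrative generator, not the benchmark's actual code; the key space size and
seed are assumptions:

```python
import random

def generate_ops(total=100_000, keys=1_000, seed=42):
    """Yield a synthetic op mix: roughly ten times as many sets as
    deletes and ten times as many gets as sets (~90/9/1 percent)."""
    rng = random.Random(seed)
    for _ in range(total):
        key = 'key-%d' % rng.randrange(keys)
        roll = rng.random()
        if roll < 0.01:                    # ~1% deletes
            yield ('delete', key)
        elif roll < 0.10:                  # ~9% sets
            yield ('set', key, b'value')
        else:                              # ~90% gets
            yield ('get', key)
```

Reusing a small key space is what produces cache misses: a get misses whenever
the most recent operation on its key was a delete.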
Each comparison includes `Memcached`_ and `Redis`_ with default client and
server settings. Note that these backends work differently from
:doc:`DiskCache <index>`: they communicate over the localhost network, and
they require a server process to be running and maintained. All keys and
values are short byte strings to reduce the network impact.

.. _`grain of salt`: https://en.wikipedia.org/wiki/Grain_of_salt
.. _`Memcached`: http://memcached.org/
.. _`Redis`: http://redis.io/

Single Access
-------------

The single access workload starts one worker process, which performs all
operations. No concurrent cache access occurs.

Get
...

.. image:: _static/core-p1-get.png

The plot above displays cache access latency at three percentiles. Notice that
:doc:`DiskCache <index>` is faster than the highly optimized, memory-backed
server solutions.

Set
...

.. image:: _static/core-p1-set.png

The plot above displays cache store latency at three percentiles. The cost of
writing to disk is higher but still sub-millisecond. All data in
:doc:`DiskCache <index>` is persistent.

Delete
......

.. image:: _static/core-p1-delete.png

The plot above displays cache delete latency at three percentiles. As above,
deletes require disk writes, but latency is still sub-millisecond.

Timing Data
...........

Not all data is easily displayed in the graphs above. Miss rate, maximum
latency, and total latency are recorded below.

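The per-row statistics in the tables can be reproduced from a list of
per-operation latencies along these lines. This sketch uses nearest-rank
percentiles; the benchmark harness may interpolate differently:

```python
def summarize(latencies):
    """Summarize response times the way the timing tables do:
    median, P90, P99, and maximum, plus the total time."""
    xs = sorted(latencies)
    # Nearest-rank percentile: index into the sorted samples.
    rank = lambda p: xs[min(len(xs) - 1, int(p * len(xs)))]
    return {
        'median': rank(0.50),
        'p90': rank(0.90),
        'p99': rank(0.99),
        'max': xs[-1],
        'total': sum(xs),
    }
```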
========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.Cache
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88966      9705  12.159us  17.166us  28.849us 174.999us    1.206s
      set      9021         0  68.903us  93.937us 188.112us  10.297ms 875.907ms
   delete      1012       104  47.207us  66.042us 128.031us   7.160ms  89.599ms
    Total     98999                                                      2.171s
========= ========= ========= ========= ========= ========= ========= =========

The generated workload includes a ~10% cache miss rate. All items were stored
with no expiry. The miss rate is due entirely to gets after deletes.

========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.FanoutCache(shards=4, timeout=1.0)
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88966      9705  15.020us  20.027us  33.855us 437.021us    1.425s
      set      9021         0  71.049us 100.136us 203.133us   9.186ms 892.262ms
   delete      1012       104  48.161us  69.141us 129.952us   5.216ms  87.294ms
    Total     98999                                                      2.405s
========= ========= ========= ========= ========= ========= ========= =========

The high maximum store latency is likely an artifact of disk/OS interactions.

========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.FanoutCache(shards=8, timeout=0.010)
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88966      9705  15.020us  20.027us  34.094us 627.995us    1.420s
      set      9021         0  72.956us 100.851us 203.133us   9.623ms 927.824ms
   delete      1012       104  50.783us  72.002us 132.084us   8.396ms  78.898ms
    Total     98999                                                      2.426s
========= ========= ========= ========= ========= ========= ========= =========

Notice the low overhead of the :class:`FanoutCache
<diskcache.FanoutCache>`. Increasing the number of shards from four to eight
has a negligible impact on performance.

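A :class:`FanoutCache <diskcache.FanoutCache>` spreads keys across independent
shards so that writers rarely contend. A minimal sketch of hash-based shard
selection (illustrative only; DiskCache's actual key hashing differs):

```python
import zlib

def shard_for(key, shards=8):
    """Map a key deterministically to one of `shards` buckets.
    Each bucket would be backed by its own cache database."""
    data = key if isinstance(key, bytes) else str(key).encode('utf-8')
    return zlib.crc32(data) % shards
```

Because the mapping is deterministic, a get always lands on the shard that
stored its key, and two writers only block each other when their keys hash to
the same shard.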
========= ========= ========= ========= ========= ========= ========= =========
Timings for pylibmc.Client
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88966      9705  25.988us  29.802us  41.008us 139.952us    2.388s
      set      9021         0  27.895us  30.994us  40.054us  97.990us 254.248ms
   delete      1012       104  25.988us  29.087us  38.147us  89.169us  27.159ms
    Total     98999                                                      2.669s
========= ========= ========= ========= ========= ========= ========= =========

Memcached performance is low latency and stable.

========= ========= ========= ========= ========= ========= ========= =========
Timings for redis.StrictRedis
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88966      9705  44.107us  54.121us  73.910us 204.086us    4.125s
      set      9021         0  45.061us  56.028us  75.102us 237.942us 427.197ms
   delete      1012       104  44.107us  54.836us  72.002us 126.839us  46.771ms
    Total     98999                                                      4.599s
========= ========= ========= ========= ========= ========= ========= =========

Redis performance is roughly half that of Memcached. :doc:`DiskCache <index>`
outperforms Redis for get operations at every reported percentile, up to and
including the maximum.


Concurrent Access
-----------------

The concurrent access workload starts eight worker processes, each performing
a different, interleaved sequence of operations. None of these benchmarks
saturated all the processors.

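Dealing the generated operation sequence out round-robin keeps each worker's
stream interleaved with the others on a shared key space. A sketch of that
partitioning (the benchmark's actual scheme may differ):

```python
def partition(ops, workers=8):
    """Deal operations out round-robin so each worker's stream is
    interleaved with the others and all contend on the same keys."""
    return [ops[i::workers] for i in range(workers)]
```

Each slice would then be handed to a separate process, for example via
``multiprocessing.Process``.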
Get
...

.. image:: _static/core-p8-get.png

Under heavy load, :doc:`DiskCache <index>` gets are low latency. At the 90th
percentile, they are less than half the latency of Memcached.

Set
...

.. image:: _static/core-p8-set.png

Stores are much slower under load and benefit greatly from sharding. Latencies
in excess of five milliseconds are not displayed. With one shard allocated per
worker, latency is within an order of magnitude of the memory-backed server
solutions.

Delete
......

.. image:: _static/core-p8-delete.png

Again, deletes require disk writes. Only the :class:`FanoutCache
<diskcache.FanoutCache>` with one shard allocated per worker performs well.

Timing Data
...........

Not all data is easily displayed in the graphs above. Miss rate, maximum
latency, and total latency are recorded below.

========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.Cache
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712546     71214  15.974us  23.127us  40.054us   4.953ms   12.349s
      set     71530         0  94.891us   1.328ms  21.307ms    1.846s  131.728s
   delete      7916       807  65.088us   1.278ms  19.610ms    1.244s   13.811s
    Total    791992                                                    157.888s
========= ========= ========= ========= ========= ========= ========= =========

Notice the unacceptably high maximum store and delete latency. Without
sharding, cache writers block one another. By default, :class:`Cache
<diskcache.Cache>` objects raise a timeout error after sixty seconds.

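:doc:`DiskCache <index>` stores data in an SQLite database, and writers
serialize on the database write lock; the timeout setting bounds how long a
blocked writer waits. That contention can be sketched with the standard
library's ``sqlite3`` module directly (an illustration of the underlying
behavior, not DiskCache's code):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'cache.db')

# First connection takes the write lock and holds it.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute('CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)')
writer.execute('BEGIN IMMEDIATE')

# Second connection gives up after a 100 ms timeout instead of
# blocking indefinitely, mirroring the cache timeout behavior.
blocked = sqlite3.connect(path, timeout=0.1, isolation_level=None)
try:
    blocked.execute('BEGIN IMMEDIATE')
    timed_out = False
except sqlite3.OperationalError:  # "database is locked"
    timed_out = True

writer.execute('ROLLBACK')
```

Sharding reduces this contention by giving each shard its own database, so
writers only collide when their keys land on the same shard.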
========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.FanoutCache(shards=4, timeout=1.0)
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712546     71623  19.073us  35.048us  59.843us  12.980ms   16.849s
      set     71530         0 108.004us   1.313ms   9.176ms 333.361ms   50.821s
   delete      7916       767  73.195us   1.264ms   9.033ms 108.232ms    4.964s
    Total    791992                                                     72.634s
========= ========= ========= ========= ========= ========= ========= =========

Here :class:`FanoutCache <diskcache.FanoutCache>` uses four shards to
distribute writes. That reduces the maximum latency by a factor of ten. Note
the miss rate is variable due to the interleaved operations of concurrent
workers.

========= ========= ========= ========= ========= ========= ========= =========
Timings for diskcache.FanoutCache(shards=8, timeout=0.010)
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712546     71106  25.034us  47.922us 101.089us   9.015ms   22.336s
      set     71530        39 134.945us   1.324ms   5.763ms  16.027ms   33.347s
   delete      7916       775  88.930us   1.267ms   5.017ms  13.732ms    3.308s
    Total    791992                                                     58.991s
========= ========= ========= ========= ========= ========= ========= =========

With one shard allocated per worker and a low timeout, the maximum latency is
more reasonable and corresponds to the specified 10 millisecond timeout. Some
set and delete operations were therefore canceled and recorded as cache
misses. The miss rate due to timeouts is about 0.01%, so the success rate is
four nines, or 99.99%.

========= ========= ========= ========= ========= ========= ========= =========
Timings for pylibmc.Client
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712546     72043  83.923us 107.050us 123.978us 617.027us   61.824s
      set     71530         0  84.877us 108.004us 124.931us 312.090us    6.283s
   delete      7916       796  82.970us 105.858us 123.024us 288.963us 680.970ms
    Total    791992                                                     68.788s
========= ========= ========= ========= ========= ========= ========= =========

Memcached performance is low latency and stable even under heavy load. Notice,
though, that its cache gets are roughly three times slower in total than those
of :class:`FanoutCache <diskcache.FanoutCache>`. The superior performance of
get operations puts the overall performance of :doc:`DiskCache <index>` ahead
of Memcached.

========= ========= ========= ========= ========= ========= ========= =========
Timings for redis.StrictRedis
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712546     72093 138.044us 169.039us 212.908us 151.121ms  101.197s
      set     71530         0 138.998us 169.992us 216.007us   1.200ms   10.173s
   delete      7916       752 136.137us 167.847us 211.954us   1.059ms    1.106s
    Total    791992                                                    112.476s
========= ========= ========= ========= ========= ========= ========= =========

Redis performance is roughly half that of Memcached. Beware the impact of
persistence settings on Redis performance. Depending on your use of logging
and snapshotting, maximum latency may increase significantly.

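For example, these ``redis.conf`` persistence directives affect write latency
(illustrative values; consult the Redis documentation for the trade-offs):

```text
save 900 1            # RDB snapshot after 900 s if at least 1 key changed
appendonly yes        # enable AOF write logging
appendfsync everysec  # fsync the AOF once per second (always | everysec | no)
```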