1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207
|
Talk: All Things Cached - SF Python 2017 Meetup
===============================================
* `Python All Things Cached Slides`_
* Can we have some fun together in this talk?
* Can I show you some code that I would not run in production?
* Great talk by David Beazley at PyCon Israel this year.
* Encourages us to scratch our itch under the code phrase: "It's just a
prototype." Not a bad place to start. Often how it ends :)
Landscape
---------
* At face value, caches seem simple: get/set/delete.
* But zoom in a little and you find just more and more detail.
Backends
--------
* Backends have different designs and tradeoffs.
Frameworks
----------
* Caches have broad applications.
* Web and scientific communities reach for them first.
I can haz mor memory?
---------------------
* Redis is great technology: free, open source, fast.
* But another process to manage and more memory required.
::
$ emacs talk/settings.py
$ emacs talk/urls.py
$ emacs talk/views.py
::
$ gunicorn --reload talk.wsgi
::
$ emacs benchmark.py
::
$ python benchmark.py
* I dislike benchmarks in general so don't copy this code. I kind of stole it
from Beazley in another great talk he did on concurrency in Python. He said
not to copy it so I'm telling you not to copy it.
::
$ python manage.py shell
.. code-block:: pycon
>>> import time
>>> from django.conf import settings
>>> from django.core.cache import caches
.. code-block:: pycon
>>> for key in settings.CACHES.keys():
... caches[key].clear()
::
>>> while True:
... !ls /tmp/filebased | wc -l
... time.sleep(1)
Fool me once, strike one. Feel me twice? Strike three.
------------------------------------------------------
* Filebased cache has two severe drawbacks.
1. Culling is random.
2. set() uses glob.glob1() which slows linearly with directory size.
DiskCache
---------
* Wanted to solve Django-filebased cache problems.
* Felt like something was missing in the landscape.
* Found an unlikely hero in SQLite.
I'd rather drive a slow car fast than a fast car slow
-----------------------------------------------------
* Story: driving down the Grapevine in SoCal in friend's 1960s VW Bug.
Features
--------
* Lot's of features. Maybe a few too many. Ex: never used the tag metadata and
eviction feature.
Use Case: Static file serving with read()
-----------------------------------------
* Some fun features. Data is stored in files and web servers are good at
serving files.
Use Case: Analytics with incr()/pop()
-------------------------------------
* Tried to create really functional APIs.
* All write operations are atomic.
Case Study: Baby Web Crawler
----------------------------
* Convert from ephemeral, single-process to persistent, multi-process.
"get" Time vs Percentile
------------------------
* Tradeoff cache latency and miss-rate using timeout.
"set" Time vs Percentile
------------------------
* Django-filebased cache so slow, can't plot.
Design
------
* Cache is a single shard. FanoutCache uses multiple shards. Trick is
cross-platform hash.
* Pickle can actually be fast if you use a higher protocol. Default 0. Up to 4
now.
* Don't choose higher than 2 if you want to be portable between Python 2
and 3.
* Size limit really indicates when to start culling. Limit number of items
deleted.
SQLite
------
* Tradeoff cache latency and miss-rate using timeout.
* SQLite supports 64-bit integers and floats, UTF-8 text and binary blobs.
* Use a context manager for isolation level management.
* Pragmas tune the behavior and performance of SQLite.
* Default is robust and slow.
* Use write-ahead-log so writers don't block readers.
* Memory-map pages for fast lookups.
Best way to make money in photography? Sell all your gear.
----------------------------------------------------------
* Who saw eclipse? Awesome, right?
* Hard to really photograph the experience.
* This is me, staring up at the sun, blinding myself as I hold my glasses and
my phone to take a photo. Clearly lousy.
* Software talks are hard to get right and I can't cover everything related to
caching in 20 minutes. I hope you've learned something tonight or at least
seen something interesting.
Conclusion
----------
* Windows support mostly "just worked".
* SQLite is truly cross-platform.
* Filesystems are a little different.
* AppVeyor was about half as fast as Travis.
* check() to fix inconsistencies.
* Caveats:
* NFS and SQLite do not play nice.
* Not well suited to queues (want read:write at 10:1 or higher).
* Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.
* Engage with me on Github, find bugs, complain about performance.
* If you like the project, star-it on Github and share it with friends.
* Thanks for letting me share tonight. Questions?
.. _`Python All Things Cached Slides`: http://bit.ly/dc-2017-slides
|