
|
Talk: All Things Cached - SF Python 2017 Meetup
===============================================
* `Python All Things Cached Slides`_
* Can we have some fun together in this talk?
* Can I show you some code that I would not run in production?
* Great talk by David Beazley at PyCon Israel this year.
* Encourages us to scratch our itch under the code phrase: "It's just a
  prototype." Not a bad place to start. Often how it ends :)
Landscape
---------
* At face value, caches seem simple: get/set/delete.
* But zoom in a little and you just find more and more detail.
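As a rough illustration of where the detail creeps in, here is a minimal in-memory sketch. ``TinyCache``, its ``expire`` argument, and the ``size_limit`` and ``cull_limit`` knobs are hypothetical names made up for this example, not any library's API. Even this toy has to decide what an expired entry means and how much eviction work a single ``set()`` may do.

.. code-block:: python

    import time


    class TinyCache:
        """Toy in-memory cache; hypothetical, for illustration only."""

        def __init__(self, size_limit=3, cull_limit=2, clock=time.monotonic):
            self._data = {}  # key -> (value, expire_time or None)
            self._size_limit = size_limit
            self._cull_limit = cull_limit
            self._clock = clock

        def set(self, key, value, expire=None):
            expire_time = None if expire is None else self._clock() + expire
            self._data.pop(key, None)  # re-insert so the key becomes the newest
            self._data[key] = (value, expire_time)
            self._cull()

        def get(self, key, default=None):
            try:
                value, expire_time = self._data[key]
            except KeyError:
                return default
            if expire_time is not None and expire_time <= self._clock():
                del self._data[key]  # expired entries count as misses
                return default
            return value

        def delete(self, key):
            # True if the key was present, False otherwise.
            return self._data.pop(key, None) is not None

        def _cull(self):
            # Evict oldest-first, but never more than cull_limit items per
            # call, so a single set() has bounded extra work.
            for _ in range(self._cull_limit):
                if len(self._data) <= self._size_limit:
                    break
                del self._data[next(iter(self._data))]

Passing a fake ``clock`` makes expiry easy to exercise without sleeping.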
Backends
--------
* Backends have different designs and tradeoffs.
Frameworks
----------
* Caches have broad applications.
* Web and scientific communities reach for them first.
I can haz mor memory?
---------------------
* Redis is great technology: free, open source, fast.
* But another process to manage and more memory required.
::

    $ emacs talk/settings.py
    $ emacs talk/urls.py
    $ emacs talk/views.py

::

    $ gunicorn --reload talk.wsgi

::

    $ emacs benchmark.py

::

    $ python benchmark.py

* I dislike benchmarks in general so don't copy this code. I kind of stole it
  from Beazley in another great talk he did on concurrency in Python. He said
  not to copy it so I'm telling you not to copy it.

::

    $ python manage.py shell

.. code-block:: pycon

    >>> import time
    >>> from django.conf import settings
    >>> from django.core.cache import caches

.. code-block:: pycon

    >>> for key in settings.CACHES.keys():
    ...     caches[key].clear()

::

    >>> while True:
    ...     !ls /tmp/filebased | wc -l
    ...     time.sleep(1)
Fool me once, strike one. Fool me twice? Strike three.
------------------------------------------------------
* Filebased cache has two severe drawbacks.
1. Culling is random.
2. set() uses glob.glob1(), which slows down linearly with directory size.
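To make the two drawbacks concrete, here is a toy file-per-key cache loosely modeled on that design. It is a sketch, not Django's code: ``ToyFileCache``, ``max_entries``, and ``cull_frequency`` are names chosen for this example, and ``os.listdir()`` stands in for the directory scan.

.. code-block:: python

    import hashlib
    import os
    import pickle
    import random
    import tempfile


    class ToyFileCache:
        """One file per key; hypothetical sketch of the design's weak points."""

        def __init__(self, directory, max_entries=300, cull_frequency=3):
            self._dir = directory
            self._max_entries = max_entries
            self._cull_frequency = cull_frequency

        def _path(self, key):
            name = hashlib.sha256(key.encode('utf-8')).hexdigest() + '.cache'
            return os.path.join(self._dir, name)

        def _list_files(self):
            # Scans the whole directory: O(number of entries) on every set().
            return [n for n in os.listdir(self._dir) if n.endswith('.cache')]

        def _cull(self):
            names = self._list_files()
            if len(names) < self._max_entries:
                return
            # Victims are picked at random, so hot keys are as likely to be
            # evicted as cold ones.
            count = len(names) // self._cull_frequency
            for name in random.sample(names, count):
                os.remove(os.path.join(self._dir, name))

        def set(self, key, value):
            self._cull()
            with open(self._path(key), 'wb') as writer:
                pickle.dump(value, writer)

        def get(self, key, default=None):
            try:
                with open(self._path(key), 'rb') as reader:
                    return pickle.load(reader)
            except FileNotFoundError:
                return default


    directory = tempfile.mkdtemp()
    cache = ToyFileCache(directory, max_entries=9, cull_frequency=3)
    for number in range(9):
        cache.set('key-%d' % number, number)
    cache.set('extra', 'value')  # directory is full: a random third is culled

Which of the nine original keys survive the last ``set()`` differs from run to run, which is exactly the problem.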
DiskCache
---------
* Wanted to solve Django-filebased cache problems.
* Felt like something was missing in the landscape.
* Found an unlikely hero in SQLite.
I'd rather drive a slow car fast than a fast car slow
-----------------------------------------------------
* Story: driving down the Grapevine in SoCal in friend's 1960s VW Bug.
Features
--------
* Lots of features. Maybe a few too many. E.g., I've never used the tag
  metadata and eviction feature.
Use Case: Static file serving with read()
-----------------------------------------
* Some fun features. Data is stored in files and web servers are good at
  serving files.
Use Case: Analytics with incr()/pop()
-------------------------------------
* Tried to create really functional APIs.
* All write operations are atomic.
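One way to get an atomic read-modify-write is to wrap it in a single SQLite transaction. The sketch below uses only the standard ``sqlite3`` module. The ``counters`` table and the ``incr()`` and ``pop()`` helpers are hypothetical stand-ins written for this example, not the library's implementation.

.. code-block:: python

    import sqlite3

    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements below are the only transaction control.
    con = sqlite3.connect(':memory:', isolation_level=None)
    con.execute('CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)')


    def incr(key, delta=1, default=0):
        """Add delta to the stored count and return the new value."""
        con.execute('BEGIN IMMEDIATE')  # take the write lock before reading
        try:
            rows = con.execute(
                'SELECT value FROM counters WHERE key = ?', (key,)
            ).fetchall()
            value = (rows[0][0] if rows else default) + delta
            con.execute(
                'INSERT OR REPLACE INTO counters (key, value) VALUES (?, ?)',
                (key, value),
            )
        except BaseException:
            con.execute('ROLLBACK')
            raise
        con.execute('COMMIT')
        return value


    def pop(key, default=None):
        """Remove the key and return its last value in one transaction."""
        con.execute('BEGIN IMMEDIATE')
        try:
            rows = con.execute(
                'SELECT value FROM counters WHERE key = ?', (key,)
            ).fetchall()
            con.execute('DELETE FROM counters WHERE key = ?', (key,))
        except BaseException:
            con.execute('ROLLBACK')
            raise
        con.execute('COMMIT')
        return rows[0][0] if rows else default


    # Count page hits, then drain one counter for reporting.
    incr('/home')
    incr('/home')
    incr('/about')
    home_hits = pop('/home')

Because the read and the write share one transaction, two workers calling ``incr()`` on the same database file cannot both read the old value and lose an update.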
Case Study: Baby Web Crawler
----------------------------
* Convert from ephemeral, single-process to persistent, multi-process.
"get" Time vs Percentile
------------------------
* Trade off cache latency against miss rate using the timeout.
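A sketch of the idea, assuming the timeout in question is the SQLite busy timeout: a lookup that cannot get the lock in time is reported as a miss instead of blocking. The ``cache`` table and the ``get()`` helper are hypothetical, and the writer's ``BEGIN EXCLUSIVE`` only simulates contention in SQLite's default rollback-journal mode.

.. code-block:: python

    import os
    import sqlite3
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), 'cache.db')

    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute('CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)')
    writer.execute("INSERT INTO cache VALUES ('answer', '42')")

    # A short busy timeout bounds how long a lookup can wait on a lock.
    reader = sqlite3.connect(path, timeout=0.05)


    def get(key, default=None):
        """Return the cached value, or default on a miss or a lock timeout."""
        try:
            rows = reader.execute(
                'SELECT value FROM cache WHERE key = ?', (key,)
            ).fetchall()
        except sqlite3.OperationalError:
            # Gave up waiting for the lock: report a miss instead of blocking.
            # (A real implementation would check the error more narrowly.)
            return default
        return rows[0][0] if rows else default


    before = get('answer')             # no contention: a normal hit
    writer.execute('BEGIN EXCLUSIVE')  # simulate a long-running writer
    during = get('answer')             # lock not released in time: a miss
    writer.execute('COMMIT')
    after = get('answer')              # hit again once the writer is done

A smaller timeout caps the worst-case ``get`` latency at the cost of more spurious misses, and a larger one does the opposite.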
"set" Time vs Percentile
------------------------
* Django-filebased cache so slow, can't plot.
Design
------
* Cache is a single shard. FanoutCache uses multiple shards. Trick is
  cross-platform hash.
* Pickle can actually be fast if you use a higher protocol. Default is 0 on
  Python 2. Up to 4 now.

  * Don't choose higher than 2 if you want to be portable between Python 2
    and 3.

* Size limit really indicates when to start culling. Limit number of items
  deleted.
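The first two points can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: ``shard_index()``, ``dumps()``, ``SHARD_COUNT``, and the choice of ``zlib.adler32`` as the stable checksum are all picked for this example.

.. code-block:: python

    import pickle
    import zlib

    SHARD_COUNT = 8      # illustrative value
    PICKLE_PROTOCOL = 2  # highest protocol that Python 2 can also read


    def shard_index(key, shard_count=SHARD_COUNT):
        """Map a text key to a shard with a process-independent checksum."""
        # The built-in hash() is salted per process for str keys, so two
        # workers could disagree on the shard. A checksum of the encoded
        # bytes gives the same answer in every process and on every platform.
        checksum = zlib.adler32(key.encode('utf-8')) & 0xFFFFFFFF
        return checksum % shard_count


    def dumps(value):
        """Serialize with a binary protocol instead of text protocol 0."""
        return pickle.dumps(value, protocol=PICKLE_PROTOCOL)


    record = {'url': 'http://example.com/', 'sizes': list(range(1000))}
    text_size = len(pickle.dumps(record, protocol=0))
    binary_size = len(dumps(record))

For this record the protocol 2 payload is noticeably smaller than the protocol 0 one, and smaller payloads mean less to write and read back.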
SQLite
------
* Trade off cache latency against miss rate using the timeout.
* SQLite supports 64-bit integers and floats, UTF-8 text and binary blobs.
* Use a context manager for isolation level management.
* Pragmas tune the behavior and performance of SQLite.

  * Default is robust and slow.
  * Use write-ahead-log so writers don't block readers.
  * Memory-map pages for fast lookups.
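The points above might look like this with the standard ``sqlite3`` module. Treat it as a sketch: the ``connect()`` and ``transact()`` helpers, the ``cache`` table, and the specific pragma values are assumptions chosen for illustration, not recommended settings.

.. code-block:: python

    import contextlib
    import os
    import sqlite3
    import tempfile

    PRAGMAS = {
        'journal_mode': 'WAL',    # write-ahead log: writers don't block readers
        'synchronous': 'NORMAL',  # fewer syncs than the robust default
        'mmap_size': 2 ** 26,     # memory-map up to 64 MiB of pages
    }


    def connect(path, timeout=60.0):
        """Open a connection with explicit transactions and tuned pragmas."""
        # isolation_level=None: autocommit mode, we issue BEGIN ourselves.
        con = sqlite3.connect(path, timeout=timeout, isolation_level=None)
        for name, value in PRAGMAS.items():
            con.execute('PRAGMA %s = %s' % (name, value)).fetchall()
        return con


    @contextlib.contextmanager
    def transact(con):
        """Commit on success, roll back if the body raises."""
        con.execute('BEGIN IMMEDIATE')
        try:
            yield con
        except BaseException:
            con.execute('ROLLBACK')
            raise
        else:
            con.execute('COMMIT')


    path = os.path.join(tempfile.mkdtemp(), 'cache.db')
    con = connect(path)
    con.execute('CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)')

    with transact(con):
        con.execute('INSERT INTO cache VALUES (?, ?)', ('a', b'\x00\x01'))

    try:
        with transact(con):
            con.execute('INSERT INTO cache VALUES (?, ?)', ('b', b'\x02'))
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass  # the second insert was rolled back

    keys = [row[0] for row in con.execute('SELECT key FROM cache')]

Only ``'a'`` remains in ``keys`` because the failed block never committed.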
Best way to make money in photography? Sell all your gear.
----------------------------------------------------------
* Who saw the eclipse? Awesome, right?
* Hard to really photograph the experience.
* This is me, staring up at the sun, blinding myself as I hold my glasses and
  my phone to take a photo. Clearly lousy.
* Software talks are hard to get right and I can't cover everything related to
  caching in 20 minutes. I hope you've learned something tonight or at least
  seen something interesting.
Conclusion
----------
* Windows support mostly "just worked".

  * SQLite is truly cross-platform.
  * Filesystems are a little different.
  * AppVeyor was about half as fast as Travis.

* Use check() to fix inconsistencies.
* Caveats:

  * NFS and SQLite do not play nice.
  * Not well suited to queues (want read:write at 10:1 or higher).

* Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.
* Engage with me on GitHub, find bugs, complain about performance.
* If you like the project, star it on GitHub and share it with friends.
* Thanks for letting me share tonight. Questions?

.. _`Python All Things Cached Slides`: http://bit.ly/dc-2017-slides
|