Talk: All Things Cached - SF Python 2017 Meetup
===============================================

* `Python All Things Cached Slides`_
* Can we have some fun together in this talk?
* Can I show you some code that I would not run in production?
* Great talk by David Beazley at PyCon Israel this year.

  * Encourages us to scratch our itch under the code phrase: "It's just a
    prototype." Not a bad place to start. Often how it ends :)


Landscape
---------

* At face value, caches seem simple: get/set/delete.
* But zoom in a little and you find more and more detail.


Backends
--------

* Backends have different designs and tradeoffs.


Frameworks
----------

* Caches have broad applications.
* Web and scientific communities reach for them first.


I can haz mor memory?
---------------------

* Redis is great technology: free, open source, fast.
* But it's another process to manage, and it needs more memory.

::

    $ emacs talk/settings.py
    $ emacs talk/urls.py
    $ emacs talk/views.py
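
* A hypothetical sketch of what the CACHES block in talk/settings.py might look
  like. The shell session later iterates over settings.CACHES and watches
  /tmp/filebased, so assume several backends are configured for comparison
  (aliases and paths here are illustrative):

.. code-block:: python

    # talk/settings.py (sketch only; not the talk's actual settings)
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        },
        'filebased': {
            'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
            'LOCATION': '/tmp/filebased',
        },
        'diskcache': {
            'BACKEND': 'diskcache.DjangoCache',
            'LOCATION': '/tmp/diskcache',
        },
    }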

::

    $ gunicorn --reload talk.wsgi

::

    $ emacs benchmark.py

::

    $ python benchmark.py

* I dislike benchmarks in general, so don't copy this code. I kind of stole it
  from another great talk Beazley did on concurrency in Python. He said not to
  copy it, so I'm telling you not to copy it.
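
* For flavor only, here's the shape such a timing loop might take. This is a
  hypothetical sketch, not the talk's benchmark.py, so don't copy this one
  either:

.. code-block:: python

    # Hypothetical benchmark sketch (not the talk's benchmark.py).
    import os
    import time

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'talk.settings')
    import django
    django.setup()

    from django.conf import settings
    from django.core.cache import caches

    def timed(cache, count=10000):
        # Seconds spent on `count` set/get pairs against one backend.
        start = time.perf_counter()
        for i in range(count):
            cache.set('key-%d' % i, i)
            cache.get('key-%d' % i)
        return time.perf_counter() - start

    for alias in settings.CACHES:
        print(alias, timed(caches[alias]))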

::

    $ python manage.py shell

.. code-block:: pycon

    >>> import time
    >>> from django.conf import settings
    >>> from django.core.cache import caches

.. code-block:: pycon

    >>> for key in settings.CACHES.keys():
    ...     caches[key].clear()

.. code-block:: pycon

    >>> import os
    >>> while True:
    ...     print(len(os.listdir('/tmp/filebased')))
    ...     time.sleep(1)


Fool me once, strike one. Fool me twice? Strike three.
------------------------------------------------------

* The filebased cache has two severe drawbacks:

  1. Culling is random.
  2. set() uses glob.glob1(), which slows down linearly with directory size
     (a rough demonstration follows).
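
* The demonstration below is hypothetical: it assumes the 'filebased' alias
  from the settings sketch above, with MAX_ENTRIES raised in its OPTIONS so
  culling doesn't cap the directory size.

.. code-block:: python

    # Hypothetical demonstration: filebased set() slows as the directory fills.
    import os
    import time

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'talk.settings')
    import django
    django.setup()

    from django.core.cache import caches

    filebased = caches['filebased']
    for i in range(20000):
        start = time.perf_counter()
        filebased.set('key-%d' % i, b'x' * 100, timeout=None)
        if i % 1000 == 0:
            print('%d entries; set() took %.6f s' % (i, time.perf_counter() - start))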


DiskCache
---------

* Wanted to solve the Django filebased cache's problems.
* Felt like something was missing in the landscape.
* Found an unlikely hero in SQLite (basic usage sketched below).
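
* A minimal taste of the core API; the directory is just an example:

.. code-block:: pycon

    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/talk-cache')   # a directory plus SQLite
    >>> cache.set('hello', 'world')
    True
    >>> cache.get('hello')
    'world'
    >>> cache.delete('hello')
    True
    >>> cache.close()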


I'd rather drive a slow car fast than a fast car slow
-----------------------------------------------------

* Story: driving down the Grapevine in SoCal in a friend's 1960s VW Bug.


Features
--------

* Lots of features. Maybe a few too many. Example: I've never used the tag
  metadata and eviction feature.


Use Case: Static file serving with read()
-----------------------------------------

* Some fun features here. Data is stored in files, and web servers are good at
  serving files (see the sketch below).
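
* For instance, store the value with read=True so it lands on disk as a file,
  then hand back an open file handle instead of the bytes. The key and content
  here are made up:

.. code-block:: pycon

    >>> import io
    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/static-cache')
    >>> content = io.BytesIO(b'... imagine a big PNG here ...')
    >>> cache.set('logo.png', content, read=True)   # stored as a file on disk
    True
    >>> reader = cache.get('logo.png', read=True)   # file handle, not bytes
    >>> reader.read(10)
    b'... imagin'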


Use Case: Analytics with incr()/pop()
-------------------------------------

* Tried to create really functional APIs.
* All write operations are atomic.
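
* For example, counting events atomically with incr() and reading them out
  with pop(). The key and directory are made up:

.. code-block:: pycon

    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/analytics')
    >>> cache.incr('page-hits')              # atomic; a missing key starts from 0
    1
    >>> cache.incr('page-hits', delta=5)
    6
    >>> cache.pop('page-hits')               # atomically read and remove
    6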


Case Study: Baby Web Crawler
----------------------------

* Convert the crawler from ephemeral, single-process to persistent,
  multi-process (sketched below).
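
* This isn't the talk's crawler, just a hypothetical sketch of the shape of
  that change, using the persistent Deque and Index containers; fetch and
  extract_links stand in for real functions.

.. code-block:: python

    # Hypothetical sketch: crawler state in persistent, shareable containers.
    import diskcache

    frontier = diskcache.Deque(directory='/tmp/crawler-frontier')  # persistent work queue
    seen = diskcache.Index('/tmp/crawler-seen')                    # persistent "visited" map

    def crawl(fetch, extract_links, start_url=None):
        # Several worker processes can run this loop against the same directories.
        if start_url is not None:
            frontier.append(start_url)
        while True:
            try:
                url = frontier.popleft()
            except IndexError:
                break                      # frontier drained
            if url in seen:                # check-then-set is not atomic; fine for a sketch
                continue
            seen[url] = True
            for link in extract_links(fetch(url)):
                frontier.append(link)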


"get" Time vs Percentile
------------------------

* Trade off cache latency against miss rate using a timeout (see below).
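
* Roughly what that means (the directory name is made up): FanoutCache gives
  each operation a timeout, here about 10 milliseconds, and a read that can't
  finish in time is reported as a miss instead of blocking.

.. code-block:: pycon

    >>> from diskcache import FanoutCache
    >>> cache = FanoutCache('/tmp/fanout', timeout=0.010)
    >>> cache.get('key')   # a lookup that times out comes back as a miss (None)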


"set" Time vs Percentile
------------------------

* The Django filebased cache is so slow it doesn't fit on the plot.


Design
------

* Cache is a single shard; FanoutCache spreads keys across multiple shards. The
  trick is a cross-platform hash.
* Pickle can actually be fast if you use a higher protocol. The default is 0;
  protocols go up to 4 now.

  * Don't choose higher than 2 if you want to be portable between Python 2
    and 3.

* The size limit really indicates when to start culling; the cull limit caps
  how many items are deleted at once (see the sketch after this list).
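
* Putting those knobs together; the values are illustrative, not
  recommendations:

.. code-block:: pycon

    >>> from diskcache import FanoutCache
    >>> cache = FanoutCache(
    ...     '/tmp/fanout-design',
    ...     shards=8,                  # keys hashed across 8 single-shard Caches
    ...     timeout=0.010,             # per-operation SQLite timeout
    ...     size_limit=2 ** 30,        # ~1 GB: the point at which culling starts
    ...     cull_limit=10,             # delete at most 10 items per cull pass
    ...     disk_pickle_protocol=2,    # highest protocol readable by Python 2 and 3
    ... )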


SQLite
------

* Trade off cache latency against miss rate using a timeout.
* SQLite supports 64-bit integers, floats, UTF-8 text, and binary blobs.
* Use a context manager for isolation level management.
* Pragmas tune the behavior and performance of SQLite (spelled out in the
  sketch after this list).

  * The defaults are robust but slow.
  * Use the write-ahead log so writers don't block readers.
  * Memory-map pages for fast lookups.
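
* The same ideas against a plain sqlite3 connection; the database path and
  table are made up:

.. code-block:: python

    import sqlite3

    con = sqlite3.connect('/tmp/pragma-demo.sqlite3')
    con.execute('PRAGMA journal_mode = WAL')        # writers no longer block readers
    con.execute('PRAGMA synchronous = NORMAL')      # fewer fsyncs; a common pairing with WAL
    con.execute('PRAGMA mmap_size = %d' % 2 ** 26)  # memory-map pages for fast lookups

    # The connection doubles as a context manager: the block runs in a
    # transaction that commits on success and rolls back on error.
    with con:
        con.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)')
        con.execute('INSERT OR REPLACE INTO kv VALUES (?, ?)', ('hello', b'world'))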


Best way to make money in photography? Sell all your gear.
----------------------------------------------------------

* Who saw the eclipse? Awesome, right?

  * Hard to really photograph the experience.
  * This is me, staring up at the sun, blinding myself as I hold my glasses and
    my phone to take a photo. Clearly lousy.

* Software talks are hard to get right and I can't cover everything related to
  caching in 20 minutes. I hope you've learned something tonight or at least
  seen something interesting.


Conclusion
----------

* Windows support mostly "just worked".

  * SQLite is truly cross-platform.
  * Filesystems are a little different.
  * AppVeyor was about half as fast as Travis.
  * Use check() to fix inconsistencies.

* Caveats:

  * NFS and SQLite do not play nice.
  * Not well suited to queues (you want a read:write ratio of 10:1 or higher).

* Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.
* Engage with me on GitHub: find bugs, complain about performance.
* If you like the project, star it on GitHub and share it with friends.
* Thanks for letting me share tonight. Questions?

.. _`Python All Things Cached Slides`: http://bit.ly/dc-2017-slides