Talk: All Things Cached - SF Python 2017 Meetup
===============================================

* `Python All Things Cached Slides`_
* Can we have some fun together in this talk?
* Can I show you some code that I would not run in production?
* Great talk by David Beazley at PyCon Israel this year.

  * Encourages us to scratch our itch under the code phrase: "It's just a
    prototype." Not a bad place to start. Often how it ends :)


Landscape
---------

* At face value, caches seem simple: get/set/delete.
* But zoom in a little and you find more and more detail.


Backends
--------

* Backends have different designs and tradeoffs.


Frameworks
----------

* Caches have broad applications.
* Web and scientific communities reach for them first.


I can haz mor memory?
---------------------

* Redis is great technology: free, open source, fast.
* But it's another process to manage, and it needs more memory.

::

    $ emacs talk/settings.py
    $ emacs talk/urls.py
    $ emacs talk/views.py
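
* A hypothetical sketch of what the CACHES block in talk/settings.py might look
  like. The shell session later iterates over settings.CACHES and watches
  /tmp/filebased, so assume several backends are configured for comparison
  (aliases and paths here are illustrative):

.. code-block:: python

    # talk/settings.py (sketch only; not the talk's actual settings)
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        },
        'filebased': {
            'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
            'LOCATION': '/tmp/filebased',
        },
        'diskcache': {
            'BACKEND': 'diskcache.DjangoCache',
            'LOCATION': '/tmp/diskcache',
        },
    }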

::

    $ gunicorn --reload talk.wsgi

::

    $ emacs benchmark.py

::

    $ python benchmark.py

* I dislike benchmarks in general, so don't copy this code. I kind of stole it
  from another great talk Beazley did on concurrency in Python. He said not to
  copy it, so I'm telling you not to copy it.
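
* For flavor only, here's the shape such a timing loop might take. This is a
  hypothetical sketch, not the talk's benchmark.py, so don't copy this one
  either:

.. code-block:: python

    # Hypothetical benchmark sketch (not the talk's benchmark.py).
    import os
    import time

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'talk.settings')
    import django
    django.setup()

    from django.conf import settings
    from django.core.cache import caches

    def timed(cache, count=10000):
        # Seconds spent on `count` set/get pairs against one backend.
        start = time.perf_counter()
        for i in range(count):
            cache.set('key-%d' % i, i)
            cache.get('key-%d' % i)
        return time.perf_counter() - start

    for alias in settings.CACHES:
        print(alias, timed(caches[alias]))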

::

    $ python manage.py shell

.. code-block:: pycon

    >>> import time
    >>> from django.conf import settings
    >>> from django.core.cache import caches

.. code-block:: pycon

    >>> for key in settings.CACHES.keys():
    ...     caches[key].clear()

.. code-block:: pycon

    >>> import os
    >>> while True:
    ...     print(len(os.listdir('/tmp/filebased')))
    ...     time.sleep(1)


Fool me once, strike one. Fool me twice? Strike three.
------------------------------------------------------

* The filebased cache has two severe drawbacks:

  1. Culling is random.
  2. set() uses glob.glob1(), which slows down linearly with directory size
     (a rough demonstration follows).
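
* The demonstration below is hypothetical: it assumes the 'filebased' alias
  from the settings sketch above, with MAX_ENTRIES raised in its OPTIONS so
  culling doesn't cap the directory size.

.. code-block:: python

    # Hypothetical demonstration: filebased set() slows as the directory fills.
    import os
    import time

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'talk.settings')
    import django
    django.setup()

    from django.core.cache import caches

    filebased = caches['filebased']
    for i in range(20000):
        start = time.perf_counter()
        filebased.set('key-%d' % i, b'x' * 100, timeout=None)
        if i % 1000 == 0:
            print('%d entries; set() took %.6f s' % (i, time.perf_counter() - start))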


DiskCache
---------

* Wanted to solve the Django filebased cache's problems.
* Felt like something was missing in the landscape.
* Found an unlikely hero in SQLite (basic usage sketched below).
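
* A minimal taste of the core API; the directory is just an example:

.. code-block:: pycon

    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/talk-cache')   # a directory plus SQLite
    >>> cache.set('hello', 'world')
    True
    >>> cache.get('hello')
    'world'
    >>> cache.delete('hello')
    True
    >>> cache.close()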


I'd rather drive a slow car fast than a fast car slow
-----------------------------------------------------

* Story: driving down the Grapevine in SoCal in a friend's 1960s VW Bug.


Features
--------

* Lots of features. Maybe a few too many. Example: I've never used the tag
  metadata and eviction feature.


Use Case: Static file serving with read()
-----------------------------------------

* Some fun features here. Data is stored in files, and web servers are good at
  serving files (see the sketch below).
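
* For instance, store the value with read=True so it lands on disk as a file,
  then hand back an open file handle instead of the bytes. The key and content
  here are made up:

.. code-block:: pycon

    >>> import io
    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/static-cache')
    >>> content = io.BytesIO(b'... imagine a big PNG here ...')
    >>> cache.set('logo.png', content, read=True)   # stored as a file on disk
    True
    >>> reader = cache.get('logo.png', read=True)   # file handle, not bytes
    >>> reader.read(10)
    b'... imagin'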


Use Case: Analytics with incr()/pop()
-------------------------------------

* Tried to create really functional APIs.
* All write operations are atomic.
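
* For example, counting events atomically with incr() and reading them out
  with pop(). The key and directory are made up:

.. code-block:: pycon

    >>> import diskcache
    >>> cache = diskcache.Cache('/tmp/analytics')
    >>> cache.incr('page-hits')              # atomic; a missing key starts from 0
    1
    >>> cache.incr('page-hits', delta=5)
    6
    >>> cache.pop('page-hits')               # atomically read and remove
    6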


Case Study: Baby Web Crawler
----------------------------

* Convert the crawler from ephemeral, single-process to persistent,
  multi-process (sketched below).
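
* This isn't the talk's crawler, just a hypothetical sketch of the shape of
  that change, using the persistent Deque and Index containers; fetch and
  extract_links stand in for real functions.

.. code-block:: python

    # Hypothetical sketch: crawler state in persistent, shareable containers.
    import diskcache

    frontier = diskcache.Deque(directory='/tmp/crawler-frontier')  # persistent work queue
    seen = diskcache.Index('/tmp/crawler-seen')                    # persistent "visited" map

    def crawl(fetch, extract_links, start_url=None):
        # Several worker processes can run this loop against the same directories.
        if start_url is not None:
            frontier.append(start_url)
        while True:
            try:
                url = frontier.popleft()
            except IndexError:
                break                      # frontier drained
            if url in seen:                # check-then-set is not atomic; fine for a sketch
                continue
            seen[url] = True
            for link in extract_links(fetch(url)):
                frontier.append(link)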


"get" Time vs Percentile
------------------------

* Trade off cache latency against miss rate using a timeout (see below).
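
* Roughly what that means (the directory name is made up): FanoutCache gives
  each operation a timeout, here about 10 milliseconds, and a read that can't
  finish in time is reported as a miss instead of blocking.

.. code-block:: pycon

    >>> from diskcache import FanoutCache
    >>> cache = FanoutCache('/tmp/fanout', timeout=0.010)
    >>> cache.get('key')   # a lookup that times out comes back as a miss (None)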


"set" Time vs Percentile
------------------------

* The Django filebased cache is so slow it doesn't fit on the plot.


Design
------

* Cache is a single shard; FanoutCache spreads keys across multiple shards. The
  trick is a cross-platform hash.
* Pickle can actually be fast if you use a higher protocol. The default is 0;
  protocols go up to 4 now.

  * Don't choose higher than 2 if you want to be portable between Python 2
    and 3.

* The size limit really indicates when to start culling; the cull limit caps
  how many items are deleted at once (see the sketch after this list).
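
* Putting those knobs together; the values are illustrative, not
  recommendations:

.. code-block:: pycon

    >>> from diskcache import FanoutCache
    >>> cache = FanoutCache(
    ...     '/tmp/fanout-design',
    ...     shards=8,                  # keys hashed across 8 single-shard Caches
    ...     timeout=0.010,             # per-operation SQLite timeout
    ...     size_limit=2 ** 30,        # ~1 GB: the point at which culling starts
    ...     cull_limit=10,             # delete at most 10 items per cull pass
    ...     disk_pickle_protocol=2,    # highest protocol readable by Python 2 and 3
    ... )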


SQLite
------

* Trade off cache latency against miss rate using a timeout.
* SQLite supports 64-bit integers, floats, UTF-8 text, and binary blobs.
* Use a context manager for isolation level management.
* Pragmas tune the behavior and performance of SQLite (spelled out in the
  sketch after this list).

  * The defaults are robust but slow.
  * Use the write-ahead log so writers don't block readers.
  * Memory-map pages for fast lookups.
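
* The same ideas against a plain sqlite3 connection; the database path and
  table are made up:

.. code-block:: python

    import sqlite3

    con = sqlite3.connect('/tmp/pragma-demo.sqlite3')
    con.execute('PRAGMA journal_mode = WAL')        # writers no longer block readers
    con.execute('PRAGMA synchronous = NORMAL')      # fewer fsyncs; a common pairing with WAL
    con.execute('PRAGMA mmap_size = %d' % 2 ** 26)  # memory-map pages for fast lookups

    # The connection doubles as a context manager: the block runs in a
    # transaction that commits on success and rolls back on error.
    with con:
        con.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)')
        con.execute('INSERT OR REPLACE INTO kv VALUES (?, ?)', ('hello', b'world'))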


Best way to make money in photography? Sell all your gear.
----------------------------------------------------------

* Who saw the eclipse? Awesome, right?

  * Hard to really photograph the experience.
  * This is me, staring up at the sun, blinding myself as I hold my glasses and
    my phone to take a photo. Clearly lousy.

* Software talks are hard to get right and I can't cover everything related to
  caching in 20 minutes. I hope you've learned something tonight or at least
  seen something interesting.


Conclusion
----------

* Windows support mostly "just worked".

  * SQLite is truly cross-platform.
  * Filesystems are a little different.
  * AppVeyor was about half as fast as Travis.
  * Use check() to fix inconsistencies.

* Caveats:

  * NFS and SQLite do not play nice.
  * Not well suited to queues (you want a read:write ratio of 10:1 or higher).

* Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.
* Engage with me on GitHub: find bugs, complain about performance.
* If you like the project, star it on GitHub and share it with friends.
* Thanks for letting me share tonight. Questions?

.. _`Python All Things Cached Slides`: http://bit.ly/dc-2017-slides