1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
|
(filtering)=
# {fas}`filter` Cache Filtering
In many cases you will want to choose what you want to cache instead of just caching everything. By
default, all **read-only** (`GET` and `HEAD`) **requests with a 200 response code** are cached. A
few options are available to modify this behavior.
```{note}
When using {py:class}`.CachedSession`, any requests that you don't want to cache can also be made
with a regular {py:class}`requests.Session` object, or wrapper functions like
{py:func}`requests.get`, etc.
```
(http-method-filtering)=
## Filter by HTTP Methods
To cache additional HTTP methods, specify them with `allowable_methods`:
```python
>>> session = CachedSession(allowable_methods=('GET', 'POST'))
>>> session.post('https://httpbin.org/post', json={'param': 'value'})
```
For example, some APIs use the `POST` method to request data via a JSON-formatted request body, for
requests that may exceed the max size of a `GET` request. You may also want to cache `POST` requests
if you have a case where you could potentially send the exact same data multiple times.
Method override headers will also be respected. For example, if a server supports the `X-HTTP-Method-Override` header, you may want to cache POST requests that only fetch data
(overridden as GET), but not other POST requests that may create/modify data:
```python
>>> session = CachedSession(allowable_methods=('GET'))
>>> session.post('https://httpbin.org/post', headers={'X-HTTP-Method-Override': 'GET'})
```
:::
## Filter by Status Codes
To cache additional status codes, specify them with `allowable_codes`
```python
>>> session = CachedSession(allowable_codes=(200, 418))
>>> session.get('https://httpbin.org/teapot')
```
(url-filtering)=
## Filter by URLs
You can use {ref}`URL patterns <url-patterns>` to define an allowlist for selective caching, by
using a expiration value of `requests_cache.DO_NOT_CACHE` for non-matching request URLs:
```python
>>> from requests_cache import DO_NOT_CACHE, NEVER_EXPIRE, CachedSession
>>> urls_expire_after = {
... '*.site_1.com': 30,
... 'site_2.com/static': NEVER_EXPIRE,
... '*': DO_NOT_CACHE,
... }
>>> session = CachedSession(urls_expire_after=urls_expire_after)
```
Note that the catch-all rule above (`'*'`) will behave the same as setting the session-level
expiration to `0`:
```python
>>> urls_expire_after = {'*.site_1.com': 30, 'site_2.com/static': -1}
>>> session = CachedSession(urls_expire_after=urls_expire_after, expire_after=0)
```
(custom-filtering)=
## Custom Cache Filtering
If you need more advanced behavior for choosing what to cache, you can provide a custom filtering
function via the `filter_fn` param. This can by any function that takes a
{py:class}`requests.Response` object and returns a boolean indicating whether or not that response
should be cached. It will be applied to both new responses (on write) and previously cached
responses (on read):
```python
>>> from sys import getsizeof
>>> from requests_cache import CachedSession
>>> def filter_by_size(response: Response) -> bool:
>>> """Don't cache responses with a body over 1 MB"""
>>> return getsizeof(response.content) <= 1024 * 1024
>>> session = CachedSession(filter_fn=filter_by_size)
```
```{note}
`filter_fn()` will be used **in addition to** other filtering options.
```
|