# Optimizations and benchmark
*apischema* is (a lot) [faster](#benchmark) than its known alternatives, thanks to advanced optimizations.


!!! note
The chart is truncated at a relative performance of 20x slower. Benchmark results are detailed in the [results table](#relative-execution-time-lower-is-better).
## Precomputed (de)serialization methods
*apischema* precomputes (de)serialization methods depending on the (de)serialized type (and other parameters); the processing of type annotations is done during this precomputation. Methods are then cached using `functools.lru_cache`, so `deserialize` and `serialize` don't recompute them every time.
!!! note
The cache is automatically reset when global settings are modified, because it impacts the generated methods.
However, even if `lru_cache` is fast, using the methods directly is faster still, so *apischema* provides `apischema.deserialization_method` and `apischema.serialization_method`. These functions take the same parameters as `deserialize`/`serialize`, except for the data/object parameter to (de)serialize. Using the computed methods directly can improve performance by about 10%.
```python
{!de_serialization_methods.py!}
```
!!! warning
Methods computed before a settings modification will not be updated and will keep using the old settings. Be careful to configure your settings first.
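For illustration, a minimal sketch of the recommended ordering, reusing the `no_copy` setting described in the next section (the `Point` class is made up):
```python
from dataclasses import dataclass

import apischema

# Configure settings first: methods computed afterwards will reflect them,
# while methods computed before would keep the old settings.
apischema.settings.serialization.no_copy = False


@dataclass
class Point:
    coordinates: list[int]


serialize_point = apischema.serialization_method(Point)
assert serialize_point(Point([1, 2])) == {"coordinates": [1, 2]}
```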
## Avoid unnecessary copies
For example, when a list of integers is deserialized, `json.load` already returns a list of integers. The loaded data can thus be "reused", and deserialization just becomes a validation step. The same principle applies to serialization.
It's controlled by the settings `apischema.settings.deserialization.no_copy`/`apischema.settings.serialization.no_copy`, or by the `no_copy` parameter of `deserialize`/`serialize`. Default behavior is to avoid these unnecessary copies, i.e. `no_copy=True`.
```python
{!no_copy.py!}
```
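As a minimal sketch of what the flag changes, assuming the same list object is reused for a plain `list[int]` deserialization:
```python
from apischema import deserialize

data = [1, 2, 3]
# With no_copy=True (the default), the loaded list can be reused as-is:
# deserialization is only a validation step.
assert deserialize(list[int], data, no_copy=True) is data
# With no_copy=False, a new list is returned instead.
assert deserialize(list[int], data, no_copy=False) is not data
```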
## Serialization passthrough
JSON serialization libraries expect primitive data types (`dict`/`list`/`str`/etc.), and a non-negligible share of the objects to be serialized are already primitive.
When [type checking](#type-checking) is disabled (this is the default), objects annotated with primitive types don't need to be transformed or checked; *apischema* can simply "pass them through", resulting in an identity serialization method that just returns its argument.
Container types like `list` or `dict` are passed through only when their contained types are passed through too (and when `no_copy=True`).
```python
{!pass_through_primitives.py!}
```
!!! note
`Enum` subclasses that also inherit `str`/`int` are passed through as well.
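A small sketch of that case (assuming default settings; the `Color` enum is made up):
```python
import enum

from apischema import serialization_method


class Color(str, enum.Enum):
    RED = "red"
    BLUE = "blue"


serialize_color = serialization_method(Color)
# Color.RED is itself a str instance, so the method can return it unchanged;
# either way the result compares equal to the plain string.
assert serialize_color(Color.RED) == "red"
```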
### Passthrough options
Some JSON serialization libraries natively support types like `UUID` or `datetime`, sometimes with a faster implementation than the *apischema* one — [orjson](https://github.com/ijl/orjson), written in Rust, is a good example.
To take advantage of that, *apischema* provides the `apischema.PassThroughOptions` class to specify which types should be passed through, whether because they are natively supported by the JSON library or because they are handled in a `default` fallback.
`apischema.serialization_default` can be used as the `default` fallback in combination with `PassThroughOptions`. It has to be instantiated with the same keyword parameters (`aliaser`, etc.) as `serialization_method`.
```python
{!pass_through.py!}
```
!!! important
Passthrough optimization is a lot diminished when `check_type=True`.
`PassThroughOptions` has the following parameters:
#### `any` — pass through `Any`
#### `collections` — pass through collections
Standard collections `list`, `tuple` and `dict` are natively handled by JSON libraries, but `set`, for example, isn't. Moreover, standard abstract collections like `Collection` or `Mapping`, which are widely used, are not guaranteed to have a supported runtime type (a `set` annotated as `Collection`, for instance).
But most of the time, the runtime types of collections are `list`/`dict`, so the others can be handled in the `default` fallback.
!!! note
Set-like types will not be passed through.
#### `dataclasses` — pass through dataclasses
Some JSON libraries, like [orjson](https://github.com/ijl/orjson), support dataclasses natively.
However, because *apischema* has a lot of specific features ([aliasing](json_schema.md#field-alias), [flatten fields](data_model.md#composition-over-inheritance---composed-dataclasses-flattening), [conditional skipping](data_model.md#skip-field-serialization-depending-on-condition), [fields ordering](de_serialization.md#field-ordering), etc.), only dataclasses that use none of these features and whose fields are all passed through will themselves be passed through.
#### `enums` — pass through `enum.Enum` subclasses
#### `tuple` — pass through `tuple`
Even though `tuple` is often supported by JSON serializers, tuples will be serialized as lists if this option is not enabled. Enabling it also makes tests easier to write, for example.
!!! note
`collections=True` implies `tuple=True`.
#### `types` — pass through arbitrary types
Either a collection of types, or a predicate to determine whether a type has to be passed through.
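For illustration only, a sketch combining several of these options with orjson and the `serialization_default` fallback described above (the `pass_through` parameter name and the exact set of types worth passing through are assumptions to adapt to your JSON library):
```python
from uuid import UUID

import orjson

from apischema import PassThroughOptions, serialization_default, serialization_method

# Pass through standard collections, enums and UUIDs (orjson handles them
# natively); anything else falls back to serialization_default.
options = PassThroughOptions(collections=True, enums=True, types={UUID})
serialize_uuids = serialization_method(list[UUID], pass_through=options)
default = serialization_default()


def dumps(obj: list[UUID]) -> bytes:
    return orjson.dumps(serialize_uuids(obj), default=default)
```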
## Binary compilation using Cython
*apischema* uses Cython to compile the critical parts of the code, i.e. the (de)serialization methods.
However, *apischema* remains a pure Python library — it can work without binary modules. The Cython source files (`.pyx`) are in fact generated from Python modules. Notably, this keeps the code simple while adding a *switch-case* optimization that replaces dynamic dispatch and avoids big chains of `elif` in the Python code.
!!! note
Compilation is disabled when using PyPy, because PyPy is even faster with the bare Python code.
That's another benefit of generating `.pyx` files: the Python source is kept for PyPy.
## Override dataclass constructors
!!! warning
This feature is still experimental and disabled by default. Test its impact on your code carefully before enabling it in production.
Dataclass constructor calls are the slowest part of deserialization, about 50% of its runtime!
They are indeed pure Python functions and cannot be compiled.
For "normal" dataclasses (no `__slots__`, no `__post_init__`, and no `__init__`/`__new__`/`__setattr__` overriding), *apischema* can override the constructor with compilable code.
This feature can be toggled on/off globally using `apischema.settings.deserialization.override_dataclass_constructors`.
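A minimal sketch of enabling it (the `User` class is made up; remember the feature is experimental and off by default):
```python
from dataclasses import dataclass

import apischema
from apischema import deserialize

# Enable the experimental constructor override globally.
apischema.settings.deserialization.override_dataclass_constructors = True


@dataclass
class User:  # a "normal" dataclass: no __slots__, __post_init__, etc.
    id: int
    name: str


assert deserialize(User, {"id": 1, "name": "Alice"}) == User(1, "Alice")
```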
## Discriminator
The [OpenAPI discriminator](json_schema.md#openapi-discriminator) makes union deserialization time more homogeneous: the union member is selected directly from the discriminator field instead of each alternative being tried in turn.
```python
{!discriminator_perf.py!}
```
!!! note
As you can see in the example, the discriminator brings its own additional cost, but it is well worth it.
## Benchmark
Benchmark code is located in the [benchmark directory](https://github.com/wyfo/apischema/tree/master/benchmark) of the *apischema* repository.
Performance is measured on two datasets: a simple one and a more complex one.
The benchmark is run by a GitHub Actions workflow on `ubuntu-latest` with Python 3.10.
Results are given relative to the fastest library, i.e. *apischema*; simple and complex results are detailed in the table, and the displayed result is the mean of both.
#### Relative execution time (lower is better)
{!benchmark_table.md!}
!!! note
The benchmark uses [binary optimization](#binary-compilation-using-cython), but even when running as a pure Python library, *apischema* still performs better than almost all of its competitors.