File: api_usage.rst

package info (click to toggle)
python-zstandard 0.23.0-4
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 6,940 kB
  • sloc: ansic: 41,411; python: 8,665; makefile: 22; sh: 14
file content (149 lines) | stat: -rw-r--r-- 6,658 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
.. _api_usage:

=========
API Usage
=========

To interface with Zstandard, simply import the ``zstandard`` module:

.. code-block:: python

   import zstandard

It is a popular convention to alias the module as a different name for
brevity:

.. code-block:: python

   import zstandard as zstd

This module attempts to import and use either the C extension or CFFI
implementation. On Python platforms known to support C extensions (like
CPython), it raises an ImportError if the C extension cannot be imported.
On Python platforms known to not support C extensions (like PyPy), it only
attempts to import the CFFI implementation and raises ImportError if that
can't be done. On other platforms, it first tries to import the C extension
then falls back to CFFI if that fails and raises ImportError if CFFI fails.

To change the module import behavior, a ``PYTHON_ZSTANDARD_IMPORT_POLICY``
environment variable can be set. The following values are accepted:

``default``
   The behavior described above.
``cffi_fallback``
   Always try to import the C extension then fall back to CFFI if that
   fails.
``cext``
   Only attempt to import the C extension.
``cffi``
   Only attempt to import the CFFI implementation.

In addition, the ``zstandard`` module exports a ``backend`` attribute
containing the string name of the backend being used. It will be one
of ``cext`` or ``cffi`` (for *C extension* and *cffi*, respectively).

.. note::

   The documentation in this section makes references to various zstd
   concepts and functionality. See :ref:`concepts` for more details.

Choosing an API
===============

There are multiple APIs for performing compression and decompression. This is
because different applications have different needs and this library wants to
facilitate optimal use in as many use cases as possible.

From a high-level, APIs are divided into *one-shot* and *streaming*: either you
are operating on all data at once or you operate on it piecemeal.

The *one-shot* APIs are useful for small data, where the input or output
size is known. (The size can come from a buffer length, file size, or
stored in the zstd frame header.) A limitation of the *one-shot* APIs is that
input and output must fit in memory simultaneously. For say a 4 GB input,
this is often not feasible.

The *one-shot* APIs also perform all work as a single operation. So, if you
feed it large input, it could take a long time for the function to return.

The streaming APIs do not have the limitations of the simple API. But the
price you pay for this flexibility is that they are more complex than a
single function call.

The streaming APIs put the caller in control of compression and decompression
behavior by allowing them to directly control either the input or output side
of the operation.

With the *streaming input*, *compressor*, and *decompressor* APIs, the caller
has full control over the input to the compression or decompression stream.
They can directly choose when new data is operated on.

With the *streaming ouput* APIs, the caller has full control over the output
of the compression or decompression stream. It can choose when to receive
new data.

When using the *streaming* APIs that operate on file-like or stream objects,
it is important to consider what happens in that object when I/O is requested.
There is potential for long pauses as data is read or written from the
underlying stream (say from interacting with a filesystem or network). This
could add considerable overhead.

Thread and Object Reuse Safety
==============================

Unless stated otherwise, ``ZstdCompressor`` and ``ZstdDecompressor`` instances
cannot be used for temporally overlapping (de)compression operations. i.e.
if you start a (de)compression operation on an instance or a helper object
derived from it, it isn't safe to start another (de)compression operation
from the same instance until the first one has finished.

``ZstdCompressor`` and ``ZstdDecompressor`` instances have no guarantees
about thread safety. Do not operate on the same ``ZstdCompressor`` and
``ZstdDecompressor`` instance simultaneously from different threads. It is
fine to have different threads call into a single instance, just not at the
same time.

Objects derived from ``ZstdCompressor`` and ``ZstdDecompressor`` that
perform (de)compression operations (such as ``ZstdCompressionReader`` and
``ZstdDecompressionWriter``) are bound to the ``ZstdCompressor`` or
``ZstdDecompressor`` from which they came and are therefore not thread safe
by extension.

Some operations require multiple function calls to complete. e.g. streaming
operations. A single ``ZstdCompressor`` or ``ZstdDecompressor`` cannot be used
for simultaneously active operations. e.g. you must not start a streaming
operation when another streaming operation is already active.

If you need to perform multiple compression or decompression operations in
parallel, you MUST construct multiple ``ZstdCompressor`` or ``ZstdDecompressor``
instances so each independent operation has its own ``ZstdCompressor`` or
``ZstdDecompressor`` instance.

The C extension releases the GIL during non-trivial calls into the zstd C
API. Non-trivial calls are notably compression and decompression. Trivial
calls are things like parsing frame parameters. Where the GIL is released
is considered an implementation detail and can change in any release.

APIs that accept bytes-like objects don't enforce that the underlying object
is read-only. However, it is assumed that the passed object is read-only for
the duration of the function call. It is possible to pass a mutable object
(like a ``bytearray``) to e.g. ``ZstdCompressor.compress()``, have the GIL
released, and mutate the object from another thread. Such a race condition
is a bug in the consumer of python-zstandard. Most Python data types are
immutable, so unless you are doing something fancy, you don't need to
worry about this.

Performance Considerations
==========================

The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a
persistent compression or decompression *context*. Reusing a ``ZstdCompressor``
or ``ZstdDecompressor`` instance for multiple operations is faster than
instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each
operation. The differences are magnified as the size of data decreases. For
example, the difference between *context* reuse and non-reuse for 100,000
100 byte inputs will be significant (possibly over 10x faster to reuse contexts)
whereas 10 100,000,000 byte inputs will be more similar in speed (because the
time spent doing compression dwarfs time spent creating new *contexts*).