# PyCDDL: Deserialize CBOR and/or do CDDL schema validation
[CDDL](https://www.rfc-editor.org/rfc/rfc8610.html) is a schema language for the CBOR serialization format.
`pycddl` allows you to:
* Validate that CBOR documents match a particular CDDL schema, using the Rust [`cddl`](https://github.com/anweiss/cddl) library.
* Optionally, decode CBOR documents.
## Usage
### Validation
Here we use the [`cbor2`](https://pypi.org/project/cbor2/) library to serialize a dictionary to CBOR, and then validate it:
```python
from pycddl import Schema
import cbor2

uint_schema = Schema("""
    object = {
        xint: uint
    }
""")
# The payload must be a dict (a CBOR map) whose value is an unsigned integer.
uint_schema.validate_cbor(cbor2.dumps({"xint": 2}))
```
If validation fails, a `pycddl.ValidationError` is raised.
### Validation + deserialization
You can deserialize CBOR to Python objects using `cbor2.loads()`.
However:
* `cbor2` uses C code by default, and the C programming language is prone to memory safety issues.
  If you are reading untrusted CBOR, it is safer to decode the data with a Rust library.
* You will need to parse the CBOR twice, once for validation and once for decoding, which adds performance overhead.
By deserializing with `pycddl`, you solve the first problem, and a future version of `pycddl` will solve the second problem (see https://gitlab.com/tahoe-lafs/pycddl/-/issues/37).
```python
from pycddl import Schema
import cbor2

uint_schema = Schema("""
    object = {
        xint: uint
    }
""")
# Passing True as the second argument returns the decoded Python object.
deserialized = uint_schema.validate_cbor(cbor2.dumps({"xint": 2}), True)
assert deserialized == {"xint": 2}
```
### Deserializing without schema validation
If you don't care about schemas, you can just deserialize the CBOR like so:
```python
from pycddl import Schema

# A schema that accepts any CBOR value.
ACCEPT_ANYTHING = Schema("main = any")

def loads(encoded_cbor_bytes):
    return ACCEPT_ANYTHING.validate_cbor(encoded_cbor_bytes, True)
```
In a future release this will become a standalone, more efficient API (see https://gitlab.com/tahoe-lafs/pycddl/-/issues/36).
### Reducing memory usage and safety constraints
In order to reduce memory usage, you can pass in any Python object that implements the buffer API and stores bytes, e.g. a `memoryview()` or a `mmap` object.
**The passed-in object must be read-only, and the data must not change during validation!**
If you mutate the data while validation is happening the result can be memory corruption or other [undefined behavior](https://stackoverflow.com/questions/18506029/can-undefined-behavior-erase-the-hard-drive#comment27209771_18506029).
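One way to follow these rules for a file on disk is to map it read-only and pass the mapping straight to `validate_cbor()`. The sketch below is illustrative only: `validate_file` is a hypothetical helper, not part of `pycddl`, and it assumes the file is non-empty (an empty file cannot be memory-mapped) and that no other process modifies it during the call.

```python
import mmap


def validate_file(schema, path, deserialize=False):
    """Hypothetical helper: validate the CBOR document stored at `path`.

    `schema` is expected to be a `pycddl.Schema`. The file is memory-mapped
    instead of being read into a `bytes` object, which avoids a full copy.
    """
    with open(path, "rb") as f:
        # ACCESS_READ creates a read-only mapping, so this process cannot
        # mutate the data through it while validation runs.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return schema.validate_cbor(mapped, deserialize)
```

For example, `validate_file(Schema("main = any"), "document.cbor", True)` would validate and decode a file, where `document.cbor` is a placeholder path. A read-only mapping does not stop other processes from changing the underlying file, so that still has to be ruled out separately.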
## Supported CBOR types for deserialization
If you are deserializing a CBOR document into Python objects, you can deserialize:
* Null/None.
* Booleans.
* Floats.
* Integers up to 64-bit size.
Larger integers aren't supported yet.
* Bytes.
* Strings.
* Lists.
* Maps/dictionaries.
* Sets.
Other types will be added in the future if there is user demand.
Schema validation is not restricted to this list, but rather is limited by the functionality of the [`cddl` Rust crate](https://github.com/anweiss/cddl).
## Release notes
### 0.6.4
Features:
* Update the Rust [CDDL](https://crates.io/crates/cddl/) dependency in order to pick up a newer version of the transitive dependency [lexical-core](https://crates.io/crates/lexical-core), which had been revised because of, among other things, soundness/safety issues.
### 0.6.3
Features:
* Support the final release of Python 3.13.
### 0.6.2
Features:
* Added support for Python 3.13.
### 0.6.1
Bug fixes:
* Allow PyPy to deserialize CBOR too.
### 0.6.0
Features:
* `validate_cbor(serialized_cbor, True)` will deserialize the CBOR document into Python objects, so you don't need to use e.g. `cbor2.loads(serialized_cbor)` separately.
Bug fixes:
* Release the GIL in a much more useful way.
Misc:
* Upgrade to newer `cddl` crate (0.9.4).
### 0.5.3
* Upgrade to newer `cddl` crate (0.9.3).
* Add Python 3.12 support.
### 0.5.2
* Upgrade to newer `cddl` crate (0.9.2), improving validation functionality.
### 0.5.1
* Upgrade to newer `cddl` crate, fixing some validation bugs.
### 0.5.0
* Support for ARM macOS.
* Dropped Python 3.7 support.
### 0.4.1
* Test fixes, with no user-relevant changes.
### 0.4.0
* `validate_cbor()` now accepts read-only buffers, not just `bytes`. This is useful if you want to e.g. validate a large file, since you can `mmap()` it.
* The GIL is released when parsing documents larger than 10KiB.
### 0.3.0
* Fixed major bug where if the document was valid UTF-8, the library would attempt to parse it as JSON!
* Added support for ARM macOS.
### 0.2.2
* Updated to `cddl` 0.9.1.
### 0.2.1
* Added PyPy wheels.
### 0.2.0
* Schemas are now only parsed once (when the `Schema()` object is created), instead of every time validation happens, which should improve validation performance.
* Updated to a newer version of [underlying CDDL library](https://github.com/anweiss/cddl), which should make CDDL parsing more compliant.
* Added a `repr()` implementation to `Schema` for easier debugging.
### 0.1.11
* Initial release.