Column Encryption
=================

Overview
--------
Support for client-side encryption of data was added in version 3.27.0 of the Python driver.  When this
feature is enabled, data is encrypted on the fly according to a specified :class:`~.ColumnEncryptionPolicy`
instance.  The same policy is used to decrypt data in returned rows.  If a prepared statement is used,
encryption is transparent to the user: bound parameters are encrypted automatically, and retrieved data is
decrypted and converted back into the original type (according to the definitions in the encryption policy).
Simple (i.e. non-prepared) queries are also supported, although in this case values to be written must be
encrypted manually.  The :class:`~.ColumnEncryptionPolicy` instance provides methods to assist with this.

Client-side encryption and decryption should work against all versions of Cassandra and DSE, because they do
not rely on any server-side functionality.

WARNING: Encryption format changes in 3.28.0
------------------------------------------------
Python driver 3.28.0 introduces a new encryption format for data written by :class:`~.AES256ColumnEncryptionPolicy`.
As a result, any encrypted data written by Python driver 3.27.0 will **NOT** be readable by the newer driver.
If you upgraded from 3.27.0, you should re-encrypt your data with 3.28.0.

Configuration
-------------
Client-side encryption is enabled by creating an instance of a subclass of :class:`~.ColumnEncryptionPolicy`
and adding information about columns to be encrypted to it.  This policy is then supplied to :class:`~.Cluster`
when it's created.

.. code-block:: python

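    import os

    from cassandra.cluster import Cluster
    from cassandra.policies import ColDesc
    from cassandra.column_encryption.policies import AES256ColumnEncryptionPolicy, AES256_KEY_SIZE_BYTES

    # Generate a random AES-256 key; persist it, or the data cannot be decrypted later
    key = os.urandom(AES256_KEY_SIZE_BYTES)
    cl_policy = AES256ColumnEncryptionPolicy()
    # Identify the column by keyspace, table and column name
    col_desc = ColDesc('ks1', 'table1', 'column1')
    # Client-side CQL type of the plaintext value
    cql_type = "int"
    cl_policy.add_column(col_desc, key, cql_type)
    cluster = Cluster(column_encryption_policy=cl_policy)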

:class:`~.AES256ColumnEncryptionPolicy` is a subclass of :class:`~.ColumnEncryptionPolicy` which provides
encryption and decryption via AES-256.  This class is currently the only available column encryption policy
implementation, although users can implement their own by subclassing :class:`~.ColumnEncryptionPolicy`.

:class:`~.ColDesc` is a named tuple which uniquely identifies a column in a given keyspace and table.  Given
this tuple, the encryption key and the CQL type of the values stored in the column, we can add the column to the
policy using :func:`~.ColumnEncryptionPolicy.add_column`.  Once all column definitions have been added to the
policy, we pass it to the cluster.
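
A key generated with ``os.urandom`` at startup is lost when the process exits, and data encrypted under it can
no longer be decrypted.  The following is a minimal sketch of one way to load a persistent key instead, assuming
the key is stored hex-encoded in an environment variable.  The function name ``load_key_from_env`` and the
variable name ``COLUMN1_KEY_HEX`` are hypothetical and not part of the driver.

```python
import os

# 32 bytes = 256 bits, the key length AES-256 requires.
KEY_SIZE_BYTES = 32


def load_key_from_env(var_name="COLUMN1_KEY_HEX"):
    """Return the AES-256 key stored hex-encoded in an environment variable.

    Both the function and the default variable name are illustrative only.
    """
    hex_key = os.environ.get(var_name)
    if hex_key is None:
        raise RuntimeError("environment variable %s is not set" % var_name)
    # bytes.fromhex raises ValueError if the text is not valid hex
    key = bytes.fromhex(hex_key.strip())
    if len(key) != KEY_SIZE_BYTES:
        raise ValueError("expected %d key bytes, got %d" % (KEY_SIZE_BYTES, len(key)))
    return key
```

The returned bytes could then be passed as the ``key`` argument to ``add_column`` in place of the randomly
generated key.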

The CQL type for the column only has meaning at the client; it is never sent to Cassandra.  The encryption key
is also never sent to the server; all the server ever sees are opaque bytes containing the encrypted data.  As a
result, all columns containing client-side encrypted values should be declared with the CQL type "blob" at the
Cassandra server.

Usage
-----

Encryption
^^^^^^^^^^
Client-side encryption works best with prepared statements.  A prepared statement knows which columns
appear in the query it was built from, and the driver uses this information to transparently encrypt any
supplied parameters.  For example, we can create a prepared statement to insert a value into column1 (as defined above)
by executing the following code after creating a :class:`~.Cluster` in the manner described above:

.. code-block:: python

    session = cluster.connect()
    prepared = session.prepare("insert into ks1.table1 (column1) values (?)")
    session.execute(prepared, (1000,))

Our encryption policy will detect that "column1" is an encrypted column and take appropriate action.

As mentioned above, client-side encryption can also be used with simple queries, although in that case encryption
is not transparent.  :class:`~.ColumnEncryptionPolicy` provides a helper named
:func:`~.ColumnEncryptionPolicy.encode_and_encrypt` which converts an input value into bytes using the
standard serialization methods employed by the driver and then encrypts the result according to the configuration
of the policy.  Using this approach, the example above could be implemented along the lines of the following:

.. code-block:: python

    session = cluster.connect()
    encrypted = cl_policy.encode_and_encrypt(col_desc, 1000)
    session.execute("insert into ks1.table1 (column1) values (%s)", (encrypted,))

Decryption
^^^^^^^^^^
Decryption of values returned from the server is always transparent.  Whether we're executing a simple or a prepared
statement, encrypted columns are decrypted automatically and made available via rows just like any other
result.
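
As a minimal sketch of what this looks like in practice, the hypothetical helper below reads ``column1`` back
from ``ks1.table1``.  It assumes the session was obtained from a :class:`~.Cluster` configured with the policy
shown in the Configuration section and that the default named-tuple row factory is in use; the function name
``read_column1`` is illustrative only.

```python
def read_column1(session):
    """Return every column1 value from ks1.table1.

    `session` is expected to come from a Cluster created with the
    column encryption policy; no explicit decrypt call is made here.
    """
    rows = session.execute("select column1 from ks1.table1")
    # With the default row factory each row is a named tuple, so the
    # already-decrypted value is available as an attribute.
    return [row.column1 for row in rows]
```

The calling code is identical to what would be written for an unencrypted column.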

Limitations
-----------
:class:`~.AES256ColumnEncryptionPolicy` uses the implementation of AES-256 provided by the
`cryptography <https://cryptography.io/en/latest/>`_ module.  Any limitations of this module should be considered
when deploying client-side encryption.  Note specifically that a Rust compiler is required to build modern versions
of the cryptography package from source, although prebuilt wheels exist for many common platforms.
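
Because the package may be missing on platforms without a wheel or a Rust toolchain, an application might
check for it before enabling encryption.  The following sketch uses only the standard library; the helper name
``module_available`` is hypothetical and not part of the driver.

```python
import importlib.util


def module_available(name="cryptography"):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


# Decide up front whether client-side encryption can be enabled at all.
encryption_supported = module_available("cryptography")
```

An application could use ``encryption_supported`` to fail fast with a clear message rather than hitting an
import error later.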

Client-side encryption has been implemented for both the default Cython and pure Python row processing logic.
This functionality has not yet been ported to the NumPy Cython implementation.  In testing,
the NumPy processing worked on Python 3.7 but failed on Python 3.8.