# The guide to integrating libmongocrypt #

libmongocrypt is a C library meant to assist drivers in supporting
client side encryption. libmongocrypt acts as a state machine and the
driver is responsible for I/O between mongod, mongocryptd, crypt_shared, and KMS.

There are two major parts to integrating libmongocrypt into your driver:

-   Writing a language-specific binding to libmongocrypt
-   Using the binding in your driver to support client side encryption

## Design Rationale

### Simple interface

The library interface is intended to be used with multiple languages.

The API tries to be minimal. Most structs are opaque. Global initialization
is lazy.

Much of the API passes and returns BSON since all drivers can produce and parse
BSON.

### No I/O

libmongocrypt deliberately does not do I/O to avoid poor behavior with some
language runtimes. Example: in Go a blocking C call may block an OS thread,
rather than a goroutine.

## Part 1: Writing a Language-Specific Binding ##

The binding is the glue between your driver's native language and
libmongocrypt.

The binding uses the native language's foreign function interface to C.
For example, Java can accomplish this with
[JNA](https://github.com/java-native-access/jna), CPython with
[extensions](https://docs.python.org/3/extending/extending.html),
Node.js with [add-ons](https://nodejs.org/api/addons.html), etc.

The libmongocrypt library files (.so/.dll) are pre-built on its
[Evergreen project](https://spruce.mongodb.com/project/libmongocrypt/waterfall). Click
the variant's "built-and-test-and-upload" tasks to download the
attached files.

libmongocrypt declares the full API that your driver needs to call
in the main public header
[mongocrypt.h](https://github.com/mongodb/libmongocrypt/blob/master/src/mongocrypt.h).

There are many types and functions in mongocrypt.h to bind. Consider as
a first step binding to only `mongocrypt_version`.
Once you have that working, proceed to write bindings for the remaining
API. Here are a few things to keep in mind:

-   "ctx" is short for context, and is a generic term indicating that
    the object stores state.
-   By C convention, functions are named like:
    `mongocrypt_<type>_<method>`. For example `mongocrypt_ctx_id`
    can be thought of as a class method "id" on the class "ctx".
-   `mongocrypt_binary_t` is a non-owning view of data. Calling
    `mongocrypt_binary_destroy` frees the view, but does nothing to the
    underlying data. When a `mongocrypt_binary_t` is returned (e.g.
    by `mongocrypt_ctx_mongo_op`), the lifetime of the data is tied to the
    type that returned it (so the data returned will be freed when the
    `mongocrypt_ctx_t` is freed).
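
As a rough illustration of the suggested first step (binding only `mongocrypt_version`), the sketch below does in C what a foreign function interface does on behalf of another language: it loads the shared library at runtime and looks the symbol up by name. It assumes a POSIX system with `dlopen`, and that `mongocrypt_version` takes a `uint32_t *` length out-parameter and returns a C string, as declared in mongocrypt.h. The helper name `sketch_load_version` and the library path passed to it are hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Function-pointer type matching the assumed declaration of
 * mongocrypt_version in mongocrypt.h. */
typedef const char *(*mongocrypt_version_fn)(uint32_t *len);

/* Hypothetical helper: loads the shared library at `path`, resolves
 * mongocrypt_version, and copies the version string into `out`.
 * Returns 0 on success, -1 if the library or the symbol is missing.
 * `out` is left untouched on failure. */
int sketch_load_version(const char *path, char *out, size_t out_len) {
    assert(path != NULL && out != NULL && out_len > 0);

    void *lib = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        return -1; /* dlerror() would describe the failure */
    }

    mongocrypt_version_fn version_fn =
        (mongocrypt_version_fn)dlsym(lib, "mongocrypt_version");
    if (version_fn == NULL) {
        dlclose(lib);
        return -1;
    }

    uint32_t len = 0;
    const char *version = version_fn(&len);
    /* Copy before dlclose: the string is owned by the library. */
    snprintf(out, out_len, "%.*s", (int)len, version != NULL ? version : "");

    dlclose(lib);
    return 0;
}
```

A caller might pass something like `"libmongocrypt.so"` (the actual file name and location depend on the platform and on how the library was installed) and print the buffer on success. Bindings in other languages follow the same shape: locate the library, resolve one symbol, call it, and only then move on to the rest of the API.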

Once you have full bindings for the API, it's time to do a sanity
check. The crux of libmongocrypt's API is the state machine represented
by `mongocrypt_ctx_t`. This state machine is exercised in the
[example-state-machine](https://github.com/mongodb/libmongocrypt/blob/master/test/example-state-machine.c)
executable included with libmongocrypt. It uses mock responses from
mongod, mongocryptd, and KMS. Reimplement the state machine loop
(`_run_state_machine`) in example-state-machine with your binding.

Seek help in the Slack channel #dbx-encryption.

## Part 2: Integrate into Driver ##

After you have a binding, integrate libmongocrypt in your driver to
support client side encryption.

See the [driver spec](https://github.com/mongodb/specifications/blob/master/source/client-side-encryption/client-side-encryption.md)
for a reference of the user-facing API. libmongocrypt is needed for:

-   Automatic encryption/decryption (enabled with `AutoEncryptionOpts`)
-   ClientEncryption (explicit encryption/decryption + key management)

It is recommended to start by integrating libmongocrypt to support
automatic encryption/decryption. Then reuse the implementation to
implement ClientEncryption.

A MongoClient enabled with client side encryption MUST have one shared
`mongocrypt_t` handle (important because keys + JSON Schemas are cached
in this handle). Each ClientEncryption also has its own `mongocrypt_t`.

Any encryption or decryption operation is done by creating a
`mongocrypt_ctx_t` and initializing it for the appropriate operation.
`mongocrypt_ctx_t` is a state machine, and each state requires the
driver to perform some action. This may be performing I/O on one of the
following:

-   the encrypted MongoClient on which the operation is occurring (for
    auto encrypt).
-   the key vault MongoClient (which may be the same as the encrypted
    MongoClient).
-   KMS (via a TLS socket).
-   the MongoClient to the local mongocryptd process.

### Initializing ###

Call one of the following on a `mongocrypt_ctx_t`:

-   auto encrypt (`mongocrypt_ctx_encrypt_init`)
-   auto decrypt (`mongocrypt_ctx_decrypt_init`)
-   explicit encrypt (`mongocrypt_ctx_explicit_encrypt_init`)
-   explicit decrypt (`mongocrypt_ctx_explicit_decrypt_init`)
-   create data key (`mongocrypt_ctx_datakey_init`)
-   rewrap data key (`mongocrypt_ctx_rewrap_many_datakey_init`)

### State Machine ###

Below is a list of the various states a `mongocrypt_ctx_t` can be in. For
each state, there is a description of what the driver is expected to do
to advance the state machine. Not all states will be entered for all
types of contexts, but one state machine runner can be used for all
types of contexts.
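
To make the idea of a single runner concrete, here is a minimal sketch of the loop's shape. It deliberately does not include mongocrypt.h so that it compiles on its own: `sketch_state_t` is a stand-in for `mongocrypt_ctx_state_t` (the names follow the states in this guide, the numeric values are arbitrary), `get_state` stands in for `mongocrypt_ctx_state`, and the per-state handlers are hypothetical callbacks that a driver would implement as described in the sections below. The scripted fake context exists only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for mongocrypt_ctx_state_t. The values are arbitrary and
 * are NOT the values used by mongocrypt.h. */
typedef enum {
    SK_ERROR,
    SK_NEED_MONGO_COLLINFO,
    SK_NEED_MONGO_COLLINFO_WITH_DB,
    SK_NEED_MONGO_MARKINGS,
    SK_NEED_MONGO_KEYS,
    SK_NEED_KMS,
    SK_NEED_KMS_CREDENTIALS,
    SK_READY,
    SK_DONE,
    SK_STATE_COUNT
} sketch_state_t;

/* Hypothetical handler: performs the driver's work for one state and
 * returns false on a driver-side failure. */
typedef bool (*sketch_handler_fn)(void *ctx);

typedef struct {
    sketch_state_t (*get_state)(void *ctx);     /* stands in for mongocrypt_ctx_state */
    sketch_handler_fn handlers[SK_STATE_COUNT]; /* NULL means "state not handled" */
} sketch_runner_t;

/* One loop for every kind of context: query the state, dispatch to the
 * handler, and repeat until DONE or ERROR. */
bool sketch_run(const sketch_runner_t *runner, void *ctx) {
    assert(runner != NULL && runner->get_state != NULL);
    for (;;) {
        sketch_state_t state = runner->get_state(ctx);
        if (state == SK_DONE) {
            return true;
        }
        if (state == SK_ERROR) {
            return false; /* a driver would surface mongocrypt_ctx_status here */
        }
        if ((unsigned)state >= (unsigned)SK_STATE_COUNT ||
            runner->handlers[state] == NULL) {
            return false; /* unknown or unsupported state */
        }
        if (!runner->handlers[state](ctx)) {
            return false;
        }
    }
}

/* Scripted fake context: reports a fixed sequence of states. */
typedef struct {
    const sketch_state_t *script;
    size_t pos;
} sketch_fake_ctx_t;

static sketch_state_t fake_state(void *p) {
    sketch_fake_ctx_t *fake = p;
    return fake->script[fake->pos];
}

static bool fake_advance(void *p) {
    sketch_fake_ctx_t *fake = p;
    fake->pos++; /* pretend the driver did the work for this state */
    return true;
}

/* Runs a scripted sequence; returns the number of handled states, or -1. */
int sketch_demo(void) {
    static const sketch_state_t script[] = {
        SK_NEED_MONGO_KEYS, SK_NEED_KMS, SK_READY, SK_DONE};
    sketch_fake_ctx_t ctx = {script, 0};
    sketch_runner_t runner = {fake_state, {NULL}};
    runner.handlers[SK_NEED_MONGO_KEYS] = fake_advance;
    runner.handlers[SK_NEED_KMS] = fake_advance;
    runner.handlers[SK_READY] = fake_advance;
    return sketch_run(&runner, &ctx) ? (int)ctx.pos : -1;
}
```

Treating an unhandled state as an error, rather than skipping it, is a design choice in this sketch: it keeps the loop from spinning if the library reports a state the driver has not opted in to.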

#### State: `MONGOCRYPT_CTX_ERROR` ####

**Driver needs to...**

Throw an exception based on the status from `mongocrypt_ctx_status`.

**Applies to...**

All contexts.

#### State: `MONGOCRYPT_CTX_NEED_MONGO_COLLINFO` ####

> [!IMPORTANT]
> <a name="multi-collection-commands"></a> **Multi-collection commands**: prior to 1.13.0, drivers were expected to pass _at most one result_ from `listCollections`. In 1.13.0, drivers are expected to pass _all results_ from `listCollections` to support multi-collection commands (e.g. aggregate with `$lookup`).
>
> Drivers must call `mongocrypt_setopt_enable_multiple_collinfo` to indicate the new behavior is implemented and opt-in to support for multi-collection commands. This opt-in is to prevent the following bug scenario:
> > A driver upgrades to 1.13.0, but does not update prior behavior which passes at most one result of a multi-collection command.
> > A multi-collection command requests schemas for both `db.c1` and `db.c2`.
> > The driver only passes the result for `db.c1` even though `db.c2` also has a result.
> > Therefore, libmongocrypt incorrectly believes `db.c2` has no schema.

**libmongocrypt needs**...

A result from a listCollections cursor.

**Driver needs to...**

1.  Run listCollections on the encrypted MongoClient with the filter
    provided by `mongocrypt_ctx_mongo_op`
2.  Pass all results (if any) with calls to `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned. Results may be passed in any order.
3.  Call `mongocrypt_ctx_mongo_done`

**Applies to...**

auto encrypt

#### State: `MONGOCRYPT_CTX_NEED_MONGO_COLLINFO_WITH_DB` ####

See [note](#multi-collection-commands) about multi-collection commands.

**libmongocrypt needs**...

Results from a listCollections cursor from a specified database.

**Driver needs to...**

1.  Run listCollections on the encrypted MongoClient with the filter
    provided by `mongocrypt_ctx_mongo_op` on the database provided by `mongocrypt_ctx_mongo_db`.
2.  Pass all results (if any) with calls to `mongocrypt_ctx_mongo_feed` or proceed to the next step if nothing was returned. Results may be passed in any order.
3.  Call `mongocrypt_ctx_mongo_done`

**Applies to...**

A context initialized with `mongocrypt_ctx_encrypt_init` for automatic encryption. This state is only entered when `mongocrypt_setopt_use_need_mongo_collinfo_with_db_state` is called to opt in.

#### State: `MONGOCRYPT_CTX_NEED_MONGO_MARKINGS` ####

**libmongocrypt needs**...

A reply from mongocryptd indicating which values in a command need to be
encrypted.

**Driver needs to...**

1.  Use db.runCommand to run the command provided by `mongocrypt_ctx_mongo_op`
    on the MongoClient connected to mongocryptd.
2.  Feed the reply back with `mongocrypt_ctx_mongo_feed`.
3.  Call `mongocrypt_ctx_mongo_done`.

**Applies to...**

auto encrypt

#### State: `MONGOCRYPT_CTX_NEED_MONGO_KEYS` ####

**libmongocrypt needs**...

Documents from the key vault collection.

**Driver needs to...**

1.  Use MongoCollection.find on the key vault MongoClient (which may be
    the same as the encrypted MongoClient). Use the filter provided by
    `mongocrypt_ctx_mongo_op`.
2.  Feed all resulting documents back (if any) with repeated calls to
    `mongocrypt_ctx_mongo_feed`.
3.  Call `mongocrypt_ctx_mongo_done`.

**Applies to...**

All contexts except for create data key.

#### State: `MONGOCRYPT_CTX_NEED_KMS` ####

**libmongocrypt needs**...

The responses from one or more messages to KMS.

Ensure `mongocrypt_setopt_retry_kms` is called on the `mongocrypt_t` to enable retry.

**Driver needs to...**

1.  For each context returned by `mongocrypt_ctx_next_kms_ctx`:

    a.  Delay the message by the time in microseconds indicated by
        `mongocrypt_kms_ctx_usleep` if returned value is greater than 0.

    b.  Create/reuse a TLS socket connected to the endpoint indicated by
        `mongocrypt_kms_ctx_endpoint`. The endpoint string is a host name with
        a port number separated by a colon. E.g.
        "kms.us-east-1.amazonaws.com:443". A port number will always be
        included. Drivers may assume the host name is not an IP address or IP
        literal.

    c.  Write the message from `mongocrypt_kms_ctx_message` to the
        socket.

    d.  Feed the reply back with `mongocrypt_kms_ctx_feed` or `mongocrypt_kms_ctx_feed_with_retry`. Repeat
        until `mongocrypt_kms_ctx_bytes_needed` returns 0. If the `should_retry` outparam returns true,
        the request may be retried by feeding the new response into the same context.

    If any step encounters a network error, call `mongocrypt_kms_ctx_fail`.
    If `mongocrypt_kms_ctx_fail` returns true, retry the request by continuing to the next KMS context or by feeding the new response into the same context.
    If `mongocrypt_kms_ctx_fail` returns false, abort and report an error. Consider wrapping the error reported in `mongocrypt_kms_ctx_status` to include the last network error.

2.  When done feeding all replies, call `mongocrypt_ctx_kms_done`.
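
Steps a and b above involve two small pieces of plumbing that are easy to get subtly wrong: turning the microsecond delay into a form suitable for a sleep call, and splitting the endpoint string into host and port. The sketch below shows one way to do both in plain C. The helper names `sketch_split_usec` and `sketch_split_endpoint` are hypothetical and not part of libmongocrypt; the inputs are assumed to come from `mongocrypt_kms_ctx_usleep` and `mongocrypt_kms_ctx_endpoint` respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Splits a delay in microseconds into whole seconds and nanoseconds,
 * e.g. to fill a struct timespec for nanosleep. Negative or zero
 * delays yield 0/0, meaning "do not sleep". */
void sketch_split_usec(int64_t usec, int64_t *sec, long *nsec) {
    assert(sec != NULL && nsec != NULL);
    if (usec < 0) {
        usec = 0;
    }
    *sec = usec / 1000000;
    *nsec = (long)(usec % 1000000) * 1000L;
}

/* Splits "host:port" at the last colon into two NUL-terminated
 * strings, suitable for passing to a resolver such as getaddrinfo.
 * Returns false if there is no colon, if either part is empty, or if
 * an output buffer is too small. */
bool sketch_split_endpoint(const char *endpoint,
                           char *host, size_t host_len,
                           char *port, size_t port_len) {
    assert(endpoint != NULL && host != NULL && port != NULL);

    const char *colon = strrchr(endpoint, ':');
    if (colon == NULL || colon == endpoint || colon[1] == '\0') {
        return false;
    }

    size_t hlen = (size_t)(colon - endpoint);
    size_t plen = strlen(colon + 1);
    if (hlen >= host_len || plen >= port_len) {
        return false;
    }

    memcpy(host, endpoint, hlen);
    host[hlen] = '\0';
    memcpy(port, colon + 1, plen + 1); /* includes the terminating NUL */
    return true;
}
```

Splitting at the last colon leans on the two guarantees stated in step b: a port is always present, and the host is not an IP literal, so no bracketed IPv6 address needs to be handled.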

##### Retry and Iteration

Call `mongocrypt_setopt_retry_kms` to enable retry behavior.

There are two options for retry:
-   Lazy retry: After processing KMS contexts, iterate again by calling `mongocrypt_ctx_next_kms_ctx`. KMS contexts
    needing a retry will be returned.
-   In-place retry: If a KMS context indicates retry, retry the KMS request and feed the new response to the same KMS
    context. Use the `should_retry` outparam of `mongocrypt_kms_ctx_feed_with_retry` and the return value of
    `mongocrypt_kms_ctx_fail` to determine whether a retry is indicated.

The driver MAY fan out KMS requests in parallel. It is not safe to iterate KMS contexts (i.e. call
`mongocrypt_ctx_next_kms_ctx`) while operating on KMS contexts (e.g. calling `mongocrypt_kms_ctx_feed`). Drivers are
recommended to do an in-place retry on KMS requests.
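
The control flow of an in-place retry for a single KMS context can be sketched as follows. To stay compilable without the library, the sketch uses hypothetical function pointers as stand-ins: `round_trip` covers one attempt (honoring the delay, connecting, writing the message, and reading the reply), `feed_with_retry` stands in for feeding the reply with `mongocrypt_kms_ctx_feed_with_retry`, and `fail` stands in for `mongocrypt_kms_ctx_fail`. It assumes the library eventually stops indicating retry once its retry budget is spent; the fake at the bottom only simulates that behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-request operations. */
typedef struct {
    /* One network attempt. Returns false on a network error. */
    bool (*round_trip)(void *kms);
    /* Feeds the reply; sets *should_retry. Returns false on error. */
    bool (*feed_with_retry)(void *kms, bool *should_retry);
    /* Reports a network error; returns true if a retry is allowed. */
    bool (*fail)(void *kms);
} sketch_kms_ops_t;

/* Retries one KMS request in place until it succeeds or until the
 * stand-in library calls stop allowing retries. */
bool sketch_kms_in_place(const sketch_kms_ops_t *ops, void *kms) {
    assert(ops != NULL);
    for (;;) {
        if (!ops->round_trip(kms)) {
            if (ops->fail(kms)) {
                continue; /* network error, retry allowed */
            }
            return false; /* abort and report an error */
        }
        bool should_retry = false;
        if (!ops->feed_with_retry(kms, &should_retry)) {
            return false;
        }
        if (!should_retry) {
            return true; /* reply accepted */
        }
        /* The reply asked for a retry: loop and send the request again. */
    }
}

/* Scripted fake used only to exercise the loop. */
typedef struct {
    int network_failures_left; /* attempts that fail at the network level */
    int http_retries_left;     /* replies that ask for a retry */
    int retry_budget;          /* times fail() still allows a retry */
    int attempts;              /* total attempts made */
} sketch_fake_kms_t;

static bool fake_round_trip(void *p) {
    sketch_fake_kms_t *k = p;
    k->attempts++;
    if (k->network_failures_left > 0) {
        k->network_failures_left--;
        return false;
    }
    return true;
}

static bool fake_feed(void *p, bool *should_retry) {
    sketch_fake_kms_t *k = p;
    *should_retry = k->http_retries_left > 0;
    if (*should_retry) {
        k->http_retries_left--;
    }
    return true;
}

static bool fake_fail(void *p) {
    sketch_fake_kms_t *k = p;
    if (k->retry_budget == 0) {
        return false;
    }
    k->retry_budget--;
    return true;
}

const sketch_kms_ops_t sketch_fake_ops = {fake_round_trip, fake_feed, fake_fail};
```

Because each KMS context is retried on its own and the loop never calls `mongocrypt_ctx_next_kms_ctx`, this shape is compatible with fanning requests out in parallel, which is the constraint described above.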

**Applies to...**

All contexts.

#### State: `MONGOCRYPT_CTX_NEED_KMS_CREDENTIALS` ####

`MONGOCRYPT_CTX_NEED_KMS_CREDENTIALS` was added in libmongocrypt 1.4.0 as part of [MONGOCRYPT-382](https://jira.mongodb.org/browse/MONGOCRYPT-382).

`MONGOCRYPT_CTX_NEED_KMS_CREDENTIALS` can only be entered if `mongocrypt_setopt_use_need_kms_credentials_state` is called. This prevents breaking drivers that do not handle the `MONGOCRYPT_CTX_NEED_KMS_CREDENTIALS` state.

If a KMS provider is configured with an empty document (e.g. `{ "aws": {} }`), the `MONGOCRYPT_CTX_NEED_KMS_CREDENTIALS` state is entered before KMS requests are made.

**libmongocrypt needs**...

Credentials for one or more KMS providers.

**Driver needs to...**

Fetch credentials for supported KMS providers. See the [Client Side Encryption specification](https://github.com/mongodb/specifications/blob/master/source/client-side-encryption/client-side-encryption.md#automatic-credentials) for details.

Pass credentials to libmongocrypt using `mongocrypt_ctx_provide_kms_providers`.

**Applies to...**

All contexts.

#### State: `MONGOCRYPT_CTX_READY` ####

**Driver needs to...**

Call `mongocrypt_ctx_finalize` to perform the encryption/decryption and
get the final result.

**Applies to...**

All contexts except for create data key.

#### State: `MONGOCRYPT_CTX_DONE` ####

**Driver needs to...**

Exit the state machine loop.

**Applies to...**

All contexts.