1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476
|
# Writing code in aws-crt-python
`aws-crt-python` provides "language bindings", allowing Python to use the
C libraries which make up the AWS SDK Common Runtime (CRT).
You **MUST** read both [Extending Python with C](https://docs.python.org/3/extending/extending.html)
and [Coding Guidelines for the aws-c Libraries](https://github.com/awslabs/aws-c-common#coding-guidelines)
from top to bottom before going any further in this guide.
This is not easy code to write. You must know Python. You must know C.
You must learn how the aws-c libraries do error handling and memory management,
you must learn how the Python C API does error handling and memory management,
and you must mix the two styles together. This code is multithreaded and asynchronous.
Buckle up.
### Table of Contents
* [Useful Links](#useful-links)
* [Writing Python Code](#writing-python-code)
* [General](#general-python-rules)
* [Naming Conventions](#python-naming-conventions)
* [Use Type Hints](#use-type-hints)
* [Forward and Backward Compatibility](#forward-and-backward-compatibility)
* [Functions with Lots of Arguments](#functions-with-lots-of-arguments)
* [Use `None` for Optional Arguments](#use-none-as-the-default-value-for-optional-arguments)
* [Callback Signatures](#callback-signatures)
* [Be Careful When Adding](#be-careful-when)
* [Asynchronous APIs](#asynchronous-apis)
* [Don't Add Features](#dont-add-features)
* [Lifetime Management](#lifetime-management)
* [Terminology](#terminology)
* [Strong References / Reference Counting](#strong-references--reference-counting)
* [Reference Cycle](#reference-cycle)
* [Capsule](#capsule)
* [Bindings Design](#bindings-design)
* [A More Complex Example](#a-more-complex-example)
* [The Wrong Way to Build it](#the-wrong-way-to-build-it)
* [Option 1 - Pass Callbacks to C](#option-1---pass-callbacks-to-c)
* [Option 2 - Private Core Class](#option-2---private-core-class)
* [Writing C Code](#writing-c-code)
## Useful Links
* Required reading:
* [Extending Python with C](https://docs.python.org/3/extending/extending.html)
* [Coding Guidelines for the aws-c Libraries](https://github.com/awslabs/aws-c-common#coding-guidelines)
* Reference pages you'll visit 3x per day:
* [Format strings: Python -> C](https://docs.python.org/3/c-api/arg.html) -
Used by [PyArg_ParseTuple()](https://docs.python.org/3/c-api/arg.html#c.PyArg_ParseTuple)
* [Format strings: C -> Python](https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue) -
Used by [Py_BuildValue()](https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue),
[PyObject_CallMethod()](https://docs.python.org/3/c-api/call.html#c.PyObject_CallMethod), and
[PyObject_CallFunction()](https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunction)
# Writing Python Code
Follow these conventions unless you have a very convincing reason not to.
We acknowledge that our existing code isn't 100% consistent at following them.
Some features we recommend now weren't available in older versions of
Python that we used to support. Some conventions are due to lessons learned
when we had a hard time making changes to something without breaking its API.
And sometimes naming is inconsistent because the code had different authors
and our conventions weren't written down yet. But going forward
let's do it right.
## General Python Rules
### Python Naming Conventions
* Modules (files and folders) - `lowercase`
* Smoosh words together, if it's not too confusing.
* Example - `awscrt.eventstream` (NOT `aws_crt.event_stream`)
* Classes - `UpperCamelCase`
* For acronyms three letters or longer, only capitalize the first letter
* Example: `TlsContext` (NOT `TLSContext`)
* Don't repeat words in the full path.
* Example: `awscrt.mqtt.Client` (NOT `awscrt.mqtt.MqttClient`)
* Member variables - `snake_case`
* Functions - `snake_case()`
* Anything private - prefix with underscore
* Constants and Enum values - `ALL_CAPS`
* Example: `MessageType.PING`
* Time values - suffix with `_ms`, `_sec`, etc
### Use Type Hints
Use [type hints](https://docs.python.org/3/library/typing.html) in your APIs.
They help users and make it easier to write documentation.
Sadly, most of our existing code isn't using type hints because it was written
back when we supported older versions of Python
(TODO: add type hints to all our APIs).
Because type hints are newer, pay close attention in the docs before you use a feature,
to ensure it's available in our minimum supported Python version.
(TODO: add CI tests that would catch such errors)
## Forward and Backward compatibility
We need to design our APIs so that they don't break when we inevitably
add a few more configuration options to a class.
Follow these rules so we can gracefully alter the API without breaking it.
### Functions with Lots of Arguments
For functions with a lot of configuration options,
such as class `__init__()` functions, use one of the techniques below.
Complex functions inevitably get more optional arguments added over time.
Sometimes an argument even changes from required to optional.
TECHNIQUE 1 - Use [keyword-only](https://docs.python.org/3/tutorial/controlflow.html#keyword-only-arguments) arguments.
These let you introduce more arguments over time,
and they let you change an argument from required to optional.
They can also make user code more clear (i.e. `do_a_thing(ignore_errors=True)` vs `do_a_thing(True)`).
Example:
```py
class Client:
def __init__(self, *,
hostname: str, # this is required, but must be passed by keyword
port: int, # again, required
bootstrap: ClientBootstrap = None, # optional
connect_timeout_ms: int = None): # optional
```
TECHNIQUE 2 - Use an "options class", and pass that as the only argument.
It's easy to build these as a [dataclass](https://docs.python.org/3/library/dataclasses.html).
Example:
```py
@dataclass
class ClientOptions:
hostname: str
port: int
bootstrap: ClientBootstrap = None
connect_timeout_ms: int = None
class Client:
def __init__(self, options: ClientOptions):
```
The jury's currently out on which technique is better. Keyword arguments are graceful,
but "options classes" let us easily nest one set of options inside another set of options.
### Use `None` as the Default Value for Optional Arguments
Note in the examples above that `connect_timeout_ms` had a default value of `=None`,
instead of something concrete like `=5000`. This is a common in Python,
and a good practice besides. Default values sometimes change.
There are many aws-crt language bindings, and the fewer places something is hardcoded,
the easier it is to change. Ideally, all language bindings use `None` or similar
to represent "defaults please", which results in passing `0` or `NULL` down to C to
represent "defaults please", and then in a single location in C we set the actual default.
In documentation, just say "a default value is used" instead of writing in the actual value,
because the odds are good that the documentation will get out of sync with reality.
### Callback Signatures
Similar to how we build `__init__()` functions so that more options can be added over time,
we need to build callbacks so that more info can be passed to them in the future.
Public callbacks should take a single argument, which is built as a `dataclass`.
This gives us freedom to add members to the class in the future.
Example:
```py
@dataclass
class Message:
topic: str
payload: bytes
class Client:
def __init__(self, *,
...,
on_message_received: Callable[[Message], None] = None,
...)
# and then user code looks like:
def my_on_message_received_callback(msg):
print(f'Yay I got a Message: {msg}')
```
NOTE: Most of our existing code uses a different pattern for callbacks.
Instead of a single `dataclass` argument, multiple arguments are passed by keyword.
In documentation, we instruct the user to add `**kwargs` as the last argument in their function,
so that we are free to add more arguments over time without breaking user code.
This is weirder and more fragile than passing a single object.
Don't use this pattern unless you're adding to a class where it's already in use.
### Be careful when adding
1) When adding arguments to a function that is NOT using keyword-only arguments,
you MUST add new arguments to the end of the argument list.
Otherwise you may break user code that passes arguments by position.
2) When adding new members to a `dataclass`, you MUST add new members at the end.
Otherwise you may break user code that initializes the class using positional arguments.
(in Python 3.10+ there's a `kw_only` feature for `dataclass`,
but we can't use it since we support older Python versions)
## Asynchronous APIs
TODO: document when to use future vs callback
## Don't Add Features
When binding an API from the aws-c libraries, don't start adding extra features.
If you're tempted to add any "special logic" that would be valuable to the other aws-crt language bindings,
add that logic in the underlying aws-c library so that every other language can benefit.
Even for trivial things like picking nice default values, put it in the underlying aws-c library.
(see [Use `None` for Optional Arguments](#use-none-as-the-default-value-for-optional-arguments)).
# Lifetime Management
## Terminology
### Strong References / Reference Counting
A "strong reference" is one that keeps an object alive by incrementing its reference count.
To "release" the reference is to decrement the object's reference count.
When all references to an object are released, its reference count goes to zero and it gets cleaned up.
In pure Python code, every variable is a strong reference to an object.
When the variable goes away, the reference is released.
In C code, reference counts on Python objects are controlled using `Py_INCREF(x)` and `Py_DECREF(x)`.
Structs from the aws-c libraries have `_acquire(x)` and `_release(x)` calls to control their reference counts.
We'll talk more about this [later](#reference-counting-in-c).
### Reference Cycle
A "reference cycle" is when a circle of strong references is created.
Reference cycles cause memory to leak because the reference counts never get to zero.
Python has a [garbage collector](https://devguide.python.org/internals/garbage-collector)
that can detect and clean up reference cycles among normal Python objects.
HOWEVER, any cycle involving a `Py_INCREF(x)` from C creates an undetectable cycle.
You MUST NOT create reference cycles when designing bindings.
### Capsule
[PyCapsule](https://docs.python.org/3/extending/extending.html#using-capsules)
lets us bind the lifetime of a C struct to the lifetime of a Python object.
It's a Python object that holds a C pointer and a "destructor" function pointer.
When Python cleans up the `PyCapsule`, the destructor function will be called.
## Bindings Design
Let's look at the bindings for `aws_event_loop_group` (our I/O thread pool).
This diagram shows the strong references between objects in Python and C:

Description of parts (from bottom to top):
* `aws_event_loop_group` - The underlying native implementation struct,
which knows nothing about Python.
* Lives in C library: [aws-c-io](https://github.com/awslabs/aws-c-io)
* Git submodule location: [crt/aws-c-io](/crt/aws-c-io)
* Header file: [<aws/io/event_loop.h>](https://github.com/awslabs/aws-c-io/blob/main/include/aws/io/event_loop.h)
* `event_loop_group_binding` - The "bindings" struct.
Holds a strong reference to the underlying native implementation (usually in a member variable named "native")
* Lives in Python/C extension module: `_awscrt`
* Source file: [source/io.c](/source/io.c)
* `PyCapsule` - The Python object which "owns" the pointer to `event_loop_group_binding`.
* `EventLoopGroup` - The Python class that users create and interact with.
Holds a reference to the `PyCapsule` in a member variable named "_binding".
* Lives in Python module: `awscrt.io`
* Source File: [awscrt/io.py](/awscrt/io.py)
Creation goes like this:
* User's Python code creates an `EventLoopGroup`:
```py
elg = EventLoopGroup()
```
* `EventLoopGroup` initializer looks something like:
```py
class EventLoopGroup:
def __init__(self, ...):
self._binding = _awscrt.event_loop_group_new(...)
```
* `_awscrt.event_loop_group_new(...)` is Python calling down into C.
The C function looks something like:
```C
PyObject *aws_py_event_loop_group_new(PyObject *self, PyObject *args) {
// ...parse arguments...
// allocate memory for binding struct
struct event_loop_group_binding *binding = aws_mem_calloc(...);
// create underlying implementation
binding->native = aws_event_loop_group_new(...);
// create PyCapsule which owns the binding struct.
// pass in "destructor" function that runs when
// PyCapsule is cleaned up by the garbage collector.
PyObject *capsule = PyCapsule_New(binding, on_capsule_destroyed_fn);
return capsule;
}
```
* Things stay alive because:
* The user's `elg` variable keeps the `EventLoopGroup` object alive.
* Member variable `EventLoopGroup._binding` keeps the `PyCapsule` alive.
* The `PyCapsule` keeps the `struct event_loop_group_binding` alive.
* The `event_loop_group_binding.native` pointer is a "strong reference" that keeps
`struct aws_event_loop_group` alive.
Destruction goes like this (it's actually more complex, we'll cover that later):
* When the user's Python code has no references to `elg`, the `EventLoopGroup` instance...
* The garbage collector cleans up the `EventLoopGroup`,
and the `PyCapsule` referenced by `EventLoopGroup._binding`.
* The `PyCapsule`'s destructor function runs, which looks something like:
```C
static on_capsule_destroyed_fn(PyObject *capsule) {
struct event_loop_group_binding *binding = PyCapsule_GetPointer(capsule);
// release reference to underlying implementation
aws_event_loop_group_release(binding->native)
// free binding struct's memory
aws_mem_release(binding);
}
```
* IF nothing else has a strong reference to `struct aws_event_loop_group`:
* then it begins its shutdown process, and its memory is cleaned up when shutdown completes.
* ELSE something else has a strong reference to `struct aws_event_loop_group`:
* so it won't begin its shutdown until the last reference is released.
Note: In the past, the aws-c libraries didn't have reference counting for any C structs.
You will still find older code in our Python bindings that tries to keep the entire dependency trees
of Python objects alive via `Py_INCREF(x)` (TODO: remove needless complexity).
You can't always look at existing code to see "the right way" of doing things.
## A More Complex Example
The sample above is simplified, it only shows Python calling into C.
But `aws_event_loop_group` has a callback that fires when it finishes shutting down.
That means C needs to call into Python AFTER the Python `EventLoopGroup` object has been cleaned up.
For C to call into Python, it must reference a Python object
(the function itself, or an object with a member function).
This means our binding needs to store a strong reference and
keep that Python object alive until the callback has fired.
### The Wrong Way to Build it
You MUST NOT create a reference cycle!
You might be tempted to give `event_loop_group_binding` a strong reference to the `EventLoopGroup` instance.
Then C could simply call a private member function like `EventLoopGroup._on_shutdown_complete()`.
But this design creates a reference cycle (see image below):

### Option 1 - Pass Callbacks to C
Most of our bindings work like this:

Creation is similar to the [simple binding](#bindings-design), except:
* Within `EventLoopGroup.__init__()` a "callable" is defined and passed down to C.
The code looks something like:
```py
class EventLoopGroup:
def __init__(self, ...):
# define callable local function
def shutdown_callback():
...do stuff...
self._binding = _awscrt.event_loop_group_new(shutdown_callback, ...)
```
* `event_loop_group_binding` keeps a strong reference to this Python object.
The extra code looks something like:
```C
PyObject *aws_py_event_loop_group_new(PyObject *self, PyObject *args) {
// ...parse arguments, creating binding struct, etc same as before...
// store strong reference to callable
binding->py_shutdown_callback = Py_INCREF(py_shutdown_callback);
```
When the final shutdown callback happens in C, the Python callable
is invoked, and then the reference is released via `Py_DECREF(x)`.
Destruction goes like this:
* When the Python code has no references to `elg`, the `EventLoopGroup` instance...
* The garbage collector cleans up the `EventLoopGroup`,
and the `PyCapsule` referenced by `EventLoopGroup._binding`.
* The `PyCapsule` runs its destructor function.
* The destructor function calls `aws_event_loop_group_release(binding->native)`,
but doesn't delete the `event_loop_group_binding` struct yet.
* The `aws_event_loop_group` won't shutdown until nothing else is referencing it.
Even when the final reference is released, it still needs to wait for the threads
in its thread-pool to finish their shutdown process.
* Finally, shutdown completes and the C callback is invoked.
* The C callback invokes the Python `callable`, then releases it via `Py_DECREF(x)`.
* Now the garbage collector can clean up the `callable` object.
* The C callback finally deletes the `event_loop_group_binding`.
This struct only existed to keep two strong references,
but now they've both been released.
### Option 2 - Private Core Class
Another option is to build a private `_Core` class containing anything
that may need to outlive the main Python object:

This is similar to [Option 1](#option-1---pass-callbacks-to-c),
but we write callbacks as member functions on the `_Core` class,
instead of defining local functions within the body of `EventLoopGroup.__init__(self)`.
Code looks something like:
```py
class EventLoopGroup:
def __init__(self, ...):
core = _EventLoopGroupCore()
self._binding = _awscrt.event_loop_group_new(core, ...)
class _EventLoopGroupCore:
def shutdown_callback(self):
...do stuff...
```
This technique hasn't actually been used, but the author of this doc
thinks it might be a graceful way to build in the future.
# Writing C Code
## Reference Counting in C
Read python.org's guide to [Extending Python with C](https://docs.python.org/3/extending/extending.html)
from top to bottom. It does an excellent job teaching about reference counts.
Great, now you know what "strong references" and "borrowed references"
are, all about `Py_INCREF(x)` and `Py_DECREF(x)` and when you do an do not
need to call them. You know that you must be EXTREMELY CAREFUL with reference
counts, because if you don't do it PERFECTLY then you will leak memory,
or crash due to double-free, or crash due use-after-free.
Thanks for reading that guide in full.
Read the docs for EVERY SINGLE Python API call you make in C,
to see whether it returns a new reference or borrowed reference.
You should add `/* new reference */` and `/* borrowed reference */` comments
next to these calls so it's clear to any future people that touch this code.
You are also encouraged to use [Py_XDECREF(x)](https://docs.python.org/3/c-api/refcounting.html#c.Py_XDECREF)
and [Py_CLEAR(x)](https://docs.python.org/3/c-api/refcounting.html#c.Py_CLEAR),
which are safer versions of the basic `Py_DECREF(x)`.
In the aws-c libraries, reference counting on structs is done using `_acquire(x)` and `_release(x)` functions.
Structs will keep each other alive as long as necessary using these functions.
For example, `struct aws_http_connection` needs `struct aws_event_loop_group` (an I/O thread pool)
to exist for the duration of the connection.
Therefore, the connection's creation function takes a pointer to the thread pool
and calls `aws_event_loop_group_acquire(x)` to keep it alive.
When the connection dies it calls `aws_event_loop_group_release(x)` to release
the thread pool.
Not every struct in the aws-c libraries has `_acquire(x)` and `_release(x)` functions
(simple datastructures like `struct aws_byte_buf` are not reference counted).
Only heap-allocated structs with complex or unpredictable lifetimes have these functions.
Every struct bound to a Python class is considered to have an unpredictable lifetime
because we don't know what our users' Python code will look like.
We can't assume a Python programmer will carefully store variables to each item
in a tree of dependencies, ensuring everything stays alive for "the right" length of time.
Python programmers just don't work that way, and they shouldn't.
Python is a garbage collected language. Garbage collected languages exist
to free programmers from wasting their time on that kind of tedium.
TODO:
Talk about how our tests can and cannot check for leaks.
Talk about which classes require a `close()` function, and which don't.
Suggest writing as little C code as possible.
Recommend error-handling strategies.
Talk about the allocators (tracked vs untracked)
Talk about logging. Consider making it easier to turn on logging.
Talk about sloppy shutdown.
|