.. _free-threaded:
.. cpp:namespace:: nanobind
Free-threaded Python
====================
**Free-threading** is an experimental new Python feature that replaces the
`Global Interpreter Lock (GIL)
<https://en.wikipedia.org/wiki/Global_interpreter_lock>`__ with a fine-grained
locking scheme to better leverage multi-core parallelism. The resulting
benefits do not come for free: extensions must explicitly opt in and generally
require careful modifications to ensure correctness.
Nanobind has supported free-threaded Python since version 2.2.0. This page
explains how to do so and discusses a few caveats. Besides this page, make sure
to review `py-free-threading.github.io <https://py-free-threading.github.io>`__
for a more comprehensive discussion of free-threaded Python. `PEP 703
<https://peps.python.org/pep-0703/>`__ explains the nitty-gritty details.
Opting in
---------
To opt into free-threaded Python, pass the ``FREE_THREADED`` parameter to the
:cmake:command:`nanobind_add_module()` CMake target command. For other build
systems, refer to their respective documentation pages.
.. code-block:: cmake

   nanobind_add_module(
     my_ext                   # Target name
     FREE_THREADED            # Opt into free-threading
     my_ext.h                 # Source code files below
     my_ext.cpp)

nanobind ignores the ``FREE_THREADED`` parameter when the detected Python
version does not support free-threading.
.. note::

   **Stable ABI**: There is currently no stable ABI for free-threaded
   Python, hence the ``STABLE_ABI`` parameter is ignored in free-threaded
   extension builds. It is nevertheless valid to combine the ``STABLE_ABI``
   and ``FREE_THREADED`` arguments: the build system will choose between the
   two depending on the detected Python version.

.. warning::

   Loading a Python extension that does not support free-threading disables
   free-threading globally. In larger binding projects with multiple
   extensions, all of them must be adapted.

If free-threading was requested and is available, the build system will set the
``NB_FREE_THREADED`` preprocessor flag. This can be helpful to specialize
binding code with ``#ifdef`` blocks, e.g.:
.. code-block:: cpp

   #if !defined(NB_FREE_THREADED)
   ... // simple GIL-protected code
   #else
   ... // more complex thread-aware code
   #endif

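
As a concrete illustration of such a specialization, the minimal sketch below
switches a data member between a plain integer and ``std::atomic`` depending on
whether ``NB_FREE_THREADED`` is defined. The ``HitCounter`` type is
hypothetical and only serves as an example. The non-atomic branch assumes that
``record()`` is only ever reached from Python while the GIL is held.

.. code-block:: cpp

   #include <atomic>

   // Hypothetical example type, not part of nanobind
   struct HitCounter {
   #if defined(NB_FREE_THREADED)
       // Free-threaded build: several threads may call record() at once
       std::atomic<long> hits{0};
   #else
       // GIL-protected build: calls arriving from Python are serialized
       long hits = 0;
   #endif

       // Atomic increment in free-threaded builds, plain increment otherwise
       void record() { ++hits; }

       // Atomic load in free-threaded builds, plain read otherwise
       long count() const { return hits; }
   };

Both branches expose the same interface, so the binding code that wraps
``HitCounter`` does not need its own ``#if`` blocks.
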
Caveats
-------
Free-threading can violate implicit assumptions made by extension developers
when previously serial operations suddenly run concurrently, producing
undefined behavior (race conditions, crashes, etc.).
Let's consider a concrete example: the binding code below defines a ``Counter``
class with an increment operation.
.. code-block:: cpp

   struct Counter {
       int value = 0;
       void inc() { value++; }
   };

   nb::class_<Counter>(m, "Counter")
       .def("inc", &Counter::inc)
       .def_ro("value", &Counter::value);

If multiple threads call the ``inc()`` method of a single ``Counter``, the
final count will generally be incorrect, as the increment operation ``value++``
does not execute atomically.
To fix this, we could modify the C++ type so that it protects its ``value``
member from concurrent modification, for example using an atomic number type
(e.g., ``std::atomic<int>``) or a critical section (e.g., based on
``std::mutex``).
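
The two options just mentioned can be sketched with the C++ standard library
alone. The names ``AtomicCounter``, ``LockedCounter``, and ``hammer()`` are
hypothetical and merely illustrate the idea; they are not part of nanobind.

.. code-block:: cpp

   #include <atomic>
   #include <mutex>
   #include <thread>
   #include <vector>

   // Variant 1: the increment is a single atomic read-modify-write operation
   struct AtomicCounter {
       std::atomic<int> value{0};
       void inc() { value.fetch_add(1, std::memory_order_relaxed); }
       int get() const { return value.load(std::memory_order_relaxed); }
   };

   // Variant 2: a mutex turns the increment into a critical section
   struct LockedCounter {
       int value = 0;
       mutable std::mutex mutex;

       void inc() {
           std::lock_guard<std::mutex> guard(mutex);
           value++;
       }

       int get() const {
           std::lock_guard<std::mutex> guard(mutex);
           return value;
       }
   };

   // Hypothetical helper: call inc() from several threads at once and
   // return the final count after all threads have been joined
   template <typename CounterT>
   int hammer(int n_threads, int n_incs) {
       CounterT counter;
       std::vector<std::thread> threads;
       for (int i = 0; i < n_threads; ++i)
           threads.emplace_back([&counter, n_incs] {
               for (int j = 0; j < n_incs; ++j)
                   counter.inc();
           });
       for (std::thread &t : threads)
           t.join();
       return counter.get();
   }

With either variant, ``hammer<...>(4, 10000)`` returns ``40000``, whereas the
original ``Counter`` may lose increments. When binding such a type, the value
would probably have to be exposed through the ``get()`` accessor (for example
as a read-only property) rather than by referring to the data member directly.
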
The race condition in the above example is relatively benign. However,
in more complex projects, combinations of concurrency and unsafe memory
accesses could introduce non-deterministic data corruption and crashes.
Another common source of problems is *global variables* that undergo concurrent
modification when no longer protected by the GIL. They will likewise require
supplemental locking. The :ref:`next section <free-threaded-locks>` explains a
Python-specific locking primitive that can be used in binding code besides
the solutions mentioned above.
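
To make the global variable case concrete, here is a minimal sketch that guards
a shared table with ``std::mutex``. The table ``g_ids`` and the function
``intern_name()`` are hypothetical names chosen for illustration.

.. code-block:: cpp

   #include <mutex>
   #include <string>
   #include <unordered_map>

   namespace {
   // Hypothetical global state that several bound functions might share
   std::unordered_map<std::string, int> g_ids;
   std::mutex g_ids_mutex; // guards every access to g_ids
   }

   // Return a stable integer ID for 'name', creating one on first use
   int intern_name(const std::string &name) {
       std::lock_guard<std::mutex> guard(g_ids_mutex);
       auto it = g_ids.find(name);
       if (it == g_ids.end())
           it = g_ids.emplace(name, static_cast<int>(g_ids.size())).first;
       return it->second;
   }

Without the lock, two threads inserting at the same time could corrupt the
hash table, which is a considerably more serious failure than a miscounted
integer.
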
Multi-threaded code that concurrently returns the same C++ instance via the
:cpp:enumerator:`nb::rv_policy::reference` policy may observe situations in
which multiple Python objects are created that all wrap the same C++ instance
(however, this is harmless aside from the duplication).
.. _free-threaded-locks:
Python locks
------------
Nanobind provides convenience functionality encapsulating the mutex
implementation that is part of Python ("``PyMutex``"). It is slightly more
efficient than OS/language-provided synchronization primitives and generally
preferable within Python extensions.
The class :cpp:class:`ft_mutex` is analogous to ``std::mutex``, and
:cpp:class:`ft_lock_guard` is analogous to ``std::lock_guard``. Note that they
only exist to add *supplemental* critical sections needed in free-threaded
Python, while becoming inactive (no-ops) when targeting regular GIL-protected
Python.
With these abstractions, the previous ``Counter`` implementation could be
rewritten as:
.. code-block:: cpp
   :emphasize-lines: 3,6

   struct Counter {
       int value = 0;
       nb::ft_mutex mutex;

       void inc() {
           nb::ft_lock_guard guard(mutex);
           value++;
       }
   };

These locks are very compact (``sizeof(nb::ft_mutex) == 1``), though this is a
Python implementation detail that could change in the future.
.. _argument-locks:
Argument locking
----------------
Modifying class and function definitions as shown above may not always be
possible. As an alternative, nanobind also provides a way to *retrofit*
supplemental locking onto existing code. The idea is to lock individual
arguments of a function *before* the function is invoked. A built-in mutex
present in every Python object enables this.
To do so, call the :cpp:func:`.lock() <arg::lock>` member of
:cpp:class:`nb::arg() <arg>` annotations to indicate that an
argument must be locked, e.g.:
- :cpp:func:`nb::arg("my_parameter").lock() <arg::lock>`
- :cpp:func:`"my_parameter"_a.lock() <arg::lock>` (short-hand form)
In method bindings, pass :cpp:struct:`nb::lock_self() <lock_self>` to lock
the implicit ``self`` argument. Note that at most two arguments can be
locked per function, which is a limitation of the `Python locking API
<https://docs.python.org/3.13/c-api/init.html#c.Py_BEGIN_CRITICAL_SECTION2>`__.
The example below shows how this functionality can be used to protect ``inc()``
and a new ``merge()`` function that acquires two simultaneous locks.
.. code-block:: cpp

   struct Counter {
       int value = 0;

       void inc() { value++; }

       void merge(Counter &other) {
           value += other.value;
           other.value = 0;
       }
   };

   nb::class_<Counter>(m, "Counter")
       .def("inc", &Counter::inc, nb::lock_self())
       .def("merge", &Counter::merge, nb::lock_self(), "other"_a.lock())
       .def_ro("value", &Counter::value);

The above solution has an obvious drawback: it only protects *bindings* (i.e.,
transitions from Python to C++). For example, if some other part of a C++
codebase calls ``merge()`` directly, the binding layer won't be involved, and
no locking takes place. If such behavior can introduce race conditions, a
larger-scale redesign of your project may be in order.
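
Argument locks are not restricted to methods. The sketch below applies the same
annotation to both parameters of a free function. The function ``transfer()``
and the module name ``my_ext`` are hypothetical, and the example needs nanobind
and the Python headers to compile.

.. code-block:: cpp

   #include <nanobind/nanobind.h>

   namespace nb = nanobind;
   using namespace nb::literals;

   struct Counter {
       int value = 0;
   };

   // Hypothetical free function that touches two Counter instances
   void transfer(Counter &src, Counter &dst) {
       dst.value += src.value;
       src.value = 0;
   }

   NB_MODULE(my_ext, m) {
       nb::class_<Counter>(m, "Counter")
           .def(nb::init<>())
           .def_ro("value", &Counter::value);

       // Lock both arguments before 'transfer' runs. This uses up the
       // limit of two locked arguments per function.
       m.def("transfer", &transfer, "src"_a.lock(), "dst"_a.lock());
   }

As before, the locks only take effect when ``transfer()`` is reached through
the binding layer.
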
.. note::
Adding locking annotations indiscriminately is inadvisable because locked
calls are more costly than unlocked ones. The :cpp:func:`.lock()
<arg::lock>` and :cpp:struct:`nb::lock_self() <lock_self>` annotations are
ignored in GIL-protected builds, hence this added cost only applies to
free-threaded extensions.
Furthermore, when adding locking annotations to a function, consider keeping
the arguments *unnamed* (i.e., :cpp:func:`nb::arg().lock() <arg::lock>`
instead of :cpp:func:`nb::arg("name").lock() <arg::lock>`) if the function
will never be called with keyword arguments. Processing named arguments
causes small :ref:`binding overheads <binding-overheads>` that may be
undesirable if a function that does very little is called at a very high
rate.
.. note::

   **Python API and locking**: When the lock-protected function performs Python
   API calls (e.g., using :ref:`wrappers <wrappers>` like :cpp:class:`nb::dict
   <dict>`), Python may temporarily release locks to avoid deadlocks. Here,
   even basic reference counting such as a :cpp:class:`nb::object
   <object>` variable expiring at the end of a scope counts as an API call.

   These locks will be reacquired following the Python API call. This behavior
   resembles ordinary (GIL-protected) Python code, where operations like
   `Py_DECREF()
   <https://docs.python.org/3/c-api/refcounting.html#c.Py_DECREF>`__ can
   cause arbitrary Python code to execute. The semantics of this kind of
   relaxed critical section are described in the `Python documentation
   <https://docs.python.org/3.13/c-api/init.html#python-critical-section-api>`__.

Miscellaneous notes
-------------------
API
___
The following API specific to free-threading has been added:
- :cpp:class:`nb::ft_mutex <ft_mutex>`
- :cpp:class:`nb::ft_lock_guard <ft_lock_guard>`
- :cpp:class:`nb::ft_object_guard <ft_object_guard>`
- :cpp:class:`nb::ft_object2_guard <ft_object2_guard>`
- :cpp:func:`nb::arg::lock() <arg::lock>`
API stability
_____________
The interface explained in this section is excluded from the project's semantic
versioning policy. Free-threading is still experimental, and API breaks may be
necessary based on future experience and changes in Python itself.
Wrappers
________
:ref:`Wrapper types <wrappers>` like :cpp:class:`nb::list <list>` may be used
in multi-threaded code. Operations like :cpp:func:`nb::list::append()
<list::append>` internally acquire locks and behave just like their ordinary
Python counterparts. This means that race conditions can still occur without
larger-scale synchronization, but such races won't jeopardize the memory safety
of the program.
GIL scope guards
________________
Prior to free-threaded Python, the nanobind scope guards
:cpp:struct:`gil_scoped_acquire` and :cpp:struct:`gil_scoped_release` would
normally be used to acquire/release the GIL and enable parallel regions.
These remain useful and should not be removed from existing code: while no
longer blocking operations, they set and unset the current Python thread
context and inform the garbage collector.
The :cpp:struct:`gil_scoped_release` RAII scope guard class plays a special
role in free-threaded builds, since it releases all :ref:`argument locks
<argument-locks>` held by the current thread.
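
A typical use of the release guard is sketched below. The function ``norm()``
and the module name ``my_ext`` are hypothetical, and the example needs nanobind
and the Python headers to compile. The function operates on a ``std::vector``
that the type caster has already copied from Python, so no Python objects are
touched while the guard is active.

.. code-block:: cpp

   #include <nanobind/nanobind.h>
   #include <nanobind/stl/vector.h>

   #include <cmath>
   #include <vector>

   namespace nb = nanobind;

   // Hypothetical long-running computation implemented in pure C++
   double norm(const std::vector<double> &v) {
       // Detach from the Python thread context for the duration of the
       // loop. No Python API calls may be made while this guard is alive.
       nb::gil_scoped_release release;

       double sum = 0.0;
       for (double x : v)
           sum += x * x;
       return std::sqrt(sum);
   }

   NB_MODULE(my_ext, m) {
       m.def("norm", &norm);
   }

If such a function were additionally bound with locked arguments, the C++
state they protect should not be assumed to stay protected inside the released
region.
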
Immortalization
_______________
Python relies on a technique called *reference counting* to determine when an
object is no longer needed. This approach can become a bottleneck in
multi-threaded programs, since increasing and decreasing reference counts
requires coordination among multiple processor cores. Python type and function
objects are especially sensitive, since their reference counts change at a very
high rate.
Similar to free-threaded Python itself, nanobind avoids this bottleneck by
*immortalizing* functions (``nanobind.nb_func``, ``nanobind.nb_method``) and
type bindings. Immortal objects don't require reference counting and therefore
cannot cause the bottleneck mentioned above. The main downside of this approach
is that these objects leak when the interpreter shuts down. Free-threaded
nanobind extensions disable the internal :ref:`leak checker <refleaks>`,
since it would produce many warning messages caused by immortal objects.
Internal data structures
________________________
Nanobind maintains various internal data structures that store information
about instances and function/type bindings. These data structures also play an
important role in exchanging type/instance data in larger projects that are
split across several independent extension modules.
The layout of these data structures differs between ordinary and free-threaded
extensions, therefore nanobind isolates them from each other by assigning a
different ABI version tag. This means that multi-module projects will need
to consistently compile either free-threaded or non-free-threaded modules.
Free-threaded nanobind uses thread-local and sharded data structures to avoid
lock and atomic contention on the internal data structures, which would
otherwise become a bottleneck in multi-threaded Python programs.
Thread sanitizers
_________________
The `thread sanitizer
<https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual>`__ (TSAN)
offers an effective way of tracking down undefined behavior in multithreaded
applications.
To use TSAN with nanobind extensions, you *must* also create a custom Python
build that has TSAN enabled. This is because nanobind internally builds on
Python locks. If the implementation of the locks is not instrumented by TSAN,
the tool will detect a large volume of false positives.
To make a TSAN-instrumented Python build, download a Python source release and
pass the following options to its ``configure`` script:
.. code-block:: bash

   $ ./configure --disable-gil --with-thread-sanitizer <.. other options ..>