File: free_threaded.rst

package info (click to toggle)
nanobind 2.11.0-3
links: PTS, VCS
area: main
in suites: forky, sid
size: 3,300 kB
sloc: cpp: 12,232; python: 6,315; ansic: 4,813; makefile: 22; sh: 15
file content (326 lines) | stat: -rw-r--r-- 13,023 bytes
parent folder | download | duplicates (2)
.. _free-threaded:

.. cpp:namespace:: nanobind

Free-threaded Python
====================

**Free-threading** is an experimental new Python feature that replaces the
`Global Interpreter Lock (GIL)
<https://en.wikipedia.org/wiki/Global_interpreter_lock>`__ with a fine-grained
locking scheme to better leverage multi-core parallelism. The resulting
benefits do not come for free: extensions must explicitly opt-in and generally
require careful modifications to ensure correctness.

Nanobind can target free-threaded Python since version 2.2.0. This page
explains how to do so and discusses a few caveats. Besides this page, make sure
to review `py-free-threading.github.io <https://py-free-threading.github.io>`__
for a more comprehensive discussion of free-threaded Python. `PEP 703
<https://peps.python.org/pep-0703/>`__ explains the nitty gritty details.

Opting in
---------

To opt into free-threaded Python, pass the ``FREE_THREADED`` parameter to the
:cmake:command:`nanobind_add_module()` CMake target command. For other build
systems, refer to their respective documentation pages.

.. code-block:: cmake

      nanobind_add_module(
        my_ext                   # Target name
        FREE_THREADED            # Opt into free-threading
        my_ext.h                 # Source code files below
        my_ext.cpp)

nanobind ignores the ``FREE_THREADED`` parameter when the registered Python
version does not support free-threading.

.. note::

   **Stable ABI**: Note that there currently is no stable ABI for free-threaded
   Python, hence the ``STABLE_ABI`` parameter will be ignored in free-threaded
   extensions builds. It is valid to combine the ``STABLE_ABI`` and
   ``FREE_THREADED`` arguments: the build system will choose between the two
   depending on the detected Python version.

.. warning::

   Loading an Python extension that does not support free-threading disables
   free-threading globally. In larger binding projects with multiple
   extensions, all of them must be adapted.

If free-threading was requested and is available, the build system will set the
``NB_FREE_THREADED`` preprocessor flag. This can be helpful to specialize
binding code with ``#ifdef`` blocks, e.g.:

.. code-block:: cpp

   #if !defined(NB_FREE_THREADED)
   ... // simple GIL-protected code
   #else
   ... // more complex thread-aware code
   #endif

Caveats
-------

Free-threading can violate implicit assumptions made by extension developers
when previously serial operations suddenly run concurrently, producing
undefined behavior (race conditions, crashes, etc.).

Let's consider a concrete example: the binding code below defines a ``Counter``
class with an increment operation.

.. code-block:: cpp

   struct Counter {
       int value = 0;
       void inc() { value++; }
   };

   nb::class_<Counter>(m, "Counter")
       .def("inc", &Counter::inc)
       .def_ro("value", &Counter::value);


If multiple threads call the ``inc()`` method of a single ``Counter``, the
final count will generally be incorrect, as the increment operation ``value++``
does not execute atomically.

To fix this, we could modify the C++ type so that it protects its ``value``
member from concurrent modification, for example using an atomic number type
(e.g., ``std::atomic<int>``) or a critical section (e.g., based on
``std::mutex``).

The race condition in the above example is relatively benign. However,
in more complex projects, combinations of concurrency and unsafe memory
accesses could introduce non-deterministic data corruption and crashes.

Another common source of problems are *global variables* undergoing concurrent
modification when no longer protected by the GIL. They will likewise require
supplemental locking. The :ref:`next section <free-threaded-locks>` explains a
Python-specific locking primitive that can be used in binding code besides
the solutions mentioned above.

Multi-threaded code that concurrently returns the same C++ instance via the
:cpp:enumerator:`nb::rv_policy::reference` policy may observe situations, where
multiple Python objects are created that all wrap the same C++ instance
(however, this is harmless aside from the duplication).

.. _free-threaded-locks:

Python locks
------------

Nanobind provides convenience functionality encapsulating the mutex
implementation that is part of Python ("``PyMutex``"). It is slightly more
efficient than OS/language-provided synchronization primitives and generally
preferable within Python extensions.

The class :cpp:class:`ft_mutex` is analogous to ``std::mutex``, and
:cpp:class:`ft_lock_guard` is analogous to ``std::lock_guard``. Note that they
only exist to add *supplemental* critical sections needed in free-threaded
Python, while becoming inactive (no-ops) when targeting regular GIL-protected
Python.

With these abstractions, the previous ``Counter`` implementation could be
rewritten as:

.. code-block:: cpp
   :emphasize-lines: 3,6

   struct Counter {
       int value = 0;
       nb::ft_mutex mutex;

       void inc() {
           nb::ft_lock_guard guard(mutex);
           value++;
       }
   };

These locks are very compact (``sizeof(nb::ft_mutex) == 1``), though this is a
Python implementation detail that could change in the future.

.. _argument-locks:

Argument locking
----------------

Modifying class and function definitions as shown above may not always be
possible. As an alternative, nanobind also provides a way to *retrofit*
supplemental locking onto existing code. The idea is to lock individual
arguments of a function *before* being allowed to invoke it. A built-in mutex
present in every Python object enables this.

To do so, call the :cpp:func:`.lock() <arg::lock>` member of
:cpp:class:`nb::arg() <arg>` annotations to indicate that an
argument must be locked, e.g.:

- :cpp:func:`nb::arg("my_parameter").lock() <arg::lock>`
- :cpp:func:`"my_parameter"_a.lock() <arg::lock>` (short-hand form)

In methods bindings, pass :cpp:struct:`nb::lock_self() <lock_self>` to lock
the implicit ``self`` argument. Note that at most 2 arguments can be
locked per function, which is a limitation of the `Python locking API
<https://docs.python.org/3.13/c-api/init.html#c.Py_BEGIN_CRITICAL_SECTION2>`__.

The example below shows how this functionality can be used to protect ``inc()``
and a new ``merge()`` function that acquires two simultaneous locks.

.. code-block:: cpp

   struct Counter {
       int value = 0;

       void inc() { value++; }
       void merge(Counter &other) {
           value += other.value;
           other.value = 0;
       }
   };

   nb::class_<Counter>(m, "Counter")
       .def("inc", &Counter::inc, nb::lock_self())
       .def("merge", &Counter::merge, nb::lock_self(), "other"_a.lock())
       .def_ro("value", &Counter::value);

The above solution has an obvious drawback: it only protects *bindings* (i.e.,
transitions from Python to C++). For example, if some other part of a C++
codebase calls ``merge()`` directly, the binding layer won't be involved, and
no locking takes place. If such behavior can introduce race conditions, a
larger-scale redesign of your project may be in order.

.. note::

   Adding locking annotations indiscriminately is inadvisable because locked
   calls are more costly than unlocked ones. The :cpp:func:`.lock()
   <arg::lock>` and :cpp:struct:`nb::lock_self() <lock_self>` annotations are
   ignored in GIL-protected builds, hence this added cost only applies to
   free-threaded extensions.

   Furthermore, when adding locking annotations to a function, consider keeping
   the arguments *unnamed* (i.e., :cpp:func:`nb::arg().lock() <arg::lock>`
   instead of :cpp:func:`nb::arg("name").lock() <arg::lock>`) if the function
   will never be called with keyword arguments. Processing named arguments
   causes small :ref:`binding overheads <binding-overheads>` that may be
   undesirable if a function that does very little is called at a very high
   rate.

.. note::

   **Python API and locking**: When the lock-protected function performs Python
   API calls (e.g., using :ref:`wrappers <wrappers>` like :cpp:class:`nb::dict
   <dict>`), Python may temporarily release locks to avoid deadlocks. Here,
   even basic reference counting such as a :cpp:class:`nb::object
   <object>` variable expiring at the end of a scope counts as an API call.

   These locks will be reacquired following the Python API call. This behavior
   resembles ordinary (GIL-protected) Python code, where operations like
   `Py_DECREF()
   <https://docs.python.org/3/c-api/refcounting.html#c.Py_DECREF>`__ can cause
   cause arbitrary Python code to execute. The semantics of this kind of
   relaxed critical section are described in the `Python documentation
   <https://docs.python.org/3.13/c-api/init.html#python-critical-section-api>`__.

Miscellaneous notes
-------------------

API
---

The following API specific to free-threading has been added:

- :cpp:class:`nb::ft_mutex <ft_mutex>`
- :cpp:class:`nb::ft_lock_guard <ft_lock_guard>`
- :cpp:class:`nb::ft_object_guard <ft_object_guard>`
- :cpp:class:`nb::ft_object2_guard <ft_object2_guard>`
- :cpp:func:`nb::arg::lock() <arg::lock>`

API stability
_____________

The interface explained in this is excluded from the project's semantic
versioning policy. Free-threading is still experimental, and API breaks may be
necessary based on future experience and changes in Python itself.

Wrappers
________

:ref:`Wrapper types <wrappers>` like :cpp:class:`nb::list <list>` may be used
in multi-threaded code. Operations like :cpp:func:`nb::list::append()
<list::append>` internally acquire locks and behave just like their ordinary
Python counterparts. This means that race conditions can still occur without
larger-scale synchronization, but such races won't jeopardize the memory safety
of the program.

GIL scope guards
________________

Prior to free-threaded Python, the nanobind scope guards
:cpp:struct:`gil_scoped_acquire` and :cpp:struct:`gil_scoped_release` would
normally be used to acquire/release the GIL and enable parallel regions.

These remain useful and should not be removed from existing code: while no
longer blocking operations, they set and unset the current Python thread
context and inform the garbage collector.

The :cpp:struct:`gil_scoped_release` RAII scope guard class plays a special
role in free-threaded builds, since it releases all :ref:`argument locks
<argument-locks>` held by the current thread.

Immortalization
_______________

Python relies on a technique called *reference counting* to determine when an
object is no longer needed. This approach can become a bottleneck in
multi-threaded programs, since increasing and decreasing reference counts
requires coordination among multiple processor cores. Python type and function
objects are especially sensitive, since their reference counts change at a very
high rate.

Similar to free-threaded Python itself,  nanobind avoids this bottleneck by
*immortalizing* functions (``nanobind.nb_func``, ``nanobind.nb_method``) and
type bindings. Immortal objects don't require reference counting and therefore
cannot cause the bottleneck mentioned above. The main downside of this approach
is that these objects leak when the interpreter shuts down. Free-threaded
nanobind extensions disable the internal :ref:`leak checker <refleaks>`,
since it would produce many warning messages caused by immortal objects.

Internal data structures
________________________

Nanobind maintains various internal data structures that store information
about instances and function/type bindings. These data structures also play an
important role to exchange type/instance data in larger projects that are split
across several independent extension modules.

The layout of these data structures differs between ordinary and free-threaded
extensions, therefore nanobind isolates them from each other by assigning a
different ABI version tag. This means that multi-module projects will need
to consistently compile either free-threaded or non-free-threaded modules.

Free-threaded nanobind uses thread-local and sharded data structures to avoid
lock and atomic contention on the internal data structures, which would
otherwise become a bottleneck in multi-threaded Python programs.

Thread sanitizers
_________________

The `thread sanitizer
<https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual>`__ (TSAN)
offers an effective way of tracking down undefined behavior in multithreaded
application.

To use TSAN with nanonbind extensions, you *must* also create a custom Python
build that has TSAN enabled. This is because nanobind internally builds on
Python locks. If the implementation of the locks is not instrumented by TSAN,
the tool will detect a large volume of false positives.

To make a TSAN-instrumented Python build, download a Python source release and
to pass the following options to its ``configure`` script:

.. code-block:: bash

   $ ./configure --disable-gil --with-thread-sanitizer <.. other options ..>