1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485
|
.. _refleaks:
.. cpp:namespace:: nanobind
Reference leaks
===============
When the Python interpreter shuts down, nanobind may generate warnings similar
to the following:
.. code-block:: text
nanobind: leaked 1 instances!
- leaked instance 0x102123728 of type "my_ext.MyClass"
nanobind: leaked 1 types!
- leaked type "my_ext.MyClass"
nanobind: leaked 1 functions!
- leaked function "__init__"
nanobind: this is likely caused by a reference counting issue in the binding code.
See https://nanobind.readthedocs.io/en/latest/refleaks.html
Reference leaks are the most frequently reported issue in this project—please
read this page carefully before opening a bug report.
If you are a user of an extension experiencing this issue (e.g., ``my_ext`` in the
example above), **do not open a nanobind issue**. Instead, inform the extension
maintainers and direct them to this page.
Why are these warnings generated?
---------------------------------
nanobind registers a callback that runs once Python has fully shut down. If any
nanobind-created instances, functions, or types still exist at this point,
something has gone wrong—they should have been automatically deleted by
Python's garbage collector.
As Python objects can reference significant amounts of memory (e.g.,
large CPU or GPU tensors), an inability to delete them is potentially very bad.
Although leaks aren't always a serious problem, the decision was made to have
nanobind complain noisily about their presence to to encourage early detection
and resolution. Other binding tools do not report leaks, allowing them to
accumulate unnoticed until they cause serious problems.
Disabling leak warnings
-----------------------
Ignorance is bliss. If you prefer to simply not see these messages, you can
easily disable them by calling :cpp:func:`nb::set_leak_warnings()
<set_leak_warnings>` in your binding code:
.. code-block:: cpp
NB_MODULE(my_ext, m) {
nb::set_leak_warnings(false);
// ...
}
Note that is a *global flag* shared by all nanobind extension libraries in the
same ABI domain. When changing global flags, please isolate your extension from
others by passing the ``NB_DOMAIN`` parameter to the
:cmake:command:`nanobind_add_module()` CMake command:
.. code-block:: cmake
nanobind_add_module(
my_module
NB_DOMAIN my_abi_domain
extension_file.cpp)
Some projects choose to activate leak warnings for internal builds but disable
them for wheels uploaded to PyPI, as they can be confusing to end users.
Reference counting fundamentals
-------------------------------
Let's begin with some background material to understand the cause of these warnings.
Each Python object tracks its *reference count*—the number of places it is
used. When this count reaches zero, the object is automatically deallocated.
This mechanism is simple and efficient but it fails in the presence of
reference cycles, where objects indirectly references themselves (e.g., A → B → C →
A). In this case, the reference count never reaches zero, and reference
counting alone does not suffice to free the cycle (even if ``A``, ``B``, and
``C`` are not used anywhere else).
In such cases, Python's cyclic *garbage collector* must step in. The garbage
collector periodically sweeps through all Python objects to find and break up
any cycles that are not referenced by other objects. To do its job, it must
know how objects are connected to each other.
For pure Python code, this works seamlessly. Consider the following snippet:
.. code-block:: python
l = [] # 'l' ref. count = 1
l.append(l) # 'l' ref. count = 2
del l # 'l' ref. count = 1
Following the last line, the reference count of ``l`` remains ``1`` due to the
self-reference. Python's garbage collector will eventually visit the list and
its elements, identify the cycle, and delete it.
Sources of reference leaks
--------------------------
Under-defined types impede Python's ability to detect cycles, which can causes
leaks. However, user-defined types alone aren't enough---a specific mixture of
ingredients is needed to cause leaks. The following subsections review several
troublesome constructions.
Class members
^^^^^^^^^^^^^
Consider this nanobind extension:
.. code-block:: cpp
#include <nanobind/nanobind.h>
namespace nb = nanobind;
struct Wrapper { nb::object value; };
NB_MODULE(my_ext, m) {
nb::class_<Wrapper>(m, "Wrapper")
.def(nb::init<>())
.def_rw("value", &Wrapper::value);
}
Now, run the following Python code.
.. code-block:: pycon
>>> import my_ext
>>> w = my_ext.Wrapper()
>>> w.value = w
This triggers a leak warning:
.. code-block:: text
nanobind: leaked 1 instances!
- leaked instance 0x104d63728 of type "my_ext.Wrapper"
nanobind: leaked 1 types!
- leaked type "my_ext.Wrapper"
nanobind: leaked 3 functions!
- leaked function ""
- leaked function ""
- leaked function "__init__"
This resembles the previous example with a self-referential list,
except that a user-defined ``Wrapper`` type is now used instead.
The first message ("*leaked instance*") warns that a Python object of type
``Wrapper`` was not freed during the Python interpreter shutdown. This instance
in turn references other objects, which also become part of the leak:
- ``w`` implicitly references the underlying type object ``my_ext.Wrapper``.
- ``my_ext.Wrapper`` references several methods: ``__init__``, and anonymous
setter/getter functions.
The root of the problem here is that Python lacks the ability to peek inside
the C++ ``Wrapper`` class to examine its connectivity. Therefore, it cannot
detect and free the cycle.
The fact that we are storing a ``nb::object`` in the C++ instance is
irrelevant---the same issue would have occurred when using
``std::shared_ptr<Wrapper>`` or an intrusively reference-counted object.
Function objects
^^^^^^^^^^^^^^^^
Functions are often a source of reference cycles. Let's reuse the earlier
example but instead assign a local function ``g`` to ``w.value``.
.. code-block:: pycon
>>> def f():
... w = my_ext.Wrapper()
... def g():
... return w
... w.value = g
...
>>> f()
This code behaves very badly: every call to ``f()`` will leak an uncollectable cycle.
The local function ``g()`` is a `function closure
<https://en.wikipedia.org/wiki/Closure_(computer_programming)>`_. That is to
say, besides being a function, it additionally captures variable state, in this
case the variable ``w``. This creates an inter-language ``Wrapper`` →
``function`` → ``Wrapper`` cycle.
Here is another tricky case: let's move the code back to the top level and
create a dummy function that doesn't reference anything.
.. code-block:: python
>>> def f():
... pass
...
>>> w = my_ext.Wrapper()
>>> w.value = f
Given that the function is now empty, we may be tempted to assume that this
should fix the leak. However, this intuition is incorrect:
.. code-block:: text
nanobind: leaked 1 instances!
- leaked instance 0x104d63728 of type "my_ext.Wrapper"
nanobind: leaked 1 types!
- leaked type "my_ext.Wrapper"
nanobind: leaked 3 functions!
- leaked function ""
- leaked function ""
- leaked function "__init__"
The reference cycle consists of:
- ``w`` (``Wrapper`` instance) → ``f`` (Python function object).
- ``f`` (Python function object) → ``globals()``.
- ``globals()`` → ``w`` (``Wrapper`` instance).
Functions *implicitly* depend on the global module namespace, which in turn
associates the name ``w`` with the instance. Reference leaks involving globals
can be especially noisy because they can pull in thousands of other objects
that dangle from the uncollectable cycle.
Modifying ``Wrapper`` so that it uses an STL function object does not help.
.. code-block:: cpp
#include <nanobind/stl/functional.h>
struct Wrapper {
std::function<void()> value;
};
This produces same cycle, just with more layers of indirection:
- ``w`` → ``std::function<void()>`` instance
- ``std::function<void()>`` instance → nanobind function dispatch object
- nanobind function dispatch object → ``f``.
- ``f`` → ``globals()``.
- ``globals()`` → ``w``.
It is easy to encounter such cycles when binding C++ classes with callbacks
that invoke Python functions. An example would be a button class in a GUI
framework that allows the user to assign a button press handler.
Default arguments
^^^^^^^^^^^^^^^^^
Here is another subtle case, where the ``Wrapper`` constructor was modified
to set a default argument.
.. code-block:: cpp
struct Wrapper { nb::object value; };
NB_MODULE(my_ext, m) {
nb::class_<Wrapper>(m, "Wrapper")
.def(nb::init<Wrapper>() = Wrapper());
}
Now, we *don't even need to use* the ``Wrapper`` type.
.. code-block:: python
import my_ext
Its mere presence produces a leak:
.. code-block:: text
nanobind: leaked 1 instances!
- leaked instance 0x1035fbb68 of type "my_ext.Wrapper"
nanobind: leaked 1 types!
- leaked type "my_ext.Wrapper"
nanobind: leaked 1 functions!
- leaked function "__init__"
The reference cycle here is as follows:
- ``my_ext.Wrapper`` type → ``my_ext.Wrapper.__init__`` function
- ``my_ext.Wrapper.__init__`` function → ``my_ext.Wrapper`` instance (the constructed default argument)
- ``my_ext.Wrapper`` instance → ``my_ext.Wrapper`` type (instances implictly reference their type)
Default arguments in general are harmless. However, default arguments that
introduce cycles between instance and type objects can cause uncollectable cycles.
.. _fixing_refleaks:
Fixing reference leaks
----------------------
As the above examples hopefully demonstrate, this can be quite the
minefield---and these were "easy" cycles with only only a few hops. In
practice, leaks can be significantly more complex.
For this reason, it is recommended that you *immediately* investigate and
squash leaks when they occur, especially while things are still under control
(i.e., when there is only a single source of leaks). Start by building your
extension in debug mode, in which case Dr.Jit will exhaustively print warnings
about all leaked instances/type.
Look at the listed types and think about what objects they reference directly
or indirectly. C++ code that stores Python functions (i.e., callbacks) is
especially suspect, since functions can implicitly depend on globals and other
state through theyr closure object. Does a simple ``import`` statement suffice to
cause leaks? This might implicate default function arguments.
Once you have identified a type binding as likely culprit, you must tell Python
how to traverse instances of this type to break cycles. nanobind provides no
abstractions for this at the moment. You must drop down to the CPython API
level and declare two callbacks (referred to as *type slots*):
- ``tp_traverse``: Python's GC will call this function to discover references
of user-defined types.
- ``tp_clear``: Python's GC will call this function to break collectable cycles.
In particular, *all* types in the cycle must implement the ``tp_traverse``
*type slot*, and *at least one* of them must implement the ``tp_clear`` type
slot.
Here is an example of the required code for a ``Wrapper`` type:
.. code-block:: cpp
struct Wrapper { std::shared_ptr<Wrapper> value; };
int wrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
// On Python 3.9+, we must traverse the implicit dependency
// of an object on its associated type object.
#if PY_VERSION_HEX >= 0x03090000
Py_VISIT(Py_TYPE(self));
#endif
// The tp_traverse method may be called after __new__ but before or during
// __init__, before the C++ constructor has been completed. We must not
// inspect the C++ state if the constructor has not yet completed.
if (!nb::inst_ready(self)) {
return 0;
}
// Get the C++ object associated with 'self' (this always succeeds)
Wrapper *w = nb::inst_ptr<Wrapper>(self);
// If w->value has an associated Python object, return it.
// If not, value.ptr() will equal NULL, which is also fine.
nb::handle value = nb::find(w->value);
// Inform the Python GC about the instance
Py_VISIT(value.ptr());
return 0;
}
int wrapper_tp_clear(PyObject *self) {
// Get the C++ object associated with 'self' (this always succeeds)
Wrapper *w = nb::inst_ptr<Wrapper>(self);
// Break the reference cycle!
w->value = {};
return 0;
}
// Table of custom type slots we want to install
PyType_Slot wrapper_slots[] = {
{ Py_tp_traverse, (void *) wrapper_tp_traverse },
{ Py_tp_clear, (void *) wrapper_tp_clear },
{ 0, 0 }
};
The types ``visitproc``, ``PyType_Slot``, and macro ``Py_VISIT()`` are part of
the Python C API.
The expression :cpp:func:`nb::inst_ptr\<Wrapper\>(self) <inst_ptr>` efficiently
returns the C++ instance associated with a Python object and is explained in
the documentation about nanobind's :cpp:ref:`low level interface <lowlevel>`.
Note the use of the :cpp:func:`nb::find() <find>` function, which behaves like
:cpp:func:`nb::cast() <cast>` by returning the Python object associated with a
C++ instance. The main difference is that :cpp:func:`nb::cast() <cast>` will
create the Python object if it doesn't exist, while :cpp:func:`nb::find()
<find>` returns a ``nullptr`` object in that case. When given a
``std::function<>`` instance, :cpp:func:`nb::find() <find>` retrieves the
associated Python ``function`` object (if present), which means that the
``wrapper_tp_traverse()`` works for all of the examples shown in this
documentation section.
To activate this machinery, the ``Wrapper`` type bindings must be made aware of
these extra type slots via :cpp:class:`nb::type_slots <type_slots>`:
.. code-block:: cpp
nb::class_<Wrapper>(m, "Wrapper", nb::type_slots(slots))
With this change, the cycle can be garbage-collected, and the leak warnings
disappear.
.. note::
When targeting free-threaded Python, it is important that the ``tp_traverse``
callback does not hold additional references to the objects being traversed.
A previous version of this documentation page suggested the following
.. code-block:: cpp
nb::object value = nb::find(w->value);
Py_VISIT(value.ptr());
However, these now have to change to
.. code-block:: cpp
nb::handle value = nb::find(w->value);
Py_VISIT(value.ptr());
Additional sources of leaks
---------------------------
In most of cases, leaks are caused by cycles, and the text above explains
how deal with them. For completeness, let's consider some other possibilities.
- **Reference counting bugs**. If you write raw Python C API code or use the
nanobind wrappers including functions like ``Py_[X]INCREF()``,
``Py_[X]DECREF()``, :cpp:func:`nb::steal() <steal>`, :cpp:func:`nb::borrow()
<borrow>`, :cpp:func:`.dec_ref() <detail::api::dec_ref>`,
:cpp:func:`.inc_ref() <detail::api::inc_ref>`
, etc., then incorrect
use of such calls can cause a reference to leak that prevents the associated
object from being deleted.
- **Interactions with other tools that leak references**. Python extension
libraries---especially *huge* ones with C library components like PyTorch,
Tensorflow, etc., have been observed to leak references to nanobind
objects.
Some of these frameworks cache JIT-compiled functions based on the arguments
with which they were called, and such caching schemes could leak references
to nanobind types if they aren't cleaned up by the responsible extensions
(this is a hypothesis). In this case, the leak would be benign---even so, it
should be fixed in the responsible framework so that leak warnings aren't
cluttered with flukes and can be more broadly useful.
- **Older Python versions**: Very old Python versions (e.g., 3.8) don't
do a good job cleaning up global references when the interpreter shuts down.
The following code may leak a reference if it is a top-level statement in a
Python file or the REPL.
.. code-block:: python
a = my_ext.MyObject()
Such a warning is benign and does not indicate an actual leak. It simply
highlights a flaws in the interpreter shutdown logic of old Python versions.
Wrap your code into a function to address this issue even on such versions:
.. code-block:: python
def run():
a = my_ext.MyObject()
# ...
if __name__ == '__main__':
run()
- **Exceptions**. Some exceptions such as ``AttributeError`` have been observed
to hold references, e.g. to the object which lacked the desired attribute. If
the last exception raised by the program references a nanobind instance, then
this may be reported as a leak since Python finalization appears not to
release the exception object. See `issue #376
<https://github.com/wjakob/nanobind/issues/376>`__ for a discussion.
|