.. _arch-numba-runtime:
======================
Notes on Numba Runtime
======================
The *Numba Runtime (NRT)* provides the language runtime to the *nopython mode*
Python subset. NRT is a standalone C library with a Python binding. This
allows :term:`NPM` runtime features to be used without the GIL. Currently, the
only language feature implemented in NRT is memory management.
Memory Management
=================
NRT implements memory management for :term:`NPM` code. It uses *atomic
reference counting* for thread-safe, deterministic memory management. NRT maintains
a separate ``MemInfo`` structure for storing information about each allocation.
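
The reference-counting scheme described above can be modelled in a few lines of
Python. This is only an illustrative sketch: ``ToyMemInfo`` is a hypothetical
class, not the real ``MemInfo`` C structure, and a ``threading.Lock`` stands in
for the atomic instructions that NRT uses.

.. code-block:: python

    import threading


    class ToyMemInfo:
        """Toy stand-in for an NRT MemInfo record (not the real C struct)."""

        def __init__(self, data, dtor):
            self.data = data               # the managed allocation
            self._dtor = dtor              # run once, when the count reaches 0
            self._refcount = 1             # creation yields one reference
            self._lock = threading.Lock()  # stands in for atomic instructions

        @property
        def refcount(self):
            return self._refcount

        def acquire(self):
            with self._lock:
                self._refcount += 1

        def release(self):
            with self._lock:
                self._refcount -= 1
                dead = self._refcount == 0
            if dead:
                # Deterministic: the destructor runs at the last release.
                self._dtor(self.data)


    freed = []
    mi = ToyMemInfo(bytearray(10), dtor=lambda buf: freed.append(len(buf)))
    mi.acquire()   # two references
    mi.release()   # one reference, allocation still alive
    mi.release()   # zero references, destructor runs immediately

Keeping the count in a record separate from the data is what allows the same
bookkeeping to manage memory that NRT did not allocate itself.
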
Cooperating with CPython
------------------------
For NRT to cooperate with CPython, the NRT Python binding provides adaptors for
converting Python objects that export a memory region. When such an
object is used as an argument to an :term:`NPM` function, a new ``MemInfo`` is
created and it acquires a reference to the Python object. When an :term:`NPM`
value is returned to the Python interpreter, the associated ``MemInfo``
(if any) is checked. If the ``MemInfo`` references a Python object, the
underlying Python object is released and returned instead. Otherwise, the
``MemInfo`` is wrapped in a Python object and returned. Additional processing
may be required depending on the type.

The current implementation supports NumPy arrays and any buffer-exporting types.
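
The two directions of this conversion can be sketched with plain Python
objects. All names here (``ToyMemInfo``, ``MemInfoWrapper``, ``to_npm``,
``to_python``) are hypothetical and exist only for illustration; the real
adaptors are implemented in the NRT Python binding.

.. code-block:: python

    class ToyMemInfo:
        """Toy MemInfo: a memory region plus an optional owning Python object."""

        def __init__(self, data, parent=None):
            self.data = data
            self.parent = parent  # Python object kept alive, if any


    class MemInfoWrapper:
        """Toy Python-side wrapper for memory that has no Python owner."""

        def __init__(self, meminfo):
            self.meminfo = meminfo


    def to_npm(obj):
        # Argument path: wrap a buffer exporter in a MemInfo that holds
        # a reference to the original Python object.
        return ToyMemInfo(memoryview(obj), parent=obj)


    def to_python(mi):
        # Return path: hand back the original object when there is one,
        # otherwise wrap the MemInfo in a new Python object.
        if mi.parent is not None:
            return mi.parent
        return MemInfoWrapper(mi)


    buf = bytearray(b"abc")
    round_trip = to_python(to_npm(buf))       # the very same bytearray
    fresh = ToyMemInfo(bytearray(4))          # "allocated by NRT"
    wrapped = to_python(fresh)                # a MemInfoWrapper
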
Compiler-side Cooperation
-------------------------
NRT reference counting requires the compiler to emit incref/decref operations
according to the usage. When the reference count drops to zero, the compiler
must call the destructor routine in NRT.
.. _nrt-refct-opt-pass:
Optimizations
-------------
The compiler is allowed to emit incref/decref operations naively. It relies
on an optimization pass to remove redundant reference count operations.
A new optimization pass is implemented in version 0.52.0 to remove reference
count operations that fall into the following four categories of control-flow
structure: per basic-block, diamond, fanout, and fanout+raise. See the documentation
for :envvar:`NUMBA_LLVM_REFPRUNE_FLAGS` for their descriptions.
The old optimization pass runs at block level to avoid control flow analysis.
It depends on the LLVM function optimization pass to simplify the control flow,
promote stack slots to registers, and simplify instructions. It works by
matching and removing incref and decref pairs within each block. The old pass
can be enabled by setting :envvar:`NUMBA_LLVM_REFPRUNE_PASS` to ``0``.
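
The block-level matching idea can be illustrated with a small sketch. This is
not Numba's implementation: ``prune_block`` is a hypothetical helper, and a
block is modelled as a list of ``(opcode, operand)`` tuples rather than LLVM
IR.

.. code-block:: python

    from collections import defaultdict


    def prune_block(block):
        """Drop matched incref/decref pairs on the same value in one block."""
        pending = defaultdict(list)  # operand -> indices of unmatched increfs
        dead = set()                 # indices of instructions to remove
        for idx, (op, arg) in enumerate(block):
            if op == "incref":
                pending[arg].append(idx)
            elif op == "decref" and pending[arg]:
                # Pair this decref with the most recent unmatched incref.
                dead.add(pending[arg].pop())
                dead.add(idx)
        return [inst for idx, inst in enumerate(block) if idx not in dead]


    block = [
        ("incref", "%a"),
        ("call", "@use(%a)"),
        ("decref", "%a"),
        ("incref", "%b"),   # no matching decref in this block: kept
        ("decref", "%c"),   # no matching incref in this block: kept
    ]
    pruned = prune_block(block)

Removing the pair around the call is only valid if the callee does not consume
a reference itself, which is why the assumption described under *Important
assumptions* matters.
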
Important assumptions
---------------------
Both the old (pre-0.52.0) and the new (0.52.0 and later) optimization passes
assume that the only function that can consume a reference is ``NRT_decref``.
It is important that there are no other functions that will consume references.
Since the passes operate on LLVM IR, "functions" here refers to any
callee in an LLVM call instruction.
To summarize, all functions exposed to the refcount optimization pass
**must not** consume counted references unless done so via ``NRT_decref``.
Quirks of the old optimization pass
-----------------------------------
Since the pre-0.52.0 `refcount optimization pass <nrt-refct-opt-pass_>`_
requires the LLVM function optimization pass, the pass works on the LLVM IR as
text. The optimized IR is then materialized again as a new LLVM in-memory
bitcode object.
Debugging Leaks
---------------
To debug reference leaks in NRT MemInfo, each MemInfo Python object has a
``.refcount`` attribute for inspection. To get the MemInfo from an ndarray
allocated by NRT, use the ``.base`` attribute.

To debug memory leaks in NRT, ``numba.core.runtime.rtsys`` defines
``.get_allocation_stats()``. It returns a namedtuple containing the
number of allocations and deallocations since the start of the program.
Checking that the allocation and deallocation counters match is the
simplest way to know whether NRT is leaking.
Debugging Leaks in C
--------------------
The start of `numba/core/runtime/nrt.h
<https://github.com/numba/numba/blob/main/numba/core/runtime/nrt.h>`_
has these lines:

.. code-block:: C

  /* Debugging facilities - enabled at compile-time */
  /* #undef NDEBUG */
  #if 0
  #   define NRT_Debug(X) X
  #else
  #   define NRT_Debug(X) if (0) { X; }
  #endif

Undefining ``NDEBUG`` (uncommenting the ``#undef NDEBUG`` line) enables the
assertion checks in NRT.

Enabling ``NRT_Debug`` (replacing ``#if 0`` with ``#if 1``) turns on
debug printing inside NRT.
Recursion Support
=================
During the compilation of a pair of mutually recursive functions, one of the
functions will contain unresolved symbol references since the compiler handles
one function at a time. The memory for the unresolved symbols is allocated and
initialized to the address of the *unresolved symbol abort* function
(``nrt_unresolved_abort``) just before the machine code is
generated by LLVM. These symbols are tracked and resolved as new functions are
compiled. If a bug prevents the resolution of these symbols,
the abort function will be called, raising a ``RuntimeError`` exception.
The *unresolved symbol abort* function is defined in the NRT with a zero-argument
signature. The caller is safe to call it with an arbitrary number of
arguments. Therefore, it is safe to use in place of the intended callee.
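
The placeholder mechanism can be mimicked in pure Python. This is a conceptual
sketch only: ``SymbolTable`` and ``unresolved_abort`` are hypothetical
stand-ins for the symbol slots and for ``nrt_unresolved_abort``, and plain
function calls stand in for compilation.

.. code-block:: python

    def unresolved_abort(*args):
        """Toy abort stub; like the real one, it tolerates any arguments."""
        raise RuntimeError("call to unresolved symbol")


    class SymbolTable:
        def __init__(self):
            self._slots = {}

        def declare(self, name):
            # Unresolved slots start out pointing at the abort stub.
            self._slots.setdefault(name, unresolved_abort)

        def resolve(self, name, func):
            self._slots[name] = func

        def call(self, name, *args):
            return self._slots[name](*args)


    table = SymbolTable()


    def is_even(n):
        return True if n == 0 else table.call("is_odd", n - 1)


    def is_odd(n):
        return False if n == 0 else table.call("is_even", n - 1)


    # "Compile" is_even first: its callee is_odd is not available yet.
    table.declare("is_odd")
    table.resolve("is_even", is_even)
    try:
        is_even(3)
        early_error = None
    except RuntimeError as exc:
        early_error = str(exc)

    # "Compiling" is_odd patches the slot, after which the pair works.
    table.resolve("is_odd", is_odd)
    result = is_even(10)

Because every slot always holds a callable, a missed resolution surfaces as a
``RuntimeError`` rather than a jump to an invalid address.
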
Using the NRT from C code
=========================
Externally compiled C code should use the ``NRT_api_functions`` struct as a
function table to access the NRT API. The struct is defined in
:ghfile:`numba/core/runtime/nrt_external.h`. Users can use the utility function
``numba.extending.include_path()`` to determine the include directory for
Numba provided C headers.

.. literalinclude:: ../../../numba/core/runtime/nrt_external.h
  :language: C
  :caption: `numba/core/runtime/nrt_external.h`

Inside Numba compiled code, the ``numba.core.unsafe.nrt.NRT_get_api()``
intrinsic can be used to obtain a pointer to the ``NRT_api_functions``.
Here is an example that uses the ``nrt_external.h``:

.. code-block:: C

  #include <stdio.h>
  #include <stdlib.h>  /* malloc, free */
  #include "numba/core/runtime/nrt_external.h"

  /* Destructor invoked by NRT when the last reference is released */
  void my_dtor(void *ptr) {
      free(ptr);
  }

  NRT_MemInfo* my_allocate(NRT_api_functions *nrt) {
      /* heap allocate some memory */
      void * data = malloc(10);
      /* wrap the allocated memory; yield a new reference */
      NRT_MemInfo *mi = nrt->manage_memory(data, my_dtor);
      /* acquire reference */
      nrt->acquire(mi);
      /* release reference */
      nrt->release(mi);
      return mi;
  }

It is important to ensure that the NRT is initialized prior to making calls to
it. Calling ``numba.core.runtime.nrt.rtsys.initialize(context)`` from Python
will have the desired effect. Similarly, the code snippet:

.. code-block:: Python

  from numba.core.registry import cpu_target # Get the CPU target singleton
  cpu_target.target_context # Access the target_context property to initialize

will achieve the same specifically for Numba's CPU target (the default). Failure
to initialize the NRT will result in access violations as function pointers for
various internal atomic operations will be missing in the ``NRT_MemSys`` struct.
Future Plan
===========
The plan for NRT is to make a standalone shared library that can be linked to
Numba compiled code, including use within the Python interpreter and without
the Python interpreter. To make that work, we will be doing some refactoring:
* Numba :term:`NPM` code references statically compiled code in "helperlib.c".
  Those functions should be moved to NRT.