1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894
|
.. highlight:: c
.. _defining-new-types:
**********************************
Defining Extension Types: Tutorial
**********************************
.. sectionauthor:: Michael Hudson <mwh@python.net>
.. sectionauthor:: Dave Kuhlman <dkuhlman@rexx.com>
.. sectionauthor:: Jim Fulton <jim@zope.com>
Python allows the writer of a C extension module to define new types that
can be manipulated from Python code, much like the built-in :class:`str`
and :class:`list` types. The code for all extension types follows a
pattern, but there are some details that you need to understand before you
can get started. This document is a gentle introduction to the topic.
.. _dnt-basics:
The Basics
==========
The :term:`CPython` runtime sees all Python objects as variables of type
:c:expr:`PyObject*`, which serves as a "base type" for all Python objects.
The :c:type:`PyObject` structure itself only contains the object's
:term:`reference count` and a pointer to the object's "type object".
This is where the action is; the type object determines which (C) functions
get called by the interpreter when, for instance, an attribute gets looked up
on an object, a method called, or it is multiplied by another object. These
C functions are called "type methods".
So, if you want to define a new extension type, you need to create a new type
object.
This sort of thing can only be explained by example, so here's a minimal, but
complete, module that defines a new type named :class:`!Custom` inside a C
extension module :mod:`!custom`:
.. note::
What we're showing here is the traditional way of defining *static*
extension types. It should be adequate for most uses. The C API also
allows defining heap-allocated extension types using the
:c:func:`PyType_FromSpec` function, which isn't covered in this tutorial.
.. literalinclude:: ../includes/newtypes/custom.c
Now that's quite a bit to take in at once, but hopefully bits will seem familiar
from the previous chapter. This file defines three things:
#. What a :class:`!Custom` **object** contains: this is the ``CustomObject``
struct, which is allocated once for each :class:`!Custom` instance.
#. How the :class:`!Custom` **type** behaves: this is the ``CustomType`` struct,
which defines a set of flags and function pointers that the interpreter
inspects when specific operations are requested.
#. How to define and execute the :mod:`!custom` module: this is the
``PyInit_custom`` function and the associated ``custom_module`` struct for
defining the module, and the ``custom_module_exec`` function to set up
a fresh module object.
The first bit is::
typedef struct {
PyObject_HEAD
} CustomObject;
This is what a Custom object will contain. ``PyObject_HEAD`` is mandatory
at the start of each object struct and defines a field called ``ob_base``
of type :c:type:`PyObject`, containing a pointer to a type object and a
reference count (these can be accessed using the macros :c:macro:`Py_TYPE`
and :c:macro:`Py_REFCNT` respectively). The reason for the macro is to
abstract away the layout and to enable additional fields in :ref:`debug builds
<debug-build>`.
.. note::
There is no semicolon above after the :c:macro:`PyObject_HEAD` macro.
Be wary of adding one by accident: some compilers will complain.
Of course, objects generally store additional data besides the standard
``PyObject_HEAD`` boilerplate; for example, here is the definition for
standard Python floats::
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
The second bit is the definition of the type object. ::
static PyTypeObject CustomType = {
.ob_base = PyVarObject_HEAD_INIT(NULL, 0)
.tp_name = "custom.Custom",
.tp_doc = PyDoc_STR("Custom objects"),
.tp_basicsize = sizeof(CustomObject),
.tp_itemsize = 0,
.tp_flags = Py_TPFLAGS_DEFAULT,
.tp_new = PyType_GenericNew,
};
.. note::
We recommend using C99-style designated initializers as above, to
avoid listing all the :c:type:`PyTypeObject` fields that you don't care
about and also to avoid caring about the fields' declaration order.
The actual definition of :c:type:`PyTypeObject` in :file:`object.h` has
many more :ref:`fields <type-structs>` than the definition above. The
remaining fields will be filled with zeros by the C compiler, and it's
common practice to not specify them explicitly unless you need them.
We're going to pick it apart, one field at a time::
.ob_base = PyVarObject_HEAD_INIT(NULL, 0)
This line is mandatory boilerplate to initialize the ``ob_base``
field mentioned above. ::
.tp_name = "custom.Custom",
The name of our type. This will appear in the default textual representation of
our objects and in some error messages, for example:
.. code-block:: pycon
>>> "" + custom.Custom()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "custom.Custom") to str
Note that the name is a dotted name that includes both the module name and the
name of the type within the module. The module in this case is :mod:`!custom` and
the type is :class:`!Custom`, so we set the type name to :class:`!custom.Custom`.
Using the real dotted import path is important to make your type compatible
with the :mod:`pydoc` and :mod:`pickle` modules. ::
.tp_basicsize = sizeof(CustomObject),
.tp_itemsize = 0,
This is so that Python knows how much memory to allocate when creating
new :class:`!Custom` instances. :c:member:`~PyTypeObject.tp_itemsize` is
only used for variable-sized objects and should otherwise be zero.
.. note::
If you want your type to be subclassable from Python, and your type has the same
:c:member:`~PyTypeObject.tp_basicsize` as its base type, you may have problems with multiple
inheritance. A Python subclass of your type will have to list your type first
in its :attr:`~type.__bases__`, or else it will not be able to call your type's
:meth:`~object.__new__` method without getting an error. You can avoid this problem by
ensuring that your type has a larger value for :c:member:`~PyTypeObject.tp_basicsize` than its
base type does. Most of the time, this will be true anyway, because either your
base type will be :class:`object`, or else you will be adding data members to
your base type, and therefore increasing its size.
We set the class flags to :c:macro:`Py_TPFLAGS_DEFAULT`. ::
.tp_flags = Py_TPFLAGS_DEFAULT,
All types should include this constant in their flags. It enables all of the
members defined until at least Python 3.3. If you need further members,
you will need to OR the corresponding flags.
We provide a doc string for the type in :c:member:`~PyTypeObject.tp_doc`. ::
.tp_doc = PyDoc_STR("Custom objects"),
To enable object creation, we have to provide a :c:member:`~PyTypeObject.tp_new`
handler. This is the equivalent of the Python method :meth:`~object.__new__`, but
has to be specified explicitly. In this case, we can just use the default
implementation provided by the API function :c:func:`PyType_GenericNew`. ::
.tp_new = PyType_GenericNew,
Everything else in the file should be familiar, except for some code in
:c:func:`!custom_module_exec`::
if (PyType_Ready(&CustomType) < 0) {
return -1;
}
This initializes the :class:`!Custom` type, filling in a number of members
to the appropriate default values, including :c:member:`~PyObject.ob_type` that we initially
set to ``NULL``. ::
if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
return -1;
}
This adds the type to the module dictionary. This allows us to create
:class:`!Custom` instances by calling the :class:`!Custom` class:
.. code-block:: pycon
>>> import custom
>>> mycustom = custom.Custom()
That's it! All that remains is to build it; put the above code in a file called
:file:`custom.c`,
.. literalinclude:: ../includes/newtypes/pyproject.toml
in a file called :file:`pyproject.toml`, and
.. code-block:: python
from setuptools import Extension, setup
setup(ext_modules=[Extension("custom", ["custom.c"])])
in a file called :file:`setup.py`; then typing
.. code-block:: shell-session
$ python -m pip install .
in a shell should produce a file :file:`custom.so` in a subdirectory
and install it; now fire up Python --- you should be able to ``import custom``
and play around with ``Custom`` objects.
That wasn't so hard, was it?
Of course, the current Custom type is pretty uninteresting. It has no data and
doesn't do anything. It can't even be subclassed.
Adding data and methods to the Basic example
============================================
Let's extend the basic example to add some data and methods. Let's also make
the type usable as a base class. We'll create a new module, :mod:`!custom2` that
adds these capabilities:
.. literalinclude:: ../includes/newtypes/custom2.c
This version of the module has a number of changes.
The :class:`!Custom` type now has three data attributes in its C struct,
*first*, *last*, and *number*. The *first* and *last* variables are Python
strings containing first and last names. The *number* attribute is a C integer.
The object structure is updated accordingly::
typedef struct {
PyObject_HEAD
PyObject *first; /* first name */
PyObject *last; /* last name */
int number;
} CustomObject;
Because we now have data to manage, we have to be more careful about object
allocation and deallocation. At a minimum, we need a deallocation method::
static void
Custom_dealloc(CustomObject *self)
{
Py_XDECREF(self->first);
Py_XDECREF(self->last);
Py_TYPE(self)->tp_free((PyObject *) self);
}
which is assigned to the :c:member:`~PyTypeObject.tp_dealloc` member::
.tp_dealloc = (destructor) Custom_dealloc,
This method first clears the reference counts of the two Python attributes.
:c:func:`Py_XDECREF` correctly handles the case where its argument is
``NULL`` (which might happen here if ``tp_new`` failed midway). It then
calls the :c:member:`~PyTypeObject.tp_free` member of the object's type
(computed by ``Py_TYPE(self)``) to free the object's memory. Note that
the object's type might not be :class:`!CustomType`, because the object may
be an instance of a subclass.
.. note::
The explicit cast to ``destructor`` above is needed because we defined
``Custom_dealloc`` to take a ``CustomObject *`` argument, but the ``tp_dealloc``
function pointer expects to receive a ``PyObject *`` argument. Otherwise,
the compiler will emit a warning. This is object-oriented polymorphism,
in C!
We want to make sure that the first and last names are initialized to empty
strings, so we provide a ``tp_new`` implementation::
static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
CustomObject *self;
self = (CustomObject *) type->tp_alloc(type, 0);
if (self != NULL) {
self->first = PyUnicode_FromString("");
if (self->first == NULL) {
Py_DECREF(self);
return NULL;
}
self->last = PyUnicode_FromString("");
if (self->last == NULL) {
Py_DECREF(self);
return NULL;
}
self->number = 0;
}
return (PyObject *) self;
}
and install it in the :c:member:`~PyTypeObject.tp_new` member::
.tp_new = Custom_new,
The ``tp_new`` handler is responsible for creating (as opposed to initializing)
objects of the type. It is exposed in Python as the :meth:`~object.__new__` method.
It is not required to define a ``tp_new`` member, and indeed many extension
types will simply reuse :c:func:`PyType_GenericNew` as done in the first
version of the :class:`!Custom` type above. In this case, we use the ``tp_new``
handler to initialize the ``first`` and ``last`` attributes to non-``NULL``
default values.
``tp_new`` is passed the type being instantiated (not necessarily ``CustomType``,
if a subclass is instantiated) and any arguments passed when the type was
called, and is expected to return the instance created. ``tp_new`` handlers
always accept positional and keyword arguments, but they often ignore the
arguments, leaving the argument handling to initializer (a.k.a. ``tp_init``
in C or ``__init__`` in Python) methods.
.. note::
``tp_new`` shouldn't call ``tp_init`` explicitly, as the interpreter
will do it itself.
The ``tp_new`` implementation calls the :c:member:`~PyTypeObject.tp_alloc`
slot to allocate memory::
self = (CustomObject *) type->tp_alloc(type, 0);
Since memory allocation may fail, we must check the :c:member:`~PyTypeObject.tp_alloc`
result against ``NULL`` before proceeding.
.. note::
We didn't fill the :c:member:`~PyTypeObject.tp_alloc` slot ourselves. Rather
:c:func:`PyType_Ready` fills it for us by inheriting it from our base class,
which is :class:`object` by default. Most types use the default allocation
strategy.
.. note::
If you are creating a co-operative :c:member:`~PyTypeObject.tp_new` (one
that calls a base type's :c:member:`~PyTypeObject.tp_new` or :meth:`~object.__new__`),
you must *not* try to determine what method to call using method resolution
order at runtime. Always statically determine what type you are going to
call, and call its :c:member:`~PyTypeObject.tp_new` directly, or via
``type->tp_base->tp_new``. If you do not do this, Python subclasses of your
type that also inherit from other Python-defined classes may not work correctly.
(Specifically, you may not be able to create instances of such subclasses
without getting a :exc:`TypeError`.)
We also define an initialization function which accepts arguments to provide
initial values for our instance::
static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
static char *kwlist[] = {"first", "last", "number", NULL};
PyObject *first = NULL, *last = NULL, *tmp;
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
&first, &last,
&self->number))
return -1;
if (first) {
tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_XDECREF(tmp);
}
if (last) {
tmp = self->last;
Py_INCREF(last);
self->last = last;
Py_XDECREF(tmp);
}
return 0;
}
by filling the :c:member:`~PyTypeObject.tp_init` slot. ::
.tp_init = (initproc) Custom_init,
The :c:member:`~PyTypeObject.tp_init` slot is exposed in Python as the
:meth:`~object.__init__` method. It is used to initialize an object after it's
created. Initializers always accept positional and keyword arguments,
and they should return either ``0`` on success or ``-1`` on error.
Unlike the ``tp_new`` handler, there is no guarantee that ``tp_init``
is called at all (for example, the :mod:`pickle` module by default
doesn't call :meth:`~object.__init__` on unpickled instances). It can also be
called multiple times. Anyone can call the :meth:`!__init__` method on
our objects. For this reason, we have to be extra careful when assigning
the new attribute values. We might be tempted, for example to assign the
``first`` member like this::
if (first) {
Py_XDECREF(self->first);
Py_INCREF(first);
self->first = first;
}
But this would be risky. Our type doesn't restrict the type of the
``first`` member, so it could be any kind of object. It could have a
destructor that causes code to be executed that tries to access the
``first`` member; or that destructor could release the
:term:`Global interpreter Lock <GIL>` and let arbitrary code run in other
threads that accesses and modifies our object.
To be paranoid and protect ourselves against this possibility, we almost
always reassign members before decrementing their reference counts. When
don't we have to do this?
* when we absolutely know that the reference count is greater than 1;
* when we know that deallocation of the object [#]_ will neither release
the :term:`GIL` nor cause any calls back into our type's code;
* when decrementing a reference count in a :c:member:`~PyTypeObject.tp_dealloc`
handler on a type which doesn't support cyclic garbage collection [#]_.
We want to expose our instance variables as attributes. There are a
number of ways to do that. The simplest way is to define member definitions::
static PyMemberDef Custom_members[] = {
{"first", Py_T_OBJECT_EX, offsetof(CustomObject, first), 0,
"first name"},
{"last", Py_T_OBJECT_EX, offsetof(CustomObject, last), 0,
"last name"},
{"number", Py_T_INT, offsetof(CustomObject, number), 0,
"custom number"},
{NULL} /* Sentinel */
};
and put the definitions in the :c:member:`~PyTypeObject.tp_members` slot::
.tp_members = Custom_members,
Each member definition has a member name, type, offset, access flags and
documentation string. See the :ref:`Generic-Attribute-Management` section
below for details.
A disadvantage of this approach is that it doesn't provide a way to restrict the
types of objects that can be assigned to the Python attributes. We expect the
first and last names to be strings, but any Python objects can be assigned.
Further, the attributes can be deleted, setting the C pointers to ``NULL``. Even
though we can make sure the members are initialized to non-``NULL`` values, the
members can be set to ``NULL`` if the attributes are deleted.
We define a single method, :meth:`!Custom.name`, that outputs the objects name as the
concatenation of the first and last names. ::
static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
if (self->first == NULL) {
PyErr_SetString(PyExc_AttributeError, "first");
return NULL;
}
if (self->last == NULL) {
PyErr_SetString(PyExc_AttributeError, "last");
return NULL;
}
return PyUnicode_FromFormat("%S %S", self->first, self->last);
}
The method is implemented as a C function that takes a :class:`!Custom` (or
:class:`!Custom` subclass) instance as the first argument. Methods always take an
instance as the first argument. Methods often take positional and keyword
arguments as well, but in this case we don't take any and don't need to accept
a positional argument tuple or keyword argument dictionary. This method is
equivalent to the Python method:
.. code-block:: python
def name(self):
return "%s %s" % (self.first, self.last)
Note that we have to check for the possibility that our :attr:`!first` and
:attr:`!last` members are ``NULL``. This is because they can be deleted, in which
case they are set to ``NULL``. It would be better to prevent deletion of these
attributes and to restrict the attribute values to be strings. We'll see how to
do that in the next section.
Now that we've defined the method, we need to create an array of method
definitions::
static PyMethodDef Custom_methods[] = {
{"name", (PyCFunction) Custom_name, METH_NOARGS,
"Return the name, combining the first and last name"
},
{NULL} /* Sentinel */
};
(note that we used the :c:macro:`METH_NOARGS` flag to indicate that the method
is expecting no arguments other than *self*)
and assign it to the :c:member:`~PyTypeObject.tp_methods` slot::
.tp_methods = Custom_methods,
Finally, we'll make our type usable as a base class for subclassing. We've
written our methods carefully so far so that they don't make any assumptions
about the type of the object being created or used, so all we need to do is
to add the :c:macro:`Py_TPFLAGS_BASETYPE` to our class flag definition::
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
We rename :c:func:`!PyInit_custom` to :c:func:`!PyInit_custom2`, update the
module name in the :c:type:`PyModuleDef` struct, and update the full class
name in the :c:type:`PyTypeObject` struct.
Finally, we update our :file:`setup.py` file to include the new module,
.. code-block:: python
from setuptools import Extension, setup
setup(ext_modules=[
Extension("custom", ["custom.c"]),
Extension("custom2", ["custom2.c"]),
])
and then we re-install so that we can ``import custom2``:
.. code-block:: shell-session
$ python -m pip install .
Providing finer control over data attributes
============================================
In this section, we'll provide finer control over how the :attr:`!first` and
:attr:`!last` attributes are set in the :class:`!Custom` example. In the previous
version of our module, the instance variables :attr:`!first` and :attr:`!last`
could be set to non-string values or even deleted. We want to make sure that
these attributes always contain strings.
.. literalinclude:: ../includes/newtypes/custom3.c
To provide greater control, over the :attr:`!first` and :attr:`!last` attributes,
we'll use custom getter and setter functions. Here are the functions for
getting and setting the :attr:`!first` attribute::
static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
Py_INCREF(self->first);
return self->first;
}
static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
PyObject *tmp;
if (value == NULL) {
PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
return -1;
}
if (!PyUnicode_Check(value)) {
PyErr_SetString(PyExc_TypeError,
"The first attribute value must be a string");
return -1;
}
tmp = self->first;
Py_INCREF(value);
self->first = value;
Py_DECREF(tmp);
return 0;
}
The getter function is passed a :class:`!Custom` object and a "closure", which is
a void pointer. In this case, the closure is ignored. (The closure supports an
advanced usage in which definition data is passed to the getter and setter. This
could, for example, be used to allow a single set of getter and setter functions
that decide the attribute to get or set based on data in the closure.)
The setter function is passed the :class:`!Custom` object, the new value, and the
closure. The new value may be ``NULL``, in which case the attribute is being
deleted. In our setter, we raise an error if the attribute is deleted or if its
new value is not a string.
We create an array of :c:type:`PyGetSetDef` structures::
static PyGetSetDef Custom_getsetters[] = {
{"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
"first name", NULL},
{"last", (getter) Custom_getlast, (setter) Custom_setlast,
"last name", NULL},
{NULL} /* Sentinel */
};
and register it in the :c:member:`~PyTypeObject.tp_getset` slot::
.tp_getset = Custom_getsetters,
The last item in a :c:type:`PyGetSetDef` structure is the "closure" mentioned
above. In this case, we aren't using a closure, so we just pass ``NULL``.
We also remove the member definitions for these attributes::
static PyMemberDef Custom_members[] = {
{"number", Py_T_INT, offsetof(CustomObject, number), 0,
"custom number"},
{NULL} /* Sentinel */
};
We also need to update the :c:member:`~PyTypeObject.tp_init` handler to only
allow strings [#]_ to be passed::
static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
static char *kwlist[] = {"first", "last", "number", NULL};
PyObject *first = NULL, *last = NULL, *tmp;
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
&first, &last,
&self->number))
return -1;
if (first) {
tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_DECREF(tmp);
}
if (last) {
tmp = self->last;
Py_INCREF(last);
self->last = last;
Py_DECREF(tmp);
}
return 0;
}
With these changes, we can assure that the ``first`` and ``last`` members are
never ``NULL`` so we can remove checks for ``NULL`` values in almost all cases.
This means that most of the :c:func:`Py_XDECREF` calls can be converted to
:c:func:`Py_DECREF` calls. The only place we can't change these calls is in
the ``tp_dealloc`` implementation, where there is the possibility that the
initialization of these members failed in ``tp_new``.
We also rename the module initialization function and module name in the
initialization function, as we did before, and we add an extra definition to the
:file:`setup.py` file.
Supporting cyclic garbage collection
====================================
Python has a :term:`cyclic garbage collector (GC) <garbage collection>` that
can identify unneeded objects even when their reference counts are not zero.
This can happen when objects are involved in cycles. For example, consider:
.. code-block:: pycon
>>> l = []
>>> l.append(l)
>>> del l
In this example, we create a list that contains itself. When we delete it, it
still has a reference from itself. Its reference count doesn't drop to zero.
Fortunately, Python's cyclic garbage collector will eventually figure out that
the list is garbage and free it.
In the second version of the :class:`!Custom` example, we allowed any kind of
object to be stored in the :attr:`!first` or :attr:`!last` attributes [#]_.
Besides, in the second and third versions, we allowed subclassing
:class:`!Custom`, and subclasses may add arbitrary attributes. For any of
those two reasons, :class:`!Custom` objects can participate in cycles:
.. code-block:: pycon
>>> import custom3
>>> class Derived(custom3.Custom): pass
...
>>> n = Derived()
>>> n.some_attribute = n
To allow a :class:`!Custom` instance participating in a reference cycle to
be properly detected and collected by the cyclic GC, our :class:`!Custom` type
needs to fill two additional slots and to enable a flag that enables these slots:
.. literalinclude:: ../includes/newtypes/custom4.c
First, the traversal method lets the cyclic GC know about subobjects that could
participate in cycles::
static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
int vret;
if (self->first) {
vret = visit(self->first, arg);
if (vret != 0)
return vret;
}
if (self->last) {
vret = visit(self->last, arg);
if (vret != 0)
return vret;
}
return 0;
}
For each subobject that can participate in cycles, we need to call the
:c:func:`!visit` function, which is passed to the traversal method. The
:c:func:`!visit` function takes as arguments the subobject and the extra argument
*arg* passed to the traversal method. It returns an integer value that must be
returned if it is non-zero.
Python provides a :c:func:`Py_VISIT` macro that automates calling visit
functions. With :c:func:`Py_VISIT`, we can minimize the amount of boilerplate
in ``Custom_traverse``::
static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
Py_VISIT(self->first);
Py_VISIT(self->last);
return 0;
}
.. note::
The :c:member:`~PyTypeObject.tp_traverse` implementation must name its
arguments exactly *visit* and *arg* in order to use :c:func:`Py_VISIT`.
Second, we need to provide a method for clearing any subobjects that can
participate in cycles::
static int
Custom_clear(CustomObject *self)
{
Py_CLEAR(self->first);
Py_CLEAR(self->last);
return 0;
}
Notice the use of the :c:func:`Py_CLEAR` macro. It is the recommended and safe
way to clear data attributes of arbitrary types while decrementing
their reference counts. If you were to call :c:func:`Py_XDECREF` instead
on the attribute before setting it to ``NULL``, there is a possibility
that the attribute's destructor would call back into code that reads the
attribute again (*especially* if there is a reference cycle).
.. note::
You could emulate :c:func:`Py_CLEAR` by writing::
PyObject *tmp;
tmp = self->first;
self->first = NULL;
Py_XDECREF(tmp);
Nevertheless, it is much easier and less error-prone to always
use :c:func:`Py_CLEAR` when deleting an attribute. Don't
try to micro-optimize at the expense of robustness!
The deallocator ``Custom_dealloc`` may call arbitrary code when clearing
attributes. It means the circular GC can be triggered inside the function.
Since the GC assumes reference count is not zero, we need to untrack the object
from the GC by calling :c:func:`PyObject_GC_UnTrack` before clearing members.
Here is our reimplemented deallocator using :c:func:`PyObject_GC_UnTrack`
and ``Custom_clear``::
static void
Custom_dealloc(CustomObject *self)
{
PyObject_GC_UnTrack(self);
Custom_clear(self);
Py_TYPE(self)->tp_free((PyObject *) self);
}
Finally, we add the :c:macro:`Py_TPFLAGS_HAVE_GC` flag to the class flags::
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,
That's pretty much it. If we had written custom :c:member:`~PyTypeObject.tp_alloc` or
:c:member:`~PyTypeObject.tp_free` handlers, we'd need to modify them for cyclic
garbage collection. Most extensions will use the versions automatically provided.
Subclassing other types
=======================
It is possible to create new extension types that are derived from existing
types. It is easiest to inherit from the built in types, since an extension can
easily use the :c:type:`PyTypeObject` it needs. It can be difficult to share
these :c:type:`PyTypeObject` structures between extension modules.
In this example we will create a :class:`!SubList` type that inherits from the
built-in :class:`list` type. The new type will be completely compatible with
regular lists, but will have an additional :meth:`!increment` method that
increases an internal counter:
.. code-block:: pycon
>>> import sublist
>>> s = sublist.SubList(range(3))
>>> s.extend(s)
>>> print(len(s))
6
>>> print(s.increment())
1
>>> print(s.increment())
2
.. literalinclude:: ../includes/newtypes/sublist.c
As you can see, the source code closely resembles the :class:`!Custom` examples in
previous sections. We will break down the main differences between them. ::
typedef struct {
PyListObject list;
int state;
} SubListObject;
The primary difference for derived type objects is that the base type's
object structure must be the first value. The base type will already include
the :c:func:`PyObject_HEAD` at the beginning of its structure.
When a Python object is a :class:`!SubList` instance, its ``PyObject *`` pointer
can be safely cast to both ``PyListObject *`` and ``SubListObject *``::
static int
SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
{
if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
return -1;
self->state = 0;
return 0;
}
We see above how to call through to the :meth:`~object.__init__` method of the base
type.
This pattern is important when writing a type with custom
:c:member:`~PyTypeObject.tp_new` and :c:member:`~PyTypeObject.tp_dealloc`
members. The :c:member:`~PyTypeObject.tp_new` handler should not actually
create the memory for the object with its :c:member:`~PyTypeObject.tp_alloc`,
but let the base class handle it by calling its own :c:member:`~PyTypeObject.tp_new`.
The :c:type:`PyTypeObject` struct supports a :c:member:`~PyTypeObject.tp_base`
specifying the type's concrete base class. Due to cross-platform compiler
issues, you can't fill that field directly with a reference to
:c:type:`PyList_Type`; it should be done in the :c:data:`Py_mod_exec`
function::
static int
sublist_module_exec(PyObject *m)
{
SubListType.tp_base = &PyList_Type;
if (PyType_Ready(&SubListType) < 0) {
return -1;
}
if (PyModule_AddObjectRef(m, "SubList", (PyObject *) &SubListType) < 0) {
return -1;
}
return 0;
}
Before calling :c:func:`PyType_Ready`, the type structure must have the
:c:member:`~PyTypeObject.tp_base` slot filled in. When we are deriving an
existing type, it is not necessary to fill out the :c:member:`~PyTypeObject.tp_alloc`
slot with :c:func:`PyType_GenericNew` -- the allocation function from the base
type will be inherited.
After that, calling :c:func:`PyType_Ready` and adding the type object to the
module is the same as with the basic :class:`!Custom` examples.
.. rubric:: Footnotes
.. [#] This is true when we know that the object is a basic type, like a string or a
float.
.. [#] We relied on this in the :c:member:`~PyTypeObject.tp_dealloc` handler
in this example, because our type doesn't support garbage collection.
.. [#] We now know that the first and last members are strings, so perhaps we
could be less careful about decrementing their reference counts, however,
we accept instances of string subclasses. Even though deallocating normal
strings won't call back into our objects, we can't guarantee that deallocating
an instance of a string subclass won't call back into our objects.
.. [#] Also, even with our attributes restricted to strings instances, the user
could pass arbitrary :class:`str` subclasses and therefore still create
reference cycles.
|