1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
|
======================
Rawrefcount and the GC
======================
GC Interface
------------
"PyObject" is a raw structure with at least two fields, ob_refcnt and
ob_pypy_link. The ob_refcnt is the reference counter as used on
CPython. If the PyObject structure is linked to a live PyPy object,
its current address is stored in ob_pypy_link and ob_refcnt is bumped
by either the constant REFCNT_FROM_PYPY, or the constant
REFCNT_FROM_PYPY_LIGHT (== REFCNT_FROM_PYPY + SOME_HUGE_VALUE)
(to mean "light finalizer").
Most PyPy objects exist outside cpyext, and conversely in cpyext it is
possible that a lot of PyObjects exist without being seen by the rest
of PyPy. At the interface, however, we can "link" a PyPy object and a
PyObject. There are two kinds of link:
rawrefcount.create_link_pypy(p, ob)
Makes a link between an exising object gcref 'p' and a newly
allocated PyObject structure 'ob'. ob->ob_refcnt must be
initialized to either REFCNT_FROM_PYPY, or
REFCNT_FROM_PYPY_LIGHT. (The second case is an optimization:
when the GC finds the PyPy object and PyObject no longer
referenced, it can just free() the PyObject.)
rawrefcount.create_link_pyobj(p, ob)
Makes a link from an existing PyObject structure 'ob' to a newly
allocated W_CPyExtPlaceHolderObject 'p'. You must also add
REFCNT_FROM_PYPY to ob->ob_refcnt. For cases where the PyObject
contains all the data, and the PyPy object is just a proxy. The
W_CPyExtPlaceHolderObject should have only a field that contains
the address of the PyObject, but that's outside the scope of the
GC.
rawrefcount.from_obj(p)
If there is a link from object 'p' made with create_link_pypy(),
returns the corresponding 'ob'. Otherwise, returns NULL.
rawrefcount.to_obj(Class, ob)
Returns ob->ob_pypy_link, cast to an instance of 'Class'.
Collection logic
----------------
Objects existing purely on the C side have ob->ob_pypy_link == 0;
these are purely reference counted. On the other hand, if
ob->ob_pypy_link != 0, then ob->ob_refcnt is at least REFCNT_FROM_PYPY
and the object is part of a "link".
The idea is that links whose 'p' is not reachable from other PyPy
objects *and* whose 'ob->ob_refcnt' is REFCNT_FROM_PYPY or
REFCNT_FROM_PYPY_LIGHT are the ones who die. But it is more messy
because PyObjects still (usually) need to have a tp_dealloc called,
and this cannot occur immediately (and can do random things like
accessing other references this object points to, or resurrecting the
object).
Let P = list of links created with rawrefcount.create_link_pypy()
and O = list of links created with rawrefcount.create_link_pyobj().
The PyPy objects in the list O are all W_CPyExtPlaceHolderObject: all
the data is in the PyObjects, and all outsite references (if any) are
in C, as ``PyObject *`` fields.
So, during the collection we do this about P links:
.. code-block:: python
for (p, ob) in P:
if ob->ob_refcnt != REFCNT_FROM_PYPY
and ob->ob_refcnt != REFCNT_FROM_PYPY_LIGHT:
mark 'p' as surviving, as well as all its dependencies
At the end of the collection, the P and O links are both handled like
this:
.. code-block:: python
for (p, ob) in P + O:
if p is not surviving: # even if 'ob' might be surviving
unlink p and ob
if ob->ob_refcnt == REFCNT_FROM_PYPY_LIGHT:
free(ob)
elif ob->ob_refcnt > REFCNT_FROM_PYPY_LIGHT:
ob->ob_refcnt -= REFCNT_FROM_PYPY_LIGHT
else:
ob->ob_refcnt -= REFCNT_FROM_PYPY
if ob->ob_refcnt == 0:
invoke _Py_Dealloc(ob) later, outside the GC
GC Implementation
-----------------
We need two copies of both the P list and O list, for young or old
objects. All four lists can be regular AddressLists of 'ob' objects.
We also need an AddressDict mapping 'p' to 'ob' for all links in the P
list, and update it when PyPy objects move.
Further notes
-------------
XXX
XXX the rest is the ideal world, but as a first step, we'll look
XXX for the minimal tweaks needed to adapt the existing cpyext
XXX
For objects that are opaque in CPython, like <dict>, we always create
a PyPy object, and then when needed we make an empty PyObject and
attach it with create_link_pypy()/REFCNT_FROM_PYPY_LIGHT.
For <int> and <float> objects, the corresponding PyObjects contain a
"long" or "double" field too. We link them with create_link_pypy()
and we can use REFCNT_FROM_PYPY_LIGHT too: 'tp_dealloc' doesn't
need to be called, and instead just calling free() is fine.
For <type> objects, we need both a PyPy and a PyObject side. These
are made with create_link_pypy()/REFCNT_FROM_PYPY.
For custom PyXxxObjects allocated from the C extension module, we
need create_link_pyobj().
For <str> or <unicode> objects coming from PyPy, we use
create_link_pypy()/REFCNT_FROM_PYPY_LIGHT with a PyObject
preallocated with the size of the string. We copy the string
lazily into that area if PyString_AS_STRING() is called.
For <str>, <unicode>, <tuple> or <list> objects in the C extension
module, we first allocate it as only a PyObject, which supports
mutation of the data from C, like CPython. When it is exported to
PyPy we could make a W_CPyExtPlaceHolderObject with
create_link_pyobj().
For <tuple> objects coming from PyPy, if they are not specialized,
then the PyPy side holds a regular reference to the items. Then we
can allocate a PyTupleObject and store in it borrowed PyObject
pointers to the items. Such a case is created with
create_link_pypy()/REFCNT_FROM_PYPY_LIGHT. If it is specialized,
then it doesn't work because the items are created just-in-time on the
PyPy side. In this case, the PyTupleObject needs to hold real
references to the PyObject items, and we use create_link_pypy()/
REFCNT_FROM_PYPY. In all cases, we have a C array of PyObjects
that we can directly return from PySequence_Fast_ITEMS, PyTuple_ITEMS,
PyTuple_GetItem, and so on.
For <list> objects coming from PyPy, we can use a cpyext list
strategy. The list turns into a PyListObject, as if it had been
allocated from C in the first place. The special strategy can hold
(only) a direct reference to the PyListObject, and we can use either
create_link_pyobj() or create_link_pypy() (to be decided).
PySequence_Fast_ITEMS then works for lists too, and PyList_GetItem
can return a borrowed reference, and so on.
|