File: rinterface-memorymanagement.rst

package info (click to toggle)
rpy2 3.6.4-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 5,412 kB
  • sloc: python: 18,448; ansic: 492; makefile: 197; sh: 166
file content (241 lines) | stat: -rw-r--r-- 8,191 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
.. _rinterface-memory:

Memory management and garbage collection
----------------------------------------

The tracking of an R object (:c:type:`SEXP` in R's C-API) 
differs from Python as it does not involve reference counting.
It is using at attribute NAMED (more on this below),
and only considers for collection objects that are not preserved by
being contained in an other R object (for floating object, R's C-API
has 2 functions :c:func:`R_PreserveObject` and :c:func:`R_ReleaseObject` that do little more than placing object is in a container called :c:data:`R_PreciousList`).

Reference counting
^^^^^^^^^^^^^^^^^^

Rpy2 is using its own reference counting system in order to bridge R with
Python and keep as much as possible the pass-by-reference approach familiar
to Python users.

The number of times an R object is used in rpy2, therefore is protected
from garbage collection, is available from Python (obviously read-only):

>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> x = ri.IntSexpVector([1,2,3])
>>> x.__sexp_refcount__
1

That counter will increment each time a new Python reference to it is created.

>>> letters = ri.baseenv['letters']
>>> letters.__sexp_refcount__
1
>>> letters_again = ri.baseenv['letters']
>>> # check that the R ID is the same
>>> letters_again.rid == letters.rid
True
>>> # reference count has increased
>>> letters_again.__sexp_refcount__
2
>>> letters.__sexp_refcount__
2

.. note::

   The attribute `rid` is simply the memory address at which the R-defined
   C-structure containing the R objects is located.

A list of all R IDs protected from garbage collection by rpy2
along with their reference count can be obtained by calling
:func:`rpy2.rinterface.protected_rids`.

We can check that our python object `x` is in indeed listed as protected
from garbage collection (yet it is not bound to any symbol in R - as far as
R is concerned it is like an anonymous variable):

>>> x.rid in (elt[0] for elt in ri.protected_rids())
True

The number of Python/rpy2 objects protecting the R objects from
garbage collection can is also available.

>>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
[1]

.. note::

   The exact count will depend on what has happened with the current Python
   process, that is whether the R object is already tracked by rpy2 or not.

   Binding the rpy2 object to a new Python symbol will not increase the count
   (because Python knows that the two objects are the same, and R has not been
   involved in that):
   
   >>> y = x
   >>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
   [1]

   On the other hand, explictly wrapping again the R object through an rpy2
   constructor will increase the count by one:

   >>> z = ri.IntSexpVector(x)
   >>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
   [2]
   >>> x.rid == z.rid
   True

   In the last case, Python does not know that the 2 objects point to the
   same underlying R object and this mechanism is intended to prevent a
   premature garbage collection of the R object.

   >>> del(x); del(y) # remember that we did `y = x`
   >>> [elt[1] for elt in ri.protected_rids() if elt[0]==z.rid]
   [1]


To achieve this, and keep close to the pass-by-reference approach in Python,
the :c:type:`SexpObject` for a given R object is not part of a Python object
representing it. The Python object only holds a reference to it,
and each time a Python object pointing to a given R object 
(identified by its :c:type:`SEXP`) is created the rpy counter for it is
incremented.

The rpy2 object (proxy for an R object) is implemented as a regular Python
object to which a :c:type:`SexpObject` pointer is appended.

.. code-block:: c

   typedef struct {
       PyObject_HEAD 
       SexpObject *sObj;
   } PySexpObject;

   
The tracking of the capsule itself is what protects the
object from garbage collection on either the R or the Python side.

>>> letters_cstruct = letters.__sexp__
>>> del(letters, letters_again)

The underlying R object is available for collection after the capsule
is deleted (that particular object won't be deleted because R itself tracks it
as part of the base package).

>>> del(letters_cstruct)

Capsules of R objects
^^^^^^^^^^^^^^^^^^^^^

The :c:type:`SexpObject` can be passed around as a (relatively) opaque
C structure, using the attribute :attr:`__sexp__` (a Python `capsule`).

Behind the scene, the capsule is a singleton: given an R object,
it is created with the first Python (rpy2) object wrapping it and
a counter is increased and decreased as other Python objects
expose it as well.

At the C level, the `struct` :c:type:`SexpObject` is defined as:

- a reference count on the Python side

- a possible future reference count on the R side
  (currently unused)
  
- a pointer to the R :c:type:`SEXPREC`

.. code-block:: c
		
   typedef struct {
       Py_ssize_t pycount;
       int rcount;
       SEXP sexp;
   } SexpObject;

The capsule is used to provide a relatively safe composition-like flavor
to the inheritance-based general design of R objects in rpy2, but should
one require access to the underlying R :c:type:`SEXP` object it remains
possible to access it. The following example demonstrates one way to do
it without writing any C code:

.. code-block:: python

   import ctypes

   # Python C API: get the capsule name (of a capsule object)
   pycapsule_getname=ctypes.pythonapi.PyCapsule_GetName
   pycapsule_getname.argtypes = [ctypes.py_object,]
   pycapsule_getname.restype=ctypes.c_char_p
   
   # Python C API: return whether a Python objects is a valid capsule object
   pycapsule_isvalid=ctypes.pythonapi.PyCapsule_IsValid
   pycapsule_isvalid.argtypes=[ctypes.py_object, ctypes.c_char_p]
   pycapsule_isvalid.restype=ctypes.c_bool
   
   # Python C API: return the C pointer
   pycapsule_getpointer=ctypes.pythonapi.PyCapsule_GetPointer
   pycapsule_getpointer.argtypes=[ctypes.py_object, ctypes.c_char_p]
   pycapsule_getpointer.restype=ctypes.c_void_p

   class SexpObject(ctypes.Structure):
       """ C structure SexpObject as defined in the C
           layer of rpy2. """
       _fields_ = [('pycount', ctypes.c_ssize_t),
                   ('rcount', ctypes.c_int),
                   ('sexp', ctypes.c_void_p)]

   # Function to extract the pointer to the underlying R object
   # (*SEXPREC, that is SEXP)
   RPY2_CAPSULENAME=b'rpy2.rinterface._rinterface.SEXPOBJ_C_API'
   def get_sexp(obj):
       assert pycapsule_isvalid(obj, RPY2_CAPSULENAME)
       void_p=pycapsule_getpointer(obj, RPY2_CAPSULENAME)
       return ctypes.cast(void_p, ctypes.POINTER(SexpObject).contents.sexp

.. code-block:: python
		
   from rpy2.rinterface import globalenv
   # Pointer to SEXPREC for the R Global Environment
   sexp=get_sexp(globalenv)
      
Changing the `SEXP` in :c:type:`SexpObject` this way is not advised because
of the risk to confuse the object tracking in rpy2, and ultimately create a segfault.
(I have not thought too long about this. May be the object tracking is more robust
than it think. Just be warned.)
   
   
R's NAMED
^^^^^^^^^

.. warning::

   Starting with version 4.0, R not longer uses `NAMED` to keep track of whether
   an R object can be collected. It is now using a reference-counting system.

Whenever the pass-by-value paradigm is applied strictly,
garbage collection is straightforward as objects only live within
the scope they are declared, but R is using a slight modification
of this in order to minimize memory usage. Each R object has an
attribute :attr:`Sexp.named` attached to it, indicating
the need to copy the object.

>>> import rpy2.rinterface as ri
>>> ri.initr()
0
>>> ri.baseenv['letters'].named
0

Now we assign the vector *letters* in the R base namespace
to a variable *mine* in the R globalenv namespace:

>>> ri.baseenv['assign'](ri.StrSexpVector(("mine", )), ri.baseenv['letters'])
<rpy2.rinterface.SexpVector - Python:0xb77ad280 / R:0xa23c5c0>
>>> tuple(ri.globalenv)
("mine", )
>>> ri.globalenv["mine"].named
2

The *named* is 2 to indicate to :program:`R` that *mine* should be 
copied if a modification of any sort is performed on the object. That copy
will be local to the scope of the modification within R.