File: robjects_convert.rst

.. module:: rpy2.robjects.conversion
   :synopsis: Converting rpy2 proxies for R objects into Python objects.

.. _robjects-conversion:

Mapping rpy2 objects to arbitrary Python objects
================================================


Protocols
---------

The package has a low-level and a high-level interface to R. The low level is
closer to R's C API, while the high level is meant to provide more convenience,
even if at the cost of performance. The low level (:mod:`rpy2.rinterface`)
is not devoid of convenience, however. A minimal set of Pythonic characteristics is
present, allowing rpy2 objects to behave like Python objects of a similar nature,
and allowing non-rpy2 objects to be used with R functions when there is
no ambiguity about what the conversion between the two systems should be.

For example, R vectors (rank-one arrays) are wrapped in rpy2 classes
implementing the methods :meth:`__len__`, :meth:`__getitem__`, and :meth:`__setitem__`
as defined in Python's sequence protocol. Python functions working with sequences
can then be passed such R objects:

.. code-block:: python

   import rpy2.rinterface as ri
   ri.initr()

   # R array of integers
   r_vec = ri.IntSexpVector([1,2,3])

   # enumerate() can use our r_vec
   for i, elt in enumerate(r_vec):
       print('r_vec[%i]: %i' % (i, elt))


rpy2 objects with compatible underlying C representations also implement
the :mod:`numpy` :attr:`__array_interface__`, allowing them to be used in
:mod:`numpy` functions without the need for data copying or conversion.
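
The mechanism can be illustrated without R. The following is a minimal sketch,
not rpy2 code: the class `IntBuffer` is hypothetical and only stands in for an
object that owns a C-level array of integers and describes it through
:attr:`__array_interface__`.

.. code-block:: python

   import array

   import numpy as np


   class IntBuffer:
       """Hypothetical container owning a C-level array of ints."""

       def __init__(self, values):
           self._data = array.array('i', values)

       @property
       def __array_interface__(self):
           # Describe the memory owned by this object to numpy.
           address, length = self._data.buffer_info()
           return {
               'version': 3,
               'shape': (length,),
               'typestr': np.dtype(np.intc).str,
               'data': (address, False),  # (pointer, read-only flag)
           }


   buf = IntBuffer([1, 2, 3])
   # numpy builds an array on top of the existing memory.
   view = np.asarray(buf)
   # Writing through the numpy array changes the original data.
   view[0] = 10
   print(buf._data[0], view.sum())

Because the memory is shared, the write through `view` is visible in `buf`.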

.. note::

   Before the move to :mod:`cffi`, Python's buffer protocol was also implemented.
   However, Python does not allow classes to define it outside of the Python C-API,
   and `cffi` does not allow the use of Python's C-API.

   Some rpy2 vectors will have a method :meth:`memoryview` that will return
   views that implement the buffer protocol.

R functions are mapped to Python objects that implement :meth:`__call__`. They
can be called just as if they were functions.

R environments are mapped to Python objects that implement :meth:`__len__`,
:meth:`__getitem__`, and :meth:`__setitem__` from the mapping protocol, so elements
can be accessed much like in a Python :class:`dict`.

.. warning::

   While it is technically possible to modify the way C-level R objects
   are shown to Python users through the `rinterface` level, it is not
   recommended. The `rinterface` level is quite close to R's C API and modifying it may quickly
   result in segfaults.

   On the other hand, the robjects-level is designed to facilitate the customization
   of object conversions between Python and R.


Conversion
----------

The high-level interface between Python and R in :mod:`rpy2` uses a conversion system
each time an R object is represented in Python, and each time a Python object
is passed to R (for example as a parameter to an R function). Those are the
conversion rules you will mostly experience when using the API in :mod:`rpy2.robjects`
or in the "R magic" used from `ipython` or `jupyter`.

.. note::

   The set of active conversion rules can be customized, including within
   a context (see `Local conversion rules`_). Functions
   in :mod:`rpy2.robjects` will use the active rules, but code that needs
   the currently active rules as an object must call
   :func:`rpy2.robjects.conversion.get_conversion` to fetch them.

   Under the hood, the currently active conversion system is set in a
   :class:`contextvars.ContextVar`. This allows changes of conversion rules to work safely
   with Python context managers. However, `contextvars` is relatively recent and does not play
   well with older Python code for multithreading. When that is the case, the error
   ``Conversion rules for `rpy2.robjects` appear to be missing`` is very likely to be encountered
   when using `rpy2`. A workaround can be to wrap all calls to rpy2 in the context of a set of
   conversion rules. For example, to use the default converter:

   .. code-block:: python

      import rpy2.robjects as ro
      with ro.default_converter.context():
          # call to rpy2 here.
          pass

   Consult the rest of the documentation for more information about conversions.
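
The behavior described in the note above can be reproduced with the standard
library alone. This is a minimal sketch, not rpy2's implementation: the names
`active_rules`, `rules_context` and `describe` are hypothetical. It shows a value
set through a context manager, restored on exit, and absent when code runs in a
separate, empty context (which is one way code can end up without conversion rules).

.. code-block:: python

   import contextvars
   from contextlib import contextmanager

   # Hypothetical stand-in for the variable holding the active rules.
   active_rules = contextvars.ContextVar('active_rules')


   @contextmanager
   def rules_context(rules):
       # Set the rules for the duration of the `with` block.
       token = active_rules.set(rules)
       try:
           yield rules
       finally:
           # Restore whatever was active before entering the block.
           active_rules.reset(token)


   def describe():
       # Return the active rules, or 'missing' if none are set.
       return active_rules.get('missing')


   with rules_context('default'):
       inside = describe()
       with rules_context('custom'):
           nested = describe()
       restored = describe()
       # Code running in a new, empty context does not see the rules.
       in_empty_context = contextvars.Context().run(describe)

   print(inside, nested, restored, in_empty_context)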
   

This system is designed to manage the conversion between the low-level (`rinterface`-level)
interface and an arbitrary Python-level representation of those objects.
`py2rpy` indicates a conversion from Python-level to `rinterface`-level,
and `rpy2py` a conversion from `rinterface`-level to Python-level.

If one wanted to turn all Python :class:`tuple` objects
into R `character` vectors (1D arrays of strings) before passing them to R, the custom
conversion function would make an `rinterface`-level R object from the Python object.
An implementation for this `py2rpy` function would look like:
 
.. code-block:: python

   from rpy2.rinterface import StrSexpVector

   
   def tuple_str(tpl):
       res = StrSexpVector(tpl)
       return res

The conversion system is an `robjects`-level feature, and by default the Python-level
representations are simply the high-level (`robjects`-level) representations. However, the package contains
optional conversion rules in modules :mod:`rpy2.robjects.numpy2ri` and
:mod:`rpy2.robjects.pandas2ri` to convert from and to :mod:`numpy` and :mod:`pandas` objects respectively.

.. note::

   Sections :ref:`robjects-numpy` and :ref:`robjects-pandas` contain information about
   working with rpy2 and :mod:`numpy` or :mod:`pandas` objects.


Converter objects
^^^^^^^^^^^^^^^^^

:class:`rpy2.robjects.conversion.Converter` objects are designed
to keep sets of conversion rules together. There can be as many instances
of that class as desired, but the one called `converter` in
:mod:`rpy2.robjects.conversion` is the one used whenever conversion is needed.

The :class:`Converter` has two attributes, `rpy2py` and `py2rpy`, to resolve
the conversion from R (`rinterface`-level) to an arbitrary Python representation,
and from an arbitrary Python representation to a suitable `rinterface`-level object.
Each of those is a single dispatch as implemented in
:func:`functools.singledispatch`. This means that a conversion function,
such as the example function `tuple_str` above, just has to be associated with
the class of the object to convert from. In our example, the Python class is :class:`tuple`.

Our conversion function defined above can be registered in a converter as follows:

.. code-block:: python
   
   from rpy2.robjects.conversion import Converter
   seq_converter = Converter('sequence converter')
   seq_converter.py2rpy.register(tuple, tuple_str)

Alternatively, the registration can be done with a decorator when the function is declared:

.. code-block:: python

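   from rpy2.rinterface import StrSexpVector
   from rpy2.robjects.conversion import Converter

   my_converter = Converter('my converter')

   # Register the function as the py2rpy rule for Python tuples.
   @my_converter.py2rpy.register(tuple)
   def tuple_str(tpl):
       res = StrSexpVector(tpl)
       return res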

The class :class:`rpy2.robjects.conversion.Converter` can group several conversion rules
into one object. This helps with defining sets of coherent conversion rules, or
conversion domains. :mod:`rpy2.robjects.numpy2ri.converter` and :mod:`rpy2.robjects.pandas2ri.converter`
are examples of such converters.
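
The dispatch on the class of the object can be seen with
:func:`functools.singledispatch` on its own. The sketch below is only an
illustration under that assumption: `py2rpy_sketch` is a hypothetical function,
and plain Python lists stand in for `rinterface`-level objects.

.. code-block:: python

   from collections import namedtuple
   from functools import singledispatch


   @singledispatch
   def py2rpy_sketch(obj):
       # Fallback used when no rule matches the class of `obj`.
       raise NotImplementedError(
           f"Conversion not defined for objects of type '{type(obj)}'"
       )


   @py2rpy_sketch.register(tuple)
   def _tuple_to_strings(tpl):
       # A list of strings stands in for an R character vector.
       return [str(elt) for elt in tpl]


   Point = namedtuple('Point', ['x', 'y'])

   from_tuple = py2rpy_sketch((1, 2, 'c'))
   # Subclasses of tuple resolve to the rule registered for tuple.
   from_subclass = py2rpy_sketch(Point(1, 2))

   try:
       py2rpy_sketch({1, 2})
       unhandled = None
   except NotImplementedError as error:
       unhandled = str(error)

   print(from_tuple, from_subclass, unhandled)

A rule registered for a class also applies to its subclasses unless a more
specific rule is registered for them.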

Sets of conversion rules can be layered on the top of one another
to create sets of combined conversion rules. To help with writing concise and
clear code, :class:`Converter` objects can be added. For example, creating a
converter that adds the rule above to the default conversion rules in rpy2
will look like:

.. code-block:: python
		
   from rpy2.robjects import default_converter
   conversion_rules = default_converter + seq_converter
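
The idea of layering can be sketched with plain single-dispatch functions. This
is not how rpy2 implements the addition of converters: the helpers `new_rules`
and `layer` are hypothetical, and the choice that rules from the second operand
replace rules from the first for the same class is an assumption of this sketch.

.. code-block:: python

   from functools import singledispatch


   def new_rules():
       """Create an empty rule set (hypothetical helper)."""
       @singledispatch
       def convert(obj):
           raise NotImplementedError(
               f"Conversion not defined for objects of type '{type(obj)}'"
           )
       return convert


   def layer(base, extra):
       """Return a new rule set with the rules in `extra` on top of `base`."""
       combined = new_rules()
       for rules in (base, extra):
           for cls, func in rules.registry.items():
               if cls is not object:  # skip each rule set's fallback
                   combined.register(cls, func)
       return combined


   base_rules = new_rules()
   base_rules.register(int, lambda x: ('integer', x))
   base_rules.register(str, lambda x: ('character', x))

   seq_rules = new_rules()
   seq_rules.register(tuple, lambda x: ('character', tuple(str(e) for e in x)))

   all_rules = layer(base_rules, seq_rules)
   print(all_rules(1), all_rules((1, 'a')))

The two original rule sets are left untouched, so `base_rules` still has no
rule for tuples after the combination.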

While a dispatch solely based on Python classes will work very well in the
direction "Python to `rpy2.rinterface`", it will quickly show limits in the direction
"`rpy2.rinterface` to Python", especially when independently developed conversions
must be combined.

Converting from `rpy2.rinterface` to Python does not work as well
because `rpy2.rinterface` mirrors the type of R objects at the C level (as
defined in R's C-API), but class definitions in R often sit outside
of the structure types found at the C level. They are merely an attribute of the R object
that contains a list of class names. For example, an R `data.frame` is a `VECSXP` at
C-level (that is an R `list`), but it has an attribute `"class"` that contains `"data.frame"`.
   
.. note::

   Nothing prevents someone from setting the `"class"` attribute to `"data.frame"` on an R
   object of a different type at C-level. For example, it is perfectly possible to write
   the following in R, and create an invalid data frame:
   
   .. code-block:: r
		   
      > x <- 1:3
      > str(x)
      int [1:3] 1 2 3
      > class(x) <- "data.frame"
      > str(x)
      'data.frame':	0 obs. of  3 variables:
       'data.frame' int  character(0) character(0) character(0)
      Warning message:
        In format.data.frame(x, trim = TRUE, drop0trailing = TRUE, ...) :
        corrupt data frame: columns will be truncated or padded with NAs
 
To allow a dispatch based on name-specified classes in R, the rpy2 conversion system
uses a secondary mechanism (the primary mechanism is the single dispatch-based one
presented above).

Instances of :class:`rpy2.robjects.conversion.NameClassMap` can map an R class name to
a Python class. Remember that this mapping only happens within the context of an :mod:`rpy2.rinterface`
class though. The attribute :attr:`rpy2.robjects.conversion.Converter._rpy2py_nc_name` is
a :class:`dict` where keys are :mod:`rpy2.rinterface` classes to wrap C-level R objects, and
values are instances of :class:`rpy2.robjects.conversion.NameClassMap`.

For example, a conversion rule for R objects of class "lm" that are R lists at
the C level (this is a real example: R's linear model fit objects are just that)
can be added to a converter with:

.. code-block:: python

   from rpy2 import rinterface

   class Lm(rinterface.ListSexpVector):
       # implement attributes, properties, methods to make the handling of
       # the R object more convenient on the Python side
       pass

   clsmap = my_converter._rpy2py_nc_name[rinterface.ListSexpVector]
   clsmap.update({'lm': Lm})
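
The two-step lookup (first the C-level type, then the R class names) can be
sketched in pure Python. This is a simplified illustration and not the
implementation of :class:`NameClassMap`: the classes `NameToClass`, `ListVector`
and `Lm`, and the method `find`, are hypothetical stand-ins.

.. code-block:: python

   class NameToClass:
       """Hypothetical, simplified stand-in for a name-to-class map."""

       def __init__(self, default, mapping=None):
           self.default = default
           self.mapping = dict(mapping or {})

       def update(self, mapping):
           self.mapping.update(mapping)

       def find(self, rclass_names):
           """Return the class mapped to the first known name, else the default."""
           for name in rclass_names:
               if name in self.mapping:
                   return self.mapping[name]
           return self.default


   class ListVector:
       """Stand-in for the wrapper of an R list at the C level."""


   class Lm(ListVector):
       """Stand-in for a custom wrapper for objects of R class "lm"."""


   # One map per C-level wrapper class; here only the one for lists.
   clsmap = NameToClass(ListVector)
   clsmap.update({'lm': Lm})

   lm_cls = clsmap.find(('lm',))
   # An R "glm" fit has the class vector c("glm", "lm").
   glm_cls = clsmap.find(('glm', 'lm'))
   other_cls = clsmap.find(('htest',))
   print(lm_cls.__name__, glm_cls.__name__, other_cls.__name__)

An object whose class names are all unknown falls back to the wrapper for its
C-level type, here `ListVector`.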


.. _Local conversion rules:

Local conversion rules
^^^^^^^^^^^^^^^^^^^^^^

The conversion rules can be customized globally (see section `Customizing the conversion`_)
or locally in a Python `with` block.

.. note::

   The use of local conversion rules is
   strongly recommended, as modifying the global conversion rules can lead to wasted resources
   (e.g., unnecessary round-trip conversions if the code successively passes results from
   calling R functions to the next R functions) or errors (conversion cannot be guaranteed to
   be without loss, as concepts present in either language are not always able to survive
   a round trip).
   
As an example, we show how to handle the fact that rpy2 does not know, by default,
what to do with Python tuples.

.. code-block:: python

   x = (1, 2, 'c')

   from rpy2.robjects.packages import importr
   base = importr('base')

   # error here:
   # NotImplementedError: Conversion 'py2rpy' not defined for objects of type '<class 'tuple'>'
   res = base.paste(x, collapse="-")

This can be changed by using our converter defined above as an addition to the
default conversion scheme:

.. code-block:: python

   from rpy2.robjects import default_converter
   with conversion_rules.context():
       res = base.paste(x, collapse="-")

.. note::

   A local conversion rule can also ensure that code is robust against arbitrary changes
   in the conversion system made by the caller.

   For example, to ensure that a function always uses rpy2's default conversion,
   irrespective of the conversion rules defined by the caller of the code:

   .. code-block:: python

      from rpy2.robjects import default_converter

      def my_function(obj):
          with default_converter.context():
              # Block of code mixing Python code and calls to R functions
              # interacting with the objects returned by R in the Python code.
              # Within this block the conversion rules are the ones of
              # `default_converter`.
              pass

   Code in :mod:`rpy2.robjects` will use whatever the active conversion rules are, but
   there are situations where the set of active conversion rules must be accessed. When
   that is the case, the converter provided by the context manager can be bound to a name.
	  
   .. code-block:: python

      from rpy2.robjects import default_converter
      from rpy2.robjects.conversion import get_conversion

      def my_function(obj):
          with default_converter.context() as local_converter:
              # `local_converter` is a rpy2.robjects.conversion.Converter
              # object.
              pass

   The converter returned by :meth:`rpy2.robjects.conversion.Converter.context` is
   a copy of the rules for the context.

   .. code-block:: python

      with default_converter.context() as local_converter:
          # Conversion objects are not the same.
          assert local_converter is not default_converter
          assert local_converter.py2rpy is not default_converter.py2rpy
          assert local_converter.rpy2py is not default_converter.rpy2py
          # The conversion rules are identical though.
          assert dict(local_converter.py2rpy.registry) == dict(default_converter.py2rpy.registry)
          assert dict(local_converter.rpy2py.registry) == dict(default_converter.rpy2py.registry)


Customizing the conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^

As an example, let's assume that one wants to return atomic values
whenever an R numerical vector is of length one. This is only a matter
of writing a new function `rpy2py` that handles this, as shown below:

.. code-block:: python

   import rpy2.robjects as robjects
   from rpy2.rinterface import SexpVector
   
   @robjects.conversion.rpy2py.register(SexpVector)
   def my_rpy2py(obj):
       if len(obj) == 1:
           obj = obj[0]
       return obj

Then we can test it with:

>>> pi = robjects.r.pi
>>> type(pi)
<class 'float'>

At the time of writing, :func:`functools.singledispatch` does not provide a way to `unregister`.
Removing the additional conversion rule without restarting Python is left as an
exercise for the reader.
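
One possible approach to that exercise, shown here on a plain
:func:`functools.singledispatch` function rather than on rpy2 itself, is to keep
a reference to the function previously used for the class and to register it
again afterwards. The names `rpy2py_sketch` and `unwrap_length_one` are
hypothetical, and Python lists stand in for R vectors.

.. code-block:: python

   from functools import singledispatch


   @singledispatch
   def rpy2py_sketch(obj):
       # Default rule: return the object unchanged.
       return obj


   def unwrap_length_one(obj):
       # Return the only element of a length-one sequence.
       return obj[0] if len(obj) == 1 else obj


   # Remember which function currently handles lists.
   previous = rpy2py_sketch.dispatch(list)

   rpy2py_sketch.register(list, unwrap_length_one)
   with_rule = rpy2py_sketch([3.14])

   # "Unregister" by registering the previous function again.
   rpy2py_sketch.register(list, previous)
   without_rule = rpy2py_sketch([3.14])

   print(with_rule, without_rule)

The class stays in the registry, but conversions for it behave as they did
before the custom rule was added.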

.. note::

   Customizing the conversion of S4 classes should preferably be done using a separate
   dedicated system.

   The system is rather simple and can easily be described with an example.

   .. code-block:: python

      import rpy2.rinterface as rinterface
      import rpy2.robjects as robjects
      from rpy2.robjects.packages import importr

      class LMER(robjects.RS4):
          """Custom class."""
          pass

      lme4 = importr('lme4')

      res = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

      # Map the R/S4 class 'lmerMod' to our Python class LMER.
      with robjects.conversion.converter.rclass_map_context(
          rinterface.SexpS4,
          {'lmerMod': LMER}
      ):
          res2 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

   When running the example above, `res` is an instance of class
   :class:`rpy2.robjects.methods.RS4`,
   which is the default mapping for R `S4` instances, while `res2` is an instance of our
   custom class `LMER`.

   The class mapping uses the hierarchy of R/S4-defined classes and tries to find
   the first
   matching Python-defined class. For example, the R/S4 class `lmerMod` has a parent class
   `merMod` (defined in R S4). Let's run the following example after the previous one.
   
   .. code-block:: python

      class MER(robjects.RS4):
          """Custom class."""
          pass

      with robjects.conversion.converter.rclass_map_context(
          rinterface.SexpS4,
          {'merMod': MER}
      ):
          res3 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

      with robjects.conversion.converter.rclass_map_context(
          rinterface.SexpS4,
          {'lmerMod': LMER,
           'merMod': MER}
      ):
          res4 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

   `res3` will be a `MER` instance: there is no mapping for the R/S4 class `lmerMod` but there
   is a mapping for its R/S4 parent `merMod`. `res4` will be an `LMER` instance.