.. module:: rpy2.robjects.conversion
:synopsis: Converting rpy2 proxies for R objects into Python objects.
.. _robjects-conversion:
Mapping rpy2 objects to arbitrary python objects
================================================
Protocols
---------
The package has a low-level and a high-level interface to R. The low-level interface
is closer to R's C API, while the high-level interface provides more convenience,
sometimes at the cost of performance. The low-level interface (:mod:`rpy2.rinterface`)
is not devoid of convenience, though. A minimal set of Pythonic characteristics is
present, allowing rpy2 objects to behave like Python objects of a similar nature,
and allowing non-rpy2 objects to be used with R functions when there is
no ambiguity about what the conversion between the two systems should be.
For example, R vectors (rank-one arrays) are wrapped in rpy2 classes
implementing the methods :meth:`__len__`, :meth:`__getitem__`, and :meth:`__setitem__`
as defined in Python's sequence protocol. Python functions working with sequences
can then be passed such R objects:
.. code-block:: python

   import rpy2.rinterface as ri
   ri.initr()

   # R array of integers
   r_vec = ri.IntSexpVector([1, 2, 3])

   # enumerate() can use our r_vec
   for i, elt in enumerate(r_vec):
       print('r_vec[%i]: %i' % (i, elt))
rpy2 objects with compatible underlying C representations also implement
the :mod:`numpy` :attr:`__array_interface__`, allowing them to be used in
:mod:`numpy` functions without the need for data copying or conversion.
.. note::
Before the move to :mod:`cffi`, Python's buffer protocol was also implemented.
However, Python does not allow classes to define that protocol outside of the
Python C-API, and `cffi` does not allow the use of Python's C-API.
Some rpy2 vectors have a method :meth:`memoryview` that returns
views implementing the buffer protocol.
R functions are mapped to Python objects that implement :meth:`__call__`. They
can be called just as if they were Python functions.
R environments are mapped to Python objects that implement :meth:`__len__`,
:meth:`__getitem__`, and :meth:`__setitem__` as defined in the mapping protocol,
so elements can be accessed much as they would be in a Python :class:`dict`.
.. warning::
While it is technically possible to modify the way C-level R objects
are shown to Python users through the `rinterface` level, it is not
recommended. The `rinterface` level is quite close to R's C API and modifying it may quickly
result in segfaults.
On the other hand, the robjects-level is designed to facilitate the customization
of object conversions between Python and R.
Conversion
----------
The high-level interface between Python and R in :mod:`rpy2` uses a conversion system
each time an R object is represented in Python, and each time a Python object
is passed to R (for example as a parameter to an R function). Those are the
conversion rules you will mostly experience when using the API in :mod:`rpy2.robjects`
or in the "R magic" used from `ipython` or `jupyter`.
.. note::
The set of active conversion rules can be customized, including within
a context (see `Local conversion rules`_). Functions
in :mod:`rpy2.robjects` will use the active rules. Code that needs
the currently active rules as an object must call
:func:`rpy2.robjects.conversion.get_conversion` to fetch them.
Under the hood, the currently active conversion system is stored in a
:class:`contextvars.ContextVar`. This allows changes of conversion rules to work safely
with Python context managers. However, `contextvars` is relatively recent and does not
play well with older Python code for multithreading. When this happens, the error
``Conversion rules for `rpy2.robjects` appear to be missing`` is very likely to be encountered
when using `rpy2`. A workaround is to wrap all calls to rpy2 in the context of a set of
conversion rules. For example, to use the default converter:
.. code-block:: python

   import rpy2.robjects as ro

   with ro.default_converter.context():
       # call to rpy2 here.
       pass
Consult the rest of the documentation for more information about conversions.
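The multithreading issue described in this note can be illustrated with the standard library
alone. The sketch below does not use rpy2: the variable `active_rules` and the helper functions
are hypothetical stand-ins. It assumes that a thread started by older code begins with an empty
context (the default on standard CPython builds), which is simulated here with a fresh
:class:`contextvars.Context`.

```python
import contextvars

# Hypothetical stand-in for the variable holding the active conversion rules.
active_rules = contextvars.ContextVar("active_rules")


def read_rules():
    """Return the active rules, or a message if none are set in this context."""
    try:
        return active_rules.get()
    except LookupError:
        return "rules appear to be missing"


def set_then_read():
    """Set the rules in the current context first, then read them."""
    active_rules.set("default rules")
    return read_rules()


# Set in the current context, as would happen in the main thread.
active_rules.set("default rules")
here = read_rules()

# A fresh, empty context does not see the value set above.
elsewhere = contextvars.Context().run(read_rules)

# Setting the value inside the other context makes it available there.
# Wrapping calls in a `with ...context():` block plays a similar role.
fixed = contextvars.Context().run(set_then_read)
```

In this sketch `here` and `fixed` hold the rules, while `elsewhere` reports them as missing.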
This system is designed to manage the conversion between the low-level (`rinterface`-level)
interface and an arbitrary Python-level representation of those objects.
`py2rpy` indicates a conversion from Python-level to `rinterface`-level,
and `rpy2py` a conversion from `rinterface`-level to Python-level.
If one wanted to turn all Python :class:`tuple` objects
into R `character` vectors (1D arrays of strings) before passing them to R, the custom
conversion function would make an `rinterface`-level R object from the Python object.
An implementation of this `py2rpy` function would look like:
.. code-block:: python

   from rpy2.rinterface import StrSexpVector

   def tuple_str(tpl):
       # Build an rinterface-level R character vector from the tuple.
       res = StrSexpVector(tpl)
       return res
The conversion system is an `robjects`-level feature, and by default the Python-level
representations are just high-level (`robjects`-level) representations. However, the package contains
optional conversion rules in the modules :mod:`rpy2.robjects.numpy2ri` and
:mod:`rpy2.robjects.pandas2ri` to convert from and to :mod:`numpy` and :mod:`pandas` objects respectively.
.. note::
Sections :ref:`robjects-numpy` and :ref:`robjects-pandas` contain information about
working with rpy2 and :mod:`numpy` or :mod:`pandas` objects.
Converter objects
^^^^^^^^^^^^^^^^^
:class:`rpy2.robjects.conversion.Converter` objects are designed
to keep sets of conversion rules together. There can be as many instances
of that class as desired, but the one called `converter` in
:mod:`rpy2.robjects.conversion` is the one used whenever conversion is needed.
The :class:`Converter` has two attributes, `rpy2py` and `py2rpy`, to resolve
the conversion from R (`rinterface`-level) to an arbitrary Python representation,
and from an arbitrary Python representation to a suitable `rinterface`-level object.
Each of those is a single-dispatch function as implemented in
:func:`functools.singledispatch`. This means that a conversion function,
such as the example function `tuple_str` above, just has to be associated with
the class of the object to convert from. In our example, the Python class is :class:`tuple`.
Our conversion function defined above can be registered in a converter as follows:
.. code-block:: python
from rpy2.robjects.conversion import Converter
seq_converter = Converter('sequence converter')
seq_converter.py2rpy.register(tuple, tuple_str)
Alternatively, the registration can be done with a decorator when the function is declared:
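.. code-block:: python

   from rpy2.rinterface import StrSexpVector
   from rpy2.robjects.conversion import Converter

   my_converter = Converter('my converter')

   # Register the function for objects of class tuple when it is declared.
   @my_converter.py2rpy.register(tuple)
   def tuple_str(tpl):
       res = StrSexpVector(tpl)
       return res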
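The class-based dispatch described above can be tried without R, using only
:func:`functools.singledispatch`. In this minimal sketch the function `to_r_like` and the
list it returns are hypothetical stand-ins for a `py2rpy` dispatcher and an R character vector;
it is not rpy2's own code.

```python
from functools import singledispatch


@singledispatch
def to_r_like(obj):
    """Default rule: no conversion is known for this type."""
    raise NotImplementedError(
        f"Conversion not defined for objects of type '{type(obj)}'"
    )


@to_r_like.register(tuple)
def _(tpl):
    # Hypothetical stand-in for building an R character vector.
    return [str(x) for x in tpl]


# The rule registered for tuple is selected from the class of the argument.
converted = to_r_like((1, 2, "c"))

# A set has no registered rule, so the default implementation is called.
try:
    to_r_like({1, 2})
    failed = False
except NotImplementedError:
    failed = True
```

The function called is chosen from the class of the first argument only, which is why a
conversion rule just needs to be associated with the class to convert from.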
The class :class:`rpy2.robjects.conversion.Converter` can group several conversion rules
into one object. This helps with defining sets of coherent conversion rules, or
conversion domains. :mod:`rpy2.robjects.numpy2ri.converter` and :mod:`rpy2.robjects.pandas2ri.converter`
are examples of such converters.
Sets of conversion rules can be layered on top of one another
to create sets of combined conversion rules. To help with writing concise and
clear code, :class:`Converter` objects can be added together. For example, creating a
converter that adds the rule above to the default conversion rules in rpy2
will look like:
.. code-block:: python
from rpy2.robjects import default_converter
conversion_rules = default_converter + seq_converter
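The idea of layering sets of rules with `+` can be sketched with a toy class. Everything
below is hypothetical (the class `RuleSet`, its attributes, and the tuples it returns) and
is not how :class:`Converter` is implemented. In particular, the sketch assumes that the rules of
the right-hand operand take precedence when both operands define a rule for the same class.

```python
class RuleSet:
    """Toy container of conversion rules, keyed by Python class."""

    def __init__(self, name, rules=None):
        self.name = name
        self.rules = dict(rules or {})

    def __add__(self, other):
        # Rules from the right-hand operand are layered on top of
        # (and take precedence over) the rules of the left-hand one.
        merged = {**self.rules, **other.rules}
        return RuleSet(f"{self.name} + {other.name}", merged)

    def convert(self, obj):
        # Use the rule of the closest class in the object's lineage.
        for cls in type(obj).__mro__:
            if cls in self.rules:
                return self.rules[cls](obj)
        raise NotImplementedError(type(obj))


base_rules = RuleSet("base", {int: lambda x: ("int", x),
                              str: lambda x: ("str", x)})
seq_rules = RuleSet("sequence",
                    {tuple: lambda t: ("character", [str(x) for x in t])})

# The combined set knows the rules of both operands.
combined = base_rules + seq_rules
```

The operands are left unchanged: `base_rules` still has no rule for tuples, while `combined`
can handle both integers and tuples.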
While a dispatch based solely on Python classes works very well in the
direction "Python to `rpy2.rinterface`", it quickly shows its limits in the direction
"`rpy2.rinterface` to Python", especially when independently developed conversions
must be combined.
Converting from `rpy2.rinterface` to Python does not work as well
because `rpy2.rinterface` mirrors the type of R objects at the C level (as
defined in R's C-API), but class definitions in R often sit outside
of the structure types found at the C level. They are merely an attribute of the R object
that contains a list of class names. For example, an R `data.frame` is a `VECSXP` at
the C level (that is, an R `list`), but it has an attribute `"class"` that contains `"data.frame"`.
.. note::
Nothing prevents someone from setting the `"class"` attribute to `"data.frame"` on an R
object of a different type at the C level. For example, it is perfectly possible to write
the following in R, and create an invalid data frame:
.. code-block:: r
> x <- c(1, 2, 3)
> str(x)
int [1:3] 1 2 3
> class(x) <- "data.frame"
> str(x)
'data.frame': 0 obs. of 3 variables:
'data.frame' int character(0) character(0) character(0)
Warning message:
In format.data.frame(x, trim = TRUE, drop0trailing = TRUE, ...) :
corrupt data frame: columns will be truncated or padded with NAs
To allow a dispatch based on name-specified classes in R, the rpy2 conversion system
uses a secondary mechanism (the primary mechanism is the single-dispatch-based one
presented above).
Instances of :class:`rpy2.robjects.conversion.NameClassMap` can map an R class name to
a Python class. Remember that this mapping only happens within the context of an :mod:`rpy2.rinterface`
class though. The attribute :attr:`rpy2.robjects.conversion.Converter._rpy2py_nc_name` is
a :class:`dict` where keys are the :mod:`rpy2.rinterface` classes that wrap C-level R objects, and
values are instances of :class:`rpy2.robjects.conversion.NameClassMap`.
For example, a conversion rule for R objects of class "lm" that are R lists at
the C level (this is a real example: R's linear model fit objects are just that)
can be added to a converter with:
.. code-block:: python

   from rpy2 import rinterface

   class Lm(rinterface.ListSexpVector):
       # implement attributes, properties, methods to make the handling of
       # the R object more convenient on the Python side
       pass

   # `my_converter` is a Converter instance, such as the one defined earlier.
   clsmap = my_converter._rpy2py_nc_name[rinterface.ListSexpVector]
   clsmap.update({'lm': Lm})
.. _Local conversion rules:
Local conversion rules
^^^^^^^^^^^^^^^^^^^^^^
The conversion rules can be customized globally (see section `Customizing the conversion`_)
or locally in a Python `with` block.
.. note::
The use of local conversion rules is
strongly recommended, as modifying the global conversion rules can lead to wasted resources
(e.g., unnecessary round-trip conversions if the code successively passes the results of
R function calls to other R functions) or to errors (conversion cannot be guaranteed to
be lossless, as concepts present in either language do not always survive
a round trip).
As an example, we show how to work around rpy2 not knowing, by default, what to do with
Python tuples.
.. code-block:: python
x = (1, 2, 'c')
from rpy2.robjects.packages import importr
base = importr('base')
# error here:
# NotImplementedError: Conversion 'py2rpy' not defined for objects of type '<class 'tuple'>'
res = base.paste(x, collapse="-")
This can be changed by using our converter defined above as an addition to the
default conversion scheme:
.. code-block:: python

   from rpy2.robjects import default_converter

   # `conversion_rules` is `default_converter + seq_converter`, defined above.
   with conversion_rules.context():
       res = base.paste(x, collapse="-")
.. note::
A local conversion rule can also ensure that code is robust against arbitrary changes
in the conversion system made by the caller.
For example, to ensure that a function always uses rpy2's default conversion,
irrespective of the conversion rules defined by the caller of the code:
.. code-block:: python

   from rpy2.robjects import default_converter

   def my_function(obj):
       with default_converter.context():
           # Block of code mixing Python code and calls to R functions
           # interacting with the objects returned by R in the Python code.
           # Within this block the conversion rules are the ones of
           # `default_converter`.
           pass
Code in :mod:`rpy2.robjects` will use whatever the active conversion rules are, but
there are situations where the set of active conversion rules must be accessed. When that is
the case, the conversion rules from the context manager can be given a name.
.. code-block:: python

   from rpy2.robjects import default_converter
   from rpy2.robjects.conversion import get_conversion

   def my_function(obj):
       with default_converter.context() as local_converter:
           # `local_converter` is a rpy2.robjects.conversion.Converter
           # object.
           pass
The converter returned by :meth:`rpy2.robjects.conversion.Converter.context` is
a copy of the rules for the context.
.. code-block:: python

   from rpy2.robjects import default_converter

   with default_converter.context() as local_converter:
       # Conversion objects are not the same.
       assert local_converter != default_converter
       assert local_converter.py2rpy.registry != default_converter.py2rpy
       assert local_converter.rpy2py.registry != default_converter.rpy2py
       # The conversion rules are identical though.
       assert dict(local_converter.py2rpy.registry) == dict(default_converter.py2rpy.registry)
       assert dict(local_converter.rpy2py.registry) == dict(default_converter.rpy2py.registry)
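The behavior of a context that works on a copy of the rules and restores the previous rules on
exit can be sketched with :mod:`contextvars` and :mod:`contextlib`. The names `_active` and
`rules_context` are hypothetical, and a plain :class:`dict` stands in for a set of conversion
rules. This is an illustration of the pattern, not rpy2's implementation.

```python
import contextlib
import contextvars

# Hypothetical names; a plain dict stands in for a set of conversion rules.
_active = contextvars.ContextVar("active_rules",
                                 default={"int": "IntVector"})


@contextlib.contextmanager
def rules_context(rules):
    """Activate a copy of `rules` for the duration of a `with` block."""
    local = dict(rules)        # work on a copy, not on the caller's object
    token = _active.set(local)
    try:
        yield local
    finally:
        _active.reset(token)   # restore whatever was active before


custom = {"int": "IntVector", "tuple": "StrVector"}

before = _active.get()
with rules_context(custom) as local_rules:
    inside = _active.get()
    # Changes made to the local copy stay local to the block.
    local_rules["extra"] = "only here"
after = _active.get()
```

Inside the block the active rules are the local copy. After the block the rules active before
it are back in place, and `custom` has not been modified.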
Customizing the conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^
As an example, let's assume that one wants to return atomic values
whenever an R numerical vector is of length one. This is only a matter
of writing a new `rpy2py` function that handles this, as shown below:
.. code-block:: python

   import rpy2.robjects as robjects
   from rpy2.rinterface import SexpVector

   @robjects.conversion.rpy2py.register(SexpVector)
   def my_rpy2py(obj):
       # Return the only element of vectors of length one.
       if len(obj) == 1:
           obj = obj[0]
       return obj
Then we can test it with:

>>> pi = robjects.r.pi
>>> type(pi)
<class 'float'>
At the time of writing, :func:`functools.singledispatch` does not provide a way to unregister
a function. Removing the additional conversion rule without restarting Python is left as an
exercise for the reader.
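One possible approach to that exercise, sketched with the standard library only, is to build a
new dispatcher from the read-only `registry` attribute of the existing one, leaving out the rule
to remove. The helper `without_rule` and the toy `convert` function are hypothetical, and how
to put the rebuilt dispatcher back into an rpy2 converter is not covered here.

```python
from functools import singledispatch


def without_rule(dispatcher, cls):
    """Return a new dispatcher with every rule of `dispatcher` except `cls`."""
    registry = dispatcher.registry
    # The implementation registered for `object` is the default one.
    rebuilt = singledispatch(registry[object])
    for registered_cls, func in registry.items():
        if registered_cls not in (object, cls):
            rebuilt.register(registered_cls, func)
    return rebuilt


@singledispatch
def convert(obj):
    return ("default", obj)


@convert.register(list)
def _(obj):
    return ("list rule", len(obj))


@convert.register(int)
def _(obj):
    return ("int rule", obj)


# Same rules as `convert`, minus the one registered for list.
trimmed = without_rule(convert, list)
```

The original dispatcher is left untouched. Only `trimmed` falls back to the default
implementation when given a list.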
.. note::
Customizing the conversion of S4 classes should preferably be done using a separate
dedicated system.
The system is rather simple and can easily be described with an example.
.. code-block:: python

   import rpy2.robjects as robjects
   from rpy2 import rinterface
   from rpy2.robjects.packages import importr

   class LMER(robjects.RS4):
       """Custom class."""
       pass

   lme4 = importr('lme4')

   res = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

   # Map the R/S4 class 'lmerMod' to our Python class LMER.
   with robjects.conversion.converter.rclass_map_context(
       rinterface.SexpS4,
       {'lmerMod': LMER}
   ):
       res2 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
When running the example above, `res` is an instance of class
:class:`rpy2.robjects.methods.RS4`,
which is the default mapping for R `S4` instances, while `res2` is an instance of our
custom class `LMER`.
The class mapping uses the hierarchy of R/S4-defined classes and tries to find
the first matching Python-defined class. For example, the R/S4 class `lmerMod` has a parent class
`merMod` (defined in R S4). Let us run the following example after the previous one.
.. code-block:: python

   class MER(robjects.RS4):
       """Custom class."""
       pass

   with robjects.conversion.converter.rclass_map_context(
       rinterface.SexpS4,
       {'merMod': MER}
   ):
       res3 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

   with robjects.conversion.converter.rclass_map_context(
       rinterface.SexpS4,
       {'lmerMod': LMER,
        'merMod': MER}
   ):
       res4 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
`res3` will be a `MER` instance: there is no mapping for the R/S4 class `lmerMod` but there
is a mapping for its R/S4 parent `merMod`. `res4` will be an `LMER` instance.
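The lookup described above (walk the R class lineage and stop at the first name that has a
mapping) can be sketched in a few lines of plain Python. The function `pick_class` and the
classes below are hypothetical stand-ins that do not depend on rpy2 or R. The lineage is
written by hand and assumed to be ordered from the most specific class to its parents.

```python
def pick_class(r_class_lineage, class_map, default):
    """Return the Python class for the first R class name with a mapping."""
    for r_name in r_class_lineage:
        if r_name in class_map:
            return class_map[r_name]
    return default


class RS4Like:
    """Stand-in for the default Python class of R/S4 instances."""


class MER(RS4Like):
    """Stand-in for a custom class mapped to 'merMod'."""


class LMER(RS4Like):
    """Stand-in for a custom class mapped to 'lmerMod'."""


# Lineage of the R object, most specific class first.
lineage = ("lmerMod", "merMod")

no_mapping = pick_class(lineage, {}, RS4Like)
parent_only = pick_class(lineage, {"merMod": MER}, RS4Like)
both = pick_class(lineage, {"lmerMod": LMER, "merMod": MER}, RS4Like)
```

The three results mirror `res`, `res3`, and `res4` in the examples above: the default class,
the class mapped to the parent, and the class mapped to the most specific R class.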