File: s4class.md

```python
from functools import partial
from rpy2.ipython import html
html.html_rdataframe=partial(html.html_rdataframe, table_class="docutils")
```

# Basic handling

The S4 system is one of the OOP systems in R.
Its largest use might be in the Bioconductor collection of packages
for bioinformatics and computational biology.

We use the Bioconductor package `Biobase`:

```python
from rpy2.robjects.packages import importr
biobase = importr('Biobase')
```

The R package contains constructors for the S4 classes defined. They
are simply functions, and can be used as such through `rpy2`:

```python
eset = biobase.ExpressionSet() 
```

The object `eset` is an R object of type `S4`:
```python
type(eset)
```

It has a class as well:

```python
tuple(eset.rclass)
```

In R, the attributes of an S4 object are also known as slots. The slot names
can be listed with:

```python
tuple(eset.slotnames())
```

The attributes can also be accessed through the `rpy2` property `slots`.
`slots` is a mapping between attribute names (keys) and their associated
R objects (values). It can be used like a Python `dict`:

```python
# print keys
print(tuple(eset.slots.keys()))

# fetch `phenoData`
phdat = eset.slots['phenoData']

# phdat is an S4 object itself
pheno_dataf = phdat.slots['data']
```

# Mapping S4 classes to Python classes

Writing one's own Python class extending rpy2's `RS4` is straightforward.
That class can be used to wrap our `eset` object:

```python

from rpy2.robjects.methods import RS4   
class ExpressionSet(RS4):
    pass

eset_myclass = ExpressionSet(eset)
```

## Custom conversion

The conversion system can also be made aware of our new class by customizing
the handling of S4 objects.

A simple implementation is a factory function that will conditionally wrap
the object in our Python class `ExpressionSet`:

```python
def rpy2py_s4(obj):
    if 'ExpressionSet' in obj.rclass:
        # Wrap objects whose R class names include 'ExpressionSet'.
        res = ExpressionSet(obj)
    else:
        # Anything else is returned unchanged.
        res = obj
    return res

# try it
rpy2py_s4(eset)
```
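The branching in such a factory can be checked without a running R session. The sketch below is an illustration only: `FakeS4`, `Wrapped` and `wrap_if_expressionset` are hypothetical stand-ins and not part of rpy2. The only thing they borrow from rpy2 objects is an `rclass` attribute holding the R class names.

```python
class FakeS4:
    """Hypothetical stand-in for an rpy2 S4 object (not part of rpy2)."""

    def __init__(self, rclass):
        self.rclass = tuple(rclass)


class Wrapped:
    """Hypothetical stand-in for the Python class `ExpressionSet`."""

    def __init__(self, obj):
        self.obj = obj


def wrap_if_expressionset(obj):
    # Wrap only the objects whose R class names include 'ExpressionSet';
    # anything else is returned unchanged.
    if 'ExpressionSet' in obj.rclass:
        return Wrapped(obj)
    return obj


eset_like = FakeS4(['ExpressionSet'])
other = FakeS4(['AnnotatedDataFrame'])

wrapped = wrap_if_expressionset(eset_like)
untouched = wrap_if_expressionset(other)
print(type(wrapped).__name__, type(untouched).__name__)
```

Returning the input unchanged in the fallback branch matters: a function registered for all S4 objects will also receive the objects it does not want to wrap.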

That function can be registered with a `Converter`:

```python
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import Converter

my_converter = Converter('ExpressionSet-aware converter',
                         template=default_converter)

from rpy2.rinterface import SexpS4
my_converter.rpy2py.register(SexpS4, rpy2py_s4)

```

When using that converter, the matching R objects are returned as
instances of our Python class `ExpressionSet`:

```python

with my_converter.context() as cv:
    eset = biobase.ExpressionSet()
    print(type(eset))
```

## Class attributes

The R attribute `assayData` can be accessed
through the accessor method `exprs()` in R.
We can make it a property in our Python class:

```python
class ExpressionSet(RS4):
    def _exprs_get(self):
        return self.slots['assayData']
    def _exprs_set(self, value):
        self.slots['assayData'] = value
    exprs = property(_exprs_get,
                     _exprs_set,
                     None,
                     "R attribute `assayData`")
eset_myclass = ExpressionSet(eset)

eset_myclass.exprs
```
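The same property can also be written with Python's decorator syntax. The sketch below assumes nothing about rpy2 beyond a dict-like `slots` attribute: `SlotHolder` and `ExpressionSetSketch` are hypothetical stand-ins so that the example runs without R, and the stored values are placeholders.

```python
class SlotHolder:
    """Hypothetical stand-in for `RS4`: only provides a dict-like `slots`."""

    def __init__(self, **slots):
        self.slots = dict(slots)


class ExpressionSetSketch(SlotHolder):

    @property
    def exprs(self):
        """R attribute `assayData`."""
        return self.slots['assayData']

    @exprs.setter
    def exprs(self, value):
        # Assigning to the property writes through to the underlying slot.
        self.slots['assayData'] = value


es = ExpressionSetSketch(assayData='placeholder')
es.exprs = [1.0, 2.0]
print(es.slots['assayData'])
```

With a real `RS4` subclass the getter and setter bodies would be the same as in the `property(...)` form above; only the declaration style differs.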

## Methods

In R's S4 system, methods are generic functions served by a multiple dispatch system.

A natural way to expose the S4 method to Python is to use the
`multipledispatch` package:

```python
from multipledispatch import dispatch
from functools import partial

my_namespace = dict()
dispatch = partial(dispatch, namespace=my_namespace)

@dispatch(ExpressionSet)
def rowmedians(eset,
               na_rm=False):
    res = biobase.rowMedians(eset,
                             na_rm=na_rm)
    return res

res = rowmedians(eset_myclass)
```

The R method `rowMedians` is also defined for matrices, which we can expose
on the Python end as well:

```python
from rpy2.robjects.vectors import Matrix
@dispatch(Matrix)
def rowmedians(m,
               na_rm=False):
    res = biobase.rowMedians(m,
                             na_rm=na_rm)
    return res
```

While this works, note that both decorated Python functions call the
same R function, `rowMedians()` from the package `Biobase`. The dispatch
that selects the implementation is still performed by R.

If the dispatch in R ever becomes a performance issue, the specific R function
that would be dispatched to can be prefetched and called explicitly in the
Python function. For example:

```python
from rpy2.robjects.methods import getmethod
from rpy2.robjects.vectors import StrVector
_rowmedians_matrix = getmethod(StrVector(["rowMedians"]),
                               signature=StrVector(["matrix"]))
@dispatch(Matrix)
def rowmedians(m,
               na_rm=False):
    res = _rowmedians_matrix(m,
                             na_rm=na_rm)
    return res
```
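When a function dispatches on its first argument only, as `rowmedians` does here, the standard library's `functools.singledispatch` is a possible alternative to `multipledispatch`. This is a different technique from the one used above, shown as a sketch only: `MatrixLike` and `EsetLike` are hypothetical stand-ins for the rpy2 classes, and the medians are computed in pure Python rather than by R so that the example runs on its own.

```python
from functools import singledispatch
from statistics import median


class MatrixLike:
    """Hypothetical stand-in for an R matrix: a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]


class EsetLike:
    """Hypothetical stand-in for an ExpressionSet holding a matrix."""

    def __init__(self, matrix):
        self.matrix = matrix


@singledispatch
def rowmedians(obj):
    # Fallback for types without a registered implementation.
    raise NotImplementedError(f"no rowmedians for {type(obj).__name__}")


@rowmedians.register
def _(obj: MatrixLike):
    return [median(row) for row in obj.rows]


@rowmedians.register
def _(obj: EsetLike):
    # Delegate to the implementation registered for the matrix type.
    return rowmedians(obj.matrix)


m = MatrixLike([[1, 2, 3], [4, 6]])
print(rowmedians(m), rowmedians(EsetLike(m)))
```

`singledispatch` cannot dispatch on several arguments, so `multipledispatch` remains the closer match to S4 semantics when a signature involves more than one class.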