.. _language:
RPython Language
================
Definition
----------
RPython is a restricted subset of Python that is amenable to static analysis.
Although there are additions to the language and some things might surprisingly
work, this is a rough list of restrictions that should be considered. Note
that there are many special-cased restrictions that you'll encounter
as you go. The exact definition is "RPython is everything that our translation
toolchain can accept" :)
Flow restrictions
-----------------
**variables**
a variable should contain values of at most one type (as described in
`Object restrictions`_) at each control flow point. For example, joining
control paths that use the same variable to hold both a string and an int
must be avoided. It is allowed to mix None (basically with the role of a
null pointer) with many other types: wrapped objects, class instances,
lists, dicts, strings, etc., but *not* with ints, floats or tuples.
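As a rough illustration of the rule above, the following sketch contrasts a function that merges two types in one variable with variants that keep a single type per variable. All function names are hypothetical. Plain CPython runs every variant; only the translation toolchain is expected to reject the first one.

```python
def describe_bad(flag):
    # Not RPython: 'result' is a str on one path and an int on the other,
    # so the two control paths cannot be joined into a single type.
    if flag:
        result = "yes"
    else:
        result = 0
    return result


def describe_ok(flag):
    # RPython-friendly: 'result' is a str on every path.
    if flag:
        result = "yes"
    else:
        result = "0"
    return result


def find_name(names, wanted):
    # Mixing None with a string is allowed: None plays the role of a
    # null pointer here.
    for name in names:
        if name == wanted:
            return name
    return None
```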
**constants**
all module globals are considered constants. Their binding must not
be changed at run-time. Moreover, global (i.e. prebuilt) lists and
dictionaries are supposed to be immutable: modifying e.g. a global
list will give inconsistent results. However, global instances don't
have this restriction, so if you need mutable global state, store it
in the attributes of some prebuilt singleton instance.
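A minimal sketch of the singleton approach described above. The class ``GlobalState`` and the function ``register`` are hypothetical names chosen for illustration. The module-level binding ``state`` is never rebound; only the attributes of the instance change.

```python
class GlobalState(object):
    """Hypothetical holder for mutable global state."""

    def __init__(self):
        self.counter = 0
        self.last_name = None   # None mixed with str is allowed


# Prebuilt singleton: created once at import time, never rebound.
state = GlobalState()


def register(name):
    # Mutates attributes of the singleton instead of a global list or dict.
    state.counter += 1
    state.last_name = name
    return state.counter
```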
**control structures**
all allowed, ``for`` loops restricted to builtin types, generators
very restricted.
**range**
``range`` and ``xrange`` are identical. ``range`` does not necessarily create an array,
only if the result is modified. It is allowed everywhere and completely
implemented. The only visible difference to CPython is the inaccessibility
of the ``xrange`` fields start, stop and step.
**definitions**
run-time definition of classes or functions is not allowed.
**generators**
generators are supported, but their exact scope is very limited. You can't
merge two different generators at one control point.
**exceptions**
fully supported. See `Exception rules`_ below for restrictions on exceptions
raised by built-in operations.
Object restrictions
-------------------
The following object types are supported:
**integer, float, boolean**
work.
**strings**
many, but not all, string methods are supported, and those that are
supported do not necessarily accept all arguments. Indexes can be
negative; if the translator can prove that they are non-negative, you get
slightly more efficient code. When slicing a string it is necessary to prove
that the slice start and stop indexes are non-negative. There is no implicit
str-to-unicode cast anywhere. Simple string formatting using the ``%``
operator works, as long as the format string is known at translation time;
the only supported formatting specifiers are ``%s``, ``%d``, ``%x``, ``%o``,
``%f``, plus ``%r`` but only for user-defined instances. Modifiers such as
conversion flags, precision, length etc. are not supported. Moreover, it is
forbidden to mix unicode and strings when formatting.
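The two string rules above (provably non-negative slice indexes, and ``%`` formatting with a constant format string) might look like this in practice. This is a sketch with hypothetical function names. The ``assert`` is assumed here to be the way the non-negativity is made visible to the translator.

```python
def after(s, start):
    # The assert makes it evident, at the point of the slice, that
    # 'start' is non-negative.
    assert start >= 0
    return s[start:]


def describe(name, count):
    # The format string is a constant and uses only %s and %d,
    # without flags, precision or length modifiers.
    return "%s has %d items" % (name, count)
```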
**tuples**
no variable-length tuples; use them to store or return pairs or n-tuples of
values. Each combination of element types and length constitutes
a separate type that cannot be mixed with the others.
There is no general way to convert a list into a tuple, because the
length of the result would not be known statically. (You can of course
do ``t = (lst[0], lst[1], lst[2])`` if you know that ``lst`` has got 3
items.)
**lists**
lists are implemented as allocated arrays. Lists are over-allocated, so
``list.append()`` is reasonably fast. However, if you use a fixed-size list,
the code is more efficient. The annotator can figure out most of the time
that your list is fixed-size, even when you use a list comprehension.
Negative or out-of-bound indexes are only allowed for the
most common operations, as follows:
- *indexing*:
positive and negative indexes are allowed. Indexes are checked when requested
by an IndexError exception clause.
- *slicing*:
the slice start must be within bounds. The stop doesn't need to be, but it must
not be smaller than the start. All negative indexes are disallowed, except for
the [:-1] special case. No step. Slice deletion follows the same rules.
- *slice assignment*:
only supports ``lst[x:y] = sublist``, if ``len(sublist) == y - x``.
In other words, slice assignment cannot change the total length of the list,
but just replace items.
- *other operators*:
``+``, ``+=``, ``in``, ``*``, ``*=``, ``==``, ``!=`` work as expected.
- *methods*:
append, index, insert, extend, reverse, pop. The index used in pop() follows
the same rules as for *indexing* above. The index used in insert() must be within
bounds and not negative.
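The list rules above can be sketched as follows. The helper names are hypothetical, and the ``assert`` statements are assumed to be how non-negativity is communicated to the translator. Under plain CPython these functions behave like ordinary Python.

```python
def tail(lst, start):
    # Slice with a non-negative start and no step.
    assert start >= 0
    return lst[start:]


def replace_pair(lst, x, first, second):
    # Slice assignment that keeps the list length unchanged:
    # len([first, second]) == (x + 2) - x.
    assert x >= 0
    lst[x:x + 2] = [first, second]


def safe_get(lst, i, default):
    # The IndexError handler is what requests the bounds check.
    try:
        return lst[i]
    except IndexError:
        return default
```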
**dicts**
dicts with a unique key type only, provided it is hashable. Custom
hash functions and custom equality will not be honored.
Use ``rpython.rlib.objectmodel.r_dict`` for custom hash functions.
**sets**
sets are not directly supported in RPython. Instead you should use a
plain dict and fill the values with None. Values in that dict
will not consume space.
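A small sketch of the dict-as-set idiom described above. The function name ``unique_words`` is hypothetical.

```python
def unique_words(words):
    seen = {}                   # used as a set: the keys are the members
    result = []
    for word in words:
        if word not in seen:
            seen[word] = None   # the value is only a placeholder
            result.append(word)
    return result
```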
**list comprehensions**
May be used to create allocated, initialized arrays.
**functions**
+ function declarations may use defaults and ``*args``, but not
``**keywords``.
+ function calls may be done to a known function or to a variable one,
or to a method. You can call with positional and keyword arguments,
and you can pass a ``*args`` argument (it must be a tuple).
+ as explained above, tuples are not of a variable length. If you need
to call a function with a dynamic number of arguments, refactor the
function itself to accept a single argument which is a regular list.
+ dynamic dispatch enforces the use of signatures that are equal for all
possible called functions, or at least "compatible enough". This
concerns mainly method calls, when the method is overridden or in any
way given different definitions in different classes. It also concerns
the less common case of explicitly manipulated function objects.
Describing the exact compatibility rules is rather involved (but if you
break them, you should get explicit errors from the rtyper and not
obscure crashes.)
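Two of the points above, sketched with hypothetical names: a function that takes a single list instead of a dynamic number of arguments, and an overridden method that keeps the same signature as the method it overrides so that dynamic dispatch stays consistent.

```python
def total(values):
    # One list argument instead of a variable number of arguments.
    result = 0
    for value in values:
        result += value
    return result


class Shape(object):
    def area(self, scale):
        return 0.0


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self, scale):
        # Same signature and result type as Shape.area.
        return self.side * self.side * scale


def total_area(shapes, scale):
    result = 0.0
    for shape in shapes:
        result += shape.area(scale)   # dispatches on the class of 'shape'
    return result
```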
**builtin functions**
A number of builtin functions can be used. The precise set can be
found in :source:`rpython/annotator/builtin.py` (see ``def builtin_xxx()``).
Some builtin functions may be limited in what they support, though.
``int, float, str, ord, chr``... are available as simple conversion
functions. Note that ``int, float, str``... have a special meaning as
a type inside of isinstance only.
**classes**
+ methods and other class attributes do not change after startup
+ single inheritance is fully supported
+ use ``rpython.rlib.objectmodel.import_from_mixin(M)`` in a class
body to copy the whole content of a class ``M``. This can be used
to implement mixins: functions and staticmethods are duplicated
(the other class attributes are just copied unmodified).
+ classes are first-class objects too
**objects**
Normal rules apply. The only special methods that are honoured are
``__init__``, ``__del__``, ``__len__``, ``__getitem__``, ``__setitem__``,
``__getslice__``, ``__setslice__``, and ``__iter__``. To handle slicing,
``__getslice__`` and ``__setslice__`` must be used; using ``__getitem__`` and
``__setitem__`` for slicing isn't supported. Additionally, using negative
indices for slicing is still not supported, even when using ``__getslice__``.
Note that the destructor ``__del__`` should only contain `simple
operations`__; for any kind of more complex destructor, consider
using instead ``rpython.rlib.rgc.FinalizerQueue``.
.. __: garbage_collection.html
This layout makes the number of types to take care about quite limited.
Integer Types
-------------
While implementing the integer type, we stumbled over the problem that
integers are quite in flux in CPython right now. Starting with Python 2.4,
integers mutate into longs on overflow. In contrast, we need
a way to perform wrap-around machine-sized arithmetic by default, while still
being able to check for overflow when we need it explicitly. Moreover, we need
a consistent behavior before and after translation.
We use normal integers for signed arithmetic. It means that before
translation we get longs in case of overflow, and after translation we get a
silent wrap-around. Whenever we need more control, we use the following
helpers (which live in :source:`rpython/rlib/rarithmetic.py`):
**ovfcheck()**
This special function should only be used with a single arithmetic operation
as its argument, e.g. ``z = ovfcheck(x+y)``. Its intended meaning is to
perform the given operation in overflow-checking mode.
At run-time, in Python, the ovfcheck() function itself checks the result
and raises OverflowError if it is a ``long``. But the code generators use
ovfcheck() as a hint: they replace the whole ``ovfcheck(x+y)`` expression
with a single overflow-checking addition in C.
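To make the untranslated behaviour concrete, here is a pure-Python stand-in. It is not the real ``ovfcheck()`` from ``rpython/rlib/rarithmetic.py``: the name ``ovfcheck_sketch`` is hypothetical, and the word size is assumed to match ``sys.maxsize``.

```python
import sys

MAXINT = sys.maxsize   # assumed to be the largest signed machine word


def ovfcheck_sketch(result):
    # Raise OverflowError if the result does not fit in a signed
    # machine word; otherwise hand it back unchanged.
    if not (-MAXINT - 1 <= result <= MAXINT):
        raise OverflowError("signed integer expression did overflow")
    return result


def can_add(x, y):
    # A single arithmetic operation is passed as the argument.
    try:
        ovfcheck_sketch(x + y)
    except OverflowError:
        return False
    return True
```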
**intmask()**
This function is used for wrap-around arithmetic. It returns the lower bits
of its argument, masking away anything that doesn't fit in a C "signed long int".
Its purpose is, in Python, to convert from a Python ``long`` that resulted from a
previous operation back to a Python ``int``. The code generators ignore
intmask() entirely, as they are doing wrap-around signed arithmetic all the time
by default anyway. (We have no equivalent of the "int" versus "long int"
distinction of C at the moment and assume "long ints" everywhere.)
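The masking described above can be sketched in pure Python. The function ``intmask_sketch`` is a hypothetical stand-in, not the real ``intmask()``, and it assumes the word size can be derived from ``sys.maxsize``. The ``string_hash`` helper is likewise only an illustration of wrap-around arithmetic.

```python
import sys

LONG_BIT = sys.maxsize.bit_length() + 1   # assumed machine word size in bits


def intmask_sketch(n):
    n &= (1 << LONG_BIT) - 1              # keep only the low LONG_BIT bits
    if n >= 1 << (LONG_BIT - 1):          # reinterpret the top bit as a sign
        n -= 1 << LONG_BIT
    return n


def string_hash(s):
    # Wrap-around arithmetic: each step is masked back to a machine word.
    h = 0
    for c in s:
        h = intmask_sketch(h * 1000003 + ord(c))
    return h
```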
**r_uint**
In a few cases (e.g. hash table manipulation), we need machine-sized unsigned
arithmetic. For these cases there is the r_uint class, which is a pure
Python implementation of word-sized unsigned integers that silently wrap
around. ("word-sized" and "machine-sized" are used equivalently and mean
the native size, which you get using "unsigned long" in C.)
The purpose of this class (as opposed to helper functions as above)
is consistent typing: both Python and the annotator will propagate r_uint
instances in the program and interpret all the operations between them as
unsigned. Instances of r_uint are special-cased by the code generators to
use the appropriate low-level type and operations.
Mixing (signed) integers and r_uint in operations produces r_uint, that
is, unsigned results. To convert back from r_uint to signed integers, use
intmask().
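The typing behaviour described above can be illustrated with a small pure-Python class. ``UIntSketch`` and ``to_signed`` are hypothetical stand-ins for ``r_uint`` and ``intmask()``, supporting only a few operators, and the word size is again assumed to follow ``sys.maxsize``.

```python
import sys

WORD_BITS = sys.maxsize.bit_length() + 1   # assumed machine word size
MASK = (1 << WORD_BITS) - 1


def _raw(x):
    # Accept either a plain int or a UIntSketch operand.
    return x.value if isinstance(x, UIntSketch) else x


class UIntSketch(object):
    """Hypothetical word-sized unsigned integer that wraps silently."""

    def __init__(self, value):
        self.value = value & MASK

    def __add__(self, other):
        return UIntSketch(self.value + _raw(other))

    __radd__ = __add__   # int + UIntSketch also yields a UIntSketch

    def __sub__(self, other):
        return UIntSketch(self.value - _raw(other))

    def __mul__(self, other):
        return UIntSketch(self.value * _raw(other))

    __rmul__ = __mul__


def to_signed(u):
    # Converts back to a signed value, the role intmask() plays in the text.
    v = u.value
    if v >= 1 << (WORD_BITS - 1):
        v -= 1 << WORD_BITS
    return v
```

Because every operator returns a new ``UIntSketch``, the unsigned type propagates through mixed expressions such as ``3 + UIntSketch(4)``.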
Exception rules
---------------
Exceptions are by default not generated for simple cases::

    lst = [1, 2, 3, 4, 5]
    i = 2                # any index; illustrative value
    item = lst[i]        # this code is not checked for out-of-bound access

    try:
        item = lst[i]    # the handler below requests a bounds check
    except IndexError:
        pass             # complain here

Code with no exception handlers does not raise exceptions (after it has been
translated, that is. When you run it on top of CPython, it may raise
exceptions, of course). By supplying an exception handler, you ask for error
checking. Without one, you assure the system that the operation cannot fail.
This rule does not apply to *function calls*: any called function is
assumed to be allowed to raise any exception.
For example::

    x = 5.1
    x = x + 1.2          # not checked for float overflow

    try:
        x = x + 1.2
    except OverflowError:
        pass             # float result too big

But::

    z = some_function(x, y)    # can raise any exception

    try:
        z = some_other_function(x, y)
    except IndexError:
        # only catches explicitly-raised IndexErrors in some_other_function()
        # other exceptions can be raised, too, and will not be caught here.
        pass

The ovfcheck() function described above follows the same rule: in case of
overflow, it explicitly raises OverflowError, which can be caught anywhere.
Exceptions explicitly raised or re-raised will always be generated.
PyPy is debuggable on top of CPython
------------------------------------
PyPy has the advantage that it is runnable on standard
CPython. That means, we can run all of PyPy with all exception
handling enabled, so we might catch cases where we failed to
adhere to our implicit assertions.