1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146
|
---
title: "Primer on Python for R Users"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Primer on Python for R Users}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: 72
---
<!-- ```{r setup, include=FALSE} -->
<!-- library(reticulate) -->
<!-- # this vignette requires python 3.8 or newer -->
<!-- eval <- tryCatch({ -->
<!-- config <- py_config() -->
<!-- numeric_version(config$version) >= "3.8" && py_numpy_available() -->
<!-- }, error = function(e) FALSE) -->
<!-- knitr::opts_chunk$set( -->
<!-- collapse = TRUE, -->
<!-- comment = "#>", -->
<!-- eval = eval -->
<!-- ) -->
<!-- ``` -->
``` r
library(reticulate)
```
## Primer on Python for R users
You may find yourself wanting to read and understand some Python, or
even port some Python to R. This guide is designed to enable you to do
these tasks as quickly as possible. As you'll see, R and Python are
similar enough that this is possible without necessarily learning all of
Python. We start with the basics of container types and work up to the
mechanics of classes, dunders, the iterator protocol, the context
protocol, and more!
### Whitespace
Whitespace matters in Python. In R, expressions are grouped into a code
block with `{}`. In Python, that is done by making the expressions share
an indentation level. For example, an expression with an R code block
might be:
``` r
if (TRUE) {
cat("This is one expression. \n")
cat("This is another expression. \n")
}
#> This is one expression.
#> This is another expression.
```
The equivalent in Python:
``` python
if True:
print("This is one expression.")
print("This is another expression.")
#> This is one expression.
#> This is another expression.
```
Python accepts tabs or spaces as the indentation spacer, but the rules
get tricky when they're mixed. Most style guides suggest (and IDE's
default to) using spaces only.
### Container Types
In R, the `list()` is a container you can use to organize R objects. R's
`list()` is feature packed, and there is no single direct equivalent in
Python that supports all the same features. Instead there are (at least)
4 different Python container types you need to be aware of: lists,
dictionaries, tuples, and sets.
#### Lists
Python lists are typically created using bare brackets `[]`. The Python
built-in `list()` function is more of a coercion function, closer in
spirit to R's `as.list()`. The most important thing to know about Python
lists is that they are modified in place. Note in the example below that
`y` reflects the changes made to `x`, because the underlying list object
which both symbols point to is modified in place.
``` python
x = [1, 2, 3]
y = x # `y` and `x` now refer to the same list!
x.append(4)
print("x is", x)
#> x is [1, 2, 3, 4]
print("y is", y)
#> y is [1, 2, 3, 4]
```
One Python idiom that might be concerning to R users is that of growing
lists through the `append()` method. Growing lists in R is typically
slow and best avoided. But because Python's list are modified in place
(and a full copy of the list is avoided when appending items), it is
efficient to grow Python lists in place.
Some syntactic sugar around Python lists you might encounter is the
usage of `+` and `*` with lists. These are concatenation and replication
operators, akin to R's `c()` and `rep()`.
``` python
x = [1]
x
#> [1]
x + x
#> [1, 1]
x * 3
#> [1, 1, 1]
```
You can index into lists with integers using trailing `[]`, but note
that indexing is 0-based.
``` python
x = [1, 2, 3]
x[0]
#> 1
x[1]
#> 2
x[2]
#> 3
try:
x[3]
except Exception as e:
print(e)
#> list index out of range
```
When indexing, negative numbers count from the end of the container.
``` python
x = [1, 2, 3]
x[-1]
#> 3
x[-2]
#> 2
x[-3]
#> 1
```
You can slice ranges of lists using the `:` inside brackets. Note that
the slice syntax is ***not*** inclusive of the end of the slice range.
You can optionally also specify a stride.
``` python
x = [1, 2, 3, 4, 5, 6]
x[0:2] # get items at index positions 0, 1
#> [1, 2]
x[1:] # get items from index position 1 to the end
#> [2, 3, 4, 5, 6]
x[:-2] # get items from beginning up to the 2nd to last.
#> [1, 2, 3, 4]
x[:] # get all the items (idiom used to copy the list so as not to modify in place)
#> [1, 2, 3, 4, 5, 6]
x[::2] # get all the items, with a stride of 2
#> [1, 3, 5]
x[1::2] # get all the items from index 1 to the end, with a stride of 2
#> [2, 4, 6]
```
#### Tuples
Tuples behave like lists, except they are not mutable, and they don't
have the same modify-in-place methods like `append()`. They are
typically constructed using bare `()`, but parentheses are not strictly
required, and you may see an implicit tuple being defined just from a
comma separated series of expressions. Because parentheses can also be
used to specify order of operations in expressions like `(x + 3) * 4`, a
special syntax is required to define tuples of length 1: a trailing
comma. Tuples are most commonly encountered in functions that take a
variable number of arguments.
``` python
x = (1, 2) # tuple of length 2
type(x)
#> <class 'tuple'>
len(x)
#> 2
x
#> (1, 2)
x = (1,) # tuple of length 1
type(x)
#> <class 'tuple'>
len(x)
#> 1
x
#> (1,)
x = () # tuple of length 0
print(f"{type(x) = }; {len(x) = }; {x = }")
#> type(x) = <class 'tuple'>; len(x) = 0; x = ()
# example of an interpolated string literals
x = 1, 2 # also a tuple
type(x)
#> <class 'tuple'>
len(x)
#> 2
x = 1, # beware a single trailing comma! This is a tuple!
type(x)
#> <class 'tuple'>
len(x)
#> 1
```
##### Packing and Unpacking
Tuples are the container that powers the *packing* and *unpacking*
semantics in Python. Python provides the convenience of allowing you to
assign multiple symbols in one expression. This is called *unpacking*.
For example:
``` python
x = (1, 2, 3)
a, b, c = x
a
#> 1
b
#> 2
c
#> 3
```
(You can access similar unpacking behavior from R using
`` zeallot::`%<-%` ``).
Tuple unpacking can occur in a variety of contexts, such as iteration:
``` python
xx = (("a", 1),
("b", 2))
for x1, x2 in xx:
print("x1 = ", x1)
print("x2 = ", x2)
#> x1 = a
#> x2 = 1
#> x1 = b
#> x2 = 2
```
If you attempt to unpack a container to the wrong number of symbols,
Python raises an error:
``` python
x = (1, 2, 3)
a, b, c = x # success
a, b = x # error, x has too many values to unpack
#> ValueError: too many values to unpack (expected 2)
a, b, c, d = x # error, x has not enough values to unpack
#> ValueError: not enough values to unpack (expected 4, got 3)
```
It is possible to unpack a variable number of arguments, using `*` as a
prefix to a symbol. (You'll see the `*` prefix again when we talk about
functions)
``` python
x = (1, 2, 3)
a, *the_rest = x
a
#> 1
the_rest
#> [2, 3]
```
You can also unpack nested structures:
``` python
x = ((1, 2), (3, 4))
(a, b), (c, d) = x
```
#### Dictionaries
Dictionaries are most similar to R environments. They are a container
where you can retrieve items by name, though in Python the name (called
a *key* in Python's parlance) does not need to be a string like in R. It
can be any Python object with a `hash()` method (meaning, it can be
almost any Python object). They can be created using syntax like
`{key: value}`. Like Python lists, they are modified in place. Note that
`r_to_py()` converts R named lists to dictionaries.
``` python
d = {"key1": 1,
"key2": 2}
d2 = d
d
#> {'key1': 1, 'key2': 2}
d["key1"]
#> 1
d["key3"] = 3
d2 # modified in place!
#> {'key1': 1, 'key2': 2, 'key3': 3}
```
Like R environments (and unlike R's named lists), you cannot index into
a dictionary with an integer to get an item at a specific index
position. Dictionaries are *unordered* containers. (However---beginning
with Python 3.7, dictionaries do preserve the item insertion order).
``` python
d = {"key1": 1, "key2": 2}
d[1] # error
#> KeyError: 1
```
A container that closest matches the semantics of R's named list is the
[`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict),
but that's relatively uncommon in Python code so we don't cover it
further.
#### Sets
Sets are a container that can be used to efficiently track unique items
or deduplicate lists. They are constructed using `{val1, val2}` (like a
dictionary, but without `:`). Think of them as dictionary where you only
use the keys. Sets have many efficient methods for membership
operations, like `intersection()`, `issubset()`, `union()` and so on.
``` python
s = {1, 2, 3}
type(s)
#> <class 'set'>
s
#> {1, 2, 3}
s.add(1)
s
#> {1, 2, 3}
```
### Iteration with `for`
The `for` statement in Python can be used to iterate over any kind of
container.
``` python
for x in [1, 2, 3]:
print(x)
#> 1
#> 2
#> 3
```
R has a relatively limited set of objects that can be passed to `for`.
Python by comparison, provides an iterator protocol interface, which
means that authors can define custom objects, with custom behavior that
is invoked by `for`. (We'll have an example for how to define a custom
iterable when we get to classes). You may want to use a Python iterable
from R using reticulate, so it's helpful to peel back the syntactic
sugar a little to show what the `for` statement is doing in Python, and
how you can step through it manually.
There are two things that happen: first, an iterator is constructed from
the supplied object. Then, the new iterator object is repeatedly called
with `next()` until it is exhausted.
``` python
l = [1, 2, 3]
it = iter(l) # create an iterator object
it
#> <list_iterator object at 0x1402267a0>
# call `next` on the iterator until it is exhausted:
next(it)
#> 1
next(it)
#> 2
next(it)
#> 3
next(it)
#> StopIteration
```
In R, you can use reticulate to step through an iterator the same way.
``` r
library(reticulate)
l <- r_to_py(list(1, 2, 3))
it <- as_iterator(l)
iter_next(it)
#> 1.0
iter_next(it)
#> 2.0
iter_next(it)
#> 3.0
iter_next(it, completed = "StopIteration")
#> [1] "StopIteration"
```
Iterating over dictionaries first requires understanding if you are
iterating over the keys, values, or both. Dictionaries have methods that
allow you to specify which.
``` python
d = {"key1": 1, "key2": 2}
for key in d:
print(key)
#> key1
#> key2
for value in d.values():
print(value)
#> 1
#> 2
for key, value in d.items():
print(key, ":", value)
#> key1 : 1
#> key2 : 2
```
#### Comprehensions
Comprehensions are special syntax that allow you to construct a
container like a list or a dict, while also executing a small operation
or single expression on each element. You can think of it as special
syntax for R's `lapply`.
For example:
``` python
x = [1, 2, 3]
# a list comprehension built from x, where you add 100 to each element
l = [element + 100 for element in x]
l
#> [101, 102, 103]
# a dict comprehension built from x, where the key is a string.
# Python's str() is like R's as.character()
d = {str(element) : element + 100
for element in x}
d
#> {'1': 101, '2': 102, '3': 103}
```
### Defining Functions with `def`
Python functions are defined with the `def` statement. The syntax for
specifying function arguments and default values is very similar to R.
``` python
def my_function(name = "World"):
print("Hello", name)
my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```
The equivalent R snippet would be
``` r
my_function <- function(name = "World") {
cat("Hello", name, "\n")
}
my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```
Unlike R functions, the last value in a function is not automatically
returned. Python requires an explicit return statement.
``` python
def fn():
1
print(fn())
#> None
def fn():
return 1
print(fn())
#> 1
```
(Note for advanced R users: Python has no equivalent of R's argument
"promises". Function argument default values are evaluated once, when
the function is constructed. This can be surprising if you define a
Python function with a mutable object as a default argument value, like
a Python list!)
``` python
def my_func(x = []):
x.append("was called")
print(x)
my_func()
#> ['was called']
my_func()
#> ['was called', 'was called']
my_func()
#> ['was called', 'was called', 'was called']
```
You can also define Python functions that take a variable number of
arguments, similar to `...` in R. A notable difference is that R's `...`
makes no distinction between named and unnamed arguments, but Python
does. In Python, prefixing a single `*` captures unnamed arguments, and
two `**` signifies that *keyword* arguments are captured.
``` python
def my_func(*args, **kwargs):
print("args = ", args) # args is a tuple
print("kwargs = ", kwargs) # kwargs is a dictionary
my_func(1, 2, 3, a = 4, b = 5, c = 6)
#> args = (1, 2, 3)
#> kwargs = {'a': 4, 'b': 5, 'c': 6}
```
Whereas the `*` and `**` in a function definition signature *pack*
arguments, in a function call they *unpack* arguments. Unpacking
arguments in a function call is equivalent to using `do.call()` in R.
``` python
def my_func(a, b, c):
print(a, b, c)
args = (1, 2, 3)
my_func(*args)
#> 1 2 3
kwargs = {"a": 1, "b": 2, "c": 3}
my_func(**kwargs)
#> 1 2 3
```
### Defining Classes with `class`
One could argue that in R, the preeminent unit of composition for code
is the `function`, and in Python, it's the `class`. You can be a very
productive R user and never use R6, reference classes, or similar R
equivalents to the object-oriented style of Python `class`'s.
In Python, however, understanding the basics of how `class` objects work
is requisite knowledge, because `class`'s are how you organize and find
methods in Python. (In contrast to R's approach, where methods are found
by dispatching from a generic). Fortunately, the basics of `class`'s are
accessible.
Don't be intimidated if this is your first exposure to object oriented
programming. We'll start by building up a simple Python class for
demonstration purposes.
``` python
class MyClass:
pass # `pass` means do nothing.
MyClass
#> <class '__main__.MyClass'>
type(MyClass)
#> <class 'type'>
instance = MyClass()
instance
#> <__main__.MyClass object at 0x14023b260>
type(instance)
#> <class '__main__.MyClass'>
```
Like the `def` statement, the `class` statement binds a new callable
symbol, `MyClass`. First note the strong naming convention, classes are
typically `CamelCase`, and functions are typically `snake_case`. After
defining `MyClass`, you can interact with it, and see that it has type
`'type'`. Calling `MyClass()` creates a new object **instance** of the
class, which has type `'MyClass'` (ignore the `__main__.` prefix for
now). The instance prints with its memory address, which is a strong
hint that it's common to be managing many instances of a class, and that
the instance is mutable (modified-in-place by default).
In the first example, we defined an empty `class`, but when we inspect
it we see that it already comes with a bunch of attributes (`dir()` in
Python is equivalent to `names()` in R):
``` python
dir(MyClass)
#> ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
```
#### What are all the underscores?
Python typically indicates that something is special by wrapping the
name in double underscores. A special double-underscore-wrapped token is
commonly called a "dunder". "Special" is not a technical term, it just
means that the token invokes a Python language feature. Some dunder
tokens are merely ways code authors can plug into specific syntactic
sugars, others are values provided by the interpreter that would be
otherwise hard to acquire, yet others are for extending language
interfaces (e.g., the iteration protocol), and finally, a small handful
of dunders are truly complicated to understand. Fortunately, as an R
user looking to use some Python features through reticulate, you only
need to know about a few easy-to-understand dunders.
The most common dunder method you'll encounter when reading Python code
is `__init__()`. This is a function that is called when the class
constructor is called, that is, when a class is **instantiated**. It is
meant to initialize the new class instance. (In very sophisticated code
bases, you may also encounter classes where `__new__` is also defined,
this is called before `__init__`).
``` python
class MyClass:
print("MyClass's definition body is being evaluated")
def __init__(self):
print(self, "is initializing")
#> MyClass's definition body is being evaluated
print("MyClass is finished being created")
#> MyClass is finished being created
instance = MyClass()
#> <__main__.MyClass object at 0x140266330> is initializing
print(instance)
#> <__main__.MyClass object at 0x140266330>
instance2 = MyClass()
#> <__main__.MyClass object at 0x11e3ad490> is initializing
print(instance2)
#> <__main__.MyClass object at 0x11e3ad490>
```
A few things to note:
- the `class` statement takes a code block that is defined by a common
indentation level. The code block has the same exact semantics as
any other expression that takes a code block, like `if` and `def`.
The body of the class is evaluated only **once**, when the class
constructor is first being created. Beware that any objects defined
here are shared by all instances of the class!
- `__init__` is just a normal function, defined with `def` like any
other function. Except it's inside the class body.
- `__init__` take an argument: `self`. `self` is the class instance
being initialized (note the identical memory address between `self`
and `instance`). Also note that we didn't provide `self` when call
`MyClass()` to create the class instance, `self` was spliced into
the function call by the interpreter.
- `__init__` is called each time a new instance is created.
Functions defined inside a `class` code block are called *methods*, and
the important thing to know about methods is that each time they are
called from a class instance, the instance is spliced into the function
call as the first argument. This applies to all functions defined in a
class, including dunders. (The sole exception is if the function is
decorated with something like `@classmethod` or `@staticmethod`).
``` python
class MyClass:
def a_method(self):
print("MyClass.a_method() was called with", self)
instance = MyClass()
instance.a_method()
#> MyClass.a_method() was called with <__main__.MyClass object at 0x11e3c7f20>
MyClass.a_method() # error, missing required argument `self`
#> TypeError: MyClass.a_method() missing 1 required positional argument: 'self'
MyClass.a_method(instance) # identical to instance.a_method()
#> MyClass.a_method() was called with <__main__.MyClass object at 0x11e3c7f20>
```
Other dunder's worth knowing about are:
- `__getitem__`: the function invoked when subsetting an instance with
`[` (Equivalent to defining a `[` S3 method in R.
- `__getattr__`: the function invoked when subsetting with `.`
(Equivalent to defining a `$` S3 method in R.
- `__iter__` and `__next__`: functions invoked by `for`.
- `__call__`: invoked when a class instance is called like a function
(e.g., `instance()`).
- `__bool__`: invoked by `if` and `while` (equivalent to
`as.logical()` in R, but returning only a scalar, not a vector).
- `__repr__`, `__str__`, functions invoked for formatting and pretty
printing (akin to `format()`, `dput()`, and `print()` methods in R).
- `__enter__` and `__exit__`: functions invoked by `with`.
- Many [built-in](https://docs.python.org/3/library/functions.html)
Python functions are just sugar for invoking the dunder. For
example: calling `repr(x)` is identical to `x.__repr__()`. Other
builtins that are just sugar for invoking the dunder are `next()`,
`iter()`, `str()`, `list()`, `dict()`, `bool()`, `dir()`, `hash()`
and more!
#### Iterators, revisited
Now that we have the basics of `class`, it's time to revisit iterators.
First, some terminology:
**iterable**: something that can be iterated over. Concretely, a class
that defines an `__iter__` method, whose job is to return an *iterator*.
**iterator**: something that iterates. Concretely, a class that defines
a `__next__` method, whose job is to return the next element each time
it is called, and then raises a `StopIteration` exception once it's
exhausted.
It's common to see classes that are both iterables and iterators, where
the `__iter__` method is just a stub that returns `self`.
Here is a custom iterable / iterator implementation of Python's `range`
(similar to `seq` in R)
``` python
class MyRange:
def __init__(self, start, end):
self.start = start
self.end = end
def __iter__(self):
# reset our counter.
self._index = self.start - 1
return self
def __next__(self):
if self._index < self.end:
self._index += 1 # increment
return self._index
else:
raise StopIteration
for x in MyRange(1, 3):
print(x)
#> 1
#> 2
#> 3
# doing what `for` does, but manually
r = MyRange(1, 3)
it = iter(r)
next(it)
#> 1
next(it)
#> 2
next(it)
#> 3
next(it)
#> StopIteration
```
### Defining Generators with `yield`.
Generators are special Python functions that contain one or more `yield`
statements. As soon as `yield` is included in a code block passed to
`def`, the semantics change substantially. You're no longer defining a
mere function, but a generator constructor! In turn, calling a generator
constructor creates a generator object, which is just another type of
iterator.
Here is an example:
``` python
def my_generator_constructor():
yield 1
yield 2
yield 3
# At first glance it presents like a regular function
my_generator_constructor
#> <function my_generator_constructor at 0x1402579c0>
type(my_generator_constructor)
#> <class 'function'>
# But calling it returns something special, a 'generator object'
my_generator = my_generator_constructor()
my_generator
#> <generator object my_generator_constructor at 0x11e3ff530>
type(my_generator)
#> <class 'generator'>
# The generator object is both an iterable and an iterator
# it's __iter__ method is just a stub that returns `self`
iter(my_generator) == my_generator == my_generator.__iter__()
#> True
# step through it like any other iterator
next(my_generator)
#> 1
my_generator.__next__() # next() is just sugar for calling the dunder
#> 2
next(my_generator)
#> 3
next(my_generator)
#> StopIteration
```
Encountering `yield` is like hitting the pause button on a functions
execution, it preserves the state of everything in the function body and
returns control to whatever is iterating over the generator object.
Calling `next()` on the generator object resumes execution of the
function body until the next `yield` is encountered, or the function
finishes.
### Iteration closing remarks
Iteration is deeply baked into the Python language, and R users may be
surprised by how things in Python are iterable, iterators, or powered by
the iterator protocol under the hood. For example, the built-in `map()`
(equivalent to R's `lapply()`) yields an iterator, not a list.
Similarly, a tuple comprehension like `(elem for elem in x)` produces an
iterator. Most features dealing with files are iterators, and so on.
Any time you find an iterator inconvenient, you can materialize all the
elements into a list using the Python built-in `list()`, or
`reticulate::iterate()` in R. Also, if you like the readability of
`for`, you can utilize similar semantics to Python's `for` using
`coro::loop()`.
### `import` and Modules
In R, authors can bundle their code into shareable extensions called R
packages, and R users can access objects from R packages via `library()`
or `::`. In Python, authors bundle code into *modules*, and users access
modules using `import`. Consider the line:
``` python
import numpy
```
This statement has Python go out to the file system, find an installed
Python module named 'numpy', load it (commonly meaning: evaluate its
`__init__.py` file and construct a `module` type), and bind it to the
symbol `numpy`.
The closest equivalent to this in R might be:
``` r
dplyr <- loadNamespace("dplyr")
```
#### Where are modules found?
In Python, the file system locations where modules are searched can be
accessed (and modified) from the list found at `sys.path`. This is
Python's equivalent to R's `.libPaths()`. `sys.path` will typically
contain paths to the current working directory, the Python installation
which contains the built-in standard library, administrator installed
modules, user installed modules, values from environment variables like
`PYTHONPATH`, and any modifications made directly to `sys.path` by other
code in the current Python session (though this is relatively uncommon
in practice).
``` python
import sys
sys.path
#> ['', '/Users/tomasz/.pyenv/versions/3.12.4/bin', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python312.zip', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python3.12', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python3.12/lib-dynload', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/site-packages', '/Users/tomasz/github/rstudio/reticulate/inst/python', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python312.zip', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/lib-dynload']
```
You can inspect where a module was loaded from by accessing the dunder
`__path__` or `__file__` (especially useful when troubleshooting
installation issues):
``` python
import os
os.__file__
#> '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/os.py'
numpy.__path__
#> ['/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/site-packages/numpy']
```
Once a module is loaded, you can access symbols from the module using
`.` (equivalent to `::`, or maybe `$.environment`, in R).
``` python
numpy.abs(-1)
#> 1
```
There is also special syntax for specifying the symbol a module is bound
to upon import, and for importing only some specific symbols.
``` python
import numpy # import
import numpy as np # import and bind to a custom symbol `np`
np is numpy # test for identicalness, similar to identical(np, numpy)
#> True
from numpy import abs # import only `numpy.abs`, bind it to `abs`
abs is numpy.abs
#> True
from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2`
abs2 is numpy.abs
#> True
```
If you're looking for the Python equivalent of R's `library()`, which
makes all of a package's exported symbols available, it might be using
`import` with a `*` wildcard, though it's relatively uncommon to do so.
The `*` wildcard will expand to include all the symbols in module, or
all the symbols listed in `__all__`, if it is defined.
``` python
from numpy import *
```
Python doesn't make a distinction like R does between package exported
and internal symbols. In Python, all module symbols are equal, though
there is the naming convention that intended-to-be-internal symbols are
prefixed with a single leading underscore. (Two leading underscores
invoke an advanced language feature called "name mangling", which is
outside the scope of this introduction).
### Integers and Floats
R users generally don't need to be aware of the difference between
integers and floating point numbers, but that's not the case in Python.
If this is your first exposure to numeric data types, here are the
essentials:
- integer types can only represent whole numbers like `1` or `2`, not
floating point numbers like `1.2`.
- floating-point types can represent any number, but with some degree
of imprecision.
In R, writing a bare literal number like `12` produces a floating point
type, whereas in Python, it produces an integer. You can produce an
integer literal in R by appending an `L`, as in `12L`. Many Python
functions expect integers, and will error when provided a float.
For example, say we have a Python function that expects an integer:
``` python
def a_strict_Python_function(x):
assert isinstance(x, int), "x is not an int"
print("Yay! x was an int")
```
When calling it from R, you must be sure to call it with an integer:
``` r
library(reticulate)
py$a_strict_Python_function(3) # error
#> x is not an int
py$a_strict_Python_function(3L) # success
#> Yay! x was an int
py$a_strict_Python_function(as.integer(3)) # success
#> Yay! x was an int
```
### What about R vectors?
R is a language designed for numerical computing first. Numeric vector
data types are baked deep into the R language, to the point that the
language doesn't even distinguish scalars from vectors. By comparison,
numerical computing capabilities in Python are generally provided by
third party packages (*modules*, in Python parlance).
In Python, the `numpy` module is most commonly used to handle contiguous
arrays of data. The closest equivalent to an R numeric vector is a numpy
array, or sometimes, a list of scalar numbers (some Pythonistas might
argue for `array.array()` here, but that's so rarely encountered in
actual Python code we don't mention it further).
Teaching the NumPy interface is beyond the scope of this primer, but
it's worth pointing out some potential tripping hazards for users
accustomed to R arrays:
- When indexing into multidimensional numpy arrays, trailing
dimensions can be omitted and are implicitly treated as missing. The
consequence is that iterating over arrays means iterating over the
first dimension. For example, this iterates over the rows of a
matrix.
``` python
import numpy as np
m = np.arange(12).reshape((3,4))
m
#> array([[ 0, 1, 2, 3],
#> [ 4, 5, 6, 7],
#> [ 8, 9, 10, 11]])
m[0, :] # first row
#> array([0, 1, 2, 3])
m[0] # also first row
#> array([0, 1, 2, 3])
for row in m:
print(row)
#> [0 1 2 3]
#> [4 5 6 7]
#> [ 8 9 10 11]
```
- Many numpy operations modify the array in place! This is surprising
to R users, who are used to the convenience and safety of R's
copy-on-modify semantics. Unfortunately, there is no simple scheme
or naming convention you can rely on to quickly determine if a
particular method modifies in-place or creates a new array copy. The
only reliable way is to consult the
[documentation](https://numpy.org/doc/stable/reference/index.html#reference),
and conduct small experiments at the `reticulate::repl_python()`.
### Decorators
Decorators are just functions that take a function as an argument, and
then typically returns another function. Any function can be invoked as
a decorator with the `@` syntax, which is just sugar for this simple
action:
``` python
def my_decorator(func):
func.x = "a decorator modified this function by adding an attribute `x`"
return func
def my_function(): pass
my_function = my_decorator(my_function)
# @ is just fancy syntax for the above two lines
@my_decorator
def my_function(): pass
```
One decorator you might encounter frequently is:
- `@property`, which automatically calls a class method when the
attribute is accessed (similar to `makeActiveBinding()` in R).
### `with` and context management
Any object that defines `__enter__` and `__exit__` methods implements
the "context" protocol, and can be passed to `with`. For example, here
is a custom implementation of a context manager that temporarily changes
the current working directory (equivalent to R's `withr::with_dir()`)
``` python
from os import getcwd, chdir
class wd_context:
def __init__(self, wd):
self.new_wd = wd
def __enter__(self):
self.original_wd = getcwd()
chdir(self.new_wd)
def __exit__(self, *args):
# __exit__ takes some additional argument that are commonly ignored
chdir(self.original_wd)
getcwd()
#> '/Users/tomasz/github/rstudio/reticulate/vignettes'
with wd_context(".."):
print("in the context, wd is:", getcwd())
#> in the context, wd is: /Users/tomasz/github/rstudio/reticulate
getcwd()
#> '/Users/tomasz/github/rstudio/reticulate/vignettes'
```
### Learning More
Hopefully, this short primer to Python has provided a good foundation
for confidently reading Python documentation and code, and using Python
modules from R via reticulate. Of course, there is much, much more to
learn about Python. Googling questions about Python reliably brings up
pages of results, but not always sorted in order of most useful. Blog
posts and tutorials targeting beginners can be valuable, but remember
that Python's official documentation is generally excellent, and it
should be your first destination when you have questions.
<https://docs.Python.org/3/>
<https://docs.Python.org/3/library/index.html>
To learn Python more fully, the built-in official tutorial is also
excellent and comprehensive (but does require a time commitment to get
value out of it) <https://docs.Python.org/3/tutorial/index.html>
Finally, don't forget to solidify your understanding by conducting small
experiments at the `reticulate::repl_python()`.
Thank you for reading!
|