---
title: "Managing an R Package's Python Dependencies"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Managing an R Package's Python Dependencies}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
::: {.alert .alert-warning}
> ## **⚠ Deprecated Vignette**
> This vignette is retained for reference but outlines an outdated method for managing Python dependencies in R packages.
>
> It is now recommended to use `py_require()` to specify Python dependencies.
> For details, see `?py_require()` and the [Using reticulate in an R Package](package.html) vignette.
:::
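As a minimal sketch of the recommended replacement, a package could declare
its Python requirements when it is loaded. The dependency `scipy` is only an
illustrative choice here; `?py_require` remains the authoritative
description of the interface.

``` r
# Sketch: declare Python requirements when the R package is loaded.
# "scipy" is an illustrative dependency, chosen only for this example.
.onLoad <- function(libname, pkgname) {
  reticulate::py_require("scipy")
}
```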
If you're writing an R package that uses `reticulate` as an interface to
a Python session, you likely also need one or more Python packages
installed on the user's machine for your package to work properly. In
addition, you'd likely prefer to spare users as much as possible from
details around how Python + `reticulate` are configured. This vignette
documents a few approaches for accomplishing these goals.
## Creating a "Pit of Success"
Overall, the goal of an R package author using reticulate is to create a
default experience that works reliably and doesn't require users to
intervene or to have a sophisticated understanding of Python
installation management. At the same time, it should also be easy to
adjust the default behavior. There are two key questions to keep in
mind:
- What will the default behavior be when the user expresses no
preference for any specific Python installation?
- How can users express a preference for a specific Python
installation that is satisfiable (and why would they want to)?
Packages like [tensorflow](https://tensorflow.rstudio.com) approach this
task by providing a helper function, `tensorflow::install_tensorflow()`,
and documenting that users can call this function to prepare the
environment. For example:
``` r
library(tensorflow)
install_tensorflow()
# use tensorflow
```
As a best practice, an R package's Python dependencies should default to
installing in an isolated virtual environment specifically designated
for the R package. This minimizes the risk of inadvertently disrupting
another Python installation on the user's system.
As an example, `install_tensorflow()` takes an argument `envname` with a
default value of `"r-tensorflow"`. This default value ensures that
`install_tensorflow()` will install into an environment named
`"r-tensorflow"`, optionally creating it as needed.
The counterpart to the default behavior of `install_tensorflow()` is the
work that happens in `tensorflow::.onLoad()`, where the R package
expresses a preference, on behalf of the user, to use the `r-tensorflow`
environment if it exists. Inside the package, these two parts work
together to create a "pit of success":
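``` r
# Exported helper: installs into the package's own environment by default
install_tensorflow <- function(..., envname = "r-tensorflow") {
  reticulate::py_install("tensorflow", envname = envname, ...)
}

# Soft preference: use "r-tensorflow" if it exists, without requiring it
.onLoad <- function(...) {
  reticulate::use_virtualenv("r-tensorflow", required = FALSE)
}
```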
The R package:
- in `.onLoad()` expresses to reticulate a soft preference for an
environment named "r-tensorflow", and
- with `install_tensorflow()`, provides a convenient way to make the
optional hint in `.onLoad()` actionable, by actually creating the
"r-tensorflow" environment.
With this setup, the default experience is for the user to call
`install_tensorflow()` once (creating an "r-tensorflow" environment).
Subsequent calls to `library(tensorflow)` will cause reticulate to
use the `r-tensorflow` environment, and everything will "just work".
The risk of disrupting another Python environment, or of this one being
disrupted, is minimal, since the environment is designated for the R
package. At the same time, if the environment is disrupted at some
later time (perhaps because something with conflicting Python
dependencies was manually installed), the user can easily revert to a
working state by calling `install_tensorflow()`.
Python environments can occasionally get into a broken state when
conflicting package versions are installed, and the most reliable way to
get back to a working state is to delete the environment and start over
with a fresh one. For this reason, `install_tensorflow()` removes any
pre-existing "r-tensorflow" Python environment first. Deleting a Python
environment, however, is not something to be done lightly, so by default
only the default "r-tensorflow" environment is deleted. Here is an
example of the helper `install_tensorflow()` with this "reset" behavior:
``` r
#' @importFrom reticulate py_install virtualenv_exists virtualenv_remove
install_tensorflow <-
  function(...,
           envname = "r-tensorflow",
           new_env = identical(envname, "r-tensorflow")) {
    if (new_env && virtualenv_exists(envname))
      virtualenv_remove(envname)
    py_install(packages = "tensorflow", envname = envname, ...)
  }
```
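The "reset" default depends on R evaluating default arguments lazily:
`new_env` is computed from whatever `envname` the caller supplied. The sketch
below isolates that logic in a hypothetical helper, `should_reset()`, which is
not part of any package and touches no Python environment, so the behavior can
be checked safely.

``` r
# Hypothetical helper mirroring only the default-argument logic of the
# install_tensorflow() example; it never creates or removes anything.
should_reset <- function(envname = "r-tensorflow",
                         new_env = identical(envname, "r-tensorflow")) {
  new_env
}

should_reset()                           # TRUE: the default env is reset
should_reset(envname = "./venv")         # FALSE: a custom env is left alone
should_reset("./venv", new_env = TRUE)   # TRUE: the user opted in explicitly
```

A user-supplied environment is therefore never deleted unless the caller
passes `new_env = TRUE`.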
## Managing Multiple Package Dependencies
One drawback of the isolated-package-environments approach is that if
multiple R packages using reticulate are in use, then those packages
won't all be able to use their preferred Python environment in the same
R session (since there can only be one active Python environment at a
time within an R session). To resolve this, users will have to take a
slightly more active role in managing their Python environments.
However, this can be as simple as supplying a unique environment name.
The most straightforward approach is for users to create a dedicated
Python environment for a specific project. For example, a user can
create a virtual environment in the project directory, like this:
``` r
envname <- "./venv"
tensorflow::install_tensorflow(envname = envname)
pysparklyr::install_pyspark(envname = envname)
```
As described in the [Order of Python Discovery](versions.html) guide,
reticulate will automatically discover and use a Python virtual
environment in the current working directory like this. Alternatively,
if the environment exists outside the project directory, the user could
then place an `.Renviron` or `.Rprofile` file in the project directory,
ensuring that reticulate will always use the Python environment
configured for that project. For example, an `.Renviron` file in the
project directory could contain:
``` sh
RETICULATE_PYTHON_ENV=~/my/project/venv
```
Or an `.Rprofile` file in the project directory could contain:
``` r
Sys.setenv("RETICULATE_PYTHON_ENV" = "~/my/project/venv")
```
This approach minimizes the risk that an existing, already working,
Python environment will accidentally be broken by installing packages,
due to inadvertently upgrading or downgrading other Python packages
already installed in the environment.
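A project-level `.Rprofile` could be made slightly more defensive, as in
the sketch below. It assumes that `RETICULATE_PYTHON_ENV` holds a path (as
in the examples above); the path itself is illustrative. A value the user
has already set is left untouched, and a missing directory is reported
early instead of surfacing later when Python is initialized.

``` r
# Illustrative project environment path; adjust for a real project.
project_env <- "~/my/project/venv"

# Respect a value that is already set (e.g., in the shell or ~/.Renviron).
if (!nzchar(Sys.getenv("RETICULATE_PYTHON_ENV"))) {
  Sys.setenv(RETICULATE_PYTHON_ENV = project_env)
}

# Report a missing directory now rather than at Python initialization.
active_env <- Sys.getenv("RETICULATE_PYTHON_ENV")
if (!dir.exists(path.expand(active_env))) {
  message("Python environment not found: ", active_env)
}
```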
Another approach is for users to install your R package's Python
dependencies into another Python environment that is already on the
search path. For example, users can *opt-in* to installing into the
default `r-reticulate` venv:
``` r
tensorflow::install_tensorflow(envname = "r-reticulate")
```
Or they can install one package's dependencies into another package's
default environment. For example, installing PySpark into the default
`"r-tensorflow"` environment:
``` r
tensorflow::install_tensorflow() # creates an "r-tensorflow" env
pysparklyr::install_pyspark(envname = "r-tensorflow")
```
This approach---exporting an installation helper function that defaults
to a particular environment, and a hint in `.onLoad()` to use that
environment---is one way to create a "pit of success". It encourages a
default workflow that is robust and reliable, especially for users
not yet familiar with the mechanics of Python installation management.
At the same time, an installation helper function empowers users to
manage Python environments through simply providing an environment name.
It makes it easy to combine dependencies of multiple R packages, and,
should anything go wrong due to conflicting Python dependencies, it also
provides a straightforward way to revert to a working state at any time,
by calling the helper function without arguments.
## Automatic Configuration
An alternative approach to the one described above is to do automatic
configuration. It's possible for client packages to declare their Python
dependencies in such a way that they are automatically installed in the
currently activated Python environment. This is a maximally convenient
approach; when it works it can feel a little bit magical, but it is also
potentially dangerous and can result in frustration if something goes
wrong. You can opt in to this behavior as a package author through the
`Config/reticulate` field of your package's `DESCRIPTION` file.
With automatic configuration, `reticulate` envisions a world wherein
different R packages wrapping Python packages can live together in the
same Python environment / R session. This approach only works when the
Python packages being wrapped don't have conflicting dependencies.
As the package author, you must judge whether automatically
bootstrapping an installation of the Python packages your R package
requires into the user's active Python environment, whatever it may
contain, is a safe action to perform by default. For example, this is
most likely a safe action for a Python package like `requests`, but
perhaps not a safe choice for a frequently updated package with many
dependencies, like `torch` or `tensorflow` (e.g., it's not uncommon for
`torch` and `tensorflow` to have conflicting version requirements for
dependencies like `numpy` or `cuda`). Keep in mind that, unlike CRAN,
PyPI does not perform any compatibility or consistency checks across the
package repository.
### Using `Config/reticulate`
As a package author, you can opt in to automatic configuration through
the `Config/reticulate` field of the `DESCRIPTION` file. For example, if
we had a package `rscipy` that acted as an
interface to the [SciPy](https://scipy.org) Python package, we might use
the following `DESCRIPTION` file:
```
Package: rscipy
Title: An R Interface to scipy
Version: 1.0.0
Description: Provides an R interface to the Python package scipy.
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy")
    )
  )
< ... other fields ... >
```
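To check that such a field is well formed, one option is to read it back
with base R. The sketch below writes a copy of the example `DESCRIPTION`
to a temporary file, extracts the field with `read.dcf()`, and evaluates it
as R code. It illustrates the field's structure only and is not presented
as how `reticulate` itself processes the field. Evaluating code from a
`DESCRIPTION` file is only appropriate for files you trust.

``` r
# Write a minimal DESCRIPTION to a temporary file (mirrors the example above).
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: rscipy",
  "Title: An R Interface to scipy",
  "Version: 1.0.0",
  "Description: Provides an R interface to the Python package scipy.",
  "Config/reticulate:",
  "  list(",
  "    packages = list(",
  "      list(package = \"scipy\")",
  "    )",
  "  )"
), desc_path)

# read.dcf() returns a character matrix with one column per requested field.
field <- read.dcf(desc_path, fields = "Config/reticulate")[1, 1]

# The field holds R code; parse and evaluate it to recover the list.
spec <- eval(parse(text = field))

# Collect the names of the declared Python packages.
declared <- vapply(spec$packages, function(p) p$package, character(1))
print(declared)
```

Continuation lines in a `DESCRIPTION` file must start with whitespace,
which is why the indentation of the `list(...)` expression matters.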
### Installation
With this, `reticulate` will take care of automatically configuring a
Python environment for the user when the `rscipy` package is loaded and
used (i.e. it's no longer necessary to provide the user with a special
`install_tensorflow()`-type function, though it's still recommended to
do so).
Specifically, after the `rscipy` package is loaded, the following will
occur:
1. Unless the user has explicitly instructed `reticulate` to use an
existing Python environment, `reticulate` will prompt the user to
download and install
[Miniconda](https://docs.conda.io/en/latest/miniconda.html) (if
necessary).
2. After this, when the Python session is initialized by `reticulate`,
all declared dependencies of loaded packages in `Config/reticulate`
will be discovered.
3. These dependencies will then be installed into an appropriate Conda
environment, as provided by the Miniconda installation.
In this case, the end user workflow will be exactly as with an R package
that has no Python dependencies:
``` r
library(rscipy)
# use the package
```
If the user has no compatible version of Python available on their
system, they will be prompted to install Miniconda. If they do have
Python already, then the required Python packages (in this case `scipy`)
will be installed in the standard shared environment for R sessions
(typically a virtual environment, or a Conda environment named
"r-reticulate").
In effect, users have to pay a one-time, mostly automated initialization
cost in order to use your package, and then things will work as any other
R package would. In particular, users are otherwise spared from details
about how `reticulate` works.
### `.onLoad` Configuration
In some cases, a user may try to load your package after Python has
already been initialized. To ensure that `reticulate` can still
configure the active Python environment, you can include the following
code:
``` r
.onLoad <- function(libname, pkgname) {
reticulate::configure_environment(pkgname)
}
```
This will instruct `reticulate` to immediately try to configure the
active Python environment, installing any required Python packages as
necessary.
## Versions
The goal of these mechanisms is to allow easy interoperability between R
packages that have Python dependencies, as well as to minimize
specialized version/configuration steps for end users. To that end,
`reticulate` will (by default) track an older version of Python than the
current release, giving Python packages time to adapt. Python 2 will not
be supported.
Tools for breaking these rules are not yet implemented, but will be
provided as the need arises.
## Format
Declared Python package dependencies should have the following format:
- **package**: The name of the Python package.
- **version**: The version of the package that should be installed.
When left unspecified, the latest available version will be
installed. This should only be set in exceptional cases---for
example, if the most recently released version of a Python package
breaks compatibility with your package (or other Python packages) in
a fundamental way. If multiple R packages request different versions
of a particular Python package, `reticulate` will signal a warning.
- **pip**: Whether this package should be retrieved from
[PyPI](https://pypi.org) using `pip`. If `FALSE`, it will be
downloaded from the Anaconda repositories instead.
For example, we could change the `Config/reticulate` directive from
above to specify that `scipy` version 1.3.0 be installed from PyPI (with
`pip`):
```
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy", version = "1.3.0", pip = TRUE)
    )
  )
```
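Pinning versions raises the chance that two R packages request different
versions of the same Python package. The sketch below shows one way such a
disagreement could be detected from declarations in the format above. The
helper `find_version_conflicts()` and the second R package name are
hypothetical, and this is not presented as `reticulate`'s own check.

``` r
# Hypothetical declarations collected from two R packages.
declared <- list(
  rscipy   = list(package = "scipy", version = "1.3.0", pip = TRUE),
  otherpkg = list(package = "scipy", version = "1.5.0", pip = TRUE)
)

# Return, per Python package, the distinct pinned versions when more
# than one was requested. Unpinned declarations are ignored.
find_version_conflicts <- function(declared) {
  pkgs <- vapply(declared, function(d) d$package, character(1))
  vers <- vapply(
    declared,
    function(d) if (is.null(d$version)) NA_character_ else d$version,
    character(1)
  )
  requested <- lapply(split(vers, pkgs), function(v) unique(unname(v[!is.na(v)])))
  Filter(function(v) length(v) > 1, requested)
}

conflicts <- find_version_conflicts(declared)
for (pkg in names(conflicts)) {
  warning("Conflicting versions requested for ", pkg, ": ",
          paste(conflicts[[pkg]], collapse = ", "), call. = FALSE)
}
```

Leaving `version` unspecified wherever possible keeps this kind of conflict
rare.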