File: lazyeval-old.Rmd

---
title: "Lazyeval: a new approach to NSE"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Lazyeval: a new approach to NSE}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
rownames(mtcars) <- NULL
```

This document outlines my previous approach to non-standard evaluation (NSE). You should avoid it unless you are working with an older version of dplyr or tidyr.

There are three key ideas:

* Instead of using `substitute()`, use `lazyeval::lazy()` to capture both expression
  and environment. (Or use `lazyeval::lazy_dots(...)` to capture promises in `...`.)
  
* Every function that uses NSE should have a standard evaluation (SE) escape 
  hatch that does the actual computation. The SE-function name should end with 
  `_`.
  
* The SE function accepts a flexible range of inputs (lazy objects, formulas,
  quoted calls and strings) to make it easy for people to program with.

## `lazy()`

The key tool that makes this approach possible is `lazy()`, an equivalent to `substitute()` that captures both expression and environment associated with a function argument:

```{r}
library(lazyeval)
f <- function(x = a - b) {
  lazy(x)
}
f()
f(a + b)
```
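For intuition, here is a rough base-R approximation of what `lazy()` records for an explicitly supplied argument. `capture()` and `caller()` are hypothetical helpers, not part of lazyeval. Because `capture()` simply pairs `substitute()` with `parent.frame()`, it reports the wrong environment for default arguments (which are evaluated in the function's own frame) and, unlike `lazy()`, it does not follow chained promises.

```{r}
# Hypothetical helper (not part of lazyeval): a rough stand-in for lazy()
# that only handles arguments supplied explicitly by the caller.
capture <- function(x) {
  list(
    expr = substitute(x),  # the unevaluated argument expression
    env  = parent.frame()  # the caller's frame, where that expression would be evaluated
  )
}

caller <- function() {
  a <- 1
  capture(a + b)
}
captured <- caller()
captured$expr      # the expression `a + b`
ls(captured$env)   # the caller's frame, which contains `a`
```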

As a complement to `eval()`, the lazyeval package provides `lazy_eval()`, which uses the environment associated with the lazy object:

```{r}
a <- 10
b <- 1
lazy_eval(f())
lazy_eval(f(a + b))
```

The second argument to `lazy_eval()` is a list or data frame where names should be looked up first:

```{r}
lazy_eval(f(), list(a = 1))
```
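This lookup order can be sketched with base R's `eval()`, whose `envir` argument accepts a list or data frame and whose `enclos` argument supplies the environment used for anything not found there. `eval_with_data()` and `demo_env` are hypothetical names used only for illustration; this is not presented as lazyeval's own implementation.

```{r}
# Hypothetical helper: evaluate `expr` with names looked up in `data` first,
# falling back to `env` for everything else (including functions such as `-`).
eval_with_data <- function(expr, env, data = NULL) {
  if (is.null(data)) {
    eval(expr, env)
  } else {
    eval(expr, data, enclos = env)
  }
}

demo_env <- new.env()
assign("a", 10, envir = demo_env)
assign("b", 1, envir = demo_env)
eval_with_data(quote(a - b), demo_env)               # a and b both come from demo_env
eval_with_data(quote(a - b), demo_env, list(a = 1))  # a from the list, b from demo_env
```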

`lazy_eval()` also works with formulas, since they contain the same information as a lazy object: an expression (only the RHS is used by convention) and an environment:

```{r}
lazy_eval(~ a + b)
h <- function(i) {
  ~ 10 + i
}
lazy_eval(h(1))
```
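A formula already carries both pieces: its right-hand side is the last element of the call, and its environment is returned by `environment()`. A minimal base-R sketch, assuming a hypothetical `eval_rhs()` helper that is not part of lazyeval:

```{r}
# Hypothetical helper: evaluate the right-hand side of a formula in the
# formula's own environment, optionally looking names up in `data` first.
eval_rhs <- function(f, data = NULL) {
  rhs <- f[[length(f)]]   # works for one-sided (~ x) and two-sided (y ~ x) formulas
  env <- environment(f)   # the environment where the formula was created
  if (is.null(data)) eval(rhs, env) else eval(rhs, data, env)
}

make_f <- function(i) {
  ~ 10 + i
}
eval_rhs(make_f(1))                     # `i` is found in make_f()'s frame
eval_rhs(~ p + q, list(p = 1, q = 2))   # `p` and `q` come from the list
```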

## Standard evaluation

Whenever we need a function that does non-standard evaluation, always write the standard evaluation version first. For example, let's implement our own version of `subset()`:

```{r}
subset2_ <- function(df, condition) {
  r <- lazy_eval(condition, df)
  r <- r & !is.na(r)
  df[r, , drop = FALSE]
} 

subset2_(mtcars, lazy(mpg > 31))
```
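The line `r <- r & !is.na(r)` matters because a comparison involving `NA` yields `NA`, and indexing a data frame with a logical `NA` returns a row of missing values rather than dropping it. The sketch below uses only base R and accepts only one-sided formulas; `subset_f_()` and `df_na` are hypothetical names, not part of lazyeval.

```{r}
# Hypothetical SE subsetting function that takes a formula such as ~ x > 1.
subset_f_ <- function(df, condition) {
  r <- eval(condition[[length(condition)]], df, environment(condition))
  r <- r & !is.na(r)   # treat NA like FALSE so those rows are dropped
  df[r, , drop = FALSE]
}

df_na <- data.frame(x = c(1, NA, 3))
df_na$x > 1                 # FALSE NA TRUE: the NA would otherwise select a row of NAs
subset_f_(df_na, ~ x > 1)   # only the row where x is 3
```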

`lazy_eval()` will always coerce its first argument into a lazy object, so a variety of specifications will work:

```{r}
subset2_(mtcars, ~mpg > 31)
subset2_(mtcars, quote(mpg > 31))
subset2_(mtcars, "mpg > 31")
```

Note that quoted calls and strings don't have environments associated with them, so `as.lazy()` defaults to using `baseenv()`. This will work if the expression is self-contained (i.e. doesn't contain any references to variables in the local environment), and will otherwise fail quickly and robustly.
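The coercion rule can be sketched in base R. `as_expr_env()` and `above_local()` below are hypothetical helpers, not `as.lazy()` itself: formulas keep their own environment, while calls, symbols and strings fall back to `baseenv()`. Note that lookups starting from `baseenv()` still fall through to the global environment and the attached packages, so it is variables local to a function that become invisible.

```{r}
# Hypothetical helper mirroring the coercion rules described above.
as_expr_env <- function(x, env = baseenv()) {
  if (inherits(x, "formula")) {
    list(expr = x[[length(x)]], env = environment(x))
  } else if (is.character(x)) {
    list(expr = parse(text = x)[[1]], env = env)  # strings are parsed into a call
  } else if (is.call(x) || is.name(x)) {
    list(expr = x, env = env)
  } else {
    stop("Don't know how to convert this input")
  }
}

cars_df <- data.frame(mpg = c(21, 33.9, 32.4))

# Self-contained: `mpg` comes from the data and `>` from base.
s <- as_expr_env("mpg > 31")
eval(s$expr, cars_df, s$env)

# Hypothetical function whose condition refers to a local variable.
above_local <- function(df, as_formula) {
  local_limit <- 32
  if (as_formula) {
    spec <- ~ mpg > local_limit       # formula remembers this frame
  } else {
    spec <- quote(mpg > local_limit)  # bare call: falls back to baseenv()
  }
  s <- as_expr_env(spec)
  eval(s$expr, df, s$env)
}
above_local(cars_df, TRUE)
tryCatch(above_local(cars_df, FALSE), error = conditionMessage)  # object not found
```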

## Non-standard evaluation

With the SE version in hand, writing the NSE version is easy. We just use `lazy()` to capture the unevaluated expression and corresponding environment:

```{r}
subset2 <- function(df, condition) {
  subset2_(df, lazy(condition))
}
subset2(mtcars, mpg > 31)
```
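The same SE/NSE split can be sketched without lazyeval by turning the captured expression into a one-sided formula that remembers the caller's environment. `subset_f()`, `subset_f_()` and `cutoff` are hypothetical names. Because this version relies on `substitute()`, it does not follow chained promises the way `lazy()` does.

```{r}
# Hypothetical SE function: evaluates the formula's RHS with columns of `df` first.
subset_f_ <- function(df, condition) {
  r <- eval(condition[[length(condition)]], df, environment(condition))
  df[r & !is.na(r), , drop = FALSE]
}

# Hypothetical NSE wrapper: capture the expression, wrap it in a formula,
# and attach the caller's environment before handing off to the SE version.
subset_f <- function(df, condition) {
  caller_env <- parent.frame()
  f <- eval(call("~", substitute(condition)))
  environment(f) <- caller_env
  subset_f_(df, f)
}

cars_df <- data.frame(mpg = c(21, 33.9, 32.4))
cutoff <- 33
subset_f(cars_df, mpg > cutoff)   # `mpg` from the data, `cutoff` from the caller
```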

This standard evaluation escape hatch is very important because it allows us to implement different NSE approaches. For example, we could create a subsetting function that finds all rows where a variable is above a threshold:

```{r}
above_threshold <- function(df, var, threshold) {
  cond <- interp(~ var > x, var = lazy(var), x = threshold)
  subset2_(df, cond)
}
above_threshold(mtcars, mpg, 31)
```

Here we're using `interp()` to modify a formula: it replaces `x` with the value of `threshold` and `var` with the expression captured by `lazy(var)`.
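The effect of `interp()` in this example can be approximated in base R with `substitute()`, which replaces symbols in an expression using a list of values. `interp_rhs()` and `cond_demo` are hypothetical names; this is a sketch, not lazyeval's implementation, and it only rewrites the right-hand side of a one-sided formula.

```{r}
# Hypothetical helper: splice values into the RHS of a one-sided formula,
# keeping the formula's original environment.
interp_rhs <- function(f, ...) {
  values <- list(...)
  rhs <- do.call(substitute, list(f[[2]], values))  # replace symbols named in `values`
  out <- eval(call("~", rhs))
  environment(out) <- environment(f)
  out
}

cond_demo <- interp_rhs(~ var > x, var = quote(mpg), x = 31)
cond_demo                                          # the formula ~mpg > 31
eval(cond_demo[[2]], data.frame(mpg = c(30, 32)))
```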

## Scoping

Because `lazy()` captures the environment associated with the function argument, we automatically avoid a subtle scoping bug present in `subset()`:
  
```{r}
x <- 31
f1 <- function(...) {
  x <- 30
  subset(mtcars, ...)
}
# Uses 30 instead of 31
f1(mpg > x)

f2 <- function(...) {
  x <- 30
  subset2(mtcars, ...)
}
# Correctly uses 31
f2(mpg > x)
```

`lazy()` has another advantage over `substitute()`: by default, it follows promises across function invocations. This simplifies the casual use of NSE.

```{r, eval = FALSE}
x <- 31
g1 <- function(comp) {
  x <- 30
  subset(mtcars, comp)
}
g1(mpg > x)
#> Error: object 'mpg' not found
```

```{r}
g2 <- function(comp) {
  x <- 30
  subset2(mtcars, comp)
}
g2(mpg > x)
```

Note that `g2()` doesn't have a standard-evaluation escape hatch, so it's not suitable for programming with in the same way that `subset2_()` is. 

## Chained promises

Take the following example:

```{r}
library(lazyeval)
f1 <- function(x) lazy(x)
g1 <- function(y) f1(y)

g1(a + b)
```

`lazy()` returns `a + b` because it always tries to find the top-level promise.

In this case the process looks like this:

1. Find the object that `x` is bound to.
2. It's a promise, so find the expression it's bound to (`y`, a symbol) and the
   environment in which it should be evaluated (the environment of `g1()`).
3. Since that expression is a symbol, look up the value of `y` in that
   environment: it's bound to a promise.
4. That promise has expression `a + b` and should be evaluated in the global
   environment.
5. The expression is not a symbol, so stop.
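For contrast, base R's `substitute()` stops after step 2: it returns the expression stored in `x`'s promise, which is just the symbol `y`. The functions `f_sub()` and `g_sub()` below are hypothetical illustrations.

```{r}
# substitute() reports only the expression supplied at the nearest call.
f_sub <- function(x) substitute(x)
g_sub <- function(y) f_sub(y)

g_sub(a + b)   # the symbol `y`, not `a + b`
f_sub(a + b)   # called directly, the original expression is returned
```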

Occasionally you will want to avoid this recursive behaviour; to do so, use `.follow_symbols = FALSE`:

```{r}
f2 <- function(x) lazy(x, .follow_symbols = FALSE)
g2 <- function(y) f2(y)

g2(a + b)
```

Either way, if you evaluate the lazy expression you'll get the same result:

```{r}
a <- 10
b <- 1

lazy_eval(g1(a + b))
lazy_eval(g2(a + b))
```

Note that the resolution of chained promises only works with unevaluated objects. This is because R deletes the information about the environment associated with a promise when it has been forced, so that the garbage collector is allowed to remove the environment from memory in case it is no longer used. `lazy()` will fail with an error in such situations.

```{r, error = TRUE, purl = FALSE}
var <- 0

f3 <- function(x) {
  force(x)
  lazy(x)
}

f3(var)
```