File: bcstkprot.md

package info (click to toggle)
r-base 4.5.2-1
links: PTS, VCS
area: main
in suites: forky, sid
size: 112,924 kB
sloc: ansic: 291,338; fortran: 111,889; javascript: 14,798; yacc: 6,154; sh: 5,689; makefile: 5,239; tcl: 4,562; perl: 963; objc: 791; f90: 758; asm: 258; java: 31; sed: 1
file content (149 lines) | stat: -rw-r--r-- 5,389 bytes
parent folder | download | duplicates (4)
---
title: "Protecting the BC Stack from Mutation"
author: Luke Tierney
output: html_document
---

## Background

In computing an expression like

```r
<expr1> + <expr2>
```

the value of `<expr1>` is computed first and then the value of `<expr2>`.
It is possible, though unusual, that the expression for computing
`<expr2>` might contain complex assignments. These assignments must not
mutate the computed value of `<expr1>`. In R 3.4.4 this was not handled
properly and this example produced the wrong result:

```r
x <- 1 + 0
x + { x[1] <- 2; 1}
## [1] 3  # should be 2
```

This issue was present in all calls to multi-argument `BUILTIN`
functions from the AST interpreter.  A similar issue existed in subset
operations, implemented by a `SPECIAL`:

```r
x <- matrix(1:4, 2)
i <- 1 + 0
x[i, { i[1]<-2; 1}]
## [1] 2  # this is m[2, 1]; should be 1
```

Similar issues also existed in byte compiled code where a complex
assignment could mutate values on the stack from earlier computations.

A variant of this issue was reported to the R-devel list by Lukas
Stadler in August 2017.


## Fixes in R 3.5.0 and R 3.6.0

These issues were partially addressed in R 3.5.0 and more completely
in R 3.6.0. The issue for `BUILTIN` calls is handled in `eval.c` and
needs no further action or maintenance elsewhere.

Implementation changes to `SPECIAL` functions that evaluate more than
one argument need to pay attention to this issue and make sure values
computed earlier cannot be mutated while evaluating later
arguments. Ths should be done using `INCREMENT_LINKS` and
`DECREMENT_LINKS`, e.g. for two arguments as

```c
val1 = eval(arg1, rho);
INCREMENT_LINKS(val1);
val2 = eval(arg2, rho);
DECREMENT_LINKS(val1);
```


## Changes in R 4.0.0

The fix for protecting the byte code stack used in R 3.6.0 required
the compiler to emit additional instructions protecting and
unprotecting values on the stack. This note describes an alternative
approach with less overhead that will be incorporated in R 4.0.0.
(Committed to R_devel in r77275.)

Legitimate mutations through R code can only occur in a few places:

- In byte code between `STARTASSIGN/ENDASSIGN` or
  `STARTASSIGN2/ENDASSIGN2` instructions.

- In calls to `applyDefine` from the AST interpreter.

The byte code interpreter also mutates values on the stack in a few
situations where this is known to be safe. One place is- in updating
the value of the loop variable in `for` loops for some data
types. This is safe for values that are not shared as they cannot also
appear lower on the stack.

To make sure that these operations do not mutate any boxed value on
the BC stack all values on the stack need to have their link counts
(NAMED or REFCNT) incremented and then decrement around these possible
mutation points.  This is done by calling

- `INCLNK_stack` before entering possibly mutating code;
- `DECLNK_stack` after leaving this code.

These functions use a stack pointer `R_BCProtStack` to keep track of
how much of the stack has been protected. The pointer is stored in
contexts and used to decrement link counts on a LONGJMP.

A key assumption is that none of the values on the protected stack
will be changed between matching `INCLNK_stack` and `DECLNK_stack`
operations. The binding cache and the variable value for a `for` loop
may need to change, but these also do not need to be protected. Their
stack fields are marked with the `NLNKSXP` tag so they will not have
their links incremented and decremented.

The stack is protected around every call to `applydefine`. This
ensures that values on the stack are protected during complex
assignment operations during calls into the AST interpreter.

To reduce link count adjustments for objects lower on the stack, the
stack protection pointer adjustment during byte compiled complex
assignments has several components:

- The stack protection pointer is raised at the beginning of a
  `bcEval` call and restored at the end. This means that top level
  complex assignments in byte compiled code need no further
  protection.
	
- The protection pointer does need to be raised around non-top-level
  complex assignments. This is done by `INCLNKSTK` and `DECLNKSTK`
  instructions emitted by the compiler for non-top-level complex
  assignments.

Together these ensure that the stack is protected during all byte
compiled complex assignments.

To further reduce unnecessary link count adjustments:

- For `for` loops the stack protection pointer is adjusted around the
  loop body to avoid processing the loop data for every stack
  protection pointer adjustment needed while executing the body.

- Similarly, the stack protection pointer is adjusted for loops
  needing a jump context to avoid repeated processing of the context
  data.

A final efficiency improvement is to defer actually committing the
link count increments until they are needed. This means that code
written in a functional style that does not use complex assignments
does not need to actually perform the link count adjustments.  The
points where count increments are committed are

- in `applydefine` for the AST interpreter;

- in the `STARTASSIGN` and `STARTASSIGN` instructions.

**Raising the minimal BC level will allow redoing the binding cache to
be much smaller, and allowing the non-small-cache option to be
dropped. The hacks for supporting old byte code can also be dropped at
that point.**