[Travis-CI Build Status](https://travis-ci.org/kevinushey/sourcetools) [AppVeyor Build Status](https://ci.appveyor.com/project/kevinushey/sourcetools)
# sourcetools
Tools for reading, tokenizing, and (eventually) parsing `R` code.
## Getting Started
`sourcetools` is not yet on CRAN; install the development version with:
```r
devtools::install_github("kevinushey/sourcetools")
```
## Reading
`sourcetools` provides a couple of fast functions for reading files into `R`.

Use `read()` and `read_lines()` to quickly read a file into `R` as character vectors. `read_lines()` handles both Windows-style `\r\n` and Unix-style `\n` line endings.
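A quick sketch of the difference between the two (assuming, as the benchmark below suggests, that `read()` is the analogue of `readChar()` and returns the whole file as one string, while `read_lines()` is the analogue of `readLines()` and returns one element per line):

```r
library(sourcetools)

# Write a small file with Windows-style (\r\n) line endings; writeBin()
# avoids any platform-specific newline translation.
file <- tempfile()
writeBin(charToRaw("alpha\r\nbeta\r\n"), file)

read(file)        # the full file contents as a single string
read_lines(file)  # c("alpha", "beta"); the \r\n endings are handled

unlink(file)
```

A benchmark against the base `R` equivalents: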
```r
text <- replicate(10000, paste(sample(letters, 200, TRUE), collapse = ""))
file <- tempfile()
cat(text, file = file, sep = "\n")
mb <- microbenchmark::microbenchmark(times = 10,
  readChar   = readChar(file, file.info(file)$size, TRUE),
  readLines  = readLines(file),
  read       = read(file),
  read_lines = read_lines(file)
)
print(mb, digits = 3)
```
```
## Unit: milliseconds
##        expr   min     lq  mean median     uq    max neval cld
##    readChar   5.2   6.54  10.5   7.02   8.73  36.56    10  ab
##   readLines 155.9 159.69 162.4 161.95 163.15 171.76    10   c
##        read   5.3   5.48   6.5   5.97   7.52   9.35    10  a
##  read_lines  13.5  13.95  14.4  14.09  14.50  16.97    10  b
```
```r
unlink(file)
```
## Tokenization
`sourcetools` provides the `tokenize_string()` and `tokenize_file()` functions for generating a tokenized representation of R code. These produce a 'raw' tokenized representation of the code, recording each token's value as a string together with its row, column, and type:
```r
tokenize_string("if (x < 10) 20")
```
```
##    value row column       type
## 1     if   1      1    keyword
## 2          1      3 whitespace
## 3      (   1      4    bracket
## 4      x   1      5     symbol
## 5          1      6 whitespace
## 6      <   1      7   operator
## 7          1      8 whitespace
## 8     10   1      9     number
## 9      )   1     11    bracket
## 10         1     12 whitespace
## 11    20   1     13     number
```
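Since `tokenize_string()` appears to return an ordinary data frame (as the printed output above suggests), the usual `R` data-manipulation tools apply. A minimal sketch that drops the whitespace tokens, keeping only the syntactically meaningful ones:

```r
library(sourcetools)

tokens <- tokenize_string("if (x < 10) 20")

# Filter on the 'type' column to discard whitespace tokens.
subset(tokens, type != "whitespace")
```

`tokenize_file()` can be used the same way on a path to an R script.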