---
title: "Demo of the bit package"
author: "Dr. Jens Oehlschlägel"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Demo of the bit package} 
  %\VignetteEngine{knitr::rmarkdown} 
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, results = "hide", message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
require(bit)
.ff.version <- try(packageVersion("ff"), silent = TRUE)
.ff.is.available <- !inherits(.ff.version, "try-error") && .ff.version >= "4.0.0" && require(ff)
#tools::buildVignette("vignettes/bit-demo.Rmd")
#devtools::build_vignettes()
```

---

## bit type

Create a huge boolean vector (no NAs allowed)

```{r}
n <- 1e8
b1 <- bit(n)
b1
```

It costs only one bit per element (`object.size()` reports bytes, so the ratio below is roughly 1/8 of a byte)

```{r}
object.size(b1)/n
```
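
For comparison, here is a minimal sketch of the same measurement against a base R logical vector. The length `m` and the names `lgl` and `bt` are hypothetical and chosen only to keep the logical vector cheap to allocate; they are not part of the original demo.

```{r}
library(bit)

# Hypothetical, smaller length so the logical vector is cheap to allocate
m <- 1e6
lgl <- logical(m)   # base R logical vector: 4 bytes per element
bt  <- bit(m)       # bit vector: 1 bit per element plus a small fixed overhead

# Bytes per element for each representation
bytes_lgl <- as.numeric(object.size(lgl)) / m
bytes_bit <- as.numeric(object.size(bt)) / m
c(logical = bytes_lgl, bit = bytes_bit)
```

The ratio between the two should come out close to 32, since a logical value occupies a full 32-bit integer.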


A couple of standard methods work 

```{r}
b1[10:30] <- TRUE
summary(b1)
```

Create another boolean vector with TRUE in some different positions

```{r}
b2 <- bit(n)
b2[20:40] <- TRUE
b2
```

Boolean operations are fast

```{r}
b1 & b2
```

The result can be summarized directly

```{r}
summary(b1 & b2)
```
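
The other boolean operators can be tried the same way. The following is a small sketch on hypothetical short vectors `p` and `q` that mirror the pattern above; it assumes `|`, `!` and `xor()` are supported for bit vectors alongside `&`, as the package's operator documentation indicates.

```{r}
library(bit)

# Hypothetical short vectors mirroring the pattern used above
p <- bit(100); p[10:30] <- TRUE
q <- bit(100); q[20:40] <- TRUE

sum(p & q)      # positions set in both vectors
sum(p | q)      # positions set in at least one vector
sum(xor(p, q))  # positions set in exactly one vector
sum(!p)         # positions not set in p
```

Counting with `sum()` avoids materializing the positions when only the number of TRUE values matters.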


## bitwhich type

Since the distribution is very skewed (only a few TRUE values), we may coerce to an even sparser representation

```{r}
w1 <- as.bitwhich(b1) 
w2 <- as.bitwhich(b2)
object.size(w1)/n
```

and everything 

```{r}
w1 & w2
```

works as expected

```{r}
summary(w1 & w2)
```


even mixing

```{r}
summary(b1 & w2)
```
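
To make the trade-off between the two types concrete, here is a minimal sketch that compares their sizes for a very sparse selection and checks that the TRUE positions survive the conversion. The vector `sparse` and its three positions are hypothetical illustration data.

```{r}
library(bit)

# Hypothetical sparse vector: three TRUE values among one million
sparse <- bit(1e6)
sparse[c(3L, 500L, 999999L)] <- TRUE

ws <- as.bitwhich(sparse)
object.size(sparse)     # grows with the vector length
object.size(ws)         # here only the few selected positions are stored
as.which(ws)            # the TRUE positions survive the conversion
sum(ws) == sum(sparse)  # same number of TRUE values in both types
```

For dense or balanced selections the plain `bit` type is likely the better choice, since its size does not depend on how many values are TRUE.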


## processing chunks

Many bit functions support a range restriction, 

```{r}
summary(b1, range=c(1,1000))
```

which is useful 

```{r}
as.which(b1, range=c(1, 1000))
```

for filtered chunked looping 

```{r}
lapply(chunk(from = 1, to = n, length.out = 10), function(i) as.which(b1, range = i))
```

over large ff vectors

```{r, eval=.ff.is.available}
options(ffbatchbytes=1024^3)
x <- ff(vmode="single", length=n)
x[1:1000] <- runif(1000)
lapply(chunk(x, length.out = 10), function(i)sum(x[as.hi(b1, range=i)]))
```
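
The chunked filtering pattern can also be checked on a small scale without ff. This is a sketch only: the vector `f`, its three positions and the chunk count are hypothetical, and it assumes each element returned by `chunk()` can be passed as `range`, as in the calls above.

```{r}
library(bit)

# Hypothetical small vector with three TRUE positions
f <- bit(100)
f[c(5L, 42L, 99L)] <- TRUE

# Split 1..100 into ranges (4 requested); each element is usable as 'range'
ch <- chunk(from = 1, to = 100, length.out = 4)

# TRUE positions found within each chunk, as plain integers
hits <- lapply(ch, function(i) as.integer(as.which(f, range = i)))
hits

# The per-chunk counts add up to the total number of TRUE values
n_hits <- sapply(hits, length)
sum(n_hits) == sum(f)
```

Because each chunk is processed independently, only one chunk's worth of positions needs to be held in memory at a time.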

and wrap-up

```{r, eval=.ff.is.available}
delete(x)
rm(x, b1, b2, w1, w2, n)
```

For more info, check the usage vignette