File: bit-demo.Rmd

---
title: "Demo of the bit package"
author: "Dr. Jens Oehlschlägel"
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Demo of the bit package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, results = "hide", message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(bit)
.ff.is.available = requireNamespace("ff", quietly=TRUE) && packageVersion("ff") >= "4.0.0"
if (.ff.is.available) library(ff)
#tools::buildVignette("vignettes/bit-demo.Rmd")
#devtools::build_vignettes()
```

---

## bit type

Create a huge Boolean vector (`NA`s are not allowed):

```{r}
n <- 1e8
b1 <- bit(n)
b1
```

It costs only one bit, i.e. 1/8 byte, per element:

```{r}
object.size(b1) / n
```
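To see where the 1/8 byte per element comes from, here is a back-of-the-envelope sketch in base R. It assumes that `bit` packs 32 Boolean values into each 32-bit integer word (object headers are ignored) and compares that with a plain `logical` vector, which spends 4 bytes per element. The names `n_demo`, `words_needed`, `bytes_bit` and `bytes_logical` are only illustrative.

```{r}
# Assumption: bit packs 32 Boolean values into each 32-bit integer word,
# while a base R logical vector spends a full 4 bytes per element.
n_demo <- 1e6
words_needed <- ceiling(n_demo / 32)   # integer words needed to hold n_demo bits
bytes_bit <- words_needed * 4          # each 32-bit word occupies 4 bytes
bytes_logical <- as.numeric(object.size(logical(n_demo)))
# bytes per element: 0.125 for the packed bits, just over 4 for logical
c(bit = bytes_bit / n_demo, logical = bytes_logical / n_demo)
```

The factor of 32 between the two is what makes a 10^8-element `bit` vector affordable in RAM.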


Standard methods such as subassignment and `summary()` work as usual:

```{r}
b1[10:30] <- TRUE
summary(b1)
```
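A few more standard methods, shown on a tiny vector so the results are easy to check by eye. This sketch only uses `as.bit()`, `length()`, `sum()` and `as.logical()`; the object names `l_small` and `b_small` are illustrative.

```{r}
library(bit)
# Round trip between a plain logical vector and a bit vector
l_small <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
b_small <- as.bit(l_small)
length(b_small)       # 5 elements
sum(b_small)          # 3 elements are TRUE
as.logical(b_small)   # back to a plain logical vector
```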

Create another Boolean vector with `TRUE` at different, partially overlapping positions:

```{r}
b2 <- bit(n)
b2[20:40] <- TRUE
b2
```

Boolean operations are fast:

```{r}
b1 & b2
```

and the result can be summarized like any other `bit` vector:

```{r}
summary(b1 & b2)
```
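As a sanity check, the `bit` operators should agree element by element with base R's logical operators. The sketch below repeats the setup above on a small vector of 100 elements (the names `b_a`, `b_b`, `l_a`, `l_b` are illustrative) and compares the results.

```{r}
library(bit)
# Same pattern as b1 and b2 above, but small enough to compare with base R
n_small <- 100
b_a <- bit(n_small)
b_a[10:30] <- TRUE
b_b <- bit(n_small)
b_b[20:40] <- TRUE
l_a <- as.logical(b_a)
l_b <- as.logical(b_b)
# bit operators give the same answers as logical operators
stopifnot(identical(as.logical(b_a & b_b), l_a & l_b))
stopifnot(identical(as.logical(b_a | b_b), l_a | l_b))
which(l_a & l_b)   # positions 20 to 30
```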


## bitwhich type

Since the distribution is very skewed (only 21 of the 10^8 elements are `TRUE`), we may coerce to the even sparser `bitwhich` representation:

```{r}
w1 <- as.bitwhich(b1)
w2 <- as.bitwhich(b2)
object.size(w1) / n
```

and everything

```{r}
w1 & w2
```

works as expected

```{r}
summary(w1 & w2)
```


even when mixing `bit` and `bitwhich`:

```{r}
summary(b1 & w2)
```
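Why is `bitwhich` so small here? Assuming the representation works roughly as follows: store the positions of the rarer value (negated when the rarer value is `FALSE`), or a single `TRUE`/`FALSE` when all elements agree. The base R sketch below illustrates that idea with two hypothetical helpers, `sparse_encode()` and `sparse_decode()`; it is not the package's implementation.

```{r}
# Hypothetical helpers, for illustration only: they mimic the idea behind
# bitwhich, not the bit package's actual code.
sparse_encode <- function(l) {
  n_true <- sum(l)
  if (n_true == 0) return(FALSE)          # all FALSE: one scalar suffices
  if (n_true == length(l)) return(TRUE)   # all TRUE: likewise
  if (n_true <= length(l) / 2) {
    which(l)        # few TRUE: store their positions
  } else {
    -which(!l)      # few FALSE: store their positions, negated
  }
}
sparse_decode <- function(s, n) {
  if (is.logical(s)) return(rep(s, n))   # scalar TRUE/FALSE means "all"
  l <- logical(n)
  if (s[1] > 0) {
    l[s] <- TRUE
  } else {
    l[] <- TRUE
    l[-s] <- FALSE
  }
  l
}
l_skewed <- logical(1000)
l_skewed[10:30] <- TRUE
s_skewed <- sparse_encode(l_skewed)
length(s_skewed)   # 21 stored positions instead of 1000 flags
```

Storing negative positions for the mostly-`TRUE` case keeps the encoding small whichever value dominates.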


## processing chunks

Many `bit` functions support a `range` restriction,

```{r}
summary(b1, range=c(1, 1000))
```

which is useful

```{r}
as.which(b1, range=c(1, 1000))
```

for filtered chunked looping

```{r}
lapply(chunk(from = 1, to = n, length.out = 10), function(i) as.which(b1, range = i))
```

over large `ff` vectors (this chunk only runs if the ff package, version 4.0.0 or later, is available)

```{r, eval=.ff.is.available}
options(ffbatchbytes=1024^3)
x <- ff(vmode="single", length=n)
x[1:1000] <- runif(1000)
lapply(chunk(x, length.out = 10), function(i) sum(x[as.hi(b1, range=i)]))
```

and clean up:

```{r}
if (.ff.is.available) {
  delete(x)  # remove the ff backing file
  rm(x)
}
rm(b1, b2, w1, w2, n)
```
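The range-restricted and chunked calls above can be checked on a small vector. The sketch assumes, as the chunked `ff` loop suggests, that `as.which(x, range = c(from, to))` reports absolute positions of the `TRUE` values between `from` and `to`. It compares that with a hypothetical base R helper, `which_in_range()`, and checks that concatenating per-chunk results reproduces the unchunked result. The other object names are illustrative too.

```{r}
library(bit)
# Hypothetical base R helper, for illustration only: positions of TRUE values
# between range[1] and range[2], reported as absolute (not local) positions.
which_in_range <- function(l, range) {
  from <- as.integer(range[1])
  to <- as.integer(range[2])
  (from - 1L) + which(l[from:to])   # shift local positions back to absolute ones
}
n_small <- 1000
b_c <- bit(n_small)
b_c[c(3L, 250L, 251L, 999L)] <- TRUE
l_c <- as.logical(b_c)
# A range-restricted as.which() should agree with the base R helper
in_range <- as.vector(unclass(as.which(b_c, range = c(200, 300))))
stopifnot(identical(in_range, which_in_range(l_c, c(200, 300))))
# Concatenating per-chunk results should reproduce the unchunked result
chunks <- chunk(from = 1, to = n_small, length.out = 4)
per_chunk <- lapply(chunks, function(i) as.vector(unclass(as.which(b_c, range = i))))
stopifnot(identical(unlist(per_chunk, use.names = FALSE),
                    as.vector(unclass(as.which(b_c)))))
in_range
```

`unclass()` and `as.vector()` strip the `which` class and its attributes so that only the plain integer positions are compared.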

For more information, see the package's usage vignette.