---
title: "Demo of the bit package"
author: "Dr. Jens Oehlschlägel"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Demo of the bit package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE, results = "hide", message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
require(bit)
.ff.version <- try(packageVersion("ff"), silent = TRUE)
.ff.is.available <- !inherits(.ff.version, "try-error") && .ff.version >= "4.0.0" && require(ff)
#tools::buildVignette("vignettes/bit-demo.Rmd")
#devtools::build_vignettes()
```
---
## bit type
Create a huge boolean vector (no NAs allowed)
```{r}
n <- 1e8
b1 <- bit(n)
b1
```
It costs only one bit per element (`object.size()` reports bytes, so about 1/8 byte per element)
```{r}
object.size(b1)/n
```
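To put that figure in context, the per-element cost can be compared with an ordinary base R `logical` vector at a smaller size. This is only a minimal sketch; the names `m`, `lgl` and `bt` are illustrative and not part of the original demo.

```{r}
library(bit)

m <- 1e6                      # smaller than n above, to keep the comparison cheap
lgl <- logical(m)             # base R logical: 4 bytes (32 bits) per element
bt  <- bit(m)                 # bit vector: 1 bit per element plus a small fixed overhead

# bytes per element for each representation
c(logical = as.numeric(object.size(lgl)) / m,
  bit     = as.numeric(object.size(bt)) / m)
```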
A couple of standard methods work
```{r}
b1[10:30] <- TRUE
summary(b1)
```
Create another boolean vector with TRUE in some different positions
```{r}
b2 <- bit(n)
b2[20:40] <- TRUE
b2
```
Fast boolean operations work directly on bit vectors
```{r}
b1 & b2
```
The result of a boolean operation can be summarized directly
```{r}
summary(b1 & b2)
```
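As a small sanity check, the operators on bit vectors can be compared element-wise with their base R counterparts. The sketch below uses tiny illustrative vectors (`la`, `lb`, `ba`, `bb` are names introduced here, not part of the original demo) and assumes no NAs are present.

```{r}
library(bit)

la <- c(TRUE, TRUE, FALSE, FALSE)
lb <- c(TRUE, FALSE, TRUE, FALSE)
ba <- as.bit(la)              # same values, stored one bit per element
bb <- as.bit(lb)

# each operator should agree element-wise with the base R result
all(as.logical(ba & bb) == (la & lb))
all(as.logical(ba | bb) == (la | lb))
all(as.logical(!ba) == !la)
```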
## bitwhich type
Since the distribution is very skewed, with only a few TRUE values, we may coerce to an even sparser representation
```{r}
w1 <- as.bitwhich(b1)
w2 <- as.bitwhich(b2)
object.size(w1)/n
```
and everything
```{r}
w1 & w2
```
works as expected
```{r}
summary(w1 & w2)
```
even mixing
```{r}
summary(b1 & w2)
```
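The trade-off between the two representations can be illustrated at a smaller scale. This is a hedged sketch with illustrative names (`m`, `sparse_bit`, `sparse_which`); it only shows that, for a vector with very few TRUE values, the `bitwhich` form should be the smaller of the two while reporting the same count.

```{r}
library(bit)

m <- 1e6
sparse_bit <- bit(m)
sparse_bit[10:30] <- TRUE                 # only 21 of 1e6 positions are TRUE
sparse_which <- as.bitwhich(sparse_bit)

# total bytes used by each representation
c(bit      = as.numeric(object.size(sparse_bit)),
  bitwhich = as.numeric(object.size(sparse_which)))

# both representations report the same number of TRUE values
c(bit = sum(sparse_bit), bitwhich = sum(sparse_which))
```

For a vector that is roughly half TRUE and half FALSE, the plain `bit` representation would be expected to be the more compact choice.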
## processing chunks
Many bit functions support a range restriction,
```{r}
summary(b1, range=c(1,1000))
```
which is useful
```{r}
as.which(b1, range=c(1, 1000))
```
for filtered chunked looping
```{r}
lapply(chunk(from = 1, to = n, length.out = 10), function(i) as.which(b1, range = i))
```
over large ff vectors
```{r, eval=.ff.is.available}
options(ffbatchbytes=1024^3)
x <- ff(vmode="single", length=n)
x[1:1000] <- runif(1000)
lapply(chunk(x, length.out = 10), function(i)sum(x[as.hi(b1, range=i)]))
```
and wrap-up
```{r, eval=.ff.is.available}
delete(x)
rm(x, b1, b2, w1, w2, n)
```
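The chunked filtering pattern used above can also be tried without ff on a small bit vector. The sketch below is illustrative only (`len`, `flags`, `parts` and `counts` are names introduced here); it counts the TRUE positions per chunk and checks that the per-chunk counts add up to the overall total.

```{r}
library(bit)

len <- 1000
flags <- bit(len)
flags[c(10:30, 600:620)] <- TRUE          # 42 TRUE values in total

# split 1..len into 4 consecutive ranges
parts <- chunk(from = 1, to = len, length.out = 4)

# count the TRUE positions that fall inside each chunk
counts <- vapply(parts, function(i) length(as.which(flags, range = i)), integer(1))
counts
sum(counts) == sum(flags)
```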
For more information, see the usage vignette