---
title: "Cryptographic Hashing in R"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Cryptographic Hashing in R}
  \usepackage[utf8]{inputenc}
output:
  html_document
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(comment = "")
library(openssl)
```
The functions `sha1`, `sha256`, `sha512`, `md4`, `md5` and `ripemd160` bind to the respective [digest functions](https://docs.openssl.org/1.1.1/man1/openssl-dgst/) in OpenSSL's libcrypto. Both raw (binary) and character inputs are supported, and the output type matches the input type: a string yields a hex-encoded string, and a raw vector yields a raw digest.
```{r}
md5("foo")
md5(charToRaw("foo"))
```
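
To make the type-matching rule concrete, the following sketch hashes the same three bytes both ways and checks that the two results agree once the raw digest is hex-encoded. The variable names and the `sprintf()`-based hex encoding are illustrative choices, not part of the package.

```{r}
library(openssl)

# Hash the same bytes as a string and as a raw vector
h_chr <- md5("foo")
h_raw <- md5(charToRaw("foo"))

# The output type follows the input type
is.character(h_chr)
is.raw(h_raw)

# Hex-encode the raw digest with base R (illustrative) and compare
hex_from_raw <- paste(sprintf("%02x", as.integer(unclass(h_raw))), collapse = "")
hex_from_raw == unclass(h_chr)
```
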
The functions are vectorized over character vectors: a vector of n strings returns n hashes.
```{r}
# Vectorized for strings
md5(c("foo", "bar", "baz"))
```
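
As a quick check of the vectorized behaviour, this sketch confirms that hashing a character vector gives the same result as hashing each element separately. The variable names are illustrative.

```{r}
library(openssl)

words <- c("foo", "bar", "baz")

# One call over the whole vector ...
all_at_once <- unclass(md5(words))

# ... versus one call per element
one_by_one <- vapply(words, function(w) unclass(md5(w)), character(1),
                     USE.NAMES = FALSE)

# Same number of hashes as inputs, and the same values in the same order
length(all_at_once) == length(words)
all(all_at_once == one_by_one)
```
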
Besides character and raw vectors, we can pass a connection object (e.g. a file, socket or URL). In this case the function stream-hashes the binary contents of the connection.
```{r}
# Stream-hash a file
myfile <- system.file("CITATION")
md5(file(myfile))
```
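
One way to sanity-check the streamed result is to compare it with a hash of the same bytes read fully into memory, and with base R's `tools::md5sum()`. This is only a sketch: the variable names are illustrative, and `as.character()` is used to hex-encode the raw digest in the same way as the URL example in this vignette.

```{r}
library(openssl)

myfile <- system.file("CITATION")

# Stream the file through the hash function
streamed <- as.character(md5(file(myfile)))

# Read the whole file into memory first, then hash the raw vector
bytes <- readBin(myfile, what = "raw", n = file.size(myfile))
in_memory <- as.character(md5(bytes))

# Base R's own MD5 implementation, for reference
reference <- unname(tools::md5sum(myfile))

streamed == in_memory
streamed == reference
```

Streaming avoids holding the whole input in memory, which matters for large files; for a small file like this one both routes are equivalent.
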
The same works for URLs. The hash of the [`R-installer.exe`](https://cran.r-project.org/bin/windows/base/old/4.0.0/R-4.0.0-win.exe) below should match the one in [`md5sum.txt`](https://cran.r-project.org/bin/windows/base/old/4.0.0/md5sum.txt).
```{r eval=FALSE}
# Stream-hash from a network connection
as.character(md5(url("https://cran.r-project.org/bin/windows/base/old/4.0.0/R-4.0.0-win.exe")))
# Compare
readLines('https://cran.r-project.org/bin/windows/base/old/4.0.0/md5sum.txt')
```
## Compare to digest
Similar functionality is also available in the **digest** package, but with a slightly different interface:
```{r}
# Compare to digest
library(digest)
digest("foo", "md5", serialize = FALSE)
# Other way around: hash a serialized R object with both packages
digest(cars, skip = 0)
md5(serialize(cars, NULL))
```
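
The `serialize = FALSE` argument matters in the first comparison: by default `digest()` hashes the serialized R object rather than the bytes of the string, so its result differs from `md5("foo")`. The sketch below, with illustrative variable names, checks both cases and shows the same correspondence for SHA-256.

```{r}
library(openssl)
library(digest)

# Hashing the string bytes directly: both packages should agree
same_md5 <- digest("foo", algo = "md5", serialize = FALSE) == unclass(md5("foo"))
same_sha256 <- digest("foo", algo = "sha256", serialize = FALSE) == unclass(sha256("foo"))

# With digest's default (serialize = TRUE) the serialized object is hashed instead
same_default <- digest("foo", algo = "md5") == unclass(md5("foo"))

c(md5 = same_md5, sha256 = same_sha256, default = same_default)
```
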