1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
|
---
title: "Radix trees in Rcpp"
author: "Oliver Keyes"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Radix trees in Rcpp}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
A **radix tree** is a data structure optimised for storing key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values *associated* with those keys.
`triebeard` provides an implementation of radix trees for Rcpp (and also for use directly in R). To start using radix trees in your Rcpp development, simply modify your C++ file to include at the top:
```{Rcpp, eval=FALSE}
//[[Rcpp::depends(triebeard)]]
#include <radix.h>
```
## Constructing trees
Trees are constructed using the syntax:
```{Rcpp, eval=FALSE}
radix_tree<type1, type2> radix;
```
Where `type` represents the type of the keys (for example, `std::string`) and `type2` the type of the values.
Radix trees can have any scalar type as keys, although strings are most typical; they can also have any scalar type for values. Once you've constructed a tree, new entries can be added in a very R-like way: `radix[new_key] = new_value;`. Entries can also be removed, with `radix.erase(key)`.
## Matching against trees
We then move on to the fun bit: matching! As mentioned, radix trees are really good for matching arbitrary values against keys (well, keys of the same type) and retrieving the associated values.
There are three types of supported matching; longest, prefix, and greedy. Longest does exactly what it says on the tin: it finds the key-value pair where the longest initial part of the key matches the arbitrary value:
```{Rcpp, eval=FALSE}
radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";
radix_tree<std::string, std::string>::iterator it;
it = radix.longest_match("turing");
if(it = radix.end()){
printf("No match was found :(");
} else {
std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second;
}
```
Prefix matching provides all trie entries where the value-to-match is a *prefix* of the key:
```{Rcpp, eval=FALSE}
radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";
std::vector<radix_tree<std::string, std::string>::iterator> vec;
std::vector<radix_tree<std::string, std::string>::iterator>::iterator it;
it = radix.prefix_match("tur");
if(it == vec.end()){
printf("No match was found :(");
} else {
for (it = vec.begin(); it != vec.end(); ++it) {
std::string result = "Key of a prefix match: " + it->first + " , value of a prefix match: " + it->second;
}
}
```
Greedy matching matches very, very fuzzily (a value of 'bring', for example, will match 'blind', 'bind' and 'binary') and, syntactically, looks exactly the same as prefix-matching, albeit with `radix.greedy_match()` instead of `radix.prefix_match()`.
### Other trie things
If you have ideas for other trie-like structures, or functions that would be useful with *these* tries, the best approach
is to either [request it](https://github.com/Ironholds/triebeard/issues) or [add it](https://github.com/Ironholds/triebeard/pulls)!
|