File: README.md

package info (click to toggle)
rust-yada 0.5.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 148 kB
  • sloc: makefile: 4
file content (102 lines) | stat: -rw-r--r-- 3,153 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
# Yada: Yet Another Double-Array

[![crate-name at crates.io](https://img.shields.io/crates/v/yada.svg)](https://crates.io/crates/yada)
[![crate-name at docs.rs](https://docs.rs/yada/badge.svg)](https://docs.rs/yada)

Yada is a yet another double-array trie library aiming for fast search and
compact data representation.

## Features

- Build static double-array tries
  - Yada adopts the compact binary representation of double-array nodes like
  [Darts-clone](https://github.com/s-yata/darts-clone).
- Common prefix search
  - The method returns an `Iterator` that is an effective way to find multiple
  values without heap allocation.
- Exact match search
  - The method finds a value associated with an exact match key as a `Option`.

## Requirements

- Rust version >= 1.46.0 

## Usage

See also [example code](examples/build_and_search.rs) for more details.

### Build a double-array trie

```rust
use yada::builder::DoubleArrayBuilder;

// make a keyset which have key-value pairs
let keyset = &[
    ("a".as_bytes(), 0),
    ("ab".as_bytes(), 1),
    ("abc".as_bytes(), 2),
    ("b".as_bytes(), 3),
    ("bc".as_bytes(), 4),
    ("c".as_bytes(), 5),
];

// build a double-array trie binary
let da_bytes: Option<Vec<u8>> = DoubleArrayBuilder::build(keyset);
```

### Search entries by keys

```rust
use yada::DoubleArray;

// create a double-array trie instance
let da = DoubleArray::new(da_bytes.unwrap());

// exact match search
for (key, value) in keyset {
    assert_eq!(da.exact_match_search(key), Some(*value as u32));
}
assert_eq!(da.exact_match_search("abc".as_bytes()), Some(2));
assert_eq!(da.exact_match_search("abcd".as_bytes()), None);

// common prefix search
assert_eq!(
    da.common_prefix_search("abcd".as_bytes())
        .collect::<Vec<_>>(),
    vec![(0, 1), (1, 2), (2, 3)] // match "a", "ab", "abc", value and key length
);
assert_eq!(
    da.common_prefix_search("d".as_bytes()).collect::<Vec<_>>(),
    vec![] // don't match
);
```

## Limitations

- The value must be represented as a 31 bit unsigned integer, typed `u32`.
  - Yada uses the most significant bit (MSB) as a flag to distinguish between a value node and others.
- The offset of an double-array node is 29 bits wide, so it can represent up to
 ~536M nodes.
  - It means this limitation results in the size upper bound ~2GB of double-arrays.

## License

Licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.

## References

- [Aoe, J. An Efficient Digital Search Algorithm by Using a Double-Array Structure.
IEEE Transactions on Software Engineering. Vol. 15, 9 (Sep 1989). pp. 1066-1077.](https://ieeexplore.ieee.org/document/31365)
- [Darts: Double ARray Trie System](http://chasen.org/~taku/software/darts/)
- [Darts-clone: A clone of Darts (Double-ARray Trie System)](https://github.com/s-yata/darts-clone)