File: README.md

package info (click to toggle)
bbhash 1.0.0-3
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 184 kB
  • sloc: cpp: 1,813; makefile: 56; sh: 10
file content (62 lines) | stat: -rw-r--r-- 2,649 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
[![Build Status](https://travis-ci.org/rizkg/BBHash.svg?branch=master)](https://travis-ci.org/rizkg/BBHash)

# BBHash
BBHash is a simple library for building minimal perfect hash function.
It is designed to handle large scale datasets. The function is just a little bit larger than other state-of-the-art libraries, it takes approximately 3 bits / elements (compared to 2.62 bits/elem for the emphf lib), but construction is faster and does not require additional memory. 

It is easy to include in other projects (just include a single .h file) and has no dependencies.

# Citation
A preprint paper is available on arXiv: https://arxiv.org/abs/1702.03154

# Usage
Here is a simple example showing how to build and query a mphf with input keys in a std::vector<u_int64_t> . Input keys can also be read from a disk file, or from some user-defined iterator.

     #include "BooPHF.h"
     //tells bbhash to use included hash function working on u_int64_t input keys :
     typedef boomphf::SingleHashFunctor<u_int64_t>  hasher_t;
     typedef boomphf::mphf<  u_int64_t, hasher_t  > boophf_t;
     
    std::vector<u_int64_t> input_keys;
    //
    ... fill the input_keys vector
    //build the mphf  
    boophf_t * bphf = new boomphf::mphf<u_int64_t,hasher_t>(input_keys.size(),input_keys,nthreads);
     
     //query the mphf :
     uint64_t  idx = bphf->lookup(input_keys[0]);

# Types supported
The master branch works with Plain Old Data types only (POD). To work with other types, use the "alltypes" branch (it runs slighlty slower). The alltypes branch includes a sample code with strings.

# How to run test

A sample usage is provided in file example.cpp, compile and run with: ( params are nb_elements nb_threads)

    make
    ./example 100000000 1
    

File Bootest.cpp contains more options, use ./Bootest with  -check to check correctness of the hash function, and -bench to benchmark lookup performance.
    
Here is a sample output :
    
    $./Bootest 100000000 8 -check -bench
    key file generated 
    Construct a BooPHF with  100000000 elements  
    [Building BooPHF]  100  %   elapsed:   0 min 31 sec   remaining:   0 min 0  sec
    BooPHF constructed perfect hash for 100000000 keys in 30.95s
    Bitarray       305808384  bits (99.49 %)   (array + ranks )
    final hash       1557024  bits (0.51 %) (nb in final hash 4634)
    boophf  bits/elem : 3.073654
     --- boophf working correctly --- 
    bench lookups  sample size 9999872 
    BBhash bench lookups average 258.06 ns +- stddev  5.55 %   (fingerprint 5000111281850410) 




# Authors
Guillaume Rizk, Antoine Limasset, Rayan Chikhi

guillaume.rizk@algorizk.com