1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
|
### Introduction
Compact Encoding Detection(CED for short) is a library written in C++ that
scans given raw bytes and detect the most likely text encoding.
Basic usage:
```
#include "compact_enc_det/compact_enc_det.h"
const char* text = "Input text";
bool is_reliable;
int bytes_consumed;
Encoding encoding = CompactEncDet::DetectEncoding(
text, strlen(text),
nullptr, nullptr, nullptr,
UNKNOWN_ENCODING,
UNKNOWN_LANGUAGE,
CompactEncDet::WEB_CORPUS,
false,
&bytes_consumed,
&is_reliable);
```
### How to build
You need [CMake](https://cmake.org/) to build the package. After unzipping
the source code , run `autogen.sh` to build everything automatically.
The script also downloads [Google Test](https://github.com/google/googletest)
framework needed to build the unittest.
```
$ cd compact_enc_det
$ ./autogen.sh
...
$ bin/ced_unittest
```
On Windows, run `cmake .` to download the test framework, and generate
project files for Visual Studio.
```
D:\packages\compact_enc_det> cmake .
```
|