File: README.md

package info (click to toggle)
liblouis 3.36.0-1
links: PTS, VCS
area: main
in suites: forky, sid
size: 86,248 kB
sloc: ansic: 37,162; makefile: 1,298; python: 772; lisp: 390; sh: 339; perl: 221; cpp: 21
file content (150 lines) | stat: -rw-r--r-- 5,697 bytes
parent folder | download | duplicates (4)
# Translation fuzzers

There are 3 fuzzers related to translation: [`fuzz_translate`](fuzz_translate.c) and
[`fuzz_translate_generic`](fuzz_translate_generic.c) target `lou_translateString`,
[`fuzz_backtranslate`](fuzz_backtranslate.c) fuzzes `lou_backTranslateString`.

`fuzz_translate` is run continuously through a Github workflow. `fuzz_translate_generic`
and `fuzz_backtranslate` are run continously through OSS-fuzz.

# Table fuzzer

[`table_fuzzer`](table_fuzzer.cc) fuzzes the `compileFile` function. It is run continously
through OSS-fuzz.

# Configure Liblouis for fuzzing

`./configure --with-fuzzer` will check if your are using clang and clang++ as compilers
(by looking at `CC` and `CXX`) and enables generation of compilation instructions for the
fuzzer targets (translation fuzzers only). The `--with-coverage` switch will add
`-fprofile-instr-generate -fcoverage-mapping` to `AM_CPPFLAGS` (in
[liblouis/Makefile.am](../../liblouis/Makefile.am) and
[tests/fuzzing/Makefile.am](Makefile.am)).

To configure and build the project with fuzzer and coverage:

```sh
./autogen.sh
export CC=clang
export CXX=clang++
./configure --with-coverage --with-fuzzer
make -j8
```

Now you are able to run the fuzzers.

# Run `fuzz_translate`

fuzz_translate requires two parameters. With the `FUZZ_TABLE` environment variable you set
the table(s) to fuzz. An input corpus can optionally be provided via the first argument. A
corpus is a directory with files containing sample inputs that will be used by libfuzzer
to craft the data passed to the fuzzing function. The idea is to keep corpus as minimal as
possible. If you don't provide a corpus directory, libfuzzer generates random inputs.

Here is an example:

```sh
# move to tests/fuzzing directory
cd tests/fuzzing

# we assume here you have added corpus files into the "CORPUS" directory
export FUZZ_TABLE=../../tables/en-us-g2.ctb
./fuzz_translate CORPUS/

# run the fuzzer using parallelization
# you can also set more jobs than workers (stopped jobs will instantly be replaced by a new fuzzer process)
./fuzz_translate CORPUS/ -workers=8 -jobs=8

# to start the fuzzer and let it generate random input (without corpus files):
# here we don't parallelize the process
./fuzz_translate
```

After running the fuzzer multiple times with the same corpus, it might be possible that
many corpus files added by the fuzzer explore the same code paths. Hopefully, libfuzzer
allows you to minimize a corpus. There is a simple bash script that allows you to do that:

```sh
./minimize-corpus.sh CORPUS/
```

If you have added a PoC file in the corpus directory that you want to keep intact, change
its extension to .txt and use the `--preserve-txt` switch:

```sh
./minimize-corpus.sh --preserve-txt CORPUS/
```

# Look at fuzzer coverage

If you want to see what are the code parts that are explored by the fuzzer, you can use
clang coverage. Configure with the `--with-overage` switch, run the fuzzer and show
coverage data from the run with llvm tools. To be able to use the coverage data, you first
need to compile the raw profile data file of the run. By default, this file is created
after execution under the name of default.profraw but you can specify it with
`LLVM_PROFILE_FILE`. Here is how to do that:

```
export LLVM_PROFILE_FILE=fuzz_translate.profraw
export FUZZ_TABLE=../../tables/en-us-g2.ctb
./fuzz_translate CORPUS/ -workers=8 -jobs=8

# wait for a bit and press Ctrl+C

# compile raw profile
llvm-profdata merge -sparse fuzz_translate.profraw -o fuzz_translate.profdata

# show coverage (red lines are the one wich are reached)
llvm-cov show .libs/fuzz_translate -instr-profile=fuzz_translate.profdata
```

# Continous integration

## OSS-Fuzz

OSS-Fuzz compiles and runs [`table_fuzzer`](table_fuzzer.cc),
[`fuzz_translate_generic`](fuzz_translate_generic.c) and
[`fuzz_backtranslate`](fuzz_backtranslate.c) using the Dockerfile and build.sh script from
the [Liblouis OSS-Fuzz project
folder](https://github.com/google/oss-fuzz/blob/master/projects/liblouis). The Dockerfile
clones the Liblouis repository and build.sh invokes
[tests/fuzzing/build.sh](test/fuzzing/build.sh).

Reported issues can be found in Google's [Monorail issue
tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list?q=label%3AProj-liblouis). Note
that some issues are only visible for the maintainers.

### Reproducing OSS-Fuzz issues

The [OSS-Fuzz
documenation](https://google.github.io/oss-fuzz/advanced-topics/reproducing/#building-using-docker)
explains how to reproduce OSS-Fuzz issues using the reproducer (or testcase) files and
Docker.

If you have configured Liblouis for fuzzing (with clang) you can also reproduce issues
natively:

```sh
export CC=clang
export CXX=clang++
# optional: enable 'address' and 'undefined' sanitizers in order to be able to reproduce ASAN and UBSAN crashes
export CFLAGS="-fsanitize=address,undefined"
./configure --with-fuzzer
make
tests/fuzzing/fuzz_translate_generic <path/to/testcase_file>
```

## Github workflows

### [`fuzzing.yml`](../../.github/workflows/fuzzing.yml)

Builds and runs `fuzz_translate` for every table with extenion .utb or .ctb, and uploads
possible crash PoC's as artifacts. This happens once a week. See [the list of previous
runs](https://github.com/liblouis/liblouis/actions/workflows/fuzzing.yml).

### [`cifuzz.yml`](../../.github/workflows/cifuzz.yml)

This will do the same as OSS-Fuzz, but on Github's infrastructure. The idea is to be able
to catch fuzzing issues in an earlier stage. Possible crash PoC's are uploaded as
artifacts. The workflow is triggered on pull requests. See [the list of previous
runs](https://github.com/liblouis/liblouis/actions/workflows/cifuzz.yml).