File: optimizations.md

package info (click to toggle)
rustc 1.85.0%2Bdfsg2-3
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 893,176 kB
  • sloc: xml: 158,127; python: 35,830; javascript: 19,497; cpp: 19,002; sh: 17,245; ansic: 13,127; asm: 4,376; makefile: 1,051; lisp: 29; perl: 29; ruby: 19; sql: 11
file content (34 lines) | stat: -rw-r--r-- 1,529 bytes parent folder | download | duplicates (6)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Optimizations
This document tracks which optimizations have been done after the initial implementation passed corpus tests and a good amount of fuzzing.

## Introducing more unsafe code:
These optimizations introduced more unsafe code. These should yield significant improvements, or else they are not really worth it.

### Optimizing bitreader with byteorder which uses ptr::copy_nonoverlapping
* Reverse bitreader_reversed::get_bits was identified by linux perf tool using about 36% of the whole time
* Benchmark: decode enwik9

* Before: about 14.7 seconds
* After: about 12.2 seconds with about 25% of the time used for get_bits()

### Optimizing decodebuffer::repeat with ptr::copy_nonoverlapping
* decodebuffer::repeate was identified by linux perf tool using about 28% of the whole time
* Benchmark: decode enwik9

* Before: about 9.9 seconds
* After: about 9.4 seconds

### Use custom ringbuffer in the decodebuffer
The decode buffer must be able to do two things efficiently
* Collect bytes from the front
* Copy bytes from the contents to the end

The stdlibs VecDequeu and Vec can each do one but not the other efficiently. So a custom implementation of a ringbuffer was written.

## Introducing NO additional unsafe code
These are just nice to have

### Even better bitreaders
Studying this material lead to a big improvement in bitreader speed
* https://fgiesen.wordpress.com/2018/02/19/reading-bits-in-far-too-many-ways-part-1/
* https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/