## AWS C Compression
This is a cross-platform C99 implementation of compression algorithms such as
gzip and Huffman encoding/decoding. Currently only Huffman is implemented.
## License
This library is licensed under the Apache 2.0 License.
## Usage
### Building
Note that aws-c-compression has a dependency on aws-c-common:
```
git clone git@github.com:awslabs/aws-c-common.git
cmake -DCMAKE_PREFIX_PATH=<install-path> -DCMAKE_INSTALL_PREFIX=<install-path> -S aws-c-common -B aws-c-common/build
cmake --build aws-c-common/build --target install
git clone git@github.com:awslabs/aws-c-compression.git
cmake -DCMAKE_PREFIX_PATH=<install-path> -DCMAKE_INSTALL_PREFIX=<install-path> -S aws-c-compression -B aws-c-compression/build
cmake --build aws-c-compression/build --target install
```
### Huffman
The Huffman implementation in this library is designed around the concept of a
generic "symbol coder" object, which defines how each symbol (value between 0
and 255) is encoded and decoded. This object looks like this:
```c
typedef struct aws_huffman_code (*aws_huffman_symbol_encoder)(uint8_t symbol, void *userdata);
typedef uint8_t (*aws_huffman_symbol_decoder)(uint32_t bits, uint8_t *symbol, void *userdata);
struct aws_huffman_symbol_coder {
    aws_huffman_symbol_encoder encode;
    aws_huffman_symbol_decoder decode;
    void *userdata;
};
```
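As a concrete (if trivial) illustration of this interface, the sketch below wires up an 8-bit "identity" code. It is illustrative only, not the library's real coder: it assumes the public header installs as `<aws/compression/huffman.h>`, that `struct aws_huffman_code` carries a right-aligned `pattern` plus its `num_bits` length (matching the `code`/`len` columns of the table definition format described below), and that the decoder callback receives input bits aligned to the most significant bit and returns the number of bits it consumed.
```c
#include <aws/compression/huffman.h>

/* Hypothetical encoder callback: every symbol is "encoded" as its own 8 raw
 * bits. A real coder would look the symbol up in a Huffman table. */
static struct aws_huffman_code s_identity_encode(uint8_t symbol, void *userdata) {
    (void)userdata;
    struct aws_huffman_code code;
    code.pattern = symbol; /* assumed field: the code bits, right-aligned */
    code.num_bits = 8;     /* assumed field: number of significant bits */
    return code;
}

/* Hypothetical decoder callback: take the top 8 bits of the input and report
 * that 8 bits were consumed. */
static uint8_t s_identity_decode(uint32_t bits, uint8_t *symbol, void *userdata) {
    (void)userdata;
    *symbol = (uint8_t)(bits >> 24); /* assumes bits are MSB-aligned */
    return 8;
}

static struct aws_huffman_symbol_coder s_identity_coder = {
    .encode = s_identity_encode,
    .decode = s_identity_decode,
    .userdata = NULL,
};
```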
These callbacks may be implemented manually, or you may use the included
Huffman coder generator to generate one from a table definition file. The
generator expects to be called with the following arguments:
```shell
$ aws-c-compression-huffman-generator path/to/table.def path/to/generated.c coder_name
```
The table definition file should be in the following format:
```c
/* sym bits code len */
HUFFMAN_CODE( 0, "1100101110", 0x32e, 10)
HUFFMAN_CODE( 1, "1100101111", 0x32f, 10)
/* ... */
```
The HUFFMAN_CODE macro expects 4 arguments:
* sym: the symbol value [0-255]
* bits: the bits representing the symbol in string form
* code: the bits representing the symbol in numeric form
* len: the number of bits used to represent the symbol
> #### Note
> This file may also be `#include`d in the following way to generate a static
> list of codes:
> ```c
> /* Provides the HUFFMAN_CODE macro */
> #include <aws/testing/compression/huffman.h>
>
> static struct huffman_test_code_point code_points[] = {
> #include "test_huffman_static_table.def"
> };
> ```
This will emit a C file which exports a function with the following signature:
```c
struct aws_huffman_symbol_coder *{coder_name}_get_coder();
```
Note that this function does not allocate, but maintains a static instance of
the coder.
An example implementation of this file is provided in
`tests/test_huffman_static_table.def`.
To use the coder, forward declare that function, and pass the result as the
second argument to `aws_huffman_encoder_init` and `aws_huffman_decoder_init`.
```c
/* Forward declaration of the generated function */
struct aws_huffman_symbol_coder *{coder_name}_get_coder();

struct aws_huffman_encoder encoder;
aws_huffman_encoder_init(&encoder, {coder_name}_get_coder());

struct aws_huffman_decoder decoder;
aws_huffman_decoder_init(&decoder, {coder_name}_get_coder());
```
#### Encoding
```c
/**
 * Encode a symbol buffer into the output buffer.
 *
 * \param[in] encoder The encoder object to use
 * \param[in] to_encode The symbol buffer to encode
 * \param[in,out] length In: The length of to_encode. Out: The number of bytes read from to_encode
 * \param[in] output The buffer to write encoded bytes to
 * \param[in,out] output_size In: The size of output. Out: The number of bytes written to output
 *
 * \return AWS_OP_SUCCESS if encoding is successful, AWS_OP_ERR otherwise (the code for the error that occurred is raised)
 */
int aws_huffman_encode(struct aws_huffman_encoder *encoder, const char *to_encode, size_t *length, uint8_t *output, size_t *output_size);
```
The encoder is built to support partial encoding. If there isn't enough space
in `output`, the encoder encodes as much as will fit, updates `length` to
indicate how many bytes of `to_encode` were consumed, leaves `output_size`
unchanged, and raises `AWS_ERROR_SHORT_BUFFER`. `aws_huffman_encode` can then
be called again in a loop, as in the following pseudo-code:
```c
void encode_and_send(const char *to_encode, size_t size) {
    while (size > 0) {
        uint8_t output[some_chunk_size];
        size_t output_size = sizeof(output);
        size_t bytes_read = size;
        aws_huffman_encode(encoder, to_encode, &bytes_read, output, &output_size);
        /* AWS_ERROR_SHORT_BUFFER was raised... */
        send_output_to_someone_else(output, output_size);
        to_encode += bytes_read;
        size -= bytes_read;
    }
    /* Be sure to reset the encoder after use */
    aws_huffman_encoder_reset(encoder);
}
```
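The pseudo-code above ignores the return value; real code should check it and distinguish a partial write from a hard failure. A minimal sketch, assuming aws-c-common's `aws_last_error()` is used to read the raised error code (the helper name `encode_chunk` is made up for illustration):
```c
#include <aws/common/error.h>
#include <aws/compression/huffman.h>

/* Encode one chunk; returns AWS_OP_SUCCESS for both "all output written" and
 * "output filled, call again with the remaining input". */
static int encode_chunk(
    struct aws_huffman_encoder *encoder,
    const char *to_encode,
    size_t *bytes_read,
    uint8_t *output,
    size_t *output_size) {

    if (aws_huffman_encode(encoder, to_encode, bytes_read, output, output_size) == AWS_OP_SUCCESS) {
        return AWS_OP_SUCCESS;
    }

    if (aws_last_error() == AWS_ERROR_SHORT_BUFFER) {
        /* output is full: flush it, advance the input by *bytes_read, retry */
        return AWS_OP_SUCCESS;
    }

    return AWS_OP_ERR; /* unrecoverable error */
}
```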
`aws_huffman_encoder` also has a `uint8_t` field called `eos_padding` that
defines how any unwritten bits in the last byte of output are filled. The most
significant bits will be used. For example, if the last byte contains only 3 bits
and `eos_padding` is `0b01010101`, `01010` will be appended to the byte.
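For instance, an encoder whose final byte should be padded with 1 bits (the padding HPACK expects, as noted under Decoding below) could be configured with a short sketch like this:
```c
struct aws_huffman_encoder encoder;
aws_huffman_encoder_init(&encoder, {coder_name}_get_coder());

/* Fill any unused trailing bits of the last output byte with 1s */
encoder.eos_padding = 0xFF;
```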
#### Decoding
```c
/**
 * Decodes a byte buffer into the provided symbol array.
 *
 * \param[in] decoder The decoder object to use
 * \param[in] to_decode The encoded byte buffer to read from
 * \param[in,out] length In: The length of to_decode. Out: The number of bytes read from to_decode
 * \param[in] output The buffer to write decoded symbols to
 * \param[in,out] output_size In: The size of output. Out: The number of bytes written to output
 *
 * \return AWS_OP_SUCCESS if decoding is successful, AWS_OP_ERR otherwise (the code for the error that occurred is raised)
 */
int aws_huffman_decode(struct aws_huffman_decoder *decoder, const uint8_t *to_decode, size_t *length, char *output, size_t *output_size);
```
The decoder is built to support partial decoding. If there isn't enough space
in `output`, the decoder decodes as much as will fit, updates `length` to
indicate how many bytes of `to_decode` were consumed, leaves `output_size`
unchanged, and raises `AWS_ERROR_SHORT_BUFFER`. `aws_huffman_decode` can then
be called again in a loop, as in the following pseudo-code:
```c
void decode_and_send(const uint8_t *to_decode, size_t size) {
    while (size > 0) {
        char output[some_chunk_size];
        size_t output_size = sizeof(output);
        size_t bytes_read = size;
        aws_huffman_decode(decoder, to_decode, &bytes_read, output, &output_size);
        /* AWS_ERROR_SHORT_BUFFER was raised... */
        send_output_to_someone_else(output, output_size);
        to_decode += bytes_read;
        size -= bytes_read;
    }
    /* Be sure to reset the decoder after use */
    aws_huffman_decoder_reset(decoder);
}
```
Upon completion of a decode, the most significant bits of
`decoder->working_bits` will contain the final bits of `to_decode` that could
not match a symbol. This is useful for verifying the padding bits of a stream.
For example, to validate that a stream ends in all 1's (like HPACK requires),
you could do the following:
```c
AWS_ASSERT(decoder->working_bits == UINT64_MAX << (64 - decoder->num_bits));
```
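Putting the pieces together, a complete round trip might look like the following sketch. It is illustrative only: `my_coder_get_coder()` stands in for a coder produced by the generator, and the buffers are sized so that `AWS_ERROR_SHORT_BUFFER` never occurs.
```c
#include <aws/compression/huffman.h>

/* Hypothetical generated coder (see the generator section above) */
struct aws_huffman_symbol_coder *my_coder_get_coder(void);

int round_trip(void) {
    struct aws_huffman_encoder encoder;
    aws_huffman_encoder_init(&encoder, my_coder_get_coder());

    struct aws_huffman_decoder decoder;
    aws_huffman_decoder_init(&decoder, my_coder_get_coder());

    const char input[] = "hello";
    size_t input_len = sizeof(input) - 1;

    /* Encode: encoded_size comes back as the number of bytes written */
    uint8_t encoded[64];
    size_t encoded_size = sizeof(encoded);
    if (aws_huffman_encode(&encoder, input, &input_len, encoded, &encoded_size) != AWS_OP_SUCCESS) {
        return AWS_OP_ERR;
    }

    /* Decode: decoded_size comes back as the number of symbols recovered */
    char decoded[64];
    size_t decoded_size = sizeof(decoded);
    size_t encoded_len = encoded_size;
    if (aws_huffman_decode(&decoder, encoded, &encoded_len, decoded, &decoded_size) != AWS_OP_SUCCESS) {
        return AWS_OP_ERR;
    }

    return AWS_OP_SUCCESS;
}
```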