# libdeflate release notes
## Version 1.14
Significantly improved decompression performance on all platforms. Examples
include (measuring DEFLATE only):
| Platform | Speedup over v1.13 |
|------------------------------------|--------------------|
| x86_64 (Intel Comet Lake), gcc | 1.287x |
| x86_64 (Intel Comet Lake), clang | 1.437x |
| x86_64 (Intel Ice Lake), gcc | 1.332x |
| x86_64 (Intel Ice Lake), clang | 1.296x |
| x86_64 (Intel Sandy Bridge), gcc | 1.162x |
| x86_64 (Intel Sandy Bridge), clang | 1.092x |
| x86_64 (AMD Zen 2), gcc | 1.263x |
| x86_64 (AMD Zen 2), clang | 1.259x |
| i386 (Intel Comet Lake), gcc | 1.570x |
| i386 (Intel Comet Lake), clang | 1.344x |
| arm64 (Apple M1), clang | 1.306x |
| arm64 (Cortex-A76), clang | 1.355x |
| arm64 (Cortex-A55), clang | 1.190x |
| arm32 (Cortex-A76), clang | 1.665x |
| arm32 (Cortex-A55), clang | 1.283x |
Thanks to Dougall Johnson (https://dougallj.wordpress.com/) for ideas for many
of the improvements.
## Version 1.13
* Changed the 32-bit Windows build of the library to use the default calling
convention (cdecl) instead of stdcall, reverting a change from libdeflate 1.4.
* Fixed a couple of macOS compatibility issues with the gzip program.
## Version 1.12
This release focuses on improving the performance of the CRC-32 and Adler-32
checksum algorithms on x86 and ARM (both 32-bit and 64-bit).
* Build updates:
  * Fixed building libdeflate on Apple platforms.
  * For Visual Studio builds, Visual Studio 2015 or later is now required.
* CRC-32 algorithm updates:
  * Improved CRC-32 performance on short inputs on x86 and ARM.
  * Improved CRC-32 performance on Apple Silicon Macs by using a 12-way pmull
    implementation. Performance on large inputs on M1 is now about 67 GB/s,
    compared to 8 GB/s before, or 31 GB/s with the Apple-provided zlib.
  * Improved CRC-32 performance on some other ARM CPUs by reworking the code so
    that multiple crc32 instructions can be issued in parallel.
  * Improved CRC-32 performance on some x86 CPUs by increasing the stride length
    of the pclmul implementation.
* Adler-32 algorithm updates:
  * Improved Adler-32 performance on some x86 CPUs by optimizing the AVX-2
    implementation. E.g., performance on Zen 1 improved from 19 to 30 GB/s, and
    on Ice Lake from 35 to 41 GB/s (if the AVX-512 implementation is excluded).
  * Removed the AVX-512 implementation of Adler-32 to avoid CPU frequency
    downclocking, and because the AVX-2 implementation was made faster.
  * Improved Adler-32 performance on some ARM CPUs by optimizing the NEON
    implementation. E.g., Apple M1 improved from about 36 to 52 GB/s.
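
For context, the value that the CRC-32 work in this release speeds up can be computed by a simple one-bit-at-a-time loop. The sketch below is a plain reference version of the gzip CRC-32, not libdeflate's code, and `crc32_reference` is a hypothetical name used only for illustration. The pmull, crc32-instruction, and pclmul implementations mentioned above are expected to produce the same result while consuming many bytes per step.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-32 as used by gzip (reflected polynomial 0xEDB88320).
 * Pass crc = 0 for the first chunk; to continue over further chunks, pass
 * the value returned for the previous chunk.
 */
uint32_t crc32_reference(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;

    crc = ~crc;                       /* undo the final inversion (or start at all ones) */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; XOR in the polynomial if that bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                      /* final inversion */
}
```

A loop like this handles one bit per iteration, which is why table-driven and hardware-assisted variants are so much faster on large inputs.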
## Version 1.11
* Library updates:
  * Improved compression performance slightly.
  * Detect arm64 CPU features on Apple platforms, which should improve
    performance in some areas such as CRC-32 computation.
* Program updates:
  * The included `gzip` and `gunzip` programs now support the `-q` option.
  * The included `gunzip` program now passes through non-gzip data when both
    the `-f` and `-c` options are used.
* Build updates:
  * Avoided a build error on arm32 with certain gcc versions, by disabling
    building `crc32_arm()` as dynamically-dispatched code when needed.
  * Support building with the LLVM toolchain on Windows.
  * Disabled the use of the "stdcall" ABI in static library builds on Windows.
  * Use the correct `install_name` in macOS builds.
  * Support Haiku builds.
## Version 1.10
* Added an additional check to the decompressor to make it quickly detect
certain bad inputs and not try to generate an unbounded amount of output.
Note: this was only a problem when decompressing with an unknown output size,
which isn't the recommended use case of libdeflate. However,
`libdeflate-gunzip` has to do this, and it would run out of memory as it would
keep trying to allocate a larger output buffer.
* Fixed a build error on Solaris.
* Cleaned up a few things in the compression code.
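
The unknown-output-size situation described above can also be bounded on the caller's side. The following is a minimal sketch of a "grow and retry" loop with a hard cap, independent of the decompressor-side check added in this release. Everything in it is hypothetical: the `try_result` codes, the `try_decompress_fn` callback type, and `fake_try` (a stand-in for a real decompressor call) are illustration-only names, not part of libdeflate's API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not libdeflate's enum). */
enum try_result { TRY_OK, TRY_NEED_MORE_SPACE, TRY_BAD_DATA };

/* Hypothetical callback: attempts a full decompression into out[0..out_cap). */
typedef enum try_result (*try_decompress_fn)(void *ctx, void *out,
                                             size_t out_cap, size_t *out_len);

/*
 * Retry with a doubling output buffer, but never allocate more than max_cap.
 * Returns a malloc()ed buffer that the caller must free, or NULL on failure.
 */
void *decompress_bounded(try_decompress_fn try_fn, void *ctx,
                         size_t initial_cap, size_t max_cap, size_t *out_len)
{
    size_t cap = initial_cap ? initial_cap : 1;

    if (cap > max_cap)
        return NULL;
    for (;;) {
        void *buf = malloc(cap);
        if (buf == NULL)
            return NULL;
        enum try_result res = try_fn(ctx, buf, cap, out_len);
        if (res == TRY_OK)
            return buf;
        free(buf);
        /* Stop on bad data, or when the cap has already been tried. */
        if (res != TRY_NEED_MORE_SPACE || cap == max_cap)
            return NULL;
        /* Double without overflowing, clamping to the cap. */
        cap = (cap > max_cap / 2) ? max_cap : cap * 2;
    }
}

/* Stand-in for a real decompressor: "needs" *(size_t *)ctx bytes of output. */
enum try_result fake_try(void *ctx, void *out, size_t out_cap, size_t *out_len)
{
    size_t need = *(size_t *)ctx;

    if (out_cap < need)
        return TRY_NEED_MORE_SPACE;
    memset(out, 0, need);
    *out_len = need;
    return TRY_OK;
}
```

With a cap in place, a stream that claims to need ever more output fails with `NULL` instead of exhausting memory. When the output size is known in advance, a single correctly sized buffer avoids the loop entirely.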
## Version 1.9
* Made many improvements to the compression algorithms, and rebalanced the
  compression levels:
  * Heuristics were implemented which significantly improve the compression
    ratio on data where short matches aren't useful, such as DNA sequencing
    data. This applies to all compression levels, but primarily to levels 1-9.
  * Level 1 was made much faster, though it often compresses slightly worse than
    before (but still better than zlib).
  * Levels 8-9 were also made faster, though they often compress slightly worse
    than before (but still better than zlib). On some data, levels 8-9 are much
    faster and compress much better than before; this change addressed an issue
    where levels 8-9 did poorly on certain files. The algorithm used by levels
    8-9 is now more similar to that of levels 6-7 than to that of levels 10-12.
  * Levels 2-3, 7, and 10-12 were strengthened slightly.
  * Levels 4-6 were also strengthened slightly, but some of this improvement was
    traded off to speed them up slightly as well.
  * Levels 1-9 had their per-compressor memory usage greatly reduced.

  As always, compression ratios will vary depending on the input data, and
  compression speeds will vary depending on the input data and target platform.
* `make install` will now install a pkg-config file for libdeflate.
* The Makefile now supports the `DISABLE_SHARED` parameter to disable building
the shared library.
* Improved the Android build support in the Makefile.
## Version 1.8
* Added `-t` (test) option to `libdeflate-gunzip`.
* Unaligned access optimizations are now enabled on WebAssembly builds.
* Fixed a build error when building with the Intel C Compiler (ICC).
* Fixed a build error when building with uClibc.
* libdeflate's CI system has switched from Travis CI to GitHub Actions.
* Made some improvements to test scripts.
## Version 1.7
* Added support for compression level 0, "no compression".
* Added an implementation of CRC-32 accelerated by the ARM CRC32 instructions.
* Added support for linking the programs to the shared library version of
libdeflate rather than to the static library version.
* Made the compression level affect the minimum input size at which compression
is attempted.
* Fixed undefined behavior in the x86 Adler-32 implementation. (No
miscompilations were observed in practice.)
* Fixed undefined behavior in the x86 CPU feature code. (No miscompilations were
observed in practice.)
* Fixed installation of the shared library symlink on macOS.
* Documented third-party bindings.
* Made a lot of improvements to the testing scripts and the CI configuration
file.
* Lots of other small improvements and cleanups.
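
At the format level, "no compression" in DEFLATE means uncompressed ("stored") blocks as defined in RFC 1951. The sketch below shows that framing for a raw DEFLATE stream (no zlib or gzip wrapper). It is an illustration of the format, not libdeflate's implementation, and `deflate_stored` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Wrap `in` in stored DEFLATE blocks (RFC 1951, BTYPE=00).
 * Each block carries at most 65535 bytes and adds 5 bytes of framing.
 * Returns the number of bytes written, or 0 if `out_cap` is too small.
 */
size_t deflate_stored(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    size_t out_len = 0;

    do {
        size_t chunk = in_len < 65535 ? in_len : 65535;
        int is_final = (chunk == in_len);

        if (out_cap - out_len < 5 + chunk)
            return 0;
        /* Bit 0 is BFINAL, bits 1-2 are BTYPE=00, the rest is padding. */
        out[out_len++] = (uint8_t)is_final;
        /* LEN, little endian, followed by NLEN (one's complement of LEN). */
        out[out_len++] = (uint8_t)(chunk & 0xFF);
        out[out_len++] = (uint8_t)(chunk >> 8);
        out[out_len++] = (uint8_t)(~chunk & 0xFF);
        out[out_len++] = (uint8_t)((~chunk >> 8) & 0xFF);
        if (chunk != 0) {
            memcpy(out + out_len, in, chunk);
            out_len += chunk;
            in += chunk;
            in_len -= chunk;
        }
    } while (in_len != 0);

    return out_len;
}
```

For the three-byte input `"abc"`, this writes the eight bytes `01 03 00 FC FF 61 62 63`. Empty input still produces one final, empty block of five bytes.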
## Version 1.6
* Prevented gcc 10 from miscompiling libdeflate (workaround for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).
* Removed workaround for gcc 5 and earlier producing slow code on ARM32. If
this affects you, please upgrade your compiler.
* New API function: `libdeflate_zlib_decompress_ex()`. It provides the actual
size of the stream that was decompressed, like the gzip and DEFLATE
equivalents.
* `libdeflate_zlib_decompress()` now accepts trailing bytes after the end of the
stream, like the gzip and DEFLATE equivalents.
* Added support for custom memory allocators. (New API function:
`libdeflate_set_memory_allocator()`)
* Added support for building the library in freestanding mode.
* Building libdeflate no longer requires `CPPFLAGS=-Icommon`.
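
As an illustration of the custom allocator support, the sketch below defines a malloc/free-shaped pair that counts live allocations. The wrapper names and the counter are hypothetical. The commented-out call assumes the `libdeflate_set_memory_allocator()` prototype from `libdeflate.h`, which takes a `void *(*)(size_t)` and a `void (*)(void *)`; check the header of the version you build against.

```c
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks currently allocated through the wrappers below. */
static size_t live_allocations;

/* malloc-shaped wrapper: void *(*)(size_t) */
void *counting_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        live_allocations++;
    return p;
}

/* free-shaped wrapper: void (*)(void *) */
void counting_free(void *p)
{
    if (p != NULL)
        live_allocations--;
    free(p);
}

/*
 * With libdeflate available, the pair would be installed once at startup,
 * before any compressor or decompressor has been allocated (assumption:
 * see the comments in libdeflate.h for the exact requirements):
 *
 *     libdeflate_set_memory_allocator(counting_malloc, counting_free);
 */
```

The counter is not thread-safe. A program that uses libdeflate from several threads would need an atomic counter or a lock.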
## Version 1.5
* Fixed up stdcall support on 32-bit Windows: the functions are now exported
using both suffixed and non-suffixed names, and `libdeflate.h` is
MSVC-compatible again.
## Version 1.4
* The 32-bit Windows build of libdeflate now uses the "stdcall" calling
convention instead of "cdecl". If you're calling `libdeflate.dll` directly
from C or C++, you'll need to recompile your code. If you're calling it from
another language, or calling it indirectly using `LoadLibrary()`, you'll need
to update your code to use the stdcall calling convention.
* The Makefile now supports building libdeflate as a shared
library (`.dylib`) on macOS.
* Fixed a bug where support for certain optimizations and optional features
(file access hints and more precise timestamps) was incorrectly omitted when
libdeflate was compiled with `-Werror`.
* Added `make check` target to the Makefile.
* Added CI configuration files.
## Version 1.3
* `make install` now supports customizing the directories into which binaries,
headers, and libraries are installed.
* `make install` now installs into `/usr/local` by default. To change it, use
e.g. `make install PREFIX=/usr`.
* `make install` now works on more platforms.
* The Makefile now supports overriding the optimization flags.
* The compression functions now correctly handle an output data buffer >= 4 GiB
in size, and `gzip` and `gunzip` now correctly handle multi-gigabyte files (if
enough memory is available).
## Version 1.2
* Slight improvements to decompression speed.
* Added an AVX-512BW implementation of Adler-32.
* The Makefile now supports a user-specified installation `PREFIX`.
* Fixed a build error with some Visual Studio versions.
## Version 1.1
* Fixed a crash in the CRC-32 code when the prebuilt libdeflate for 32-bit
Windows was called by a program built with Visual Studio.
* Improved the worst-case decompression speed on malicious data.
* Fixed a build error when compiling for an ARM processor without hardware
floating point support.
* Improved performance on the PowerPC64 architecture.
* Added soname to `libdeflate.so`, to make packaging easier.
* Added `make install` target to the Makefile.
* The Makefile now supports user-specified `CPPFLAGS`.
* The Windows binary releases now include the import library for
`libdeflate.dll`. `libdeflate.lib` is now the import library, and
`libdeflatestatic.lib` is the static library.
## Version 1.0
* Added support for multi-member gzip files.
* Moved architecture-specific code into subdirectories. If you aren't using the
provided Makefile to build libdeflate, you now need to compile `lib/*.c` and
`lib/*/*.c` instead of just `lib/*.c`.
* Added an ARM PMULL implementation of CRC-32, which speeds up gzip compression
and decompression on 32-bit and 64-bit ARM processors that have the
Cryptography Extensions.
* Improved detection of CPU features, resulting in accelerated functions being
  used in more cases. This includes:
  * Detect CPU features on 32-bit x86, not just 64-bit as was done previously.
  * Detect CPU features on ARM, both 32-bit and 64-bit. (Currently limited to
    Linux.)
## Version 0.8
* Build fixes for certain platforms and compilers.
* libdeflate now produces the same output on all CPU architectures.
* Improved documentation for building libdeflate on Windows.
## Version 0.7
* Fixed a very rare bug that caused data to be compressed incorrectly. The bug
affected compression levels 7 and below since libdeflate v0.2. Although there
have been no user reports of the bug, and I believe it would have been highly
unlikely to encounter on realistic data, it could occur on data specially
crafted to reproduce it.
* Fixed a compilation error when building with clang 3.7.
## Version 0.6
* Various improvements to the gzip program's behavior.
* Faster CRC-32 on AVX-capable processors.
* Other minor changes.
## Version 0.5
* The CRC-32 checksum algorithm has been optimized with carryless multiplication
instructions for `x86_64` (PCLMUL). This speeds up gzip compression and
decompression.
* Build fixes for certain platforms and compilers.
* Added more test programs and scripts.
* libdeflate is now entirely MIT-licensed.
## Version 0.4
* The Adler-32 checksum algorithm has been optimized with vector instructions
for `x86_64` (SSE2 and AVX2) and ARM (NEON). This speeds up zlib compression
and decompression.
* To avoid naming collisions, functions and definitions in libdeflate's API have
been renamed to be prefixed with `libdeflate_` or `LIBDEFLATE_`. Programs
using the old API will need to be updated.
* Various bug fixes and other improvements.
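
One reason Adler-32 lends itself to vector instructions is that the modulo reduction can be deferred: the inner loop is nothing but additions, and the two sums only need reducing every few thousand bytes. The scalar sketch below shows that structure. It is a reference version, not libdeflate's code. `adler32_reference` is a hypothetical name, and the chunk size of 5552 bytes is the bound commonly used for 32-bit accumulators (zlib calls it `NMAX`).

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD   65521u  /* largest prime below 2^16 */
#define ADLER_CHUNK 5552u   /* bytes that can be summed before reducing */

/*
 * Adler-32 with deferred modulo. Pass adler = 1 for the first chunk; to
 * continue over further chunks, pass the previously returned value.
 */
uint32_t adler32_reference(uint32_t adler, const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t s1 = adler & 0xFFFF;   /* running sum of bytes */
    uint32_t s2 = adler >> 16;      /* running sum of s1 values */

    while (len != 0) {
        size_t n = len < ADLER_CHUNK ? len : ADLER_CHUNK;

        len -= n;
        /* Additions only: this is the part a SIMD version batches. */
        while (n--) {
            s1 += *p++;
            s2 += s1;
        }
        s1 %= ADLER_MOD;
        s2 %= ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```

A vectorized version typically replaces the inner loop with wide multiply-add operations over 16 or 32 bytes at a time, and keeps the same periodic reduction.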
## Version 0.3
* Some bug fixes and other minor changes.
## Version 0.2
* Implemented a new block splitting algorithm which typically improves the
compression ratio slightly at all compression levels.
* The compressor now outputs each block using the cheapest type (dynamic
Huffman, static Huffman, or uncompressed).
* The gzip program has received an overhaul and now behaves more like the
standard version.
* Updated the build system: some build options were changed or removed, and the
default `make` target now builds the gzip program as well as the library.
## Version 0.1
* Initial official release.