1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
|
===========
Performance
===========
Computation costs for `static chunking` are barely measurable: As chunking does
not depend on the actual message but only its length, computation costs are
essentially limited to a single :code:`xrange` call.
`Content-defined chunking`, however, is expensive: The algorithm has to compute
hash values for rolling hash window contents at `every` byte position of the
message that is to be chunked. To minimize costs, fastchunking works as follows:
1. The message (fragment) is passed in its entirety to the C++ extension.
2. Chunking is performed within the C++ extension.
3. The resulting list of chunk boundaries is communicated back to Python and
converted into a Python list.
Based on a 100 MiB random content, the author measured the following throughput
on an Intel Core i7-4600U in a single, non-representative test run:
=========== ==========
chunk size throughput
=========== ==========
64 bytes 49 MiB/s
128 bytes 57 MiB/s
256 bytes 62 MiB/s
512 bytes 63 MiB/s
1024 bytes 67 MiB/s
2048 bytes 68 MiB/s
4096 bytes 70 MiB/s
8192 bytes 71 MiB/s
16384 bytes 71 MiB/s
32768 bytes 71 MiB/s
=========== ==========
|