======================
Compilation feature
======================

.. warning:: This feature is fully implemented but may not be fully mature.


Overall
=========

Construct 2.9 adds an experimental feature: compiling user-made constructs into much faster (but less feature-rich) code. If you are familiar with Kaitai Struct, an alternative framework to Construct, Kaitai compiles YAML-based schemas into pure Python modules. Construct, on the other hand, defines schemas in pure Python and compiles them into pure Python modules. Once you define a construct, you can use it to parse and build blobs without compilation. Compilation has only one purpose: performance.

It should be made clear that currently the compiler supports only parsing. Building and sizeof are deferred to original constructs, from which a compiled instance was made. Building support may be added in the future, depending on popularity of this feature. In that sense, perhaps the documentation should use the term "compiled parser" rather than "compiled construct".


Requirements
---------------

The compilation feature requires Construct 2.9, preferably the newest version to date. More importantly, you should have a test suite of your own. Construct aims to be reliable, but the compiler makes some undocumented assumptions and generates code that "takes shortcuts". Since a few checks are omitted by the generated code, you should not use it to parse corrupted data.


Restrictions
---------------

Compiled classes only parse faster; building and sizeof defer to the core classes.

Sizeof is applied during compilation (not during parsing and building).

Lambdas (unlike `this` expressions) are not compilable.

Exceptions do not include `path` information.

Struct, Sequence, FocusedSeq, Union and LazyStruct do not support the `_subcons` and `_stream` context entries.

Parsed hooks are not supported; they are ignored.


Compiling schemas
===================

Every construct (even those that do not compile) has a `compile` method that returns another construct (an instance of the Compiled class). The method can be called without arguments; the example below also passes an optional `filename`, which keeps a copy of the generated code for inspection. It may be a good idea to compile something that is used for processing megabyte-sized data or millions of blobs. The compiled instance has `parse` and `build` methods just like the construct it was compiled from. Therefore, in your code, you can simply reassign the compiled instance over the original one.

>>> st = Struct("num" / Byte)
>>> st.parse(b"\x01")
Container(num=1)
>>> st = st.compile(filename="copyforinspection.py")
>>> st.parse(b"\x01")
Container(num=1)

The performance boost can be easily measured with the `benchmark` method. It also happens to test the correctness of the compiled parser, by making sure that both the original and the compiled instance parse into the same results.

>>> print(st.benchmark(sampledata))
Timeit measurements:
parsing:           0.0000475557 sec/call
parsing compiled:  0.0000159182 sec/call
building:          0.0000591526 sec/call


Motivation
============

The code generated by the compiler and the core classes have essentially the same functionality, but there is a noticeable difference in performance. The first half of the performance boost is thanks to pre-processing, as shown in this chapter. Pre-processing means inserting constants instead of variable lookups, where constants are simply values that are known at compile time. The second half is thanks to PyPy. This chapter explains the performance difference by comparing the `Struct FormatField BytesInteger Bytes` classes, including use of the context. Example construct:

::

    Struct(
        "num8" / Int8ub,
        "num24" / Int24ub,
        "data" / Bytes(this.num8),
    )

Compiled parsing code:

::

    def read_bytes(io, count):
        assert count >= 0
        data = io.read(count)
        assert len(data) == count
        return data
    def parse_struct_1(io, this):
        this = Container(_ = this)
        try:
            this['num8'] = unpack('>B', read_bytes(io, 1))[0]
            this['num24'] = int.from_bytes(read_bytes(io, 3), byteorder='big', signed=False)
            this['data'] = read_bytes(io, this.num8)
        except StopIteration:
            pass
        del this['_']
        return this
    def parseall(io, this):
        return parse_struct_1(io, this)
    compiledschema = Compiled(None, None, parseall)
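
The generated code above can be tried outside of Construct with a minimal stand-alone sketch. It assumes a plain `dict` in place of `Container` (so the length is read as `this["num8"]` rather than `this.num8`) and leaves out the `Compiled` wrapper; both substitutions are for illustration only and are not what the compiler emits.

```python
import io
from struct import unpack

def read_bytes(stream, count):
    # Same shortcut as the generated code: asserts instead of StreamError.
    assert count >= 0
    data = stream.read(count)
    assert len(data) == count
    return data

def parse_struct_1(stream, parent):
    # Assumption: a plain dict stands in for construct's Container.
    this = {"_": parent}
    try:
        this["num8"] = unpack(">B", read_bytes(stream, 1))[0]
        this["num24"] = int.from_bytes(read_bytes(stream, 3), byteorder="big", signed=False)
        this["data"] = read_bytes(stream, this["num8"])
    except StopIteration:
        pass
    # Only one dictionary is kept; the parent link is removed before returning.
    del this["_"]
    return this

result = parse_struct_1(io.BytesIO(b"\x02\x00\x01\x00ab"), {})
print(result)
```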

Non-compiled parsing code:

::

    def _read_stream(stream, length):
        if length < 0:
            raise StreamError("length must be non-negative, found %s" % length)
        try:
            data = stream.read(length)
        except Exception:
            raise StreamError("stream.read() failed, requested %s bytes" % (length,))
        if len(data) != length:
            raise StreamError("could not read enough bytes, expected %d, found %d" % (length, len(data)))
        return data

    class FormatField(Construct):
        def _parse(self, stream, context, path):
            data = _read_stream(stream, self.length)
            try:
                return struct.unpack(self.fmtstr, data)[0]
            except Exception:
                raise FormatFieldError("struct %r error during parsing" % self.fmtstr)

    class BytesInteger(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            data = _read_stream(stream, length)
            if self.swapped:
                data = data[::-1]
            return bytes2integer(data, self.signed)

    class Bytes(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            return _read_stream(stream, length)

    class Renamed(Subconstruct):
        def _parse(self, stream, context, path):
            path += " -> %s" % (self.name,)
            return self.subcon._parse(stream, context, path)

    class Struct(Construct):
        def _parse(self, stream, context, path):
            obj = Container()
            context = Container(_ = context)
            context._subcons = Container({sc.name:sc for sc in self.subcons if sc.name})
            for sc in self.subcons:
                try:
                    subobj = sc._parse(stream, context, path)
                    if sc.name:
                        obj[sc.name] = subobj
                        context[sc.name] = subobj
                except StopIteration:
                    break
            return obj


There are several "shortcuts" that the compiled code does:

Function calls are relatively expensive, so an inlined expression is faster than a function returning the same exact expression. Therefore FormatField compiles into `struct.unpack(..., read_bytes(io, ...))` directly.

Literals like 1 and '>B' are faster than object field lookup, dictionary lookup, or passing function arguments. Therefore each instance of FormatField compiles into a similar expression but with different format-strings and byte-counts inlined, usually literals.
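
The two shortcuts above (inlined expressions and literals) can be compared with a small stand-alone sketch. `FormatFieldLike` is a hypothetical stand-in for a core class, not Construct's actual `FormatField`; it only mimics the extra method call and the attribute lookups.

```python
import io
import struct
import timeit

def read_bytes(stream, count):
    data = stream.read(count)
    assert len(data) == count
    return data

class FormatFieldLike:
    # Hypothetical stand-in: format string and length live on the instance.
    def __init__(self, fmtstr):
        self.fmtstr = fmtstr
        self.length = struct.calcsize(fmtstr)

    def parse(self, stream):
        return struct.unpack(self.fmtstr, read_bytes(stream, self.length))[0]

field = FormatFieldLike(">H")

def parse_via_object(stream):
    # One extra method call plus two attribute lookups per parse.
    return field.parse(stream)

def parse_inlined(stream):
    # What a compiler can emit instead: the expression itself, with literals.
    return struct.unpack(">H", read_bytes(stream, 2))[0]

data = b"\x01\x02"
assert parse_via_object(io.BytesIO(data)) == parse_inlined(io.BytesIO(data))

# Timings depend on machine and interpreter, so no expected numbers are given.
print(timeit.timeit(lambda: parse_via_object(io.BytesIO(data)), number=50000))
print(timeit.timeit(lambda: parse_inlined(io.BytesIO(data)), number=50000))
```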

Passing parameters to functions is slower than just referring to variables in the same scope. Therefore, for example, compiled Struct creates a "this" variable that is accessible to all expressions generated by subcons, as it exists in the same scope, but core Struct would call subcon._parse and pass the entire context as a parameter value, regardless of whether that subcon even uses a context (for example, FormatField and VarInt have no need for a context). It is similar, but not exactly the same, with the "restream" function. The lambda in its second parameter rebinds `io` to a different object (a stream that gets created inside the restream function). On the other hand, `this` is not rebound; it exists in the outer scope.

An if statement (or conditional ternary operator) with two possible expressions and a condition that could be evaluated at compile-time is slower than just one or the other expression. Therefore, for example, BytesInteger does a lookup to check if the field is swapped, but compiled BytesInteger simply inlines a 'big' or 'little' literal. Moreover, Struct checks if each subcon has a name and then inserts a value into the context dictionary, but compiled Struct simply has an assignment or not. This shortcut also applies to most constructs that accept context lambdas as parameters. Generated classes do not need to check if a parameter is a constant or a lambda, because what gets emitted is either something like "1", which is a literal, or something like "this.field", which is an object lookup. Both are valid expressions and evaluate without red tape or checks.
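
As a rough sketch of this idea, the emitter below resolves the byte order and the constant-or-expression question once, when the source string is produced. The function names `emit_length` and `emit_bytesinteger` are hypothetical and do not correspond to Construct's internal compiler API.

```python
import io

def emit_length(length):
    # A constant becomes a literal; anything else is assumed to already be
    # an expression string such as "this['num8']".
    return repr(length) if isinstance(length, int) else length

def emit_bytesinteger(length, swapped, signed):
    # The branch on `swapped` runs once here, never in the emitted code.
    byteorder = "little" if swapped else "big"
    return "int.from_bytes(read(%s), byteorder=%r, signed=%r)" % (
        emit_length(length), byteorder, signed)

source = emit_bytesinteger(3, swapped=False, signed=False)
print(source)

# The emitted expression contains no conditional at all.
stream = io.BytesIO(b"\x00\x01\x00")
value = eval(source, {"read": stream.read})
print(value)
```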

Looping over an iterable is slower than a block of code that accesses each item once. The reason it is slower is that each iteration must fetch another item and also check the termination condition. The loop unrolling technique requires the iterable (or list, rather) to be known at compile-time, which is the case with Struct and Sequence instances. Therefore, compiled Struct emits one line per subcon, but core Struct loops over its subcons.
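
Loop unrolling can be sketched with a toy code generator. The `FIELDS` schema and the `compile_unrolled` helper are made up for this illustration; the real compiler works on construct instances rather than on format-string pairs.

```python
import io
import struct

# Hypothetical schema description: (name, struct format string) pairs.
FIELDS = [("a", ">B"), ("b", ">H"), ("c", ">B")]

def parse_looped(stream):
    # Core-style: iterate over the field list on every parse.
    obj = {}
    for name, fmt in FIELDS:
        obj[name] = struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]
    return obj

def compile_unrolled(fields):
    # Emit one assignment line per field, then exec the generated source.
    lines = ["def parse_unrolled(stream):", "    obj = {}"]
    for name, fmt in fields:
        size = struct.calcsize(fmt)
        lines.append("    obj[%r] = unpack(%r, stream.read(%d))[0]" % (name, fmt, size))
    lines.append("    return obj")
    source = "\n".join(lines)
    namespace = {"unpack": struct.unpack}
    exec(source, namespace)
    return source, namespace["parse_unrolled"]

source, parse_unrolled = compile_unrolled(FIELDS)
print(source)

data = b"\x01\x00\x02\x03"
assert parse_looped(io.BytesIO(data)) == parse_unrolled(io.BytesIO(data))
```

The generated function has no loop and no per-iteration bookkeeping; the loop ran once, inside `compile_unrolled`.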

Function calls that only defer to another function just waste CPU cycles. This relates specifically to the Renamed class, which in compiled code emits the same code as its subcon. The entire functionality of the Renamed class (maintaining path information) is not supported in compiled code, where it would serve as a mere subconstruct, just deferring to its subcon.

Building two identical dictionaries is slower than building just one. Struct maintains two dictionaries (called obj and context) which differ only by _ key, but compiled Struct maintains only one dictionary and removes the _ key before returning it.

`this` expressions (not lambdas) are expensive to compute in regular code, but something like "this.field" in compiled code is merely one object field lookup. The same applies to `len_ obj_ list_` expressions, since they share their implementation with the `this` expression.
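
To illustrate why a deferred expression costs more than a plain lookup, here is a simplified, hypothetical `Path` class. It is not Construct's actual implementation, and its `repr` emits key access rather than the attribute access shown in the generated code above; it only shows the general shape: an object that records attribute names, walks the context when called, and can print itself as source text.

```python
class Path:
    # Hypothetical, simplified deferred expression; not construct's class.
    def __init__(self, names=()):
        self._names = names

    def __getattr__(self, name):
        # Only reached for names that are not real attributes of Path.
        return Path(self._names + (name,))

    def __call__(self, context):
        # Regular code: walk the recorded names on every evaluation.
        for name in self._names:
            context = context[name]
        return context

    def __repr__(self):
        # Compile time: emit the equivalent source expression once.
        return "this" + "".join("[%r]" % name for name in self._names)

this = Path()
length = this.header.size
context = {"header": {"size": 4}}

print(length(context))   # evaluated through the Path object
print(repr(length))      # source text a compiler could inline
print(eval(repr(length), {"this": context}))
```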

Container is an implementation of a so-called AttrDict. It captures access to its attributes (field in this.field) and treats it as dictionary key access (this.field becomes this["field"]). However, due to internal CPython drawbacks, capturing attribute access involves some red tape, unlike accessing keys, which is done directly. Therefore compiled Struct emits lines that assign to Container keys, not attributes.
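
A minimal AttrDict can be sketched as follows. This is an assumption-laden illustration, since Construct's real Container does considerably more; the point is only that attribute access is routed through extra Python-level methods, while key access goes straight to the dictionary.

```python
class AttrDict(dict):
    # Minimal sketch of an AttrDict; construct's real Container does more.
    def __getattr__(self, name):
        # Called only after normal attribute lookup has failed.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Called for every attribute assignment, then forwards to the dict.
        self[name] = value

this = AttrDict()
this["num8"] = 5   # direct key assignment, the form compiled Struct emits
this.num24 = 7     # attribute assignment, routed through __setattr__

print(this.num8, this["num24"])
```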


Empirical evidence
---------------------

The "shortcuts" that are described above are not much, but amount to quite a large portion of actual run-time. In fact, they amount to about a third (31%) of entire run-time. Note that this benchmark includes only pure-python compile-time optimisations.

Notice that results are in microseconds (10**-6).

::

    -------------------------------- benchmark: 158 tests --------------------------------
    Name (time in us)                                  Min                StdDev          
    --------------------------------------------------------------------------------------
    test_class_array_parse                        284.7820 (74.05)       31.0403 (118.46) 
    test_class_array_parse_compiled                73.6430 (19.15)       10.7624 (41.07)  
    test_class_greedyrange_parse                  325.6610 (84.67)       31.8383 (121.50) 
    test_class_greedyrange_parse_compiled         300.9270 (78.24)       24.0149 (91.65)  
    test_class_repeatuntil_parse                   10.2730 (2.67)         0.8322 (3.18)   
    test_class_repeatuntil_parse_compiled           7.3020 (1.90)         1.3155 (5.02)   
    test_class_string_parse                        21.2270 (5.52)         1.3555 (5.17)   
    test_class_string_parse_compiled               18.9030 (4.91)         1.6023 (6.11)   
    test_class_cstring_parse                       10.9060 (2.84)         1.0971 (4.19)   
    test_class_cstring_parse_compiled               9.4050 (2.45)         1.6083 (6.14)   
    test_class_pascalstring_parse                   7.9290 (2.06)         0.4959 (1.89)   
    test_class_pascalstring_parse_compiled          6.6670 (1.73)         0.6601 (2.52)   
    test_class_struct_parse                        43.5890 (11.33)        4.4993 (17.17)  
    test_class_struct_parse_compiled               18.7370 (4.87)         2.0198 (7.71)   
    test_class_sequence_parse                      20.7810 (5.40)         2.6298 (10.04)  
    test_class_sequence_parse_compiled             11.9820 (3.12)         3.2669 (12.47)  
    test_class_union_parse                         91.0570 (23.68)       10.2126 (38.97)  
    test_class_union_parse_compiled                31.9240 (8.30)         3.5955 (13.72)  
    test_overall_parse                          3,200.7850 (832.23)     224.9197 (858.34) 
    test_overall_parse_compiled                 2,229.9610 (579.81)     118.2029 (451.09) 
    --------------------------------------------------------------------------------------
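
The share of run-time saved can be recomputed from the Min column of the table above. The sketch below copies three pairs of rows by hand; for the overall rows it gives roughly 30%, in line with the figure quoted earlier. Which rows to compare is a choice made for this illustration.

```python
# Min times in microseconds, copied from the table above: (core, compiled).
MIN_TIMES_US = {
    "struct": (43.5890, 18.7370),
    "array": (284.7820, 73.6430),
    "overall": (3200.7850, 2229.9610),
}

def time_saved(core, compiled):
    # Fraction of the core run-time that the compiled parser avoids.
    return (core - compiled) / core

for name, (core, compiled) in MIN_TIMES_US.items():
    print("%-8s %5.1f%% saved, %.2fx faster"
          % (name, 100 * time_saved(core, compiled), core / compiled))
```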

..
    -------------------------------- benchmark: 158 tests --------------------------------
    Name (time in us)                                  Min                StdDev          
    --------------------------------------------------------------------------------------
    test_class_aligned_build                        7.8420 (2.04)         0.8678 (3.31)   
    test_class_aligned_parse                        6.6060 (1.72)         0.6813 (2.60)   
    test_class_aligned_parse_compiled               5.3540 (1.39)         1.4117 (5.39)   
    test_class_array_build                        326.6060 (84.92)       38.4864 (146.87) 
    test_class_array_parse                        284.7820 (74.05)       31.0403 (118.46) 
    test_class_array_parse_compiled                73.6430 (19.15)       10.7624 (41.07)  
    test_class_bitsinteger_build                   19.5040 (5.07)         0.9291 (3.55)   
    test_class_bitsinteger_parse                   19.2790 (5.01)         3.8293 (14.61)  
    test_class_bitsinteger_parse_compiled          17.9910 (4.68)         4.5695 (17.44)  
    test_class_bitsswapped1_build                  20.2650 (5.27)         2.7666 (10.56)  
    test_class_bitsswapped1_parse                  18.8030 (4.89)         3.6720 (14.01)  
    test_class_bitsswapped1_parse_compiled         18.3760 (4.78)         3.1836 (12.15)  
    test_class_bitsswapped2_build                 860.2690 (223.68)      65.2748 (249.10) 
    test_class_bitsswapped2_parse                 810.8180 (210.82)     113.5936 (433.50) 
    test_class_bitwise1_build                      38.3340 (9.97)         2.8267 (10.79)  
    test_class_bitwise1_parse                      19.0340 (4.95)         1.6937 (6.46)   
    test_class_bitwise1_parse_compiled             18.3380 (4.77)         1.9169 (7.32)   
    test_class_bitwise2_build                   5,181.2200 (>1000.0)    176.1713 (672.30) 
    test_class_bitwise2_parse                   4,641.4420 (>1000.0)    149.0798 (568.92) 
    test_class_bytes_build                          5.2700 (1.37)         0.3894 (1.49)   
    test_class_bytes_parse                          4.3720 (1.14)         0.2620 (1.0)    
    test_class_bytes_parse_compiled                 4.3770 (1.14)         0.4845 (1.85)   
    test_class_bytesinteger_build                   7.1130 (1.85)         0.5597 (2.14)   
    test_class_bytesinteger_parse                   6.1550 (1.60)         0.8879 (3.39)   
    test_class_bytesinteger_parse_compiled          5.9690 (1.55)         0.8120 (3.10)   
    test_class_byteswapped1_build                   7.8880 (2.05)         1.6156 (6.17)   
    test_class_byteswapped1_parse                   6.6990 (1.74)         1.4248 (5.44)   
    test_class_byteswapped1_parse_compiled          5.8140 (1.51)         1.0893 (4.16)   
    test_class_bytewise1_build                     54.3910 (14.14)        3.5353 (13.49)  
    test_class_bytewise1_parse                     51.2590 (13.33)        4.9621 (18.94)  
    test_class_bytewise1_parse_compiled            51.1530 (13.30)        5.0922 (19.43)  
    test_class_bytewise2_build                  1,264.2500 (328.72)      76.9591 (293.69) 
    test_class_bytewise2_parse                  1,233.1150 (320.62)      65.5335 (250.09) 
    test_class_check_build                          7.7850 (2.02)         0.9710 (3.71)   
    test_class_check_parse                          7.5500 (1.96)         1.0495 (4.01)   
    test_class_check_parse_compiled                 5.7900 (1.51)         0.7776 (2.97)   
    test_class_computed_build                       6.7760 (1.76)         0.6328 (2.41)   
    test_class_computed_parse                       6.5940 (1.71)         0.6383 (2.44)   
    test_class_computed_parse_compiled              6.7670 (1.76)         0.7396 (2.82)   
    test_class_const_build                          5.8600 (1.52)         0.6461 (2.47)   
    test_class_const_parse                          4.8930 (1.27)         0.3691 (1.41)   
    test_class_const_parse_compiled                 4.6680 (1.21)         0.6549 (2.50)   
    test_class_cstring_build                        7.7910 (2.03)        32.0498 (122.31) 
    test_class_cstring_parse                       10.9060 (2.84)         1.0971 (4.19)   
    test_class_cstring_parse_compiled               9.4050 (2.45)         1.6083 (6.14)   
    test_class_default_build                        5.8910 (1.53)         0.7784 (2.97)   
    test_class_default_parse                        5.0430 (1.31)         0.5048 (1.93)   
    test_class_default_parse_compiled               4.7200 (1.23)         0.7015 (2.68)   
    test_class_enum_build                           6.4310 (1.67)         0.4820 (1.84)   
    test_class_enum_parse                           6.4100 (1.67)         0.2944 (1.12)   
    test_class_enum_parse_compiled                  4.9280 (1.28)         0.5852 (2.23)   
    test_class_flag_build                           4.7740 (1.24)         0.5016 (1.91)   
    test_class_flag_parse                           4.2450 (1.10)         0.8202 (3.13)   
    test_class_flag_parse_compiled                  4.4510 (1.16)         0.7262 (2.77)   
    test_class_flagsenum_build                      9.5940 (2.49)         2.3077 (8.81)   
    test_class_flagsenum_parse                     14.9890 (3.90)         1.1867 (4.53)   
    test_class_flagsenum_parse_compiled            12.5860 (3.27)         7.8440 (29.93)  
    test_class_focusedseq_build                    27.4290 (7.13)         3.5810 (13.67)  
    test_class_focusedseq_parse                    23.9230 (6.22)         2.9801 (11.37)  
    test_class_focusedseq_parse_compiled           11.4680 (2.98)         1.8008 (6.87)   
    test_class_formatfield_build                    5.3830 (1.40)         0.3952 (1.51)   
    test_class_formatfield_parse                    4.7820 (1.24)         0.3797 (1.45)   
    test_class_formatfield_parse_compiled           4.7870 (1.24)         0.7985 (3.05)   
    test_class_greedybytes_build                    3.9610 (1.03)         0.5677 (2.17)   
    test_class_greedybytes_parse                    3.8460 (1.0)          0.3800 (1.45)   
    test_class_greedybytes_parse_compiled           3.9150 (1.02)         0.4162 (1.59)   
    test_class_greedyrange_build                  328.9710 (85.54)       17.5818 (67.10)  
    test_class_greedyrange_parse                  325.6610 (84.67)       31.8383 (121.50) 
    test_class_greedyrange_parse_compiled         300.9270 (78.24)       24.0149 (91.65)  
    test_class_greedystring_build                   5.3440 (1.39)         0.6892 (2.63)   
    test_class_greedystring_parse                   5.0730 (1.32)         0.9543 (3.64)   
    test_class_greedystring_parse_compiled          4.5540 (1.18)         0.5366 (2.05)   
    test_class_hex_build                            4.6150 (1.20)         0.5106 (1.95)   
    test_class_hex_parse                            5.2830 (1.37)         0.8942 (3.41)   
    test_class_hex_parse_compiled                   3.9050 (1.02)         0.6158 (2.35)   
    test_class_hexdump_build                        4.6340 (1.20)         0.8433 (3.22)   
    test_class_hexdump_parse                        5.0960 (1.33)         1.0297 (3.93)   
    test_class_hexdump_parse_compiled               3.9120 (1.02)         0.7631 (2.91)   
    test_class_ifthenelse_build                     8.9100 (2.32)         0.9234 (3.52)   
    test_class_ifthenelse_parse                     8.3680 (2.18)         0.7548 (2.88)   
    test_class_ifthenelse_parse_compiled            6.7390 (1.75)         0.7323 (2.79)   
    test_class_mapping_build                        6.3000 (1.64)         0.9057 (3.46)   
    test_class_mapping_parse                        5.6000 (1.46)         1.6992 (6.48)   
    test_class_mapping_parse_compiled               4.9730 (1.29)         0.6396 (2.44)   
    test_class_namedtuple1_build                   18.0560 (4.69)         2.1252 (8.11)   
    test_class_namedtuple1_parse                   16.8770 (4.39)         2.5048 (9.56)   
    test_class_namedtuple1_parse_compiled           9.0800 (2.36)         1.3966 (5.33)   
    test_class_namedtuple2_build                   46.3020 (12.04)        4.8023 (18.33)  
    test_class_namedtuple2_parse                   34.1590 (8.88)         3.9813 (15.19)  
    test_class_namedtuple2_parse_compiled          16.1740 (4.21)         2.1471 (8.19)   
    test_class_numpy_build                        212.2070 (55.18)       19.0170 (72.57)  
    test_class_numpy_parse                        287.4910 (74.75)    1,033.8723 (>1000.0)
    test_class_numpy_parse_compiled               289.1160 (75.17)       31.5770 (120.50) 
    test_class_padded_build                         7.6610 (1.99)         1.0465 (3.99)   
    test_class_padded_parse                         6.5550 (1.70)         0.8192 (3.13)   
    test_class_padded_parse_compiled                5.3810 (1.40)         0.6683 (2.55)   
    test_class_padding_build                        6.1410 (1.60)         0.4382 (1.67)   
    test_class_padding_parse                        5.3390 (1.39)         0.3259 (1.24)   
    test_class_padding_parse_compiled               4.5490 (1.18)         0.6567 (2.51)   
    test_class_pascalstring_build                   9.0730 (2.36)         0.6574 (2.51)   
    test_class_pascalstring_parse                   7.9290 (2.06)         0.4959 (1.89)   
    test_class_pascalstring_parse_compiled          6.6670 (1.73)         0.6601 (2.52)   
    test_class_peek_build                          14.8610 (3.86)         1.5169 (5.79)   
    test_class_peek_parse                          19.3210 (5.02)         1.7638 (6.73)   
    test_class_peek_parse_compiled                 11.9050 (3.10)         1.2330 (4.71)   
    test_class_pickled_build                        5.5730 (1.45)         0.8605 (3.28)   
    test_class_pickled_parse                        8.1680 (2.12)         0.8642 (3.30)   
    test_class_pickled_parse_compiled               8.9110 (2.32)         1.5638 (5.97)   
    test_class_pointer_build                        7.2010 (1.87)         0.3975 (1.52)   
    test_class_pointer_parse                        6.3530 (1.65)         0.6129 (2.34)   
    test_class_pointer_parse_compiled               5.7300 (1.49)         0.6892 (2.63)   
    test_class_prefixed_build                       7.8600 (2.04)         0.4987 (1.90)   
    test_class_prefixed_parse                       6.8100 (1.77)         0.7110 (2.71)   
    test_class_prefixed_parse_compiled              6.1950 (1.61)         0.6435 (2.46)   
    test_class_prefixedarray_build                855.3260 (222.39)      55.4369 (211.56) 
    test_class_prefixedarray_parse                757.6910 (197.01)      49.8982 (190.42) 
    test_class_prefixedarray_parse_compiled       184.4760 (47.97)       14.9617 (57.10)  
    test_class_rawcopy_build1                      13.3870 (3.48)         2.1631 (8.25)   
    test_class_rawcopy_build2                      16.8280 (4.38)         3.4464 (13.15)  
    test_class_rawcopy_parse                       14.4990 (3.77)         1.3540 (5.17)   
    test_class_rawcopy_parse_compiled              14.9130 (3.88)         4.8756 (18.61)  
    test_class_rebuild_build                        5.8890 (1.53)         0.5504 (2.10)   
    test_class_rebuild_parse                        5.0030 (1.30)         0.6272 (2.39)   
    test_class_rebuild_parse_compiled               4.8300 (1.26)         0.5108 (1.95)   
    test_class_repeatuntil_build                   11.1090 (2.89)         0.8754 (3.34)   
    test_class_repeatuntil_parse                   10.2730 (2.67)         0.8322 (3.18)   
    test_class_repeatuntil_parse_compiled           7.3020 (1.90)         1.3155 (5.02)   
    test_class_select_build                        19.3270 (5.03)         2.1872 (8.35)   
    test_class_select_parse                         5.5500 (1.44)         0.5927 (2.26)   
    test_class_select_parse_compiled                5.9140 (1.54)         0.9409 (3.59)   
    test_class_sequence_build                      23.9440 (6.23)         3.7300 (14.23)  
    test_class_sequence_parse                      20.7810 (5.40)         2.6298 (10.04)  
    test_class_sequence_parse_compiled             11.9820 (3.12)         3.2669 (12.47)  
    test_class_string_build                         8.4160 (2.19)         0.5589 (2.13)   
    test_class_string_parse                        21.2270 (5.52)         1.3555 (5.17)   
    test_class_string_parse_compiled               18.9030 (4.91)         1.6023 (6.11)   
    test_class_struct_build                        49.0800 (12.76)        3.9414 (15.04)  
    test_class_struct_parse                        43.5890 (11.33)        4.4993 (17.17)  
    test_class_struct_parse_compiled               18.7370 (4.87)         2.0198 (7.71)   
    test_class_switch_build                         9.2500 (2.41)         0.4969 (1.90)   
    test_class_switch_parse                         8.4710 (2.20)         0.7958 (3.04)   
    test_class_switch_parse_compiled                7.1160 (1.85)         0.7794 (2.97)   
    test_class_timestamp1_build                     9.7510 (2.54)         1.0072 (3.84)   
    test_class_timestamp1_parse                    29.7140 (7.73)         2.7236 (10.39)  
    test_class_timestamp1_parse_compiled           30.2160 (7.86)         3.5592 (13.58)  
    test_class_timestamp2_build                   100.4570 (26.12)       15.4131 (58.82)  
    test_class_timestamp2_parse                   106.5390 (27.70)       12.0199 (45.87)  
    test_class_timestamp2_parse_compiled          107.6340 (27.99)       17.3917 (66.37)  
    test_class_union_build                         55.8850 (14.53)        6.5646 (25.05)  
    test_class_union_parse                         91.0570 (23.68)       10.2126 (38.97)  
    test_class_union_parse_compiled                31.9240 (8.30)         3.5955 (13.72)  
    test_class_varint_build                        14.9650 (3.89)         0.8179 (3.12)   
    test_class_varint_parse                        18.6660 (4.85)         1.6747 (6.39)   
    test_class_varint_parse_compiled               19.6660 (5.11)         5.0212 (19.16)  
    test_overall_build                          2,848.2370 (740.57)   5,609.2037 (>1000.0)
    test_overall_build_compiled                 2,852.9260 (741.79)     163.0128 (622.09) 
    test_overall_parse                          3,200.7850 (832.23)     224.9197 (858.34) 
    test_overall_parse_compiled                 2,229.9610 (579.81)     118.2029 (451.09) 
    --------------------------------------------------------------------------------------


Motivation, part 2
=====================

The second part of the optimisation is just running the generated code on PyPy. Since PyPy does not rely on any type annotations, there is nothing to discuss in this chapter. The benchmark reflects the same code as in the previous chapter, but run on PyPy 2.7 rather than CPython 3.6.

Empirical evidence
---------------------

Notice that results are in nanoseconds (10**-9).

::

    ------------------------------------- benchmark: 152 tests ------------------------------------
    Name (time in ns)                                      Min                     StdDev          
    -----------------------------------------------------------------------------------------------
    test_class_array_parse                         11,042.9974 (103.52)       40,792.8559 (46.97)  
    test_class_array_parse_compiled                 9,088.0058 (85.20)        43,001.3909 (49.52)  
    test_class_greedyrange_parse                   14,402.0014 (135.01)       49,834.2047 (57.38)  
    test_class_greedyrange_parse_compiled           9,801.0059 (91.88)        39,296.4529 (45.25)  
    test_class_repeatuntil_parse                      318.4996 (2.99)          2,469.5524 (2.84)   
    test_class_repeatuntil_parse_compiled             309.3746 (2.90)        103,425.2134 (119.09) 
    test_class_string_parse                           966.8991 (9.06)        537,241.0095 (618.62) 
    test_class_string_parse_compiled                  726.6994 (6.81)          3,719.2657 (4.28)   
    test_class_cstring_parse                          782.2993 (7.33)          4,111.8970 (4.73)   
    test_class_cstring_parse_compiled                 591.1992 (5.54)        479,164.9746 (551.75) 
    test_class_pascalstring_parse                     465.0911 (4.36)          4,262.4397 (4.91)   
    test_class_pascalstring_parse_compiled            298.4118 (2.80)        122,279.2150 (140.80) 
    test_class_struct_parse                         2,633.9985 (24.69)        14,654.3095 (16.87)  
    test_class_struct_parse_compiled                  949.7991 (8.90)          4,228.2890 (4.87)   
    test_class_sequence_parse                       1,310.6008 (12.29)         5,811.8046 (6.69)   
    test_class_sequence_parse_compiled                732.2000 (6.86)          4,703.9483 (5.42)   
    test_class_union_parse                          5,619.9933 (52.69)        30,590.0630 (35.22)  
    test_class_union_parse_compiled                 2,699.9987 (25.31)        15,888.8206 (18.30)  
    test_overall_parse                          1,332,581.9891 (>1000.0)   2,274,995.4192 (>1000.0)
    test_overall_parse_compiled                   690,380.0095 (>1000.0)     602,697.9721 (694.00) 
    -----------------------------------------------------------------------------------------------

..
    ------------------------------------- benchmark: 152 tests ------------------------------------
    Name (time in ns)                                      Min                     StdDev          
    -----------------------------------------------------------------------------------------------
    test_class_aligned_build                          740.5994 (6.94)          4,143.5039 (4.77)   
    test_class_aligned_parse                          602.1000 (5.64)          4,001.4447 (4.61)   
    test_class_aligned_parse_compiled                 237.5240 (2.23)        233,368.4415 (268.72) 
    test_class_array_build                         12,085.9913 (113.30)    4,199,133.4429 (>1000.0)
    test_class_array_parse                         11,042.9974 (103.52)       40,792.8559 (46.97)  
    test_class_array_parse_compiled                 9,088.0058 (85.20)        43,001.3909 (49.52)  
    test_class_bitsinteger_build                    3,602.4940 (33.77)     1,177,244.9019 (>1000.0)
    test_class_bitsinteger_parse                    2,823.5008 (26.47)        14,156.0060 (16.30)  
    test_class_bitsinteger_parse_compiled           2,768.9966 (25.96)        14,832.6464 (17.08)  
    test_class_bitsswapped1_build                   5,726.9935 (53.69)        29,157.1889 (33.57)  
    test_class_bitsswapped1_parse                   6,172.9952 (57.87)        28,735.2233 (33.09)  
    test_class_bitsswapped1_parse_compiled          5,715.9923 (53.59)        26,115.4525 (30.07)  
    test_class_bitsswapped2_build                  38,265.0032 (358.72)       92,216.9408 (106.19) 
    test_class_bitsswapped2_parse                  36,199.9992 (339.36)       99,672.2831 (114.77) 
    test_class_bitwise1_build                       7,979.0043 (74.80)        18,320.0158 (21.10)  
    test_class_bitwise1_parse                       5,914.0002 (55.44)        15,593.2498 (17.96)  
    test_class_bitwise1_parse_compiled              5,969.9960 (55.97)        10,953.7787 (12.61)  
    test_class_bitwise2_build                     136,212.0092 (>1000.0)     126,711.5616 (145.91) 
    test_class_bitwise2_parse                     120,290.0021 (>1000.0)     100,256.6237 (115.44) 
    test_class_bytes_build                            106.6699 (1.0)          45,663.4740 (52.58)  
    test_class_bytes_parse                            166.0601 (1.56)         26,090.0331 (30.04)  
    test_class_bytes_parse_compiled                   172.6300 (1.62)         38,715.3059 (44.58)  
    test_class_bytesinteger_build                     440.4998 (4.13)          2,794.5403 (3.22)   
    test_class_bytesinteger_parse                     397.6915 (3.73)          2,760.2520 (3.18)   
    test_class_bytesinteger_parse_compiled            404.1537 (3.79)        314,221.4811 (361.82) 
    test_class_byteswapped1_build                     423.0011 (3.97)        439,883.6772 (506.52) 
    test_class_byteswapped1_parse                     700.1989 (6.56)          5,650.5263 (6.51)   
    test_class_byteswapped1_parse_compiled            467.4551 (4.38)        375,681.4718 (432.59) 
    test_class_bytewise1_build                     13,313.0088 (124.81)       40,142.8640 (46.22)  
    test_class_bytewise1_parse                     13,626.0060 (127.74)    2,380,928.9149 (>1000.0)
    test_class_bytewise1_parse_compiled            13,586.0028 (127.36)       35,062.2700 (40.37)  
    test_class_bytewise2_build                     72,109.9932 (676.01)       73,553.4202 (84.70)  
    test_class_bytewise2_parse                     66,791.9958 (626.16)      140,635.6099 (161.94) 
    test_class_check_build                            740.6998 (6.94)          4,307.2706 (4.96)   
    test_class_check_parse                            541.0999 (5.07)          3,440.5007 (3.96)   
    test_class_check_parse_compiled                   545.6997 (5.12)        679,945.6527 (782.95) 
    test_class_computed_build                         679.1000 (6.37)        605,315.9050 (697.01) 
    test_class_computed_parse                         526.0008 (4.93)          3,428.9984 (3.95)   
    test_class_computed_parse_compiled                552.2001 (5.18)          3,464.2913 (3.99)   
    test_class_const_build                            310.6879 (2.91)          2,745.9160 (3.16)   
    test_class_const_parse                            176.2500 (1.65)         79,386.8928 (91.41)  
    test_class_const_parse_compiled                   182.1501 (1.71)         94,547.7996 (108.87) 
    test_class_cstring_build                          491.0001 (4.60)          3,734.7308 (4.30)   
    test_class_cstring_parse                          782.2993 (7.33)          4,111.8970 (4.73)   
    test_class_cstring_parse_compiled                 591.1992 (5.54)        479,164.9746 (551.75) 
    test_class_default_build                          461.9995 (4.33)          3,437.9897 (3.96)   
    test_class_default_parse                          220.9200 (2.07)            875.7176 (1.01)   
    test_class_default_parse_compiled                 167.3000 (1.57)        115,216.5525 (132.67) 
    test_class_enum_build                             318.2495 (2.98)        329,774.1824 (379.73) 
    test_class_enum_parse                             216.3301 (2.03)         98,506.1576 (113.43) 
    test_class_enum_parse_compiled                    150.8200 (1.41)         56,082.0649 (64.58)  
    test_class_flag_build                             204.2799 (1.92)        130,206.5059 (149.93) 
    test_class_flag_parse                             153.9801 (1.44)        100,694.1426 (115.95) 
    test_class_flag_parse_compiled                    139.8900 (1.31)            868.4449 (1.0)    
    test_class_flagsenum_build                        573.3993 (5.38)          4,344.7692 (5.00)   
    test_class_flagsenum_parse                        652.1004 (6.11)        422,339.3586 (486.32) 
    test_class_flagsenum_parse_compiled               464.5461 (4.35)          3,596.9171 (4.14)   
    test_class_focusedseq_build                     2,233.9998 (20.94)         6,533.8875 (7.52)   
    test_class_focusedseq_parse                     1,345.1005 (12.61)         5,739.1458 (6.61)   
    test_class_focusedseq_parse_compiled              615.0003 (5.77)          3,967.2471 (4.57)   
    test_class_formatfield_build                      282.0557 (2.64)        286,541.4444 (329.95) 
    test_class_formatfield_parse                      237.0500 (2.22)         63,666.5654 (73.31)  
    test_class_formatfield_parse_compiled             154.2599 (1.45)         35,054.4102 (40.36)  
    test_class_greedybytes_build                      110.4000 (1.03)         89,466.1548 (103.02) 
    test_class_greedybytes_parse                      117.2700 (1.10)         94,205.4030 (108.48) 
    test_class_greedybytes_parse_compiled             118.3101 (1.11)         88,084.6992 (101.43) 
    test_class_greedyrange_build                   12,186.0066 (114.24)       37,782.4850 (43.51)  
    test_class_greedyrange_parse                   14,402.0014 (135.01)       49,834.2047 (57.38)  
    test_class_greedyrange_parse_compiled           9,801.0059 (91.88)        39,296.4529 (45.25)  
    test_class_greedystring_build                     348.3331 (3.27)          3,029.8253 (3.49)   
    test_class_greedystring_parse                     473.3645 (4.44)          3,041.7270 (3.50)   
    test_class_greedystring_parse_compiled            409.9241 (3.84)        387,658.3773 (446.38) 
    test_class_hex_build                              459.6355 (4.31)          4,006.9444 (4.61)   
    test_class_hex_parse                              291.4441 (2.73)        182,038.6025 (209.61) 
    test_class_hex_parse_compiled                     126.4800 (1.19)         84,815.3901 (97.66)  
    test_class_hexdump_build                          450.4157 (4.22)          3,790.8239 (4.37)   
    test_class_hexdump_parse                          284.8335 (2.67)        294,559.8261 (339.18) 
    test_class_hexdump_parse_compiled                 128.8101 (1.21)         78,435.0791 (90.32)  
    test_class_ifthenelse_build                       982.9993 (9.22)          4,688.0488 (5.40)   
    test_class_ifthenelse_parse                       851.1997 (7.98)        580,777.8856 (668.76) 
    test_class_ifthenelse_parse_compiled              733.0003 (6.87)          4,714.3734 (5.43)   
    test_class_mapping_build                          336.3336 (3.15)        419,990.5974 (483.61) 
    test_class_mapping_parse                          226.8000 (2.13)        111,247.9039 (128.10) 
    test_class_mapping_parse_compiled                 184.2000 (1.73)            872.1972 (1.00)   
    test_class_namedtuple1_build                      918.4005 (8.61)          3,765.2820 (4.34)   
    test_class_namedtuple1_parse                      673.6998 (6.32)          3,434.7049 (3.96)   
    test_class_namedtuple1_parse_compiled             610.4994 (5.72)        551,488.8854 (635.03) 
    test_class_namedtuple2_build                    3,212.0006 (30.11)        13,384.9602 (15.41)  
    test_class_namedtuple2_parse                    1,786.3000 (16.75)         4,818.3417 (5.55)   
    test_class_namedtuple2_parse_compiled             728.0993 (6.83)          3,332.2180 (3.84)   
    test_class_padded_build                           732.6991 (6.87)          3,967.5355 (4.57)   
    test_class_padded_parse                           583.3004 (5.47)          4,356.6780 (5.02)   
    test_class_padded_parse_compiled                  301.4703 (2.83)        305,922.3763 (352.26) 
    test_class_padding_build                          499.1823 (4.68)          3,525.5175 (4.06)   
    test_class_padding_parse                          350.1996 (3.28)        328,502.3785 (378.27) 
    test_class_padding_parse_compiled                 192.7000 (1.81)         82,517.9180 (95.02)  
    test_class_pascalstring_build                     483.4543 (4.53)        243,109.6546 (279.94) 
    test_class_pascalstring_parse                     465.0911 (4.36)          4,262.4397 (4.91)   
    test_class_pascalstring_parse_compiled            298.4118 (2.80)        122,279.2150 (140.80) 
    test_class_peek_build                             952.7997 (8.93)          6,047.5404 (6.96)   
    test_class_peek_parse                           1,454.3999 (13.63)       774,202.5660 (891.48) 
    test_class_peek_parse_compiled                    438.8183 (4.11)          3,811.7552 (4.39)   
    test_class_pointer_build                          576.9005 (5.41)          3,782.3046 (4.36)   
    test_class_pointer_parse                          377.6430 (3.54)        393,433.4406 (453.03) 
    test_class_pointer_parse_compiled                 210.3799 (1.97)            947.6097 (1.09)   
    test_class_prefixed_build                         888.7000 (8.33)          5,004.2176 (5.76)   
    test_class_prefixed_parse                         757.0008 (7.10)        524,495.2616 (603.95) 
    test_class_prefixed_parse_compiled                471.9080 (4.42)        439,226.7896 (505.76) 
    test_class_prefixedarray_build                 37,869.9915 (355.02)       59,808.3893 (68.87)  
    test_class_prefixedarray_parse                 29,731.0035 (278.72)   10,591,190.0651 (>1000.0)
    test_class_prefixedarray_parse_compiled        22,710.9995 (212.91)       65,049.0162 (74.90)  
    test_class_rawcopy_build1                       1,041.5999 (9.76)          5,312.0368 (6.12)   
    test_class_rawcopy_build2                       1,513.5010 (14.19)       931,668.4553 (>1000.0)
    test_class_rawcopy_parse                        1,064.9004 (9.98)          5,628.3455 (6.48)   
    test_class_rawcopy_parse_compiled                 669.7999 (6.28)          4,616.0835 (5.32)   
    test_class_rebuild_build                          409.5006 (3.84)          3,371.2846 (3.88)   
    test_class_rebuild_parse                          225.8090 (2.12)          1,961.0702 (2.26)   
    test_class_rebuild_parse_compiled                 164.7700 (1.54)         82,487.8733 (94.98)  
    test_class_repeatuntil_build                      475.6360 (4.46)          3,568.2374 (4.11)   
    test_class_repeatuntil_parse                      318.4996 (2.99)          2,469.5524 (2.84)   
    test_class_repeatuntil_parse_compiled             309.3746 (2.90)        103,425.2134 (119.09) 
    test_class_select_build                         7,528.9863 (70.58)        23,358.3203 (26.90)  
    test_class_select_parse                           395.7684 (3.71)        468,021.0341 (538.92) 
    test_class_select_parse_compiled                  194.6000 (1.82)            911.6117 (1.05)   
    test_class_sequence_build                       1,521.9004 (14.27)         6,600.0406 (7.60)   
    test_class_sequence_parse                       1,310.6008 (12.29)         5,811.8046 (6.69)   
    test_class_sequence_parse_compiled                732.2000 (6.86)          4,703.9483 (5.42)   
    test_class_string_build                           535.1001 (5.02)        289,163.7688 (332.97) 
    test_class_string_parse                           966.8991 (9.06)        537,241.0095 (618.62) 
    test_class_string_parse_compiled                  726.6994 (6.81)          3,719.2657 (4.28)   
    test_class_struct_build                         2,857.5014 (26.79)        16,764.1319 (19.30)  
    test_class_struct_parse                         2,633.9985 (24.69)        14,654.3095 (16.87)  
    test_class_struct_parse_compiled                  949.7991 (8.90)          4,228.2890 (4.87)   
    test_class_switch_build                         1,079.1002 (10.12)         4,754.6705 (5.47)   
    test_class_switch_parse                           948.8998 (8.90)          4,558.0161 (5.25)   
    test_class_switch_parse_compiled                  783.7996 (7.35)          4,640.9683 (5.34)   
    test_class_timestamp1_build                       771.2006 (7.23)          3,534.5051 (4.07)   
    test_class_timestamp1_parse                     2,018.1993 (18.92)         5,448.9309 (6.27)   
    test_class_timestamp1_parse_compiled            1,970.7004 (18.47)       891,363.4033 (>1000.0)
    test_class_timestamp2_build                     5,808.9936 (54.46)        28,921.4390 (33.30)  
    test_class_timestamp2_parse                     7,547.0016 (70.75)        38,718.9886 (44.58)  
    test_class_timestamp2_parse_compiled            7,391.9946 (69.30)        36,903.9105 (42.49)  
    test_class_union_build                          3,535.9990 (33.15)        17,829.5208 (20.53)  
    test_class_union_parse                          5,619.9933 (52.69)        30,590.0630 (35.22)  
    test_class_union_parse_compiled                 2,699.9987 (25.31)        15,888.8206 (18.30)  
    test_class_varint_build                           944.5997 (8.86)          5,002.7418 (5.76)   
    test_class_varint_parse                           861.3002 (8.07)          4,343.2995 (5.00)   
    test_class_varint_parse_compiled                  863.2996 (8.09)          4,426.6909 (5.10)   
    test_overall_build                            554,530.0082 (>1000.0)     475,067.7994 (547.03) 
    test_overall_build_compiled                   358,168.0066 (>1000.0)     127,081.1333 (146.33) 
    test_overall_parse                          1,332,581.9891 (>1000.0)   2,274,995.4192 (>1000.0)
    test_overall_parse_compiled                   690,380.0095 (>1000.0)     602,697.9721 (694.00) 
    -----------------------------------------------------------------------------------------------


Motivation, part 3
=====================

.. warning:: Benchmarks revealed that PyPy makes the code run much faster than Cython does. The Cython improvements were therefore withdrawn, and the compiler now generates pure Python code that is compatible with Python 2, including PyPy. This chapter is no longer relevant and remains only for educational purposes.

This chapter talks about the second half of optimisation, which is due to Cython type annotations and type inference. I should state for the record that I am no expert at Cython, and the following explanations are merely "the way I understand it". Please take that into account when reading. Fourth example:

::

    Struct(
        "num1" / Int8ul,
        "num2" / Int24ul,
        "fixedarray1" / Array(3, Int8ul),
        "name1" / CString("utf8"),
    )

::

    cdef bytes read_bytes(io, int count):
        if not count >= 0: raise StreamError
        cdef bytes data = io.read(count)
        if not len(data) == count: raise StreamError
        return data
    cdef bytes parse_nullterminatedstring(io, int unitsize, bytes finalunit):
        cdef list result = []
        cdef bytes unit
        while True:
            unit = read_bytes(io, unitsize)
            if unit == finalunit:
                break
            result.append(unit)
        return b"".join(result)
    def parse_struct_1(io, this):
        this = Container(_ = this)
        try:
            this['num1'] = unpack('<B', read_bytes(io, 1))[0]
            this['num2'] = int.from_bytes(read_bytes(io, 3), byteorder='little', signed=False)
            this['fixedarray1'] = ListContainer((unpack('<B', read_bytes(io, 1))[0]) for i in range(3))
            this['name1'] = (parse_nullterminatedstring(io, 1, b'\x00')).decode('utf8')
            pass
        except StopIteration:
            pass
        del this['_']
        del this['_index']
        return this
    def parseall(io, this):
        return parse_struct_1(io, this)
    compiled = Compiled(None, None, parseall)


The primary cause of speedup in Cython is this: if a variable is of known type, then operations on that variable can skip certain checks. If a variable is a pure Python object, then those checks need to be added. A variable is considered of known type if either (1) it is annotated, as in "cdef bytes data", or (2) it is inferred, as when using the result of an annotated function call in "parse_nullterminatedstring(...).decode(...)", since "cdef bytes parse_nullterminatedstring(...)". If a variable is known to be a list, then calling "append" on it doesn't require checking whether that object has such a method or a matching signature (parameters). If a variable is known to be bytes, then "len(data)" can be compiled into a bytes-type length function rather than a general-purpose length function that works on arbitrary objects, "unit == finalunit" can be compiled into bytes-type equality, and ".decode('utf8')" can be compiled into the bytes-type implementation. If Cython knows that "struct.unpack" returns only tuples, then "...[0]" would compile into tuple-type getitem (index access). Examples are many, but the pattern is the same: type-specific code is faster than type-general code.

The second cause of speedup is the special handling of integers. Most annotations, like "cdef bytes", refer to specific Python types, but "cdef int" does not refer to any Python type. It represents a C integer, which is allocated on the stack or in registers, unlike the other types, which are allocated on the heap. All operations on C integers are therefore much faster than on Python integers. In the example code, this affects "count >= 0" and "len(data) == count".
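To see what the generated code actually does, independent of Cython, the example can be run with the ``cdef`` annotations removed. The following is only a minimal pure-Python sketch: ``StreamError`` is defined locally as a stand-in, a plain ``dict`` replaces ``Container``, a plain list replaces ``ListContainer``, and the sample bytes are made up for illustration. The comments mark the spots where the Cython version benefits from known types.

```python
import io
from struct import unpack


class StreamError(Exception):
    """Local stand-in for the library's StreamError (assumption for this sketch)."""


def read_bytes(stream, count):
    # Cython version: `count` is a C int, so this comparison needs no Python objects.
    if count < 0:
        raise StreamError("negative count")
    data = stream.read(count)
    # Cython version: `data` is typed as bytes, so len() is the bytes-specific one.
    if len(data) != count:
        raise StreamError("stream ended early")
    return data


def parse_nullterminatedstring(stream, unitsize, finalunit):
    result = []  # Cython version: typed as list, so append() skips the method lookup.
    while True:
        unit = read_bytes(stream, unitsize)
        if unit == finalunit:  # bytes-to-bytes equality
            break
        result.append(unit)
    return b"".join(result)


def parse_struct_1(stream):
    # A plain dict stands in for Container here.
    this = {}
    this["num1"] = unpack("<B", read_bytes(stream, 1))[0]
    this["num2"] = int.from_bytes(read_bytes(stream, 3), byteorder="little", signed=False)
    this["fixedarray1"] = [unpack("<B", read_bytes(stream, 1))[0] for _ in range(3)]
    this["name1"] = parse_nullterminatedstring(stream, 1, b"\x00").decode("utf8")
    return this


# Made-up sample: num1=1, num2=2, three array bytes, then "abc" with a null terminator.
sample = b"\x01\x02\x00\x00\x05\x06\x07abc\x00"
parsed = parse_struct_1(io.BytesIO(sample))
print(parsed)
```

The logic is identical with and without annotations; the annotations only let Cython replace generic object operations with type-specific ones.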


Empirical evidence
---------------------

The micro-benchmarks below show the difference between core classes and Cython-compiled classes. Only the classes with the largest performance boost are listed (they also happen to be the most important); some other classes show little speedup, and some show none.

Note that results are in microseconds (10**-6 seconds).
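As a reading aid for the table that follows, the speedup of a compiled parser is simply the core "Min" time divided by the compiled "Min" time. The sketch below uses a handful of "Min" values copied from that table; the dictionary and variable names are only illustrative.

```python
# Min times in microseconds, copied from the benchmark table in this section:
# (core class, compiled class)
min_times_us = {
    "array": (286.5460, 30.7200),
    "struct": (44.1300, 16.9070),
    "union": (91.9150, 22.5970),
    "overall": (2126.2570, 1124.9560),
}

# Speedup factor = core time / compiled time.
speedups = {name: core / compiled for name, (core, compiled) in min_times_us.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

By this measure Array parsing is roughly nine times faster when compiled, while the overall benchmark is a little under twice as fast.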

::

    ------------------------------- benchmark: 152 tests -------------------------------
    Name (time in us)                                  Min              StdDev          
    ------------------------------------------------------------------------------------
    test_class_array_parse                        286.5460 (73.85)     42.8831 (89.84)  
    test_class_array_parse_compiled                30.7200 (7.92)       6.9577 (14.58)  
    test_class_greedyrange_parse                  320.9860 (82.73)     45.9480 (96.26)  
    test_class_greedyrange_parse_compiled         262.7010 (67.71)     36.4504 (76.36)  
    test_class_repeatuntil_parse                   10.1850 (2.63)       2.4147 (5.06)   
    test_class_repeatuntil_parse_compiled           6.8880 (1.78)       1.5471 (3.24)   
    test_class_string_parse                        20.4400 (5.27)       4.4044 (9.23)   
    test_class_string_parse_compiled                9.1470 (2.36)       2.2427 (4.70)   
    test_class_cstring_parse                       11.2290 (2.89)       1.6216 (3.40)   
    test_class_cstring_parse_compiled               5.6080 (1.45)       1.0321 (2.16)   
    test_class_pascalstring_parse                   7.8560 (2.02)       1.8567 (3.89)   
    test_class_pascalstring_parse_compiled          5.8910 (1.52)       0.9466 (1.98)   
    test_class_struct_parse                        44.1300 (11.37)      6.8434 (14.34)  
    test_class_struct_parse_compiled               16.9070 (4.36)       3.0500 (6.39)   
    test_class_sequence_parse                      21.5420 (5.55)       2.6852 (5.63)   
    test_class_sequence_parse_compiled             10.1530 (2.62)       2.1645 (4.53)   
    test_class_union_parse                         91.9150 (23.69)     10.7812 (22.59)  
    test_class_union_parse_compiled                22.5970 (5.82)      15.2649 (31.98)  
    test_overall_parse                          2,126.2570 (548.01)   255.0154 (534.27) 
    test_overall_parse_compiled                 1,124.9560 (289.94)   127.4730 (267.06) 
    ------------------------------------------------------------------------------------

..
    ------------------------------- benchmark: 152 tests -------------------------------
    Name (time in us)                                  Min              StdDev          
    ------------------------------------------------------------------------------------
    test_class_aligned_build                        7.8110 (2.01)       1.4475 (3.03)   
    test_class_aligned_parse                        6.7560 (1.74)       2.4557 (5.14)   
    test_class_aligned_parse_compiled               4.7080 (1.21)       1.0038 (2.10)   
    test_class_array_build                        331.7150 (85.49)     45.1915 (94.68)  
    test_class_array_parse                        286.5460 (73.85)     42.8831 (89.84)  
    test_class_array_parse_compiled                30.7200 (7.92)       6.9577 (14.58)  
    test_class_bitsinteger_build                   19.4150 (5.00)       6.0416 (12.66)  
    test_class_bitsinteger_parse                   19.2520 (4.96)       6.7657 (14.17)  
    test_class_bitsinteger_parse_compiled          17.4700 (4.50)      11.1148 (23.29)  
    test_class_bitsswapped1_build                  20.0300 (5.16)       3.5605 (7.46)   
    test_class_bitsswapped1_parse                  18.9740 (4.89)       3.1174 (6.53)   
    test_class_bitsswapped1_parse_compiled         17.4030 (4.49)       3.2099 (6.72)   
    test_class_bitsswapped2_build                 866.5650 (223.34)    99.0145 (207.44) 
    test_class_bitsswapped2_parse                 813.8270 (209.75)   104.6734 (219.29) 
    test_class_bitwise1_build                      38.7430 (9.99)       4.1560 (8.71)   
    test_class_bitwise1_parse                      18.8820 (4.87)       3.8922 (8.15)   
    test_class_bitwise1_parse_compiled             17.5770 (4.53)       2.1345 (4.47)   
    test_class_bitwise2_build                   5,249.8520 (>1000.0)  247.1093 (517.70) 
    test_class_bitwise2_parse                   4,650.4640 (>1000.0)  605.3646 (>1000.0)
    test_class_bytes_build                          5.3900 (1.39)       0.7781 (1.63)   
    test_class_bytes_parse                          4.4180 (1.14)       0.4773 (1.0)    
    test_class_bytes_parse_compiled                 4.0220 (1.04)       0.7253 (1.52)   
    test_class_bytesinteger_build                   7.1450 (1.84)       1.4272 (2.99)   
    test_class_bytesinteger_parse                   6.2820 (1.62)       1.4176 (2.97)   
    test_class_bytesinteger_parse_compiled          5.3420 (1.38)       1.8858 (3.95)   
    test_class_byteswapped1_build                   7.9820 (2.06)       1.5524 (3.25)   
    test_class_byteswapped1_parse                   6.6840 (1.72)       1.2694 (2.66)   
    test_class_byteswapped1_parse_compiled          4.9890 (1.29)       1.1038 (2.31)   
    test_class_bytewise1_build                     53.7710 (13.86)      5.8007 (12.15)  
    test_class_bytewise1_parse                     49.7540 (12.82)      7.8771 (16.50)  
    test_class_bytewise1_parse_compiled            48.5480 (12.51)      5.0040 (10.48)  
    test_class_bytewise2_build                  1,270.0850 (327.34)   116.3612 (243.78) 
    test_class_bytewise2_parse                  1,225.2780 (315.79)    99.7644 (209.01) 
    test_class_check_build                          7.9260 (2.04)       1.7875 (3.74)   
    test_class_check_parse                          7.7250 (1.99)       1.7400 (3.65)   
    test_class_check_parse_compiled                 5.8770 (1.51)       1.5456 (3.24)   
    test_class_computed_build                       6.9660 (1.80)       1.0798 (2.26)   
    test_class_computed_parse                       6.6770 (1.72)       1.6214 (3.40)   
    test_class_computed_parse_compiled              5.6290 (1.45)       0.9689 (2.03)   
    test_class_const_build                          5.9990 (1.55)       1.4849 (3.11)   
    test_class_const_parse                          4.8720 (1.26)       1.1863 (2.49)   
    test_class_const_parse_compiled                 4.2520 (1.10)       0.9856 (2.06)   
    test_class_cstring_build                        7.8570 (2.03)       1.2683 (2.66)   
    test_class_cstring_parse                       11.2290 (2.89)       1.6216 (3.40)   
    test_class_cstring_parse_compiled               5.6080 (1.45)       1.0321 (2.16)   
    test_class_default_build                        6.0770 (1.57)       1.2640 (2.65)   
    test_class_default_parse                        5.1160 (1.32)       1.1421 (2.39)   
    test_class_default_parse_compiled               4.4890 (1.16)       1.2474 (2.61)   
    test_class_enum_build                           6.3000 (1.62)       0.9694 (2.03)   
    test_class_enum_parse                           6.3900 (1.65)       0.9849 (2.06)   
    test_class_enum_parse_compiled                  4.5520 (1.17)       0.7292 (1.53)   
    test_class_flag_build                           4.7940 (1.24)       0.6771 (1.42)   
    test_class_flag_parse                           4.3500 (1.12)       0.6541 (1.37)   
    test_class_flag_parse_compiled                  4.1380 (1.07)       0.5723 (1.20)   
    test_class_flagsenum_build                      9.7270 (2.51)       1.1748 (2.46)   
    test_class_flagsenum_parse                     15.2000 (3.92)       2.1840 (4.58)   
    test_class_flagsenum_parse_compiled            11.6480 (3.00)       1.5491 (3.25)   
    test_class_focusedseq_build                    27.1080 (6.99)       6.3815 (13.37)  
    test_class_focusedseq_parse                    23.6720 (6.10)       3.4153 (7.16)   
    test_class_focusedseq_parse_compiled           10.7130 (2.76)       2.1026 (4.41)   
    test_class_formatfield_build                    5.3590 (1.38)       1.1223 (2.35)   
    test_class_formatfield_parse                    4.7750 (1.23)       0.8140 (1.71)   
    test_class_formatfield_parse_compiled           4.4370 (1.14)       0.9037 (1.89)   
    test_class_greedybytes_build                    4.0550 (1.05)       1.1607 (2.43)   
    test_class_greedybytes_parse                    3.8800 (1.0)        0.5046 (1.06)   
    test_class_greedybytes_parse_compiled           3.9690 (1.02)       1.1108 (2.33)   
    test_class_greedyrange_build                  332.8790 (85.79)     43.8336 (91.83)  
    test_class_greedyrange_parse                  320.9860 (82.73)     45.9480 (96.26)  
    test_class_greedyrange_parse_compiled         262.7010 (67.71)     36.4504 (76.36)  
    test_class_greedystring_build                   5.3930 (1.39)       0.7442 (1.56)   
    test_class_greedystring_parse                   5.0800 (1.31)       1.1375 (2.38)   
    test_class_greedystring_parse_compiled          4.6150 (1.19)       0.9228 (1.93)   
    test_class_hex_build                            4.5730 (1.18)       0.8108 (1.70)   
    test_class_hex_parse                            5.4210 (1.40)       0.9506 (1.99)   
    test_class_hex_parse_compiled                   4.0000 (1.03)       0.8198 (1.72)   
    test_class_hexdump_build                        4.5640 (1.18)       0.8572 (1.80)   
    test_class_hexdump_parse                        5.1660 (1.33)       0.8708 (1.82)   
    test_class_hexdump_parse_compiled               3.9460 (1.02)       0.8104 (1.70)   
    test_class_ifthenelse_build                     9.0200 (2.32)       3.1983 (6.70)   
    test_class_ifthenelse_parse                     8.5450 (2.20)       4.2003 (8.80)   
    test_class_ifthenelse_parse_compiled            6.4490 (1.66)       3.5984 (7.54)   
    test_class_mapping_build                        6.1160 (1.58)       0.9536 (2.00)   
    test_class_mapping_parse                        5.5320 (1.43)       0.9137 (1.91)   
    test_class_mapping_parse_compiled               4.5650 (1.18)       0.8350 (1.75)   
    test_class_namedtuple1_build                   18.3450 (4.73)       2.1664 (4.54)   
    test_class_namedtuple1_parse                   17.1850 (4.43)       2.9482 (6.18)   
    test_class_namedtuple1_parse_compiled           7.1810 (1.85)       1.0228 (2.14)   
    test_class_namedtuple2_build                   47.7850 (12.32)      6.1995 (12.99)  
    test_class_namedtuple2_parse                   34.4330 (8.87)       3.8498 (8.07)   
    test_class_namedtuple2_parse_compiled          15.4160 (3.97)       2.5158 (5.27)   
    test_class_numpy_build                        212.5540 (54.78)     27.0343 (56.64)  
    test_class_numpy_parse                        288.5380 (74.37)     45.4344 (95.19)  
    test_class_numpy_parse_compiled               290.8960 (74.97)    110.2389 (230.95) 
    test_class_padded_build                         7.7810 (2.01)       3.6378 (7.62)   
    test_class_padded_parse                         6.6460 (1.71)       1.2688 (2.66)   
    test_class_padded_parse_compiled                4.7090 (1.21)       1.2451 (2.61)   
    test_class_padding_build                        6.1880 (1.59)       1.4536 (3.05)   
    test_class_padding_parse                        5.4070 (1.39)       1.1753 (2.46)   
    test_class_padding_parse_compiled               4.1200 (1.06)       1.1916 (2.50)   
    test_class_pascalstring_build                   9.1680 (2.36)       1.4623 (3.06)   
    test_class_pascalstring_parse                   7.8560 (2.02)       1.8567 (3.89)   
    test_class_pascalstring_parse_compiled          5.8910 (1.52)       0.9466 (1.98)   
    test_class_peek_build                          14.8710 (3.83)       2.6207 (5.49)   
    test_class_peek_parse                          19.5870 (5.05)       3.6857 (7.72)   
    test_class_peek_parse_compiled                 10.6000 (2.73)       2.0105 (4.21)   
    test_class_pickled_build                        5.6150 (1.45)       1.2695 (2.66)   
    test_class_pickled_parse                        8.3370 (2.15)       1.5174 (3.18)   
    test_class_pickled_parse_compiled               8.9810 (2.31)       1.7670 (3.70)   
    test_class_pointer_build                        7.2470 (1.87)       1.3817 (2.89)   
    test_class_pointer_parse                        6.3760 (1.64)       1.2557 (2.63)   
    test_class_pointer_parse_compiled               5.0970 (1.31)       0.9715 (2.04)   
    test_class_prefixed_build                       7.8970 (2.04)       1.8404 (3.86)   
    test_class_prefixed_parse                       6.7860 (1.75)       1.3916 (2.92)   
    test_class_prefixed_parse_compiled              5.2350 (1.35)       1.3229 (2.77)   
    test_class_prefixedarray_build                873.1850 (225.05)    84.7384 (177.53) 
    test_class_prefixedarray_parse                763.2760 (196.72)    88.0787 (184.53) 
    test_class_prefixedarray_parse_compiled        79.4790 (20.48)     11.9930 (25.13)  
    test_class_rawcopy_build1                      13.8040 (3.56)       2.1913 (4.59)   
    test_class_rawcopy_build2                      16.9810 (4.38)       2.6092 (5.47)   
    test_class_rawcopy_parse                       15.2890 (3.94)       3.6678 (7.68)   
    test_class_rawcopy_parse_compiled              14.8570 (3.83)       2.6335 (5.52)   
    test_class_rebuild_build                        6.0380 (1.56)       1.2981 (2.72)   
    test_class_rebuild_parse                        5.1540 (1.33)       0.8264 (1.73)   
    test_class_rebuild_parse_compiled               4.5160 (1.16)       0.7145 (1.50)   
    test_class_repeatuntil_build                   11.0780 (2.86)       2.4318 (5.09)   
    test_class_repeatuntil_parse                   10.1850 (2.63)       2.4147 (5.06)   
    test_class_repeatuntil_parse_compiled           6.8880 (1.78)       1.5471 (3.24)   
    test_class_select_build                        19.1100 (4.93)       6.5128 (13.64)  
    test_class_select_parse                         5.6280 (1.45)       3.2641 (6.84)   
    test_class_select_parse_compiled                5.5660 (1.43)       3.7881 (7.94)   
    test_class_sequence_build                      24.5060 (6.32)       5.1873 (10.87)  
    test_class_sequence_parse                      21.5420 (5.55)       2.6852 (5.63)   
    test_class_sequence_parse_compiled             10.1530 (2.62)       2.1645 (4.53)   
    test_class_string_build                         8.5320 (2.20)       1.8491 (3.87)   
    test_class_string_parse                        20.4400 (5.27)       4.4044 (9.23)   
    test_class_string_parse_compiled                9.1470 (2.36)       2.2427 (4.70)   
    test_class_struct_build                        49.1730 (12.67)      5.5050 (11.53)  
    test_class_struct_parse                        44.1300 (11.37)      6.8434 (14.34)  
    test_class_struct_parse_compiled               16.9070 (4.36)       3.0500 (6.39)   
    test_class_switch_build                         9.5110 (2.45)       1.7349 (3.63)   
    test_class_switch_parse                         8.7100 (2.24)       1.9867 (4.16)   
    test_class_switch_parse_compiled                6.7830 (1.75)       1.1652 (2.44)   
    test_class_union_build                         57.0540 (14.70)     12.0599 (25.27)  
    test_class_union_parse                         91.9150 (23.69)     10.7812 (22.59)  
    test_class_union_parse_compiled                22.5970 (5.82)      15.2649 (31.98)  
    test_class_varint_build                        15.2000 (3.92)       3.2498 (6.81)   
    test_class_varint_parse                        18.9080 (4.87)       4.2807 (8.97)   
    test_class_varint_parse_compiled               19.6070 (5.05)       4.0409 (8.47)   
    test_overall_build                          1,970.9570 (507.98)   189.2782 (396.54) 
    test_overall_build_compiled                 1,987.8950 (512.35)   166.3636 (348.54) 
    test_overall_parse                          2,126.2570 (548.01)   255.0154 (534.27) 
    test_overall_parse_compiled                 1,124.9560 (289.94)   127.4730 (267.06) 
    ------------------------------------------------------------------------------------


Comparison with Kaitai Struct
================================

Kaitai Struct is a very respectable competitor, so I believe a benchmark-based comparison should be presented. Construct and Kaitai have very different capabilities. Kaitai supports about a dozen languages, while Construct supports only Python. Kaitai offers only basic, common features, while Construct offers Python-only features such as Numpy and Pickle support. Kaitai does only parsing, while Construct also does building. In a sense, the two libraries are in different categories (like sumo and karate), and there are multiple scenarios where one or the other would not be usable at all.

Example used for comparison, first as a Construct definition and then as the equivalent Kaitai schema:

::

    Struct(
        "count" / Int32ul,
        "items" / Array(this.count, Struct(
            "num1" / Int8ul,
            "num2" / Int24ul,
            "flags" / BitStruct(
                "bool1" / Flag,
                "num4" / BitsInteger(3),
                Padding(4),
            ),
            "fixedarray1" / Array(3, Int8ul),
            "name1" / CString("utf8"),
            "name2" / PascalString(Int8ul, "utf8"),
        )),
    )

::

    meta:
      id: comparison_1_kaitai
      encoding: utf-8
      endian: le
    seq:
      - id: count
        type: u4
      - id: items
        repeat: expr
        repeat-expr: count
        type: item
    types:
      item:
        seq:
          - id: num1
            type: u1
          - id: num2_lo
            type: u2
          - id: num2_hi
            type: u1
          - id: flags
            type: flags
          - id: fixedarray1
            repeat: expr
            repeat-expr: 3
            type: u1
          - id: name1
            type: strz
          - id: len_name2
            type: u1
          - id: name2
            type: str
            size: len_name2
        instances:
          num2:
            value: 'num2_hi << 16 | num2_lo'
        types:
          flags:
            seq:
              - id: bool1
                type: b1
              - id: num4
                type: b3
              - id: padding
                type: b4
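
The two definitions differ in how they express the 24-bit field and the bit flags: Construct has ``Int24ul`` and ``BitStruct``, while the Kaitai schema reads a ``u2`` and a ``u1`` and recombines them in the ``num2`` value instance. The following is a minimal sketch, using only the Python standard library and hypothetical sample bytes (not taken from the benchmark data), of why the two readings should agree. It assumes most-significant-bit-first ordering for the bit fields.

```python
# Hypothetical sample: three bytes of a little-endian 24-bit integer,
# followed by one flags byte. These values are illustrative only.
raw = bytes([0x56, 0x34, 0x12, 0b11010000])

# What Int24ul does: read three bytes as one little-endian unsigned integer.
expected_num2 = int.from_bytes(raw[0:3], "little")

# What the Kaitai schema does: a little-endian u2 followed by a u1 ...
num2_lo = int.from_bytes(raw[0:2], "little")
num2_hi = raw[2]
# ... recombined by the value instance 'num2_hi << 16 | num2_lo'.
num2 = num2_hi << 16 | num2_lo

# Flags byte, assuming bits are consumed from the most significant end:
# 1 bit for bool1, 3 bits for num4, 4 bits of padding.
flags = raw[3]
bool1 = bool(flags >> 7)
num4 = (flags >> 4) & 0b111

print(hex(expected_num2), hex(num2), bool1, num4)
```

Both readings of the sample yield the same integer, which is why the schema can stand in for ``Int24ul`` in this comparison.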


Surprisingly, Kaitai won the benchmark! Honestly, I am shocked and dismayed that it did. The only explanation I can offer is that Kaitai parses structs into class objects (with attributes) while Construct parses into dictionaries (with keys). However, that one detail seems an unlikely explanation for the huge discrepancy in the results: in the measurements below, Kaitai is about 1.6 times faster than compiled Construct on CPython 3.6 and about 3.2 times faster on PyPy. Perhaps there is a flaw in the methodology. But until that is proven, Kaitai gets its respects. Congrats.

::

    $ python3.6 comparison_1_construct.py 
    Timeit measurements:
    parsing:           0.1024609069 sec/call
    parsing compiled:  0.0410809368 sec/call

    $ pypy comparison_1_construct.py 
    Timeit measurements:
    parsing:           0.0108308416 sec/call
    parsing compiled:  0.0062594243 sec/call

::

    $ python3.6 comparison_1_kaitai.py 
    Timeit measurements:
    parsing:           0.0250326035 sec/call

    $ pypy comparison_1_kaitai.py 
    Timeit measurements:
    parsing:           0.0019435351 sec/call
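
The figures above are labelled as timeit measurements, but the benchmark scripts themselves are not reproduced here. As a rough sketch only, a seconds-per-call number can be obtained with the standard ``timeit`` module as shown below. The helper ``seconds_per_call`` and the two stand-in workloads are hypothetical: they only contrast building attribute-based objects with building dictionaries, which is the one difference suggested above, and they do not call Construct or Kaitai.

```python
import timeit


def seconds_per_call(func, number=20):
    """Average seconds per call of func, measured over `number` calls."""
    return timeit.timeit(func, number=number) / number


class Item:
    """Hypothetical attribute-based result object (not Kaitai's real class)."""

    def __init__(self, num1, num2):
        self.num1 = num1
        self.num2 = num2


def build_objects(n=1000):
    # Results stored as objects with attributes.
    return [Item(i, i * 2) for i in range(n)]


def build_dicts(n=1000):
    # Results stored as dictionaries with keys.
    return [{"num1": i, "num2": i * 2} for i in range(n)]


print("objects: %.10f sec/call" % seconds_per_call(build_objects))
print("dicts:   %.10f sec/call" % seconds_per_call(build_dicts))
```

The absolute numbers depend on the interpreter and the machine, so such a micro-benchmark can at most hint at whether the choice of result container matters. It cannot by itself confirm or rule out a flaw in the methodology.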