======================
Compilation feature
======================

.. warning:: This feature is fully implemented but may not be fully mature.


Overall
=========

Construct 2.9 adds an experimental feature: compiling user-made constructs into much faster (but less feature-rich) code. If you are familiar with Kaitai Struct, an alternative framework to Construct, you may know that Kaitai compiles YAML-based schemas into pure Python modules. Construct, on the other hand, defines schemas in pure Python and compiles them into pure Python modules. Once you define a construct, you can use it to parse and build blobs without compilation. Compilation has only one purpose: performance.

It should be made clear that the compiler currently supports only parsing. Building and sizeof are deferred to the original construct from which the compiled instance was made. Building support may be added in the future, depending on the popularity of this feature. In that sense, perhaps the documentation should use the term "compiled parser" rather than "compiled construct".


Requirements
---------------

The compilation feature requires Construct 2.9, preferably the newest version to date. More importantly, you should have a test suite of your own. Construct aims to be reliable, but the compiler makes some undocumented assumptions and generates code that "takes shortcuts". Since some checks are omitted by the generated code, you should not use it to parse corrupted data.


Restrictions
---------------

- Compiled classes only parse faster; building and sizeof defer to the core classes.
- Sizeof is applied during compilation (not during parsing and building).
- Lambdas (unlike `this` expressions) are not compilable.
- Exceptions do not include `path` information.
- Struct, Sequence, FocusedSeq, Union and LazyStruct do not support the `_subcons` and `_stream` context entries.
- Parsed hooks are not supported and are ignored.


Compiling schemas
===================

Every construct (even those that do not compile) has a `compile` method that also returns a construct (an instance of the Compiled class). Its optional `filename` parameter, used in the example below, saves a copy of the generated source code for inspection. It may be a good idea to compile a construct that is used for processing megabyte-sized data or millions of blobs. The compiled instance has `parse` and `build` methods just like the construct it was compiled from. Therefore, in your code, you can simply reassign the compiled instance over the original one.

>>> from construct import *
>>> st = Struct("num" / Byte)
>>> st.parse(b"\x01")
Container(num=1)
>>> st = st.compile(filename="copyforinspection.py")
>>> st.parse(b"\x01")
Container(num=1)

The performance boost can be easily measured with the `benchmark` method. It also tests the correctness of the compiled parser, by making sure that both the original and the compiled instance parse into the same results.

>>> print(st.benchmark(sampledata))
Timeit measurements:
parsing:           0.0000475557 sec/call
parsing compiled:  0.0000159182 sec/call
building:          0.0000591526 sec/call


Motivation
============

The code generated by the compiler and the core classes have essentially the same functionality, but there is a noticeable difference in performance. The first half of the performance boost comes from pre-processing, as shown in this chapter. Pre-processing means inserting constants instead of variable lookups, where constants are simply values known at compile time. The second half comes from running on PyPy. This chapter explains the performance difference by comparing the `Struct`, `FormatField`, `BytesInteger` and `Bytes` classes, including their use of the context. Example construct:

::

    from construct import *

    Struct(
        "num8" / Int8ub,
        "num24" / Int24ub,
        "data" / Bytes(this.num8),
    )

Compiled parsing code:

::

    def read_bytes(io, count):
        assert count >= 0
        data = io.read(count)
        assert len(data) == count
        return data
    def parse_struct_1(io, this):
        this = Container(_ = this)
        try:
            this['num8'] = unpack('>B', read_bytes(io, 1))[0]
            this['num24'] = int.from_bytes(read_bytes(io, 3), byteorder='big', signed=False)
            this['data'] = read_bytes(io, this.num8)
        except StopIteration:
            pass
        del this['_']
        return this
    def parseall(io, this):
        return parse_struct_1(io, this)
    compiledschema = Compiled(None, None, parseall)

Non-compiled parsing code:

::

    def _read_stream(stream, length):
        if length < 0:
            raise StreamError("length must be non-negative, found %s" % length)
        try:
            data = stream.read(length)
        except Exception:
            raise StreamError("stream.read() failed, requested %s bytes" % (length,))
        if len(data) != length:
            raise StreamError("could not read enough bytes, expected %d, found %d" % (length, len(data)))
        return data

    class FormatField(Construct):
        def _parse(self, stream, context, path):
            data = _read_stream(stream, self.length)
            try:
                return struct.unpack(self.fmtstr, data)[0]
            except Exception:
                raise FormatFieldError("struct %r error during parsing" % self.fmtstr)

    class BytesInteger(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            data = _read_stream(stream, length)
            if self.swapped:
                data = data[::-1]
            return bytes2integer(data, self.signed)

    class Bytes(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            return _read_stream(stream, length)

    class Renamed(Subconstruct):
        def _parse(self, stream, context, path):
            path += " -> %s" % (self.name,)
            return self.subcon._parse(stream, context, path)

    class Struct(Construct):
        def _parse(self, stream, context, path):
            obj = Container()
            context = Container(_ = context)
            context._subcons = Container({sc.name:sc for sc in self.subcons if sc.name})
            for sc in self.subcons:
                try:
                    subobj = sc._parse(stream, context, path)
                    if sc.name:
                        obj[sc.name] = subobj
                        context[sc.name] = subobj
                except StopIteration:
                    break
            return obj


The compiled code takes several "shortcuts":

Function calls are relatively expensive, so an inlined expression is faster than a function that returns the exact same expression. Therefore FormatField compiles directly into `struct.unpack(..., read_bytes(io, ...))`.
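
As a rough illustration of this point, the following sketch times a wrapper function against the same expression written inline. It uses only the standard library; `read_bytes` mirrors the helper in the compiled code above, while `via_wrapper` and `via_inline` are hypothetical names. Absolute timings depend on the interpreter and machine, so the sketch prints them rather than asserting which one wins.

```python
import struct
import timeit
from io import BytesIO

# Stand-in for the read_bytes() helper shown in the compiled code above.
def read_bytes(io, count):
    data = io.read(count)
    assert len(data) == count
    return data

# Variant A: an extra function whose only job is to return an expression.
def parse_u8_wrapped(io):
    return struct.unpack('>B', read_bytes(io, 1))[0]

def via_wrapper(blob):
    return parse_u8_wrapped(BytesIO(blob))

# Variant B: the same expression inlined, as a code generator would emit it.
def via_inline(blob):
    io = BytesIO(blob)
    return struct.unpack('>B', read_bytes(io, 1))[0]

blob = b"\x2a"
t_wrapped = timeit.timeit(lambda: via_wrapper(blob), number=100_000)
t_inline = timeit.timeit(lambda: via_inline(blob), number=100_000)
print("wrapped: %.4f s, inline: %.4f s" % (t_wrapped, t_inline))
```

On CPython the inline variant should usually be slightly faster, since it saves one Python-level call per parse.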

Literals such as `1` and `'>B'` are faster than an object field lookup, a dictionary lookup, or passing function arguments. Therefore each FormatField instance compiles into a similar expression, but with its own format string and byte count inlined, usually as literals.
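
The sketch below shows the idea behind inlining literals: the format string and byte count are resolved once, while generating source, and then appear in the emitted code as plain literals. `emit_formatfield` and `compile_parser` are hypothetical helpers, not part of Construct's API, and the real compiler emits somewhat different code.

```python
import struct
from io import BytesIO

# Hypothetical emitter: returns Python source for a fixed-size numeric field,
# with the format string and byte count baked in as literals.
def emit_formatfield(fmtstr):
    length = struct.calcsize(fmtstr)   # computed once, at "compile" time
    return "unpack(%r, io.read(%d))[0]" % (fmtstr, length)

def compile_parser(fmtstr):
    source = "def parse(io):\n    return %s\n" % emit_formatfield(fmtstr)
    namespace = {"unpack": struct.unpack}
    exec(source, namespace)            # turn the generated source into a function
    return namespace["parse"], source

parse_u16, src = compile_parser(">H")
print(src)
# def parse(io):
#     return unpack('>H', io.read(2))[0]
print(parse_u16(BytesIO(b"\x01\x02")))  # 258
```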

Passing parameters to functions is slower than referring to variables in the same scope. Therefore, for example, compiled Struct creates a `this` variable that is accessible to all expressions generated by its subcons, since they share the same scope, whereas core Struct calls `subcon._parse` and passes the entire context as a parameter, regardless of whether that subcon even uses a context (for example, FormatField and VarInt have no need for one). The `restream` function used in generated code is similar, but not exactly the same: the lambda in its second parameter rebinds `io` to a different object (a stream created inside the `restream` function). On the other hand, `this` is not rebound; it exists in the outer scope.
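
A minimal sketch of this scoping difference, assuming a `restream(data, func)` helper shaped like the one in generated code (treat its exact definition as an assumption). `parse_outer` is a hypothetical parser: the lambda's own `io` parameter shadows the outer stream, while `this` is simply captured from the enclosing function.

```python
from io import BytesIO

# Assumed shape of the restream() helper: wrap bytes in a fresh stream
# and hand that stream to a callback.
def restream(data, func):
    return func(BytesIO(data))

def parse_outer(io, this):
    this["count"] = io.read(1)[0]
    # Inside the lambda, `io` is rebound to the new substream, while `this`
    # is not passed in at all: it is captured from the enclosing scope.
    this["inner"] = restream(io.read(4), lambda io: io.read(this["count"]))
    return this

result = parse_outer(BytesIO(b"\x02ABCDrest"), {})
print(result)  # {'count': 2, 'inner': b'AB'}
```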

An if statement (or conditional expression) with two possible branches and a condition that could be evaluated at compile time is slower than just one branch or the other. Therefore, for example, BytesInteger looks up whether the field is swapped, but compiled BytesInteger simply inlines a 'big' or 'little' literal. Likewise, Struct checks whether each subcon has a name before inserting a value into the context dictionary, but compiled Struct simply emits an assignment or omits it. This shortcut also applies to most constructs that accept context lambdas as parameters. Generated code does not need to check whether a parameter is a constant or a lambda, because what gets emitted is either something like "1", which is a literal, or something like "this.field", which is an object lookup. Both are valid expressions and evaluate without red tape or checks.
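
The following sketch, built on hypothetical emitter functions, shows these decisions being made once during code generation: the byte order becomes a literal, and the name check decides whether an assignment is emitted at all. The resulting line has the same shape as the `num24` line in the compiled code above.

```python
from io import BytesIO

def emit_bytesinteger(length, swapped, signed):
    # The `swapped` check runs here, during code generation, so the emitted
    # expression contains only a 'big' or 'little' literal.
    byteorder = "little" if swapped else "big"
    return "int.from_bytes(io.read(%d), byteorder=%r, signed=%r)" % (length, byteorder, signed)

def emit_struct_line(name, expr):
    # Core Struct tests `if sc.name:` on every parse; here the test happens
    # once, and either an assignment or a bare expression is emitted.
    if name:
        return "this[%r] = %s" % (name, expr)
    return expr

line = emit_struct_line("num24", emit_bytesinteger(3, swapped=False, signed=False))
print(line)
# this['num24'] = int.from_bytes(io.read(3), byteorder='big', signed=False)

this = {}
exec(line, {"io": BytesIO(b"\x00\x01\x00"), "this": this})
print(this)  # {'num24': 256}
```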

Looping over an iterable is slower than a block of code that accesses each item once. It is slower because each iteration must fetch another item and also check the termination condition. The loop unrolling technique requires the iterable (or rather, the list) to be known at compile time, which is the case with Struct and Sequence instances. Therefore, compiled Struct emits one line per subcon, but core Struct loops over its subcons.
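
Loop unrolling can be sketched with a hypothetical schema given as a list of `(name, format)` pairs: the Python loop runs once while generating source, and the emitted function contains one straight-line statement per field. The sketch also keeps a single dictionary that temporarily holds a `_` link to the parent context, similar to the compiled Struct shown earlier. It is an illustration only, not Construct's actual code generator.

```python
import struct
from io import BytesIO

# Hypothetical schema, known at compile time.
FIELDS = [("num8", ">B"), ("num16", ">H"), ("num32", ">I")]

def compile_struct(fields):
    lines = ["def parse_struct(io, parent):",
             "    this = {'_': parent}"]      # one dict: result and context
    for name, fmt in fields:                # runs once, at compile time
        lines.append("    this[%r] = unpack(%r, io.read(%d))[0]"
                     % (name, fmt, struct.calcsize(fmt)))
    lines += ["    del this['_']",           # drop the parent link
              "    return this"]
    source = "\n".join(lines) + "\n"
    namespace = {"unpack": struct.unpack}
    exec(source, namespace)
    return namespace["parse_struct"], source

parse_struct, source = compile_struct(FIELDS)
print(source)  # one unpack() line per field, no loop
print(parse_struct(BytesIO(b"\x01\x00\x02\x00\x00\x00\x03"), None))
# {'num8': 1, 'num16': 2, 'num32': 3}
```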

Function calls that merely defer to another function only waste CPU cycles. This relates specifically to the Renamed class, which in compiled code emits the same code as its subcon. The entire functionality of the Renamed class (maintaining path information) is not supported in compiled code, where it would serve as a mere subconstruct, just deferring to its subcon.

Building two nearly identical dictionaries is slower than building just one. Struct maintains two dictionaries (called obj and context) which differ only by the `_` and `_subcons` keys, but compiled Struct maintains only one dictionary and removes the `_` key before returning it.

`this` expressions (not lambdas) are expensive to compute in regular code, but something like "this.field" in compiled code is merely one object field lookup. The same applies to the `len_`, `obj_` and `list_` expressions, since they share their implementation with the `this` expression.
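
To illustrate why `this` expressions compile well while lambdas do not, here is a heavily simplified, hypothetical `FieldRef` class (not Construct's real implementation). It can be evaluated against a context at parse time, and it can also render itself as Python source; a lambda, by contrast, is an opaque function object whose body cannot easily be re-emitted as an inline expression.

```python
class FieldRef:
    """Hypothetical, simplified model of a `this`-style expression."""

    def __init__(self, path=()):
        self._path = path

    def __getattr__(self, name):
        # Attribute access builds a longer path, e.g. this.header.length
        return FieldRef(self._path + (name,))

    def __call__(self, context):
        # Slow path (no compilation): walk the context at parse time.
        value = context
        for name in self._path:
            value = value[name]
        return value

    def __str__(self):
        # Fast path: render the reference as Python source for a compiler.
        return "this" + "".join("[%r]" % name for name in self._path)

this = FieldRef()
expr = this.header.length
print(expr({"header": {"length": 5}}))  # 5
print(str(expr))                        # this['header']['length']
```

This sketch renders key lookups rather than attribute lookups, which also sidesteps the attribute-access overhead discussed next.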

Container is an implementation of a so-called AttrDict. It captures access to its attributes (`field` in `this.field`) and treats it as dictionary key access (`this.field` becomes `this["field"]`). However, due to internal CPython drawbacks, capturing attribute access involves some red tape, unlike accessing keys, which is done directly. Therefore compiled Struct emits lines that assign to Container keys, not attributes.
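
The cost difference between attribute and key access can be observed with a minimal, hypothetical `AttrDict` (a simplified stand-in, not Construct's `Container`). Attribute reads fall through to a Python-level `__getattr__` call, whereas key reads go straight to the dictionary. Timings vary by interpreter and machine, so they are printed rather than asserted.

```python
import timeit

class AttrDict(dict):
    """Minimal attribute-style dict, for illustration only."""

    def __getattr__(self, name):
        # Only reached after normal attribute lookup fails: extra Python call.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value  # store attributes as keys

c = AttrDict(num8=1)
c.num24 = 2  # goes through __setattr__, stored as a key

t_attr = timeit.timeit(lambda: c.num8, number=200_000)
t_key = timeit.timeit(lambda: c["num8"], number=200_000)
print("attribute: %.4f s, key: %.4f s" % (t_attr, t_key))
```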


Empirical evidence
---------------------

The "shortcuts" described above may not seem like much, but they account for a large portion of the actual run time: about a third (31%) of the entire run time. Note that this benchmark includes only pure-Python, compile-time optimisations.

Notice that results are in microseconds (10**-6).

::

    -------------------------------- benchmark: 158 tests --------------------------------
    Name (time in us)                                  Min                StdDev          
    --------------------------------------------------------------------------------------
    test_class_array_parse                        284.7820 (74.05)       31.0403 (118.46) 
    test_class_array_parse_compiled                73.6430 (19.15)       10.7624 (41.07)  
    test_class_greedyrange_parse                  325.6610 (84.67)       31.8383 (121.50) 
    test_class_greedyrange_parse_compiled         300.9270 (78.24)       24.0149 (91.65)  
    test_class_repeatuntil_parse                   10.2730 (2.67)         0.8322 (3.18)   
    test_class_repeatuntil_parse_compiled           7.3020 (1.90)         1.3155 (5.02)   
    test_class_string_parse                        21.2270 (5.52)         1.3555 (5.17)   
    test_class_string_parse_compiled               18.9030 (4.91)         1.6023 (6.11)   
    test_class_cstring_parse                       10.9060 (2.84)         1.0971 (4.19)   
    test_class_cstring_parse_compiled               9.4050 (2.45)         1.6083 (6.14)   
    test_class_pascalstring_parse                   7.9290 (2.06)         0.4959 (1.89)   
    test_class_pascalstring_parse_compiled          6.6670 (1.73)         0.6601 (2.52)   
    test_class_struct_parse                        43.5890 (11.33)        4.4993 (17.17)  
    test_class_struct_parse_compiled               18.7370 (4.87)         2.0198 (7.71)   
    test_class_sequence_parse                      20.7810 (5.40)         2.6298 (10.04)  
    test_class_sequence_parse_compiled             11.9820 (3.12)         3.2669 (12.47)  
    test_class_union_parse                         91.0570 (23.68)       10.2126 (38.97)  
    test_class_union_parse_compiled                31.9240 (8.30)         3.5955 (13.72)  
    test_overall_parse                          3,200.7850 (832.23)     224.9197 (858.34) 
    test_overall_parse_compiled                 2,229.9610 (579.81)     118.2029 (451.09) 
    --------------------------------------------------------------------------------------
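
The relative gains can be computed directly from the Min column above. This is plain arithmetic on the published numbers; no new measurements are involved. By this measure the overall parse time drops by roughly 30%, close to the one-third figure quoted earlier (the exact percentage depends on which statistic is compared).

```python
# Min times in microseconds, copied from the table above: (core, compiled).
timings = {
    "array":   (284.7820, 73.6430),
    "struct":  (43.5890, 18.7370),
    "union":   (91.0570, 31.9240),
    "overall": (3200.7850, 2229.9610),
}

for name, (core, compiled) in timings.items():
    speedup = core / compiled        # how many times faster
    saved = 1 - compiled / core      # fraction of run time removed
    print("%-8s %5.2fx faster, %4.1f%% less time" % (name, speedup, saved * 100))
# e.g. overall: about 1.44x faster, about 30% less time
```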

..
    -------------------------------- benchmark: 158 tests --------------------------------
    Name (time in us)                                  Min                StdDev          
    --------------------------------------------------------------------------------------
    test_class_aligned_build                        7.8420 (2.04)         0.8678 (3.31)   
    test_class_aligned_parse                        6.6060 (1.72)         0.6813 (2.60)   
    test_class_aligned_parse_compiled               5.3540 (1.39)         1.4117 (5.39)   
    test_class_array_build                        326.6060 (84.92)       38.4864 (146.87) 
    test_class_array_parse                        284.7820 (74.05)       31.0403 (118.46) 
    test_class_array_parse_compiled                73.6430 (19.15)       10.7624 (41.07)  
    test_class_bitsinteger_build                   19.5040 (5.07)         0.9291 (3.55)   
    test_class_bitsinteger_parse                   19.2790 (5.01)         3.8293 (14.61)  
    test_class_bitsinteger_parse_compiled          17.9910 (4.68)         4.5695 (17.44)  
    test_class_bitsswapped1_build                  20.2650 (5.27)         2.7666 (10.56)  
    test_class_bitsswapped1_parse                  18.8030 (4.89)         3.6720 (14.01)  
    test_class_bitsswapped1_parse_compiled         18.3760 (4.78)         3.1836 (12.15)  
    test_class_bitsswapped2_build                 860.2690 (223.68)      65.2748 (249.10) 
    test_class_bitsswapped2_parse                 810.8180 (210.82)     113.5936 (433.50) 
    test_class_bitwise1_build                      38.3340 (9.97)         2.8267 (10.79)  
    test_class_bitwise1_parse                      19.0340 (4.95)         1.6937 (6.46)   
    test_class_bitwise1_parse_compiled             18.3380 (4.77)         1.9169 (7.32)   
    test_class_bitwise2_build                   5,181.2200 (>1000.0)    176.1713 (672.30) 
    test_class_bitwise2_parse                   4,641.4420 (>1000.0)    149.0798 (568.92) 
    test_class_bytes_build                          5.2700 (1.37)         0.3894 (1.49)   
    test_class_bytes_parse                          4.3720 (1.14)         0.2620 (1.0)    
    test_class_bytes_parse_compiled                 4.3770 (1.14)         0.4845 (1.85)   
    test_class_bytesinteger_build                   7.1130 (1.85)         0.5597 (2.14)   
    test_class_bytesinteger_parse                   6.1550 (1.60)         0.8879 (3.39)   
    test_class_bytesinteger_parse_compiled          5.9690 (1.55)         0.8120 (3.10)   
    test_class_byteswapped1_build                   7.8880 (2.05)         1.6156 (6.17)   
    test_class_byteswapped1_parse                   6.6990 (1.74)         1.4248 (5.44)   
    test_class_byteswapped1_parse_compiled          5.8140 (1.51)         1.0893 (4.16)   
    test_class_bytewise1_build                     54.3910 (14.14)        3.5353 (13.49)  
    test_class_bytewise1_parse                     51.2590 (13.33)        4.9621 (18.94)  
    test_class_bytewise1_parse_compiled            51.1530 (13.30)        5.0922 (19.43)  
    test_class_bytewise2_build                  1,264.2500 (328.72)      76.9591 (293.69) 
    test_class_bytewise2_parse                  1,233.1150 (320.62)      65.5335 (250.09) 
    test_class_check_build                          7.7850 (2.02)         0.9710 (3.71)   
    test_class_check_parse                          7.5500 (1.96)         1.0495 (4.01)   
    test_class_check_parse_compiled                 5.7900 (1.51)         0.7776 (2.97)   
    test_class_computed_build                       6.7760 (1.76)         0.6328 (2.41)   
    test_class_computed_parse                       6.5940 (1.71)         0.6383 (2.44)   
    test_class_computed_parse_compiled              6.7670 (1.76)         0.7396 (2.82)   
    test_class_const_build                          5.8600 (1.52)         0.6461 (2.47)   
    test_class_const_parse                          4.8930 (1.27)         0.3691 (1.41)   
    test_class_const_parse_compiled                 4.6680 (1.21)         0.6549 (2.50)   
    test_class_cstring_build                        7.7910 (2.03)        32.0498 (122.31) 
    test_class_cstring_parse                       10.9060 (2.84)         1.0971 (4.19)   
    test_class_cstring_parse_compiled               9.4050 (2.45)         1.6083 (6.14)   
    test_class_default_build                        5.8910 (1.53)         0.7784 (2.97)   
    test_class_default_parse                        5.0430 (1.31)         0.5048 (1.93)   
    test_class_default_parse_compiled               4.7200 (1.23)         0.7015 (2.68)   
    test_class_enum_build                           6.4310 (1.67)         0.4820 (1.84)   
    test_class_enum_parse                           6.4100 (1.67)         0.2944 (1.12)   
    test_class_enum_parse_compiled                  4.9280 (1.28)         0.5852 (2.23)   
    test_class_flag_build                           4.7740 (1.24)         0.5016 (1.91)   
    test_class_flag_parse                           4.2450 (1.10)         0.8202 (3.13)   
    test_class_flag_parse_compiled                  4.4510 (1.16)         0.7262 (2.77)   
    test_class_flagsenum_build                      9.5940 (2.49)         2.3077 (8.81)   
    test_class_flagsenum_parse                     14.9890 (3.90)         1.1867 (4.53)   
    test_class_flagsenum_parse_compiled            12.5860 (3.27)         7.8440 (29.93)  
    test_class_focusedseq_build                    27.4290 (7.13)         3.5810 (13.67)  
    test_class_focusedseq_parse                    23.9230 (6.22)         2.9801 (11.37)  
    test_class_focusedseq_parse_compiled           11.4680 (2.98)         1.8008 (6.87)   
    test_class_formatfield_build                    5.3830 (1.40)         0.3952 (1.51)   
    test_class_formatfield_parse                    4.7820 (1.24)         0.3797 (1.45)   
    test_class_formatfield_parse_compiled           4.7870 (1.24)         0.7985 (3.05)   
    test_class_greedybytes_build                    3.9610 (1.03)         0.5677 (2.17)   
    test_class_greedybytes_parse                    3.8460 (1.0)          0.3800 (1.45)   
    test_class_greedybytes_parse_compiled           3.9150 (1.02)         0.4162 (1.59)   
    test_class_greedyrange_build                  328.9710 (85.54)       17.5818 (67.10)  
    test_class_greedyrange_parse                  325.6610 (84.67)       31.8383 (121.50) 
    test_class_greedyrange_parse_compiled         300.9270 (78.24)       24.0149 (91.65)  
    test_class_greedystring_build                   5.3440 (1.39)         0.6892 (2.63)   
    test_class_greedystring_parse                   5.0730 (1.32)         0.9543 (3.64)   
    test_class_greedystring_parse_compiled          4.5540 (1.18)         0.5366 (2.05)   
    test_class_hex_build                            4.6150 (1.20)         0.5106 (1.95)   
    test_class_hex_parse                            5.2830 (1.37)         0.8942 (3.41)   
    test_class_hex_parse_compiled                   3.9050 (1.02)         0.6158 (2.35)   
    test_class_hexdump_build                        4.6340 (1.20)         0.8433 (3.22)   
    test_class_hexdump_parse                        5.0960 (1.33)         1.0297 (3.93)   
    test_class_hexdump_parse_compiled               3.9120 (1.02)         0.7631 (2.91)   
    test_class_ifthenelse_build                     8.9100 (2.32)         0.9234 (3.52)   
    test_class_ifthenelse_parse                     8.3680 (2.18)         0.7548 (2.88)   
    test_class_ifthenelse_parse_compiled            6.7390 (1.75)         0.7323 (2.79)   
    test_class_mapping_build                        6.3000 (1.64)         0.9057 (3.46)   
    test_class_mapping_parse                        5.6000 (1.46)         1.6992 (6.48)   
    test_class_mapping_parse_compiled               4.9730 (1.29)         0.6396 (2.44)   
    test_class_namedtuple1_build                   18.0560 (4.69)         2.1252 (8.11)   
    test_class_namedtuple1_parse                   16.8770 (4.39)         2.5048 (9.56)   
    test_class_namedtuple1_parse_compiled           9.0800 (2.36)         1.3966 (5.33)   
    test_class_namedtuple2_build                   46.3020 (12.04)        4.8023 (18.33)  
    test_class_namedtuple2_parse                   34.1590 (8.88)         3.9813 (15.19)  
    test_class_namedtuple2_parse_compiled          16.1740 (4.21)         2.1471 (8.19)   
    test_class_numpy_build                        212.2070 (55.18)       19.0170 (72.57)  
    test_class_numpy_parse                        287.4910 (74.75)    1,033.8723 (>1000.0)
    test_class_numpy_parse_compiled               289.1160 (75.17)       31.5770 (120.50) 
    test_class_padded_build                         7.6610 (1.99)         1.0465 (3.99)   
    test_class_padded_parse                         6.5550 (1.70)         0.8192 (3.13)   
    test_class_padded_parse_compiled                5.3810 (1.40)         0.6683 (2.55)   
    test_class_padding_build                        6.1410 (1.60)         0.4382 (1.67)   
    test_class_padding_parse                        5.3390 (1.39)         0.3259 (1.24)   
    test_class_padding_parse_compiled               4.5490 (1.18)         0.6567 (2.51)   
    test_class_pascalstring_build                   9.0730 (2.36)         0.6574 (2.51)   
    test_class_pascalstring_parse                   7.9290 (2.06)         0.4959 (1.89)   
    test_class_pascalstring_parse_compiled          6.6670 (1.73)         0.6601 (2.52)   
    test_class_peek_build                          14.8610 (3.86)         1.5169 (5.79)   
    test_class_peek_parse                          19.3210 (5.02)         1.7638 (6.73)   
    test_class_peek_parse_compiled                 11.9050 (3.10)         1.2330 (4.71)   
    test_class_pickled_build                        5.5730 (1.45)         0.8605 (3.28)   
    test_class_pickled_parse                        8.1680 (2.12)         0.8642 (3.30)   
    test_class_pickled_parse_compiled               8.9110 (2.32)         1.5638 (5.97)   
    test_class_pointer_build                        7.2010 (1.87)         0.3975 (1.52)   
    test_class_pointer_parse                        6.3530 (1.65)         0.6129 (2.34)   
    test_class_pointer_parse_compiled               5.7300 (1.49)         0.6892 (2.63)   
    test_class_prefixed_build                       7.8600 (2.04)         0.4987 (1.90)   
    test_class_prefixed_parse                       6.8100 (1.77)         0.7110 (2.71)   
    test_class_prefixed_parse_compiled              6.1950 (1.61)         0.6435 (2.46)   
    test_class_prefixedarray_build                855.3260 (222.39)      55.4369 (211.56) 
    test_class_prefixedarray_parse                757.6910 (197.01)      49.8982 (190.42) 
    test_class_prefixedarray_parse_compiled       184.4760 (47.97)       14.9617 (57.10)  
    test_class_rawcopy_build1                      13.3870 (3.48)         2.1631 (8.25)   
    test_class_rawcopy_build2                      16.8280 (4.38)         3.4464 (13.15)  
    test_class_rawcopy_parse                       14.4990 (3.77)         1.3540 (5.17)   
    test_class_rawcopy_parse_compiled              14.9130 (3.88)         4.8756 (18.61)  
    test_class_rebuild_build                        5.8890 (1.53)         0.5504 (2.10)   
    test_class_rebuild_parse                        5.0030 (1.30)         0.6272 (2.39)   
    test_class_rebuild_parse_compiled               4.8300 (1.26)         0.5108 (1.95)   
    test_class_repeatuntil_build                   11.1090 (2.89)         0.8754 (3.34)   
    test_class_repeatuntil_parse                   10.2730 (2.67)         0.8322 (3.18)   
    test_class_repeatuntil_parse_compiled           7.3020 (1.90)         1.3155 (5.02)   
    test_class_select_build                        19.3270 (5.03)         2.1872 (8.35)   
    test_class_select_parse                         5.5500 (1.44)         0.5927 (2.26)   
    test_class_select_parse_compiled                5.9140 (1.54)         0.9409 (3.59)   
    test_class_sequence_build                      23.9440 (6.23)         3.7300 (14.23)  
    test_class_sequence_parse                      20.7810 (5.40)         2.6298 (10.04)  
    test_class_sequence_parse_compiled             11.9820 (3.12)         3.2669 (12.47)  
    test_class_string_build                         8.4160 (2.19)         0.5589 (2.13)   
    test_class_string_parse                        21.2270 (5.52)         1.3555 (5.17)   
    test_class_string_parse_compiled               18.9030 (4.91)         1.6023 (6.11)   
    test_class_struct_build                        49.0800 (12.76)        3.9414 (15.04)  
    test_class_struct_parse                        43.5890 (11.33)        4.4993 (17.17)  
    test_class_struct_parse_compiled               18.7370 (4.87)         2.0198 (7.71)   
    test_class_switch_build                         9.2500 (2.41)         0.4969 (1.90)   
    test_class_switch_parse                         8.4710 (2.20)         0.7958 (3.04)   
    test_class_switch_parse_compiled                7.1160 (1.85)         0.7794 (2.97)   
    test_class_timestamp1_build                     9.7510 (2.54)         1.0072 (3.84)   
    test_class_timestamp1_parse                    29.7140 (7.73)         2.7236 (10.39)  
    test_class_timestamp1_parse_compiled           30.2160 (7.86)         3.5592 (13.58)  
    test_class_timestamp2_build                   100.4570 (26.12)       15.4131 (58.82)  
    test_class_timestamp2_parse                   106.5390 (27.70)       12.0199 (45.87)  
    test_class_timestamp2_parse_compiled          107.6340 (27.99)       17.3917 (66.37)  
    test_class_union_build                         55.8850 (14.53)        6.5646 (25.05)  
    test_class_union_parse                         91.0570 (23.68)       10.2126 (38.97)  
    test_class_union_parse_compiled                31.9240 (8.30)         3.5955 (13.72)  
    test_class_varint_build                        14.9650 (3.89)         0.8179 (3.12)   
    test_class_varint_parse                        18.6660 (4.85)         1.6747 (6.39)   
    test_class_varint_parse_compiled               19.6660 (5.11)         5.0212 (19.16)  
    test_overall_build                          2,848.2370 (740.57)   5,609.2037 (>1000.0)
    test_overall_build_compiled                 2,852.9260 (741.79)     163.0128 (622.09) 
    test_overall_parse                          3,200.7850 (832.23)     224.9197 (858.34) 
    test_overall_parse_compiled                 2,229.9610 (579.81)     118.2029 (451.09) 
    --------------------------------------------------------------------------------------


Motivation, part 2
=====================

The second part of the optimisation is simply running the generated code on PyPy. Since PyPy does not use any type annotations, there is nothing to discuss in this chapter. The benchmark reflects the same code as in the previous chapter, but run on PyPy 2.7 rather than CPython 3.6.

Empirical evidence
---------------------

Notice that results are in nanoseconds (10**-9).

::

    ------------------------------------- benchmark: 152 tests ------------------------------------
    Name (time in ns)                                      Min                     StdDev          
    -----------------------------------------------------------------------------------------------
    test_class_array_parse                         11,042.9974 (103.52)       40,792.8559 (46.97)  
    test_class_array_parse_compiled                 9,088.0058 (85.20)        43,001.3909 (49.52)  
    test_class_greedyrange_parse                   14,402.0014 (135.01)       49,834.2047 (57.38)  
    test_class_greedyrange_parse_compiled           9,801.0059 (91.88)        39,296.4529 (45.25)  
    test_class_repeatuntil_parse                      318.4996 (2.99)          2,469.5524 (2.84)   
    test_class_repeatuntil_parse_compiled             309.3746 (2.90)        103,425.2134 (119.09) 
    test_class_string_parse                           966.8991 (9.06)        537,241.0095 (618.62) 
    test_class_string_parse_compiled                  726.6994 (6.81)          3,719.2657 (4.28)   
    test_class_cstring_parse                          782.2993 (7.33)          4,111.8970 (4.73)   
    test_class_cstring_parse_compiled                 591.1992 (5.54)        479,164.9746 (551.75) 
    test_class_pascalstring_parse                     465.0911 (4.36)          4,262.4397 (4.91)   
    test_class_pascalstring_parse_compiled            298.4118 (2.80)        122,279.2150 (140.80) 
    test_class_struct_parse                         2,633.9985 (24.69)        14,654.3095 (16.87)  
    test_class_struct_parse_compiled                  949.7991 (8.90)          4,228.2890 (4.87)   
    test_class_sequence_parse                       1,310.6008 (12.29)         5,811.8046 (6.69)   
    test_class_sequence_parse_compiled                732.2000 (6.86)          4,703.9483 (5.42)   
    test_class_union_parse                          5,619.9933 (52.69)        30,590.0630 (35.22)  
    test_class_union_parse_compiled                 2,699.9987 (25.31)        15,888.8206 (18.30)  
    test_overall_parse                          1,332,581.9891 (>1000.0)   2,274,995.4192 (>1000.0)
    test_overall_parse_compiled                   690,380.0095 (>1000.0)     602,697.9721 (694.00) 
    -----------------------------------------------------------------------------------------------
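
The numbers in parentheses are pytest-benchmark's ratios to the fastest result in each column across the whole run, which includes tests not listed here. They are not compiled-versus-interpreted speedups. To get the speedup of a compiled parser, divide its "Min" value into the "Min" value of the matching non-compiled test. The following sketch does that arithmetic for a few rows copied from the table above. "MIN_NS" and "speedup" are illustrative names and are not part of construct.

::

    # Minimum times in nanoseconds, copied from the PyPy table above:
    # (non-compiled, compiled) for a few of the listed tests.
    MIN_NS = {
        "struct_parse": (2633.9985, 949.7991),
        "union_parse": (5619.9933, 2699.9987),
        "overall_parse": (1332581.9891, 690380.0095),
    }

    def speedup(plain_ns, compiled_ns):
        # How many times faster the compiled parser is than the interpreted one.
        return plain_ns / compiled_ns

    for name, (plain, compiled) in MIN_NS.items():
        # %-formatting keeps this runnable on both Python 2 (PyPy 2.7) and 3.
        print("%s: %.2fx" % (name, speedup(plain, compiled)))

For these rows, compiling makes parsing about 2.77 times faster for Struct, 2.08 times faster for Union, and 1.93 times faster overall on PyPy.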

..
    ------------------------------------- benchmark: 152 tests ------------------------------------
    Name (time in ns)                                      Min                     StdDev          
    -----------------------------------------------------------------------------------------------
    test_class_aligned_build                          740.5994 (6.94)          4,143.5039 (4.77)   
    test_class_aligned_parse                          602.1000 (5.64)          4,001.4447 (4.61)   
    test_class_aligned_parse_compiled                 237.5240 (2.23)        233,368.4415 (268.72) 
    test_class_array_build                         12,085.9913 (113.30)    4,199,133.4429 (>1000.0)
    test_class_array_parse                         11,042.9974 (103.52)       40,792.8559 (46.97)  
    test_class_array_parse_compiled                 9,088.0058 (85.20)        43,001.3909 (49.52)  
    test_class_bitsinteger_build                    3,602.4940 (33.77)     1,177,244.9019 (>1000.0)
    test_class_bitsinteger_parse                    2,823.5008 (26.47)        14,156.0060 (16.30)  
    test_class_bitsinteger_parse_compiled           2,768.9966 (25.96)        14,832.6464 (17.08)  
    test_class_bitsswapped1_build                   5,726.9935 (53.69)        29,157.1889 (33.57)  
    test_class_bitsswapped1_parse                   6,172.9952 (57.87)        28,735.2233 (33.09)  
    test_class_bitsswapped1_parse_compiled          5,715.9923 (53.59)        26,115.4525 (30.07)  
    test_class_bitsswapped2_build                  38,265.0032 (358.72)       92,216.9408 (106.19) 
    test_class_bitsswapped2_parse                  36,199.9992 (339.36)       99,672.2831 (114.77) 
    test_class_bitwise1_build                       7,979.0043 (74.80)        18,320.0158 (21.10)  
    test_class_bitwise1_parse                       5,914.0002 (55.44)        15,593.2498 (17.96)  
    test_class_bitwise1_parse_compiled              5,969.9960 (55.97)        10,953.7787 (12.61)  
    test_class_bitwise2_build                     136,212.0092 (>1000.0)     126,711.5616 (145.91) 
    test_class_bitwise2_parse                     120,290.0021 (>1000.0)     100,256.6237 (115.44) 
    test_class_bytes_build                            106.6699 (1.0)          45,663.4740 (52.58)  
    test_class_bytes_parse                            166.0601 (1.56)         26,090.0331 (30.04)  
    test_class_bytes_parse_compiled                   172.6300 (1.62)         38,715.3059 (44.58)  
    test_class_bytesinteger_build                     440.4998 (4.13)          2,794.5403 (3.22)   
    test_class_bytesinteger_parse                     397.6915 (3.73)          2,760.2520 (3.18)   
    test_class_bytesinteger_parse_compiled            404.1537 (3.79)        314,221.4811 (361.82) 
    test_class_byteswapped1_build                     423.0011 (3.97)        439,883.6772 (506.52) 
    test_class_byteswapped1_parse                     700.1989 (6.56)          5,650.5263 (6.51)   
    test_class_byteswapped1_parse_compiled            467.4551 (4.38)        375,681.4718 (432.59) 
    test_class_bytewise1_build                     13,313.0088 (124.81)       40,142.8640 (46.22)  
    test_class_bytewise1_parse                     13,626.0060 (127.74)    2,380,928.9149 (>1000.0)
    test_class_bytewise1_parse_compiled            13,586.0028 (127.36)       35,062.2700 (40.37)  
    test_class_bytewise2_build                     72,109.9932 (676.01)       73,553.4202 (84.70)  
    test_class_bytewise2_parse                     66,791.9958 (626.16)      140,635.6099 (161.94) 
    test_class_check_build                            740.6998 (6.94)          4,307.2706 (4.96)   
    test_class_check_parse                            541.0999 (5.07)          3,440.5007 (3.96)   
    test_class_check_parse_compiled                   545.6997 (5.12)        679,945.6527 (782.95) 
    test_class_computed_build                         679.1000 (6.37)        605,315.9050 (697.01) 
    test_class_computed_parse                         526.0008 (4.93)          3,428.9984 (3.95)   
    test_class_computed_parse_compiled                552.2001 (5.18)          3,464.2913 (3.99)   
    test_class_const_build                            310.6879 (2.91)          2,745.9160 (3.16)   
    test_class_const_parse                            176.2500 (1.65)         79,386.8928 (91.41)  
    test_class_const_parse_compiled                   182.1501 (1.71)         94,547.7996 (108.87) 
    test_class_cstring_build                          491.0001 (4.60)          3,734.7308 (4.30)   
    test_class_cstring_parse                          782.2993 (7.33)          4,111.8970 (4.73)   
    test_class_cstring_parse_compiled                 591.1992 (5.54)        479,164.9746 (551.75) 
    test_class_default_build                          461.9995 (4.33)          3,437.9897 (3.96)   
    test_class_default_parse                          220.9200 (2.07)            875.7176 (1.01)   
    test_class_default_parse_compiled                 167.3000 (1.57)        115,216.5525 (132.67) 
    test_class_enum_build                             318.2495 (2.98)        329,774.1824 (379.73) 
    test_class_enum_parse                             216.3301 (2.03)         98,506.1576 (113.43) 
    test_class_enum_parse_compiled                    150.8200 (1.41)         56,082.0649 (64.58)  
    test_class_flag_build                             204.2799 (1.92)        130,206.5059 (149.93) 
    test_class_flag_parse                             153.9801 (1.44)        100,694.1426 (115.95) 
    test_class_flag_parse_compiled                    139.8900 (1.31)            868.4449 (1.0)    
    test_class_flagsenum_build                        573.3993 (5.38)          4,344.7692 (5.00)   
    test_class_flagsenum_parse                        652.1004 (6.11)        422,339.3586 (486.32) 
    test_class_flagsenum_parse_compiled               464.5461 (4.35)          3,596.9171 (4.14)   
    test_class_focusedseq_build                     2,233.9998 (20.94)         6,533.8875 (7.52)   
    test_class_focusedseq_parse                     1,345.1005 (12.61)         5,739.1458 (6.61)   
    test_class_focusedseq_parse_compiled              615.0003 (5.77)          3,967.2471 (4.57)   
    test_class_formatfield_build                      282.0557 (2.64)        286,541.4444 (329.95) 
    test_class_formatfield_parse                      237.0500 (2.22)         63,666.5654 (73.31)  
    test_class_formatfield_parse_compiled             154.2599 (1.45)         35,054.4102 (40.36)  
    test_class_greedybytes_build                      110.4000 (1.03)         89,466.1548 (103.02) 
    test_class_greedybytes_parse                      117.2700 (1.10)         94,205.4030 (108.48) 
    test_class_greedybytes_parse_compiled             118.3101 (1.11)         88,084.6992 (101.43) 
    test_class_greedyrange_build                   12,186.0066 (114.24)       37,782.4850 (43.51)  
    test_class_greedyrange_parse                   14,402.0014 (135.01)       49,834.2047 (57.38)  
    test_class_greedyrange_parse_compiled           9,801.0059 (91.88)        39,296.4529 (45.25)  
    test_class_greedystring_build                     348.3331 (3.27)          3,029.8253 (3.49)   
    test_class_greedystring_parse                     473.3645 (4.44)          3,041.7270 (3.50)   
    test_class_greedystring_parse_compiled            409.9241 (3.84)        387,658.3773 (446.38) 
    test_class_hex_build                              459.6355 (4.31)          4,006.9444 (4.61)   
    test_class_hex_parse                              291.4441 (2.73)        182,038.6025 (209.61) 
    test_class_hex_parse_compiled                     126.4800 (1.19)         84,815.3901 (97.66)  
    test_class_hexdump_build                          450.4157 (4.22)          3,790.8239 (4.37)   
    test_class_hexdump_parse                          284.8335 (2.67)        294,559.8261 (339.18) 
    test_class_hexdump_parse_compiled                 128.8101 (1.21)         78,435.0791 (90.32)  
    test_class_ifthenelse_build                       982.9993 (9.22)          4,688.0488 (5.40)   
    test_class_ifthenelse_parse                       851.1997 (7.98)        580,777.8856 (668.76) 
    test_class_ifthenelse_parse_compiled              733.0003 (6.87)          4,714.3734 (5.43)   
    test_class_mapping_build                          336.3336 (3.15)        419,990.5974 (483.61) 
    test_class_mapping_parse                          226.8000 (2.13)        111,247.9039 (128.10) 
    test_class_mapping_parse_compiled                 184.2000 (1.73)            872.1972 (1.00)   
    test_class_namedtuple1_build                      918.4005 (8.61)          3,765.2820 (4.34)   
    test_class_namedtuple1_parse                      673.6998 (6.32)          3,434.7049 (3.96)   
    test_class_namedtuple1_parse_compiled             610.4994 (5.72)        551,488.8854 (635.03) 
    test_class_namedtuple2_build                    3,212.0006 (30.11)        13,384.9602 (15.41)  
    test_class_namedtuple2_parse                    1,786.3000 (16.75)         4,818.3417 (5.55)   
    test_class_namedtuple2_parse_compiled             728.0993 (6.83)          3,332.2180 (3.84)   
    test_class_padded_build                           732.6991 (6.87)          3,967.5355 (4.57)   
    test_class_padded_parse                           583.3004 (5.47)          4,356.6780 (5.02)   
    test_class_padded_parse_compiled                  301.4703 (2.83)        305,922.3763 (352.26) 
    test_class_padding_build                          499.1823 (4.68)          3,525.5175 (4.06)   
    test_class_padding_parse                          350.1996 (3.28)        328,502.3785 (378.27) 
    test_class_padding_parse_compiled                 192.7000 (1.81)         82,517.9180 (95.02)  
    test_class_pascalstring_build                     483.4543 (4.53)        243,109.6546 (279.94) 
    test_class_pascalstring_parse                     465.0911 (4.36)          4,262.4397 (4.91)   
    test_class_pascalstring_parse_compiled            298.4118 (2.80)        122,279.2150 (140.80) 
    test_class_peek_build                             952.7997 (8.93)          6,047.5404 (6.96)   
    test_class_peek_parse                           1,454.3999 (13.63)       774,202.5660 (891.48) 
    test_class_peek_parse_compiled                    438.8183 (4.11)          3,811.7552 (4.39)   
    test_class_pointer_build                          576.9005 (5.41)          3,782.3046 (4.36)   
    test_class_pointer_parse                          377.6430 (3.54)        393,433.4406 (453.03) 
    test_class_pointer_parse_compiled                 210.3799 (1.97)            947.6097 (1.09)   
    test_class_prefixed_build                         888.7000 (8.33)          5,004.2176 (5.76)   
    test_class_prefixed_parse                         757.0008 (7.10)        524,495.2616 (603.95) 
    test_class_prefixed_parse_compiled                471.9080 (4.42)        439,226.7896 (505.76) 
    test_class_prefixedarray_build                 37,869.9915 (355.02)       59,808.3893 (68.87)  
    test_class_prefixedarray_parse                 29,731.0035 (278.72)   10,591,190.0651 (>1000.0)
    test_class_prefixedarray_parse_compiled        22,710.9995 (212.91)       65,049.0162 (74.90)  
    test_class_rawcopy_build1                       1,041.5999 (9.76)          5,312.0368 (6.12)   
    test_class_rawcopy_build2                       1,513.5010 (14.19)       931,668.4553 (>1000.0)
    test_class_rawcopy_parse                        1,064.9004 (9.98)          5,628.3455 (6.48)   
    test_class_rawcopy_parse_compiled                 669.7999 (6.28)          4,616.0835 (5.32)   
    test_class_rebuild_build                          409.5006 (3.84)          3,371.2846 (3.88)   
    test_class_rebuild_parse                          225.8090 (2.12)          1,961.0702 (2.26)   
    test_class_rebuild_parse_compiled                 164.7700 (1.54)         82,487.8733 (94.98)  
    test_class_repeatuntil_build                      475.6360 (4.46)          3,568.2374 (4.11)   
    test_class_repeatuntil_parse                      318.4996 (2.99)          2,469.5524 (2.84)   
    test_class_repeatuntil_parse_compiled             309.3746 (2.90)        103,425.2134 (119.09) 
    test_class_select_build                         7,528.9863 (70.58)        23,358.3203 (26.90)  
    test_class_select_parse                           395.7684 (3.71)        468,021.0341 (538.92) 
    test_class_select_parse_compiled                  194.6000 (1.82)            911.6117 (1.05)   
    test_class_sequence_build                       1,521.9004 (14.27)         6,600.0406 (7.60)   
    test_class_sequence_parse                       1,310.6008 (12.29)         5,811.8046 (6.69)   
    test_class_sequence_parse_compiled                732.2000 (6.86)          4,703.9483 (5.42)   
    test_class_string_build                           535.1001 (5.02)        289,163.7688 (332.97) 
    test_class_string_parse                           966.8991 (9.06)        537,241.0095 (618.62) 
    test_class_string_parse_compiled                  726.6994 (6.81)          3,719.2657 (4.28)   
    test_class_struct_build                         2,857.5014 (26.79)        16,764.1319 (19.30)  
    test_class_struct_parse                         2,633.9985 (24.69)        14,654.3095 (16.87)  
    test_class_struct_parse_compiled                  949.7991 (8.90)          4,228.2890 (4.87)   
    test_class_switch_build                         1,079.1002 (10.12)         4,754.6705 (5.47)   
    test_class_switch_parse                           948.8998 (8.90)          4,558.0161 (5.25)   
    test_class_switch_parse_compiled                  783.7996 (7.35)          4,640.9683 (5.34)   
    test_class_timestamp1_build                       771.2006 (7.23)          3,534.5051 (4.07)   
    test_class_timestamp1_parse                     2,018.1993 (18.92)         5,448.9309 (6.27)   
    test_class_timestamp1_parse_compiled            1,970.7004 (18.47)       891,363.4033 (>1000.0)
    test_class_timestamp2_build                     5,808.9936 (54.46)        28,921.4390 (33.30)  
    test_class_timestamp2_parse                     7,547.0016 (70.75)        38,718.9886 (44.58)  
    test_class_timestamp2_parse_compiled            7,391.9946 (69.30)        36,903.9105 (42.49)  
    test_class_union_build                          3,535.9990 (33.15)        17,829.5208 (20.53)  
    test_class_union_parse                          5,619.9933 (52.69)        30,590.0630 (35.22)  
    test_class_union_parse_compiled                 2,699.9987 (25.31)        15,888.8206 (18.30)  
    test_class_varint_build                           944.5997 (8.86)          5,002.7418 (5.76)   
    test_class_varint_parse                           861.3002 (8.07)          4,343.2995 (5.00)   
    test_class_varint_parse_compiled                  863.2996 (8.09)          4,426.6909 (5.10)   
    test_overall_build                            554,530.0082 (>1000.0)     475,067.7994 (547.03) 
    test_overall_build_compiled                   358,168.0066 (>1000.0)     127,081.1333 (146.33) 
    test_overall_parse                          1,332,581.9891 (>1000.0)   2,274,995.4192 (>1000.0)
    test_overall_parse_compiled                   690,380.0095 (>1000.0)     602,697.9721 (694.00) 
    -----------------------------------------------------------------------------------------------


Motivation, part 3
=====================

.. warning:: Benchmarks revealed that PyPy makes the code run much faster than Cython. The Cython improvements were therefore withdrawn, and the compiler now generates pure Python code that is compatible with Python 2, including PyPy. This chapter is no longer relevant and remains only for educational purposes.

This chapter covers the second half of optimisation, which comes from Cython type annotations and type inference. For the record, I am no expert in Cython, and the following explanations merely reflect "the way I understand it". Please take that into account when reading. Fourth example:

::

    Struct(
        "num1" / Int8ul,
        "num2" / Int24ul,
        "fixedarray1" / Array(3, Int8ul),
        "name1" / CString("utf8"),
    )

::

    cdef bytes read_bytes(io, int count):
        if not count >= 0: raise StreamError
        cdef bytes data = io.read(count)
        if not len(data) == count: raise StreamError
        return data
    cdef bytes parse_nullterminatedstring(io, int unitsize, bytes finalunit):
        cdef list result = []
        cdef bytes unit
        while True:
            unit = read_bytes(io, unitsize)
            if unit == finalunit:
                break
            result.append(unit)
        return b"".join(result)
    def parse_struct_1(io, this):
        this = Container(_ = this)
        try:
            this['num1'] = unpack('<B', read_bytes(io, 1))[0]
            this['num2'] = int.from_bytes(read_bytes(io, 3), byteorder='little', signed=False)
            this['fixedarray1'] = ListContainer((unpack('<B', read_bytes(io, 1))[0]) for i in range(3))
            this['name1'] = (parse_nullterminatedstring(io, 1, b'\x00')).decode('utf8')
            pass
        except StopIteration:
            pass
        del this['_']
        del this['_index']
        return this
    def parseall(io, this):
        return parse_struct_1(io, this)
    compiled = Compiled(None, None, parseall)
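
Because the compiler now emits pure Python instead (see the warning above), a plain-Python counterpart of the Cython code above, with no type declarations, helps show what the annotations were meant to speed up. This is a sketch, not construct's actual generated output. A plain dict stands in for construct's Container, the built-in ValueError and EOFError replace construct's StreamError, and the stream parameter is renamed from "io" to "stream" so it does not shadow the io module. The field layout follows the Struct above: one byte, a little-endian 24-bit integer, three single bytes, and a NUL-terminated UTF-8 string.

::

    import io
    import struct

    def read_bytes(stream, count):
        # Read exactly `count` bytes or fail (EOFError stands in for StreamError).
        if count < 0:
            raise ValueError("negative count")
        data = stream.read(count)
        if len(data) != count:
            raise EOFError("stream ended early")
        return data

    def parse_nullterminatedstring(stream, unitsize, finalunit):
        # Collect units until the terminator is read; the terminator is consumed.
        result = []
        while True:
            unit = read_bytes(stream, unitsize)
            if unit == finalunit:
                break
            result.append(unit)
        return b"".join(result)

    def parse_struct(stream):
        # A plain dict stands in for construct's Container in this sketch.
        this = {}
        this["num1"] = struct.unpack("<B", read_bytes(stream, 1))[0]
        this["num2"] = int.from_bytes(read_bytes(stream, 3), byteorder="little", signed=False)
        this["fixedarray1"] = [struct.unpack("<B", read_bytes(stream, 1))[0] for _ in range(3)]
        this["name1"] = parse_nullterminatedstring(stream, 1, b"\x00").decode("utf8")
        return this

    print(parse_struct(io.BytesIO(b"\x01\x02\x03\x04\x05\x06\x07abc\x00")))

The call at the end prints "{'num1': 1, 'num2': 262914, 'fixedarray1': [5, 6, 7], 'name1': 'abc'}". Every "len", "==" and "[0]" here goes through CPython's generic code paths, which check operand types at run time.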


The primary cause of speedup in Cython is this: if a variable is of a known type, then operations on that variable can skip certain checks; if it is a generic Python object, those checks have to be performed at run time. A variable is considered to be of known type if either (1) it is annotated, like "cdef bytes data", or (2) its type is inferred. An example of (2) is the result of an annotated function, as in "parse_nullterminatedstring(...).decode(...)", given "cdef bytes parse_nullterminatedstring(...)".

Some examples of what known types allow:

- If a variable is known to be a list, then calling "append" on it does not require checking whether the object has such a method or a matching signature (parameters).
- If a variable is known to be bytes, then "len(data)" can be compiled into the bytes-specific length function rather than a general-purpose one that works on arbitrary objects. Likewise, "unit == finalunit" can be compiled into bytes-specific equality, and ".decode('utf8')" into a bytes-specific decoding call.
- If Cython knows that "struct.unpack" returns only tuples, then "...[0]" compiles into tuple-specific getitem (index access).

There are many more examples, but the pattern is the same: type-specific code is faster than type-general code.
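
For contrast, plain CPython has no declared types to exploit. The following sketch uses the standard dis module to show that "len(data) == count" compiles to a generic call followed by a generic COMPARE_OP, whose operand types are checked only when the instruction executes. The function "length_matches" is a hypothetical helper mirroring the check in "read_bytes". The exact opcode list differs between CPython versions. CPython 3.11 and later can also specialize hot instructions at run time, which this static listing does not show.

::

    import dis

    def length_matches(data, count):
        # Hypothetical helper mirroring "len(data) == count" from read_bytes.
        return len(data) == count

    # Names of the bytecode instructions CPython compiled for the function body.
    opnames = [ins.opname for ins in dis.get_instructions(length_matches)]
    print(opnames)

Whatever the argument types are, the same COMPARE_OP instruction is emitted. Cython, when it knows the types, can instead emit a direct C comparison.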

The second cause of speedup is the special handling of integers. Most annotations, like "cdef bytes", refer to specific (albeit Python) types, but "cdef int" does not refer to any Python type at all. It represents a C integer, which is held on the stack or in registers, unlike Python objects, which are allocated on the heap. Operations on C integers are therefore much faster than operations on Python integers. In the example code, this affects "count >= 0" and "len(data) == count".
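
The difference in representation can be observed from Python itself. The following sketch assumes CPython. It compares the size of a C "int", as reported by ctypes, with the size of a Python int object, which carries an object header (reference count, type pointer) in addition to its digits. On a typical 64-bit CPython it prints "4 28", but both numbers are platform-dependent.

::

    import ctypes
    import sys

    # Size of a C "int" as the platform's C compiler defines it (commonly 4 bytes).
    c_int_size = ctypes.sizeof(ctypes.c_int)

    # Size of a Python int object, including its object header (CPython-specific).
    py_int_size = sys.getsizeof(1000)

    print(c_int_size, py_int_size)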


Empirical evidence
---------------------

The micro-benchmarks below show the difference between core classes and Cython-compiled classes. Only the classes with the highest performance boost are listed (they also happen to be the most important ones); some other classes show little speedup, and some show none.

Note that the results are in microseconds (10**-6 s).

::

    ------------------------------- benchmark: 152 tests -------------------------------
    Name (time in us)                                  Min              StdDev          
    ------------------------------------------------------------------------------------
    test_class_array_parse                        286.5460 (73.85)     42.8831 (89.84)  
    test_class_array_parse_compiled                30.7200 (7.92)       6.9577 (14.58)  
    test_class_greedyrange_parse                  320.9860 (82.73)     45.9480 (96.26)  
    test_class_greedyrange_parse_compiled         262.7010 (67.71)     36.4504 (76.36)  
    test_class_repeatuntil_parse                   10.1850 (2.63)       2.4147 (5.06)   
    test_class_repeatuntil_parse_compiled           6.8880 (1.78)       1.5471 (3.24)   
    test_class_string_parse                        20.4400 (5.27)       4.4044 (9.23)   
    test_class_string_parse_compiled                9.1470 (2.36)       2.2427 (4.70)   
    test_class_cstring_parse                       11.2290 (2.89)       1.6216 (3.40)   
    test_class_cstring_parse_compiled               5.6080 (1.45)       1.0321 (2.16)   
    test_class_pascalstring_parse                   7.8560 (2.02)       1.8567 (3.89)   
    test_class_pascalstring_parse_compiled          5.8910 (1.52)       0.9466 (1.98)   
    test_class_struct_parse                        44.1300 (11.37)      6.8434 (14.34)  
    test_class_struct_parse_compiled               16.9070 (4.36)       3.0500 (6.39)   
    test_class_sequence_parse                      21.5420 (5.55)       2.6852 (5.63)   
    test_class_sequence_parse_compiled             10.1530 (2.62)       2.1645 (4.53)   
    test_class_union_parse                         91.9150 (23.69)     10.7812 (22.59)  
    test_class_union_parse_compiled                22.5970 (5.82)      15.2649 (31.98)  
    test_overall_parse                          2,126.2570 (548.01)   255.0154 (534.27) 
    test_overall_parse_compiled                 1,124.9560 (289.94)   127.4730 (267.06) 
    ------------------------------------------------------------------------------------

..
    ------------------------------- benchmark: 152 tests -------------------------------
    Name (time in us)                                  Min              StdDev          
    ------------------------------------------------------------------------------------
    test_class_aligned_build                        7.8110 (2.01)       1.4475 (3.03)   
    test_class_aligned_parse                        6.7560 (1.74)       2.4557 (5.14)   
    test_class_aligned_parse_compiled               4.7080 (1.21)       1.0038 (2.10)   
    test_class_array_build                        331.7150 (85.49)     45.1915 (94.68)  
    test_class_array_parse                        286.5460 (73.85)     42.8831 (89.84)  
    test_class_array_parse_compiled                30.7200 (7.92)       6.9577 (14.58)  
    test_class_bitsinteger_build                   19.4150 (5.00)       6.0416 (12.66)  
    test_class_bitsinteger_parse                   19.2520 (4.96)       6.7657 (14.17)  
    test_class_bitsinteger_parse_compiled          17.4700 (4.50)      11.1148 (23.29)  
    test_class_bitsswapped1_build                  20.0300 (5.16)       3.5605 (7.46)   
    test_class_bitsswapped1_parse                  18.9740 (4.89)       3.1174 (6.53)   
    test_class_bitsswapped1_parse_compiled         17.4030 (4.49)       3.2099 (6.72)   
    test_class_bitsswapped2_build                 866.5650 (223.34)    99.0145 (207.44) 
    test_class_bitsswapped2_parse                 813.8270 (209.75)   104.6734 (219.29) 
    test_class_bitwise1_build                      38.7430 (9.99)       4.1560 (8.71)   
    test_class_bitwise1_parse                      18.8820 (4.87)       3.8922 (8.15)   
    test_class_bitwise1_parse_compiled             17.5770 (4.53)       2.1345 (4.47)   
    test_class_bitwise2_build                   5,249.8520 (>1000.0)  247.1093 (517.70) 
    test_class_bitwise2_parse                   4,650.4640 (>1000.0)  605.3646 (>1000.0)
    test_class_bytes_build                          5.3900 (1.39)       0.7781 (1.63)   
    test_class_bytes_parse                          4.4180 (1.14)       0.4773 (1.0)    
    test_class_bytes_parse_compiled                 4.0220 (1.04)       0.7253 (1.52)   
    test_class_bytesinteger_build                   7.1450 (1.84)       1.4272 (2.99)   
    test_class_bytesinteger_parse                   6.2820 (1.62)       1.4176 (2.97)   
    test_class_bytesinteger_parse_compiled          5.3420 (1.38)       1.8858 (3.95)   
    test_class_byteswapped1_build                   7.9820 (2.06)       1.5524 (3.25)   
    test_class_byteswapped1_parse                   6.6840 (1.72)       1.2694 (2.66)   
    test_class_byteswapped1_parse_compiled          4.9890 (1.29)       1.1038 (2.31)   
    test_class_bytewise1_build                     53.7710 (13.86)      5.8007 (12.15)  
    test_class_bytewise1_parse                     49.7540 (12.82)      7.8771 (16.50)  
    test_class_bytewise1_parse_compiled            48.5480 (12.51)      5.0040 (10.48)  
    test_class_bytewise2_build                  1,270.0850 (327.34)   116.3612 (243.78) 
    test_class_bytewise2_parse                  1,225.2780 (315.79)    99.7644 (209.01) 
    test_class_check_build                          7.9260 (2.04)       1.7875 (3.74)   
    test_class_check_parse                          7.7250 (1.99)       1.7400 (3.65)   
    test_class_check_parse_compiled                 5.8770 (1.51)       1.5456 (3.24)   
    test_class_computed_build                       6.9660 (1.80)       1.0798 (2.26)   
    test_class_computed_parse                       6.6770 (1.72)       1.6214 (3.40)   
    test_class_computed_parse_compiled              5.6290 (1.45)       0.9689 (2.03)   
    test_class_const_build                          5.9990 (1.55)       1.4849 (3.11)   
    test_class_const_parse                          4.8720 (1.26)       1.1863 (2.49)   
    test_class_const_parse_compiled                 4.2520 (1.10)       0.9856 (2.06)   
    test_class_cstring_build                        7.8570 (2.03)       1.2683 (2.66)   
    test_class_cstring_parse                       11.2290 (2.89)       1.6216 (3.40)   
    test_class_cstring_parse_compiled               5.6080 (1.45)       1.0321 (2.16)   
    test_class_default_build                        6.0770 (1.57)       1.2640 (2.65)   
    test_class_default_parse                        5.1160 (1.32)       1.1421 (2.39)   
    test_class_default_parse_compiled               4.4890 (1.16)       1.2474 (2.61)   
    test_class_enum_build                           6.3000 (1.62)       0.9694 (2.03)   
    test_class_enum_parse                           6.3900 (1.65)       0.9849 (2.06)   
    test_class_enum_parse_compiled                  4.5520 (1.17)       0.7292 (1.53)   
    test_class_flag_build                           4.7940 (1.24)       0.6771 (1.42)   
    test_class_flag_parse                           4.3500 (1.12)       0.6541 (1.37)   
    test_class_flag_parse_compiled                  4.1380 (1.07)       0.5723 (1.20)   
    test_class_flagsenum_build                      9.7270 (2.51)       1.1748 (2.46)   
    test_class_flagsenum_parse                     15.2000 (3.92)       2.1840 (4.58)   
    test_class_flagsenum_parse_compiled            11.6480 (3.00)       1.5491 (3.25)   
    test_class_focusedseq_build                    27.1080 (6.99)       6.3815 (13.37)  
    test_class_focusedseq_parse                    23.6720 (6.10)       3.4153 (7.16)   
    test_class_focusedseq_parse_compiled           10.7130 (2.76)       2.1026 (4.41)   
    test_class_formatfield_build                    5.3590 (1.38)       1.1223 (2.35)   
    test_class_formatfield_parse                    4.7750 (1.23)       0.8140 (1.71)   
    test_class_formatfield_parse_compiled           4.4370 (1.14)       0.9037 (1.89)   
    test_class_greedybytes_build                    4.0550 (1.05)       1.1607 (2.43)   
    test_class_greedybytes_parse                    3.8800 (1.0)        0.5046 (1.06)   
    test_class_greedybytes_parse_compiled           3.9690 (1.02)       1.1108 (2.33)   
    test_class_greedyrange_build                  332.8790 (85.79)     43.8336 (91.83)  
    test_class_greedyrange_parse                  320.9860 (82.73)     45.9480 (96.26)  
    test_class_greedyrange_parse_compiled         262.7010 (67.71)     36.4504 (76.36)  
    test_class_greedystring_build                   5.3930 (1.39)       0.7442 (1.56)   
    test_class_greedystring_parse                   5.0800 (1.31)       1.1375 (2.38)   
    test_class_greedystring_parse_compiled          4.6150 (1.19)       0.9228 (1.93)   
    test_class_hex_build                            4.5730 (1.18)       0.8108 (1.70)   
    test_class_hex_parse                            5.4210 (1.40)       0.9506 (1.99)   
    test_class_hex_parse_compiled                   4.0000 (1.03)       0.8198 (1.72)   
    test_class_hexdump_build                        4.5640 (1.18)       0.8572 (1.80)   
    test_class_hexdump_parse                        5.1660 (1.33)       0.8708 (1.82)   
    test_class_hexdump_parse_compiled               3.9460 (1.02)       0.8104 (1.70)   
    test_class_ifthenelse_build                     9.0200 (2.32)       3.1983 (6.70)   
    test_class_ifthenelse_parse                     8.5450 (2.20)       4.2003 (8.80)   
    test_class_ifthenelse_parse_compiled            6.4490 (1.66)       3.5984 (7.54)   
    test_class_mapping_build                        6.1160 (1.58)       0.9536 (2.00)   
    test_class_mapping_parse                        5.5320 (1.43)       0.9137 (1.91)   
    test_class_mapping_parse_compiled               4.5650 (1.18)       0.8350 (1.75)   
    test_class_namedtuple1_build                   18.3450 (4.73)       2.1664 (4.54)   
    test_class_namedtuple1_parse                   17.1850 (4.43)       2.9482 (6.18)   
    test_class_namedtuple1_parse_compiled           7.1810 (1.85)       1.0228 (2.14)   
    test_class_namedtuple2_build                   47.7850 (12.32)      6.1995 (12.99)  
    test_class_namedtuple2_parse                   34.4330 (8.87)       3.8498 (8.07)   
    test_class_namedtuple2_parse_compiled          15.4160 (3.97)       2.5158 (5.27)   
    test_class_numpy_build                        212.5540 (54.78)     27.0343 (56.64)  
    test_class_numpy_parse                        288.5380 (74.37)     45.4344 (95.19)  
    test_class_numpy_parse_compiled               290.8960 (74.97)    110.2389 (230.95) 
    test_class_padded_build                         7.7810 (2.01)       3.6378 (7.62)   
    test_class_padded_parse                         6.6460 (1.71)       1.2688 (2.66)   
    test_class_padded_parse_compiled                4.7090 (1.21)       1.2451 (2.61)   
    test_class_padding_build                        6.1880 (1.59)       1.4536 (3.05)   
    test_class_padding_parse                        5.4070 (1.39)       1.1753 (2.46)   
    test_class_padding_parse_compiled               4.1200 (1.06)       1.1916 (2.50)   
    test_class_pascalstring_build                   9.1680 (2.36)       1.4623 (3.06)   
    test_class_pascalstring_parse                   7.8560 (2.02)       1.8567 (3.89)   
    test_class_pascalstring_parse_compiled          5.8910 (1.52)       0.9466 (1.98)   
    test_class_peek_build                          14.8710 (3.83)       2.6207 (5.49)   
    test_class_peek_parse                          19.5870 (5.05)       3.6857 (7.72)   
    test_class_peek_parse_compiled                 10.6000 (2.73)       2.0105 (4.21)   
    test_class_pickled_build                        5.6150 (1.45)       1.2695 (2.66)   
    test_class_pickled_parse                        8.3370 (2.15)       1.5174 (3.18)   
    test_class_pickled_parse_compiled               8.9810 (2.31)       1.7670 (3.70)   
    test_class_pointer_build                        7.2470 (1.87)       1.3817 (2.89)   
    test_class_pointer_parse                        6.3760 (1.64)       1.2557 (2.63)   
    test_class_pointer_parse_compiled               5.0970 (1.31)       0.9715 (2.04)   
    test_class_prefixed_build                       7.8970 (2.04)       1.8404 (3.86)   
    test_class_prefixed_parse                       6.7860 (1.75)       1.3916 (2.92)   
    test_class_prefixed_parse_compiled              5.2350 (1.35)       1.3229 (2.77)   
    test_class_prefixedarray_build                873.1850 (225.05)    84.7384 (177.53) 
    test_class_prefixedarray_parse                763.2760 (196.72)    88.0787 (184.53) 
    test_class_prefixedarray_parse_compiled        79.4790 (20.48)     11.9930 (25.13)  
    test_class_rawcopy_build1                      13.8040 (3.56)       2.1913 (4.59)   
    test_class_rawcopy_build2                      16.9810 (4.38)       2.6092 (5.47)   
    test_class_rawcopy_parse                       15.2890 (3.94)       3.6678 (7.68)   
    test_class_rawcopy_parse_compiled              14.8570 (3.83)       2.6335 (5.52)   
    test_class_rebuild_build                        6.0380 (1.56)       1.2981 (2.72)   
    test_class_rebuild_parse                        5.1540 (1.33)       0.8264 (1.73)   
    test_class_rebuild_parse_compiled               4.5160 (1.16)       0.7145 (1.50)   
    test_class_repeatuntil_build                   11.0780 (2.86)       2.4318 (5.09)   
    test_class_repeatuntil_parse                   10.1850 (2.63)       2.4147 (5.06)   
    test_class_repeatuntil_parse_compiled           6.8880 (1.78)       1.5471 (3.24)   
    test_class_select_build                        19.1100 (4.93)       6.5128 (13.64)  
    test_class_select_parse                         5.6280 (1.45)       3.2641 (6.84)   
    test_class_select_parse_compiled                5.5660 (1.43)       3.7881 (7.94)   
    test_class_sequence_build                      24.5060 (6.32)       5.1873 (10.87)  
    test_class_sequence_parse                      21.5420 (5.55)       2.6852 (5.63)   
    test_class_sequence_parse_compiled             10.1530 (2.62)       2.1645 (4.53)   
    test_class_string_build                         8.5320 (2.20)       1.8491 (3.87)   
    test_class_string_parse                        20.4400 (5.27)       4.4044 (9.23)   
    test_class_string_parse_compiled                9.1470 (2.36)       2.2427 (4.70)   
    test_class_struct_build                        49.1730 (12.67)      5.5050 (11.53)  
    test_class_struct_parse                        44.1300 (11.37)      6.8434 (14.34)  
    test_class_struct_parse_compiled               16.9070 (4.36)       3.0500 (6.39)   
    test_class_switch_build                         9.5110 (2.45)       1.7349 (3.63)   
    test_class_switch_parse                         8.7100 (2.24)       1.9867 (4.16)   
    test_class_switch_parse_compiled                6.7830 (1.75)       1.1652 (2.44)   
    test_class_union_build                         57.0540 (14.70)     12.0599 (25.27)  
    test_class_union_parse                         91.9150 (23.69)     10.7812 (22.59)  
    test_class_union_parse_compiled                22.5970 (5.82)      15.2649 (31.98)  
    test_class_varint_build                        15.2000 (3.92)       3.2498 (6.81)   
    test_class_varint_parse                        18.9080 (4.87)       4.2807 (8.97)   
    test_class_varint_parse_compiled               19.6070 (5.05)       4.0409 (8.47)   
    test_overall_build                          1,970.9570 (507.98)   189.2782 (396.54) 
    test_overall_build_compiled                 1,987.8950 (512.35)   166.3636 (348.54) 
    test_overall_parse                          2,126.2570 (548.01)   255.0154 (534.27) 
    test_overall_parse_compiled                 1,124.9560 (289.94)   127.4730 (267.06) 
    ------------------------------------------------------------------------------------


Comparison with Kaitai Struct
================================

Kaitai Struct is a very respectable competitor, so I believe a benchmark-based comparison should be presented. Construct and Kaitai have very different capabilities. Kaitai supports about a dozen languages, while Construct supports only Python. Kaitai offers only basic common features, while Construct also offers Python-only features such as Numpy and Pickle support. Kaitai does only parsing, while Construct also does building. In a sense, the two libraries belong to different categories (like sumo and karate), and there are multiple scenarios where either one would not be usable.

Format used for the comparison, defined first in Construct and then as the equivalent Kaitai Struct (KSY) specification:

::

    from construct import *

    d = Struct(
        "count" / Int32ul,
        "items" / Array(this.count, Struct(
            "num1" / Int8ul,
            "num2" / Int24ul,
            "flags" / BitStruct(
                "bool1" / Flag,
                "num4" / BitsInteger(3),
                Padding(4),
            ),
            "fixedarray1" / Array(3, Int8ul),
            "name1" / CString("utf8"),
            "name2" / PascalString(Int8ul, "utf8"),
        )),
    )

::

    meta:
      id: comparison_1_kaitai
      encoding: utf-8
      endian: le
    seq:
      - id: count
        type: u4
      - id: items
        repeat: expr
        repeat-expr: count
        type: item
    types:
      item:
        seq:
          - id: num1
            type: u1
          - id: num2_lo
            type: u2
          - id: num2_hi
            type: u1
          - id: flags
            type: flags
          - id: fixedarray1
            repeat: expr
            repeat-expr: 3
            type: u1
          - id: name1
            type: strz
          - id: len_name2
            type: u1
          - id: name2
            type: str
            size: len_name2
        instances:
          num2:
            value: 'num2_hi << 16 | num2_lo'
        types:
          flags:
            seq:
              - id: bool1
                type: b1
              - id: num4
                type: b3
              - id: padding
                type: b4

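Both definitions describe the same byte layout. To make that layout explicit, here is a minimal hand-written decoder that uses only the standard library. It is an illustrative sketch, not the code used for the benchmark: the ``parse_items`` name and the sample bytes are hypothetical, and it assumes well-formed input (no bounds or error checking). It rebuilds ``num2`` from a 2-byte low part and a 1-byte high part, as the KSY ``instances`` entry does, which yields the same value as Construct's ``Int24ul``. It also reads the flag bits most significant first, as ``BitStruct`` does::

    import struct


    def parse_items(data):
        """Hand-written decoder for the comparison format (hypothetical helper).

        Assumes well-formed input; no bounds or error checking is done.
        """
        offset = 0
        (count,) = struct.unpack_from("<I", data, offset)    # "count" / Int32ul
        offset += 4
        items = []
        for _ in range(count):
            num1 = data[offset]                              # Int8ul
            # Rebuild num2 the way the KSY instance does: u2 low part, u1 high part.
            num2_lo, num2_hi = struct.unpack_from("<HB", data, offset + 1)
            num2 = num2_hi << 16 | num2_lo                   # same value as Int24ul
            flags = data[offset + 4]
            # Bits are taken most significant first, as in Construct's BitStruct:
            # 1 bit bool1, 3 bits num4, 4 bits padding (ignored).
            bool1 = bool(flags >> 7)
            num4 = (flags >> 4) & 0b111
            fixedarray1 = list(data[offset + 5:offset + 8])  # Array(3, Int8ul)
            offset += 8
            end = data.index(b"\x00", offset)                # CString: NUL-terminated
            name1 = data[offset:end].decode("utf8")
            offset = end + 1
            length = data[offset]                            # PascalString: u1 length prefix
            name2 = data[offset + 1:offset + 1 + length].decode("utf8")
            offset += 1 + length
            items.append({
                "num1": num1,
                "num2": num2,
                "flags": {"bool1": bool1, "num4": num4},
                "fixedarray1": fixedarray1,
                "name1": name1,
                "name2": name2,
            })
        return {"count": count, "items": items}


    # One item: num1=7, num2=0x030201, flags=0b1_101_0000, [1, 2, 3], "abc", "xy".
    sample = (
        struct.pack("<I", 1)
        + bytes([7]) + (0x030201).to_bytes(3, "little")
        + bytes([0b1_101_0000])
        + bytes([1, 2, 3])
        + b"abc\x00"
        + bytes([2]) + b"xy"
    )
    result = parse_items(sample)
    print(result["items"][0]["num2"], result["items"][0]["flags"])
    # prints: 197121 {'bool1': True, 'num4': 5}

Nesting ``flags`` as a separate dictionary mirrors the nested ``BitStruct``. Construct itself would return ``Container`` objects rather than plain dictionaries, and Kaitai would return instances of its generated classes.
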

Surprisingly, Kaitai won the benchmark! Honestly, I am shocked and dismayed that it did. Per the timings below, on CPython it was about 1.6 times faster than compiled Construct (4.1 times faster than non-compiled), and on PyPy about 3.2 times faster (5.6 times faster than non-compiled). The only explanation I can point out is that Kaitai parses structs into class objects (with attributes) while Construct parses into dictionaries (with keys). However, that one detail seems an unlikely explanation for such a large discrepancy. Perhaps there is a flaw in the methodology, but until that is proven, Kaitai has earned its respect. Congrats.

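The representation hypothesis above can be probed in isolation. The following sketch, standard library only, times creating a plain ``dict`` against creating an instance of a small class that stores the same fields as attributes. It is a simplification under stated assumptions: a plain ``dict`` stands in for Construct's ``Container`` (a ``dict`` subclass), the hypothetical ``Item`` class stands in for a Kaitai-generated class, and no parsing is done at all. Taking the minimum over several repeats is one common way to reduce timer noise::

    import timeit


    class Item:
        """Stand-in for an attribute-based, Kaitai-style record (hypothetical)."""

        def __init__(self, num1, num2, name1):
            self.num1 = num1
            self.num2 = num2
            self.name1 = name1


    def make_dict():
        # Stand-in for a Construct-style record (Container is a dict subclass).
        return {"num1": 1, "num2": 2, "name1": "abc"}


    def make_object():
        return Item(1, 2, "abc")


    def sec_per_call(func, number=100000, repeat=5):
        """Best of `repeat` runs, each averaged over `number` calls."""
        return min(timeit.repeat(func, number=number, repeat=repeat)) / number


    dict_time = sec_per_call(make_dict)
    object_time = sec_per_call(make_object)
    print("dict:   %.3e sec/call" % dict_time)
    print("object: %.3e sec/call" % object_time)

Whatever it prints, such a microbenchmark only measures the cost of creating records, so at most it can confirm or rule out this one factor; it says nothing about the rest of either parsing pipeline.
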
::

    $ python3.6 comparison_1_construct.py 
    Timeit measurements:
    parsing:           0.1024609069 sec/call
    parsing compiled:  0.0410809368 sec/call

    $ pypy comparison_1_construct.py 
    Timeit measurements:
    parsing:           0.0108308416 sec/call
    parsing compiled:  0.0062594243 sec/call

::

    $ python3.6 comparison_1_kaitai.py 
    Timeit measurements:
    parsing:           0.0250326035 sec/call

    $ pypy comparison_1_kaitai.py 
    Timeit measurements:
    parsing:           0.0019435351 sec/call