======================
Compilation feature
======================
.. warning:: This feature is fully implemented but may not be fully mature.
Overall
=========
Construct 2.9 adds an experimental feature: compiling user-made constructs into much faster (but less feature-rich) code. If you are familiar with Kaitai Struct, an alternative framework to Construct, the difference is this: Kaitai compiles YAML-based schemas into pure Python modules, whereas Construct defines schemas in pure Python and compiles them into pure Python modules. Once you define a construct, you can use it to parse and build blobs without compilation. Compilation has only one purpose: performance.

It should be made clear that the compiler currently supports only parsing. Building and sizeof are deferred to the original construct from which the compiled instance was made. Building support may be added in the future, depending on the popularity of this feature. In that sense, perhaps the documentation should use the term "compiled parser" rather than "compiled construct".
Requirements
---------------
The compilation feature requires Construct 2.9, preferably the newest version to date. More importantly, you should have a test suite of your own. Construct aims to be reliable, but the compiler makes some undocumented assumptions and generates code that "takes shortcuts". Since a few checks are omitted by the generated code, you should not use it to parse corrupted data.
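One way to start such a test suite is to feed the same inputs to both parsers and compare the results. The sketch below is only an illustration: `check_same_results` is a hypothetical helper, not part of Construct, and the two stand-in parsers exist only so the example runs without Construct installed. With Construct one would presumably pass the `parse` methods of the original and of the compiled instance instead.

```python
def check_same_results(parse_original, parse_compiled, samples):
    """Return (data, expected, actual) for every sample the parsers disagree on."""
    mismatches = []
    for data in samples:
        expected = parse_original(data)
        actual = parse_compiled(data)
        if expected != actual:
            mismatches.append((data, expected, actual))
    return mismatches


# Stand-in parsers (assumptions for this sketch, not Construct code).
def slow_parse(data):
    return {"num": data[0]}


def fast_parse(data):
    return {"num": int.from_bytes(data[:1], "big")}


print(check_same_results(slow_parse, fast_parse, [b"\x01", b"\xff"]))  # prints []
```

An empty list means both parsers agreed on every sample; any entry points at an input worth inspecting by hand.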
Restrictions
---------------
* Compiled classes only parse faster; building and sizeof defer to the core classes
* Sizeof is applied during compilation (not during parsing and building)
* Lambdas (unlike `this` expressions) are not compilable
* Exceptions do not include `path` information
* Struct, Sequence, FocusedSeq, Union and LazyStruct do not support `_subcons _stream` context entries
* Parsed hooks are not supported and are ignored
Compiling schemas
===================
Every construct (even those that do not compile) has a `compile` method that requires no arguments and returns another construct (an instance of the Compiled class). It may be a good idea to compile something that is used for processing megabyte-sized data or millions of blobs. The compiled instance has `parse` and `build` methods just like the construct it was compiled from. Therefore, in your code, you can simply reassign the compiled instance over the original one.
>>> st = Struct("num" / Byte)
>>> st.parse(b"\x01")
Container(num=1)
>>> st = st.compile(filename="copyforinspection.py")
>>> st.parse(b"\x01")
Container(num=1)
The performance boost can be measured easily with the `benchmark` method. It also happens to test the correctness of the compiled parser, by making sure that both the original and the compiled instance parse into the same results.
>>> print(st.benchmark(sampledata))
Timeit measurements:
parsing: 0.0000475557 sec/call
parsing compiled: 0.0000159182 sec/call
building: 0.0000591526 sec/call
Motivation
============
The code generated by the compiler and the core classes have essentially the same functionality, but there is a noticeable difference in performance. The first half of the performance boost is thanks to pre-processing, as shown in this chapter. Pre-processing means inserting constants instead of variable lookups, where constants are simply values that are known at compile time. The second half is thanks to PyPy. This chapter explains the performance difference by comparing the `Struct FormatField BytesInteger Bytes` classes, including use of the context. Example construct:
::

    Struct(
        "num8" / Int8ub,
        "num24" / Int24ub,
        "data" / Bytes(this.num8),
    )

Compiled parsing code:
::

    def read_bytes(io, count):
        assert count >= 0
        data = io.read(count)
        assert len(data) == count
        return data

    def parse_struct_1(io, this):
        this = Container(_ = this)
        try:
            this['num8'] = unpack('>B', read_bytes(io, 1))[0]
            this['num24'] = int.from_bytes(read_bytes(io, 3), byteorder='big', signed=False)
            this['data'] = read_bytes(io, this.num8)
        except StopIteration:
            pass
        del this['_']
        return this

    def parseall(io, this):
        return parse_struct_1(io, this)

    compiledschema = Compiled(None, None, parseall)

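To experiment with the generated code without Construct installed, the following is a rough stand-alone adaptation. It is a sketch under two assumptions: a plain `dict` stands in for `Container` (so `this.num8` becomes `this['num8']`), and the `Compiled` wrapper is left out. The stream parameter is renamed to `stream` only to avoid confusion with the standard `io` module.

```python
from io import BytesIO
from struct import unpack


def read_bytes(stream, count):
    # Same minimal checks as the generated helper: asserts instead of StreamError.
    assert count >= 0
    data = stream.read(count)
    assert len(data) == count
    return data


def parse_struct_1(stream, this):
    # A plain dict replaces Container here (assumption of this sketch).
    this = {"_": this}
    try:
        this["num8"] = unpack(">B", read_bytes(stream, 1))[0]
        this["num24"] = int.from_bytes(read_bytes(stream, 3), byteorder="big", signed=False)
        this["data"] = read_bytes(stream, this["num8"])
    except StopIteration:
        pass
    del this["_"]
    return this


result = parse_struct_1(BytesIO(b"\x02\x00\x00\x05ab"), None)
print(result)  # {'num8': 2, 'num24': 5, 'data': b'ab'}
```

Note that truncated input surfaces as a bare `AssertionError`, which illustrates why compiled parsers are a poor fit for corrupted data.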
Non-compiled parsing code:
::

    def _read_stream(stream, length):
        if length < 0:
            raise StreamError("length must be non-negative, found %s" % length)
        try:
            data = stream.read(length)
        except Exception:
            raise StreamError("stream.read() failed, requested %s bytes" % (length,))
        if len(data) != length:
            raise StreamError("could not read enough bytes, expected %d, found %d" % (length, len(data)))
        return data

    class FormatField(Construct):
        def _parse(self, stream, context, path):
            data = _read_stream(stream, self.length)
            try:
                return struct.unpack(self.fmtstr, data)[0]
            except Exception:
                raise FormatFieldError("struct %r error during parsing" % self.fmtstr)

    class BytesInteger(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            data = _read_stream(stream, length)
            if self.swapped:
                data = data[::-1]
            return bytes2integer(data, self.signed)

    class Bytes(Construct):
        def _parse(self, stream, context, path):
            length = self.length(context) if callable(self.length) else self.length
            return _read_stream(stream, length)

    class Renamed(Subconstruct):
        def _parse(self, stream, context, path):
            path += " -> %s" % (self.name,)
            return self.subcon._parse(stream, context, path)

    class Struct(Construct):
        def _parse(self, stream, context, path):
            obj = Container()
            context = Container(_ = context)
            context._subcons = Container({sc.name:sc for sc in self.subcons if sc.name})
            for sc in self.subcons:
                try:
                    subobj = sc._parse(stream, context, path)
                    if sc.name:
                        obj[sc.name] = subobj
                        context[sc.name] = subobj
                except StopIteration:
                    break
            return obj

There are several "shortcuts" that the compiled code takes:
Function calls are relatively expensive, so an inlined expression is faster than a function returning the same exact expression. Therefore FormatField compiles into `struct.unpack(..., read_bytes(io, ...))` directly.
Literals like 1 and '>B' are faster than object field lookup, dictionary lookup, or passing function arguments. Therefore each instance of FormatField compiles into a similar expression but with different format-strings and byte-counts inlined, usually literals.
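The inlining described in the two paragraphs above can be illustrated with a tiny code emitter. This is a hedged sketch, not the compiler's real code: `emit_formatfield` is a hypothetical function that turns a format string into source text with the format and byte count baked in as literals, and `eval` stands in for running the generated module.

```python
import struct
from io import BytesIO
from struct import unpack


def read_bytes(stream, count):
    data = stream.read(count)
    assert len(data) == count
    return data


def emit_formatfield(fmtstr):
    """Return source text that parses one struct-format field (hypothetical emitter)."""
    length = struct.calcsize(fmtstr)  # computed once, while emitting
    return "unpack(%r, read_bytes(io, %d))[0]" % (fmtstr, length)


source = emit_formatfield(">B")
print(source)  # unpack('>B', read_bytes(io, 1))[0]

# Evaluate the emitted expression with `io` bound to a stream,
# mirroring the variable name used in the generated code.
names = {"unpack": unpack, "read_bytes": read_bytes, "io": BytesIO(b"\x07")}
value = eval(source, names)
print(value)  # 7
```

The emitted text contains no attribute lookups such as `self.fmtstr` or `self.length`; both values were resolved when the source was produced.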
Passing parameters to functions is slower than just referring to variables in the same scope. Therefore, for example, compiled Struct creates a "this" variable that is accessible to all expressions generated by subcons, as it exists in the same scope, but core Struct would call subcon._parse and pass the entire context as a parameter value, regardless of whether that subcon even uses a context (for example FormatField and VarInt have no need for a context). It's similar but not exactly the same with the "restream" function. The lambda in its second parameter rebinds `io` to a different object (a stream that gets created inside the restream function). On the other hand, `this` is not rebound; it exists in the outer scope.
An if statement (or conditional ternary operator) with two possible expressions and a condition that could be evaluated at compile time is slower than just one or the other expression. Therefore, for example, BytesInteger does a lookup to check if the field is swapped, but compiled BytesInteger simply inlines the 'big' or 'little' literal. Moreover, Struct checks if each subcon has a name and then inserts a value into the context dictionary, but compiled Struct simply has an assignment or not. This shortcut also applies to most constructs that accept context lambdas as parameters. Generated classes do not need to check if a parameter is a constant or a lambda, because what gets emitted is either something like "1", which is a literal, or something like "this.field", which is an object lookup. Both are valid expressions and evaluate without red tape or checks.
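As a rough illustration of resolving such conditions ahead of time, the sketch below decides the byte order and the form of the length while emitting source text, so the emitted expression contains no branch at all. `emit_length` and `emit_bytesinteger` are hypothetical names, and passing a context reference as ready-made expression text is a convention assumed only for this example.

```python
from io import BytesIO


def emit_length(length):
    # An int becomes a literal; anything else is assumed to already be
    # expression text such as "this['n']" (convention of this sketch).
    return repr(length) if isinstance(length, int) else str(length)


def emit_bytesinteger(length, signed=False, swapped=False):
    # The swapped check happens here, once, not on every parse.
    byteorder = "little" if swapped else "big"
    return "int.from_bytes(read_bytes(io, %s), byteorder=%r, signed=%r)" % (
        emit_length(length), byteorder, signed)


print(emit_bytesinteger(3))
print(emit_bytesinteger("this['n']", swapped=True))

# The emitted text is an ordinary expression and can be evaluated directly.
names = {"read_bytes": lambda stream, count: stream.read(count),
         "io": BytesIO(b"\x00\x01\x00")}
value = eval(emit_bytesinteger(3), names)
print(value)  # 256
```

The first emitted line matches the `num24` assignment in the compiled parsing code shown earlier.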
Looping over an iterable is slower than a block of code that accesses each item once. The reason it's slower is that each iteration must fetch another item and also check the termination condition. The loop unrolling technique requires the iterable (or rather the list) to be known at compile time, which is the case with Struct and Sequence instances. Therefore, compiled Struct emits one line per subcon, but core Struct loops over its subcons.
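Loop unrolling of this kind can be sketched in a few lines. The example is an assumption-laden toy, not the real compiler: `compile_struct` is a hypothetical function that takes (name, expression text) pairs, emits one assignment per field, and uses `exec` to turn the text into a function; a plain dict stands in for `Container`.

```python
from io import BytesIO
from struct import unpack


def read_bytes(stream, count):
    data = stream.read(count)
    assert len(data) == count
    return data


def compile_struct(fields):
    """Emit and load a parser with one line per field (hypothetical toy compiler)."""
    lines = ["def parse_struct(io):", "    this = {}"]
    for name, expr in fields:  # this loop runs once, at compile time
        lines.append("    this[%r] = %s" % (name, expr))
    lines.append("    return this")
    source = "\n".join(lines)
    namespace = {"unpack": unpack, "read_bytes": read_bytes}
    exec(source, namespace)
    return namespace["parse_struct"], source


parse, source = compile_struct([
    ("num8", "unpack('>B', read_bytes(io, 1))[0]"),
    ("data", "read_bytes(io, this['num8'])"),
])
print(source)
print(parse(BytesIO(b"\x02ab")))  # {'num8': 2, 'data': b'ab'}
```

The generated `parse_struct` contains no `for` statement; the iteration over fields was paid for once, when the source was produced.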
Function calls that only defer to another function merely waste CPU cycles. This relates specifically to the Renamed class, which in compiled code emits the same code as its subcon. The entire functionality of the Renamed class (maintaining path information) is not supported in compiled code, where it would serve as a mere subconstruct, just deferring to its subcon.
Building two identical dictionaries is slower than building just one. Struct maintains two dictionaries (called obj and context) which differ only by _ key, but compiled Struct maintains only one dictionary and removes the _ key before returning it.
`this` expressions (not lambdas) are expensive to compute in regular code, but something like "this.field" in compiled code is merely one object field lookup. The same applies to `len_ obj_ list_` expressions, since they share their implementation with the `this` expression.
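To get a feel for where the cost comes from, consider a simplified stand-in for a `this`-style expression. The `Path` class below is hypothetical and is not Construct's actual implementation; it only shows the general pattern of an object that records attribute accesses and resolves them later against a context, one call per step, and whose textual form is what a compiler could emit instead.

```python
class Path:
    """Deferred lookup: records attribute names, resolves them when called."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parent = parent

    def __getattr__(self, name):
        # Only reached for names that are not real attributes of the object.
        return Path(name, self)

    def __call__(self, context):
        if self._parent is None:
            return context
        # One nested call per step of the path.
        return self._parent(context)[self._name]

    def __repr__(self):
        if self._parent is None:
            return self._name
        return "%r.%s" % (self._parent, self._name)


this = Path("this")
expr = this.header.length
print(expr({"header": {"length": 5}}))  # 5
print(repr(expr))  # this.header.length
```

Resolving `expr` takes three nested calls in this sketch, whereas the text `this.header.length`, once emitted into generated code, is evaluated as plain attribute access.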
Container is an implementation of a so-called AttrDict. It captures access to its attributes (field in this.field) and treats it as dictionary key access (this.field becomes this["field"]). However, due to internal CPython drawbacks, capturing attribute access involves some red tape, unlike accessing keys, which is done directly. Therefore compiled Struct emits lines that assign to Container keys, not attributes.
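A minimal AttrDict can be sketched as follows. This `AttrDict` class is a hypothetical stand-in written for illustration, not Construct's `Container`; it shows that attribute access is routed through extra Python-level methods, while key access goes straight to the dict.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Called only after normal attribute lookup has failed.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


this = AttrDict()
this["num8"] = 2   # direct key assignment, as the compiled code does
this.num24 = 5     # routed through __setattr__ into the same dict
print(this.num8, this["num24"])  # 2 5
```

Both forms end up in the same dictionary, so emitting key assignments changes nothing for the reader of the parsed result.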
Empirical evidence
---------------------
The "shortcuts" described above may not look like much, but they account for quite a large portion of the actual run-time. In fact, they amount to about a third (31%) of the entire run-time. Note that this benchmark includes only pure-Python compile-time optimisations.
Notice that results are in microseconds (10**-6).
::

    -------------------------------- benchmark: 158 tests --------------------------------
    Name (time in us)                              Min             StdDev
    --------------------------------------------------------------------------------------
    test_class_array_parse                    284.7820 (74.05)    31.0403 (118.46)
    test_class_array_parse_compiled            73.6430 (19.15)    10.7624 (41.07)
    test_class_greedyrange_parse              325.6610 (84.67)    31.8383 (121.50)
    test_class_greedyrange_parse_compiled     300.9270 (78.24)    24.0149 (91.65)
    test_class_repeatuntil_parse               10.2730 (2.67)      0.8322 (3.18)
    test_class_repeatuntil_parse_compiled       7.3020 (1.90)      1.3155 (5.02)
    test_class_string_parse                    21.2270 (5.52)      1.3555 (5.17)
    test_class_string_parse_compiled           18.9030 (4.91)      1.6023 (6.11)
    test_class_cstring_parse                   10.9060 (2.84)      1.0971 (4.19)
    test_class_cstring_parse_compiled           9.4050 (2.45)      1.6083 (6.14)
    test_class_pascalstring_parse               7.9290 (2.06)      0.4959 (1.89)
    test_class_pascalstring_parse_compiled      6.6670 (1.73)      0.6601 (2.52)
    test_class_struct_parse                    43.5890 (11.33)     4.4993 (17.17)
    test_class_struct_parse_compiled           18.7370 (4.87)      2.0198 (7.71)
    test_class_sequence_parse                  20.7810 (5.40)      2.6298 (10.04)
    test_class_sequence_parse_compiled         11.9820 (3.12)      3.2669 (12.47)
    test_class_union_parse                     91.0570 (23.68)    10.2126 (38.97)
    test_class_union_parse_compiled            31.9240 (8.30)      3.5955 (13.72)
    test_overall_parse                      3,200.7850 (832.23)  224.9197 (858.34)
    test_overall_parse_compiled             2,229.9610 (579.81)  118.2029 (451.09)
    --------------------------------------------------------------------------------------

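The relative gains can be recomputed from the table above. The short script below is a sketch that copies three pairs of Min timings by hand and derives the speed-up factor and the fraction of time saved; the helper names are made up for this example. Using the Min column, `test_overall_parse` comes out at roughly 30% less time, in the same ballpark as the figure quoted above, while the exact percentage depends on which statistic is used.

```python
# (uncompiled, compiled) Min timings in microseconds, copied from the table above.
timings = {
    "array": (284.7820, 73.6430),
    "struct": (43.5890, 18.7370),
    "overall": (3200.7850, 2229.9610),
}


def speedup(before, after):
    """How many times faster the compiled parser is."""
    return before / after


def time_saved(before, after):
    """Fraction of the original run-time that compilation removes."""
    return (before - after) / before


for name, (before, after) in timings.items():
    print("%-8s %.2fx faster, %.1f%% less time"
          % (name, speedup(before, after), 100 * time_saved(before, after)))
```

Container-like constructs such as Array and Struct gain far more than the overall figure suggests, since the overall test mixes in many constructs that barely change.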
..
-------------------------------- benchmark: 158 tests --------------------------------
Name (time in us) Min StdDev
--------------------------------------------------------------------------------------
test_class_aligned_build 7.8420 (2.04) 0.8678 (3.31)
test_class_aligned_parse 6.6060 (1.72) 0.6813 (2.60)
test_class_aligned_parse_compiled 5.3540 (1.39) 1.4117 (5.39)
test_class_array_build 326.6060 (84.92) 38.4864 (146.87)
test_class_array_parse 284.7820 (74.05) 31.0403 (118.46)
test_class_array_parse_compiled 73.6430 (19.15) 10.7624 (41.07)
test_class_bitsinteger_build 19.5040 (5.07) 0.9291 (3.55)
test_class_bitsinteger_parse 19.2790 (5.01) 3.8293 (14.61)
test_class_bitsinteger_parse_compiled 17.9910 (4.68) 4.5695 (17.44)
test_class_bitsswapped1_build 20.2650 (5.27) 2.7666 (10.56)
test_class_bitsswapped1_parse 18.8030 (4.89) 3.6720 (14.01)
test_class_bitsswapped1_parse_compiled 18.3760 (4.78) 3.1836 (12.15)
test_class_bitsswapped2_build 860.2690 (223.68) 65.2748 (249.10)
test_class_bitsswapped2_parse 810.8180 (210.82) 113.5936 (433.50)
test_class_bitwise1_build 38.3340 (9.97) 2.8267 (10.79)
test_class_bitwise1_parse 19.0340 (4.95) 1.6937 (6.46)
test_class_bitwise1_parse_compiled 18.3380 (4.77) 1.9169 (7.32)
test_class_bitwise2_build 5,181.2200 (>1000.0) 176.1713 (672.30)
test_class_bitwise2_parse 4,641.4420 (>1000.0) 149.0798 (568.92)
test_class_bytes_build 5.2700 (1.37) 0.3894 (1.49)
test_class_bytes_parse 4.3720 (1.14) 0.2620 (1.0)
test_class_bytes_parse_compiled 4.3770 (1.14) 0.4845 (1.85)
test_class_bytesinteger_build 7.1130 (1.85) 0.5597 (2.14)
test_class_bytesinteger_parse 6.1550 (1.60) 0.8879 (3.39)
test_class_bytesinteger_parse_compiled 5.9690 (1.55) 0.8120 (3.10)
test_class_byteswapped1_build 7.8880 (2.05) 1.6156 (6.17)
test_class_byteswapped1_parse 6.6990 (1.74) 1.4248 (5.44)
test_class_byteswapped1_parse_compiled 5.8140 (1.51) 1.0893 (4.16)
test_class_bytewise1_build 54.3910 (14.14) 3.5353 (13.49)
test_class_bytewise1_parse 51.2590 (13.33) 4.9621 (18.94)
test_class_bytewise1_parse_compiled 51.1530 (13.30) 5.0922 (19.43)
test_class_bytewise2_build 1,264.2500 (328.72) 76.9591 (293.69)
test_class_bytewise2_parse 1,233.1150 (320.62) 65.5335 (250.09)
test_class_check_build 7.7850 (2.02) 0.9710 (3.71)
test_class_check_parse 7.5500 (1.96) 1.0495 (4.01)
test_class_check_parse_compiled 5.7900 (1.51) 0.7776 (2.97)
test_class_computed_build 6.7760 (1.76) 0.6328 (2.41)
test_class_computed_parse 6.5940 (1.71) 0.6383 (2.44)
test_class_computed_parse_compiled 6.7670 (1.76) 0.7396 (2.82)
test_class_const_build 5.8600 (1.52) 0.6461 (2.47)
test_class_const_parse 4.8930 (1.27) 0.3691 (1.41)
test_class_const_parse_compiled 4.6680 (1.21) 0.6549 (2.50)
test_class_cstring_build 7.7910 (2.03) 32.0498 (122.31)
test_class_cstring_parse 10.9060 (2.84) 1.0971 (4.19)
test_class_cstring_parse_compiled 9.4050 (2.45) 1.6083 (6.14)
test_class_default_build 5.8910 (1.53) 0.7784 (2.97)
test_class_default_parse 5.0430 (1.31) 0.5048 (1.93)
test_class_default_parse_compiled 4.7200 (1.23) 0.7015 (2.68)
test_class_enum_build 6.4310 (1.67) 0.4820 (1.84)
test_class_enum_parse 6.4100 (1.67) 0.2944 (1.12)
test_class_enum_parse_compiled 4.9280 (1.28) 0.5852 (2.23)
test_class_flag_build 4.7740 (1.24) 0.5016 (1.91)
test_class_flag_parse 4.2450 (1.10) 0.8202 (3.13)
test_class_flag_parse_compiled 4.4510 (1.16) 0.7262 (2.77)
test_class_flagsenum_build 9.5940 (2.49) 2.3077 (8.81)
test_class_flagsenum_parse 14.9890 (3.90) 1.1867 (4.53)
test_class_flagsenum_parse_compiled 12.5860 (3.27) 7.8440 (29.93)
test_class_focusedseq_build 27.4290 (7.13) 3.5810 (13.67)
test_class_focusedseq_parse 23.9230 (6.22) 2.9801 (11.37)
test_class_focusedseq_parse_compiled 11.4680 (2.98) 1.8008 (6.87)
test_class_formatfield_build 5.3830 (1.40) 0.3952 (1.51)
test_class_formatfield_parse 4.7820 (1.24) 0.3797 (1.45)
test_class_formatfield_parse_compiled 4.7870 (1.24) 0.7985 (3.05)
test_class_greedybytes_build 3.9610 (1.03) 0.5677 (2.17)
test_class_greedybytes_parse 3.8460 (1.0) 0.3800 (1.45)
test_class_greedybytes_parse_compiled 3.9150 (1.02) 0.4162 (1.59)
test_class_greedyrange_build 328.9710 (85.54) 17.5818 (67.10)
test_class_greedyrange_parse 325.6610 (84.67) 31.8383 (121.50)
test_class_greedyrange_parse_compiled 300.9270 (78.24) 24.0149 (91.65)
test_class_greedystring_build 5.3440 (1.39) 0.6892 (2.63)
test_class_greedystring_parse 5.0730 (1.32) 0.9543 (3.64)
test_class_greedystring_parse_compiled 4.5540 (1.18) 0.5366 (2.05)
test_class_hex_build 4.6150 (1.20) 0.5106 (1.95)
test_class_hex_parse 5.2830 (1.37) 0.8942 (3.41)
test_class_hex_parse_compiled 3.9050 (1.02) 0.6158 (2.35)
test_class_hexdump_build 4.6340 (1.20) 0.8433 (3.22)
test_class_hexdump_parse 5.0960 (1.33) 1.0297 (3.93)
test_class_hexdump_parse_compiled 3.9120 (1.02) 0.7631 (2.91)
test_class_ifthenelse_build 8.9100 (2.32) 0.9234 (3.52)
test_class_ifthenelse_parse 8.3680 (2.18) 0.7548 (2.88)
test_class_ifthenelse_parse_compiled 6.7390 (1.75) 0.7323 (2.79)
test_class_mapping_build 6.3000 (1.64) 0.9057 (3.46)
test_class_mapping_parse 5.6000 (1.46) 1.6992 (6.48)
test_class_mapping_parse_compiled 4.9730 (1.29) 0.6396 (2.44)
test_class_namedtuple1_build 18.0560 (4.69) 2.1252 (8.11)
test_class_namedtuple1_parse 16.8770 (4.39) 2.5048 (9.56)
test_class_namedtuple1_parse_compiled 9.0800 (2.36) 1.3966 (5.33)
test_class_namedtuple2_build 46.3020 (12.04) 4.8023 (18.33)
test_class_namedtuple2_parse 34.1590 (8.88) 3.9813 (15.19)
test_class_namedtuple2_parse_compiled 16.1740 (4.21) 2.1471 (8.19)
test_class_numpy_build 212.2070 (55.18) 19.0170 (72.57)
test_class_numpy_parse 287.4910 (74.75) 1,033.8723 (>1000.0)
test_class_numpy_parse_compiled 289.1160 (75.17) 31.5770 (120.50)
test_class_padded_build 7.6610 (1.99) 1.0465 (3.99)
test_class_padded_parse 6.5550 (1.70) 0.8192 (3.13)
test_class_padded_parse_compiled 5.3810 (1.40) 0.6683 (2.55)
test_class_padding_build 6.1410 (1.60) 0.4382 (1.67)
test_class_padding_parse 5.3390 (1.39) 0.3259 (1.24)
test_class_padding_parse_compiled 4.5490 (1.18) 0.6567 (2.51)
test_class_pascalstring_build 9.0730 (2.36) 0.6574 (2.51)
test_class_pascalstring_parse 7.9290 (2.06) 0.4959 (1.89)
test_class_pascalstring_parse_compiled 6.6670 (1.73) 0.6601 (2.52)
test_class_peek_build 14.8610 (3.86) 1.5169 (5.79)
test_class_peek_parse 19.3210 (5.02) 1.7638 (6.73)
test_class_peek_parse_compiled 11.9050 (3.10) 1.2330 (4.71)
test_class_pickled_build 5.5730 (1.45) 0.8605 (3.28)
test_class_pickled_parse 8.1680 (2.12) 0.8642 (3.30)
test_class_pickled_parse_compiled 8.9110 (2.32) 1.5638 (5.97)
test_class_pointer_build 7.2010 (1.87) 0.3975 (1.52)
test_class_pointer_parse 6.3530 (1.65) 0.6129 (2.34)
test_class_pointer_parse_compiled 5.7300 (1.49) 0.6892 (2.63)
test_class_prefixed_build 7.8600 (2.04) 0.4987 (1.90)
test_class_prefixed_parse 6.8100 (1.77) 0.7110 (2.71)
test_class_prefixed_parse_compiled 6.1950 (1.61) 0.6435 (2.46)
test_class_prefixedarray_build 855.3260 (222.39) 55.4369 (211.56)
test_class_prefixedarray_parse 757.6910 (197.01) 49.8982 (190.42)
test_class_prefixedarray_parse_compiled 184.4760 (47.97) 14.9617 (57.10)
test_class_rawcopy_build1 13.3870 (3.48) 2.1631 (8.25)
test_class_rawcopy_build2 16.8280 (4.38) 3.4464 (13.15)
test_class_rawcopy_parse 14.4990 (3.77) 1.3540 (5.17)
test_class_rawcopy_parse_compiled 14.9130 (3.88) 4.8756 (18.61)
test_class_rebuild_build 5.8890 (1.53) 0.5504 (2.10)
test_class_rebuild_parse 5.0030 (1.30) 0.6272 (2.39)
test_class_rebuild_parse_compiled 4.8300 (1.26) 0.5108 (1.95)
test_class_repeatuntil_build 11.1090 (2.89) 0.8754 (3.34)
test_class_repeatuntil_parse 10.2730 (2.67) 0.8322 (3.18)
test_class_repeatuntil_parse_compiled 7.3020 (1.90) 1.3155 (5.02)
test_class_select_build 19.3270 (5.03) 2.1872 (8.35)
test_class_select_parse 5.5500 (1.44) 0.5927 (2.26)
test_class_select_parse_compiled 5.9140 (1.54) 0.9409 (3.59)
test_class_sequence_build 23.9440 (6.23) 3.7300 (14.23)
test_class_sequence_parse 20.7810 (5.40) 2.6298 (10.04)
test_class_sequence_parse_compiled 11.9820 (3.12) 3.2669 (12.47)
test_class_string_build 8.4160 (2.19) 0.5589 (2.13)
test_class_string_parse 21.2270 (5.52) 1.3555 (5.17)
test_class_string_parse_compiled 18.9030 (4.91) 1.6023 (6.11)
test_class_struct_build 49.0800 (12.76) 3.9414 (15.04)
test_class_struct_parse 43.5890 (11.33) 4.4993 (17.17)
test_class_struct_parse_compiled 18.7370 (4.87) 2.0198 (7.71)
test_class_switch_build 9.2500 (2.41) 0.4969 (1.90)
test_class_switch_parse 8.4710 (2.20) 0.7958 (3.04)
test_class_switch_parse_compiled 7.1160 (1.85) 0.7794 (2.97)
test_class_timestamp1_build 9.7510 (2.54) 1.0072 (3.84)
test_class_timestamp1_parse 29.7140 (7.73) 2.7236 (10.39)
test_class_timestamp1_parse_compiled 30.2160 (7.86) 3.5592 (13.58)
test_class_timestamp2_build 100.4570 (26.12) 15.4131 (58.82)
test_class_timestamp2_parse 106.5390 (27.70) 12.0199 (45.87)
test_class_timestamp2_parse_compiled 107.6340 (27.99) 17.3917 (66.37)
test_class_union_build 55.8850 (14.53) 6.5646 (25.05)
test_class_union_parse 91.0570 (23.68) 10.2126 (38.97)
test_class_union_parse_compiled 31.9240 (8.30) 3.5955 (13.72)
test_class_varint_build 14.9650 (3.89) 0.8179 (3.12)
test_class_varint_parse 18.6660 (4.85) 1.6747 (6.39)
test_class_varint_parse_compiled 19.6660 (5.11) 5.0212 (19.16)
test_overall_build 2,848.2370 (740.57) 5,609.2037 (>1000.0)
test_overall_build_compiled 2,852.9260 (741.79) 163.0128 (622.09)
test_overall_parse 3,200.7850 (832.23) 224.9197 (858.34)
test_overall_parse_compiled 2,229.9610 (579.81) 118.2029 (451.09)
--------------------------------------------------------------------------------------
Motivation, part 2
=====================
The second part of the optimisation is just running the generated code on PyPy. Since PyPy is not using any type annotations, there is nothing to discuss in this chapter. The benchmark reflects the same code as in the previous chapter, but run on PyPy 2.7 rather than CPython 3.6.
Empirical evidence
---------------------
Notice that results are in nanoseconds (10**-9).
::

    ------------------------------------- benchmark: 152 tests ------------------------------------
    Name (time in ns)                                  Min                   StdDev
    -----------------------------------------------------------------------------------------------
    test_class_array_parse                     11,042.9974 (103.52)     40,792.8559 (46.97)
    test_class_array_parse_compiled             9,088.0058 (85.20)      43,001.3909 (49.52)
    test_class_greedyrange_parse               14,402.0014 (135.01)     49,834.2047 (57.38)
    test_class_greedyrange_parse_compiled       9,801.0059 (91.88)      39,296.4529 (45.25)
    test_class_repeatuntil_parse                  318.4996 (2.99)        2,469.5524 (2.84)
    test_class_repeatuntil_parse_compiled         309.3746 (2.90)      103,425.2134 (119.09)
    test_class_string_parse                       966.8991 (9.06)      537,241.0095 (618.62)
    test_class_string_parse_compiled              726.6994 (6.81)        3,719.2657 (4.28)
    test_class_cstring_parse                      782.2993 (7.33)        4,111.8970 (4.73)
    test_class_cstring_parse_compiled             591.1992 (5.54)      479,164.9746 (551.75)
    test_class_pascalstring_parse                 465.0911 (4.36)        4,262.4397 (4.91)
    test_class_pascalstring_parse_compiled        298.4118 (2.80)      122,279.2150 (140.80)
    test_class_struct_parse                     2,633.9985 (24.69)      14,654.3095 (16.87)
    test_class_struct_parse_compiled              949.7991 (8.90)        4,228.2890 (4.87)
    test_class_sequence_parse                   1,310.6008 (12.29)       5,811.8046 (6.69)
    test_class_sequence_parse_compiled            732.2000 (6.86)        4,703.9483 (5.42)
    test_class_union_parse                      5,619.9933 (52.69)      30,590.0630 (35.22)
    test_class_union_parse_compiled             2,699.9987 (25.31)      15,888.8206 (18.30)
    test_overall_parse                      1,332,581.9891 (>1000.0) 2,274,995.4192 (>1000.0)
    test_overall_parse_compiled               690,380.0095 (>1000.0)   602,697.9721 (694.00)
    -----------------------------------------------------------------------------------------------

..
------------------------------------- benchmark: 152 tests ------------------------------------
Name (time in ns) Min StdDev
-----------------------------------------------------------------------------------------------
test_class_aligned_build 740.5994 (6.94) 4,143.5039 (4.77)
test_class_aligned_parse 602.1000 (5.64) 4,001.4447 (4.61)
test_class_aligned_parse_compiled 237.5240 (2.23) 233,368.4415 (268.72)
test_class_array_build 12,085.9913 (113.30) 4,199,133.4429 (>1000.0)
test_class_array_parse 11,042.9974 (103.52) 40,792.8559 (46.97)
test_class_array_parse_compiled 9,088.0058 (85.20) 43,001.3909 (49.52)
test_class_bitsinteger_build 3,602.4940 (33.77) 1,177,244.9019 (>1000.0)
test_class_bitsinteger_parse 2,823.5008 (26.47) 14,156.0060 (16.30)
test_class_bitsinteger_parse_compiled 2,768.9966 (25.96) 14,832.6464 (17.08)
test_class_bitsswapped1_build 5,726.9935 (53.69) 29,157.1889 (33.57)
test_class_bitsswapped1_parse 6,172.9952 (57.87) 28,735.2233 (33.09)
test_class_bitsswapped1_parse_compiled 5,715.9923 (53.59) 26,115.4525 (30.07)
test_class_bitsswapped2_build 38,265.0032 (358.72) 92,216.9408 (106.19)
test_class_bitsswapped2_parse 36,199.9992 (339.36) 99,672.2831 (114.77)
test_class_bitwise1_build 7,979.0043 (74.80) 18,320.0158 (21.10)
test_class_bitwise1_parse 5,914.0002 (55.44) 15,593.2498 (17.96)
test_class_bitwise1_parse_compiled 5,969.9960 (55.97) 10,953.7787 (12.61)
test_class_bitwise2_build 136,212.0092 (>1000.0) 126,711.5616 (145.91)
test_class_bitwise2_parse 120,290.0021 (>1000.0) 100,256.6237 (115.44)
test_class_bytes_build 106.6699 (1.0) 45,663.4740 (52.58)
test_class_bytes_parse 166.0601 (1.56) 26,090.0331 (30.04)
test_class_bytes_parse_compiled 172.6300 (1.62) 38,715.3059 (44.58)
test_class_bytesinteger_build 440.4998 (4.13) 2,794.5403 (3.22)
test_class_bytesinteger_parse 397.6915 (3.73) 2,760.2520 (3.18)
test_class_bytesinteger_parse_compiled 404.1537 (3.79) 314,221.4811 (361.82)
test_class_byteswapped1_build 423.0011 (3.97) 439,883.6772 (506.52)
test_class_byteswapped1_parse 700.1989 (6.56) 5,650.5263 (6.51)
test_class_byteswapped1_parse_compiled 467.4551 (4.38) 375,681.4718 (432.59)
test_class_bytewise1_build 13,313.0088 (124.81) 40,142.8640 (46.22)
test_class_bytewise1_parse 13,626.0060 (127.74) 2,380,928.9149 (>1000.0)
test_class_bytewise1_parse_compiled 13,586.0028 (127.36) 35,062.2700 (40.37)
test_class_bytewise2_build 72,109.9932 (676.01) 73,553.4202 (84.70)
test_class_bytewise2_parse 66,791.9958 (626.16) 140,635.6099 (161.94)
test_class_check_build 740.6998 (6.94) 4,307.2706 (4.96)
test_class_check_parse 541.0999 (5.07) 3,440.5007 (3.96)
test_class_check_parse_compiled 545.6997 (5.12) 679,945.6527 (782.95)
test_class_computed_build 679.1000 (6.37) 605,315.9050 (697.01)
test_class_computed_parse 526.0008 (4.93) 3,428.9984 (3.95)
test_class_computed_parse_compiled 552.2001 (5.18) 3,464.2913 (3.99)
test_class_const_build 310.6879 (2.91) 2,745.9160 (3.16)
test_class_const_parse 176.2500 (1.65) 79,386.8928 (91.41)
test_class_const_parse_compiled 182.1501 (1.71) 94,547.7996 (108.87)
test_class_cstring_build 491.0001 (4.60) 3,734.7308 (4.30)
test_class_cstring_parse 782.2993 (7.33) 4,111.8970 (4.73)
test_class_cstring_parse_compiled 591.1992 (5.54) 479,164.9746 (551.75)
test_class_default_build 461.9995 (4.33) 3,437.9897 (3.96)
test_class_default_parse 220.9200 (2.07) 875.7176 (1.01)
test_class_default_parse_compiled 167.3000 (1.57) 115,216.5525 (132.67)
test_class_enum_build 318.2495 (2.98) 329,774.1824 (379.73)
test_class_enum_parse 216.3301 (2.03) 98,506.1576 (113.43)
test_class_enum_parse_compiled 150.8200 (1.41) 56,082.0649 (64.58)
test_class_flag_build 204.2799 (1.92) 130,206.5059 (149.93)
test_class_flag_parse 153.9801 (1.44) 100,694.1426 (115.95)
test_class_flag_parse_compiled 139.8900 (1.31) 868.4449 (1.0)
test_class_flagsenum_build 573.3993 (5.38) 4,344.7692 (5.00)
test_class_flagsenum_parse 652.1004 (6.11) 422,339.3586 (486.32)
test_class_flagsenum_parse_compiled 464.5461 (4.35) 3,596.9171 (4.14)
test_class_focusedseq_build 2,233.9998 (20.94) 6,533.8875 (7.52)
test_class_focusedseq_parse 1,345.1005 (12.61) 5,739.1458 (6.61)
test_class_focusedseq_parse_compiled 615.0003 (5.77) 3,967.2471 (4.57)
test_class_formatfield_build 282.0557 (2.64) 286,541.4444 (329.95)
test_class_formatfield_parse 237.0500 (2.22) 63,666.5654 (73.31)
test_class_formatfield_parse_compiled 154.2599 (1.45) 35,054.4102 (40.36)
test_class_greedybytes_build 110.4000 (1.03) 89,466.1548 (103.02)
test_class_greedybytes_parse 117.2700 (1.10) 94,205.4030 (108.48)
test_class_greedybytes_parse_compiled 118.3101 (1.11) 88,084.6992 (101.43)
test_class_greedyrange_build 12,186.0066 (114.24) 37,782.4850 (43.51)
test_class_greedyrange_parse 14,402.0014 (135.01) 49,834.2047 (57.38)
test_class_greedyrange_parse_compiled 9,801.0059 (91.88) 39,296.4529 (45.25)
test_class_greedystring_build 348.3331 (3.27) 3,029.8253 (3.49)
test_class_greedystring_parse 473.3645 (4.44) 3,041.7270 (3.50)
test_class_greedystring_parse_compiled 409.9241 (3.84) 387,658.3773 (446.38)
test_class_hex_build 459.6355 (4.31) 4,006.9444 (4.61)
test_class_hex_parse 291.4441 (2.73) 182,038.6025 (209.61)
test_class_hex_parse_compiled 126.4800 (1.19) 84,815.3901 (97.66)
test_class_hexdump_build 450.4157 (4.22) 3,790.8239 (4.37)
test_class_hexdump_parse 284.8335 (2.67) 294,559.8261 (339.18)
test_class_hexdump_parse_compiled 128.8101 (1.21) 78,435.0791 (90.32)
test_class_ifthenelse_build 982.9993 (9.22) 4,688.0488 (5.40)
test_class_ifthenelse_parse 851.1997 (7.98) 580,777.8856 (668.76)
test_class_ifthenelse_parse_compiled 733.0003 (6.87) 4,714.3734 (5.43)
test_class_mapping_build 336.3336 (3.15) 419,990.5974 (483.61)
test_class_mapping_parse 226.8000 (2.13) 111,247.9039 (128.10)
test_class_mapping_parse_compiled 184.2000 (1.73) 872.1972 (1.00)
test_class_namedtuple1_build 918.4005 (8.61) 3,765.2820 (4.34)
test_class_namedtuple1_parse 673.6998 (6.32) 3,434.7049 (3.96)
test_class_namedtuple1_parse_compiled 610.4994 (5.72) 551,488.8854 (635.03)
test_class_namedtuple2_build 3,212.0006 (30.11) 13,384.9602 (15.41)
test_class_namedtuple2_parse 1,786.3000 (16.75) 4,818.3417 (5.55)
test_class_namedtuple2_parse_compiled 728.0993 (6.83) 3,332.2180 (3.84)
test_class_padded_build 732.6991 (6.87) 3,967.5355 (4.57)
test_class_padded_parse 583.3004 (5.47) 4,356.6780 (5.02)
test_class_padded_parse_compiled 301.4703 (2.83) 305,922.3763 (352.26)
test_class_padding_build 499.1823 (4.68) 3,525.5175 (4.06)
test_class_padding_parse 350.1996 (3.28) 328,502.3785 (378.27)
test_class_padding_parse_compiled 192.7000 (1.81) 82,517.9180 (95.02)
test_class_pascalstring_build 483.4543 (4.53) 243,109.6546 (279.94)
test_class_pascalstring_parse 465.0911 (4.36) 4,262.4397 (4.91)
test_class_pascalstring_parse_compiled 298.4118 (2.80) 122,279.2150 (140.80)
test_class_peek_build 952.7997 (8.93) 6,047.5404 (6.96)
test_class_peek_parse 1,454.3999 (13.63) 774,202.5660 (891.48)
test_class_peek_parse_compiled 438.8183 (4.11) 3,811.7552 (4.39)
test_class_pointer_build 576.9005 (5.41) 3,782.3046 (4.36)
test_class_pointer_parse 377.6430 (3.54) 393,433.4406 (453.03)
test_class_pointer_parse_compiled 210.3799 (1.97) 947.6097 (1.09)
test_class_prefixed_build 888.7000 (8.33) 5,004.2176 (5.76)
test_class_prefixed_parse 757.0008 (7.10) 524,495.2616 (603.95)
test_class_prefixed_parse_compiled 471.9080 (4.42) 439,226.7896 (505.76)
test_class_prefixedarray_build 37,869.9915 (355.02) 59,808.3893 (68.87)
test_class_prefixedarray_parse 29,731.0035 (278.72) 10,591,190.0651 (>1000.0)
test_class_prefixedarray_parse_compiled 22,710.9995 (212.91) 65,049.0162 (74.90)
test_class_rawcopy_build1 1,041.5999 (9.76) 5,312.0368 (6.12)
test_class_rawcopy_build2 1,513.5010 (14.19) 931,668.4553 (>1000.0)
test_class_rawcopy_parse 1,064.9004 (9.98) 5,628.3455 (6.48)
test_class_rawcopy_parse_compiled 669.7999 (6.28) 4,616.0835 (5.32)
test_class_rebuild_build 409.5006 (3.84) 3,371.2846 (3.88)
test_class_rebuild_parse 225.8090 (2.12) 1,961.0702 (2.26)
test_class_rebuild_parse_compiled 164.7700 (1.54) 82,487.8733 (94.98)
test_class_repeatuntil_build 475.6360 (4.46) 3,568.2374 (4.11)
test_class_repeatuntil_parse 318.4996 (2.99) 2,469.5524 (2.84)
test_class_repeatuntil_parse_compiled 309.3746 (2.90) 103,425.2134 (119.09)
test_class_select_build 7,528.9863 (70.58) 23,358.3203 (26.90)
test_class_select_parse 395.7684 (3.71) 468,021.0341 (538.92)
test_class_select_parse_compiled 194.6000 (1.82) 911.6117 (1.05)
test_class_sequence_build 1,521.9004 (14.27) 6,600.0406 (7.60)
test_class_sequence_parse 1,310.6008 (12.29) 5,811.8046 (6.69)
test_class_sequence_parse_compiled 732.2000 (6.86) 4,703.9483 (5.42)
test_class_string_build 535.1001 (5.02) 289,163.7688 (332.97)
test_class_string_parse 966.8991 (9.06) 537,241.0095 (618.62)
test_class_string_parse_compiled 726.6994 (6.81) 3,719.2657 (4.28)
test_class_struct_build 2,857.5014 (26.79) 16,764.1319 (19.30)
test_class_struct_parse 2,633.9985 (24.69) 14,654.3095 (16.87)
test_class_struct_parse_compiled 949.7991 (8.90) 4,228.2890 (4.87)
test_class_switch_build 1,079.1002 (10.12) 4,754.6705 (5.47)
test_class_switch_parse 948.8998 (8.90) 4,558.0161 (5.25)
test_class_switch_parse_compiled 783.7996 (7.35) 4,640.9683 (5.34)
test_class_timestamp1_build 771.2006 (7.23) 3,534.5051 (4.07)
test_class_timestamp1_parse 2,018.1993 (18.92) 5,448.9309 (6.27)
test_class_timestamp1_parse_compiled 1,970.7004 (18.47) 891,363.4033 (>1000.0)
test_class_timestamp2_build 5,808.9936 (54.46) 28,921.4390 (33.30)
test_class_timestamp2_parse 7,547.0016 (70.75) 38,718.9886 (44.58)
test_class_timestamp2_parse_compiled 7,391.9946 (69.30) 36,903.9105 (42.49)
test_class_union_build 3,535.9990 (33.15) 17,829.5208 (20.53)
test_class_union_parse 5,619.9933 (52.69) 30,590.0630 (35.22)
test_class_union_parse_compiled 2,699.9987 (25.31) 15,888.8206 (18.30)
test_class_varint_build 944.5997 (8.86) 5,002.7418 (5.76)
test_class_varint_parse 861.3002 (8.07) 4,343.2995 (5.00)
test_class_varint_parse_compiled 863.2996 (8.09) 4,426.6909 (5.10)
test_overall_build 554,530.0082 (>1000.0) 475,067.7994 (547.03)
test_overall_build_compiled 358,168.0066 (>1000.0) 127,081.1333 (146.33)
test_overall_parse 1,332,581.9891 (>1000.0) 2,274,995.4192 (>1000.0)
test_overall_parse_compiled 690,380.0095 (>1000.0) 602,697.9721 (694.00)
-----------------------------------------------------------------------------------------------
Motivation, part 3
=====================
.. warning:: Benchmarks revealed that PyPy runs the code much faster than Cython does. The Cython improvements were therefore withdrawn, and the compiler now generates pure Python code that is compatible with Python 2, including PyPy. This chapter is no longer relevant and remains only for educational purposes.
This chapter covers the second half of the optimisation, which comes from Cython type annotations and type inference. For the record, I am no expert at Cython, and the following explanations are merely "the way I understand it". Please take that into account when reading. Fourth example:
::
Struct(
"num1" / Int8ul,
"num2" / Int24ul,
"fixedarray1" / Array(3, Int8ul),
"name1" / CString("utf8"),
)
::
cdef bytes read_bytes(io, int count):
if not count >= 0: raise StreamError
cdef bytes data = io.read(count)
if not len(data) == count: raise StreamError
return data
cdef bytes parse_nullterminatedstring(io, int unitsize, bytes finalunit):
cdef list result = []
cdef bytes unit
while True:
unit = read_bytes(io, unitsize)
if unit == finalunit:
break
result.append(unit)
return b"".join(result)
def parse_struct_1(io, this):
this = Container(_ = this)
try:
this['num1'] = unpack('<B', read_bytes(io, 1))[0]
this['num2'] = int.from_bytes(read_bytes(io, 3), byteorder='little', signed=False)
this['fixedarray1'] = ListContainer((unpack('<B', read_bytes(io, 1))[0]) for i in range(3))
this['name1'] = (parse_nullterminatedstring(io, 1, b'\x00')).decode('utf8')
pass
except StopIteration:
pass
del this['_']
del this['_index']
return this
def parseall(io, this):
return parse_struct_1(io, this)
compiled = Compiled(None, None, parseall)
The primary cause of the speedup in Cython is this: if a variable is of a known type, then operations on that variable can skip certain checks. If a variable is a pure Python object, those checks have to be performed at runtime. A variable is considered to be of a known type if either (1) it is annotated, as in "cdef bytes data", or (2) its type is inferred, as when using the result of an annotated function, like "parse_nullterminatedstring(...).decode(...)" given "cdef bytes parse_nullterminatedstring(...)". If a variable is known to be a list, then calling "append" on it does not require checking whether the object has such a method or a matching signature (parameters). If a variable is known to be bytes, then "len(data)" can be compiled into a bytes-specific length function rather than a general-purpose length function that works on arbitrary objects, and "unit == finalunit" can be compiled into bytes-specific equality. Likewise, if a value is known to be bytes, then ".decode('utf8')" can be compiled into the bytes-specific implementation. If Cython knows that "struct.unpack" returns only tuples, then "...[0]" would compile into tuple-specific getitem (index access). Examples are many, but the pattern is the same: type-specific code is faster than type-general code.
The second cause of the speedup is the special handling of integers. While most annotations like "cdef bytes" refer to specific Python types, "cdef int" does not refer to any Python type. It represents a C integer, which is allocated on the stack or in registers, unlike the other types, which are allocated on the heap. Operations on C integers are therefore much faster than operations on Python integers. In the example code, this affects "count >= 0" and "len(data) == count".
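For readers without Cython at hand, the listing above can be exercised as plain Python by dropping the "cdef" annotations. This loses the speedup discussed here but should keep the behavior. It is only a sketch under assumptions: ``StreamError`` is redefined locally, and a plain dict and list stand in for ``Container`` and ``ListContainer``.

```python
import io
from struct import unpack


class StreamError(Exception):
    """Local stand-in for Construct's StreamError (assumption, keeps the sketch self-contained)."""


def read_bytes(stream, count):
    # Same checks as the Cython version, but "count" is an ordinary Python int here.
    if not count >= 0:
        raise StreamError("negative count")
    data = stream.read(count)
    if not len(data) == count:
        raise StreamError("stream ended early")
    return data


def parse_nullterminatedstring(stream, unitsize, finalunit):
    # Collect units until the terminator is read; the terminator itself is dropped.
    result = []
    while True:
        unit = read_bytes(stream, unitsize)
        if unit == finalunit:
            break
        result.append(unit)
    return b"".join(result)


def parse_struct_1(stream):
    # A plain dict and list replace Container and ListContainer (assumption).
    this = {}
    this["num1"] = unpack("<B", read_bytes(stream, 1))[0]
    this["num2"] = int.from_bytes(read_bytes(stream, 3), byteorder="little", signed=False)
    this["fixedarray1"] = [unpack("<B", read_bytes(stream, 1))[0] for i in range(3)]
    this["name1"] = parse_nullterminatedstring(stream, 1, b"\x00").decode("utf8")
    return this


# One byte, a 24-bit little-endian integer, three bytes, and a null-terminated string.
sample = b"\x01" + b"\x02\x00\x00" + b"\x03\x04\x05" + b"abc\x00"
parsed = parse_struct_1(io.BytesIO(sample))
```

In this untyped form, every ``len(data)``, ``unit == finalunit`` and ``result.append(unit)`` goes through the general-purpose object protocol, which is exactly the overhead the annotations are meant to remove.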
Empirical evidence
---------------------
The micro-benchmarks below show the difference between the core classes and the Cython-compiled classes. Only the classes with the highest performance boost are listed (they also happen to be the most important ones); some other classes show little speedup, and some show none.
Note that the results are in microseconds (10**-6 seconds).
::
------------------------------- benchmark: 152 tests -------------------------------
Name (time in us) Min StdDev
------------------------------------------------------------------------------------
test_class_array_parse 286.5460 (73.85) 42.8831 (89.84)
test_class_array_parse_compiled 30.7200 (7.92) 6.9577 (14.58)
test_class_greedyrange_parse 320.9860 (82.73) 45.9480 (96.26)
test_class_greedyrange_parse_compiled 262.7010 (67.71) 36.4504 (76.36)
test_class_repeatuntil_parse 10.1850 (2.63) 2.4147 (5.06)
test_class_repeatuntil_parse_compiled 6.8880 (1.78) 1.5471 (3.24)
test_class_string_parse 20.4400 (5.27) 4.4044 (9.23)
test_class_string_parse_compiled 9.1470 (2.36) 2.2427 (4.70)
test_class_cstring_parse 11.2290 (2.89) 1.6216 (3.40)
test_class_cstring_parse_compiled 5.6080 (1.45) 1.0321 (2.16)
test_class_pascalstring_parse 7.8560 (2.02) 1.8567 (3.89)
test_class_pascalstring_parse_compiled 5.8910 (1.52) 0.9466 (1.98)
test_class_struct_parse 44.1300 (11.37) 6.8434 (14.34)
test_class_struct_parse_compiled 16.9070 (4.36) 3.0500 (6.39)
test_class_sequence_parse 21.5420 (5.55) 2.6852 (5.63)
test_class_sequence_parse_compiled 10.1530 (2.62) 2.1645 (4.53)
test_class_union_parse 91.9150 (23.69) 10.7812 (22.59)
test_class_union_parse_compiled 22.5970 (5.82) 15.2649 (31.98)
test_overall_parse 2,126.2570 (548.01) 255.0154 (534.27)
test_overall_parse_compiled 1,124.9560 (289.94) 127.4730 (267.06)
------------------------------------------------------------------------------------
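To put the table into perspective, the speedup factor of each compiled class can be computed by dividing the plain Min time by the compiled Min time. The short sketch below does this for a few rows copied from the table above; the dictionary layout and names are only illustrative.

```python
# Min times in microseconds, copied from the table above: (plain parse, compiled parse).
min_times_us = {
    "array": (286.5460, 30.7200),
    "struct": (44.1300, 16.9070),
    "union": (91.9150, 22.5970),
    "overall": (2126.2570, 1124.9560),
}

# Speedup factor = time without compilation / time with compilation.
speedups = {name: plain / compiled for name, (plain, compiled) in min_times_us.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

By this measure Array gains roughly a factor of nine, Union about four, and the overall parsing test a little under two.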
..
------------------------------- benchmark: 152 tests -------------------------------
Name (time in us) Min StdDev
------------------------------------------------------------------------------------
test_class_aligned_build 7.8110 (2.01) 1.4475 (3.03)
test_class_aligned_parse 6.7560 (1.74) 2.4557 (5.14)
test_class_aligned_parse_compiled 4.7080 (1.21) 1.0038 (2.10)
test_class_array_build 331.7150 (85.49) 45.1915 (94.68)
test_class_array_parse 286.5460 (73.85) 42.8831 (89.84)
test_class_array_parse_compiled 30.7200 (7.92) 6.9577 (14.58)
test_class_bitsinteger_build 19.4150 (5.00) 6.0416 (12.66)
test_class_bitsinteger_parse 19.2520 (4.96) 6.7657 (14.17)
test_class_bitsinteger_parse_compiled 17.4700 (4.50) 11.1148 (23.29)
test_class_bitsswapped1_build 20.0300 (5.16) 3.5605 (7.46)
test_class_bitsswapped1_parse 18.9740 (4.89) 3.1174 (6.53)
test_class_bitsswapped1_parse_compiled 17.4030 (4.49) 3.2099 (6.72)
test_class_bitsswapped2_build 866.5650 (223.34) 99.0145 (207.44)
test_class_bitsswapped2_parse 813.8270 (209.75) 104.6734 (219.29)
test_class_bitwise1_build 38.7430 (9.99) 4.1560 (8.71)
test_class_bitwise1_parse 18.8820 (4.87) 3.8922 (8.15)
test_class_bitwise1_parse_compiled 17.5770 (4.53) 2.1345 (4.47)
test_class_bitwise2_build 5,249.8520 (>1000.0) 247.1093 (517.70)
test_class_bitwise2_parse 4,650.4640 (>1000.0) 605.3646 (>1000.0)
test_class_bytes_build 5.3900 (1.39) 0.7781 (1.63)
test_class_bytes_parse 4.4180 (1.14) 0.4773 (1.0)
test_class_bytes_parse_compiled 4.0220 (1.04) 0.7253 (1.52)
test_class_bytesinteger_build 7.1450 (1.84) 1.4272 (2.99)
test_class_bytesinteger_parse 6.2820 (1.62) 1.4176 (2.97)
test_class_bytesinteger_parse_compiled 5.3420 (1.38) 1.8858 (3.95)
test_class_byteswapped1_build 7.9820 (2.06) 1.5524 (3.25)
test_class_byteswapped1_parse 6.6840 (1.72) 1.2694 (2.66)
test_class_byteswapped1_parse_compiled 4.9890 (1.29) 1.1038 (2.31)
test_class_bytewise1_build 53.7710 (13.86) 5.8007 (12.15)
test_class_bytewise1_parse 49.7540 (12.82) 7.8771 (16.50)
test_class_bytewise1_parse_compiled 48.5480 (12.51) 5.0040 (10.48)
test_class_bytewise2_build 1,270.0850 (327.34) 116.3612 (243.78)
test_class_bytewise2_parse 1,225.2780 (315.79) 99.7644 (209.01)
test_class_check_build 7.9260 (2.04) 1.7875 (3.74)
test_class_check_parse 7.7250 (1.99) 1.7400 (3.65)
test_class_check_parse_compiled 5.8770 (1.51) 1.5456 (3.24)
test_class_computed_build 6.9660 (1.80) 1.0798 (2.26)
test_class_computed_parse 6.6770 (1.72) 1.6214 (3.40)
test_class_computed_parse_compiled 5.6290 (1.45) 0.9689 (2.03)
test_class_const_build 5.9990 (1.55) 1.4849 (3.11)
test_class_const_parse 4.8720 (1.26) 1.1863 (2.49)
test_class_const_parse_compiled 4.2520 (1.10) 0.9856 (2.06)
test_class_cstring_build 7.8570 (2.03) 1.2683 (2.66)
test_class_cstring_parse 11.2290 (2.89) 1.6216 (3.40)
test_class_cstring_parse_compiled 5.6080 (1.45) 1.0321 (2.16)
test_class_default_build 6.0770 (1.57) 1.2640 (2.65)
test_class_default_parse 5.1160 (1.32) 1.1421 (2.39)
test_class_default_parse_compiled 4.4890 (1.16) 1.2474 (2.61)
test_class_enum_build 6.3000 (1.62) 0.9694 (2.03)
test_class_enum_parse 6.3900 (1.65) 0.9849 (2.06)
test_class_enum_parse_compiled 4.5520 (1.17) 0.7292 (1.53)
test_class_flag_build 4.7940 (1.24) 0.6771 (1.42)
test_class_flag_parse 4.3500 (1.12) 0.6541 (1.37)
test_class_flag_parse_compiled 4.1380 (1.07) 0.5723 (1.20)
test_class_flagsenum_build 9.7270 (2.51) 1.1748 (2.46)
test_class_flagsenum_parse 15.2000 (3.92) 2.1840 (4.58)
test_class_flagsenum_parse_compiled 11.6480 (3.00) 1.5491 (3.25)
test_class_focusedseq_build 27.1080 (6.99) 6.3815 (13.37)
test_class_focusedseq_parse 23.6720 (6.10) 3.4153 (7.16)
test_class_focusedseq_parse_compiled 10.7130 (2.76) 2.1026 (4.41)
test_class_formatfield_build 5.3590 (1.38) 1.1223 (2.35)
test_class_formatfield_parse 4.7750 (1.23) 0.8140 (1.71)
test_class_formatfield_parse_compiled 4.4370 (1.14) 0.9037 (1.89)
test_class_greedybytes_build 4.0550 (1.05) 1.1607 (2.43)
test_class_greedybytes_parse 3.8800 (1.0) 0.5046 (1.06)
test_class_greedybytes_parse_compiled 3.9690 (1.02) 1.1108 (2.33)
test_class_greedyrange_build 332.8790 (85.79) 43.8336 (91.83)
test_class_greedyrange_parse 320.9860 (82.73) 45.9480 (96.26)
test_class_greedyrange_parse_compiled 262.7010 (67.71) 36.4504 (76.36)
test_class_greedystring_build 5.3930 (1.39) 0.7442 (1.56)
test_class_greedystring_parse 5.0800 (1.31) 1.1375 (2.38)
test_class_greedystring_parse_compiled 4.6150 (1.19) 0.9228 (1.93)
test_class_hex_build 4.5730 (1.18) 0.8108 (1.70)
test_class_hex_parse 5.4210 (1.40) 0.9506 (1.99)
test_class_hex_parse_compiled 4.0000 (1.03) 0.8198 (1.72)
test_class_hexdump_build 4.5640 (1.18) 0.8572 (1.80)
test_class_hexdump_parse 5.1660 (1.33) 0.8708 (1.82)
test_class_hexdump_parse_compiled 3.9460 (1.02) 0.8104 (1.70)
test_class_ifthenelse_build 9.0200 (2.32) 3.1983 (6.70)
test_class_ifthenelse_parse 8.5450 (2.20) 4.2003 (8.80)
test_class_ifthenelse_parse_compiled 6.4490 (1.66) 3.5984 (7.54)
test_class_mapping_build 6.1160 (1.58) 0.9536 (2.00)
test_class_mapping_parse 5.5320 (1.43) 0.9137 (1.91)
test_class_mapping_parse_compiled 4.5650 (1.18) 0.8350 (1.75)
test_class_namedtuple1_build 18.3450 (4.73) 2.1664 (4.54)
test_class_namedtuple1_parse 17.1850 (4.43) 2.9482 (6.18)
test_class_namedtuple1_parse_compiled 7.1810 (1.85) 1.0228 (2.14)
test_class_namedtuple2_build 47.7850 (12.32) 6.1995 (12.99)
test_class_namedtuple2_parse 34.4330 (8.87) 3.8498 (8.07)
test_class_namedtuple2_parse_compiled 15.4160 (3.97) 2.5158 (5.27)
test_class_numpy_build 212.5540 (54.78) 27.0343 (56.64)
test_class_numpy_parse 288.5380 (74.37) 45.4344 (95.19)
test_class_numpy_parse_compiled 290.8960 (74.97) 110.2389 (230.95)
test_class_padded_build 7.7810 (2.01) 3.6378 (7.62)
test_class_padded_parse 6.6460 (1.71) 1.2688 (2.66)
test_class_padded_parse_compiled 4.7090 (1.21) 1.2451 (2.61)
test_class_padding_build 6.1880 (1.59) 1.4536 (3.05)
test_class_padding_parse 5.4070 (1.39) 1.1753 (2.46)
test_class_padding_parse_compiled 4.1200 (1.06) 1.1916 (2.50)
test_class_pascalstring_build 9.1680 (2.36) 1.4623 (3.06)
test_class_pascalstring_parse 7.8560 (2.02) 1.8567 (3.89)
test_class_pascalstring_parse_compiled 5.8910 (1.52) 0.9466 (1.98)
test_class_peek_build 14.8710 (3.83) 2.6207 (5.49)
test_class_peek_parse 19.5870 (5.05) 3.6857 (7.72)
test_class_peek_parse_compiled 10.6000 (2.73) 2.0105 (4.21)
test_class_pickled_build 5.6150 (1.45) 1.2695 (2.66)
test_class_pickled_parse 8.3370 (2.15) 1.5174 (3.18)
test_class_pickled_parse_compiled 8.9810 (2.31) 1.7670 (3.70)
test_class_pointer_build 7.2470 (1.87) 1.3817 (2.89)
test_class_pointer_parse 6.3760 (1.64) 1.2557 (2.63)
test_class_pointer_parse_compiled 5.0970 (1.31) 0.9715 (2.04)
test_class_prefixed_build 7.8970 (2.04) 1.8404 (3.86)
test_class_prefixed_parse 6.7860 (1.75) 1.3916 (2.92)
test_class_prefixed_parse_compiled 5.2350 (1.35) 1.3229 (2.77)
test_class_prefixedarray_build 873.1850 (225.05) 84.7384 (177.53)
test_class_prefixedarray_parse 763.2760 (196.72) 88.0787 (184.53)
test_class_prefixedarray_parse_compiled 79.4790 (20.48) 11.9930 (25.13)
test_class_rawcopy_build1 13.8040 (3.56) 2.1913 (4.59)
test_class_rawcopy_build2 16.9810 (4.38) 2.6092 (5.47)
test_class_rawcopy_parse 15.2890 (3.94) 3.6678 (7.68)
test_class_rawcopy_parse_compiled 14.8570 (3.83) 2.6335 (5.52)
test_class_rebuild_build 6.0380 (1.56) 1.2981 (2.72)
test_class_rebuild_parse 5.1540 (1.33) 0.8264 (1.73)
test_class_rebuild_parse_compiled 4.5160 (1.16) 0.7145 (1.50)
test_class_repeatuntil_build 11.0780 (2.86) 2.4318 (5.09)
test_class_repeatuntil_parse 10.1850 (2.63) 2.4147 (5.06)
test_class_repeatuntil_parse_compiled 6.8880 (1.78) 1.5471 (3.24)
test_class_select_build 19.1100 (4.93) 6.5128 (13.64)
test_class_select_parse 5.6280 (1.45) 3.2641 (6.84)
test_class_select_parse_compiled 5.5660 (1.43) 3.7881 (7.94)
test_class_sequence_build 24.5060 (6.32) 5.1873 (10.87)
test_class_sequence_parse 21.5420 (5.55) 2.6852 (5.63)
test_class_sequence_parse_compiled 10.1530 (2.62) 2.1645 (4.53)
test_class_string_build 8.5320 (2.20) 1.8491 (3.87)
test_class_string_parse 20.4400 (5.27) 4.4044 (9.23)
test_class_string_parse_compiled 9.1470 (2.36) 2.2427 (4.70)
test_class_struct_build 49.1730 (12.67) 5.5050 (11.53)
test_class_struct_parse 44.1300 (11.37) 6.8434 (14.34)
test_class_struct_parse_compiled 16.9070 (4.36) 3.0500 (6.39)
test_class_switch_build 9.5110 (2.45) 1.7349 (3.63)
test_class_switch_parse 8.7100 (2.24) 1.9867 (4.16)
test_class_switch_parse_compiled 6.7830 (1.75) 1.1652 (2.44)
test_class_union_build 57.0540 (14.70) 12.0599 (25.27)
test_class_union_parse 91.9150 (23.69) 10.7812 (22.59)
test_class_union_parse_compiled 22.5970 (5.82) 15.2649 (31.98)
test_class_varint_build 15.2000 (3.92) 3.2498 (6.81)
test_class_varint_parse 18.9080 (4.87) 4.2807 (8.97)
test_class_varint_parse_compiled 19.6070 (5.05) 4.0409 (8.47)
test_overall_build 1,970.9570 (507.98) 189.2782 (396.54)
test_overall_build_compiled 1,987.8950 (512.35) 166.3636 (348.54)
test_overall_parse 2,126.2570 (548.01) 255.0154 (534.27)
test_overall_parse_compiled 1,124.9560 (289.94) 127.4730 (267.06)
------------------------------------------------------------------------------------
Comparison with Kaitai Struct
================================
Kaitai Struct is a very respectable competitor, so I believe a benchmark-based comparison should be presented. Construct and Kaitai have very different capabilities: Kaitai supports about a dozen languages, while Construct supports only Python; Kaitai offers only basic common features, while Construct offers Python-only features such as Numpy and Pickle support; Kaitai does only parsing, while Construct also does building. In a sense, the two libraries are in different categories (like sumo and karate). There are multiple scenarios in which one library or the other would not be usable.
Example used for comparison:
::
Struct(
"count" / Int32ul,
"items" / Array(this.count, Struct(
"num1" / Int8ul,
"num2" / Int24ul,
"flags" / BitStruct(
"bool1" / Flag,
"num4" / BitsInteger(3),
Padding(4),
),
"fixedarray1" / Array(3, Int8ul),
"name1" / CString("utf8"),
"name2" / PascalString(Int8ul, "utf8"),
)),
)
::
meta:
id: comparison_1_kaitai
encoding: utf-8
endian: le
seq:
- id: count
type: u4
- id: items
repeat: expr
repeat-expr: count
type: item
types:
item:
seq:
- id: num1
type: u1
- id: num2_lo
type: u2
- id: num2_hi
type: u1
- id: flags
type: flags
- id: fixedarray1
repeat: expr
repeat-expr: 3
type: u1
- id: name1
type: strz
- id: len_name2
type: u1
- id: name2
type: str
size: len_name2
instances:
num2:
value: 'num2_hi << 16 | num2_lo'
types:
flags:
seq:
- id: bool1
type: b1
- id: num4
type: b3
- id: padding
type: b4
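One detail of the Kaitai spec above is worth spelling out: it reads ``num2`` as a 16-bit low part followed by an 8-bit high part and recombines them in an instance, which should yield the same value as ``Int24ul`` on the Construct side. The sketch below checks that equivalence using only the standard library; the function names are made up for illustration and belong to neither library.

```python
import struct


def int24ul_direct(data):
    # Interpret three bytes as one unsigned little-endian integer.
    return int.from_bytes(data, byteorder="little", signed=False)


def int24ul_split(data):
    # Mirror the Kaitai spec: a u2 (num2_lo) followed by a u1 (num2_hi), then recombine.
    num2_lo, num2_hi = struct.unpack("<HB", data)
    return num2_hi << 16 | num2_lo


samples = [b"\x00\x00\x00", b"\x01\x02\x03", b"\xff\xff\xff"]
results = [(int24ul_direct(s), int24ul_split(s)) for s in samples]
```

Both functions agree on every sample, so the two specifications should describe the same field despite the different layout.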
Surprisingly, Kaitai won the benchmark! Honestly, I am shocked and dismayed that it did. The only explanation I can offer is that Kaitai parses structs into class objects (with attributes) while Construct parses into dictionaries (with keys). However, that one detail seems an unlikely explanation for the huge discrepancy in the benchmark results. Perhaps there is a flaw in the methodology. But until that is proven, Kaitai gets its respect. Congrats.
::
$ python3.6 comparison_1_construct.py
Timeit measurements:
parsing: 0.1024609069 sec/call
parsing compiled: 0.0410809368 sec/call
$ pypy comparison_1_construct.py
Timeit measurements:
parsing: 0.0108308416 sec/call
parsing compiled: 0.0062594243 sec/call
::
$ python3.6 comparison_1_kaitai.py
Timeit measurements:
parsing: 0.0250326035 sec/call
$ pypy comparison_1_kaitai.py
Timeit measurements:
parsing: 0.0019435351 sec/call
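The size of the discrepancy is easier to judge as a ratio. The sketch below divides each Construct timing by the Kaitai timing on the same interpreter, using only the figures printed above; the variable names are illustrative.

```python
# Seconds per call, copied from the timeit output above.
construct = {
    "python3.6": {"parsing": 0.1024609069, "parsing compiled": 0.0410809368},
    "pypy": {"parsing": 0.0108308416, "parsing compiled": 0.0062594243},
}
kaitai = {"python3.6": 0.0250326035, "pypy": 0.0019435351}

# How many times slower Construct is than Kaitai on the same interpreter.
ratios = {
    (interpreter, mode): seconds / kaitai[interpreter]
    for interpreter, modes in construct.items()
    for mode, seconds in modes.items()
}

for (interpreter, mode), ratio in ratios.items():
    print(f"{interpreter}, {mode}: {ratio:.2f}x slower than Kaitai")
```

On these numbers, Construct is about four times slower than Kaitai on CPython without compilation and still more than one and a half times slower when compiled; on PyPy the gap is larger in both modes.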