.. -*- mode: rst-mode -*-
..
.. Version number is filled in automatically.
.. |version| replace:: 0.59.0
======
BinPAC
======
.. rst-class:: opening
BinPAC is a high-level language for describing protocol parsers; its
compiler generates C++ code. It is currently maintained and distributed with
the Zeek Network Security Monitor distribution; however, the generated parsers
may also be used with programs other than Zeek.
.. contents::
Download
========
You can find the latest BinPAC release for download at
https://www.zeek.org/download.
BinPAC's git repository is located at https://github.com/zeek/binpac.
This document describes BinPAC |version|. See the ``CHANGES``
file for version history.
Prerequisites
=============
BinPAC relies on the following libraries and tools, which need to be
installed before you begin:
* Flex (Fast Lexical Analyzer)
Flex is already installed on most systems, so with luck you can
skip having to install it yourself.
* Bison (GNU Parser Generator)
Bison is also already installed on many systems.
* CMake 2.8.12 or greater
CMake is a cross-platform, open-source build system, typically
not installed by default. See http://www.cmake.org for more
information regarding CMake and the installation steps below for
how to use it to build this distribution. CMake generates native
Makefiles that depend on GNU Make by default.
Installation
============
To build and install into ``/usr/local``::
./configure
cd build
make
make install
This will perform an out-of-source build into the build directory using
the default build options and then install the binpac binary into
``/usr/local/bin``.
You can specify a different installation directory with::
./configure --prefix=<dir>
Run ``./configure --help`` for more options.
Glossary and Convention
=======================
To make this document easier to read, the following terms and
conventions are used throughout.
- PAC grammar - .pac file written by user.
- PAC source - _pac.cc file generated by binpac
- PAC header - _pac.h file generated by binpac
- Analyzer - Protocol decoder generated by compiling PAC grammar
- Field - a member of a record
- Primary field - member of a record as direct result of parsing
- Derivative field - member of a record evaluated through post processing
BinPAC Language Reference
=========================
The BinPAC language consists of:
- analyzer
- type - a data-structure-like definition describing a parsing unit. Types can build on each other to form more complex types, similar to yacc productions.
- flow - defines how data is fed into the analyzer and names the top-level parsing unit.
- Keywords
- Built-in macros
Defining an analyzer
--------------------
There are two components to an analyzer definition: the top level context
and the connection definition.
Context Definition
~~~~~~~~~~~~~~~~~~
Each analyzer requires a top level context defined by the following syntax:
.. code::
analyzer <ContextName> withcontext {
... context members ...
}
Typically, the top-level context contains a pointer to the top-level analyzer
(the connection) and to the flow, as in the following example:
.. code::
analyzer HTTP withcontext {
connection : HTTP_analyzer;
flow : HTTP_flow;
};
Connection Definition
~~~~~~~~~~~~~~~~~~~~~
A "connection" defines the entry point into the analyzer. It consists of
two "flow" definitions, an "upflow" and a "downflow".
.. code::
connection <AnalyzerName>(optional parameter) {
upflow = <UpflowConstructor>;
downflow = <DownflowConstructor>;
}
Example:
.. code::
connection HTTP_analyzer {
upflow = HTTP_flow (true);
downflow = HTTP_flow (false);
};
type
----
A "type" is the basic building block of a binpac-generated parser and describes
the structure of a byte segment. Each non-primitive "type" generates a C++
class that can independently parse the structure it describes.
Syntax:
.. code::
type <typeName>{(<optional type parameter(s)>)} = <compositor or primitive class>{
cases or members declaration.
} <optional attribute(s)>;
Example:
PAC grammar::
type myType = record {
data:uint8;
};
PAC header::
class myType{
public:
myType();
~myType();
int Parse(const_byteptr const t_begin_of_data, const_byteptr const t_end_of_data);
uint8 data() const { return data_; }
protected:
uint8 data_;
};
Primitives
~~~~~~~~~~
A primitive type can be thought of as a #define in C. Primitive types are
embedded into the types that reference them and do not generate any parsing
code of their own. The available primitive types are:
- int8
- int16
- int32
- uint8
- uint16
- uint32
- Regular expression ( ``type HTTP_URI = RE/[[:alnum:][:punct:]]+/;`` )
- bytestring
Examples:
.. code::
type foo = record { x: number; };
is equivalent to:
.. code::
type foo = record { x: uint8[3]; };
(Note: this behavior may change in future versions of binpac.)
record
~~~~~~
A "record" composes primitive types and other records to create a
new "type". This new "type" can in turn be used as part of a parent type
or directly for parsing.
Example:
.. code::
type SMB_body = record {
word_count : uint8;
parameter_words : uint16[word_count];
byte_count : uint16;
}
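To make the length dependency in this record concrete, the following is a hand-written C++ sketch of roughly what parsing such a structure involves. It is not binpac output: the names ``SmbBody``, ``ReadU16LE`` and ``ParseSmbBody`` are hypothetical, and little-endian byte order is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hand-written counterpart of the SMB_body record.
struct SmbBody {
    uint8_t word_count = 0;
    std::vector<uint16_t> parameter_words;
    uint16_t byte_count = 0;
};

// Reads a 16-bit little-endian value (the byte order is an assumption).
static uint16_t ReadU16LE(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Fills *out and returns true, or returns false if the input is too short.
bool ParseSmbBody(const uint8_t* begin, const uint8_t* end, SmbBody* out) {
    assert(begin <= end && out != nullptr);
    const std::size_t avail = static_cast<std::size_t>(end - begin);
    if (avail < 1) return false;

    SmbBody body;
    body.word_count = begin[0];

    // The array length comes from the field parsed just before it.
    const std::size_t needed = 1 + 2 * static_cast<std::size_t>(body.word_count) + 2;
    if (avail < needed) return false;

    const uint8_t* p = begin + 1;
    for (std::size_t i = 0; i < body.word_count; ++i, p += 2)
        body.parameter_words.push_back(ReadU16LE(p));
    body.byte_count = ReadU16LE(p);

    *out = body;
    return true;
}
```

The point of the sketch is that the size of ``parameter_words`` is only known after ``word_count`` has been read, which is the kind of dependency the record syntax expresses declaratively.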
case
~~~~
The "case" compositor allows switching between different parsing methods.
.. code::
type SMB_string(unicode: bool, offset: int) = case unicode of {
true -> u: SMB_unicode_string(offset);
false -> a: SMB_ascii_string;
};
A "case" supports an optional "default" label, which applies when none of the
other labels match. If no field follows a given label, the user
can specify an arbitrary field name with the "empty" type, as in
the following example.
.. code::
type HTTP_Message(expect_body: ExpectBody) = record {
headers: HTTP_Headers;
body_or_not: case expect_body of {
BODY_NOT_EXPECTED -> none: empty;
default -> body: HTTP_Body(expect_body);
};
};
Note that only one field is allowed after a given label. If multiple fields
are to be specified, they should be packed in another "record" type first.
The other usages of `case`_ are described later.
array
~~~~~
A type can be defined as a sequence of "single-type elements". By default,
an array type keeps parsing array elements indefinitely.
Alternatively, an array size can be specified to control the number of
elements matched, or &until can be used to end parsing conditionally:
.. code::
# This matches exactly 10 elements
type HTTP_Headers = HTTP_Header [10];
# This will match until the condition is met
type HTTP_Headers = HTTP_Header [] &until(/*Some condition*/);
Arrays can also be used directly inside a "record". For example:
.. code::
type DNS_message = record {
header: DNS_header;
question: DNS_question(this)[header.qdcount];
answer: DNS_rr(this, DNS_ANSWER)[header.ancount];
authority: DNS_rr(this, DNS_AUTHORITY)[header.nscount];
additional: DNS_rr(this, DNS_ADDITIONAL)[header.arcount];
}&byteorder = bigendian, &exportsourcedata
flow
----
A "flow" defines how data is fed into the analyzer. It also maintains
custom state information declared by `%member`_. A flow is configured by
specifying the type of its data unit.
Syntax:
.. code::
flow <Flow name>(<optional attribute>) {
<flowunit|datagram> = <top level data unit> withcontext (<context constructor parameter>);
};
When a "flow" is added to the top-level analyzer context, it enables the use of
&oneline and &length in "record" types. The flow buffers data when there is not
enough to evaluate the record and dispatches data for evaluation when the
threshold is reached.
flowunit
~~~~~~~~
When flowunit is used, the analyzer uses a flow buffer to handle incremental
input and to support &oneline/&length. For further detail,
see `Buffering`_.
.. code::
flowunit = HTTP_PDU(is_orig) withcontext (analyzer, this);
datagram
~~~~~~~~
In contrast to flowunit, declaring the data unit as datagram opts out of the
flow buffer. This results in faster parsing, but without incremental input
or buffering support.
.. code::
datagram = HTTP_PDU(is_orig) withcontext (analyzer, this);
Byte Ordering and Alignment
---------------------------
Byte Ordering
~~~~~~~~~~~~~
Byte Alignment
~~~~~~~~~~~~~~
.. code::
type RPC_Opaque = record {
length: uint32;
data: uint8[length];
pad: padding align 4; # pad to 4-byte boundary
};
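The amount of padding implied by ``padding align 4`` can be illustrated with a small C++ helper. This is only a sketch of the arithmetic: ``PaddingFor`` is a hypothetical name, not a binpac API, and it assumes offsets are measured from the start of the enclosing record.

```cpp
#include <cassert>
#include <cstddef>

// Number of pad bytes needed to advance `offset` to the next multiple of
// `alignment`. Hypothetical helper for illustration only.
std::size_t PaddingFor(std::size_t offset, std::size_t alignment) {
    assert(alignment > 0);
    return (alignment - offset % alignment) % alignment;
}
```

For an RPC_Opaque with ``length`` equal to 5, the ``length`` and ``data`` fields occupy 4 + 5 = 9 bytes, so under this assumption ``PaddingFor(9, 4)`` gives 3 pad bytes.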
Functions
---------
Users can define functions in binpac.
A function can be declared in one of three ways:
PAC with embedded body
~~~~~~~~~~~~~~~~~~~~~~
A PAC-style function prototype with the C++ body embedded using %{ %}::
function print_stuff(value :const_bytestring):bool
%{
printf("Value [%s]\n", std_str(value).c_str());
return true;
%}
PAC with PAC-case body
~~~~~~~~~~~~~~~~~~~~~~
A PAC-style function with a case body. This type of declaration is useful
because it can be extended later with casefunc::
function RPC_Service(prog: uint32, vers: uint32): EnumRPCService =
case prog of {
default -> RPC_SERVICE_UNKNOWN;
};
Inlined by %code
~~~~~~~~~~~~~~~~
A function can be written entirely in C++ by using %code::
%code{
EnumRPCService RPC_Service(const RPC_Call* call)
{
return call ? call->service() : RPC_SERVICE_UNKNOWN;
}
%}
Extending
---------
PAC code can be extended by using "refine". This is useful for code
reuse and for splitting functionality for parallel development.
Extending record
~~~~~~~~~~~~~~~~
A record can be extended with additional attribute(s) by
using "refine typeattr". A typical use is to add &let in order to separate
protocol parsing from protocol analysis.
.. code::
refine typeattr HTTP_RequestLine += &let {
process_request: bool =
process_func(method, uri, version);
};
Extending type case
~~~~~~~~~~~~~~~~~~~
.. code::
refine casetype RPC_Params += {
RPC_SERVICE_PORTMAP -> portmap: PortmapParams(call);
};
Extending function case
~~~~~~~~~~~~~~~~~~~~~~~
A function that is declared with a PAC case body can be extended by adding
additional cases to the switch.
.. code::
refine casefunc RPC_BuildCallVal += {
RPC_SERVICE_PORTMAP ->
PortmapBuildCallVal(call, call.params.portmap);
};
Extending connection
~~~~~~~~~~~~~~~~~~~~
A connection can be extended to add functions and members. Example::
refine connection RPC_Conn += {
function ProcessPortmapReply(results: PortmapResults): bool
%{
%}
};
State Management
----------------
State is maintained by extending the parsing class with derivative fields.
State lasts until the top-level parsing unit (flowunit/datagram) is destroyed.
Keywords
--------
Source code embedding
~~~~~~~~~~~~~~~~~~~~~
C++ code can be embedded within the .pac file using the following
directives. This code is copied into the final generated code.
- %header{...%}
Code to be inserted in binpac generated header file.
- %code{...%}
Code to be inserted at the beginning of binpac generated C++ file.
.. _%member:
- %member{...%}
Add additional member(s) to connection (?) and flow class.
- %init{...%}
Code to be inserted in flow constructor.
- %cleanup{...%}
Code to be inserted in flow destructor.
Embedded pac primitive
~~~~~~~~~~~~~~~~~~~~~~
- ${
- $set{
- $type{
- $typeof{
- $const_def{
Condition checking
~~~~~~~~~~~~~~~~~~
&until
......
"&until" is used in conjunction with array declaration. It specifies exit
condition for array parsing.
.. code::
type HTTP_Headers = HTTP_Header[] &until($input.length() == 0);
&requires
.........
Processes data dependencies before evaluating a field.
Example: typically, a derivative field is evaluated after the primary fields.
Here, however, "&requires" forces evaluation of length before msg_body.
.. code::
type RPC_Message = record {
xid: uint32;
msg_type: uint32;
msg_body: case msg_type of {
RPC_CALL -> call: RPC_Call(this);
RPC_REPLY -> reply: RPC_Reply(this);
} &requires(length);
} &let {
length = sourcedata.length(); # length of the RPC_Message
} &byteorder = bigendian, &exportsourcedata, &refcount;
&if
...
Evaluate field only if condition is met.
.. code::
type DNS_label(msg: DNS_message) = record {
length: uint8;
data: case label_type of {
0 -> label: bytestring &length = length;
3 -> ptr_lo: uint8;
};
} &let {
label_type: uint8 = length >> 6;
last: bool = (length == 0) || (label_type == 3);
ptr: DNS_name(msg)
withinput $context.flow.get_pointer(msg.sourcedata,
((length & 0x3f) << 8) | ptr_lo)
&if(label_type == 3);
clear_pointer_set: bool = $context.flow.reset_pointer_set()
&if(last);
};
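The bit manipulation in the &let block above can be checked in isolation. The following C++ sketch mirrors the two expressions ``length >> 6`` and ``((length & 0x3f) << 8) | ptr_lo``; the function names ``LabelType`` and ``PointerOffset`` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// The top two bits of the length octet select the label type
// (0 = plain label, 3 = compression pointer), as in `length >> 6`.
uint8_t LabelType(uint8_t length) {
    return static_cast<uint8_t>(length >> 6);
}

// 14-bit pointer offset: the low six bits of the length octet followed by
// the next octet, as in `((length & 0x3f) << 8) | ptr_lo`.
uint16_t PointerOffset(uint8_t length, uint8_t ptr_lo) {
    assert(LabelType(length) == 3);  // only meaningful for pointer labels
    return static_cast<uint16_t>(((length & 0x3f) << 8) | ptr_lo);
}
```

For example, the octet pair 0xC0 0x0C has label type 3 and refers to offset 12 in the message.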
.. _case:
case
....
There are two uses of the "case" keyword.
* As part of a record field. In this scenario, it allows alternative
methods of parsing a field. Example::
type RPC_Reply(msg: RPC_Message) = record {
stat: uint32;
reply: case stat of {
MSG_ACCEPTED -> areply: RPC_AcceptedReply(call);
MSG_DENIED -> rreply: RPC_RejectedReply(call);
};
} &let {
call: RPC_Call = $context.connection.FindCall(msg.xid);
success: bool = (stat == MSG_ACCEPTED && areply.stat == SUCCESS);
};
* As function definition. Example::
function RPC_Service(prog: uint32, vers: uint32): EnumRPCService =
case prog of {
default -> RPC_SERVICE_UNKNOWN;
};
Note that one can "refine" both types of cases:
.. code::
refine casefunc RPC_Service += {
100000 -> RPC_SERVICE_PORTMAP;
};
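Conceptually, the case function together with its refinement behaves like a single C++ switch. The sketch below shows that idea only; it is not the code binpac actually emits, and the enum definition and its values are assumptions made so the example is self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enum for illustration; the real definition lives elsewhere.
enum EnumRPCService {
    RPC_SERVICE_UNKNOWN,
    RPC_SERVICE_PORTMAP,
};

// Rough equivalent of RPC_Service after the refinement has been merged in.
// `vers` is unused, as in the grammar above.
EnumRPCService RPC_Service(uint32_t prog, uint32_t /*vers*/) {
    switch (prog) {
    case 100000:  // label added by "refine casefunc"
        return RPC_SERVICE_PORTMAP;
    default:      // label from the original declaration
        return RPC_SERVICE_UNKNOWN;
    }
}
```

Each further ``refine casefunc`` would contribute one more ``case`` label to the same switch, which is what makes this style of declaration convenient for splitting a protocol across several .pac files.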
Built-in macros
~~~~~~~~~~~~~~~
$input
......
This macro refers to the data that was passed into the ParseBuffer
function. When $input is used, binpac generates a const_bytestring
that holds the start and end pointers of the input.
PAC grammar::
&until($input.length()==0);
PAC source::
const_bytestring t_val__elem_input(t_begin_of_data, t_end_of_data);
if ( ( t_val__elem_input.length() == 0 ) )
$element
........
$element provides access to an entry of an array type. It can be used
in the following ways.
* Current element. Checks the value of the most recently parsed entry.
The check is executed each time an entry is parsed. Example::
type SMB_ascii_string = uint8[] &until($element == 0);
* Current element's field. Example::
type DNS_label(msg: DNS_message) = record {
length: uint8;
data: case label_type of {
0 -> label: bytestring &length = length;
3 -> ptr_lo: uint8;
};
} &let {
label_type: uint8 = length >> 6;
last: bool = (length == 0) || (label_type == 3);
};
type DNS_name(msg: DNS_message) = record {
labels: DNS_label(msg)[] &until($element.last);
};
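The control flow behind ``&until($element == 0)`` can be pictured as a loop that tests the condition after each element is parsed. The C++ sketch below illustrates that loop for the SMB_ascii_string case; ``ParseAsciiString`` is a hypothetical name, and keeping the terminating element in the result is an assumption of this sketch rather than a documented binpac guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Collects bytes up to and including the first zero byte, mirroring
// `uint8[] &until($element == 0)`. Returns false if the input ends
// before the condition holds.
bool ParseAsciiString(const uint8_t* begin, const uint8_t* end,
                      std::vector<uint8_t>* out) {
    assert(begin <= end && out != nullptr);
    out->clear();
    for (const uint8_t* p = begin; p != end; ++p) {
        const uint8_t element = *p;  // plays the role of $element
        out->push_back(element);     // assumption: the terminator is kept
        if (element == 0)            // condition checked after each element
            return true;
    }
    return false;
}
```

The ``$element.last`` form works the same way, except that the condition inspects a field of the element that was just parsed instead of the element itself.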
$context
........
This macro refers to the analyzer context class (the Context<Name> class is
generated from ``analyzer <Name> withcontext {}``). Using this macro, users
can access the "flow" object and the "analyzer" object.
Other keywords
~~~~~~~~~~~~~~
&transient
..........
Do not create a copy of the bytestring.
.. code::
type MIME_Line = record {
line: bytestring &restofdata &transient;
} &oneline;
&let
....
Adds derivative fields to a record.
.. code::
type ncp_request(length: uint32) = record {
data : uint8[length];
} &let {
function = length > 0 ? data[0] : 0;
subfunction = length > 1 ? data[1] : 0;
};
let
...
Declares a global value. If the user does not specify a type,
the compiler assumes the "int" type.
PAC grammar::
let myValue:uint8=10;
PAC source::
uint8 const myValue = 10;
PAC header::
extern uint8 const myValue;
&restofdata
...........
Grab the rest of the data available in the FlowBuffer.
PAC grammar::
onebyte: uint8;
value: bytestring &restofdata &transient;
PAC source::
// Parse "onebyte"
onebyte_ = *((uint8 const *) (t_begin_of_data));
// Parse "value"
int t_value_string_length;
t_value_string_length = (t_end_of_data) - ((t_begin_of_data + 1));
int t_value__size;
t_value__size = t_value_string_length;
value_.init((t_begin_of_data + 1), t_value_string_length);
&length
.......
Length can appear in two different contexts: as property of a field
or as property of a record.
Examples:
&length as field property::
protocol : bytestring &length = 4;
translates into::
const_byteptr t_end_of_data = t_begin_of_data + 4;
int t_protocol_string_length;
t_protocol_string_length = 4;
int t_protocol__size;
t_protocol__size = t_protocol_string_length;
protocol_.init(t_begin_of_data, t_protocol_string_length);
&check
......
This was originally intended to implement the behavior of the
superseding "&enforce" attribute. It has always been, and will remain,
a no-op, so that grammars which use it do not suddenly and
unintentionally break.
&enforce
........
Checks a condition and raises an exception if it is not met.
&chunked and $chunk
...................
When parsing a long field with variable length, "&chunked" can be used to
improve performance. However, chunked fields are not buffered across
packets. Data for the chunk in the current packet can be accessed by
using "$chunk".
&exportsourcedata
.................
The data matched by a particular type can be retained by
using "&exportsourcedata".
.pac file
.. code::
type myType = record {
data:uint8;
} &exportsourcedata;
_pac.h
.. code::
class myType
{
public:
myType();
~myType();
int Parse(const_byteptr const t_begin_of_data, const_byteptr const t_end_of_data);
uint8 data() const { return data_; }
const_bytestring const & sourcedata() const { return sourcedata_; }
protected:
uint8 data_;
const_bytestring sourcedata_;
};
_pac.cc
.. code::
sourcedata_ = const_bytestring(t_begin_of_data, t_end_of_data);
sourcedata_.set_end(t_begin_of_data + 1);
Source data can be used within the type that matches it or in the parent type.
.. code::
type myParentType (child:myType) = record {
somedata:uint8;
} &let{
do_something:bool = print_stuff(child.sourcedata);
};
translates into
.. code::
do_something_ = print_stuff(child()->sourcedata());
&refcount
.........
withinput
.........
Parsing Methodology
===================
.. _Buffering:
Buffering
---------
binpac supports incremental input to deal with packet fragmentation. This
is done by using the FlowBuffer class and by maintaining buffering/parsing states.
FlowBuffer Class
~~~~~~~~~~~~~~~~
FlowBuffer provides two modes of buffering: line and frame. Line mode is
useful for parsing line-based protocols such as HTTP. Frame mode is best for
fixed-length messages. The buffering mode can be switched during parsing, and
this is done transparently to the grammar writer.
At compile time, binpac calculates the number of bytes required to evaluate
each field. At run time, data is buffered in the FlowBuffer until
there is enough to evaluate the "record". To optimize the buffering
process, if the FlowBuffer has enough data to evaluate on the first NewData,
it only marks the start and end pointers instead of copying.
- void **NewMessage**\();
- Advances the orig_data_begin\_ pointer depending on the current mode\_:
by 1 or 2 characters in LINE_MODE, by frame_length\_ in FRAME_MODE,
and not at all in UNKNOWN_MODE (the default mode).
- Set buffer_n\_ to 0
- Reset message_complete\_
- void **NewLine**\();
- Reset frame_length\_ and chunked\_, set mode\_ to LINE_MODE
- void **NewFrame**\(int frame_length, bool chunked\_);
- void **GrowFrame**\(int new_frame_length);
- void **AppendToBuffer**\(const_byteptr data, int len);
- Reallocate buffer\_ to add new data then copy data
- void **ExpandBuffer**\(int length);
- Reallocate buffer\_ to new size if new size is bigger than current size.
- Set minimum size to 512 (optimization?)
- void **MarkOrCopyLine**\();
- Searches the current input for an end of line (CR, LF, or CRLF, depending on
the line-break mode). If one is found, the data is appended to the buffer if a
buffer already exists; otherwise the line is only marked (by setting
frame_length\_) to minimize copying. If no end of line is found, the partial
data up to the end of the input is appended to the buffer, which is created if
it does not exist yet.
- const_byteptr **begin**\()/**end**\()
- Returns buffer\_ and buffer_n\_ if a buffer exists, otherwise
orig_data_begin\_ and orig_data_begin\_ + frame_length\_.
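The idea of line-mode buffering across fragmented input can be sketched with a much simpler class than the real FlowBuffer. The following C++ example is a hypothetical stand-in, not binpac code: ``LineBuffer`` and its methods are invented for illustration, it always copies (no mark-instead-of-copy optimization), and it only recognizes LF and CRLF line endings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical illustration of line-mode buffering.
class LineBuffer {
public:
    // Feeds one chunk of input and returns every line completed by it.
    std::vector<std::string> NewData(const char* begin, const char* end) {
        assert(begin <= end);
        std::vector<std::string> lines;
        for (const char* p = begin; p != end; ++p) {
            if (*p == '\n') {
                if (!pending_.empty() && pending_.back() == '\r')
                    pending_.pop_back();  // strip the CR of a CRLF pair
                lines.push_back(pending_);
                pending_.clear();
            } else {
                pending_.push_back(*p);
            }
        }
        return lines;
    }

    // Bytes of an incomplete line carried over to the next call.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

Feeding ``"GET / HT"`` and then ``"TP/1.1\r\nHost: x\r\nAcc"`` yields no line from the first call and two complete lines from the second, with ``"Acc"`` left pending until more data arrives. The real FlowBuffer avoids the copy whenever a complete unit is already present in the incoming data.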
Parsing States
~~~~~~~~~~~~~~
* buffering_state\_ - each parsing class contains a flag indicating whether
enough data has been buffered to evaluate the next block.
* parsing_state\_ - each parsing class that consists of multiple parsing
data units (lines/frames) has this flag to indicate the parsing stage. Each
time new data comes in, the parsing function is invoked and switches on
parsing_state\_ to determine which sub-parser to use next.
Regular Expression
------------------
Evaluation Order
----------------
Running Binpac-generated Analyzer Standalone
============================================
To run binpac-generated code independently of Zeek, the regex library must be
substituted. Below is one way of doing it, using the following three header
files.
RE.h
----
.. code::
/*Dummy file to replace Zeek's file*/
#include "binpac_pcre.h"
#include "bro_dummy.h"
bro_dummy.h
-----------
.. code::
#ifndef BRO_DUMMY
#define BRO_DUMMY
#define DEBUG_MSG(x...) fprintf(stderr, x)
/* Dummy to link; this function is supposed to be defined in Zeek */
double network_time();
#endif
binpac_pcre.h
-------------
.. code::
#ifndef bro_pcre_h
#define bro_pcre_h
#include <stdio.h>
#include <assert.h>
#include <string>
using namespace std;
// TODO: use configure to figure out the location of pcre.h
#include "pcre.h"
class RE_Matcher {
public:
RE_Matcher(const char* pat){
pattern_ = "^";
pattern_ += "(";
pattern_ += pat;
pattern_ += ")";
pcre_ = NULL;
pextra_ = NULL;
}
~RE_Matcher() {
if (pcre_) {
pcre_free(pcre_);
}
}
int Compile() {
const char *err = NULL;
int erroffset = 0;
pcre_ = pcre_compile(pattern_.c_str(),
0, // options,
&err,
&erroffset,
NULL);
if (pcre_ == NULL) {
fprintf(stderr,
"Error in RE_Matcher::Compile(): %d:%s\n",
erroffset, err);
return 0;
}
return 1;
}
int MatchPrefix (const char* s, int n){
const char *err=NULL;
assert(pcre_);
const int MAX_NUM_OFFSETS = 30;
int offsets[MAX_NUM_OFFSETS];
int ret = pcre_exec(pcre_,
pextra_, // pcre_extra
//NULL, // pcre_extra
s, n,
0, // offset
0, // options
offsets,
MAX_NUM_OFFSETS);
if (ret < 0) {
return -1;
}
assert(offsets[0] == 0);
return offsets[1];
}
protected:
pcre *pcre_;
pcre_extra *pextra_;
string pattern_;
};
#endif
main.cc
-------
In your main source, add this dummy stub.
.. code::
/* Dummy to link; this function is supposed to be defined in Zeek */
double network_time(){
return 0;
}
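Where PCRE is not available, a matcher with a similar interface could in principle be built on the C++ standard library instead. The sketch below is an assumption-laden alternative, not part of binpac or Zeek: the class name ``StdRegexMatcher`` is hypothetical, it swaps PCRE for ``std::regex`` with ECMAScript syntax, and pattern semantics may differ from PCRE in corner cases.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical std::regex-based variant of RE_Matcher (illustration only).
class StdRegexMatcher {
public:
    explicit StdRegexMatcher(const char* pat) : pattern_(pat) {}

    // Returns 1 on success and 0 if the pattern is rejected.
    int Compile() {
        try {
            re_.assign(pattern_, std::regex::ECMAScript);
            compiled_ = true;
        } catch (const std::regex_error&) {
            compiled_ = false;
        }
        return compiled_ ? 1 : 0;
    }

    // Returns the length of the match anchored at s, or -1 if there is none.
    int MatchPrefix(const char* s, int n) const {
        assert(compiled_ && n >= 0);
        std::cmatch m;
        // match_continuous requires the match to start at the first character,
        // which plays the role of the "^" prefix in the PCRE version.
        if (!std::regex_search(s, s + n, m, re_,
                               std::regex_constants::match_continuous))
            return -1;
        return static_cast<int>(m.length(0));
    }

private:
    std::string pattern_;
    std::regex re_;
    bool compiled_ = false;
};
```

As with the PCRE version, ``MatchPrefix`` reports how many bytes at the start of the input the pattern consumed, which is the value the generated parser uses to size the matched string.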
Q & A
=====
* Does &oneline only work when "flow" is used?
Yes. binpac uses the flowunit definition in "flow" to figure out which
types require buffering. For those that do, the parse function is:
.. code::
bool ParseBuffer(flow_buffer_t t_flow_buffer, ContextHTTP * t_context);
And the code of flow_buffer_t provides the functionality of buffering up to
one line. That's why &oneline is only active when "flow" is used and the
type requires buffering.
In certain cases we would want to use &oneline even if the type does
not require buffering; binpac currently does not provide such functionality.
* How would incremental input work in the case of regex?
A regex should not take incremental input. (The binpac compiler will
complain when that happens.) It should always appear below some type
that has either &length=... or &oneline.
* What is the role of Context_<Name> class (generated by analyzer <Name>
withcontext)?
* What is the difference between ``withcontext`` and w/o ``withcontext``?
withcontext should always be there. It's fine to have an empty context.
* Elaborate on $context and how it is related to "withcontext".
A "context" parameter is passed to every type. It provides a vehicle to
pass something to every type without adding a parameter to every type.
In that sense, it's optional. It exists for convenience.
* Example usage of composite type array.
Please see HTTP_Headers in http-protocol.pac in the Zeek source code.
* Clarification on "connection" keyword (binpac paper).
* Need a new way to hook additional code into each class besides &let.
* &transient: how is this different from declaring an anonymous field?
Currently it doesn't seem to do much.
.. code::
type HTTP_Header = record {
name: HTTP_HEADER_NAME &transient;
: HTTP_WS;
value: bytestring &restofdata &transient;
} &oneline;
.. code::
// Parse "name"
int t_name_string_length;
t_name_string_length =
HTTP_HEADER_NAME_re_011.MatchPrefix(
t_begin_of_data,
t_end_of_data - t_begin_of_data);
if ( t_name_string_length < 0 )
{
throw ExceptionStringMismatch( "./http-protocol.pac:96",
"|([^: \\t]+:)",
string((const char *) (t_begin_of_data), (const char *) t_end_of_data).c_str()
);
}
int t_name__size;
t_name__size = t_name_string_length;
name_.init(t_begin_of_data, t_name_string_length);
* Detail on the globals ($context, $element, $input...etc)
* How does BinPAC work with dynamic protocol detection?
Well, you can use the code in DNS-binpac.cc as a reference. First,
create a pointer to the connection. (See the example in DNS-binpac.cc)
.. code::
interp = new binpac::DNS::DNS_Conn(this);
Pass the data received from "DeliverPacket" or "DeliverStream" to
"interp->NewData()". (Again, see the example in DNS-binpac.cc)
.. code::
void DNS_UDP_Analyzer_binpac::DeliverPacket(int len, const u_char* data, bool orig, int seq, const IP_Hdr* ip, int caplen)
{
Analyzer::DeliverPacket(len, data, orig, seq, ip, caplen);
interp->NewData(orig, data, data + len);
}
* Explanation of &withinput
* Difference between using flow and not using flow (binpac generates Parse
method instead of ParseBuffer)
* &check currently working?
* Difference between flowunit and datagram, datagram and &oneline, &length?
* Go over TODO list in binpac release
* How would input get handled/buffered when the length is not known (chunked)?
* More features for multi-byte characters (UTF-16, UTF-32, etc.)?
TODO List
=========
New Features
------------
* Provide a method to match simple ASCII text.
* Allow the use of fixed-length arrays in addition to vectors.
Bugs
----
Small clean-ups
~~~~~~~~~~~~~~~
* Remove anonymous field bytestring assignment.
* Redundant overflow checking/more efficient fixed length text copying.
Warning/Errors
~~~~~~~~~~~~~~
Things that the compiler should flag at code generation time:
* Give a warning when &transient is used on a non-bytestring field.
* Give a warning when &oneline or &length is used and flowunit is not.
* Give a warning when more than one "connection" is defined.