.\" Man page generated from reStructuredText.
.
.TH RE2C 1 "" "" ""
.SH NAME
re2c \- convert regular expressions to C/C++ code
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.sp
\fBre2c [OPTIONS] FILE\fP
.SH DESCRIPTION
.sp
\fBre2c\fP is a lexer generator for C/C++. It finds regular expression
specifications inside of C/C++ comments and replaces them with a
hard\-coded DFA. The user must supply some interface code in order to
control and customize the generated DFA.
.SH OPTIONS
.INDENT 0.0
.TP
.B \fB\-? \-h \-\-help\fP
Invoke a short help.
.TP
.B \fB\-b \-\-bit\-vectors\fP
Implies \fB\-s\fP\&. Use bit vectors as well in the
attempt to coax better code out of the compiler. Most useful for
specifications with more than a few keywords (e.g. for most programming
languages).
.TP
.B \fB\-c \-\-conditions\fP
Enable (f)lex\-like condition support.
.TP
.B \fB\-d \-\-debug\-output\fP
Creates a parser that dumps information about
the current position and in which state the parser is while parsing the
input. This is useful to debug parser issues and states. If you use this
switch you need to define a macro \fBYYDEBUG\fP that is called like a
function with two parameters: \fBvoid YYDEBUG (int state, char current)\fP\&.
The first parameter receives the state or \fB\-1\fP and the second parameter
receives the input at the current cursor.
.TP
.B \fB\-D \-\-emit\-dot\fP
Emit Graphviz dot data. It can then be processed
with e.g. \fBdot \-Tpng input.dot > output.png\fP\&. Please note that
scanners with many states may crash dot.
.TP
.B \fB\-e \-\-ecb\fP
Generate a parser that supports EBCDIC. The generated
code can deal with any character up to 0xFF. In this mode \fBre2c\fP assumes
that input character size is 1 byte. This switch is incompatible with
\fB\-w\fP, \fB\-x\fP, \fB\-u\fP and \fB\-8\fP\&.
.TP
.B \fB\-f \-\-storable\-state\fP
Generate a scanner with support for storable state.
.TP
.B \fB\-F \-\-flex\-syntax\fP
Partial support for flex syntax. When this flag
is active then named definitions must be surrounded by curly braces and
can be defined without an equal sign and the terminating semicolon.
Instead, names are treated as direct double quoted strings.
.TP
.B \fB\-g \-\-computed\-gotos\fP
Generate a scanner that utilizes GCC\(aqs
computed goto feature. That is \fBre2c\fP generates jump tables whenever a
decision is of a certain complexity (e.g. a lot of if conditions are
otherwise necessary). This is only usable with GCC and produces output
that cannot be compiled with any other compiler. Note that this implies
\fB\-b\fP and that the complexity threshold can be configured using the
inplace configuration \fBcgoto:threshold\fP\&.
.TP
.B \fB\-i \-\-no\-debug\-info\fP
Do not output \fB#line\fP information. This is
useful when you want to use a CMS tool with the \fBre2c\fP output, which you
might want if you do not require your users to have \fBre2c\fP themselves
when building from your source.
.TP
.B \fB\-o OUTPUT \-\-output=OUTPUT\fP
Specify the \fBOUTPUT\fP file.
.TP
.B \fB\-r \-\-reusable\fP
Allows reuse of scanner definitions with \fB/*!use:re2c */\fP after \fB/*!rules:re2c */\fP\&.
In this mode no \fB/*!re2c */\fP block and exactly one \fB/*!rules:re2c */\fP must be present.
The rules are being saved and used by every \fB/*!use:re2c */\fP block that follows.
These blocks can contain inplace configurations, especially \fBre2c:flags:e\fP,
\fBre2c:flags:w\fP, \fBre2c:flags:x\fP, \fBre2c:flags:u\fP and \fBre2c:flags:8\fP\&.
That way it is possible to create the same scanner multiple times for
different character types, different input mechanisms or different output mechanisms.
The \fB/*!use:re2c */\fP blocks can also contain additional rules that will be appended
to the set of rules in \fB/*!rules:re2c */\fP\&.
.TP
.B \fB\-s \-\-nested\-ifs\fP
Generate nested ifs for some switches. Many
compilers need this assist to generate better code.
.TP
.B \fB\-t HEADER \-\-type\-header=HEADER\fP
Create a \fBHEADER\fP file that
contains types for the (f)lex\-like condition support. This can only be
activated when \fB\-c\fP is in use.
.TP
.B \fB\-u \-\-unicode\fP
Generate a parser that supports UTF\-32. The generated
code can deal with any valid Unicode character up to 0x10FFFF. In this
mode \fBre2c\fP assumes that input character size is 4 bytes. This switch is
incompatible with \fB\-e\fP, \fB\-w\fP, \fB\-x\fP and \fB\-8\fP\&. This implies \fB\-s\fP\&.
.TP
.B \fB\-v \-\-version\fP
Show version information.
.TP
.B \fB\-V \-\-vernum\fP
Show the version as a number XXYYZZ.
.TP
.B \fB\-w \-\-wide\-chars\fP
Generate a parser that supports UCS\-2. The
generated code can deal with any valid Unicode character up to 0xFFFF.
In this mode \fBre2c\fP assumes that input character size is 2 bytes. This
switch is incompatible with \fB\-e\fP, \fB\-x\fP, \fB\-u\fP and \fB\-8\fP\&. This implies
\fB\-s\fP\&.
.TP
.B \fB\-x \-\-utf\-16\fP
Generate a parser that supports UTF\-16. The generated
code can deal with any valid Unicode character up to 0x10FFFF. In this
mode \fBre2c\fP assumes that input character size is 2 bytes. This switch is
incompatible with \fB\-e\fP, \fB\-w\fP, \fB\-u\fP and \fB\-8\fP\&. This implies \fB\-s\fP\&.
.TP
.B \fB\-8 \-\-utf\-8\fP
Generate a parser that supports UTF\-8. The generated
code can deal with any valid Unicode character up to 0x10FFFF. In this
mode \fBre2c\fP assumes that input character size is 1 byte. This switch is
incompatible with \fB\-e\fP, \fB\-w\fP, \fB\-x\fP and \fB\-u\fP\&.
.TP
.B \fB\-\-case\-insensitive\fP
All strings are case insensitive, so all
"\-expressions are treated in the same way \(aq\-expressions are.
.TP
.B \fB\-\-case\-inverted\fP
Invert the meaning of single and double quoted
strings. With this switch single quotes are case sensitive and double
quotes are case insensitive.
.TP
.B \fB\-\-no\-generation\-date\fP
Suppress date output in the generated file.
.TP
.B \fB\-\-no\-version\fP
Suppress version output in the generated file.
.TP
.B \fB\-\-encoding\-policy POLICY\fP
Specify how \fBre2c\fP must treat Unicode
surrogates. \fBPOLICY\fP can be one of the following: \fBfail\fP (abort with
error when surrogate encountered), \fBsubstitute\fP (silently substitute
surrogate with error code point 0xFFFD), \fBignore\fP (treat surrogates as
normal code points). By default \fBre2c\fP ignores surrogates (for backward
compatibility). Unicode standard says that standalone surrogates are
invalid code points, but different libraries and programs treat them
differently.
.TP
.B \fB\-\-input INPUT\fP
Specify re2c input API. \fBINPUT\fP can be one of the
following: \fBdefault\fP, \fBcustom\fP\&.
.TP
.B \fB\-S \-\-skeleton\fP
Instead of embedding re2c\-generated code into C/C++
source, generate a self\-contained program for the same DFA. Most useful
for correctness and performance testing.
.TP
.B \fB\-\-empty\-class POLICY\fP
What to do if user inputs empty character
class. \fBPOLICY\fP can be one of the following: \fBmatch\-empty\fP (match empty
input: pretty illogical, but this is the default for backwards
compatibility reason), \fBmatch\-none\fP (fail to match on any input),
\fBerror\fP (compilation error). Note that there are various ways to
construct empty class, e.g: [], [^\ex00\-\exFF],
[\ex00\-\exFF][\ex00\-\exFF].
.TP
.B \fB\-\-dfa\-minimization <table | moore>\fP
Internal algorithm used by re2c to minimize DFA (defaults to \fBmoore\fP).
Both table filling and Moore\(aqs algorithms should produce identical DFA (up to states relabelling).
Table filling algorithm is much simpler and slower; it serves as a reference implementation.
.TP
.B \fB\-1 \-\-single\-pass\fP
Deprecated and does nothing (single pass is by default now).
.TP
.B \fB\-W\fP
Turn on all warnings.
.TP
.B \fB\-Werror\fP
Turn warnings into errors. Note that this option alone
doesn\(aqt turn on any warnings, it only affects those warnings that have
been turned on so far or will be turned on later.
.TP
.B \fB\-W<warning>\fP
Turn on individual \fBwarning\fP\&.
.TP
.B \fB\-Wno\-<warning>\fP
Turn off individual \fBwarning\fP\&.
.TP
.B \fB\-Werror\-<warning>\fP
Turn on individual \fBwarning\fP and treat it as error (this implies \fB\-W<warning>\fP).
.TP
.B \fB\-Wno\-error\-<warning>\fP
Don\(aqt treat this particular \fBwarning\fP as error. This doesn\(aqt turn off
the warning itself.
.TP
.B \fB\-Wcondition\-order\fP
Warn if the generated program makes implicit
assumptions about condition numbering. One should use either \fB\-t, \-\-type\-header\fP option or
\fB/*!types:re2c*/\fP directive to generate mapping of condition names to numbers and use
autogenerated condition names.
.TP
.B \fB\-Wempty\-character\-class\fP
Warn if regular expression contains empty
character class. From the rational point of view trying to match empty
character class makes no sense: it should always fail. However, for
backwards compatibility reasons \fBre2c\fP allows empty character class and
treats it as empty string. Use \fB\-\-empty\-class\fP option to change default
behaviour.
.TP
.B \fB\-Wmatch\-empty\-string\fP
Warn if regular expression in a rule is
nullable (matches empty string). If DFA runs in a loop and empty match
is unintentional (input position is not advanced manually), lexer may
get stuck in an infinite loop.
.TP
.B \fB\-Wswapped\-range\fP
Warn if range lower bound is greater than upper
bound. Default \fBre2c\fP behaviour is to silently swap range bounds.
.TP
.B \fB\-Wundefined\-control\-flow\fP
Warn if some input strings cause undefined
control flow in lexer (the faulty patterns are reported). This is the
most dangerous and common mistake. It can be easily fixed by adding
default rule \fB*\fP (this rule has the lowest priority, matches any code unit and consumes
exactly one code unit).
.TP
.B \fB\-Wuseless\-escape\fP
Warn if a symbol is escaped when it shouldn\(aqt be.
By default re2c silently ignores escape, but this may as well indicate a
typo or an error in escape sequence.
.UNINDENT
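.sp
The \fB\-d\fP hook described above could be wired up along the following
lines. This is only a sketch: \fBdebug_hook\fP and the \fBdebug_*\fP
variables are hypothetical names chosen for illustration, and the only
part taken from this page is the two\-parameter shape of \fBYYDEBUG\fP\&.
.sp
.nf
```c
#include <stdio.h>

/* Hypothetical bookkeeping so the effect of the hook can be observed. */
static int debug_calls = 0;
static int debug_last_state = 0;
static char debug_last_char = 0;

/* Same shape as documented: void YYDEBUG (int state, char current). */
static void debug_hook(int state, char current)
{
    char line[64];

    debug_calls++;
    debug_last_state = state;
    debug_last_char = current;

    /* Format one trace line; puts() appends the newline. */
    snprintf(line, sizeof line, "state=%d char=0x%02x",
             state, (unsigned)(unsigned char)current);
    puts(line);
}

/* The generated code "calls" YYDEBUG like a function. */
#define YYDEBUG(state, current) debug_hook((state), (current))
```
.fi
.sp
Routing the macro through an ordinary function keeps the trace logic in
one place, so it can be replaced or silenced without regenerating the scanner.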
.SH INTERFACE CODE
.sp
The user must supply interface code either in the form of C/C++ code
(macros, functions, variables, etc.) or in the form of \fBINPLACE CONFIGURATIONS\fP\&.
Which symbols must be defined and which are optional
depends on a particular use case.
.INDENT 0.0
.TP
.B \fBYYCONDTYPE\fP
In \fB\-c\fP mode you can use \fB\-t\fP to generate a file that
contains the enumeration used as conditions. Each of the values refers
to a condition of a rule set.
.TP
.B \fBYYCTXMARKER\fP
l\-value of type \fBYYCTYPE *\fP\&.
The generated code saves trailing context backtracking information in
\fBYYCTXMARKER\fP\&. The user only needs to define this macro if a scanner
specification uses trailing context in one or more of its regular
expressions.
.TP
.B \fBYYCTYPE\fP
Type used to hold an input symbol (code unit). Usually
\fBchar\fP or \fBunsigned char\fP for ASCII, EBCDIC and UTF\-8, \fBunsigned short\fP
for UTF\-16 or UCS\-2 and \fBunsigned int\fP for UTF\-32.
.TP
.B \fBYYCURSOR\fP
l\-value of type \fBYYCTYPE *\fP that points to the current input symbol. The generated code advances
\fBYYCURSOR\fP as symbols are matched. On entry, \fBYYCURSOR\fP is assumed to
point to the first character of the current token. On exit, \fBYYCURSOR\fP
will point to the first character of the following token.
.TP
.B \fBYYDEBUG (state, current)\fP
This is only needed if the \fB\-d\fP flag was
specified. It allows one to easily debug the generated parser by calling a
user defined function for every state. The function should have the
following signature: \fBvoid YYDEBUG (int state, char current)\fP\&. The first
parameter receives the state or \-1 and the second parameter receives the
input at the current cursor.
.TP
.B \fBYYFILL (n)\fP
The generated code "calls" \fBYYFILL (n)\fP when the
buffer needs (re)filling: at least \fBn\fP additional characters should be
provided. \fBYYFILL (n)\fP should adjust \fBYYCURSOR\fP, \fBYYLIMIT\fP, \fBYYMARKER\fP
and \fBYYCTXMARKER\fP as needed. Note that for typical programming languages
\fBn\fP will be the length of the longest keyword plus one. The user can
place a comment of the form \fB/*!max:re2c*/\fP to insert \fBYYMAXFILL\fP definition that is set to the maximum
length value.
.TP
.B \fBYYGETCONDITION ()\fP
This define is used to get the condition prior to
entering the scanner code when using \fB\-c\fP switch. The value must be
initialized with a value from the enumeration \fBYYCONDTYPE\fP type.
.TP
.B \fBYYGETSTATE ()\fP
The user only needs to define this macro if the \fB\-f\fP
flag was specified. In that case, the generated code "calls"
\fBYYGETSTATE ()\fP at the very beginning of the scanner in order to obtain
the saved state. \fBYYGETSTATE ()\fP must return a signed integer. The value
must be either \-1, indicating that the scanner is entered for the first
time, or a value previously saved by \fBYYSETSTATE (s)\fP\&. In the second
case, the scanner will resume operations right after where the last
\fBYYFILL (n)\fP was called.
.TP
.B \fBYYLIMIT\fP
Expression of type \fBYYCTYPE *\fP that marks the end of the buffer (\fBYYLIMIT[\-1]\fP
is the last character in the buffer). The generated code repeatedly
compares \fBYYCURSOR\fP to \fBYYLIMIT\fP to determine when the buffer needs
(re)filling.
.TP
.B \fBYYMARKER\fP
l\-value of type \fBYYCTYPE *\fP\&.
The generated code saves backtracking information in \fBYYMARKER\fP\&. Some
simple scanners might not use this.
.TP
.B \fBYYMAXFILL\fP
This will be automatically defined by \fB/*!max:re2c*/\fP blocks as explained above.
.TP
.B \fBYYSETCONDITION (c)\fP
This define is used to set the condition in
transition rules. This is only being used when \fB\-c\fP is active and
transition rules are being used.
.TP
.B \fBYYSETSTATE (s)\fP
The user only needs to define this macro if the \fB\-f\fP
flag was specified. In that case, the generated code "calls"
\fBYYSETSTATE\fP just before calling \fBYYFILL (n)\fP\&. The parameter to
\fBYYSETSTATE\fP is a signed integer that uniquely identifies the specific
instance of \fBYYFILL (n)\fP that is about to be called. Should the user
wish to save the state of the scanner and have \fBYYFILL (n)\fP return to
the caller, all they have to do is store that unique identifier in a
variable. Later, when the scanner is called again, it will call
\fBYYGETSTATE ()\fP and resume execution right where it left off. The
generated code will contain both \fBYYSETSTATE (s)\fP and \fBYYGETSTATE\fP even
if \fBYYFILL (n)\fP is being disabled.
.UNINDENT
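.sp
To make the \fBYYFILL (n)\fP contract above more concrete, here is a minimal
sketch of a refill routine over a fixed\-size buffer. Everything named here
(\fBstruct input\fP, \fBinput_init\fP, \fBinput_fill\fP, \fBBUF_SIZE\fP and the
in\-memory source) is a hypothetical illustration rather than part of
\fBre2c\fP, and \fBYYCTXMARKER\fP is left out for brevity.
.sp
.nf
```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

typedef unsigned char YYCTYPE;

struct input {
    const YYCTYPE *src;     /* hypothetical in-memory data source */
    size_t src_len;         /* total length of the source */
    size_t src_pos;         /* how much of the source was copied so far */
    YYCTYPE buf[BUF_SIZE];  /* scanner buffer */
    YYCTYPE *cursor;        /* plays the role of YYCURSOR */
    YYCTYPE *limit;         /* plays the role of YYLIMIT */
    YYCTYPE *marker;        /* plays the role of YYMARKER */
    YYCTYPE *token;         /* start of the token being scanned */
};

static void input_init(struct input *in, const void *src, size_t len)
{
    in->src = (const YYCTYPE *)src;
    in->src_len = len;
    in->src_pos = 0;
    in->cursor = in->limit = in->marker = in->token = in->buf;
}

/* Tries to make at least n characters available at the cursor.
 * Returns 1 on success and 0 if the source ran out first. */
static int input_fill(struct input *in, size_t n)
{
    size_t shift = (size_t)(in->token - in->buf);
    size_t kept = (size_t)(in->limit - in->token);
    size_t room, avail, take;

    /* Drop everything before the current token and shift the rest down. */
    memmove(in->buf, in->token, kept);
    in->cursor -= shift;
    in->marker -= shift;
    in->limit -= shift;
    in->token = in->buf;

    /* Append as much fresh input as fits into the freed space. */
    room = BUF_SIZE - kept;
    avail = in->src_len - in->src_pos;
    take = avail < room ? avail : room;
    memcpy(in->limit, in->src + in->src_pos, take);
    in->src_pos += take;
    in->limit += take;

    return (size_t)(in->limit - in->cursor) >= n;
}
```
.fi
.sp
A scanner built on this sketch might define \fBYYCURSOR\fP as \fBin\->cursor\fP,
\fBYYLIMIT\fP as \fBin\->limit\fP, and have \fBYYFILL (n)\fP call
\fBinput_fill (in, n)\fP, returning an end\-of\-input token when it reports failure.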
.SH SYNTAX
.sp
Code for \fBre2c\fP consists of a set of \fBRULES\fP, \fBNAMED DEFINITIONS\fP and
\fBINPLACE CONFIGURATIONS\fP\&.
.SS RULES
.sp
Rules consist of a regular expression (see \fBREGULAR EXPRESSIONS\fP) along with a block of C/C++ code
that is to be executed when the associated regular expression is
matched. You can either start the code with an opening curly brace or
the sequence \fB:=\fP\&. When the code starts with a curly brace, \fBre2c\fP counts the brace depth
and stops looking for code automatically. Otherwise curly braces are not
allowed and \fBre2c\fP stops looking for code at the first line that does
not begin with whitespace. If two or more rules overlap, the first rule
is preferred.
.INDENT 0.0
.INDENT 3.5
\fBregular\-expression { C/C++ code }\fP
.sp
\fBregular\-expression := C/C++ code\fP
.UNINDENT
.UNINDENT
.sp
There is one special rule: default rule \fB*\fP
.INDENT 0.0
.INDENT 3.5
\fB* { C/C++ code }\fP
.sp
\fB* := C/C++ code\fP
.UNINDENT
.UNINDENT
.sp
Note that default rule \fB*\fP differs from \fB[^]\fP: default rule has the lowest priority,
matches any code unit (either valid or invalid) and always consumes one character;
while \fB[^]\fP matches any valid code point (not code unit) and can consume multiple
code units. In fact, when variable\-length encoding is used, \fB*\fP
is the only possible way to match invalid input character (see \fBENCODINGS\fP for details).
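.sp
As a rough illustration of the rule forms above, the following \fBre2c\fP
input combines two ordinary rules with the default rule \fB*\fP\&. It is a
sketch under stated assumptions: the function name \fBlex\fP and the return
codes are hypothetical, the input is assumed to be NUL\-terminated, and
\fBYYFILL\fP is disabled so that no buffer refilling is needed.
.sp
.nf
```c
/* Input for re2c, not plain C: re2c replaces the comment block below
 * with a DFA. The name lex and the return codes are hypothetical.
 * Return codes: 1 = run of digits, 2 = lowercase word, 0 = anything else. */
static int lex(const char *YYCURSOR)
{
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;

        digit = [0-9];

        digit+   { return 1; }
        [a-z]+   { return 2; }
        *        { return 0; }
    */
}
```
.fi
.sp
Because \fB*\fP has the lowest priority, it only fires when neither of the
other rules matches, which includes the terminating NUL in this sketch.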
.sp
If \fB\-c\fP is active then each regular expression is preceded by a list
of comma separated condition names. Besides normal naming rules there
are two special cases: \fB<*>\fP (such rules are merged to all conditions)
and \fB<>\fP (such a rule cannot have an associated regular expression,
its code is merged to all actions). Non\-empty rules may furthermore specify the new
condition. In that case \fBre2c\fP will generate the necessary code to
change the condition automatically. Rules can use \fB:=>\fP as a shortcut
to automatically generate code that not only sets the
new condition state but also continues execution with the new state. A
shortcut rule should not be used in a loop where there is code between
the start of the loop and the \fBre2c\fP block unless \fBre2c:cond:goto\fP
is changed to \fBcontinue\fP\&. If code is necessary before all rules (though not simple jumps) you
can do so by using \fB<!>\fP pseudo\-rules.
.INDENT 0.0
.INDENT 3.5
\fB<condition\-list> regular\-expression { C/C++ code }\fP
.sp
\fB<condition\-list> regular\-expression := C/C++ code\fP
.sp
\fB<condition\-list> * { C/C++ code }\fP
.sp
\fB<condition\-list> * := C/C++ code\fP
.sp
\fB<condition\-list> regular\-expression => condition { C/C++ code }\fP
.sp
\fB<condition\-list> regular\-expression => condition := C/C++ code\fP
.sp
\fB<condition\-list> * => condition { C/C++ code }\fP
.sp
\fB<condition\-list> * => condition := C/C++ code\fP
.sp
\fB<condition\-list> regular\-expression :=> condition\fP
.sp
\fB<*> regular\-expression { C/C++ code }\fP
.sp
\fB<*> regular\-expression := C/C++ code\fP
.sp
\fB<*> * { C/C++ code }\fP
.sp
\fB<*> * := C/C++ code\fP
.sp
\fB<*> regular\-expression => condition { C/C++ code }\fP
.sp
\fB<*> regular\-expression => condition := C/C++ code\fP
.sp
\fB<*> * => condition { C/C++ code }\fP
.sp
\fB<*> * => condition := C/C++ code\fP
.sp
\fB<*> regular\-expression :=> condition\fP
.sp
\fB<> { C/C++ code }\fP
.sp
\fB<> := C/C++ code\fP
.sp
\fB<> => condition { C/C++ code }\fP
.sp
\fB<> => condition := C/C++ code\fP
.sp
\fB<> :=> condition\fP
.sp
\fB<! condition\-list> { C/C++ code }\fP
.sp
\fB<! condition\-list> := C/C++ code\fP
.sp
\fB<!> { C/C++ code }\fP
.sp
\fB<!> := C/C++ code\fP
.UNINDENT
.UNINDENT
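.sp
The condition forms listed above might be combined as in the following
sketch, meant to be processed with \fBre2c \-c\fP\&. The function name
\fBcount_strings\fP, the conditions \fBinit\fP and \fBstr\fP, and the counting
logic are hypothetical; the input is assumed to be NUL\-terminated, and the
enumeration values \fByycinit\fP and \fByycstr\fP assume the default
\fBre2c:condenumprefix\fP\&.
.sp
.nf
```c
/* Input for re2c -c, not plain C. All names are illustrative. */

/*!types:re2c*/

#define YYGETCONDITION()  cond
#define YYSETCONDITION(c) cond = (c)

/* Counts double-quoted strings; returns -1 on an unterminated one. */
static int count_strings(const char *YYCURSOR)
{
    int cond = yycinit;
    int count = 0;

    for (;;) {
        /*!re2c
            re2c:define:YYCTYPE = char;
            re2c:yyfill:enable = 0;

            <init> "\ex00"        { return count; }
            <init> ["] => str    { continue; }
            <init> *             { continue; }

            <str>  "\ex00"        { return -1; }
            <str>  ["] => init   { count++; continue; }
            <str>  *             { continue; }
        */
    }
}
```
.fi
.sp
Each \fB=>\fP rule switches the condition before its code runs, so the next
pass through the loop re\-enters the scanner in the new condition.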
.SS NAMED DEFINITIONS
.sp
Named definitions are of the form:
.INDENT 0.0
.INDENT 3.5
\fBname = regular\-expression;\fP
.UNINDENT
.UNINDENT
.sp
If \fB\-F\fP is active, then named definitions are also of the form:
.INDENT 0.0
.INDENT 3.5
\fBname { regular\-expression }\fP
.UNINDENT
.UNINDENT
.SS INPLACE CONFIGURATIONS
.INDENT 0.0
.TP
.B \fBre2c:condprefix = yyc;\fP
Allows one to specify the prefix used for
condition labels. That is this text is prepended to any condition label
in the generated output file.
.TP
.B \fBre2c:condenumprefix = yyc;\fP
Allows one to specify the prefix used for
condition values. That is this text is prepended to any condition enum
value in the generated output file.
.TP
.B \fBre2c:cond:divider = "/* *********************************** */";\fP
Allows one to customize the divider for condition blocks. You can use \fB@@\fP
to put the name of the condition or customize the placeholder using
\fBre2c:cond:divider@cond\fP\&.
.TP
.B \fBre2c:cond:divider@cond = @@;\fP
Specifies the placeholder that will be
replaced with the condition name in \fBre2c:cond:divider\fP\&.
.TP
.B \fBre2c:cond:goto = "goto @@;";\fP
Allows one to customize the condition goto statements used with \fB:=>\fP style rules. You can use \fB@@\fP
to put the name of the condition or customize the placeholder using
\fBre2c:cond:goto@cond\fP\&. You can also change this to \fBcontinue;\fP, which
would allow you to continue with the next loop cycle including any code
between loop start and re2c block.
.TP
.B \fBre2c:cond:goto@cond = @@;\fP
Specifies the placeholder that will be replaced with the condition label in \fBre2c:cond:goto\fP\&.
.TP
.B \fBre2c:indent:top = 0;\fP
Specifies the minimum amount of indentation to
use. Requires a numeric value greater than or equal to zero.
.TP
.B \fBre2c:indent:string = "\et";\fP
Specifies the string to use for indentation. Requires a string that should
contain only whitespace unless you need this for external tools. The easiest
way to specify spaces is to enclose them in single or double quotes.
If you do not want any indentation at all you can simply set this to "".
.TP
.B \fBre2c:yych:conversion = 0;\fP
When this setting is non zero, then \fBre2c\fP automatically generates
conversion code whenever yych gets read. In this case the type must be
defined using \fBre2c:define:YYCTYPE\fP\&.
.TP
.B \fBre2c:yych:emit = 1;\fP
Generation of \fByych\fP can be suppressed by setting this to 0.
.TP
.B \fBre2c:yybm:hex = 0;\fP
If set to zero then a decimal table is being used else a hexadecimal table will be generated.
.TP
.B \fBre2c:yyfill:enable = 1;\fP
Set this to zero to suppress generation of \fBYYFILL (n)\fP\&. When using this be sure to verify that the generated
scanner does not read past the end of input. Allowing this behavior might
introduce severe security issues into your programs.
.TP
.B \fBre2c:yyfill:check = 1;\fP
This can be set to 0 to suppress output of the
precondition using \fBYYCURSOR\fP and \fBYYLIMIT\fP, which becomes useful when
\fBYYLIMIT + YYMAXFILL\fP is always accessible.
.TP
.B \fBre2c:define:YYFILL = "YYFILL";\fP
Substitution for \fBYYFILL\fP\&. Note
that by default \fBre2c\fP generates argument in braces and semicolon after
\fBYYFILL\fP\&. If you need to make \fBYYFILL\fP an arbitrary statement rather
than a call, set \fBre2c:define:YYFILL:naked\fP to non\-zero and use
\fBre2c:define:YYFILL@len\fP to denote formal parameter inside of \fBYYFILL\fP
body.
.TP
.B \fBre2c:define:YYFILL@len = "@@";\fP
Any occurrence of this text
inside of \fBYYFILL\fP will be replaced with the actual argument.
.TP
.B \fBre2c:yyfill:parameter = 1;\fP
Controls argument in braces after
\fBYYFILL\fP\&. If zero, argument is omitted. If non\-zero, argument is
generated unless \fBre2c:define:YYFILL:naked\fP is set to non\-zero.
.TP
.B \fBre2c:define:YYFILL:naked = 0;\fP
Controls argument in braces and
semicolon after \fBYYFILL\fP\&. If non\-zero, both argument and semicolon are
omitted. If zero, argument is generated unless
\fBre2c:yyfill:parameter\fP is set to zero and semicolon is generated
unconditionally.
.TP
.B \fBre2c:startlabel = 0;\fP
If set to a non zero integer then the start
label of the next scanner blocks will be generated even if not used by
the scanner itself. Otherwise the normal \fByy0\fP like start label is only
being generated if needed. If set to a text value then a label with that
text will be generated regardless of whether the normal start label is
being used or not. This setting is being reset to 0 after a start
label has been generated.
.TP
.B \fBre2c:labelprefix = "yy";\fP
Allows one to change the prefix of numbered
labels. The default is \fByy\fP and can be set to any string that is a valid
label.
.TP
.B \fBre2c:state:abort = 0;\fP
When not zero and switch \fB\-f\fP is active then
the \fBYYGETSTATE\fP block will contain a default case that aborts and a \-1
case is used for initialization.
.TP
.B \fBre2c:state:nextlabel = 0;\fP
Used when \fB\-f\fP is active to control
whether the \fBYYGETSTATE\fP block is followed by a \fByyNext:\fP label line.
Instead of using \fByyNext\fP you can usually also use configuration
\fBstartlabel\fP to force a specific start label or default to \fByy0\fP as
start label. Instead of using a dedicated label it is often better to
separate the \fBYYGETSTATE\fP code from the actual scanner code by placing a
\fB/*!getstate:re2c*/\fP comment.
.TP
.B \fBre2c:cgoto:threshold = 9;\fP
When \fB\-g\fP is active this value specifies
the complexity threshold that triggers generation of jump tables rather
than using nested if\(aqs and decision bitfields. The threshold is compared
against a calculated estimation of if\-s needed where every used bitmap
divides the threshold by 2.
.TP
.B \fBre2c:yych:conversion = 0;\fP
When the input uses signed characters and
\fB\-s\fP or \fB\-b\fP switches are in effect re2c allows one to automatically convert
to the unsigned character type that is then necessary for its internal
single character. When this setting is zero or an empty string the
conversion is disabled. Using a non zero number the conversion is taken
from \fBYYCTYPE\fP\&. If that is given by an inplace configuration that value
is being used. Otherwise it will be \fB(YYCTYPE)\fP and changes to that
configuration are no longer possible. When this setting is a string the
braces must be specified. Now assuming your input is a \fBchar *\fP
buffer and you are using above mentioned switches you can set
\fBYYCTYPE\fP to \fBunsigned char\fP and this setting to either 1 or \fB(unsigned char)\fP\&.
.TP
.B \fBre2c:define:YYCONDTYPE = "YYCONDTYPE";\fP
Enumeration used for condition support with \fB\-c\fP mode.
.TP
.B \fBre2c:define:YYCTXMARKER = "YYCTXMARKER";\fP
Allows one to override the
\fBYYCTXMARKER\fP define, and thus avoid defining it, by setting the value to the
actual code needed.
.TP
.B \fBre2c:define:YYCTYPE = "YYCTYPE";\fP
Allows one to override the
\fBYYCTYPE\fP define, and thus avoid defining it, by setting the value to the actual code
needed.
.TP
.B \fBre2c:define:YYCURSOR = "YYCURSOR";\fP
Allows one to override the
\fBYYCURSOR\fP define, and thus avoid defining it, by setting the value to the actual code
needed.
.TP
.B \fBre2c:define:YYDEBUG = "YYDEBUG";\fP
Allows one to override the
\fBYYDEBUG\fP define, and thus avoid defining it, by setting the value to the actual code
needed.
.TP
.B \fBre2c:define:YYGETCONDITION = "YYGETCONDITION";\fP
Substitution for
\fBYYGETCONDITION\fP\&. Note that by default \fBre2c\fP generates braces after
\fBYYGETCONDITION\fP\&. Set \fBre2c:define:YYGETCONDITION:naked\fP to non\-zero to
omit braces.
.TP
.B \fBre2c:define:YYGETCONDITION:naked = 0;\fP
Controls braces after
\fBYYGETCONDITION\fP\&. If non\-zero, braces are omitted. If zero, braces are
generated.
.TP
.B \fBre2c:define:YYSETCONDITION = "YYSETCONDITION";\fP
Substitution for
\fBYYSETCONDITION\fP\&. Note that by default \fBre2c\fP generates argument in
braces and semicolon after \fBYYSETCONDITION\fP\&. If you need to make
\fBYYSETCONDITION\fP an arbitrary statement rather than a call, set
\fBre2c:define:YYSETCONDITION:naked\fP to non\-zero and use
\fBre2c:define:YYSETCONDITION@cond\fP to denote formal parameter inside of
\fBYYSETCONDITION\fP body.
.TP
.B \fBre2c:define:YYSETCONDITION@cond = "@@";\fP
Any occurrence of this
text inside of \fBYYSETCONDITION\fP will be replaced with the actual
argument.
.TP
.B \fBre2c:define:YYSETCONDITION:naked = 0;\fP
Controls argument in braces
and semicolon after \fBYYSETCONDITION\fP\&. If non\-zero, both argument and
semicolon are omitted. If zero, both argument and semicolon are
generated.
.TP
.B \fBre2c:define:YYGETSTATE = "YYGETSTATE";\fP
Substitution for
\fBYYGETSTATE\fP\&. Note that by default \fBre2c\fP generates braces after
\fBYYGETSTATE\fP\&. Set \fBre2c:define:YYGETSTATE:naked\fP to non\-zero to omit
braces.
.TP
.B \fBre2c:define:YYGETSTATE:naked = 0;\fP
Controls braces after
\fBYYGETSTATE\fP\&. If zero, braces are generated. If non\-zero, braces are
omitted.
.TP
.B \fBre2c:define:YYSETSTATE = "YYSETSTATE";\fP
Substitution for
\fBYYSETSTATE\fP\&. Note that by default \fBre2c\fP generates an argument in braces
and a semicolon after \fBYYSETSTATE\fP\&. If you need to make \fBYYSETSTATE\fP an
arbitrary statement rather than a call, set
\fBre2c:define:YYSETSTATE:naked\fP to non\-zero and use
\fBre2c:define:YYSETSTATE@state\fP to denote the formal parameter inside the
\fBYYSETSTATE\fP body.
.TP
.B \fBre2c:define:YYSETSTATE@state = "@@";\fP
Any occurrence of this text
inside of \fBYYSETSTATE\fP will be replaced with the actual argument.
.TP
.B \fBre2c:define:YYSETSTATE:naked = 0;\fP
Controls the argument in braces and the
semicolon after \fBYYSETSTATE\fP\&. If zero, both the argument and the semicolon are
generated. If non\-zero, both the argument and the semicolon are omitted.
.TP
.B \fBre2c:define:YYLIMIT = "YYLIMIT";\fP
Replaces the define
\fBYYLIMIT\fP with the given text. Setting the value to the actual code
needed makes the define itself unnecessary.
.TP
.B \fBre2c:define:YYMARKER = "YYMARKER";\fP
Replaces the define
\fBYYMARKER\fP with the given text. Setting the value to the actual code
needed makes the define itself unnecessary.
.TP
.B \fBre2c:label:yyFillLabel = "yyFillLabel";\fP
Allows one to overwrite the name of the label \fByyFillLabel\fP\&.
.TP
.B \fBre2c:label:yyNext = "yyNext";\fP
Allows one to overwrite the name of the label \fByyNext\fP\&.
.TP
.B \fBre2c:variable:yyaccept = "yyaccept";\fP
Allows one to overwrite the name of the variable \fByyaccept\fP\&.
.TP
.B \fBre2c:variable:yybm = "yybm";\fP
Allows one to overwrite the name of the variable \fByybm\fP\&.
.TP
.B \fBre2c:variable:yych = "yych";\fP
Allows one to overwrite the name of the variable \fByych\fP\&.
.TP
.B \fBre2c:variable:yyctable = "yyctable";\fP
When both \fB\-c\fP and \fB\-g\fP are active then \fBre2c\fP uses this variable to generate a static jump table
for \fBYYGETCONDITION\fP\&.
.TP
.B \fBre2c:variable:yystable = "yystable";\fP
Deprecated.
.TP
.B \fBre2c:variable:yytarget = "yytarget";\fP
Allows one to overwrite the name of the variable \fByytarget\fP\&.
.UNINDENT
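.sp
As an illustration of how the \fBnaked\fP and \fB@cond\fP settings above are meant to combine, here is a minimal configuration sketch.
The variable name \fBcond\fP is an assumption made for this example and is not defined by \fBre2c\fP\&.
.sp
.nf
.ft C
```c
/*!re2c
    re2c:define:YYGETCONDITION       = "cond";
    re2c:define:YYGETCONDITION:naked = 1;
    re2c:define:YYSETCONDITION       = "cond = @@;";
    re2c:define:YYSETCONDITION:naked = 1;
*/
```
.ft P
.fi
.sp
With settings like these, the generated code would be expected to read the condition as the plain expression \fBcond\fP and to set it with an assignment of the form \fBcond = <condition>;\fP, without the braces, argument and semicolon that are generated by default.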
.SS REGULAR EXPRESSIONS
.INDENT 0.0
.TP
.B \fB"foo"\fP
literal string \fB"foo"\fP\&. ANSI\-C escape sequences can be used.
.TP
.B \fB\(aqfoo\(aq\fP
literal string \fB"foo"\fP (characters [a\-zA\-Z] are treated
case\-insensitively). ANSI\-C escape sequences can be used.
.TP
.B \fB[xyz]\fP
character class; in this case, regular expression matches either \fBx\fP, \fBy\fP, or \fBz\fP\&.
.TP
.B \fB[abj\-oZ]\fP
character class with a range in it; matches \fBa\fP, \fBb\fP, any letter from \fBj\fP through \fBo\fP or \fBZ\fP\&.
.TP
.B \fB[^class]\fP
inverted character class.
.TP
.B \fBr \e s\fP
match any \fBr\fP which isn\(aqt \fBs\fP\&. \fBr\fP and \fBs\fP must be regular expressions
which can be expressed as character classes.
.TP
.B \fBr*\fP
zero or more occurrences of \fBr\fP\&.
.TP
.B \fBr+\fP
one or more occurrences of \fBr\fP\&.
.TP
.B \fBr?\fP
optional \fBr\fP\&.
.TP
.B \fB(r)\fP
\fBr\fP; parentheses are used to override precedence.
.TP
.B \fBr s\fP
\fBr\fP followed by \fBs\fP (concatenation).
.TP
.B \fBr | s\fP
either \fBr\fP or \fBs\fP (alternative).
.TP
.B \fBr / s\fP
\fBr\fP but only if it is followed by \fBs\fP\&. Note that \fBs\fP is not
part of the matched text. This type of regular expression is called
"trailing context". Trailing context can only appear at the end of a rule
and cannot be part of a named definition.
.TP
.B \fBr{n}\fP
matches \fBr\fP exactly \fBn\fP times.
.TP
.B \fBr{n,}\fP
matches \fBr\fP at least \fBn\fP times.
.TP
.B \fBr{n,m}\fP
matches \fBr\fP at least \fBn\fP times, but not more than \fBm\fP times.
.TP
.B \fB\&.\fP
match any character except newline.
.TP
.B \fBname\fP
matches the named definition specified by \fBname\fP, but only if \fB\-F\fP is
off. If \fB\-F\fP is active, this behaves as if it were enclosed in double
quotes and matches the string "name".
.UNINDENT
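.sp
To make the meaning of trailing context concrete, the following hand\-written C sketch mimics what a rule such as \fB[0\-9]+ / "px"\fP is meant to do.
It is not \fBre2c\fP output, and the function name \fBmatch_digits_before_px\fP is hypothetical.
.sp
.nf
.ft C
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, written by hand for illustration (not re2c output).
 * Mimics the rule  [0-9]+ / "px" : returns the length of the leading
 * run of digits if it is immediately followed by "px", otherwise 0.
 * The "px" suffix is checked but is not counted in the match. */
static size_t match_digits_before_px(const char *s)
{
    size_t n = 0;

    while (s[n] >= '0' && s[n] <= '9')
        ++n;                      /* r: one or more digits */
    if (n == 0)
        return 0;                 /* r itself did not match */
    if (strncmp(s + n, "px", 2) != 0)
        return 0;                 /* trailing context s is missing */
    return n;                     /* matched text excludes "px" */
}
```
.ft P
.fi
.sp
For the input \fB12px\fP the sketch reports a match of length 2, so the next match attempt would start at \fBpx\fP\&.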
.sp
Character classes and string literals may contain octal or hexadecimal
character definitions and the following set of escape sequences:
\fB\ea\fP, \fB\eb\fP, \fB\ef\fP, \fB\en\fP, \fB\er\fP, \fB\et\fP, \fB\ev\fP, \fB\e\e\fP\&. An octal character is defined by a backslash
followed by its three octal digits (e.g. \fB\e377\fP).
Hexadecimal characters from 0 to 0xFF are defined by a backslash, a
lowercase \fBx\fP and two hexadecimal digits (e.g. \fB\ex12\fP). Hexadecimal characters from 0x100 to 0xFFFF are defined by a backslash, a lowercase
\fBu\fP or an uppercase \fBX\fP and four hexadecimal digits (e.g. \fB\eu1234\fP).
Hexadecimal characters from 0x10000 to 0xFFFFFFFF are defined by a backslash, an uppercase \fBU\fP
and eight hexadecimal digits (e.g. \fB\eU12345678\fP).
.sp
The only portable "any" rule is the default rule \fB*\fP\&.
.SH SCANNER WITH STORABLE STATES
.sp
When the \fB\-f\fP flag is specified, \fBre2c\fP generates a scanner that can
store its current state, return to the caller, and later resume
operations exactly where it left off.
.sp
The default operation of \fBre2c\fP is a
"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
of the parsing loop, and that may not always be convenient.
.sp
Typically, if there is a preprocessor ahead of the scanner in the
stream, or for that matter any other procedural source of data, the
scanner cannot "ask" for more data unless scanner and source
live in separate threads.
.sp
The \fB\-f\fP flag is useful for just this situation: it lets users design
scanners that work in a "push" model, i.e. where data is fed to the
scanner chunk by chunk. When the scanner runs out of data to consume, it
just stores its state and returns to the caller. When more input data is
fed to the scanner, it resumes operations exactly where it left off.
.sp
Changes needed compared to the "pull" model:
.INDENT 0.0
.IP \(bu 2
The user has to supply the macros \fBYYSETSTATE (state)\fP and \fBYYGETSTATE ()\fP\&.
.IP \(bu 2
The \fB\-f\fP option inhibits declaration of \fByych\fP and \fByyaccept\fP, so the
user has to declare them and also save and restore them.
In the example \fBexamples/push_model/push.re\fP these are declared as
fields of the (C++) class of which the scanner is a method, so they do
not need to be saved/restored explicitly. For C they could e.g. be made
macros that select fields from a structure passed in as a parameter.
Alternatively, they could be declared as local variables, saved with
\fBYYFILL (n)\fP when it decides to return and restored at entry to the
function. Also, it could be more efficient to save the state from
\fBYYFILL (n)\fP because \fBYYSETSTATE (state)\fP is called unconditionally.
\fBYYFILL (n)\fP, however, does not get \fBstate\fP as a parameter, so we would have
to store the state in a local variable in \fBYYSETSTATE (state)\fP\&.
.IP \(bu 2
Modify \fBYYFILL (n)\fP to return (from the function calling it) if more input is needed.
.IP \(bu 2
Modify caller to recognise if more input is needed and respond appropriately.
.IP \(bu 2
The generated code will contain a switch block that is used to
restore the last state by jumping to just after the corresponding \fBYYFILL (n)\fP
call. This code is automatically generated in the epilog of the first \fB/*!re2c */\fP
block. It is possible to trigger generation of the \fBYYGETSTATE ()\fP
block earlier by placing a \fB/*!getstate:re2c*/\fP comment. This is especially useful when the scanner code should be
wrapped inside a loop.
.UNINDENT
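.sp
The idea behind these changes can be sketched in plain C.
The following is a hand\-written illustration of a resumable scanner, not \fBre2c\fP output: it counts runs of digits in input that is fed chunk by chunk, and a run may span two chunks.
The names \fBpush_scanner\fP, \fBpush_init\fP, \fBpush_feed\fP and \fBpush_finish\fP are hypothetical, and the state numbers are chosen by hand.
.sp
.nf
.ft C
```c
#include <stddef.h>

/* Hand-written illustration of the "push" model; this is NOT re2c output.
 * All names except YYGETSTATE and YYSETSTATE are hypothetical. */
typedef struct {
    int state;     /* stored scanner state: -1 start, 0 idle, 1 in digits */
    int numbers;   /* complete runs of digits seen so far */
} push_scanner;

/* The two macros that the user supplies in -f mode. */
#define YYGETSTATE()   (s->state)
#define YYSETSTATE(x)  (s->state = (x))

static void push_init(push_scanner *s)
{
    s->state = -1;
    s->numbers = 0;
}

/* Consumes one chunk and returns when it is exhausted, much like a
 * YYFILL (n) that returns to the caller instead of reading more input. */
static void push_feed(push_scanner *s, const char *buf, size_t len)
{
    size_t i = 0;

    /* Dispatch on the stored state and jump to the point of suspension. */
    switch (YYGETSTATE()) {
    case 1:  goto resume_digits;
    default: break;               /* -1 or 0: not inside a number */
    }

    for (;;) {
        YYSETSTATE(0);
        if (i == len)
            return;               /* out of input between numbers */
        if (buf[i] < '0' || buf[i] > '9') {
            ++i;                  /* skip a non-digit */
            continue;
        }
        for (;;) {
            ++i;                  /* consume one digit */
            YYSETSTATE(1);
resume_digits:
            if (i == len)
                return;           /* suspended in the middle of a number */
            if (buf[i] < '0' || buf[i] > '9')
                break;
        }
        s->numbers++;             /* a complete run of digits has ended */
    }
}

/* Signals end of input; a pending run of digits is counted as complete. */
static int push_finish(push_scanner *s)
{
    if (YYGETSTATE() == 1) {
        s->numbers++;
        YYSETSTATE(-1);
    }
    return s->numbers;
}
```
.ft P
.fi
.sp
Feeding the chunks \fB12\fP and \fB3 4\fP in turn and then calling \fBpush_finish\fP yields two numbers (123 and 4), because the second call resumes inside the run of digits that the first call left unfinished.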
.sp
Please see \fBexamples/push_model/push.re\fP for an example of a "push" model scanner. The
generated code can be tweaked using the inplace configurations \fBstate:abort\fP
and \fBstate:nextlabel\fP\&.
.SH SCANNER WITH CONDITION SUPPORT
.sp
You can precede regular expressions with a list of condition names when
using the \fB\-c\fP switch. In this case \fBre2c\fP generates a scanner block for
each condition, and each of the generated blocks has its own
precondition. The precondition is given by the interface define
\fBYYGETCONDITION()\fP and must be of type \fBYYCONDTYPE\fP\&.
.sp
There are two special rule types. First, the rules of the condition \fB<*>\fP
are merged into all conditions (note that they have lower priority than
other rules of that condition). Second, the empty condition list
allows one to provide a code block that does not have a scanner part,
meaning that it does not allow any regular expression. The condition value
referring to this special block is always the one with the enumeration
value 0. This way the code of this special rule can be used to
initialize a scanner. These rules are in no way required, but
sometimes it is helpful to have a dedicated uninitialized condition
state.
.sp
Non\-empty rules allow one to specify the new condition, which makes them
transition rules. Besides generating calls for the define
\fBYYSETCONDITION\fP, no other special code is generated.
.sp
Another kind of special rule allows one to prepend code to
the code blocks of all rules of a certain set of conditions or to the code
blocks of all rules. This can be helpful when some operation is common
among rules. For instance, this can be used to store the length of the
scanned string. These special setup rules start with an exclamation mark
followed by either a list of conditions \fB<! condition, ... >\fP or a star
\fB<!*>\fP\&. When \fBre2c\fP generates the code for a rule whose condition does not have a
setup rule and a starred setup rule is present, then that code will be
used as setup code.
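.sp
The control flow described in this section can be pictured with a hand\-written C sketch; it is not \fBre2c\fP output.
The enumerator names, the \fBcond_scanner\fP structure and the function \fBcond_scan\fP are assumptions made for illustration.
The sketch counts characters that lie outside \fB{ ... }\fP comments, with one block per condition, an initialization block for the condition with value 0, and a counter update standing in for setup code.
.sp
.nf
.ft C
```c
#include <stddef.h>

/* Hand-written sketch; NOT re2c output. Enumerator names are hypothetical. */
typedef enum { yycinit = 0, yycnormal, yyccomment } YYCONDTYPE;

typedef struct {
    YYCONDTYPE cond;   /* current condition */
    size_t kept;       /* characters seen outside { ... } comments */
    size_t rules;      /* number of rules that fired */
} cond_scanner;

#define YYGETCONDITION()   (s->cond)
#define YYSETCONDITION(c)  (s->cond = (c))

static void cond_scan(cond_scanner *s, const char *p)
{
    for (;;) {
        switch (YYGETCONDITION()) {      /* precondition of each block */
        case yycinit:                    /* block without a scanner part */
            s->kept = 0;
            s->rules = 0;
            YYSETCONDITION(yycnormal);
            continue;
        case yycnormal:
            if (*p == 0)
                return;
            s->rules++;                  /* stands in for setup code */
            if (*p == '{') {
                YYSETCONDITION(yyccomment);   /* transition rule */
            } else {
                s->kept++;
            }
            ++p;
            continue;
        case yyccomment:
            if (*p == 0)
                return;
            s->rules++;                  /* stands in for setup code */
            if (*p == '}') {
                YYSETCONDITION(yycnormal);    /* transition rule */
            }
            ++p;
            continue;
        }
    }
}
```
.ft P
.fi
.sp
Starting in \fByycinit\fP and scanning \fBab{xyz}c\fP keeps three characters and ends in \fByycnormal\fP\&.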
.SH ENCODINGS
.sp
\fBre2c\fP supports the following encodings: ASCII (default), EBCDIC (\fB\-e\fP),
UCS\-2 (\fB\-w\fP), UTF\-16 (\fB\-x\fP), UTF\-32 (\fB\-u\fP) and UTF\-8 (\fB\-8\fP).
See also inplace configuration \fBre2c:flags\fP\&.
.sp
The following concepts should be clarified when talking about encodings.
A code point is an abstract number that represents a single
symbol of the encoding. A code unit is the smallest unit of memory used in the
encoded text (it corresponds to one character in the input stream). One
or more code units can be needed to represent a single code point,
depending on the encoding. In a fixed\-length encoding, each code point
is represented with an equal number of code units. In a variable\-length
encoding, different code points can be represented with different numbers
of code units.
.INDENT 0.0
.TP
.B ASCII
is a fixed\-length encoding. Its code space includes 0x100
code points, from 0 to 0xFF. One code point is represented with exactly one
1\-byte code unit, which has the same value as the code point. Size of
\fBYYCTYPE\fP must be 1 byte.
.TP
.B EBCDIC
is a fixed\-length encoding. Its code space includes 0x100
code points, from 0 to 0xFF. One code point is represented with exactly
one 1\-byte code unit, which has the same value as the code point. Size
of \fBYYCTYPE\fP must be 1 byte.
.TP
.B UCS\-2
is a fixed\-length encoding. Its code space includes 0x10000
code points, from 0 to 0xFFFF. One code point is represented with
exactly one 2\-byte code unit, which has the same value as the code
point. Size of \fBYYCTYPE\fP must be 2 bytes.
.TP
.B UTF\-16
is a variable\-length encoding. Its code space includes all
Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
code point is represented with one or two 2\-byte code units. Size of
\fBYYCTYPE\fP must be 2 bytes.
.TP
.B UTF\-32
is a fixed\-length encoding. Its code space includes all
Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
code point is represented with exactly one 4\-byte code unit. Size of
\fBYYCTYPE\fP must be 4 bytes.
.TP
.B UTF\-8
is a variable\-length encoding. Its code space includes all
Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
code point is represented with sequence of one, two, three or four
1\-byte code units. Size of \fBYYCTYPE\fP must be 1 byte.
.UNINDENT
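.sp
The difference between fixed\-length and variable\-length encodings can be sketched with a few small C helpers that compute how many code units one code point needs.
The function names are hypothetical and are not part of \fBre2c\fP\&.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Hypothetical helpers: number of code units needed for one code point.
 * Each returns 0 for values that are not Unicode code points in the
 * sense used above (surrogates 0xD800-0xDFFF, or above 0x10FFFF). */
static int is_unicode_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

static int utf8_units(uint32_t cp)    /* 1-byte code units */
{
    if (!is_unicode_code_point(cp))
        return 0;
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

static int utf16_units(uint32_t cp)   /* 2-byte code units */
{
    if (!is_unicode_code_point(cp))
        return 0;
    return cp < 0x10000 ? 1 : 2;
}

static int utf32_units(uint32_t cp)   /* 4-byte code units */
{
    return is_unicode_code_point(cp) ? 1 : 0;
}
```
.ft P
.fi
.sp
For example, the code point 0x20AC needs three code units in UTF\-8 but one in UTF\-16, while 0x1F600 needs four in UTF\-8, two in UTF\-16 and one in UTF\-32.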
.sp
In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
valid Unicode code points. Any encoded sequence of code units that
would map to Unicode code points in the range 0xD800\-0xDFFF is
ill\-formed. The user can control how \fBre2c\fP treats such ill\-formed
sequences with the \fB\-\-encoding\-policy <policy>\fP flag (see \fBOPTIONS\fP
for a full explanation).
.sp
For some encodings, there are code units that never occur in a valid
encoded stream (e.g. the 0xFF byte in UTF\-8). If the generated scanner must
check for invalid input, the only reliable way to do so is to use the default
rule \fB*\fP\&. Note that the full range rule \fB[^]\fP won\(aqt catch invalid code units when a variable\-length encoding is used
(\fB[^]\fP means "all valid code points", while the default rule \fB*\fP means "all possible code units").
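.sp
As a small illustration of such code units, the following hand\-written C helper reports the byte values that cannot occur anywhere in well\-formed UTF\-8.
This is the kind of input that only the default rule \fB*\fP would be expected to catch; the function name is hypothetical.
.sp
.nf
.ft C
```c
/* Hypothetical helper: nonzero for byte values that never occur in
 * well-formed UTF-8, namely 0xC0, 0xC1 and 0xF5 through 0xFF. */
static int utf8_byte_never_valid(unsigned char b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}
```
.ft P
.fi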
.SH GENERIC INPUT API
.sp
\fBre2c\fP usually operates on input using pointer\-like primitives
\fBYYCURSOR\fP, \fBYYMARKER\fP, \fBYYCTXMARKER\fP and \fBYYLIMIT\fP\&.
.sp
The generic input API (enabled with the \fB\-\-input custom\fP switch) allows one to
customize input operations. In this mode, \fBre2c\fP will express all
operations on input in terms of the following primitives:
.INDENT 0.0
.INDENT 3.5
.TS
center;
|l|l|.
_
T{
\fBYYPEEK ()\fP
T}	T{
get current input character
T}
_
T{
\fBYYSKIP ()\fP
T}	T{
advance to the next character
T}
_
T{
\fBYYBACKUP ()\fP
T}	T{
backup current input position
T}
_
T{
\fBYYBACKUPCTX ()\fP
T}	T{
backup current input position for trailing context
T}
_
T{
\fBYYRESTORE ()\fP
T}	T{
restore current input position
T}
_
T{
\fBYYRESTORECTX ()\fP
T}	T{
restore current input position for trailing context
T}
_
T{
\fBYYLESSTHAN (n)\fP
T}	T{
check if less than \fBn\fP input characters are left
T}
_
.TE
.UNINDENT
.UNINDENT
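.sp
One possible way to define these primitives over a bounded buffer is sketched below.
The \fBinput_t\fP structure, the variable name \fBin\fP and the function \fBscan_number\fP are assumptions of this sketch; \fBscan_number\fP is a hand\-written stand\-in for generated code and only shows how the primitives would be used.
.sp
.nf
.ft C
```c
#include <stddef.h>

/* Hypothetical input state: a buffer with explicit bounds. */
typedef struct {
    const char *cursor;
    const char *marker;
    const char *ctxmarker;
    const char *limit;
} input_t;

/* One possible definition of the primitives in terms of input_t.
 * The variable name "in" is an assumption of this sketch. */
#define YYPEEK()        (*in->cursor)
#define YYSKIP()        (++in->cursor)
#define YYBACKUP()      (in->marker = in->cursor)
#define YYBACKUPCTX()   (in->ctxmarker = in->cursor)
#define YYRESTORE()     (in->cursor = in->marker)
#define YYRESTORECTX()  (in->cursor = in->ctxmarker)
#define YYLESSTHAN(n)   ((size_t)(in->limit - in->cursor) < (size_t)(n))

/* Hand-written stand-in for generated code (NOT re2c output): matches
 * [0-9]+ optionally followed by "." [0-9]+ and returns the match length. */
static size_t scan_number(input_t *in)
{
    const char *start = in->cursor;
    const char *frac;

    while (!YYLESSTHAN(1) && YYPEEK() >= '0' && YYPEEK() <= '9')
        YYSKIP();
    if (in->cursor == start)
        return 0;                    /* no digits at all */
    YYBACKUP();                      /* remember end of the integer part */
    if (!YYLESSTHAN(1) && YYPEEK() == '.') {
        YYSKIP();
        frac = in->cursor;
        while (!YYLESSTHAN(1) && YYPEEK() >= '0' && YYPEEK() <= '9')
            YYSKIP();
        if (in->cursor == frac)
            YYRESTORE();             /* no digits after the dot: roll back */
    }
    return (size_t)(in->cursor - start);
}
```
.ft P
.fi
.sp
On the input \fB12.5x\fP the sketch consumes four characters; on \fB12.x\fP it backs up to the saved position and consumes two.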
.sp
A couple of useful links that provide some examples:
.INDENT 0.0
.IP 1. 3
\fI\%http://skvadrik.github.io/aleph_null/posts/re2c/2015\-01\-13\-input_model.html\fP
.IP 2. 3
\fI\%http://skvadrik.github.io/aleph_null/posts/re2c/2015\-01\-15\-input_model_custom.html\fP
.UNINDENT
.SH SEE ALSO
.sp
You can find more information about \fBre2c\fP on the website: \fI\%http://re2c.org\fP\&.
See also: flex(1), lex(1), quex (\fI\%http://quex.sourceforge.net\fP).
.SH AUTHORS
.sp
Peter Bumbulis   \fI\%peter@csg.uwaterloo.ca\fP
.sp
Brian Young      \fI\%bayoung@acm.org\fP
.sp
Dan Nuffer       \fI\%nuffer@users.sourceforge.net\fP
.sp
Marcus Boerger   \fI\%helly@users.sourceforge.net\fP
.sp
Hartmut Kaiser   \fI\%hkaiser@users.sourceforge.net\fP
.sp
Emmanuel Mogenet \fI\%mgix@mgix.com\fP
.sp
Ulya Trofimovich \fI\%skvadrik@gmail.com\fP
.SH VERSION INFORMATION
.sp
This manpage describes \fBre2c\fP version 0.16, package date 21 Jan 2016.
.\" Generated by docutils manpage writer.
.