DELETEMACRO(file)
NOUSERMACRO(LALR lex yylex valid swap setDebug YYText specification input
file tag)
includefile(../../release.yo)
htmlstyle(body)(color: #27408B; background: #FFFAF0)
whenhtml(mailto(Frank B. Brokken: f.b.brokken@rug.nl))
DEFINEMACRO(lsoption)(3)(\
bf(--ARG1)=tt(ARG3) (bf(-ARG2))\
)
DEFINEMACRO(laoption)(2)(\
bf(--ARG1)=tt(ARG2)\
)
DEFINEMACRO(loption)(1)(\
bf(--ARG1)\
)
DEFINEMACRO(soption)(1)(\
bf(-ARG1)\
)
DEFINEMACRO(itx)(0)()
DEFINEMACRO(itemlist)(1)(ARG1)
DEFINEMACRO(bic)(0)(bf(bisonc++))
DEFINEMACRO(b)(0)(bf(bisonc++))
DEFINEMACRO(Bic)(0)(bf(Bisonc++))
DEFINEMACRO(Cpp)(0)(bf(C++))
DEFINEMACRO(prot)(0)(tt((prot)))
DEFINEMACRO(itt)(1)(it() tt(ARG1))
DELETEMACRO(tt)
DEFINEMACRO(tt)(1)(em(ARG1))
COMMENT( man-request, section, date, distribution file, general name)
manpage(bisonc++input)(7)(_CurYrs_)(bisonc++._CurVers_)
(bisonc++ grammar file organization)
COMMENT( man-request, larger title )
manpagename(bisonc++input)(Organization of bisonc++'s grammar file(s))
manpagesection(DESCRIPTION)
Bic() derives from bf(bison++)(1), originally derived from
bf(bison)(1). Like these programs, bic() generates a parser for an LALR(1)
grammar. Bic() generates bf(C++) code: an expandable bf(C++) class.
Refer to bf(bisonc++)(1) for a general overview. This manual page covers the
structure and organization of bic()'s grammar file(s).
Bic()'s grammar file has the following generic outline:
verb(
directives (see the next section)
%%
grammar rules
)
Grammar rules have the following generic form:
verb(
nonterminal:
production-rules
;
)
Production rules consist of zero or more sequences of terminal tokens,
nonterminal tokens and/or action blocks. When multiple production rules are
used they must be separated from each other by vertical bars. Action blocks
are bf(C++) compound statements.
This manual page contains the following sections:
itemization(
it() bf(DESCRIPTION): this section;
it() bf(DIRECTIVES): bic()'s grammar-specification directives;
it() bf(POLYMORPHIC SEMANTIC VALUES): how to use polymorphic semantic
values in parsers generated by bic();
it() bf(DOLLAR NOTATIONS): available $-shorthand notations with single,
union, and polymorphic semantic value types;
it() bf(RESTRICTIONS ON TOKEN NAMES): name restrictions for user-defined
symbols;
it() bf(OBSOLETE SYMBOLS): symbols available to bf(bison)(1), but not
to bic();
it() bf(USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS):
how to refer to tokens defined in the grammar;
it() bf(EXAMPLE): an example of using bic();
it() bf(SEE ALSO): references to other programs and documentation;
it() bf(AUTHOR): at the end of this man-page.
)
manpagesection(UNDERSCORES)
includefile(../manual/underscores.yo)
manpagesection(DIRECTIVES)
Quite a few directives can be specified in the initial section of the
grammar specification file. If command-line options for directives are
available, then their specifications take precedence over the corresponding
directives in the grammar file. Once class header or implementation header
files exist, directives affecting those files are ignored.
Directives accepting a `filename' do not accept path names, i.e., they
cannot contain directory separators (tt(/)); directives accepting a `pathname'
may contain directory separators. A `pathname' containing blank characters should
be surrounded by double quotes.
Some directives may generate errors. This happens when their specifications
conflict with the contents of files bic() cannot modify (e.g., a parser class
header file exists but doesn't define a namespace, while in a later run a
tt(%namespace) directive was provided).
To resolve such errors the offending directive could be omitted, the existing
file could be removed, or the existing file could be hand-edited according to
the directive's specification.
itemization(
it() bf(%baseclass-header) tt(filename)
tt(Filename) defines the name of the file to contain the parser's
base class. This class defines, e.g., the parser's symbolic
tokens. Defaults to the name of the parser class plus the suffix
tt(base.h). This directive is overruled by the
bf(--baseclass-header) (bf(-b)) command-line option.
It is an error if this directive is used and an already
existing parser class header file does not contain tt(#include
"filename").
it() bf(%baseclass-preinclude) tt(pathname)
tt(Pathname) defines the path to the file preincluded by the
parser's base-class header. See the description of the
tt(--baseclass-preinclude) option for details about this
directive. By default, bic() surrounds tt(pathname) by double
quotes. However, when tt(pathname) itself is surrounded by pointed
brackets, tt(#include <pathname>) is included.
it() bf(%class-header) tt(filename)
tt(Filename) defines the name of the file to contain the parser
class. Defaults to the name of the parser class plus the suffix
tt(.h). This directive is overruled by the bf(--class-header)
(bf(-c)) command-line option.
It is an error if this directive is used and an already
existing implementation header file does not contain tt(#include
"filename").
it() bf(%class-name) tt(parser-class-name)
Declares the name of the parser class. It defines the name of the
bf(C++) class that is generated. If no tt(%class-name) is
specified the default class name tt(Parser) is used.
It is an error if this directive is used and an already
existing parser-class header file does not define tt(class
`className') and/or if an already existing implementation header
file does not define members of the class tt(`className').
it() bf(%debug)
Add debugging code to the generated tt(parse) and its support
functions, which can show (on the standard output stream) the
steps performed by the parsing function while it parses input
streams. When this directive is specified then the parsing steps
are shown by default. The tt(setDebug) members can be used to
suppress outputting these parsing steps. tt(#ifdef DEBUG) macros
are not used. Existing debugging code can be removed by rerunning
bic() without specifying the tt(debug) option or directive.
it() bf(%default-actions) tt(off|quiet|warn|std)
By default, bic() adds a tt($$ = $1) action block to rules not having final
action blocks, but not to empty production rules. This default behavior can
also explicitly be configured using the tt(default-actions std) option or
directive.
Bic() also supports alternate ways of handling rules not having final action
blocks. When tt(off) is specified, bic() does not add tt($$ = $1) action
blocks. When polymorphic semantic values are used, then specifying
- tt(warn) adds specialized action blocks, using the semantic types of the
first elements of the production rules, while issuing a warning;
- tt(quiet) adds these action blocks without issuing warnings.
When either tt(warn) or tt(quiet) is specified the types of tt($$) and tt($1) must
match. When bic() detects a type mismatch it issues an error.
it() bf(%error-verbose)
This directive can be specified to dump the parser's state stack to
the standard output stream when the parser encounters a syntactic
error. The stack dump shows on separate lines a stack index
followed by the state stored at the indicated stack element. The
first stack element is the stack's top element.
it() bf(%expect) tt(number)
This directive specifies the exact number of shift/reduce and
reduce/reduce conflicts for which no warnings are to be
generated. Details of the conflicts are reported in the verbose
output file (e.g., tt(grammar.output)). If the number of actually
encountered conflicts deviates from `tt(number)', then this
directive is ignored.
it() bf(%filenames) tt(filename)
tt(Filename) is a generic filename that is used for all header
files generated by bic(). Options defining specific filenames are
also available (which then, in turn, overrule the name specified
by this directive). This directive is overruled by the
bf(--filenames) (bf(-f)) command-line option.
it() bf(%flex)
When provided, the scanner member returning the matched text is
called as tt(d_scanner.YYText()), and the scanner member returning
the next lexical token is called as tt(d_scanner.yylex()). This
directive is only interpreted if the tt(%scanner) directive is
also provided.
it() bf(%implementation-header) tt(filename)
tt(Filename) defines the name of the file to contain the
implementation header. It defaults to the name of the generated
parser class plus the suffix tt(.ih).
The implementation header should contain all directives and
declarations that are em(only) used by the parser's member
functions. It is the only header file that is included by the
source file containing tt(parse)'s implementation. User-defined
implementations of other class members may use the same convention,
thus concentrating all directives and declarations that are
required for the compilation of other source files belonging to
the parser class in one header file.
it() bf(%include) tt(pathname)
This directive is used to switch to tt(pathname) while processing a
grammar specification. Unless tt(pathname) defines an absolute
file-path, tt(pathname) is searched relative to the location of
bic()'s main grammar specification file (i.e., the grammar file
that was specified as bic()'s command-line option). This directive
can be used to split long grammar specification files into shorter,
meaningful units. After processing tt(pathname), processing
continues beyond the tt(%include pathname) directive.
it() bf(%left) tt(terminal ...)
Defines the names of symbolic terminal tokens that must be treated
as left-associative. I.e., in case of a shift/reduce conflict, a
reduction is preferred over a shift. Sequences of tt(%left,
%nonassoc, %right) and tt(%token) directives may be used to define
the precedence of operators. In expressions, the first used
directive defines the tokens having the lowest precedence, the
last used defines the tokens having the highest priority. See also
tt(%token) below.
it() bf(%locationstruct) tt(struct-definition)
Defines the organization of the location-struct data type
tt(LTYPE_). This struct should be specified analogously to the
way the parser's stacktype is defined using tt(%union) (see
below). The location struct is named tt(LTYPE_). By default (if
neither tt(locationstruct) nor tt(LTYPE_) is specified) the
standard location struct (see the next directive) is used.
it() bf(%lsp-needed)
This directive results in bic() generating a parser using
the standard location stack. This stack's default type is:
verb(
struct LTYPE_
{
int timestamp;
int first_line;
int first_column;
int last_line;
int last_column;
char *text;
};
)
Bic() does em(not) provide the elements of the tt(LTYPE_) struct
with values. Action blocks of production rules may refer to the
location stack element associated with a production element using
tt(@) variables, like tt(@1.timestamp, @3.text, @5). The rule's
location struct itself may be referred to as either tt(d_loc_) or
tt(@@).
it() bf(%ltype) tt(typename)
Specifies a user-defined token location type. If tt(%ltype) is
used, tt(typename) should be the name of an alternate (predefined)
type (e.g., tt(size_t)). It should not be used if a
tt(%locationstruct) specification is defined (see above). Within
the parser class, this type is available as the type
`tt(LTYPE_)'. All text on the line following tt(%ltype) is used
for the tt(typename) specification. It should therefore not
contain comments or any other characters that are not part of the
actual type definition.
it() bf(%namespace) tt(namespace)
Define all of the code generated by bic() in the namespace
tt(namespace). By default no namespace is defined. If this
directive is used the implementation header is provided with a
commented out tt(using namespace) declaration for the specified
namespace. In addition, the parser and parser base class
header files also use the specified namespace to define their
include guard directives.
It is an error if this directive is used and an already
existing parser-class header file and/or implementation header
file does not define tt(namespace identifier).
it() bf(%negative-dollar-indices)
Do not generate warnings when zero- or negative dollar-indices are
used in the grammar's action blocks. Zero or negative
dollar-indices are commonly used to implement inherited
attributes, and should normally be avoided. When used, they can be
specified like tt($-1), or like tt($<type>-1), where tt(type) is
empty; an tt(STYPE_) tag; or a field-name. However, note that in
combination with the tt(%polymorphic) directive (see below) only
the tt($-i) format can be used.
it() bf(%no-lines)
By default tt(#line) preprocessor directives are inserted just
before action statements in the file containing the parser's
tt(parse) function. These directives are suppressed by the
tt(%no-lines) directive.
it() bf(%nonassoc) tt(terminal ...)
Defines the names of symbolic terminal tokens that should be
treated as non-associative. I.e., in case of a shift/reduce
conflict, a reduction is preferred over a shift. Sequences of
tt(%left, %nonassoc, %right) and tt(%token) directives may be used
to define the precedence of operators. In expressions, the first
used directive defines the tokens having the lowest precedence,
the last used defines the tokens having the highest priority. See
also tt(%token) below.
it() bf(%parsefun-source) tt(filename)
tt(Filename) defines the name of the file to contain the parser
member function tt(parse). Defaults to tt(parse.cc). This
directive is overruled by the bf(--parsefun-source) (bf(-p))
command-line option.
it() bf(%polymorphic) tt(polymorphic-specification(s))
Bison's traditional way of handling multiple semantic values is to
use a tt(%union) specification (see below). Although tt(%union) is
supported by bic(), a polymorphic semantic value class is
preferred due to its improved type safety.
The tt(%polymorphic) directive defines a polymorphic semantic
value class and can be used instead of a tt(%union)
specification. Refer to section bf(POLYMORPHIC SEMANTIC VALUES)
below or to bic()'s user manual for a detailed description of the
specification, characteristics, and use of polymorphic semantic
values.
it() bf(%prec) tt(token)
Defines the precedence of a (non-empty) production rule. By
default, production rules have priorities that are equal to the priorities of
their first terminal tokens, or they receive the maximum possible priority if
they don't contain terminal tokens. To change a production rule's default
priority the tt(%prec) directive is used, which assigns the directive's
token's priority to the production rule's priority. A well known application
of tt(%prec) is:
verb(
expression:
'-' expression %prec UMINUS
{
...
}
)
Here, the default priority and precedence of the `tt(-)' token as
the subtraction operator is overruled by the precedence and
priority of the tt(UMINUS) token, which is commonly defined as
verb(
%right UMINUS
)
(see below) following, e.g., the tt('*') and tt('/') operators.
Refer to bic()'s user manual for a more elaborate coverage of the
tt(%prec) directive.
it() bf(%print-tokens)
The tt(print-tokens) directive provides an implementation of the Parser
class's tt(print_) function displaying the current token value
and the text matched by the lexical scanner as received by the
generated tt(parse) function.
it() bf(%prompt)
When adding debugging code (using the tt(debug) option or
directive) the debug information is displayed continuously while
the parser processes its input. When using the tt(prompt)
directive the generated parser displays a prompt (a question
mark) at each step of the parsing process. Caveat: when using this
option the parser's input cannot be provided at the parser's
standard input stream.
it() bf(%required-tokens) tt(number)
Following a syntactic error, require at least tt(number)
successfully processed tokens before another syntactic error can
be reported. By default tt(number) is zero.
it() bf(%right) tt(terminal ...)
Defines the names of symbolic terminal tokens that should be
treated as right-associative. I.e., in case of a shift/reduce
conflict, a shift is preferred over a reduction. Sequences of
tt(%left, %nonassoc, %right) and tt(%token) directives may be used
to define the precedence of operators. In expressions, the first
used directive defines the tokens having the lowest precedence,
the last used defines the tokens having the highest priority. See
also tt(%token) below.
it() bf(%scanner) tt(pathname)
Use tt(pathname) as the path name to the file pre-included in the
parser's class header. See the description of the tt(--scanner)
option for details about this directive. Similar to the convention
adopted for this argument, tt(pathname) by default is surrounded
by double quotes. However, when the argument is surrounded by
pointed brackets tt(#include <pathname>) is included. This
directive results in the definition of a composed tt(Scanner
d_scanner) data member into the generated parser, and in the
definition of a tt(int lex()) member, returning
tt(d_scanner.lex()).
By specifying the tt(%flex) directive the function
tt(d_scanner.yylex()) is called. Any other function to call can be
specified using the tt(--scanner-token-function) option (or
tt(%scanner-token-function) directive).
It is an error if this directive is used and an already
existing parser class header file does not include tt(`pathname').
it() bf(%scanner-class-name) tt(scannerClassName)
Defines the name of the scanner class, declared by the tt(pathname)
header file that is specified at the tt(scanner) option or
directive. By default the class name tt(Scanner) is used.
It is an error if this directive is used and either the
tt(scanner) directive was not provided, or the parser class
interface in an already existing parser class header file does not
declare a scanner class tt(d_scanner) object.
it() bf(%scanner-matched-text-function) tt(function-call)
The scanner function returning the text that was matched by the
lexical scanner after its token function (see below) has
returned. A complete function call expression should be provided
(including a scanner object, if used). Example:
verb(
%scanner-matched-text-function myScanner.matchedText()
)
By specifying the tt(%flex) directive the function
tt(d_scanner.YYText()) is called.
If the function call contains white space,
tt(function-call) should be surrounded by double quotes.
it() bf(%scanner-token-function) tt(function-call)
The scanner function returning the next token, called from the
generated parser's tt(lex) function. A complete function
call expression should be provided (including a scanner object, if
used). Example:
verb(
%scanner-token-function d_scanner.lex()
)
If the function call contains white space,
tt(function-call) should be surrounded by double quotes.
It is an error if this directive is used and the scanner token
function is not called from the code in an already
existing implementation header.
it() bf(%stack-expansion) tt(size)
Defines the number of elements to be added to the generated
parser's semantic value stack when it must be enlarged. By default
10 elements are added to the stack. This option/directive is
interpreted only once, and only if tt(size) at least equals the
default stack expansion size of 10.
it() bf(%start) tt(nonterminal)
The nonterminal tt(nonterminal) should be used as the grammar's
start-symbol. If omitted, the first grammatical rule is used
as the grammar's starting rule. All syntactically correct
sentences must be derivable from this starting rule.
it() bf(%stype) tt(typename)
The type of the semantic value of nonterminal tokens. By default
it is tt(int). tt(%stype, %union,) and tt(%polymorphic) are
mutually exclusive directives.
Within the parser class, the semantic value type is available as
the type `tt(STYPE_)'. All text on the line following tt(%stype)
is used for the tt(typename) specification. It should therefore
not contain comments or any other characters that are not part of
the actual type definition.
it() bf(%tag-mismatches) tt(on|off)
This directive is only interpreted when polymorphic semantic values are
used. When tt(on) is specified (which is used by default) the tt(parse) member
of the generated parser dynamically checks that the tag that is used when
calling a semantic value's tt(get) member matches the actual tag of the
semantic value.
If a mismatch is observed, then the parsing function aborts after displaying a
fatal error message. If this happens, and if the option/directive tt(debug)
was specified when bic() created the parser's parsing function, then the
program can be rerun, specifying tt(parser.setDebug(Parser::ACTIONCASES))
before calling the parsing function. As a result the case-entry numbers of the
tt(switch), defined in the parser's tt(executeAction) member, are inserted
into the standard output stream. The action case number reported just before
the program displays the fatal error message tells you in which of the
grammar's action blocks the error was encountered.
it() bf(%target-directory) tt(pathname)
tt(Pathname) defines the directory where generated files should be
written. By default this is the directory where bic() is
called. This directive is overruled by the tt(--target-directory)
command-line option.
it() bf(%thread-safe)
Only used with polymorphic semantic values, and then only required
when the parser is used in multiple threads: it ensures that each
thread's polymorphic code only accesses its own parser's error
counting variable.
it() bf(%token) tt(terminal ...)
Defines the names of symbolic terminal tokens. Sequences of
tt(%left, %nonassoc, %right) and tt(%token) directives may be used
to define the precedence of operators. In expressions, the first
used directive defines the tokens having the lowest precedence,
the last used defines the tokens having the highest priority.
bf(NOTE:) Symbolic tokens are defined as tt(enum)-values in the
parser's base class. The names of symbolic tokens may not be equal
to the names of the members and types defined by bic() itself (see
the next sections). This requirement is em(not) enforced by bic(),
but compilation errors may result if this requirement is violated.
it() bf(%token-class) tt(classname)
tt(Classname) defines the name of the tt(Tokens) class that is
defined when the tt(%token-path) directive or option (see below)
is specified. If tt(token-path) isn't specified then
this directive is ignored. By default the class name tt(Tokens) is
used.
it() bf(%token-namespace) tt(namespace)
If tt(token-path) is specified (see below) then tt(namespace)
defines the namespace of the tt(Tokens) class. By default no
namespace is used.
it() bf(%token-path) tt(pathname)
tt(Pathname) defines the path name of the file to contain the
tt(struct Tokens) defining the enumeration tt(Tokens_) containing
the symbolic tokens of the generated grammar. If this option is
specified the tt(ParserBase) class is derived from it, thus making
the tokens available to the generated parser class. The name of
the tt(struct Tokens) can be altered using the tt(token-class)
directive or option. By default (if tt(token-path) is not
specified) the tokens are defined as the tt(enum Tokens_) in the
tt(ParserBase) class. If tt(pathname) doesn't exist it is created
by bic(). If the file tt(pathname) already exists it is rewritten
at each new run of bic().
it() bf(%type) tt(<type> nonterminal ...)
In combination with tt(%polymorphic) or tt(%union): associate the
semantic value of a nonterminal symbol with a polymorphic
semantic value tag or union field defined by these directives.
it() bf(%union) tt(union-definition)
Acts identically to the identically named bf(bison) and bf(bison++)
declaration. Bic() generates a union, named tt(STYPE_), as its
semantic type.
it() bf(%weak-tags)
This directive is ignored unless the tt(%polymorphic)
directive was specified. It results in the declaration of tt(enum
Tag_) rather than tt(enum class Tag_). When in doubt, don't use
this directive.
)
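As a rough illustration of the associativity rules described for tt(%left)
and tt(%right) above, the following bf(C++) sketch resolves a shift/reduce
conflict. It assumes the conventional yacc-style comparison, in which the
priority of a production rule is compared with the priority of the lookahead
token. The names tt(Assoc), tt(Action) and tt(resolve) are hypothetical and
are not part of the code generated by bic(); tt(%nonassoc) is not covered by
this sketch.

```cpp
#include <cassert>      // assert, for exercising the sketch

// Hypothetical names, for illustration only: not generated by bisonc++.
enum class Assoc  { Left, Right };
enum class Action { Shift, Reduce };

// Decide a shift/reduce conflict between a production rule and the
// lookahead token. Assumption: larger numbers mean higher priority
// (i.e., the token was declared by a later directive).
constexpr Action resolve(int rulePriority, int tokenPriority, Assoc tokenAssoc)
{
    if (rulePriority > tokenPriority)
        return Action::Reduce;          // the rule binds tighter

    if (rulePriority < tokenPriority)
        return Action::Shift;           // the lookahead token binds tighter

    // Equal priorities: associativity decides.
    return tokenAssoc == Assoc::Left ? Action::Reduce : Action::Shift;
}
```

With equal priorities tt(resolve(1, 1, Assoc::Left)) selects a reduction, so
an expression like tt(1 - 2 - 3) groups as tt((1 - 2) - 3), whereas
tt(Assoc::Right) selects a shift.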
manpagesection(POLYMORPHIC SEMANTIC VALUES)
label(POLYMORPHIC)
Like bf(bison)(1), bic() by default uses tt(int) semantic values, and also
supports the tt(%stype) and tt(%union) directives for using single-type or
traditional bf(C)-type unions as semantic values. These types of semantic
values are covered in bic()'s manual.
In addition, the tt(%polymorphic) directive can be specified to generate a
parser using `polymorphic' semantic values. In this case semantic values are
specified as pairs, consisting of em(tags) (which are bf(C++) identifiers),
and bf(C++) (pointer or value) type names. Tags and type names are separated
by colons. Multiple tag and type name combinations are separated by
semicolons, and an optional semicolon ends the final tag/type pair: type name
specifications end at a semicolon or at a percent character (indicating the
beginning of the next directive). End-of-line comments and standard bf(C)
comments may also be used, but both are ignored.
Here is an example, defining three semantic values: an tt(int), a
tt(std::string) and a tt(std::vector<double>):
verb(
%polymorphic INT: int; STRING: std::string;
VECT: std::vector<double>
)
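To make the tag/type syntax concrete, here is a minimal bf(C++) sketch that
splits such a specification into (tag, type-name) pairs. It only illustrates
the separators described above (semicolons between pairs, the first colon
between tag and type name). It is not how bic() itself processes the
directive, it ignores comments, and the names tt(trim) and tt(splitSpec) are
hypothetical.

```cpp
#include <cassert>      // assert, for exercising the sketch
#include <string>
#include <utility>
#include <vector>

// Remove leading and trailing blanks, tabs and newlines.
std::string trim(std::string const &text)
{
    auto const first = text.find_first_not_of(" \t\n");
    if (first == std::string::npos)
        return "";
    auto const last = text.find_last_not_of(" \t\n");
    return text.substr(first, last - first + 1);
}

// Split "TAG: type; TAG: type" into (tag, type-name) pairs.
// Illustration only: comments and error handling are not covered.
std::vector<std::pair<std::string, std::string>>
splitSpec(std::string const &spec)
{
    std::vector<std::pair<std::string, std::string>> pairs;

    std::size_t begin = 0;
    while (begin < spec.size())
    {
        std::size_t end = spec.find(';', begin);
        if (end == std::string::npos)       // final pair: optional semicolon
            end = spec.size();

        std::string const item = spec.substr(begin, end - begin);

        // The first colon separates tag and type name; any later "::"
        // belongs to the type name (e.g., std::string).
        std::size_t const colon = item.find(':');
        if (colon != std::string::npos)
            pairs.emplace_back(trim(item.substr(0, colon)),
                               trim(item.substr(colon + 1)));

        begin = end + 1;
    }
    return pairs;
}
```

Applied to the specification shown above, tt(splitSpec) yields three pairs,
the last one combining the tag tt(VECT) with the type name
tt(std::vector<double>).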
The identifier to the left of the colon is called the em(tag-identifier)
(or simply em(tag)), and the type name to the right of the colon is called the
em(type-name). Starting with bic() version 4.12.00 the types no longer have to
provide default constructors.
When polymorphic type-names refer to types that have not yet been declared
by the parser's base class header, then these types must be (directly or
indirectly) declared in a header file whose location is specified using the
tt(%baseclass-preinclude) directive.
tt(%type) directives are used to associate (non-)terminals with semantic
value types. E.g., after:
verb(
%polymorphic INT: int; TEXT: std::string
%type <INT> expr
)
the tt(expr) nonterminal returns tt(int) semantic values. In a
rule like:
verb(
expr:
expr '+' expr
{
// Action block: C++ statements here.
}
)
symbols tt($$, $1,) and tt($3) represent tt(int) values, and
can be used that way in the bf(C++) action block.
bf(Definitions and declarations)
The tt(%polymorphic) directive adds the following definitions and
declarations to the generated base class header and parser source
file (if the tt(%namespace) directive was used then all declared/defined
elements are placed inside the namespace that is specified by the
tt(%namespace) directive):
itemization(
it() All semantic value type identifiers are collected in a strongly typed
`tt(Tag_)' enumeration. E.g.,
verb(
enum class Tag_
{
INT,
STRING,
VECT
};
)
it() An anonymous tt(enum) defining the symbolic constant tt(sizeofTag_)
equal to the number of tags in the tt(Tag_) enumeration.
it() The namespace tt(Meta_) contains almost all of the code
implementing polymorphic values.
)
The namespace tt(Meta_) contains, among other classes, the class tt(SType).
The parser's semantic value type tt(STYPE_) is equal to tt(Meta_::SType).
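The idea of a tagged semantic value whose tt(get) member checks the requested
tag can be sketched in a few lines of bf(C++17). The class tt(Value) below is
a hypothetical stand-in built on tt(std::variant); it is em(not)
tt(Meta_::SType), whose actual interface is described in the next
subsection. Unlike the generated parser, which aborts after a fatal error
message, this sketch throws an exception on a tag mismatch.

```cpp
#include <cassert>      // assert, for exercising the sketch
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Tags as in the example above; listed in the same order as the
// alternatives of the variant used below.
enum class Tag_ { INT, STRING, VECT };

// Hypothetical stand-in for a polymorphic semantic value (NOT Meta_::SType).
class Value
{
    std::variant<int, std::string, std::vector<double>> d_data;

    public:
        Value(int value)                 : d_data(value) {}
        Value(std::string value)         : d_data(std::move(value)) {}
        Value(std::vector<double> value) : d_data(std::move(value)) {}

        // The tag of the value currently stored.
        Tag_ tag() const
        {
            return static_cast<Tag_>(d_data.index());
        }

        // Return the stored value, after checking the requested tag.
        template <Tag_ tg>
        auto &get()
        {
            if (tag() != tg)
                throw std::logic_error("tag mismatch");

            return std::get<static_cast<std::size_t>(tg)>(d_data);
        }
};
```

After tt(Value value(42)) the call tt(value.get<Tag_::INT>()) returns a
reference to the stored tt(int), while tt(value.get<Tag_::STRING>()) throws
tt(std::logic_error).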
bf(STYPE_ equals Meta_::SType)
tt(Meta_::SType) provides the standard user interface for using polymorphic
semantic data types. It declares the following public interface:
includefile(../manual/grammar/stypeinterface)
manpagesection(DOLLAR NOTATIONS)
Inside action blocks dollar-notations can be used to retrieve and assign
values from/to the elements of production rules. Type directives are used to
associate dollar-notations with semantic types.
When tt(%stype) is specified (and with the default tt(int) semantic value
type) the following dollar-notations are available:
includefile(../manual/grammar/stypedollar.yo)
When tt(%union) is specified these dollar-notations are available:
includefile(../manual/grammar/uniondollar.yo)
When tt(%polymorphic) is specified these dollar-notations can be used:
includefile(../manual/grammar/polydollar.yo)
manpagesection(RESTRICTIONS ON TOKEN NAMES)
To avoid collisions with names defined by the parser's (base) class, the
following identifiers should not be used as token names:
itemization(
it() Identifiers ending in an underscore;
it() Any of the following identifiers: tt(ABORT, ACCEPT, ERROR, clearin,
debug), or tt(setDebug).
)
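The two restrictions can be expressed as a small bf(C++17) check. This is
merely a sketch of the rules listed above: bic() does not provide such a
function, and the name tt(acceptableTokenName) is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>      // assert, for exercising the sketch
#include <string_view>

// True if 'name' avoids the collisions listed above.
// Assumption of this sketch: an empty name is rejected as well.
bool acceptableTokenName(std::string_view name)
{
    static constexpr std::array<std::string_view, 6> reserved =
    {
        "ABORT", "ACCEPT", "ERROR", "clearin", "debug", "setDebug"
    };

    // Identifiers ending in an underscore are not to be used.
    if (name.empty() || name.back() == '_')
        return false;

    // Neither are the explicitly listed identifiers.
    return std::find(reserved.begin(), reserved.end(), name) == reserved.end();
}
```

For example, tt(NUMBER) passes the check, whereas tt(ERROR) and tt(value_)
do not.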
manpagesection(OBSOLETE SYMBOLS)
All bf(DECLARATIONS) and bf(DEFINE) symbols not listed above but defined
in bf(bison++) are obsolete with bic(). In particular, there is no tt(%header{
... %}) section anymore. Also, all bf(DEFINE) symbols related to member
functions are now obsolete. There is no need for these symbols anymore as they
can simply be declared in the class header file and defined elsewhere.
manpagesection(USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS)
The tokens defined in the grammar files processed by bic() must usually
also be available to the lexical scanner, returning those tokens when certain
regular expressions are matched. E.g., a tt(NUMBER) token may be used in the
grammar and the lexical scanner may be expected to return that token when the
input matches the tt([0-9]+) regular expression. To avoid circular
dependencies among classes the tokens can be written to a separate file using
the tt(token-path) directive or option. The location and name of this file are
specified by the tt(token-path) specification, and the file is generated from
scratch at every run of bic(). By default the grammar's symbolic tokens are made
available in the class tt(Tokens), and classes may refer to its tokens using
the tt(Tokens) class scope (e.g., tt(Tokens::NUMBER)).
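The following sketch illustrates how a class other than the parser might
refer to a token through the tt(Tokens) class scope. It rests on assumptions:
the tt(Tokens) struct shown here is a hand-written stand-in for the file
generated by bic() (whose actual contents and token values may differ), and
tt(nextToken) is a hypothetical function, not the scanner generated by
bf(flexc++)(1).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed stand-in for the generated tokens file; the real file is
// written by bisonc++ and its token values may differ.
struct Tokens
{
    enum Tokens_
    {
        NUMBER = 257,
    };
};

// Hypothetical hand-written scanner function: returns Tokens::NUMBER when
// the input at `pos` matches [0-9]+, the character itself for any other
// input, and 0 at the end of the input. `pos` is advanced past the match.
int nextToken(std::string const &input, std::size_t &pos)
{
    if (pos >= input.size())
        return 0;

    if (std::isdigit(static_cast<unsigned char>(input[pos])))
    {
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])))
            ++pos;
        return Tokens::NUMBER;      // token referred to via the Tokens scope
    }

    return static_cast<unsigned char>(input[pos++]);
}
```

Since tt(nextToken) depends only on tt(Tokens), it does not need the parser's
class definition, which is the circular dependency the separate token file is
meant to avoid.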
Before bic() version 6.04.00 tokens were made available by including the
file tt(parserbase.h), using a simple tt(#define) suggesting that the tokens
were in fact defined by the parser class itself. Using this scheme lexical
scanner specifications returned, e.g., tt(Parser::NUMBER) when tt([0-9]+) was
matched. Unless the tt(token-path) directive or option is used this approach
is still available, but its use is deprecated.
manpagesection(EXAMPLE)
Using a fairly traditional example, we construct a simple calculator
below. The basic operators as well as parentheses can be used to specify
expressions, and each expression should be terminated by a newline. The
program terminates when a tt(q) is entered. Empty lines result in a mere
prompt.
First an associated grammar is constructed. When a syntactic error is
encountered all tokens are skipped until the next newline and a simple
message is printed using the default tt(error) function. It is assumed that no
semantic errors occur (in particular, no divisions by zero). The grammar is
decorated with actions performed when the corresponding grammatical production
rule is recognized. The grammar itself is rather standard and straightforward,
but note the first part of the specification file, containing various other
directives, among which the tt(%scanner) directive, resulting in a composed
tt(d_scanner) object as well as an implementation of the member function
tt(int lex), and the tt(%token-path) directive, defining the tt(class Tokens)
in the file tt(../scanner/tokens.h). In this example, the tt(Scanner) class is
generated by bf(flexc++)(1). The details of constructing a class using
tt(flexc++) are beyond the scope of this man-page, but tt(flexc++'s)
specification file is shown below.
Here is bf(bisonc++)'s input file:
verbinclude(calculator/parser/grammar)
Bic() processes this file, generating the following files:
itemization(
it() The parser's base class, which should not be modified by the
programmer:
verbinclude(calculator/parser/parserbase.h)
it() The parser class tt(parser.h) itself. In the grammar
specification various member functions are used (e.g., tt(done) and
tt(prompt)). These functions are so small that they can very well be
implemented inline. Note that tt(done) calls tt(ACCEPT) to terminate
further parsing. tt(ACCEPT) and related members (e.g., tt(ABORT)) can be
called from any member called by tt(parse). As a consequence, action blocks
could contain mere function calls, rather than several statements, thus
minimizing the need to rerun bic() when an action is modified.
Once bic() has created tt(parser.h) additionally required members can be
added to it (bic() itself won't modify tt(parser.h) anymore once it is
created), resulting in the following final version:
verbinclude(calculator/parser/parser.h)
it() The file tt(../tokens/tokens.h) is generated because of the
tt(%token-path) directive. To avoid circular dependencies the tokens
are made available in a separate file, allowing classes used by the
parser to use the grammar's tokens as well. Here is the file
specifying the grammar's tokens:
verbinclude(calculator/tokens/tokens.h)
)
For the program no additional members had to be defined in the class
tt(Parser). The member function tt(parse) is defined by bic() in the source
file tt(parse.cc), and it includes tt(parser.ih).
As tt(cerr) is used in the grammar's actions, a tt(using namespace std) or
comparable directive is required. It is specified in tt(parser.ih). Here is
the implementation header declaring the standard namespace:
verbinclude(calculator/parser/parser.ih)
In the current context the member function tt(parse's) implementation is not
very relevant (it should not be modified by the programmer anyway). It is not
shown here, but is available as tt(calculator/parser/parse.cc) in the
distribution's tt(demos/) directory after building the calculator using the
tt(build) script provided there.
The lexical scanner is generated by bf(flexc++)(1) from the following
specification file, using the command tt(flexc++ lexer):
verbinclude(calculator/scanner/lexer)
Finally, here is the program's tt(main) function:
verbinclude(calculator/main.cc)
manpagesection(SEE ALSO)
DEFINESYMBOL(manalso)(bf(bisonc++)(1), bf(bisonc++api)(3))
includefile(seealso.yo)
manpageauthor()
Frank B. Brokken (f.b.brokken@rug.nl).