=======================================================================
List of Implemented Fixes and Changes for Maintenance Releases of PCCTS
=======================================================================
DISCLAIMER
The software and these notes are provided "as is". They may include
typographical or technical errors and their authors disclaim all
liability of any kind or nature for damages due to error, fault,
defect, or deficiency regardless of cause. All warranties of any
kind, either express or implied, including, but not limited to, the
implied warranties of merchantability and fitness for a particular
purpose are disclaimed.
-------------------------------------------------------
Note: Items #153 to #1 are now in a separate file named
CHANGES_FROM_133_BEFORE_MR13.txt
-------------------------------------------------------
#234. (Changed in MR21) Implicit int for function return value
ATokenBuffer::bufferSize() did not specify a type for the
return value.
Reported by Hai Vo-Ba (hai@fc.hp.com).
#233. (Changed in MR20) Converted to MSVC 6.0
Due to external circumstances I have had to convert to MSVC 6.0.
The MSVC 5.0 project files (.dsw and .dsp) have been retained as
xxx50.dsp and xxx50.dsw. The MSVC 6.0 files are named xxx60.dsp
and xxx60.dsw (where xxx is the name of the related directory/project).
#232. (Changed in MR20) Make setwd bit vectors protected in parser.h
The access for the setwd array in the parser header was not
specified. As a result, it would depend on the code which
preceded it. In MR20 it will always have access "protected".
Reported by Piotr Eljasiak (eljasiak@zt.gdansk.tpsa.pl).
#231. (Changed in MR20) Error in token buffer debug code.
When token buffer debugging is selected via the pre-processor
symbol DEBUG_TOKENBUFFER there is an erroneous check in
AParser.cpp:
#ifdef DEBUG_TOKENBUFFER
if (i >= inputTokens->bufferSize() ||
inputTokens->minTokens() < LLk ) /* MR20 Was "<=" */
...
#endif
Reported by David Wigg (wiggjd@sbu.ac.uk).
#230. (Changed in MR20) Fixed problem with #define for -gd option
There was an error in setting zzTRACE_RULES for the -gd (trace) option.
Reported by Gary Funck (gary@intrepid.com).
#229. (Changed in MR20) Additional "const" for literals
"const" was added to the token name literal table.
"const" was added to some panic() and similar routines.
#228. (Changed in MR20) dlg crashes on "()"
The following token definition will cause DLG to crash:
#token "()"
When there is a syntax error in a regular expression
many of the dlg routines return a structure which contains
null pointers. When callers dereference these pointers,
dlg crashes.
I have attempted to fix the more common cases.
Reported by Mengue Olivier (dolmen@bigfoot.com).
#227. (Changed in MR20) Array overwrite
Steveh Hand (sassth@unx.sas.com) reported a problem which
was traced to a temporary array which was not properly
resized for deeply nested blocks. This has been fixed.
#226. (Changed in MR20) -pedantic conformance
G. Hobbelt (i_a@mbh.org) and THM made many, many minor
changes to create prototypes for all the functions and
bring antlr, dlg, and sorcerer into conformance with
the gcc -pedantic option.
This may require users to add pccts/h/pcctscfg.h to some
files or makefiles in order to have __USE_PROTOS defined.
#225. (Changed in MR20) AST stack adjustment in C mode
The fix in #214 for AST stack adjustment in C mode missed
some cases.
Reported with fix by Ger Hobbelt (i_a@mbh.org).
#224. (Changed in MR20) LL(1) and LL(2) with #pragma approx
This may set a record for the oldest, most trivial lexical
error in pccts. The regular expressions for LL(1) and LL(2)
lacked an escape for the left and right parentheses.
Reported by Ger Hobbelt (i_a@mbh.org).
#223. (Changed in MR20) Addition of IBM_VISUAL_AGE directory
Build files for antlr, dlg, and sorcerer under IBM Visual Age
have been contributed by Anton Sergeev (ags@mlc.ru). They have
been placed in the pccts/IBM_VISUAL_AGE directory.
#222. (Changed in MR20) Replace __STDC__ with __USE_PROTOS
Most occurrences of __STDC__ have been replaced with __USE_PROTOS
due to complaints from several users.
#221. (Changed in MR20) Added #include for DLexerBase.h to PBlackBox.
Added #include for DLexerBase.h to PBlackBox.
#220. (Changed in MR19) strcat arguments reversed in #pred parse
The arguments to strcat were reversed when creating a print
name for a hash table entry for use with the #pred feature.
Problem diagnosed and fix reported by Scott Harrington
(seh4@ix.netcom.com).
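For reference, the C library declares strcat(dest, src): it appends src to the end of dest, so the buffer being assembled must always be the first argument. A minimal sketch, assuming a hypothetical helper make_print_name and a caller-supplied buffer (the actual print-name format used for #pred entries is not described here):

```cpp
#include <cstring>

// Hypothetical helper (not antlr's code): builds "<prefix><name>" in buf.
// Leaves buf empty if the result would not fit in bufSize bytes.
void make_print_name(char *buf, std::size_t bufSize,
                     const char *prefix, const char *name)
{
    buf[0] = '\0';
    if (std::strlen(prefix) + std::strlen(name) + 1 > bufSize)
        return;
    std::strcat(buf, prefix);  // destination first, source second
    std::strcat(buf, name);
}
```

With the arguments reversed, strcat would try to append the partially built buffer onto the source string instead.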
#219. (Changed in MR19) C Mode routine zzfree_ast
Changes to reduce use of recursion for AST trees with only right
links or only left links in the C mode routine zzfree_ast.
Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).
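One standard way to free such a tree without recursion is sketched below. It is not necessarily the technique used in MR19: it assumes a simplified, hypothetical node with only the right (sibling) and down (child) links, and uses C++ new/delete in place of the C-mode allocation routines. Whenever a node has a child, the child is rotated up so that the loop only ever follows right links, which keeps stack usage constant for long sibling chains and deep child chains alike:

```cpp
// Simplified, hypothetical AST node: only the two links matter here.
struct Ast {
    Ast *right = nullptr;   // next sibling
    Ast *down = nullptr;    // first child
    static long live;       // nodes currently allocated (for illustration)
    Ast() { ++live; }
    ~Ast() { --live; }
};
long Ast::live = 0;

// Frees t, its right siblings, and all of their descendants without
// recursion. If the current node has a child, the child is rotated up:
// the node becomes the child's right sibling and the child's former
// siblings become the node's children. When the current node has no
// child it is freed and the walk continues along its right link.
void free_ast(Ast *t)
{
    while (t != nullptr) {
        if (t->down != nullptr) {
            Ast *d = t->down;
            t->down = d->right;
            d->right = t;
            t = d;
        } else {
            Ast *r = t->right;
            delete t;
            t = r;
        }
    }
}
```

Because the rotations rewrite links as they go, nothing else may use the tree while it is being freed.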
#218. (Changed in MR19) Changes to support unsigned char in C mode
Changes to antlr.h and err.h fix omissions in the use of zzchar_t.
Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).
#217. (Changed in MR19) Error message when dlg -i and -CC options selected
The parsers generated by pccts in C++ mode are not able to support the
interactive lexer option (except, perhaps, when using the deferred fetch
parser option; see Item #216).
DLG now warns when both -i and -CC are selected.
This warning was suggested by David Venditti (07751870267-0001@t-online.de).
#216. (Changed in MR19) Defer token fetch for C++ mode
Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)
Normally, pccts keeps the lookahead token buffer completely filled.
This requires max(k,ck) tokens of lookahead. For some applications
this can cause deadlock problems. For example, there may be cases
when the parser can't tell when the input has been completely consumed
until the parse is complete, but the parse can't be completed because
the input routines are waiting for additional tokens to fill the
lookahead buffer.
When the ANTLRParser class is built with the pre-processor option
ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
until LA(i) or LT(i) is called.
To test whether this option has been built into the ANTLRParser class
use "isDeferFetchEnabled()".
This is experimental. The interaction with guess mode (syntactic
predicates) is not known.
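The idea can be sketched with a small model. This is not the ANTLRParser implementation: DeferredLookahead and its token-source callback are hypothetical. consume() merely drops the current token, and a new token is requested from the source only when LA(i) needs it, so a parser that has just consumed the last token of an interactive input does not block waiting for one more:

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of deferred token fetching (not ANTLRParser).
// Tokens are plain ints supplied by a caller-provided source.
class DeferredLookahead {
public:
    explicit DeferredLookahead(std::function<int()> source)
        : source_(std::move(source)) {}

    // Returns the i-th lookahead token (i >= 1), fetching from the
    // source only as many tokens as are needed to answer.
    int LA(int i) {
        while (static_cast<int>(buffer_.size()) < i) {
            buffer_.push_back(source_());
            ++fetched_;
        }
        return buffer_[i - 1];
    }

    // Discards the current token without fetching a replacement.
    void consume() {
        LA(1);               // the current token must exist before it is dropped
        buffer_.pop_front();
    }

    int fetched() const { return fetched_; }

private:
    std::function<int()> source_;
    std::deque<int> buffer_;
    int fetched_ = 0;
};
```

An eagerly filled buffer of max(k,ck) tokens would instead call the source again inside every consume().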
#215. (Changed in MR19) Addition of reset() to DLGLexerBase
There was no obvious way to reset the lexer for reuse. The
reset() method now does this.
Suggested by David Venditti (07751870267-0001@t-online.de).
#214. (Changed in MR19) C mode: Adjust AST stack pointer at exit
In C mode the AST stack pointer needs to be reset if there will
be multiple calls to the ANTLRx macros.
Reported with fix by Paul D. Smith (psmith@baynetworks.com).
#213. (Changed in MR18) Fatal error with -mrhoistk (k>1 hoisting)
When rearranging code I forgot to un-comment a critical line of
code that handles hoisting of predicates with k>1 lookahead. This
is now fixed.
Reported by Reinier van den Born (reinier@vnet.ibm.com).
#212. (Changed in MR17) Mac related changes by Kenji Tanaka
Kenji Tanaka (kentar@osa.att.ne.jp) has made a number of changes for
Macintosh users.
a. The following Macintosh MPW files aid in installing pccts on Mac:
pccts/MPW_Read_Me
pccts/install68K.mpw
pccts/installPPC.mpw
pccts/antlr/antlr.r
pccts/antlr/antlr68K.make
pccts/antlr/antlrPPC.make
pccts/dlg/dlg.r
pccts/dlg/dlg68K.make
pccts/dlg/dlgPPC.make
pccts/sorcerer/sor.r
pccts/sorcerer/sor68K.make
pccts/sorcerer/sorPPC.make
They completely replace the previous Mac installation files.
b. The most significant is a change in the MAC_FILE_CREATOR symbol
in pcctscfg.h:
old: #define MAC_FILE_CREATOR 'MMCC' /* Metrowerks C/C++ Text files */
new: #define MAC_FILE_CREATOR 'CWIE' /* Metrowerks C/C++ Text files */
c. Added calls to special_fopen_actions() where necessary.
#211. (Changed in MR16a) C++ style comment in dlg
This has been fixed.
#210. (Changed in MR16a) Sor accepts \r\n, \r, or \n for end-of-line
A user requested that Sorcerer be changed to accept other forms
of end-of-line.
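The end-of-line rule can be illustrated with a small counter. This is a sketch, not Sorcerer's lexer code; it treats \r\n as a single line ending and a lone \r or a lone \n as one line ending each:

```cpp
#include <string>

// Counts line endings, treating "\r\n", a lone '\r', and a lone '\n'
// each as exactly one end of line.
int count_line_ends(const std::string &text)
{
    int lines = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            ++lines;
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i;  // "\r\n" counts once
        } else if (text[i] == '\n') {
            ++lines;
        }
    }
    return lines;
}
```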
#209. (Changed in MR16) Name of files changed.
Old: CHANGES_FROM_1.33
New: CHANGES_FROM_133.txt
Old: KNOWN_PROBLEMS
New: KNOWN_PROBLEMS.txt
#208. (Changed in MR16) Change in use of pccts #include files
There were problems with MS DevStudio when mixing Sorcerer and
PCCTS in the same source file. The problem is caused by the
redefinition of setjmp in the MS header file setjmp.h. In
setjmp.h the pre-processor symbol setjmp was redefined to be
_setjmp. A later #include PCCTS_SETJMP_H, whose replacement
text <setjmp.h> was itself subject to macro expansion, tried
to #include <_setjmp.h>. I'm not sure whether this
is a bug or a feature. In any case, I decided to fix it by
avoiding the use of pre-processor symbols in #include statements
altogether. This has the added benefit of making pre-compiled
headers work again.
I've replaced statements:
old: #include PCCTS_SETJMP_H
new: #include "pccts_setjmp.h"
Where pccts_setjmp.h contains:
#ifndef __PCCTS_SETJMP_H__
#define __PCCTS_SETJMP_H__
#ifdef PCCTS_USE_NAMESPACE_STD
#include <csetjmp>
#else
#include <setjmp.h>
#endif
#endif
A similar change has been made for other standard header files
required by pccts and sorcerer: stdlib.h, stdarg.h, stdio.h, etc.
Reported by Jeff Vincent (JVincent@novell.com) and Dale Davis
(DalDavis@spectrace.com).
#207. (Changed in MR16) dlg reports an invalid range for: [\0x00-\0xff]
dlg will report that this is an invalid range.
Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl):
I think this problem is not specific to unsigned chars
because dlg reports no error for the range [\0x00-\0xfe].
I've found that information on range is kept in field
letter (unsigned char) of Attrib struct. Unfortunately
the letter value internally is for some reason increased
by 1, so \0xff is represented here as 0.
That's why dlg complains about the range [\0x00-\0xff] in
dlg_p.g:
if ($$.letter > $2.letter) {
error("invalid range ", zzline);
}
The fix is:
if ($$.letter > $2.letter && 255 != $$2.letter) {
error("invalid range ", zzline);
}
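The diagnosis above can be reproduced with a simplified model (not dlg's code): if each character code is stored in an unsigned char field after adding 1, \0xff wraps around to 0, so a range ending in \0xff appears to end below its start, while a range ending in \0xfe does not:

```cpp
// Simplified model of the diagnosis (not dlg's code): the code of each
// character is kept in an unsigned char after adding 1, so 0xff + 1
// wraps to 0.
unsigned char stored_letter(int code)
{
    return static_cast<unsigned char>(code + 1);
}

// The original check: a start greater than the end means an invalid range.
bool range_reported_invalid(int first, int last)
{
    return stored_letter(first) > stored_letter(last);
}
```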
#206. (Changed in MR16) Free zzFAILtext in ANTLRParser destructor
The ANTLRParser destructor now frees zzFAILtext.
Problem and fix reported by Manfred Kogler (km@cast.uni-linz.ac.at).
#205. (Changed in MR16) DLGStringReset argument now const
Changed: void DLGStringReset(DLGChar *s) {...}
To: void DLGStringReset(const DLGChar *s) {...}
Suggested by Dale Davis (daldavis@spectrace.com)
#204. (Changed in MR15a) Change __WATCOM__ to __WATCOMC__ in pcctscfg.h
Reported by Oleg Dashevskii (olegdash@my-dejanews.com).
#203. (Changed in MR15) Addition of sorcerer to distribution kit
I have finally caved in to popular demand. The pccts 1.33mr15
kit will include sorcerer. The separate sorcerer kit will be
discontinued.
#202. (Changed in MR15) Organization of MS Dev Studio Projects in Kit
Previously there was one workspace that contained projects for
all three parts of pccts: antlr, dlg, and sorcerer. Now each
part (and directory) has its own workspace/project and there
is an additional workspace/project to build a library from the
.cpp files in the pccts/h directory.
The library build will create pccts_debug.lib or pccts_release.lib
according to the configuration selected.
If you don't want to build pccts 1.33MR15 you can download a
ready-to-run kit for win32 from http://www.polhode.com/win32.zip.
The ready-to-run kit for win32 includes executables, a pre-built static
library for the .cpp files in the pccts/h directory, and a sample
application.
You will need to define the environment variable PCCTS to point to
the root of the pccts directory hierarchy.
#201. (Changed in MR15) Several fixes by K.J. Cummings (cummings@peritus.com)
Generation of SETJMP rather than SETJMP_H in gen.c.
(Sor B19) Declaration of ref_vars_inits for ref_var_inits in
pccts/sorcerer/sorcerer.h.
#200. (Changed in MR15) Remove operator=() in AToken.h
A user reported that WatCom couldn't handle the use of an
explicit operator =(). It has been replaced with an equivalent
using a cast operator.
#199. (Changed in MR15) Don't allow use of empty #tokclass
Change antlr.g to disallow empty #tokclass sets.
Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
#198. Revised ANSI C grammar due to efforts by Manuel Kessler
Manuel Kessler (mlkessler@cip.physik.uni-wuerzburg.de)
Allow trailing ... in function parameter lists.
Add bit fields.
Allow old-style function declarations.
Support cv-qualified pointers.
Better checking of combinations of type specifiers.
Release of memory for local symbols on scope exit.
Allow input file name on command line as well as by redirection.
and other miscellaneous tweaks.
This is not part of the pccts distribution kit. It must be
downloaded separately from:
http://www.polhode.com/ansi_mr15.zip
#197. (Changed in MR14) Resetting the lookahead buffer of the parser
Explanation and fix by Sinan Karasu (sinan.karasu@boeing.com)
Consider the code used to prime the lookahead buffer LA(i)
of the parser when init() is called:
void
ANTLRParser::
prime_lookahead()
{
int i;
for(i=1;i<=LLk; i++) consume();
dirty=0;
//lap = 0; // MR14 - Sinan Karasu (sinan.karusu@boeing.com)
//labase = 0; // MR14
labase=lap; // MR14
}
When the parser is instantiated, lap and labase are both set to 0.
The "for" loop runs LLk times, and each consume() computes
lap = lap+1 (mod LLk). Therefore lap has the same value after the
loop as it had before the loop.
The problem arises only when one does an init() of the parser
after an Eof has been seen. At that time, lap could be non-zero.
Assume lap==1. Now we do a prime_lookahead(). If LLk is 2,
then
consume()
{
NLA = inputTokens->getToken()->getType();
dirty--;
lap = (lap+1)&(LLk-1);
}
or expanding NLA,
token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
dirty--;
lap = (lap+1)&(LLk-1);
so now we prime location 1 and then location 0 (the index wraps
around), and lap ends up back at 1. In prime_lookahead it used to set
lap=0 and labase=0. Now, the next token will be read from location 0,
NOT 1 as it should have been.
This was never caught before, because when a parser has just been
instantiated, lap and labase are already 0, so the offending
assignments are basically no-ops: the for loop wraps lap back to 0.
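A simplified model of the priming step makes the difference visible. LookaheadModel is hypothetical, not the ANTLRParser class: it uses int tokens, LLk = 2, and only the lap/labase arithmetic shown above. Starting with lap==1, the pre-MR14 reset leaves LA(1) pointing at the second token primed, while labase=lap points at the first:

```cpp
// Hypothetical, simplified model of prime_lookahead() (not ANTLRParser).
// LLk must be a power of two so that "& (LLk-1)" acts as "mod LLk".
const int LLk = 2;

struct LookaheadModel {
    int token_type[LLk] = {};
    int lap = 0;      // slot the next consume() writes into
    int labase = 0;   // slot holding LA(1)
    int next = 100;   // stand-in for inputTokens->getToken()->getType()

    void consume() {
        token_type[lap & (LLk - 1)] = next++;   // NLA = next token
        lap = (lap + 1) & (LLk - 1);
    }

    void prime_lookahead(bool mr14) {
        for (int i = 1; i <= LLk; i++) consume();
        if (mr14) {
            labase = lap;   // MR14: LA(1) is the first token primed
        } else {
            lap = 0;        // pre-MR14: correct only if lap was
            labase = 0;     // already 0 before the loop
        }
    }

    // In this model, LA(i) reads slot (labase + i - 1) mod LLk.
    int LA(int i) const { return token_type[(labase + i - 1) & (LLk - 1)]; }
};
```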
#196. (Changed in MR14) Problems with "(alpha)? beta" guess
Consider the following syntactic predicate in a grammar
with 2 tokens of lookahead (k=2 or ck=2):
rule : ( alpha )? beta ;
alpha : S t ;
t : T U
| T
;
beta : S t Z ;
When antlr computes the prediction expression with one token
of lookahead for alts 1 and 2 of rule t it finds an ambiguity.
Because the grammar has a lookahead of 2 it tries to compute
two tokens of lookahead for alts 1 and 2 of t. Alt 1 clearly
has a lookahead of (T U). Alt 2 is one token long so antlr
tries to compute the follow set of alt 2, which means finding
the things which can follow rule t in the context of (alpha)?.
This cannot be computed, because alpha is only part of a rule,
and antlr can't tell what part of beta is matched by alpha and
what part remains to be matched. Thus it is impossible for antlr
to properly determine the follow set of rule t.
Prior to 1.33MR14, the follow of (alpha)? was computed as
FIRST(beta) as a result of the internal representation of
guess blocks.
With MR14 the follow set will be the empty set for that context.
Normally, one expects a rule appearing in a guess block to also
appear elsewhere. When the follow context for this other use
is "ored" with the empty set, only the context from the other use
remains, and a reasonable follow context results. However, if
there is *no* other use of the rule, or it is used in a different
manner, then the follow context will be inaccurate. It was
inaccurate even before MR14, but it will be inaccurate in a
different way.
For the example given earlier, a reasonable way to rewrite the
grammar:
rule : ( alpha )? beta ;
alpha : S t ;
t : T U
| T
;
beta : alpha Z ;
If there are no other uses of the rule appearing in the guess
block, antlr will generate a test for EOF as a workaround for
representing a null set in the lookahead tests.
If you encounter such a problem you can use the -alpha option
to get additional information:
line 2: error: not possible to compute follow set for alpha
in an "(alpha)? beta" block.
With the antlr -alpha command line option the following information
is inserted into the generated file:
#if 0
Trace of references leading to attempt to compute the follow set of
alpha in an "(alpha)? beta" block. It is not possible for antlr to
compute this follow set because it is not known what part of beta has
already been matched by alpha and what part remains to be matched.
Rules which make use of the incorrect follow set will also be incorrect
1 #token T alpha/2 line 7 brief.g
2 end alpha alpha/3 line 8 brief.g
2 end (...)? block at start/1 line 2 brief.g
#endif
At the moment, with the -alpha option selected the program marks
any rules which appear in the trace back chain (above) as rules with
possible problems computing follow set.
Reported by Greg Knapen (gregory.knapen@bell.ca).
#195. (Changed in MR14) #line directive not at column 1
Under certain circumstances a predicate test could generate
a #line directive which was not at column 1.
Reported with fix by David Kågedal (davidk@lysator.liu.se)
(http://www.lysator.liu.se/~davidk/).
#194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass
In C mode with the demand lookahead option there is a bug in the
code which handles matches for #tokclass (zzsetmatch and
zzsetmatch_wsig).
The bug causes the lookahead pointer to get out of synchronization
with the current token pointer.
The problem was reported with a fix by Ger Hobbelt (hobbelt@axa.nl).
#193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD
The pcctscfg.h now contains the following definitions:
#ifdef PCCTS_USE_NAMESPACE_STD
#define PCCTS_STDIO_H <cstdio>
#define PCCTS_STDLIB_H <cstdlib>
#define PCCTS_STDARG_H <cstdarg>
#define PCCTS_SETJMP_H <csetjmp>
#define PCCTS_STRING_H <cstring>
#define PCCTS_ASSERT_H <cassert>
#define PCCTS_ISTREAM_H <istream>
#define PCCTS_IOSTREAM_H <iostream>
#define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
#else
#define PCCTS_STDIO_H <stdio.h>
#define PCCTS_STDLIB_H <stdlib.h>
#define PCCTS_STDARG_H <stdarg.h>
#define PCCTS_SETJMP_H <setjmp.h>
#define PCCTS_STRING_H <string.h>
#define PCCTS_ASSERT_H <assert.h>
#define PCCTS_ISTREAM_H <istream.h>
#define PCCTS_IOSTREAM_H <iostream.h>
#define PCCTS_NAMESPACE_STD
#endif
The runtime support in pccts/h uses these pre-processor symbols
consistently.
Also, antlr and dlg have been changed to generate code which uses
these pre-processor symbols rather than having the names of the
#include files hard-coded in the generated code.
This required the addition of #include "pcctscfg.h" to a number of
files in pccts/h.
It appears that this sometimes causes problems for MSVC 5 in
combination with the "automatic" option for pre-compiled headers.
In such cases disable the "automatic" pre-compiled headers option.
Suggested by Hubert Holin (Hubert.Holin@Bigfoot.com).
#192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"
Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
This allows literal strings to be used to initialize tokens. Since
the usual token implementation (ANTLRCommonToken) makes a copy of the
input string, this was an unnecessary limitation.
Suggested by Bob McWhirter (bob@netwrench.com).
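A sketch of why the const parameter is safe when the token stores its own copy. CopyingToken is a hypothetical class, not ANTLRCommonToken, and it keeps the text in a std::string rather than in the buffer the real class uses:

```cpp
#include <string>

// Hypothetical token class: setText() copies the characters, so the
// caller's string is never modified or retained and may be const
// (including a string literal).
class CopyingToken {
public:
    void setText(const char *s) { text_ = s; }
    const char *getText() const { return text_.c_str(); }

private:
    std::string text_;
};
```

For example, t.setText("EOF") now compiles, and later changes to a caller's buffer do not affect the token's text.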
#191. (Changed in MR14) HP/UX aCC compiler compatibility problem
Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.
Reported by David Cook (dcook@bmc.com).
#190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem
Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp
Reported by David Cook (dcook@bmc.com).
#189. (Changed in MR14) -gxt switch in C mode
The -gxt switch in C mode didn't work because of incorrect
initialization.
Reported by Sinan Karasu (sinan@boeing.com).
#188. (Changed in MR14) Added pccts/h/DLG_stream_input.h
This is a DLG stream class based on C++ istreams.
Contributed by Hubert Holin (Hubert.Holin@Bigfoot.com).
#187. (Changed in MR14) Rename config.h to pcctscfg.h
The PCCTS configuration file has been renamed from config.h to
pcctscfg.h. The problem with the original name is that it led
to name collisions when pccts parsers were combined with other
software.
All of the runtime support routines in pccts/h/* have been
changed to use the new name. Existing software can continue
to use pccts/h/config.h. The contents of pccts/h/config.h are
now just: #include "pcctscfg.h"
I don't have a record of the user who suggested this.
#186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier
Classes in the C++ runtime support routines are now declared:
class DllExportPCCTS className ....
By default, the pre-processor symbol is defined as the empty
string. This is for use by MSVC++ users to create DLL classes.
Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
#185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase
Normally, the ASTBase class is derived from PCCTS_AST which contains
functions useful to Sorcerer. If these are not necessary then the
user can define the pre-processor symbol "PCCTS_NOT_USING_SOR" which
will cause the ASTBase class to replace references to PCCTS_AST with
references to ASTBase where necessary.
The class ASTDoublyLinkedBase will contain a pure virtual function
shallowCopy() that was formerly defined in class PCCTS_AST.
Suggested by Bob McWhirter (bob@netwrench.com).
#184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h
Reported by Hubert Holin (Hubert.Holin@bigfoot.com).
#183. (Changed in MR14) -f to specify file with names of grammar files
In DEC/VMS it is difficult to specify very long command lines.
The -f option allows one to place the names of the grammar files
in a data file in order to bypass limitations of the DEC/VMS
command language interpreter.
Addition supplied by Bernard Giroud (b_giroud@decus.ch).
#182. (Changed in MR14) Output directory option for DEC/VMS
Fix some problems with the -o option under DEC/VMS.
Fix supplied by Bernard Giroud (b_giroud@decus.ch).
#181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()
Changed DLGStringInput to cast the character using (unsigned char)
so that languages with character codes greater than 127 work
without changes.
Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
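A sketch of the difference the cast makes, assuming a reader that returns EOF at the terminating NUL. StringInput is hypothetical, not the DLGStringInput class:

```cpp
#include <cstdio>

// Illustrative reader in the style of a nextChar() routine (not the
// DLGStringInput source). Each character is returned as a value in
// 0..255; EOF (a negative value from <cstdio>) marks the end.
class StringInput {
public:
    explicit StringInput(const char *s) : p_(s) {}

    int nextChar() {
        if (*p_ == '\0') return EOF;
        // Where plain char is signed, a code above 127 would otherwise
        // become a negative int; casting through unsigned char keeps it
        // in 0..255 so it cannot be confused with EOF.
        return static_cast<unsigned char>(*p_++);
    }

private:
    const char *p_;
};
```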
#180. (Added in MR14) ANTLRParser::getEofToken()
Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
setEofToken routine.
Requested by Manfred Kogler (km@cast.uni-linz.ac.at).
#179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream
The BufFileInput class described in Item #142 neglected to release
the allocated buffer when an instance was destroyed.
Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
#178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets
In 1.33 vanilla, and all maintenance releases prior to MR14
there is a bug in the handling of guess blocks which use the
"long" form:
(alpha)? beta
inside a (...)*, (...)+, or {...} block.
This problem does *not* apply to the case where beta is omitted
or when the syntactic predicate is on the leading edge of an
alternative.
The problem is that both alpha and beta are stored in the
syntax diagram, and that some analysis routines would fail
to skip the alpha portion when it was not on the leading edge.
Consider the following grammar with -ck 2:
r : ( (A)? B )* C D
| A B /* forces -ck 2 computation for old antlr */
/* reports ambig for alts 1 & 2 */
| B C /* forces -ck 2 computation for new antlr */
/* reports ambig for alts 1 & 3 */
;
The prediction expression for the first alternative should be
LA(1)={B C} LA(2)={B C D}, but previous versions of antlr
would compute the prediction expression as LA(1)={A C} LA(2)={B D}
Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
a very clear example of the problem and identified the probable cause.
#177. (Changed in MR14) #tokdefs and #token with regular expression
In MR13 the change described by Item #162 caused an existing
feature of antlr to fail. Prior to the change it was possible
to give regular expression definitions and actions to tokens
which were defined via the #tokdefs directive.
This now works again.
Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
#176. (Changed in MR14) Support for #line in antlr source code
Note: this was implemented by Arpad Beszedes (beszedes@inf.u-szeged.hu).
In 1.33MR14 it is possible for a pre-processor to generate #line
directives in the antlr source and have those line numbers and file
names used in antlr error messages and in the #line directives
generated by antlr.
The #line directive may appear in the following forms:
#line ll "sss" xx xx ...
where ll represents a line number, "sss" represents the name of a file
enclosed in quotation marks, and xx are arbitrary integers.
The following form (without "line") is not supported at the moment:
# ll "sss" xx xx ...
The result:
zzline
is replaced with ll from the # or #line directive
FileStr[CurFile]
is updated with the contents of the string (if any)
following the line number
Note
----
The file-name string following the line number can be a complete
name with a directory-path. Antlr generates the output files from
the input file name (by replacing the extension from the file-name
with .c or .cpp).
If the input file (or the file-name from the line-info) contains
a path:
"../grammar.g"
the generated source code will be placed in "../grammar.cpp" (i.e.
in the parent directory). This is inconvenient in some cases
(even the -o switch cannot be used), so the path information is
removed from the #line directive. Thus, if the line-info was
#line 2 "../grammar.g"
then the current file-name will become "grammar.g".
In this way, the source code generated from the grammar file
will always be placed in the current directory, except when the -o
switch is used.
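The directive handling described above can be sketched as follows. parse_line_directive is a hypothetical helper, not antlr's code; it accepts only the #line ll "sss" form, ignores any trailing integers, leaves the file name unchanged when none is given, and (as an assumption) treats both / and \ as directory separators when stripping the path:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses '#line ll "sss" xx ...'. On success sets
// line to ll and, if a quoted name is present, sets file to that name
// with any directory part removed. Returns false if the text is not a
// recognizable #line directive (the "# ll" form is rejected).
bool parse_line_directive(const std::string &text, int &line,
                          std::string &file)
{
    const std::string prefix = "#line";
    if (text.compare(0, prefix.size(), prefix) != 0)
        return false;

    const char *start = text.c_str() + prefix.size();
    char *end = nullptr;
    long ll = std::strtol(start, &end, 10);   // skips leading blanks
    if (end == start)
        return false;                          // no line number
    line = static_cast<int>(ll);

    std::string::size_type pos =
        static_cast<std::string::size_type>(end - text.c_str());
    std::string::size_type open = text.find('"', pos);
    if (open == std::string::npos)
        return true;                           // no file name: keep file
    std::string::size_type close = text.find('"', open + 1);
    if (close == std::string::npos)
        return false;                          // unterminated name

    std::string name = text.substr(open + 1, close - open - 1);
    std::string::size_type sep = name.find_last_of("/\\");
    file = (sep == std::string::npos) ? name : name.substr(sep + 1);
    return true;
}
```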
#175. (Changed in MR14) Bug when guess block appears at start of (...)*
In 1.33 vanilla and all maintenance releases prior to 1.33MR14
there is a bug when a guess block appears at the start of a (...)*.
Consider the following k=1 (ck=1) grammar:
rule :
( (STAR)? ZIP )* ID ;
Prior to 1.33MR14, the generated code resembled:
...
zzGUESS_BLOCK
while ( 1 ) {
if ( !(LA(1)==STAR) ) break;
zzGUESS
if ( !zzrv ) {
zzmatch(STAR);
zzCONSUME;
zzGUESS_DONE
zzmatch(ZIP);
zzCONSUME;
...
Note that the routine uses STAR for the prediction expression
rather than ZIP. With 1.33MR14 the generated code resembles:
...
while ( 1 ) {
if ( !(LA(1)==ZIP) ) break;
...
This problem existed only with (...)* blocks and was caused
by the slightly more complicated graph which represents (...)*
blocks. This caused the analysis routine to compute the first
set for the alpha part of the "(alpha)? beta" rather than the
beta part.
Both (...)+ and {...} blocks handled the guess block correctly.
Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
a very clear example of the problem and identified the probable cause.
#174. (Changed in MR14) Bug when action precedes syntactic predicate
In 1.33 vanilla, and all maintenance releases prior to MR14,
there was a bug when a syntactic predicate was immediately
preceded by an action. Consider the following -ck 2 grammar:
rule :
<<int i;>>
(alpha)? beta C
| A B
;
alpha : A ;
beta : A B;
Prior to MR14, the code generated for the first alternative
resembled:
...
zzGUESS
if ( !zzrv && LA(1)==A && LA(2)==A) {
alpha();
zzGUESS_DONE
beta();
zzmatch(C);
zzCONSUME;
} else {
...
The prediction expression (i.e. LA(1)==A && LA(2)==A) is clearly
wrong because LA(2) should be matched to B (first[2] of beta is {B}).
With 1.33MR14 the prediction expression is:
...
if ( !zzrv && LA(1)==A && LA(2)==B) {
alpha();
zzGUESS_DONE
beta();
zzmatch(C);
zzCONSUME;
} else {
...
This only affects grammars in which alpha is shorter than
max(k,ck) and there is an action immediately preceding
the syntactic predicate.
This problem was reported by Arpad Beszedes
(beszedes@inf.u-szeged.hu) who provided a very clear example
of the problem and identified the presence of the init-action
as the likely culprit.
#173. (Changed in MR13a) -glms for Microsoft style filenames with -gl
With the -gl option antlr generates #line directives using the
exact name of the input files specified on the command line.
An oddity of the Microsoft C and C++ compilers is that they
don't accept file names in #line directives containing "\"
even though these are names from the native file system.
With -glms option, the "\" in file names appearing in #line
directives is replaced with a "/" in order to conform to
Microsoft compiler requirements.
Reported by Erwin Achermann (erwin.achermann@switzerland.org).
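A minimal sketch of the -glms file-name conversion: every "\" is replaced by "/" before the name is written into a #line directive. The helper glms_name is hypothetical and is not antlr's own code.

```c
/* Hypothetical helper: copy `src` into `dst` (which must have room for
   strlen(src) + 1 chars), replacing each '\\' with '/' as -glms is
   described as doing for file names in #line directives. */
void glms_name(char *dst, const char *src)
{
    while (*src != '\0') {
        *dst++ = (*src == '\\') ? '/' : *src;
        src++;
    }
    *dst = '\0';   /* terminate the converted name */
}
```

The converted name could then be emitted with something like printf("#line %d \"%s\"\n", line, buf).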
#172. (Changed in MR13) \r\n in antlr source counted as one line
Some MS software uses \r\n to indicate a new line. Antlr
now recognizes this in counting lines.
Reported by Edward L. Hepler (elh@ece.vill.edu).
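A sketch of line counting that treats "\r\n" as a single line break. The function count_lines is hypothetical, and counting a lone "\r" as a break is an assumption; the item above only mentions "\r\n".

```c
#include <stddef.h>

/* Count line breaks in buf[0..len), treating the two-character
   sequence "\r\n" as ONE break. A lone '\n' or a lone '\r' also
   counts as one break (assumption, not stated in the changelog). */
size_t count_lines(const char *buf, size_t len)
{
    size_t i, lines = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            lines++;
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;          /* skip the '\n' of a "\r\n" pair */
        } else if (buf[i] == '\n') {
            lines++;
        }
    }
    return lines;
}
```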
#171. (Changed in MR13) #tokclass L..U now allowed
The following is now allowed:
#tokclass ABC { A..B C }
Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov)
#170. (Changed in MR13) Suppression for predicates with lookahead depth >1
In MR12 the capability for suppression of predicates with lookahead
depth=1 was introduced. With MR13 this has been extended to
predicates with lookahead depth > 1 and released to users
on an experimental basis.
Consider the following grammar with -ck 2 and the predicate in rule
"a" with depth 2:
r1 : (ab)* "@"
;
ab : a
| b
;
a : (A B)? => <<p(LATEXT(2))>>? A B C
;
b : A B C
;
Normally, the predicate would be hoisted into rule r1 in order to
determine whether to call rule "ab". However, it should *not* be
hoisted because, even if p is false, there is a valid alternative
in rule b. With "-mrhoistk on" the predicate will be suppressed.
If the "-info p" command line option is present, the following information
will appear in the generated code:
while ( (LA(1)==A)
#if 0
Part (or all) of predicate with depth > 1 suppressed by alternative
without predicate
pred << p(LATEXT(2))>>?
depth=k=2 ("=>" guard) rule a line 8 t1.g
tree context:
(root = A
B
)
The token sequence which is suppressed: ( A B )
The sequence of references which generate that sequence of tokens:
1 to ab r1/1 line 1 t1.g
2 ab ab/1 line 4 t1.g
3 to b ab/2 line 5 t1.g
4 b b/1 line 11 t1.g
5 #token A b/1 line 11 t1.g
6 #token B b/1 line 11 t1.g
#endif
A slightly more complicated example:
r1 : (ab)* "@"
;
ab : a
| b
;
a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)
;
b : <<q(LATEXT(2))>>? D E
;
In this case, the sequence (D E) in rule "a" which lies behind
the guard is used to suppress the predicate with context (D E)
in rule b.
while ( (LA(1)==A || LA(1)==D)
#if 0
Part (or all) of predicate with depth > 1 suppressed by alternative
without predicate
pred << q(LATEXT(2))>>?
depth=k=2 rule b line 11 t2.g
tree context:
(root = D
E
)
The token sequence which is suppressed: ( D E )
The sequence of references which generate that sequence of tokens:
1 to ab r1/1 line 1 t2.g
2 ab ab/1 line 4 t2.g
3 to a ab/1 line 4 t2.g
4 a a/1 line 8 t2.g
5 #token D a/1 line 8 t2.g
6 #token E a/1 line 8 t2.g
#endif
&&
#if 0
pred << p(LATEXT(2))>>?
depth=k=2 ("=>" guard) rule a line 8 t2.g
tree context:
(root = A
B
)
#endif
(! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {
ab();
...
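The final test in the generated code, (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ), shows how a "=>" guard combines with its predicate: p is only consulted when the lookahead matches the guard context (A B). The sketch below uses hypothetical token codes and a stand-in p() that returns a caller-chosen answer instead of examining LATEXT(2).

```c
/* Hypothetical token codes. */
enum { A, B, D, E };

/* Stand-in for the user predicate p(LATEXT(2)); it simply reports a
   fixed answer chosen by the caller (assumption, for illustration). */
static int p_result;
static int p(void) { return p_result; }

/* Guarded predicate (A B)? => <<p(...)>>? : when the next two tokens
   are not A B the guard fails and the test is true without calling p. */
int guarded_pred(int la1, int la2)
{
    return !(la1 == A && la2 == B) || p();
}
```

Because || short-circuits, p is never evaluated for input such as D E, which is why the (D E) context in rule "a" can stand in for the suppressed predicate q.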
#169. (Changed in MR13) Predicate test optimization for depth=1 predicates
When MR12 generated a test of a predicate with depth 1, it
used the depth > 1 routines, resulting in correct but
inefficient behavior. In MR13, a bit test is used.
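For a depth 1 predicate the context is a single set of LA(1) tokens rather than a tree, so a membership check can be a word lookup plus a bit mask. The TokSet type and its functions below are a hypothetical illustration of that idea, not antlr's internal set representation.

```c
#include <limits.h>

/* Hypothetical token-set representation: one bit per token number. */
#define SET_WORDS 8
#define WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

typedef struct { unsigned int w[SET_WORDS]; } TokSet;

/* Add token `tok` (must be < SET_WORDS * WORD_BITS) to the set. */
void set_add(TokSet *s, unsigned int tok)
{
    s->w[tok / WORD_BITS] |= 1u << (tok % WORD_BITS);
}

/* Depth-1 membership test: a single word lookup and bit mask. */
int set_member(const TokSet *s, unsigned int tok)
{
    return (s->w[tok / WORD_BITS] >> (tok % WORD_BITS)) & 1u;
}
```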
#168. (Changed in MR13) Token expressions in context guards
The token expressions appearing in context guards such as:
(A B)? => <<test(LT(1))>>? someRule
are computed during an early phase of antlr processing. As
a result, prior to MR13, complex expressions such as:
~B
L..U
~L..U
TokClassName
~TokClassName
were not computed properly. This resulted in incorrect
context being computed for such expressions.
In MR13 these context guards are verified for proper semantics
in the initial phase and then re-evaluated after complex token
expressions have been computed in order to produce the correct
behavior.
Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu).
#167. (Changed in MR13) ~L..U
Prior to MR13, the complement of a token range was
not properly computed.
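A sketch of what computing ~L..U involves: every token number outside the inclusive range L..U is placed in the set. The names are hypothetical, token 0 is assumed to be unused, and token numbers are held in int rather than unsigned char so that an upper bound above 255 works.

```c
#define MAX_TOK 300   /* hypothetical number of token types; deliberately > 256 */

/* Build the set ~L..U as a boolean array: in_set[t] is 1 for every
   token t in 1..MAX_TOK-1 except those numbered lo through hi
   inclusive. Token 0 is treated as unused (assumption). */
void complement_range(unsigned char in_set[MAX_TOK], int lo, int hi)
{
    int t;

    in_set[0] = 0;
    for (t = 1; t < MAX_TOK; t++)
        in_set[t] = (unsigned char)(t < lo || t > hi);
}
```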
#166. (Changed in MR13) token expression L..U
The token U was represented as an unsigned char, restricting
the use of L..U to cases where U was assigned a token number
less than 256. This is corrected in MR13.
#165. (Changed in MR13) option -newAST
To create ASTs from an ANTLRTokenPtr antlr usually calls
"new AST(ANTLRTokenPtr)". This option generates a call
to "newAST(ANTLRTokenPtr)" instead. This allows a user
to define a parser member function to create an AST object.
Similar changes for ASTBase::tmake and ASTBase::link were not
thought necessary since they do not create AST objects; they only
use existing ones.
#164. (Changed in MR13) Unused variable _astp
For many compilations, we have lived with warnings about
the unused variable _astp. It turns out that this variable
can *never* be used because the code which references it was
commented out.
This investigation was sparked by a note from Erwin Achermann
(erwin.achermann@switzerland.org).
#163. (Changed in MR13) Incorrect makefiles for testcpp examples
All the examples in pccts/testcpp/* had incorrect definitions
in the makefiles for the symbol "CCC". Instead of CCC=CC they
had CC=$(CCC).
There was an additional problem in testcpp/1/test.g due to the
change in ANTLRToken::getText() to a const member function
(Item #137).
Reported by Maurice Mass (maas@cuci.nl).
#162. (Changed in MR13) Combining #token with #tokdefs
When it became possible to change the print-name of a
#token (Item #148) it became useful to give a #token
statement whose only purpose was to give a print name
to the #token. Prior to this change this could not be
combined with the #tokdefs feature.
#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
#160. (Changed in MR13) Omissions in list of names for remap.h
When a user selects the -gp option antlr creates a list
of macros in remap.h to rename some of the standard
antlr routines from zzXXX to userprefixXXX.
There were a number of omissions from the remap.h name
list related to the new trace facility. This was reported,
along with a fix, by Bernie Solomon (bernard@ug.eds.com).
#159. (Changed in MR13) Violations of classic C rules
There were a number of violations of classic C style in
the distribution kit. This was reported, along with fixes,
by Bernie Solomon (bernard@ug.eds.com).
#158. (Changed in MR13) #header causes problem for pre-processors
A user who runs the C pre-processor on antlr source suggested
that another syntax be allowed. With MR13, directives
such as #header, #pragma, etc. may be written as "\#header",
"\#pragma", etc. For escaping pre-processor directives inside
a #header use something like the following:
\#header
<<
\#include <stdio.h>
>>
#157. (Fixed in MR13) empty error sets for rules with infinite recursion
When the first set for a rule cannot be computed due to infinite
left recursion, and the rule is the only alternative for a block,
the error set for the block would be empty. This would result
in a fatal error.
Reported by Darin Creason (creason@genedax.com)
#156. (Changed in MR13) DLGLexerBase::getToken() now public
#155. (Changed in MR13) Context behind predicates can suppress
With -mrhoist enabled the context behind a guarded predicate can
be used to suppress other predicates. Consider the following grammar:
r0 : (r1)+;
r1 : rp
| rq
;
rp : <<p(LATEXT(1))>>? B ;
rq : (A)? => <<q(LATEXT(1))>>? (A|B);
In earlier versions both predicates "p" and "q" would be hoisted into
rule r0. With MR12c predicate p is suppressed because the context which
follows predicate q includes "B" which can "cover" predicate "p". In
other words, in trying to decide in r0 whether to call r1, it doesn't
really matter whether p is false or true because, either way, there is
a valid choice within r1.
#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
A common error, even among experienced pccts users, is to code
an init-action to inhibit hoisting rather than a leading action.
An init-action does not inhibit hoisting.
This was coded:
rule1 : <<;>> rule2
This is what was meant:
rule1 : <<;>> <<;>> rule2
With MR13, the user can code:
rule1 : <<;>> <<nohoist>> rule2
The following will give an error message:
rule1 : <<nohoist>> rule2
If the <<nohoist>> appears as an init-action rather than a leading
action an error message is issued. The meaning of an init-action
containing "nohoist" is unclear: does it apply to just one
alternative or to all alternatives?
-------------------------------------------------------
Note: Items #153 to #1 are now in a separate file named
CHANGES_FROM_133_BEFORE_MR13.txt
-------------------------------------------------------