File: lang-support-guide.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename semantic-langdev.info
@set TITLE  Language Support Developer's Guide
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
This manual documents Application Development with Semantic.

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007 Eric M. Ludlam
Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
Copyright @copyright{} 2002, 2003 Richard Y. Kim

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being list their titles, with the Front-Cover Texts
being list, and with the Back-Cover Texts being list.  A copy of the
license is included in the section entitled ``GNU Free Documentation
License''.
@end quotation
@end copying

@ifinfo
@dircategory Emacs
@direntry
* Semantic Language Writer's guide: (semantic-langdev).
@end direntry
@end ifinfo

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@ifinfo
This file documents Language Support Development with Semantic.
@emph{Infrastructure for parser based text analysis in Emacs}

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@end ifinfo

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@vskip 0pt plus 1 fill
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@c MACRO inclusion
@include semanticheader.texi


@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

Semantic is bundled with support for several languages such as
C, C++, Java, Python, etc.
However, one of the primary goals of semantic is to provide a framework
in which anyone can add support for other languages easily.
In order to support a new language, one typically has to provide a
lexer and a parser, along with appropriate semantic actions that
produce the end result of the parser: the semantic tags.

This chapter first discusses the semantic tag data structure to
familiarize the reader with the end goal.  Then all the components
necessary for supporting a language are discussed, starting with
writing the lexer, writing the parser, writing semantic rules, etc.
Finally, several parsers bundled with semantic are discussed as case
studies.

@menu
* Tag Structure::               
* Language Support Overview::   
* Writing Lexers::              
* Writing Parsers::             
* Parsing a language file::     
* Debugging::                   
* Parser Error Handling::       
* GNU Free Documentation License::  
* Index::                       
@end menu

@node Tag Structure
@chapter Tag Structure
@cindex Tag Structure

The end result of the parser for a buffer is a list of @i{tags}.
Currently each tag is a list with up to five elements:
@example
("NAME" CLASS ATTRIBUTES PROPERTIES OVERLAY)
@end example

@var{CLASS} represents what kind of tag this is.  Common @var{CLASS}
values include @code{variable}, @code{function}, or @code{type}.
@inforef{Tag Basics, , semantic-appdev.info}.

@var{ATTRIBUTES} is a slot filled with language specific options for
the tag.  Function arguments, return type, and other flags are all
stored in attributes.  A language author fills in the @var{ATTRIBUTES} with
the tag constructor, which is parser style dependent.

@var{PROPERTIES} is a slot generated by the semantic parser harness,
and need not be provided by a language author.  Programmatically
access tag properties with @code{semantic--tag-put-property},
@code{semantic--tag-put-property-no-side-effect} and
@code{semantic--tag-get-property}.

@var{OVERLAY} represents positional information for this tag.  It is
automatically generated by the semantic parser harness, and need not
be provided by the language author, unless they provide a tag
expansion function via @code{semantic-tag-expand-function}.

The @var{OVERLAY} property is accessed via several functions returning
the beginning, end, and buffer of a token.  Use these functions unless
the overlay is really needed (see @inforef{Tag Query, , app-dev-guide}).
Depending on the
overlay in a program can be dangerous because sometimes the overlay is
replaced with an integer pair
@example
[ START END ]
@end example
when the buffer the tag belongs to is not in memory.  This happens
when a user has activated the Semantic Database 
@inforef{semanticdb, , semantic-appdev}.

To create tags for a functional or object-oriented language, you can
use a series of tag creation functions.  @inforef{Creating Tags, , semantic-appdev}
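
For instance, tags for a C variable declaration @code{int i_1;} and a
function @code{int main(int argc, char** argv)} could be constructed
by hand as follows.  This is only a sketch; the exact attributes
stored vary by language and parser style.

@example
;; A variable tag named "i_1" of type "int", with no default value.
(semantic-tag-new-variable "i_1" "int" nil)

;; A function tag "main" returning "int", taking two arguments.
(semantic-tag-new-function
 "main" "int"
 (list (semantic-tag-new-variable "argc" "int" nil)
       (semantic-tag-new-variable "argv" "char**" nil)))
@end example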

@node Language Support Overview
@chapter Language Support Overview
@cindex Language Support Overview

Starting with version 2.0, @semantic{} provides many ways to add support
for a language into the @semantic{} framework.

The primary means to customize how @semantic{} works is to implement
language specific versions of @i{overloadable} functions.  Semantic
has a specialized mode bound way to do this.
@ref{Semantic Overload Mechanism}.

The parser has several parts which are all also overloadable.  The
primary entry point into the parser is
@code{semantic-fetch-tags} which calls
@code{semantic-parse-region} which returns a list of semantic tags
which get set to @code{semantic--buffer-cache}.

@code{semantic-parse-region} is the first ``overloadable'' function.
The default behavior of this is to simply call @code{semantic-lex},
then pass the lexical token list to
@code{semantic-repeat-parse-whole-stream}.  At each stage, another
more focused layer provides a means of overloading.

The parser is not the only layer that provides overloadable methods.
Application APIs @inforef{top, , semantic-appdev} provide many
overload functions as well.

@menu
* Semantic Overload Mechanism::  
* Semantic Parser Structure::   
* Application API Structure::   
@end menu

@node Semantic Overload Mechanism
@section Semantic Overload Mechanism

One of @semantic{}'s goals is to provide a framework for supporting a
wide range of languages.
Writing a parser for some languages is very simple, e.g., for any
dialect of the Lisp family such as Emacs Lisp and Scheme.
Parsers for many languages, such as C, Java, and Python, can be
written from context free grammars.
On the other hand, it is impossible to specify a context free grammar
for other languages such as Texinfo.
Yet @semantic{} already provides parsers for all these languages.

In order to support such a wide range of languages, a mechanism was
needed that maximizes code reuse, yet gives each programmer the
flexibility of customizing the parser engine at many levels of
granularity.
@cindex function overloading
@cindex overloading, function
The solution that @semantic{} provides is the
@i{function overloading} mechanism which
allows one to intercept and customize the behavior
of many of the functions in the parser engine.
First the parser engine breaks down the task of parsing a language into
several steps.
Each step is represented by an Emacs-Lisp function.
Some of these are
@code{semantic-parse-region},
@code{semantic-lex},
@code{semantic-parse-stream},
@code{semantic-parse-changes},
etc.

Many built-in @semantic{} functions are declared
as being @i{over-loadable} functions, i.e., functions that do
reasonable things for most languages, but can be
customized to suit the particular needs of a given language.
All @i{over-loadable} functions then can easily be @i{over-ridden}
if necessary.
The rest of this section provides details on this @i{overloading mechanism}.

Over-loadable functions are created by defining functions
with the @code{define-overload} macro rather than the usual @code{defun}.
@code{define-overload} is a thin wrapper around @code{defun}
that sets up the function so that it can be overloaded.
An @i{over-loadable} function can then be @i{over-ridden}
in one of two ways: with
@code{define-mode-overload-implementation}
or with
@code{semantic-install-function-overrides}.

Let's look at a couple of examples.
@code{semantic-parse-region} is one of the top level functions
in the parser engine defined via @code{define-overload}:

@example
(define-overload semantic-parse-region
  (start end &optional nonterminal depth returnonerror)
  "Parse the area between START and END, and return any tokens found.

...
  
tokens.")
@end example

The documentation string was truncated above since it is not
relevant here.
The macro invocation defines the @code{semantic-parse-region}
Emacs-Lisp function, which first checks whether there is a mode
specific overloaded implementation.
If one is found, it is called.
Otherwise, the default implementation is called, which in this case is
@code{semantic-parse-region-default}, i.e.,
a function with the same name but with a trailing @i{-default}.
That function needs to be written separately and take the same
arguments as the entry created with @code{define-overload}.

One way to overload @code{semantic-parse-region} is via
@code{semantic-install-function-overrides}.
An example from @file{semantic-texi.el} file is shown below:

@example
(defun semantic-default-texi-setup ()
  "Set up a buffer for parsing of Texinfo files."
  ;; This will use our parser.
  (semantic-install-function-overrides
   '((parse-region . semantic-texi-parse-region)
     (parse-changes . semantic-texi-parse-changes)))
  ...
  )

(add-hook 'texinfo-mode-hook 'semantic-default-texi-setup)
@end example

The function above is called whenever a buffer is set up in Texinfo mode.
The call to @code{semantic-install-function-overrides} indicates that
@code{semantic-texi-parse-region} is to over-ride the default
implementation of @code{semantic-parse-region}.
Note the use of the symbol @code{parse-region}, which is
@code{semantic-parse-region} without the leading @i{semantic-} prefix.

Another way to over-ride a built-in @semantic{} function is via
@code{define-mode-overload-implementation}.
An example from @file{wisent-python.el} file is shown below.

@example
(define-mode-overload-implementation
  semantic-parse-region python-mode
  (start end &optional nonterminal depth returnonerror)
  "Over-ride in order to initialize some variables."
  (let ((wisent-python-lexer-indent-stack '(0))
        (wisent-python-explicit-line-continuation nil))
    (semantic-parse-region-default
     start end nonterminal depth returnonerror)))
@end example

The example above over-rides @code{semantic-parse-region} so that for
buffers whose major mode is @code{python-mode},
the code specified above is executed rather than the
default implementation.

@subsection Why not to use advice

One may wonder why @semantic{} defines an overload mechanism when
Emacs already has advice.  @xref{(elisp)Advising Functions}.

Advising is generally considered a mechanism of last resort, used to
modify or hook into an existing package without modifying its source
file.  Overloadable functions, by contrast, advertise that they
@i{should} be overloaded, and define syntactic sugar to do so.

@node Semantic Parser Structure
@section Semantic Parser Structure

NOTE: describe the functions that do parsing, and how to overload each.

@ignore
semantic-fetch-tags is the top level function that parses the current buffer.
  semantic-parse-changes
    semantic-parse-changes-default
      semantic-edits-incremental-parser
  semantic-parse-region (overloadable)
    semantic-parse-region-default
      semantic-lex (overloadable)
        *semantic-lex-analyzer
          semantic-flex
      semantic-repeat-parse-whole-stream
        semantic-parse-stream (overloadable)
          semantic-parse-stream-default
            semantic-bovinate-stream (bovine)
          wisent-parse-stream      (wisent)
    semantic-texi-parse-region
@end ignore

@example

@ignore
semantic-post-change-major-mode-function
semantic-parser-name

semantic-toplevel-bovine-table (see semantic-active-p)
  semantic-bovinate-stream
    semantic-toplevel-bovine-table
      semantic-parse-region
        semantic-parse-region-default
          semantic-lex
          semantic-repeat-parse-whole-stream

semantic-init-db-hooks semanticdb-semantic-init-hook-fcn
semantic-init-hooks semantic-auto-parse-mode

semantic-flex-keywords-obarray (see semantic-bnf-keyword-table)
  Used by semantic-lex-keyword-symbol, semantic-lex-keyword-set,
  semantic-lex-map-keywords, semantic-flex
semantic-lex-types-obarray

* To support a new language, one must write a set of Emacs-Lisp
  functions that converts any valid text written in that language
  into a list of semantic tokens.  Typically this task is divided into two
  areas: a lexer and a parser.
* There are many ways of doing this.  However in almost all cases, two
* Parser converts
wisent parsers
bovine parsers
custom parsers
@end ignore


@end example

@node Application API Structure
@section Application API Structure

NOTE: improve this:

How to program against the data structures created by @semantic{}
with the application programming API is covered in the Application
Development Guide.  Read that guide to get a feel for the specifics of
what you can customize.  @inforef{top, , semantic-appdev}

Here is a list of applications, and the specific APIs that you will
need to overload to make them work properly with your language.

@table @code
@item imenu
@itemx speedbar
@itemx ecb
These tools require that the @code{semantic-format} methods create
correct strings.
@inforef{Format Tag, , semantic-appdev}
@item semantic-analyze
The analysis tool requires that the @code{semanticdb} tool is active,
and that the searching methods are overloaded.  In addition, a
@code{semanticdb} system database could be written to provide symbols
from the global environment of your language.
@inforef{System Databases, , semantic-appdev}

In addition, the analyzer requires that the @code{semantic-ctxt}
methods are overloaded.  These methods allow the analyzer to look at
the context of the cursor in your language, and predict the type of
location of the cursor. @inforef{Derived Context, , semantic-appdev}.
@item semantic-idle-summary-mode
@itemx semantic-idle-completions-mode
These tools use the semantic analysis tool.
@inforef{Context Analysis, , semantic-appdev}
@end table

@menu
* Semantic Analyzer Support::   
@end menu

@node Semantic Analyzer Support
@subsection Semantic Analyzer Support

@ignore
>> From and Email I sent to get David started on supporting the analyzer

  First, context parsing needs to work.  This includes
`semantic-ctxt-current-symbol', `-function', `-assignment'.  You also
need `semantic-get-local-arguments' and -local-variables'.

  The next most critical piece is to provide implementations of the
semanticdb-find search path calculation API.
`semanticdb-find-table-for-include' is a good start.  That really
should use `semantic-dependency-tag-file', but that doesn't use
semanticdb-project-root when looking for files.  Java could be
trouble here since you can import a *.

  A couple more good ones is `semanticdb-find-translate-path-brutish'
and `semanticdb-find-translate-path-includes'.  Brutish searches look
at everything in the current project.  The include path will scan
only those items explicitly included into your file.

  Last but not least, for Java, we need a semanticdb back end that
will provide tags out of a jar file.  Since most objects inherit from
a system library (like Object), you will need this to get the tag
list including `clone' and the like.
@end ignore

@node Writing Lexers
@chapter Writing Lexers
@cindex Writing Lexers

@ignore
Are we going to support semantic-flex as well as the new lexer?

Not in the doc - Eric
@end ignore

In order to reduce a source file into a tag table, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

The lexer uses the major-mode's syntax table for conversion.
@xref{Syntax Tables,,,elisp}.
As long as that is set up correctly (along with the important
@code{comment-start} and @code{comment-start-skip} variable) the lexer
should already work for your language.

The primary entry point of the lexer is the @dfn{semantic-lex} function
shown below.
Normally, you do not need to call this function.
It is usually called by @code{semantic-fetch-tags} for you.

@anchor{semantic-lex}
@defun semantic-lex start end &optional depth length
Lexically analyze text in the current buffer between @var{START} and @var{END}.
Optional argument @var{DEPTH} indicates at what level to scan over entire
lists.  The last argument, @var{LENGTH} specifies that @dfn{semantic-lex}
should only return @var{LENGTH} tokens.  The return value is a token stream.
Each element is a list of the form
  (symbol start-expression .  end-expression)
where @var{SYMBOL} denotes the token type.
See @code{semantic-lex-tokens} variable for details on token types.  @var{END}
does not mark the end of the text scanned, only the end of the
beginning of text scanned.  Thus, if a string extends past @var{END}, the
end of the return token will be larger than @var{END}.  To truly restrict
scanning, use @dfn{narrow-to-region}.
@end defun

@menu
* Lexer Overview::              What is a Lexer?
* Lexer Output::                Output of a Lexical Analyzer
* Lexer Construction::          Constructing your own lexer
* Lexer Built In Analyzers::    Built in analyzers you can use
* Lexer Analyzer Construction::  Constructing your own analyzers
* Keywords::                    Specialized lexical tokens.
* Keyword Properties::          
@end menu

@node Lexer Overview
@section Lexer Overview

The @semantic{} lexer breaks up the content of an Emacs buffer into a
stream of tokens.  This process is based mostly on regular expressions,
which in turn depend on the syntax table of the buffer's major mode
being set up properly.
@xref{Major Modes,,,emacs}.
@xref{Syntax Tables,,,elisp}.
@xref{Regexps,,,emacs}.

The top level lexical function, @dfn{semantic-lex}, calls the function
stored in @code{semantic-lex-analyzer}.  The default value is the
function @dfn{semantic-flex} from version 1.4 of Semantic.  This will
eventually be deprecated.

In the default lexer, the following regular expressions which rely on syntax
tables are used:

@table @code
@item @code{\\s-}
whitespace characters
@item @code{\\sw}
word constituent
@item @code{\\s_}
symbol constituent
@item @code{\\s.}
punctuation character
@item @code{\\s<}
comment starter
@item @code{\\s>}
comment ender
@item @code{\\s\\}
escape character
@item @code{\\s)}
close parenthesis character
@item @code{\\s$}
paired delimiter
@item @code{\\s\"}
string quote
@item @code{\\s\'}
expression prefix
@end table

In addition, Emacs' built-in features such as
@code{comment-start-skip},
@code{forward-comment},
@code{forward-list},
and
@code{forward-sexp}
are employed.

@node Lexer Output
@section Lexer Output

The lexer, @ref{semantic-lex}, scans the content of a buffer and
returns a token list.
Let's illustrate this using this simple example.

@example
00: /*
01:  * Simple program to demonstrate semantic.
02:  */
03:
04: #include <stdio.h>
05:
06: int i_1;
07:
08: int
09: main(int argc, char** argv)
10: @{
11:     printf("Hello world.\n");
12: @}
@end example

Evaluating @code{(semantic-lex (point-min) (point-max))}
within the buffer with the code above returns the following token list.
The input line and string that produced each token is shown after
each semi-colon.

@example
((punctuation     52 .  53)     ; 04: #
 (INCLUDE         53 .  60)     ; 04: include
 (punctuation     61 .  62)     ; 04: <
 (symbol          62 .  67)     ; 04: stdio
 (punctuation     67 .  68)     ; 04: .
 (symbol          68 .  69)     ; 04: h
 (punctuation     69 .  70)     ; 04: >
 (INT             72 .  75)     ; 06: int
 (symbol          76 .  79)     ; 06: i_1
 (punctuation     79 .  80)     ; 06: ;
 (INT             82 .  85)     ; 08: int
 (symbol          86 .  90)     ; 08: main
 (semantic-list   90 . 113)     ; 08: (int argc, char** argv)
 (semantic-list  114 . 147)     ; 09-12: body of main function
 )
@end example

As shown above, the token list is a list of ``tokens''.
Each token in turn is a list of the form

@example
(TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)
@end example

@noindent
where TOKEN-TYPE is a symbol, and the other two are integers indicating
the buffer position that delimit the token such that

@lisp
(buffer-substring BEGINNING-POSITION ENDING-POSITION)
@end lisp

@noindent
would return the string form of the token.
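
The parts of a lexical token should be accessed with the accessor
macros provided by @semantic{}, such as @code{semantic-lex-token-class},
@code{semantic-lex-token-start}, and @code{semantic-lex-token-end},
rather than with raw list operations.  For example, using the
@code{stdio} token from the output above:

@example
(setq tok '(symbol 62 . 67))
(semantic-lex-token-class tok)   ; => symbol
(semantic-lex-token-start tok)   ; => 62
(semantic-lex-token-end tok)     ; => 67
@end example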

Note that one line (line 4 above) can produce seven tokens while
the whole body of the function produces a single token.
This is because the @var{depth} parameter of @code{semantic-lex} was
not specified.
Let's see the output when @var{depth} is set to 1.
Evaluate @code{(semantic-lex (point-min) (point-max) 1)} in the same buffer.
Note the third argument of @code{1}.

@example
((punctuation    52 .  53)     ; 04: #
 (INCLUDE        53 .  60)     ; 04: include
 (punctuation    61 .  62)     ; 04: <
 (symbol         62 .  67)     ; 04: stdio
 (punctuation    67 .  68)     ; 04: .
 (symbol         68 .  69)     ; 04: h
 (punctuation    69 .  70)     ; 04: >
 (INT            72 .  75)     ; 06: int
 (symbol         76 .  79)     ; 06: i_1
 (punctuation    79 .  80)     ; 06: ;
 (INT            82 .  85)     ; 08: int
 (symbol         86 .  90)     ; 08: main

 (open-paren     90 .  91)     ; 08: (
 (INT            91 .  94)     ; 08: int
 (symbol         95 .  99)     ; 08: argc
 (punctuation    99 . 100)     ; 08: ,
 (CHAR          101 . 105)     ; 08: char
 (punctuation   105 . 106)     ; 08: *
 (punctuation   106 . 107)     ; 08: *
 (symbol        108 . 112)     ; 08: argv
 (close-paren   112 . 113)     ; 08: )

 (open-paren    114 . 115)     ; 10: @{
 (symbol        120 . 126)     ; 11: printf
 (semantic-list 126 . 144)     ; 11: ("Hello world.\n")
 (punctuation   144 . 145)     ; 11: ;
 (close-paren   146 . 147)     ; 12: @}
 )
@end example

The @var{depth} parameter ``peeled away'' one more level of ``lists''
delimited by matching parentheses or braces.
The depth parameter can be any number; however, the parser needs to be
able to handle the extra tokens.

This is an interesting benefit of the lexer having the full
resources of Emacs at its disposal.
Skipping over matched parenthesis is achieved by simply calling
the built-in functions @code{forward-list} and @code{forward-sexp}.
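
For example, assuming point is at the open parenthesis of the argument
list in the sample buffer above (position 90), the whole list can be
skipped with a single call:

@example
;; Point just before "(int argc, char** argv)".
(forward-sexp 1)   ; moves point past the matching close parenthesis
@end example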

@node Lexer Construction
@section Lexer Construction

While using the default lexer is certainly an option, particularly
for grammars written in semantic 1.4 style, it is usually more
efficient to create a custom lexer for your language.

You can create a new lexer with @dfn{define-lex}.

@defun define-lex name doc &rest analyzers
@anchor{define-lex}
Create a new lexical analyzer with @var{NAME}.
@var{DOC} is a documentation string describing this analyzer.
@var{ANALYZERS} are small code snippets of analyzers to use when
building the new @var{NAMED} analyzer.  Only use analyzers which
are written to be used in @dfn{define-lex}.
Each analyzer should be an analyzer created with @dfn{define-lex-analyzer}.
Note: The order in which analyzers are listed is important.
If two analyzers can match the same text, order them so that the one
you want to match occurs first.  For example, it is good to put a
number analyzer in front of a symbol analyzer, which might otherwise
mistake a number for a symbol.
@end defun
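
As a sketch of how these pieces fit together, here is a lexer
assembled from built in analyzers, modeled on the
@code{semantic-simple-lexer} that ships with @semantic{}:

@example
(define-lex my-simple-lexer
  "A simple lexer for a C-like language."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  semantic-lex-number               ; before symbols; see note above
  semantic-lex-symbol-or-keyword
  semantic-lex-charquote
  semantic-lex-paren-or-list
  semantic-lex-close-paren
  semantic-lex-string
  semantic-lex-ignore-comments
  semantic-lex-punctuation
  semantic-lex-default-action)
@end example

Some analyzers used here, such as @code{semantic-lex-paren-or-list}
and @code{semantic-lex-string}, are also provided by @semantic{}
though not described below.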

The list of @var{analyzers} needed here can consist of one or more of
several built in analyzers, or analyzers of your own construction.
The built in analyzers are:

@node Lexer Built In Analyzers
@section Lexer Built In Analyzers

@defspec semantic-lex-default-action
The default action when no other lexical actions match text.
This action will just throw an error.
@end defspec

@defspec semantic-lex-beginning-of-line
Detect and create a beginning of line token (BOL).
@end defspec

@defspec semantic-lex-newline
Detect and create newline tokens.
@end defspec

@defspec semantic-lex-newline-as-whitespace
Detect and create newline tokens.
Use this ONLY if newlines are not whitespace characters (such as when
they are comment end characters) AND when you want whitespace tokens.
@end defspec

@defspec semantic-lex-ignore-newline
Detect and create newline tokens.
Use this ONLY if newlines are not whitespace characters (such as when
they are comment end characters).
@end defspec

@defspec semantic-lex-whitespace
Detect and create whitespace tokens.
@end defspec

@defspec semantic-lex-ignore-whitespace
Detect and skip over whitespace tokens.
@end defspec

@defspec semantic-lex-number
Detect and create number tokens.
Number tokens are matched via this variable:

@defvar semantic-lex-number-expression
Regular expression for matching a number.
If this value is @code{nil}, no number extraction is done during lex.
This expression tries to match C and Java like numbers.

@example
DECIMAL_LITERAL:
    [1-9][0-9]*
  ;
HEX_LITERAL:
    0[xX][0-9a-fA-F]+
  ;
OCTAL_LITERAL:
    0[0-7]*
  ;
INTEGER_LITERAL:
    <DECIMAL_LITERAL>[lL]?
  | <HEX_LITERAL>[lL]?
  | <OCTAL_LITERAL>[lL]?
  ;
EXPONENT:
    [eE][+-]?[0-9]+
  ;
FLOATING_POINT_LITERAL:
    [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
  | [.][0-9]+<EXPONENT>?[fFdD]?
  | [0-9]+<EXPONENT>[fFdD]?
  | [0-9]+<EXPONENT>?[fFdD]
  ;
@end example
@end defvar

@end defspec

@defspec semantic-lex-symbol-or-keyword
Detect and create symbol and keyword tokens.
@end defspec

@defspec semantic-lex-charquote
Detect and create charquote tokens.
@end defspec

@defspec semantic-lex-punctuation
Detect and create punctuation tokens.
@end defspec

@defspec semantic-lex-punctuation-type
Detect and create a punctuation type token.
Recognized punctuations are defined in the current table of lexical
types, as the value of the @code{punctuation} token type.
@end defspec

@defspec semantic-lex-paren-or-list
Detect open parenthesis.
Return either a paren token or a semantic list token depending on
@code{semantic-lex-current-depth}.
@end defspec

@defspec semantic-lex-open-paren
Detect and create an open parenthesis token.
@end defspec

@defspec semantic-lex-close-paren
Detect and create a close paren token.
@end defspec

@defspec semantic-lex-string
Detect and create a string token.
@end defspec

@defspec semantic-lex-comments
Detect and create a comment token.
@end defspec

@defspec semantic-lex-comments-as-whitespace
Detect comments and create a whitespace token.
@end defspec

@defspec semantic-lex-ignore-comments
Detect comments and skip over them.
@end defspec

@node Lexer Analyzer Construction
@section Lexer Analyzer Construction

Each of the previous built-in analyzers is constructed using a set
of analyzer construction macros.  The root construction macro is:

@defun define-lex-analyzer name doc condition &rest forms
Create a single lexical analyzer @var{NAME} with @var{DOC}.
When an analyzer is called, the current buffer and point are
positioned in a buffer at the location to be analyzed.
@var{CONDITION} is an expression which returns @code{t} if @var{FORMS} should be run.
Within @var{CONDITION} and @var{FORMS}, backquote can be used to
evaluate expressions at compile time.
While forms are running, the following variables will be locally bound:

@table @code
@item semantic-lex-analysis-bounds
The bounds of the current analysis, of the form (@var{START} . @var{END}).
@item semantic-lex-maximum-depth
The maximum depth of semantic-list for the current analysis.
@item semantic-lex-current-depth
The current depth of @code{semantic-list} that has been descended.
@item semantic-lex-end-point
End point after match.  Analyzers should set this to a buffer location
if their match string does not represent the end of the matched text.
@item semantic-lex-token-stream
The token list being collected.  Add new lexical tokens to this list.
@end table
Proper action in @var{FORMS} is to move the value of @code{semantic-lex-end-point} to
after the location of the analyzed entry, and to add any discovered tokens
at the beginning of @code{semantic-lex-token-stream}.
This can be done by using @dfn{semantic-lex-push-token}.
@end defun
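For example, a minimal analyzer built with this macro might turn a
line starting with @samp{#} into a single token.  This is a
hypothetical sketch (the name @code{foo-lex-hash-line} and the token
type @code{hash-line} are not part of @semantic{}):

@example
(define-lex-analyzer foo-lex-hash-line
  "Match a line starting with # as a single hash-line token."
  ;; CONDITION: non-nil when point is at a # line.
  (looking-at "#.*$")
  ;; FORMS: push the new token and move the end point past the match.
  (semantic-lex-push-token
   (semantic-lex-token 'hash-line (match-beginning 0) (match-end 0)))
  (setq semantic-lex-end-point (match-end 0)))
@end example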

Additionally, a simple regular expression based analyzer can be built
with:

@defun define-lex-regex-analyzer name doc regexp &rest forms
Create a lexical analyzer with @var{NAME} and @var{DOC} that will match @var{REGEXP}.
@var{FORMS} are evaluated upon a successful match.
See @dfn{define-lex-analyzer} for more about analyzers.
@end defun

@defun define-lex-simple-regex-analyzer name doc regexp toksym &optional index &rest forms
Create a lexical analyzer with @var{NAME} and @var{DOC} that matches @var{REGEXP}.
@var{TOKSYM} is the symbol to use when creating a semantic lexical token.
@var{INDEX} is the index into the match that defines the bounds of the token.
Index should be a plain integer, and not specified in the macro as an
expression.
@var{FORMS} are evaluated upon a successful match @emph{before} the new token is
created.  It is valid to ignore @var{FORMS}.
See @dfn{define-lex-analyzer} for more about analyzers.
@end defun

Regular expression analyzers are the simplest to create and manage.
Often, a majority of your lexer can be built this way.  The analyzer
for matching punctuation looks like this:

@example
(define-lex-simple-regex-analyzer semantic-lex-punctuation
  "Detect and create punctuation tokens."
  "\\(\\s.\\|\\s$\\|\\s'\\)" 'punctuation)
@end example

More complex analyzers, which match larger units of text to optimize
the speed of parsing and analysis, are built by matching blocks.

@defun define-lex-block-analyzer name doc spec1 &rest specs
Create a lexical analyzer @var{NAME} for paired delimiters blocks.
It detects a paired delimiters block or the corresponding open or
close delimiter depending on the value of the variable
@code{semantic-lex-current-depth}.  @var{DOC} is the documentation string of the lexical
analyzer.  @var{SPEC1} and @var{SPECS} specify the token symbols and open, close
delimiters used.  Each @var{SPEC} has the form:

(@var{BLOCK-SYM} (@var{OPEN-DELIM} @var{OPEN-SYM}) (@var{CLOSE-DELIM} @var{CLOSE-SYM}))

where @var{BLOCK-SYM} is the symbol returned in a block token.  @var{OPEN-DELIM}
and @var{CLOSE-DELIM} are respectively the open and close delimiters
identifying a block.  @var{OPEN-SYM} and @var{CLOSE-SYM} are respectively the
symbols returned in open and close tokens.
@end defun
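For example, the built-in @code{semantic-lex-paren-or-list} analyzer
is defined with a single spec, approximately like this:

@example
(define-lex-block-analyzer semantic-lex-paren-or-list
  "Detect open parenthesis.
Return either a paren token or a semantic list token depending on
`semantic-lex-current-depth'."
  (semantic-list
   ("(" open-paren)
   (")" close-paren)))
@end example

A language with several kinds of brackets would simply list one spec
per delimiter pair.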

These blocks are what make @semantic{}'s Emacs Lisp based parsers
fast.  For example, by defining all text inside @{ braces @} as a
block, the parser does not need to know the contents of those braces
while parsing, and can skip them altogether.

@node Keywords
@section Keywords

Another important piece of the lexer is the keyword table (see
@ref{Writing Parsers}).  Your language will want to set up a keyword table for
fast conversion of symbol strings to language terminals.

The keywords table can also be used to store additional information
about those keywords.  The following programming functions can be useful
when examining text in a language buffer.

@defun semantic-lex-keyword-p name
Return non-@code{nil} if a keyword with @var{NAME} exists in the keyword table.
Return @code{nil} otherwise.
@end defun

@defun semantic-lex-keyword-put name property value
For keyword with @var{NAME}, set its @var{PROPERTY} to @var{VALUE}.
@end defun

@defun semantic-lex-keyword-get name property
For keyword with @var{NAME}, return its @var{PROPERTY} value.
@end defun

@defun semantic-lex-map-keywords fun &optional property
Call function @var{FUN} on every semantic keyword.
If optional @var{PROPERTY} is non-@code{nil}, call @var{FUN} only on every keyword which
has a @var{PROPERTY} value.  @var{FUN} receives a semantic keyword as argument.
@end defun

@defun semantic-lex-keywords &optional property
Return a list of semantic keywords.
If optional @var{PROPERTY} is non-@code{nil}, return only keywords which have a
@var{PROPERTY} set.
@end defun

Keyword properties can be set up in a grammar file for ease of maintenance.
While examining the text in a language buffer, this can provide an easy
and quick way of storing details about text in the buffer.
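For example, a grammar file might associate a summary with a keyword
using @code{%put}, which a tool can later retrieve in Lisp.  The
summary text here is only illustrative:

@example
;; In the grammar file:
;;   %keyword IF "if"
;;   %put IF summary "if (condition) @{ code @} [ else @{ code @} ]"

;; In Lisp, later:
(when (semantic-lex-keyword-p "if")
  (semantic-lex-keyword-get "if" 'summary))
@end example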

@node Keyword Properties
@section Standard Keyword Properties

Keywords in a language can have multiple properties.  These
properties can be used to associate the string that is the keyword
with additional information.

Currently available properties are:

@table @b
@item summary
The summary property is used by @code{semantic-summary-mode} as a help
string for the keyword specified.
@end table

Notes:

Possible future properties.  This is just me musing:

@table @b
@item face
Face used for highlighting this keyword, differentiating it from the
keyword face.
@item template
@itemx skeleton
Some sort of tempo/skel template for inserting the programmatic
structure associated with this keyword.
@item abbrev
As with template.
@item action
@itemx menu
Perhaps the keyword is clickable and some action would be useful.
@end table


@node Writing Parsers
@chapter Writing Parsers
@cindex Writing Parsers

@ignore
For the parser developer, I can think of two extra sections.  One for
semanticdb extensions,  (If a system database is needed.)  A second
for the `semantic-ctxt' extensions.  Many of the most interesting
tools will completely fail to work without local context parsing
support.

Perhaps even a section on foreign tokens.  For example, putting a
Java token into a C++ file could auto-gen a native method, just as
putting a token into a Texinfo file converts it into documentation.

In addition, in the "writing grammars" section should have
subsections as listed in the examples of the overview section.  It
might be useful to have a fourth section describing the similarities
between the two file types (by and wy) and how to use the grammar
mode.  (I'm not sure if that should be covered elsewhere.)
@end ignore

When converting a source file into a tag table it is important to
specify rules to accomplish this.  The rules are stored in the buffer
local variable @code{semantic--buffer-cache}.

While it is certainly possible to write this table yourself, it is
most likely you will want to use the @ref{Grammar Programming
Environment}.

There are three choices for parsing your language.

@table @b
@item Bovine Parser
The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  For more information,
@inforef{top, the Bovine Parser Manual, bovine}.

@item Wisent Parser
The @dfn{wisent} parser is a port of the GNU Compiler Compiler Bison
to Emacs Lisp.  Wisent includes the iterative error handler of the
bovine parser, and has the same error correction as traditional
@acronym{LALR} parsers.  For more information,
@inforef{top, the Wisent Parser Manual, wisent}.

@item External Parser
External parsers, such as the texinfo parser can be implemented using
any means.  This allows the use of a regular expression parser for
non-regular languages, or external programs for speed.
@end table

@menu
* External Parsers::            Writing an external parser
* Grammar Programming Environment::  Using the grammar writing environment
* Parser Backend Support::             Lisp needed to support a grammar.
@end menu

@node External Parsers
@section External Parsers

The texinfo parser in @file{semantic-texi.el} is an example of an
external parser.  To make your parser work, you need to have a setup
function.

Note: Finish this.

@node Grammar Programming Environment
@section Grammar Programming Environment

Semantic grammar files in @file{.by} or @file{.wy} format have their
own programming mode.  This mode provides indentation and coloring
services in those languages.  In addition, the grammar languages are
also supported by @semantic{} so that tagging information is available to
tools such as imenu or speedbar.

For more information,
@inforef{top, the Grammar Framework Manual, grammar-fw}.

@node Parsing a language file
@chapter Parsing a language file

The best way to call the parser from programs is via
@code{semantic-fetch-tags}.  This, in turn, uses other internal
@acronym{API} functions which plug-in parsers can take advantage of.

@defun semantic-fetch-tags
@anchor{semantic-fetch-tags}
Fetch semantic tags from the current buffer.
If the buffer cache is up to date, return that.
If the buffer cache is out of date, attempt an incremental reparse.
If the buffer has not been parsed before, or if the incremental reparse
fails, then parse the entire buffer.
If a lexical error had been previously discovered and the buffer
was marked unparseable, then do nothing, and return the cache.
@end defun
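For example, a tool might fetch the tags and count the functions in
the current buffer.  This sketch assumes the @code{semantic-find}
library is loaded; the function name @code{foo-count-functions} is
hypothetical:

@example
(require 'semantic-find)

(defun foo-count-functions ()
  "Count the function tags in the current buffer."
  (interactive)
  (let ((tags (semantic-fetch-tags)))
    (message "%d functions found"
             (length (semantic-find-tags-by-class 'function tags)))))
@end example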

Another approach is to let Emacs call the parser on idle time, when
needed, then use @code{semantic-fetch-available-tags} to retrieve and
process only the available tags, provided that the
@code{semantic-after-*-hook} hooks have been setup to synchronize with
new tags when they become available.

@defun semantic-fetch-available-tags
@anchor{semantic-fetch-available-tags}
Fetch available semantic tags from the current buffer.
That is, return tags currently in the cache without parsing the
current buffer.

Parse operations happen asynchronously when needed on Emacs idle time.
Use the @code{semantic-after-toplevel-cache-change-hook} and
@code{semantic-after-partial-cache-change-hook} hooks to synchronize with
new tags when they become available.
@end defun
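A sketch of this idle-time approach follows.  The @code{foo-} prefixed
names are hypothetical:

@example
(defun foo-handle-new-tags (new-tags)
  "Process NEW-TAGS when the parser makes them available."
  ...)

(add-hook 'semantic-after-partial-cache-change-hook
          'foo-handle-new-tags)
@end example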

@deffn Command semantic-clear-toplevel-cache
@anchor{semantic-clear-toplevel-cache}
Clear the toplevel tag cache for the current buffer.
Clearing the cache will force a complete reparse next time a token
stream is requested.
@end deffn

@menu
* Parser Backend Support:: Parser backend support.
@end menu

@node Parser Backend Support
@section Parser Backend Support

Once you have written a grammar file that has been compiled into
Emacs Lisp code, additional glue needs to be written to finish
connecting the generated parser into the Emacs framework.

Large portions of this glue are automatically generated, but will
probably need additional modification to get things to work properly.

Typically, a grammar file @file{foo.wy} will create the file
@file{foo-wy.el}.  It is then useful to also create a file
@file{wisent-foo.el} (or @file{semantic-foo.el}) to contain the parser
back end, or the glue that completes the semantic support for the
language.

@menu
* Example Backend File::
* Tag Expansion::
@end menu

@node Example Backend File
@subsection Example Backend File

Typical structure for this file is:

@example
;;; semantic-foo.el -- parser support for FOO.

;;; Your copyright Notice

(require 'foo-wy)  ;; The parser
(require 'foo) ;; major mode definition for FOO

;;; Code:

;;; Lexical Analyzer
;;
;; OPTIONAL
;; It is possible to define your lexical analyzer completely in your
;; grammar file.

(define-lex foo-lexical-analyzer
  "Create a lexical analyzer."
  ...)

;;; Expand Function
;;
;; OPTIONAL
;; Not all languages are so complex as to need this function.
;; See `semantic-tag-expand-function' for more details.
(defun foo-tag-expand-function (tag)
  "Expand TAG into multiple tags if needed."
  ...)

;;; Parser Support
;;
;; OPTIONAL
;; If you need some specialty routines inside your grammar file
;; you can add some here.   The process may be to take diverse info
;; and reorganize it.
;;
;; It is also appropriate to write these functions in the prologue
;; of the grammar function.
(defun foo-do-something-hard (...)
  "...")

;;; Overload methods
;;
;; OPTIONAL
;; To allow your language to be fully supported by all the
;; applications that use semantic, it is important, but not necessary
;; to create implementations of overload methods.
(define-mode-overload-implementation some-semantic-function foo-mode (tag)
  "Implement some-semantic-function for FOO."
  )

;;;###autoload
(defun semantic-default-foo-setup ()
  "Set up a buffer for semantic parsing of the FOO language."
  (semantic-foo-wy--install-parser)
  (setq semantic-tag-expand-function foo-tag-expand-function
        ;; Many other language specific settings can be done here
        ;; as well.
        )
  ;; This may be optional
  (setq semantic-lex-analyzer #'foo-lexical-analyzer)
  )

;;;###autoload
(add-hook 'foo-mode-hook 'semantic-default-foo-setup)

(provide 'semantic-foo)

;;; semantic-foo.el ends here
@end example

@node Tag Expansion
@subsection Tag Expansion

In any language with compound tag types, you will need to implement
an @emph{expand function}.  Once written, assign it to this variable.

@defvar semantic-tag-expand-function
@anchor{semantic-tag-expand-function}
Function used to expand a tag.
It is passed each tag production, and must return a list of tags
derived from it, or @code{nil} if it does not need to be expanded.

Languages with compound definitions should use this function to expand
from one compound symbol into several.  For example, in C or Java the
following definition is easily parsed into one tag:

@example
int a, b;
@end example

This function should take this compound tag and turn it into two tags,
one for @var{a}, and the other for @var{b}.
@end defvar
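A sketch of such a function for the @samp{int a, b;} case follows.  It
assumes the grammar stores the list of variable names in the name slot
of the compound tag, which is a convention you would choose when
writing the grammar; the function name is hypothetical:

@example
(defun foo-tag-expand-function (tag)
  "Expand TAG into multiple tags if it declares several variables."
  (when (and (semantic-tag-of-class-p tag 'variable)
             (listp (semantic-tag-name tag)))
    ;; One clone per variable name, e.g. ("a" "b").
    (mapcar (lambda (name) (semantic-tag-clone tag name))
            (semantic-tag-name tag))))
@end example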

Additionally, you can use the expand function in conjunction with your
language for other types of compound statements.  For example, in
Common Lisp Object System, you can have a definition:

@example
(defclass classname nil
  (slots ...) ...)
@end example

This will create both the datatype @code{classname} and the functional
constructor @code{classname}.  Each slot may have a @code{:accessor}
method as well.

You can create a special compounded tag in your rule, for example:

@example
classdef: LPAREN DEFCLASS name semantic-list semantic-list RPAREN
          (TAG "custom" 'compound-class
               :value (list
                        (TYPE-TAG $3 "class" ...)
                        (FUNCTION-TAG $3 ...)
                        ))
        ;
@end example

and in your expand function, you would write:

@example
(defun my-tag-expand (tag)
  "Expand tags for my language."
  (when (semantic-tag-of-class-p tag 'compound-class)
     (remq nil
        (semantic-tag-get-attribute tag :value))))
@end example

This will cause the custom tag to be replaced by the tags created in
the @code{:value} attribute of the specially constructed tag.

@node Debugging
@chapter Debugging

Grammars can be tricky things to debug.  There are several types of
tools for debugging in Semantic, and the type of problem you have
requires different types of tools.

@menu
* Lexical Debugging::           
* Parser Output tools::         
* Bovine Parser Debugging::     
* Wisent Parser Debugging::     
* Overlay Debugging::           
* Incremental Parser Debugging::  
* Debugging Analysis::          
* Semantic 1.4 Doc::            
@end menu

@node Lexical Debugging
@section Lexical Debugging

The first major problem you may encounter is with lexical analysis.
If the text is not transformed into the expected token stream, no
parser will understand it.

You can step through the lexical analyzer with the following command:

@deffn Command semantic-lex-debug arg
@anchor{semantic-lex-debug}
Debug the semantic lexer in the current buffer.
Argument @var{ARG} specifies whether to analyze the whole buffer, or to start at point.
While engaged, each token identified by the lexer will be highlighted
in the target buffer.  A description of the current token will be
displayed in the minibuffer.  Press @kbd{SPC} to move to the next lexical token.
@end deffn

For an example of what the output of the @code{semantic-lex} function
should return, see @ref{Lexer Output}.

@node Parser Output tools
@section Parser Output tools

There are several tools which can be used to see what the parser
output is.  These will work for any type of parser, including the
bovine parser and the wisent parser.

The first and easiest is a minor mode which highlights text the
parser did not understand.

@deffn Command semantic-show-unmatched-syntax-mode &optional arg
@anchor{semantic-show-unmatched-syntax-mode}
Minor mode to highlight unmatched lexical syntax tokens.
When a parser executes, some elements in the buffer may not match any
parser rules.  These text characters are considered unmatched syntax.
Often, the display of unmatched syntax can expose coding
problems before the compiler is run.

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.

@table @kbd
@item key
binding
@item C-c ,
Prefix Command
@item C-c , `
semantic-show-unmatched-syntax-next
@end table
@end deffn

Another interesting mode will display a line between all the tags in
the current buffer to make it more obvious where boundaries lie.  You
can enable this as a minor mode.

@deffn Command semantic-show-tag-boundaries-mode &optional arg
@anchor{semantic-show-tag-boundaries-mode}
Minor mode to display a boundary in front of tags.
The boundary is displayed using an overline in Emacs 21.
With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another interesting mode helps if you are worried about specific
attributes.  You can use this minor mode to highlight different tokens
in different ways based on the attributes you are most concerned with.

@deffn Command semantic-highlight-by-attribute-mode &optional arg
@anchor{semantic-highlight-by-attribute-mode}
Minor mode to highlight tags based on some attribute.
By default, the protection of a tag will give it a different
background color.

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another tool that can be used is a dump of the current list of tags.
This shows the actual Lisp representation of the tags generated in a
rather bland dump.  This can be useful if text was successfully
parsed, and you want to be sure that the correct information was
captured.

@deffn Command bovinate &optional clear
@anchor{bovinate}
Bovinate the current buffer.  Show output in a temp buffer.
Optional argument @var{CLEAR} will clear the cache before bovinating.
If @var{CLEAR} is negative, it will do a full reparse, and also not display
the output buffer.
@end deffn

@node Bovine Parser Debugging
@section Bovine Parser Debugging

The bovine parser is described in @inforef{top, ,bovine}.

Aside from using a traditional Emacs Lisp debugger on functions you
provide for token expansion, there is one other means of debugging
which interactively steps over the rules in your grammar file.

@deffn Command semantic-debug
@anchor{semantic-debug}
Parse the current buffer and run in debug mode.
@end deffn

Once the parser is activated in this mode, the current tag cache is
flushed, and the parser started.  At each stage in the @acronym{LL} parser,
the current rule, and match step is highlighted in your parser source
buffer.  In a second window, the text being parsed is shown, and the
lexical token found is highlighted.  A clue of the current stack of
saved data is displayed in the minibuffer.

There is a wide range of keybindings that can be used to execute code
in your buffer.  (Not all are implemented.)

@table @kbd
@item n
@itemx SPC
Next.
@item s
Step.
@item u
Up.  (Not implemented yet.)
@item d
Down.  (Not implemented yet.)
@item f
Fail Match.  Pretend the current match element and the token in the
buffer are a failed match, even if they are not.
@item h
Print information about the current parser state.
@item s
Jump to the source buffer.
@item p
Jump to the parser buffer.
@item q
Quit.  Exits this debug session and the parser.
@item a
Abort.  Aborts one level of the parser, possibly exiting the debugger.
@item g
Go.  Stop debugging, and just start parsing.
@item b
Set Breakpoint.  (Not implemented yet.)
@item e
@code{eval-expression}.  Lets you execute some random Emacs Lisp command.
@end table

@b{Note:} While the core of @code{semantic-debug} is a generic
debugger interface for rule based grammars, only the bovine parser has
a specific backend implementation.  If someone wants to implement a
debugger backend for wisent, that would be spiffy.

@node Wisent Parser Debugging
@section Wisent Parser Debugging

While wisent does not implement a backend for @code{semantic-debug},
it does have some debugging commands for rule actions.  You can read
about them in the wisent manual.

@inforef{Grammar Debugging, , wisent}

@node Overlay Debugging
@section Overlay Debugging

Once a buffer has been parsed into a tag table, the next most
important step is getting those tags activated for a buffer, and
storable in a @code{semanticdb} backend.
@inforef{semanticdb, , semantic-appdev}.

These two activities depend on the ability of every tag in the table
to be linked and unlinked to the current buffer with an overlay.
@inforef{Tag Overlay, , semantic-appdev}
@inforef{Tag Hooks, , semantic-appdev}

In this case, the most important function that must be written is:

@defun semantic-tag-components-with-overlays tag
@anchor{semantic-tag-components-with-overlays}
Return the list of top level components belonging to @var{TAG}.
Children are any sub-tags which contain overlays.

Default behavior is to get @dfn{semantic-tag-components} in addition
to the components of anonymous types (if applicable).

Note for language authors:
  If a mode defines a language tag that has tags in it with overlays
you should still return them with this function.
Ignoring this step will prevent several features from working correctly.
This function can be overridden in semantic using the
symbol @code{tag-components-with-overlays}.
@end defun
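A mode can provide its own implementation with an overload, as with
@code{define-mode-overload-implementation} shown earlier.  This is
only a sketch; the @code{:members} attribute name is a hypothetical
convention your grammar would have to establish:

@example
(define-mode-overload-implementation semantic-tag-components-with-overlays
  foo-mode (tag)
  "Return the overlaid components of TAG in FOO."
  (semantic-tag-get-attribute tag :members))
@end example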

If you are successfully building a tag table, and errors occur
saving or restoring tags from semanticdb, this is the most likely
cause of the problem.

@node Incremental Parser Debugging
@section Incremental Parser Debugging

The incremental parser is a highly complex engine for quickly
refreshing the tag table of a buffer after some set of changes have
been made to that buffer by a user.

There is no debugger or interface to the incremental parser, however
there are a few minor modes which can help you identify issues if you
think there are problems while incrementally parsing a buffer.

The first stage of the incremental parser is in tracking the changes
the user makes to a buffer.  You can visibly track these changes too.

@deffn Command semantic-highlight-edits-mode &optional arg
@anchor{semantic-highlight-edits-mode}
Minor mode for highlighting changes made in a buffer.
Changes are tracked by semantic so that the incremental parser can work
properly.
This mode will highlight those changes as they are made, and clear them
when the incremental parser accounts for those edits.
With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another important aspect of the incremental parser involves tracking
the current parser state of the buffer.  You can track this state
also.

@deffn Command semantic-show-parser-state-mode &optional arg
@anchor{semantic-show-parser-state-mode}
Minor mode for displaying parser cache state in the modeline.
The cache can be in one of three states.  They are
Up to date, Partial reparse needed, and Full reparse needed.
The state is indicated in the modeline with the following characters:
@table @kbd
@item -
The cache is up to date.
@item !
The cache requires a full update.
@item ^
The cache needs to be incrementally parsed.
@item %
The cache is not currently parseable.
@item @@
Auto-parse in progress (not set here.)
@end table

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

When the incremental parser starts updating the tags buffer, you can
also enable a set of messages to help identify how the incremental
parser is merging changes with the main buffer.

@defvar semantic-edits-verbose-flag
@anchor{semantic-edits-verbose-flag}
Non-@code{nil} means the incremental parser is verbose.
If @code{nil}, errors are still displayed, but informative messages are not.
@end defvar

@node Debugging Analysis
@section Debugging Analysis

The semantic analyzer is at the top of the food chain when it comes
to @semantic{} service functionality.  The semantic support for a
language must be complete before analysis will work properly.

A good way to test analysis is by placing the cursor in different
places, and requesting a dump of the context.

@deffn Command semantic-analyze-current-context position
@anchor{semantic-analyze-current-context}
Analyze the current context at @var{POSITION}.
If called interactively, display interesting information about @var{POSITION}
in a separate buffer.
Returns an object based on symbol @dfn{semantic-analyze-context}.
@end deffn

@ref{Semantic Analyzer Support}
@inforef{Analyzer, , semantic-user}

@node Semantic 1.4 Doc
@section Semantic 1.4 Doc

@i{
In semantic 1.4 the following documentation was written for debugging.
I'm leaving in here until better doc for 2.0 is done.
}

Writing language files using BY is significantly easier than writing
them using regular expressions in a functional manner.  Debugging
them, however, can still prove challenging.

There are two ways to debug a language definition if it is not
behaving as expected.  One way is to debug against the source @file{.by}
file.

If your language definition was written in BNF notation, debugging is
quite easy.  The command @code{semantic-debug} will start you off.

@deffn Command semantic-debug
Reparse the current buffer and run in parser debug mode.
@end deffn

While debugging, two windows are visible.  One window shows the file
being parsed, and the syntactic token being tested is highlighted.
The second window shows the table being used (in the BY source) with
the current rule highlighted.  The cursor will sit on the specific
match rule being tested against.

In the minibuffer, a brief summary of the current situation is
listed.  The first element is the syntactic token which is a list of
the form:

@example
(TYPE START . END)
@end example

The rest of the display is a list of all strings collected for the
currently tested rule.  Each time a new rule is entered, the list is
restarted.  Upon returning from a rule into a previous match list, the
previous match list is restored, with the production of the dependent
rule in the list.

Use @kbd{C-g} to stop debugging.  There are no commands for any
fancier types of debugging.

NOTE: Semantic 2.0 has more debugging commands.  Use:
@kbd{C-h m semantic-debug-mode} to view.

@node Parser Error Handling
@chapter Parser Error Handling
@cindex Parser Error Handling

NOTE: Write Me

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node Index
@unnumbered Index
@printindex cp

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.