File: optlib.rst

.. _optlib:

Extending ctags with Regex parser (*optlib*)
---------------------------------------------------------------------

:Maintainer: Masatake YAMATO <yamato@redhat.com>

.. TODO:
	review extras, fields, and roles sections
	possibly restructure this file's section ordering
	add documentation for --_mtable-extend-<LANG>
	add documentation for tjump, treset, tquit flags
	add a section on debugging
	add a section on langdef base parser flag, including
		shared/dedicated/bidirectional directions

----

.. Q: shouldn't the section about option files (preload especially) go in
	their own section somewhere else in the docs? They're not specifically
	for "Extending ctags" - they can be used for any command options that
	you want to use permanently. It's really the new language parsers using
	--regex-<LANG> and such that are about "Extending ctags", no?


Option files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An "option" file is a file in which command line options are written line
by line. ``ctags`` loads it and runs as if the options in the file had been
passed on the command line.

The following file is an example of an option file.

.. code-block:: python

	# Exclude directories that don't contain real code
	--exclude=Units
		# indentation is ignored
		--exclude=tinst-root
	--exclude=Tmain

``#`` starts a line comment.
Whitespace at the start of a line is ignored during loading.
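
The two loading rules above can be modeled with a few lines of Python. This
is only a sketch of the rules as described here, not the loader used by
``ctags``; the function name ``read_option_lines`` is hypothetical, and
skipping blank lines is an assumption.

.. code-block:: python

    def read_option_lines(text):
        """Return the options in an option file, one per non-comment line."""
        options = []
        for raw in text.splitlines():
            line = raw.lstrip()      # leading whitespace is ignored
            if not line or line.startswith("#"):
                continue             # blank lines (assumed) and comments
            options.append(line)
        return options

    sample = """\
    # Exclude directories that don't contain real code
    --exclude=Units
    \t# indentation is ignored
    \t--exclude=tinst-root
    --exclude=Tmain
    """

    print(read_option_lines(sample))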

There are two categories of option files, though they both contain command
line options: **preload** and **optlib** option files.

.. Q: do we really want to call the non-preload option files "optlib"?
	That name seems like an internal detail. Users of ctags never see that
	name anywhere except in these docs, and it's weird. How about
	"specified" option files, or "requested" or some such? (i.e., the file
	is explicitly specified or requested when ctags is run)

Preload option file
......................................................................

Preload option files are option files loaded by ``ctags`` automatically
at start-up time. The set of files loaded at start-up time is very different
from that of Exuberant-ctags.

At start-up time, Universal-ctags loads files having :file:`.ctags` as a
file extension under the following statically defined directories:

#. :file:`$HOME/.ctags.d`
#. :file:`$HOMEDRIVE$HOMEPATH/.ctags.d` (in ``Windows``)
#. :file:`.ctags.d`
#. :file:`ctags.d`

``ctags`` visits the directories in the order listed above for preloading files.
``ctags`` loads files having :file:`.ctags` as file extension in alphabetical
order (strcmp(3) is used for comparing, so for example
:file:`.ctags.d/ZZZ.ctags` will be loaded *before* :file:`.ctags.d/aaa.ctags`).
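
The effect of the strcmp(3) ordering can be reproduced by sorting file names
as raw bytes. The sketch below is illustrative only; the function name
``preload_order`` and the sample file names are made up.

.. code-block:: python

    def preload_order(filenames):
        """Keep *.ctags files and sort them byte-wise, as strcmp(3) would."""
        selected = [n for n in filenames if n.endswith(".ctags")]
        return sorted(selected, key=lambda n: n.encode("utf-8"))

    names = ["aaa.ctags", "ZZZ.ctags", "10-local.ctags", "notes.txt"]
    print(preload_order(names))
    # ['10-local.ctags', 'ZZZ.ctags', 'aaa.ctags']

Digits sort before upper-case letters, which sort before lower-case letters,
so a numeric prefix is one way to make the load order explicit.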

Quoted from the man page of Exuberant-ctags::

	FILES
		   /ctags.cnf (on MSDOS, MSWindows only)
		   /etc/ctags.conf
		   /usr/local/etc/ctags.conf
		   $HOME/.ctags
		   $HOME/ctags.cnf (on MSDOS, MSWindows only)
		   .ctags
		   ctags.cnf (on MSDOS, MSWindows only)
				  If any of these configuration files exist, each will
				  be expected to contain a set of default options
				  which are read in the order listed when ctags
				  starts, but before the CTAGS environment variable is
				  read or any command line options are read.  This
				  makes it possible to set up site-wide, personal or
				  project-level defaults. It is possible to compile
				  ctags to read an additional configuration file
				  before any of those shown above, which will be
				  indicated if the output produced by the --version
				  option lists the "custom-conf" feature. Options
				  appearing in the CTAGS environment variable or on
				  the command line will override options specified in
				  these files. Only options will be read from these
				  files.  Note that the option files are read in
				  line-oriented mode in which spaces are significant
				  (since shell quoting is not possible). Each line of
				  the file is read as one command line parameter (as
				  if it were quoted with single quotes). Therefore,
				  use new lines to indicate separate command-line
				  arguments.

The following sections explain the differences and the reasoning behind them.


Directory oriented configuration management
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Exuberant-ctags provides a way to customize ctags with options like
``--langdef=<LANG>`` and ``--regex-<LANG>``. These options are
powerful and have made ctags popular among programmers.

Universal-ctags extends this idea; we have added new options for
defining a parser, and have extended existing options. Defining
a new parser with the options is more than "customizing" in
Universal-ctags.

To make it easier to maintain a parser defined using the options, you can put
each parser language in a different options file. Universal-ctags doesn't
preload a single file. Instead, Universal-ctags loads all files having the
:file:`.ctags` extension under the previously specified directories. If you have
multiple parser definitions, put them in different files.

Avoiding option incompatibility issues
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The Universal-ctags options are different from those of Exuberant-ctags,
therefore Universal-ctags doesn't load any of the files Exuberant-ctags loads at
start-up. Otherwise there would be incompatibility issues if Exuberant-ctags
loaded an option file that used a newly introduced option in Universal-ctags,
and vice versa.

No system wide configuration
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

To make the preload path list short and because it was rarely ever used,
Universal-ctags does not load any option files for system wide configuration.
(i.e., no :file:`/etc/ctags.d`)

Use :file:`.ctags` for the file extension
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Extensions :file:`.cnf` and :file:`.conf` are obsolete.
Use the unified extension :file:`.ctags` only.


Optlib option file
......................................................................

From a syntax perspective, there is no difference between optlib option files
and preload option files; ``ctags`` options are written line by line in a file.

Optlib option files are option files that are not loaded automatically at
start-up time. To load an optlib option file, specify its pathname
explicitly with the ``--options=PATHNAME`` option. The pathname can be just
the filename if the file is in the current directory.

Exuberant-ctags has the ``--options`` option, but you can only specify a
single file to load. Universal-ctags extends the option in two respects: you
can specify a directory to load all files in that directory, and you can
specify a path search list to look in. See the following sections for details.


Specifying a directory
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

If you specify a directory instead of a file as the argument for the
``--options=PATHNAME``, Universal-ctags will load all files having a
:file:`.ctags` extension under the directory in alphabetical order.

Specifying an optlib path search list
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

When loading a file (or directory) specified with ``--options=PATHNAME``,
``ctags`` searches the optlib path list first if the option argument
(PATHNAME) doesn't start with '``/``' or '``.``'. If ``ctags`` finds a
file, ``ctags`` loads it.

If ``ctags`` doesn't find a file in the path list, ``ctags`` loads
a file (or directory) at the specified pathname.

By default, the optlib path list is empty. To set or add a directory
path to the list, use ``--optlib-dir=PATH``.

For setting (adding one after clearing)::

	--optlib-dir=PATH

For adding::

	--optlib-dir=+PATH
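
The lookup rule described in this section can be sketched in Python as
follows. This models only the prose above and is not the ``ctags``
implementation; the function name ``resolve_options_path`` is hypothetical,
and searching the list in the order given is an assumption.

.. code-block:: python

    import os

    def resolve_options_path(pathname, optlib_dirs):
        """Return the path that --options=PATHNAME would end up loading."""
        # The optlib path list is consulted only for names that do not
        # start with "/" or ".".
        if not pathname.startswith(("/", ".")):
            for directory in optlib_dirs:      # search order is assumed
                candidate = os.path.join(directory, pathname)
                if os.path.exists(candidate):
                    return candidate
        # Otherwise, or when nothing was found, use the pathname as given.
        return pathname

With an empty list, which is the default, the function always returns the
pathname unchanged.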

Tips for writing an option file
......................................................................

* Use ``--quiet --options=NONE`` to disable preloading.

.. IN MAN PAGE

* Two options are introduced for debugging the process of loading
  option files.

	``--_echo=MSG``

		Prints MSG to standard error immediately.

	``--_force-quit=[NUM]``

		Exit immediately with the status of the specified NUM.

* Universal-ctags has an ``optlib2c`` script that translates an option file
  into C source code. Your optlib parser can thus easily become a built-in parser
  if you contribute it to the Universal-ctags repository on GitHub. You could be famous!
  Examples are in the ``optlib`` directory in the Universal-ctags source tree.

Regular expression (regex) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Universal-ctags currently uses the same regex engine as Exuberant-ctags does:
the POSIX.2 regex engine in GNU glibc-2.10.1. By default it uses the Extended
Regular Expressions (ERE) syntax, as used by most engines today; however it does
*not* support many of the "modern" extensions such as lazy quantifiers,
non-capturing grouping, atomic grouping, possessive quantifiers, look-ahead/behind,
etc. It is also notoriously slow when backtracking, and has some known "quirks"
with respect to escaping special characters in bracket expressions.

For example, a pattern of ``[^\]]+`` is invalid in POSIX.2, because the ``]`` is
*not* special inside a bracket expression, and thus should **not** be escaped.
Most regex engines ignore this subtle detail in POSIX.2, and instead allow
escaping it with ``\]`` inside the bracket expression and treat it as the
literal character ``]``. GNU glibc, however, does not generate an error but
instead considers it undefined behavior, and in fact it will match very odd
things. Instead you **must** use the more unintuitive ``[^]]+`` syntax. The same
is technically true of other special characters inside a bracket expression,
such as ``[^\)]+``, which should instead be ``[^)]+``. The ``[^\)]+`` will
appear to work usually, but only because what it is really doing is matching any
character but ``\`` *or* ``)``. The only exceptions for using ``\`` inside a
bracket expression are for ``\t`` and ``\n``, which ctags converts to their
single literal character control codes before passing the pattern to glibc.
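
Because such patterns fail silently, it can help to check for them
mechanically. The following Python sketch flags backslash sequences inside
bracket expressions, except for the ``\t`` and ``\n`` that ctags converts
itself. It is a hypothetical helper and not part of ctags; the name
``find_bracket_escapes`` is made up, and the scanner covers only the cases
discussed above.

.. code-block:: python

    def find_bracket_escapes(pattern):
        """Return (index, text) for each suspicious backslash in brackets."""
        hits = []
        i, n = 0, len(pattern)
        while i < n:
            ch = pattern[i]
            if ch == "\\":       # escape outside brackets: skip both chars
                i += 2
                continue
            if ch != "[":
                i += 1
                continue
            i += 1               # now inside a bracket expression
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":   # a leading ] is a literal
                i += 1
            while i < n and pattern[i] != "]":
                if pattern.startswith(("[:", "[.", "[="), i):
                    # Skip a class such as [:space:] as one unit.
                    end = pattern.find(pattern[i + 1] + "]", i + 2)
                    if end == -1:
                        i = n
                        break
                    i = end + 2
                    continue
                if pattern[i] == "\\" and pattern[i + 1:i + 2] not in ("t", "n"):
                    hits.append((i, pattern[i:i + 2]))
                i += 1
            i += 1               # step over the closing ]
        return hits

    print(find_bracket_escapes(r"[^\]]+"))   # [(2, '\\]')]
    print(find_bracket_escapes(r"[^]]+"))    # []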

Another detail to keep in mind is how the regex engine treats newlines.
Universal-ctags compiles the regular expressions in the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options with REG_NEWLINE set. What that means is documented
in the
`POSIX spec <http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html>`_.
One obvious effect is that the regex special dot any-character ``.`` does not match
newline characters, the ``^`` anchor *does* match right after a newline, and
the ``$`` anchor matches right before a newline. A more subtle issue is this text from the
`Regular Expressions chapter <http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html>`_:
"the use of literal <newline>s or any escape sequence equivalent produces undefined
results". What that means is using a regex pattern with ``[^\n]+`` is invalid,
and indeed in glibc produces very odd results. **Never** use ``\n`` in patterns
for ``--regex-<LANG>``, and never use them in non-matching bracket expressions
for ``--mline-regex-<LANG>`` patterns. For the experimental ``--_mtable-regex-<LANG>``
you can safely use ``\n`` because that regex is not compiled with REG_NEWLINE.

You should always test your regex patterns against test files with strings that
do and do not match. Pay particular attention to when a pattern should *not* match, and
how *much* it matches when it should. A common error is forgetting that a
POSIX.2 ERE engine is always greedy; the ``*`` and ``+`` quantifiers match
as much as possible, before backtracking from the end of their match.

For example this pattern::

	foo.*bar

Will match this **entire** string, not just the first part::

	foobar, bar, and even more bar
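
The greedy behavior is easy to check before putting a pattern into an option
file. The sketch below uses Python's ``re`` module, which is *not* the POSIX
engine used by ctags but treats these two simple patterns the same way. The
bounded alternative ``foo[^,]*bar`` is only an illustration of one way to
limit the match.

.. code-block:: python

    import re

    text = "foobar, bar, and even more bar"

    # Greedy: .* runs to the last "bar" in the string.
    greedy = re.search(r"foo.*bar", text).group(0)

    # Bounded: [^,]* cannot cross the first comma.
    bounded = re.search(r"foo[^,]*bar", text).group(0)

    print(repr(greedy))    # 'foobar, bar, and even more bar'
    print(repr(bounded))   # 'foobar'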


Regex option argument flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many regex-based options described in this document support additional arguments
in the form of long flags. Long flags are specified with surrounding ``{`` and
``}``.

The general format and placement is as follows::

	--regex-<LANG>=/<PATTERN>/<NAME>/[<KIND>/]LONGFLAGS

Some examples:

.. code-block:: perl

	--regex-Pod=/^=head1[ \t]+(.+)/\1/c/
	--regex-Foo=/set=([^;]+)/\1/v/{icase}
	--regex-Man=/^\.TH[[:space:]]{1,}"([^"]{1,})".*/\1/t/{exclusive}{icase}{scope=push}
	--regex-Gdbinit=/^#//{exclusive}

Note that the last example only has two ``/`` forward slashes following
the regex pattern, as a shortened form when no kind-spec exists.

The ``--mline-regex-<LANG>`` option also follows the above format. The
experimental ``--_mtable-regex-<LANG>`` option follows a slightly
modified version as well.

The ``--langdef=<LANG>`` option also supports long flags, but not using
forward-slash separators.

Regex control flags
......................................................................

.. Q: why even discuss the single-character version of the flags? Just
	make everyone use the long form.

The regex matching can be controlled by adding flags to the ``--regex-<LANG>``,
``--mline-regex-<LANG>``, and experimental ``--_mtable-regex-<LANG>`` options.
This is done by either using the single character short flags ``b``, ``e`` and
``i`` as explained in the *ctags.1* man page, or by using the long flags
described earlier. The long flags require more typing but are much more
readable.

The mapping between the older short flag names and long flag names is:

=========== =========== ===========
short flag  long flag   description
=========== =========== ===========
b           basic       POSIX basic regular expression syntax.
e           extend      POSIX extended regular expression syntax (default).
i           icase       Case-insensitive matching.
=========== =========== ===========


So the following ``--regex-<LANG>`` expression:

.. code-block:: perl

   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d,definition/e

is the same as:

.. code-block:: perl

   --regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d,definition/{extend}

The characters ``{`` and ``}`` may not be suitable for command line
use, but long flags are mostly intended for option files.
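
When converting an old option file, the table above can be applied
mechanically. The following Python sketch does so; the helper name
``to_long_flags`` is made up, and it covers only the three flags listed in
the table.

.. code-block:: python

    # Short-to-long mapping taken from the table above.
    SHORT_TO_LONG = {"b": "basic", "e": "extend", "i": "icase"}

    def to_long_flags(short_flags):
        """Turn a string of short flags such as "ei" into long flags."""
        return "".join("{%s}" % SHORT_TO_LONG[c] for c in short_flags)

    print(to_long_flags("ei"))   # {extend}{icase}

A letter that is not in the table raises ``KeyError``, which is deliberate:
the sketch should not guess at flags it does not know.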

Exclusive flag in regex
......................................................................

By default, lines read from the input files will be matched with **all** regular
expressions defined with ``--regex-<LANG>``. Each matched regular expression
will successfully emit a tag.

In some cases another policy, exclusive-matching, is preferable to the
all-matching policy. Exclusive-matching means that, once one regular
expression matches an input line, the remaining regular expressions are
not tried for that line.

For specifying exclusive-matching the flags ``exclusive`` (long) and ``x``
(short) were introduced. For example, this is used in
:file:`optlib/gdbinit.ctags` for ignoring comment lines in ``gdb`` files,
as follows:

.. code-block:: perl

	--regex-Gdbinit=/^#//{exclusive}

Comments in gdb files start with ``#``, so the above line is the first regex
match line in :file:`gdbinit.ctags`; subsequent regex matches are therefore
not tried for a comment line.

If an empty name pattern (``//``) is used for the ``--regex-<LANG>`` option,
ctags warns that the option is being used incorrectly. However, if the flag
``exclusive`` or ``x`` is specified, the warning is suppressed.

NOTE: This flag does not make sense in the multi-line ``--mline-regex-<LANG>``
option nor the multi-table ``--_mtable-regex-<LANG>`` option.
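
The difference between the two policies can be modeled in a few lines of
Python. This is a simplified sketch, not how ctags is implemented: the
function ``tag_lines``, the second rule, and the sample input are invented
for illustration, and Python's ``re`` stands in for the POSIX engine.

.. code-block:: python

    import re

    def tag_lines(lines, rules):
        """Apply (pattern, name_template, exclusive) rules to each line."""
        tags = []
        for line in lines:
            for pattern, name_template, exclusive in rules:
                m = re.search(pattern, line)
                if not m:
                    continue
                name = m.expand(name_template)
                if name:              # an empty name emits no tag
                    tags.append(name)
                if exclusive:         # skip the remaining rules for this line
                    break
        return tags

    lines = ["# define commented_out", "define real_cmd"]
    define_rule = (r"define[ \t]+([a-z_]+)", r"\1", False)

    print(tag_lines(lines, [(r"^#", "", True), define_rule]))
    # ['real_cmd']
    print(tag_lines(lines, [(r"^#", "", False), define_rule]))
    # ['commented_out', 'real_cmd']

The order of the rules matters: the exclusive comment rule has to come
first, as it does in :file:`gdbinit.ctags`.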


Experimental flags
......................................................................

.. note:: These flags are experimental. They apply to all regex option
	types: basic ``--regex-<LANG>``, multi-line ``--mline-regex-<LANG>``,
	and the experimental multi-table ``--_mtable-regex-<LANG>`` option.

``_extra``

	This flag indicates the tag should only be generated if the given
	'extra' type is enabled, as explained in :ref:`extras`.

``_field``

	This flag allows a regex match to add additional custom fields to the
	generated tag entry, as explained in :ref:`fields`.

``_role``

	This flag allows a regex match to generate a reference tag entry and
	specify the role of the reference, as explained in :ref:`roles`.


Ghost kind in regex parser
......................................................................

.. Q: what is the point of documenting this?

If a whitespace character is used as a kind letter, that kind is never
printed when ctags is called with the ``--list-kinds`` option.  This kind is
automatically assigned to an empty name pattern.

Normally you don't need to know this.


Scope tracking in a regex parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. IN MAN PAGE

With the ``scope`` long flag, you can record/track scope context.
A stack is used for tracking the scope context.

``{scope=push}``

	Push the tag captured with a regex pattern to the top of the stack.
	If you don't want to record this tag but just push, use the
	``placeholder`` long flag together with it.

``{scope=ref}``

	Use the entry at the top of the stack as the scope of the tag captured
	with a regex pattern. The stack is not modified by this flag.
	If the stack is empty, this flag is just ignored.

``{scope=pop}``

	Pop the entry at the top of the stack.
	If the stack is empty, this flag is just ignored.

``{scope=clear}``

	Make the stack empty.

``{scope=set}``

	Clear then push.

``{placeholder}``

	Don't print a tag captured with a regex pattern to a tag file. This is
	useful when you need to push non-named context information to the stack.
	A well-known non-named scope in the C language is the one established with
	``{``. A non-named scope never appears in a tags file as a name or scope
	name.  However, pushing it is important to balance ``push`` and ``pop``.

Example 1:

.. code-block:: python

	# in /tmp/input.foo
	class foo:
	    def bar(baz):
	        print(baz)
	class goo:
	    def gar(gaz):
	        print(gaz)

.. code-block:: perl

	# in /tmp/foo.ctags:
	--langdef=Foo
	--map-Foo=+.foo

	--regex-Foo=/^class[[:blank:]]+([[:alpha:]]+):/\1/c,class/{scope=set}
	--regex-Foo=/^[[:blank:]]+def[[:blank:]]+([[:alpha:]]+).*:/\1/d,definition/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/foo.ctags -o - /tmp/input.foo
	bar	/tmp/input.foo	/^    def bar(baz):$/;"	d	class:foo
	foo	/tmp/input.foo	/^class foo:$/;"	c
	gar	/tmp/input.foo	/^    def gar(gaz):$/;"	d	class:goo
	goo	/tmp/input.foo	/^class goo:$/;"	c


Example 2:

.. code-block:: c

	// in /tmp/input.pp
	class foo {
		int bar;
	}

.. code-block:: perl

	# in /tmp/pp.ctags:
	--langdef=pp
	--map-pp=+.pp

	--regex-pp=/^[[:blank:]]*\}//{scope=pop}{exclusive}
	--regex-pp=/^class[[:blank:]]*([[:alnum:]]+)[[:blank:]]*\{/\1/c,class,classes/{scope=push}
	--regex-pp=/^[[:blank:]]*int[[:blank:]]*([[:alnum:]]+)/\1/v,variable,variables/{scope=ref}

.. code-block:: console

	$ ctags --options=/tmp/pp.ctags -o - /tmp/input.pp
	bar	/tmp/input.pp	/^    int bar;$/;"	v	class:foo
	foo	/tmp/input.pp	/^class foo {$/;"	c
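
The stack handling in Example 2 can be modeled with a short Python sketch.
It is an illustration of the flag descriptions above and not the ctags
implementation: the function ``apply_scope_rules`` is hypothetical, and the
order in which the actions are applied (clear, pop, emit the tag, push) is
an assumption.

.. code-block:: python

    def apply_scope_rules(events):
        """events: (flags, name, kind) per matched line; returns the tags."""
        stack, tags = [], []
        for flags, name, kind in events:
            if flags & {"clear", "set"}:
                stack.clear()
            if "pop" in flags and stack:       # ignored on an empty stack
                stack.pop()
            scope = stack[-1] if "ref" in flags and stack else None
            if name and "placeholder" not in flags:
                tags.append((name, kind, scope))
            if flags & {"push", "set"}:
                stack.append((kind, name))
        return tags

    events = [
        ({"push"}, "foo", "class"),       # class foo {
        ({"ref"}, "bar", "variable"),     #     int bar;
        ({"pop"}, "", ""),                # }
    ]

    for tag in apply_scope_rules(events):
        print(tag)
    # ('foo', 'class', None)
    # ('bar', 'variable', ('class', 'foo'))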


NOTE: This flag doesn't work well with ``--mline-regex-<LANG>=``.

Overriding the letter for file kind
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Q: this was fixed in https://github.com/universal-ctags/ctags/pull/331
	so can we remove this section?

One of the built-in tag kinds in Universal-ctags is the ``F`` file kind.
Overriding the letter for file kind is not allowed in Universal-ctags.

.. warning::

	Don't use ``F`` as a kind letter in your parser. (See issue #317 on github)


Generating fully qualified tags automatically from scope information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If scope fields are filled properly with ``{scope=...}`` regex flags,
you can use the field values for generating fully qualified tags.
About the ``{scope=...}`` flag itself, see the "FLAGS FOR --regex-<LANG>
OPTION" section of the ``ctags-optlib(7)`` man page or
`Universal-ctags parser definition language <https://github.com/universal-ctags/ctags/blob/master/man/ctags-optlib.7.rst.in>`_.

Append ``{_autoFQTag}`` to the end of the ``--langdef=<LANG>`` option, as in
``--langdef=Foo{_autoFQTag}``, to make ctags generate fully qualified
tags automatically.

``.`` is the default separator combining names into a fully qualified
tag. It is not customizable yet.
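
The joining rule itself is simple and can be sketched in Python. The
function ``with_qualified_tags`` and its data layout are hypothetical; the
sketch only shows how a tag name and its enclosing scope names combine, with
sorting added to imitate a sorted tags file.

.. code-block:: python

    def with_qualified_tags(tags, separator="."):
        """tags: (name, [enclosing scope names]); add qualified names."""
        names = []
        for name, scopes in tags:
            names.append(name)
            if scopes:       # only scoped tags get a qualified variant
                names.append(separator.join(scopes + [name]))
        return sorted(names)

    print(with_qualified_tags([("X", []), ("y", ["X"])]))
    # ['X', 'X.y', 'y']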

input.foo::

  class X
     var y
  end

foo.ctags::

  --langdef=foo{_autoFQTag}
  --map-foo=+.foo
  --kinddef-foo=c,class,classes
  --kinddef-foo=v,var,variables
  --regex-foo=/class ([A-Z]*)/\1/c/{scope=push}
  --regex-foo=/end///{placeholder}{scope=pop}
  --regex-foo=/[ \t]*var ([a-z]*)/\1/v/{scope=ref}

Output::

	$ u-ctags --quiet --options=NONE --options=./foo.ctags -o - input.foo
	X	input.foo	/^class X$/;"	c
	y	input.foo	/^	var y$/;"	v	class:X

	$ u-ctags --quiet --options=NONE --options=./foo.ctags --extras=+q -o - input.foo
	X	input.foo	/^class X$/;"	c
	X.y	input.foo	/^	var y$/;"	v	class:X
	y	input.foo	/^	var y$/;"	v	class:X


"X.y" is printed as a fully qualified tag when ``--extras=+q`` is given.


Multi-line pattern match
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We often need to scan multiple lines to generate a tag, whether due to
needing contextual information to decide whether to tag or not, or to
constrain generating tags to only certain cases, or to grab multiple
substrings to generate the tag name.

Universal-ctags has two ways to accomplish this: multi-line regex options,
and experimental multi-table regex options, described later.

The newly introduced ``--mline-regex-<LANG>`` is similar to ``--regex-<LANG>``
except the pattern is applied to the whole file's contents, not line by line.

This example is based on issue #219 posted by @andreicristianpetcu:

.. code-block:: java

	// in input.java:

	@Subscribe
	public void catchEvent(SomeEvent e)
	{
	    return;
	}

	@Subscribe
	public void
	    recover(Exception e)
	{
	    return;
	}

The above Java code resembles code written for the Java `Spring <https://spring.io>`_
framework. The ``@Subscribe`` annotation is a keyword for the framework, and the
developer would like to have a tag generated for each method annotated with
``@Subscribe``, using the name of the method followed by a dash followed by the
type of the argument. For example the developer wants the tag name
``Event-SomeEvent`` generated for the first method shown above.

To accomplish this, the developer creates a :file:`spring.ctags` file with
the following:

.. code-block:: perl

	# in spring.ctags:
	--langdef=javaspring
	--map-javaspring=+.java
	--mline-regex-javaspring=/@Subscribe([[:space:]])*([a-z ]+)[[:space:]]*([a-zA-Z]*)\(([a-zA-Z]*)/\3-\4/s,subscription/{mgroup=3}
	--fields=+ln

And now using :file:`spring.ctags` the tag file has this:

.. code-block:: console

	$ ./ctags -o - --options=./spring.ctags input.java
	Event-SomeEvent	input.java	/^public void catchEvent(SomeEvent e)$/;"	s	line:2	language:javaspring
	recover-Exception	input.java	/^    recover(Exception e)$/;"	s	line:10	language:javaspring
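
To see why the first tag is ``Event-SomeEvent``, it helps to run the pattern
over the whole input at once. The sketch below does this with Python's
``re`` module, with the POSIX class ``[[:space:]]`` rewritten as ``\s``.
Python is not the engine ctags uses, so treat this as an approximation that
happens to agree for this input.

.. code-block:: python

    import re

    source = (
        "@Subscribe\n"
        "public void catchEvent(SomeEvent e)\n"
        "{\n"
        "    return;\n"
        "}\n"
        "\n"
        "@Subscribe\n"
        "public void\n"
        "    recover(Exception e)\n"
        "{\n"
        "    return;\n"
        "}\n"
    )

    # Same structure as the --mline-regex-javaspring pattern above.
    pattern = re.compile(
        r"@Subscribe(\s)*([a-z ]+)\s*([a-zA-Z]*)\(([a-zA-Z]*)"
    )

    # The tag name is built from groups 3 and 4, as in \3-\4.
    names = [m.expand(r"\3-\4") for m in pattern.finditer(source)]
    print(names)   # ['Event-SomeEvent', 'recover-Exception']

In the first method the greedy ``[a-z ]+`` consumes ``public void catch``
and stops at the upper-case ``E``, so group 3 captures only ``Event``.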

Multiline pattern flags
......................................................................

.. note:: These flags also apply to the experimental ``--_mtable-regex-<LANG>``
	option described later.

``{mgroup=N}``

	This flag indicates the pattern should be applied to the whole file
	contents, not line by line. ``N`` is the number of a capture group in the
	pattern, which is used to record the line number location of the tag. In the
	above example ``3`` is specified. The start position of the regex capture
	group 3, relative to the whole file is used.

.. warning:: You **must** add an ``{mgroup=N}`` flag to the multi-line
	``--mline-regex-<LANG>`` option, even if the ``N`` is ``0`` (meaning the
	start position of the whole regex pattern). You do not need to add it for
	the multi-table ``--_mtable-regex-<LANG>``.

.. Q: isn't the above restriction really a bug? I think it is. I should fix it.


``{_advanceTo=N[start|end]}``

	A regex pattern is applied to the whole file's contents iteratively. This long
	flag specifies from where the pattern should be applied in the next
	iteration for regex matching. When a pattern matches, the next pattern
	matching starts from the start or end of capture group ``N``. By default it
	advances to the end of the whole match (i.e., ``{_advanceTo=0end}`` is
	the default).


	Consider the following input
	::
	::

	   def def abc

	Consider two sets of options, foo and bar.

	.. code-block:: perl

		# foo.ctags:
		--langdef=foo
		--langmap=foo:.foo
		--kinddef-foo=a,something,something
		--mline-regex-foo=/def *([a-z]+)/\1/a/{mgroup=1}


	.. code-block:: perl

		# bar.ctags:
		--langdef=bar
		--langmap=bar:.bar
		--kinddef-bar=a,something,something
		--mline-regex-bar=/def *([a-z]+)/\1/a/{mgroup=1}{_advanceTo=1start}

	*foo.ctags* emits the following tags output::

	   def	input.foo	/^def def abc$/;"	a

	*bar.ctags* emits the following tags output::

	   def	input-0.bar	/^def def abc$/;"	a
	   abc	input-0.bar	/^def def abc$/;"	a

	``_advanceTo=1start`` is specified in *bar.ctags*.
	This allows ctags to capture "abc".

	At the first iteration, the patterns of both
	*foo.ctags* and *bar.ctags* match as follows
	::

		0   1       (start)
		v   v
		def def abc
		       ^
		       0,1  (end)

	"def" in group 1 is captured as a tag in
	both languages. At the next iteration, the position
	where pattern matching resumes differs between
	the two languages.

	*foo.ctags*
	::

		       0end (default)
		       v
		def def abc


	*bar.ctags*
	::

		    1start (as specified in _advanceTo long flag)
		    v
		def def abc

	This difference in restart position explains the difference in the tags output.

	A more relevant use-case is when ``{_advanceTo=N[start|end]}`` is used in
	the experimental ``--_mtable-regex-<LANG>``, to "advance" back to the
	beginning of a match, so that one can generate multiple tags for the same
	input line(s).
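
The effect of the restart position can be imitated outside of ``ctags``. The
following is a minimal Python sketch, not the actual implementation: the
``scan`` function and its ``group``/``edge`` parameters are hypothetical names,
and Python's ``re`` module stands in for the regex engine used by ``ctags``. It
applies the pattern from *foo.ctags* and *bar.ctags* to the same input and
restarts either at the end of the whole match or at the start of group 1.

.. code-block:: python

   import re

   def scan(text, pattern, group=0, edge="end"):
       """Collect group 1 of every match, restarting at the chosen edge of `group`."""
       regex = re.compile(pattern)
       names, pos = [], 0
       while pos <= len(text):
           match = regex.search(text, pos)
           if match is None:
               break
           names.append(match.group(1))
           next_pos = match.start(group) if edge == "start" else match.end(group)
           if next_pos <= pos:
               # Guard that exists only in this sketch: always make progress.
               next_pos = pos + 1
           pos = next_pos
       return names

   TEXT = "def def abc\n"
   PATTERN = r"def *([a-z]+)"

   print(scan(TEXT, PATTERN))                         # like foo.ctags (0end)
   print(scan(TEXT, PATTERN, group=1, edge="start"))  # like bar.ctags (1start)

Under these assumptions the first call collects only ``def``, while the second
also collects ``abc``, mirroring the tags output shown above.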

.. note:: This flag doesn't work well with scope related flags and ``exclusive`` flags.


.. Q: this was previously titled "Byte oriented pattern matching...", presumably
	because it "matched against the input at the current byte position, not line".
	But that's also true for --mline-regex-<LANG>, as far as I can tell.

Advanced pattern matching with multiple regex tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note:: This is a highly experimental feature. This will not go into
	the man page of 6.0. But let's be honest, it's the most exciting feature!

In some cases, the ``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options are not
sufficient to generate the tags for a particular language. Some of the common
reasons for this are:

* To ignore commented lines or sections for the language file, so that
  tags aren't generated for symbols that are within the comments.
* To enter and exit scope, and use it for tagging based on contextual
  state or with end-scope markers that are difficult to match to their
  associated scope entry point.
* To support nested scopes.
* To change the pattern searched for, or the resultant tag for the same
  pattern, based on scoping or contextual location.
* To break up an overly complicated ``--mline-regex-<LANG>`` pattern into
  separate regex patterns, for performance or readability reasons.

To help handle such things, Universal-ctags has been enhanced with multi-table
regex matching. The feature is inspired by `lex`, the fast lexical analyzer
generator, which is a popular tool on Unix environments for writing parsers, and
`RegexLexer <http://pygments.org/docs/lexerdevelopment/>`_ of Pygments.
Knowledge about them will help you understand the new options.

The new options are:

``--_tabledef-<LANG>``

	Declares a new regex matching table of a given name for the language,
	as described in :ref:`tabledef`.

``--_mtable-regex-<LANG>``

	Adds a regex pattern and associated tag generation information and flags, to
	the given table, as described in :ref:`mtable_regex`.

``--_mtable-extend-<LANG>``

	Includes a previously defined regex table in the named one.

The above will be discussed in more detail shortly.

First, let's explain the feature with an example. Consider an
imaginary language "`X`" with a syntax similar to JavaScript: "var" is
used for defining variable(s), and "/\* ... \*/" is used for block
comments.

Here is our input, :file:`input.x`:

.. code-block:: java

   /* BLOCK COMMENT
   var dont_capture_me;
   */
   var a /* ANOTHER BLOCK COMMENT */, b;

We want ctags to capture ``a`` and ``b`` - but it is difficult to write a parser
that will ignore ``dont_capture_me`` in the comment with a classical regex
parser defined with ``--regex-<LANG>`` or ``--mline-regex-<LANG>``, because of
the block comments.

The ``--regex-<LANG>`` option only works on one line at a time, so it cannot
know that ``dont_capture_me`` is within comments. The ``--mline-regex-<LANG>``
option could do it in theory, but due to the greedy nature of the regex engine
it is impractical and potentially inefficient to do so, given that there could
be multiple block comments in the file, with `*` inside them, etc.

A parser written with multi-table regex, on the other hand, can capture only
``a`` and ``b`` safely. But it is more complicated to understand.

Here is a 1st version of :file:`X.ctags`:
::

   --langdef=X
   --map-X=.x
   --kinddef-X=v,var,variables

Not so interesting. It doesn't really *do* anything yet. It just creates a new
language named ``X``, for files ending with a :file:`.x` suffix, and defines a
new kind, ``var``, for variable tags.

When writing a multi-table parser, you have to think about the necessary states
of parsing. For the parser of language ``X``, we need the following states:

* `toplevel` (initial state)
* `comment` (inside comment)
* `vars` (var statements)

.. _tabledef:

Declaring a new regex table
......................................................................

Before adding regular expressions, you have to declare tables for each state
with the ``--_tabledef-<LANG>=<TABLE>`` option.

Here is the 2nd version of :file:`X.ctags` doing so:
::

   --langdef=X
   --map-X=.x
   --kinddef-X=v,var,variables

   --_tabledef-X=toplevel
   --_tabledef-X=comment
   --_tabledef-X=vars

For table names, only characters in the range ``[0-9a-zA-Z_]`` are acceptable.

For a given language, for each file's input the ctags multi-table parser begins
with the *first* declared table. For :file:`X.ctags`, ``toplevel`` is the one.
The other tables are only ever entered/checked if another table specifies to do
so, starting with the first table. In other words, if the first declared table
does not find a match for the current input, and does not specify to go to
another table, the other tables for that language won't be used. The flags to go
to another table are ``{tenter}``, ``{tleave}``, and ``{tjump}``, as described
later.

.. _mtable_regex:

Adding a regex to a regex table
......................................................................

The new option to add a regex to a declared table is ``--_mtable-regex-<LANG>``,
and it follows this form:

.. code-block:: perl

	--_mtable-regex-<LANG>=<TABLE>/<PATTERN>/<NAME>/[<KIND>]/LONGFLAGS

The parameters for ``--_mtable-regex-<LANG>`` look complicated. However,
``<PATTERN>``, ``<NAME>``, and ``<KIND>`` are the same as the parameters of the
``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options. ``<TABLE>`` is simply
the name of a table previously declared with the ``--_tabledef-<LANG>`` option.

A regex pattern added to a parser with ``--_mtable-regex-<LANG>`` is matched
against the input at the current byte position, not line. Even if you do not
specify the ``^`` anchor at the start of the pattern, ``ctags`` adds ``^`` to
the pattern automatically. Unlike the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options, a ``^`` anchor does not mean "beginning of
line" in ``--_mtable-regex-<LANG>``; instead it means the beginning of the
input string (i.e., the current byte position).
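
The anchoring rule can be pictured with a small Python sketch. This is only an
illustration, under the assumption that Python's ``re`` behaves closely enough
for these simple patterns; ``match_here`` is a hypothetical helper, not part of
``ctags``. Slicing the input at the current position makes ``^`` refer to that
position rather than to the beginning of a line.

.. code-block:: python

   import re

   def match_here(pattern, text, pos):
       """Try `pattern` only at offset `pos`, treating `pos` as the start of input."""
       # re.match never searches ahead; the slice makes "^" mean "at pos".
       return re.match(pattern, text[pos:])

   text = "x = 1;\nvar a;\n"
   pos = text.index("var")

   print(match_here(r"var[ \n\t]", text, pos) is not None)   # matched at pos
   print(match_here(r"^var[ \n\t]", text, pos) is not None)  # explicit "^" is the same
   print(match_here(r"var[ \n\t]", text, 0) is not None)     # "x" is at offset 0

The last call fails even though ``var`` occurs later in the input, because the
pattern is only tried at the given position and never searched for further on.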

The ``LONGFLAGS`` include the already discussed flags for ``--regex-<LANG>`` and
``--mline-regex-<LANG>``: ``{scope=...}``, ``{mgroup=N}``, ``{_advanceTo=N}``,
``{basic}``, ``{extend}``, and ``{icase}``. The ``{exclusive}`` flag does not
make sense for multi-table regex.

In addition, several new flags are introduced exclusively for multi-table
regex use:

``{tenter}``

	Push the current table on the stack, and enter another table.

``{tleave}``

	Leave the current table, pop the stack, and go to the table that was
	just popped from the stack.

``{tjump}``

	Jump to another table, without affecting the stack.

``{treset}``

	Clear the stack, and go to another table.

``{tquit}``

	Clear the stack, and stop processing the current input file for this
	language.

To explain the above new flags, we'll continue using our example in the
next section.

Skipping block comments
......................................................................

Let's continue with our example. Here is the 3rd version of :file:`X.ctags`:

.. code-block:: perl

   --langdef=X
   --map-X=.x
   --kinddef-X=v,var,variables

   --_tabledef-X=toplevel
   --_tabledef-X=comment
   --_tabledef-X=vars

   --_mtable-regex-X=toplevel/\/\*//{tenter=comment}
   --_mtable-regex-X=toplevel/.//

   --_mtable-regex-X=comment/\*\///{tleave}
   --_mtable-regex-X=comment/.//

Four ``--_mtable-regex-X`` lines are added for skipping the block comments. Let's
discuss them one by one.

For each new file it scans, ``ctags`` always chooses the first pattern of the
first table of the parser. Even if it's an empty table, ``ctags`` will only try
the first declared table. (In such a case it would immediately fail to match
anything, and thus stop processing the input file and effectively do nothing.)

The first declared table (``toplevel``) has the following regex added to
it first:

.. code-block:: perl

	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}

A pattern of ``\/\*`` is added to the ``toplevel`` table, to match the
beginning of a block comment. A backslash character is used in front of the
leading ``/`` to escape the separation character ``/`` that separates the fields
of ``--_mtable-regex-<LANG>``. Another backslash inside the pattern is used
before the asterisk ``*``, to make it a literal asterisk character in regex.

The last ``//`` means ``ctags`` should not tag something matching this pattern.
In ``--regex-<LANG>`` you never use ``//`` because it would be pointless to
match something and not tag it using a single-line ``--regex-<LANG>``; in
multi-line ``--mline-regex-<LANG>`` you rarely see it, because it would rarely
be useful. But in multi-table regex it's quite common, since you frequently
want to transition from one state to another (i.e., ``tenter`` or ``tjump``
from one table to another).

The long flag added to our first regex of our first table is ``tenter``, which
is a long flag for switching the table and pushing on the stack. ``{tenter=comment}``
means "switch the table from toplevel to comment".

So given the input file :file:`input.x` shown earlier, ``ctags`` will begin at
the ``toplevel`` table and try to match the first regex. It will succeed, and
thus push on the stack and go to the ``comment`` table.

It will begin at the top of the ``comment`` table (it always begins at the top
of a given table), and try each regex line in sequence until it finds a match.
If it fails to find a match, it will pop the stack and go to the table that was
just popped from the stack, and begin trying to match at the top of *that* table.
If it continues failing to find a match, and ultimately reaches the end of the
stack, it will stop processing for this file. For the next input file, it will
begin again from the top of the first declared table.

Getting back to our example, the top of the ``comment`` table has this regex:

.. code-block:: perl

	--_mtable-regex-X=comment/\*\///{tleave}

Similar to the previous ``toplevel`` table pattern, this one for ``\*\/`` uses
a backslash to escape the separator ``/``, as well as one before the ``*`` to
make it a literal asterisk in regex. So what it's looking for, from a simple
string perspective, is the sequence ``*/``. Note that this means even though
you see three slashes ``///`` at the end, the first one is escaped and used
for the pattern itself, and the ``--_mtable-regex-X`` only has ``//`` to
separate the regex pattern from the long flags, instead of the usual ``///``.
Thus it's using the shorthand form of the ``--_mtable-regex-X`` option.
It could instead have been:

.. code-block:: perl

	--_mtable-regex-X=comment/\*\////{tleave}

The above would have worked exactly the same.
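
The field splitting described above can be sketched in Python. This is a
simplified illustration and not the parser ``ctags`` uses:
``split_mtable_value`` is a hypothetical name, and turning an escaped ``\/``
into a bare ``/`` in the resulting pattern is an assumption of the sketch.

.. code-block:: python

   def split_mtable_value(value, sep="/"):
       """Split "<TABLE>/<PATTERN>/<NAME>/[<KIND>/]<FLAGS>" on unescaped separators."""
       parts, current, i = [], [], 0
       while i < len(value):
           ch = value[i]
           if ch == "\\" and i + 1 < len(value):
               nxt = value[i + 1]
               # Assumption of this sketch: "\/" yields "/", other escapes are kept.
               current.append(nxt if nxt == sep else ch + nxt)
               i += 2
               continue
           if ch == sep and len(parts) < 4:
               parts.append("".join(current))
               current = []
           else:
               current.append(ch)
           i += 1
       parts.append("".join(current))
       if len(parts) == 4:
           # Shorthand form: no <KIND> field between <NAME> and the flags.
           table, pattern, name, flags = parts
           kind = ""
       elif len(parts) == 5:
           table, pattern, name, kind, flags = parts
       else:
           raise ValueError("too few separators in: " + value)
       return {"table": table, "pattern": pattern, "name": name,
               "kind": kind, "flags": flags}

   short_form = split_mtable_value(r"comment/\*\///{tleave}")
   long_form = split_mtable_value(r"comment/\*\////{tleave}")
   print(short_form == long_form, short_form["pattern"], short_form["flags"])

Both spellings end up with the same table, pattern and flags in this sketch,
which is the point made above about the shorthand form.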

Getting back to our example, remember we're looking at the :file:`input.x`
file, currently using the ``comment`` table, and trying to match the first
regex of that table, shown above, at the following location::

	   ,ctags is trying to match starting here
	  v
	/* BLOCK COMMENT
	var dont_capture_me;
	*/
	var a /* ANOTHER BLOCK COMMENT */, b;

The pattern doesn't match for the position just after ``/*``, because that
position is a space character. So ``ctags`` tries the next pattern in the same
table:

.. code-block:: perl

	--_mtable-regex-X=comment/.//

This pattern matches any one character, including newline; the current
position moves one character forward. Now the character at the current position is
``B``. The first pattern of the table, ``*/``, still does not match the input. So
``ctags`` uses the next pattern again. When the current position moves to the ``*/``
on the 3rd line of :file:`input.x`, it will finally match this:

.. code-block:: perl

	--_mtable-regex-X=comment/\*\///{tleave}

In this pattern, the long flag ``{tleave}`` is specified. This triggers table
switching again. ``{tleave}`` makes ``ctags`` switch the table back to the last
table used before doing ``{tenter}``. In this case, ``toplevel`` is the table.
``ctags`` manages a stack where references to tables are put. ``{tenter}`` pushes
the current table to the stack. ``{tleave}`` pops the table at the top of the
stack and chooses it.

So now ``ctags`` is back to the ``toplevel`` table, and tries the first regex
of that table, which was this:

.. code-block:: perl

	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}

It tries to match that against its current position, which is now the
newline on line 3, between the ``*/`` and the word ``var``::

	/* BLOCK COMMENT
	var dont_capture_me;
	*/ <--- ctags is now at this newline (\n) character
	var a /* ANOTHER BLOCK COMMENT */, b;

The first regex of the ``toplevel`` table does not match a newline, so it tries
the second regex:

.. code-block:: perl

	--_mtable-regex-X=toplevel/.//

This matches a newline successfully, but has no actions to perform. So ``ctags``
moves one character forward (the newline it just matched), and goes back to the
top of the ``toplevel`` table, and tries the first regex again. Eventually we'll
reach the beginning of the second block comment, and do the same things as before.

When ``ctags`` finally reaches the end of the file (the position after ``b;``),
it will not be able to match either the first or second regex of the
``toplevel`` table, and quit processing the input file.
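
The walk through the tables can be replayed with a short Python sketch. It is a
toy model built only from the description above, with Python's ``re`` standing
in for the regex engine; the ``TABLES`` layout and the ``trace`` function are
hypothetical names. It records each table switch together with the input line
on which it happens.

.. code-block:: python

   import re

   # The 3rd version of X.ctags, reduced to (pattern, action) rows per table.
   TABLES = {
       "toplevel": [(r"/\*", ("tenter", "comment")), (r".", None)],
       "comment": [(r"\*/", ("tleave", None)), (r".", None)],
   }

   INPUT = (
       "/* BLOCK COMMENT\n"
       "var dont_capture_me;\n"
       "*/\n"
       "var a /* ANOTHER BLOCK COMMENT */, b;\n"
   )

   def trace(text, tables, first="toplevel"):
       """Return (line, table) for every table switch made while scanning `text`."""
       table, stack, pos, switches = first, [], 0, []
       while pos < len(text):
           for pattern, action in tables[table]:
               # DOTALL lets "." consume newlines, as the walk-through describes.
               match = re.compile(pattern, re.DOTALL).match(text, pos)
               if match:
                   break
           else:
               break  # no row matched; this sketch simply stops
           line = text.count("\n", 0, match.start()) + 1
           if action and action[0] == "tenter":
               stack.append(table)  # remember the table to come back to
               table = action[1]
               switches.append((line, table))
           elif action and action[0] == "tleave":
               table = stack.pop()  # return to the remembered table
               switches.append((line, table))
           pos = match.end()
       return switches

   print(trace(INPUT, TABLES))

For :file:`input.x` the sketch reports a switch to ``comment`` on line 1, back
to ``toplevel`` on line 3, and one more round trip on line 4.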

So far, we've successfully skipped over block comments for our new ``X``
language, but haven't generated any tags. The point of ``ctags`` is to generate
tags, not just keep your computer warm. So now let's move onto actually tagging
variables...


Capturing variables in a sequence
......................................................................

Here is the 4th version of :file:`X.ctags`:

.. code-block:: perl

	--langdef=X
	--map-X=.x
	--kinddef-X=v,var,variables

	--_tabledef-X=toplevel
	--_tabledef-X=comment
	--_tabledef-X=vars

	--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
	# NEW
	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
	--_mtable-regex-X=toplevel/.//

	--_mtable-regex-X=comment/\*\///{tleave}
	--_mtable-regex-X=comment/.//

	# NEW
	--_mtable-regex-X=vars/;//{tleave}
	--_mtable-regex-X=vars/\/\*//{tenter=comment}
	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
	--_mtable-regex-X=vars/.//

One pattern in ``toplevel`` was added, and a new table ``vars`` with four
patterns was also added.

The new regex in ``toplevel`` is this:

.. code-block:: perl

	--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}

The purpose of this being in `toplevel` is to switch to the `vars` table when
the keyword ``var`` is found in the input stream. We need to switch states
(i.e., tables) because we can't simply capture the variables ``a`` and ``b``
with a single regex pattern in the ``toplevel`` table, because there might be
block comments inside the ``var`` statement (as there are in our
:file:`input.x`), and we also need to create *two* tags: one for ``a`` and one
for ``b``, even though the word ``var`` only appears once. In other words, we
need to "remember" that we saw the keyword ``var``, when we later encounter the
names ``a`` and ``b``, so that we know to tag each of them; and saving that
"in-variable-statement" state is accomplished by switching tables to the
``vars`` table.

The first regex in our new ``vars`` table is:

.. code-block:: perl

	--_mtable-regex-X=vars/;//{tleave}

This pattern is used to match a single semi-colon ``;``, and if it matches
pop back to the ``toplevel`` table using the ``{tleave}`` long flag. We
didn't have to make this the first regex pattern, because it doesn't overlap
with any of the other ones other than the ``/.//`` last one (which must be
last for this example to work).

The second regex in our ``vars`` table is:

.. code-block:: perl

	--_mtable-regex-X=vars/\/\*//{tenter=comment}

We need this because block comments can be in variable definitions::

   var a /* ANOTHER BLOCK COMMENT */, b;

So to skip block comments in such a position, the pattern ``\/\*`` is used just
like it was used in the ``toplevel`` table: to find the literal ``/*`` beginning
of the block comment and enter the ``comment`` table. Because we're using
``{tenter}`` and ``{tleave}`` to push/pop from a stack of tables, we can
use the same ``comment`` table for both ``toplevel`` and ``vars`` to go to,
because ``ctags`` will "remember" the previous table and ``{tleave}`` will
pop back to the right one.

The third regex in our ``vars`` table is:

.. code-block:: perl

	--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/

This is nothing special, but is the one that actually tags something: it
captures the variable name and uses it for generating a ``variable`` (shorthand
``v``) tag kind.

The last regex in the ``vars`` table we've seen before:

.. code-block:: perl

	--_mtable-regex-X=vars/.//

This makes ``ctags`` ignore any other characters, such as whitespace or the
comma ``,``.


Running our example
......................................................................

.. code-block:: console

	$ cat input.x
	/* BLOCK COMMENT
	var dont_capture_me;
	*/
	var a /* ANOTHER BLOCK COMMENT */, b;

	$ u-ctags -o - --fields=+n --options=X.ctags input.x
	a	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4
	b	input.x	/^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;"	v	line:4

It works!
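
As a cross-check, the 4th version of :file:`X.ctags` can be modeled in a few
lines of Python. This is a toy model based on the description in this section,
not how ``ctags`` is implemented: the row layout of ``TABLES`` and the ``run``
function are made up for illustration, Python's ``re`` stands in for the regex
engine, and only ``{tenter}`` and ``{tleave}`` are modeled.

.. code-block:: python

   import re

   # Rows are (pattern, tag name template, kind, action).
   TABLES = {
       "toplevel": [
           (r"/\*", "", "", ("tenter", "comment")),
           (r"var[ \n\t]", "", "", ("tenter", "vars")),
           (r".", "", "", None),
       ],
       "comment": [
           (r"\*/", "", "", ("tleave", None)),
           (r".", "", "", None),
       ],
       "vars": [
           (r";", "", "", ("tleave", None)),
           (r"/\*", "", "", ("tenter", "comment")),
           (r"([a-zA-Z][a-zA-Z0-9]*)", r"\1", "v", None),
           (r".", "", "", None),
       ],
   }

   INPUT = (
       "/* BLOCK COMMENT\n"
       "var dont_capture_me;\n"
       "*/\n"
       "var a /* ANOTHER BLOCK COMMENT */, b;\n"
   )

   def run(text, tables, first="toplevel"):
       """Return (name, kind, line) tags found by walking the tables over `text`."""
       table, stack, pos, tags = first, [], 0, []
       while pos < len(text):
           for pattern, name, kind, action in tables[table]:
               match = re.compile(pattern, re.DOTALL).match(text, pos)
               if match:
                   break
           else:
               # No row matched: fall back to the previous table, or stop.
               if not stack:
                   break
               table = stack.pop()
               continue
           if name:
               line = text.count("\n", 0, match.start()) + 1
               tags.append((match.expand(name), kind, line))
           pos = match.end()
           if action and action[0] == "tenter":
               stack.append(table)
               table = action[1]
           elif action and action[0] == "tleave":
               table = stack.pop()
       return tags

   print(run(INPUT, TABLES))

The model yields ``a`` and ``b`` as kind ``v`` on line 4 and nothing for
``dont_capture_me``, in line with the ``u-ctags`` output above.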

You can find additional examples of multi-table regex in our GitHub repo, under
the ``optlib`` directory. For example ``puppetManifest.ctags`` is a serious
example. It is the primary parser for testing multi-table regex parsers, and
is used in the actual ``ctags`` program for parsing Puppet manifest files.


.. this "extras" section should probably be moved up this document, as a
	subsection in the "Regex option argument flags" section

.. _extras:

Conditional tagging with extras
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

If a matched pattern should only be tagged when an ``extra`` is enabled, mark
the pattern with ``{_extra=XNAME}``. ``XNAME`` is the name of the extra. You must
define an ``XNAME`` with the ``--_extradef-<LANG>=XNAME,DESCRIPTION`` option
before defining a regex option marked ``{_extra=XNAME}``.

.. code-block:: python

	if __name__ == '__main__':
		do_something()

To capture the above lines in a Python program (*input.py*), an extra can be used.

.. code-block:: perl

	--_extradef-Python=main,__main__ entry points
	--regex-Python=/^if __name__ == '__main__':/__main__/f/{_extra=main}

The above optlib (*python-main.ctags*) introduces the ``main`` extra to the Python
parser. The pattern matching is done only when ``main`` is enabled.

.. code-block:: console

	$ ./ctags --options=python-main.ctags -o - --extras-Python='+{main}' input.py
	__main__	input.py	/^if __name__ == '__main__':$/;"	f


.. this "fields" section should probably be moved up this document, as a
	subsection in the "Regex option argument flags" section

.. _fields:

Adding custom fields to the tag output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

Exuberant-ctags allows a specified group in a regex pattern to be
used as part of the name of a tagEntry. Universal-ctags also offers a way to
use the other groups in the regex pattern.

An optlib parser can have its own fields. The groups can be used as
values of the fields of a tagEntry.

Let's think about *Unknown*, an imaginary language.
Here is a source file (``input.unknown``) written in *Unknown*::

	public func foo(n, m);
	protected func bar(n);
	private func baz(n,...);

With `--regex-Unknown=...` Exuberant-ctags can capture `foo`, `bar`, and `baz`
as names. Universal-ctags can attach extra context information to the
names as values for fields. Let's focus on `bar`. `protected` is a
keyword to control how widely the identifier `bar` can be accessed.
`(n)` is the parameter list of `bar`. `protected` and `(n)` are
extra context information of `bar`.

With the following optlib file (``unknown.ctags``), ``ctags`` can attach
`protected` to the protection field and `(n)` to the signature field.

.. code-block:: perl

	--langdef=unknown
	--kinddef-unknown=f,func,functions
	--map-unknown=+.unknown

	--_fielddef-unknown=protection,access scope
	--_fielddef-unknown=signature,signatures

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}

	--fields-unknown=+'{protection}{signature}'

For the line `protected func bar(n);` you will get the following tags output::

	bar	input.unknown	/^protected func bar(n);$/;"	f	protection:protected	signature:(n)

Let's look at the details of ``unknown.ctags``.

.. code-block:: perl

	--_fielddef-unknown=protection,access scope

``--_fielddef-<LANG>=name,description`` defines a new field for a parser
specified by `<LANG>`.  Before defining a new field for the parser,
the parser must be defined with ``--langdef=<LANG>``. `protection` is
the field name used in tags output. `access scope` is the description
used in the output of ``--list-fields`` and ``--list-fields=Unknown``.

.. code-block:: perl

	--_fielddef-unknown=signature,signatures

This defines a field named `signature`.

.. code-block:: perl

	--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}

This option requests making a tag whose name is taken from group 3 of the
pattern, attaching group 1 as the value of the `protection` field, and attaching
group 4, wrapped in parentheses, as the value of the `signature` field. You can
use the long regex flag `_field` for attaching fields to a tag with the
following notation rule::

  {_field=FIELDNAME:GROUP}


``--fields-<LANG>=[+|-]{FIELDNAME}`` can be used to enable or disable the specified field.

A newly defined parser-own field is disabled by default. Enable the
field explicitly to use it. See :ref:`Parser own fields <parser-own-fields>`
for more about the `--fields-<LANG>` option.

The `passwd` parser is a simple example that uses the ``--fields-<LANG>`` option.
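
To see what each capture group of the ``--regex-unknown`` pattern holds, the
same regular expression can be tried in Python. This is a sketch that assumes
Python's ``re`` groups behave like those of the regex engine used by ``ctags``
for this pattern; ``fields_for`` is a hypothetical helper. Note that in Python
group 1, as written, also spans the blanks that follow the keyword, while
group 2 holds the bare keyword.

.. code-block:: python

   import re

   PATTERN = re.compile(r"^((public|protected|private) +)?func ([^\(]+)\((.*)\)")

   def fields_for(line):
       """Return the tag name and the field values drawn from `line`, or None."""
       match = PATTERN.match(line)
       if match is None:
           return None
       return {
           "name": match.group(3),                   # \3
           "protection": match.group(1) or "",       # \1, includes trailing blanks
           "signature": "(" + match.group(4) + ")",  # (\4)
       }

   for line in ["public func foo(n, m);",
                "protected func bar(n);",
                "private func baz(n,...);"]:
       print(fields_for(line))

If the trailing blank is unwanted in a field value, referring to group 2
instead of group 1 would be one way to avoid it in this sketch.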


.. this "roles" section should probably be moved up this document, as a
	subsection in the "Regex option argument flags" section

.. _roles:

Capturing reference tags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

To capture a reference tag with an optlib parser, specify a role with
`_role` long regex flag. Let's see an example:

.. code-block:: perl

	--langdef=FOO
	--kinddef-FOO=m,module,modules
	--_roledef-FOO=m.imported,imported module
	--regex-FOO=/import[ \t]+([a-z]+)/\1/m/{_role=imported}
	--extras=+r
	--fields=+r

See the line `--regex-FOO=...`.  In this parser `FOO`, the name of an
imported module is captured as a reference tag with the role `imported`.
A role must be defined before specifying it as the value of the `_role` flag.
The `--_roledef-<LANG>` option is for defining a role.

The parameter of the option comes from three components: a kind
letter, the name of role, and the description of role. The kind letter
comes first.  Following a period, give the role name. The period
represents that the role is defined under the kind specified with the
kind letter.  In the example, `imported` role is defined under
`module` kind specified with `m`.

Of course, the kind specified with the kind letter must be defined
before using the `--_roledef-<LANG>` option. The `--kinddef-<LANG>` option
is for defining a kind.

The roles are listed with `--list-roles=<LANG>`. The name and
description passed to `--_roledef-<LANG>` option are used in
the output like::

	$ ./ctags --langdef=FOO --kinddef-FOO=m,module,modules \
				--_roledef-FOO='m.imported,imported module' --list-roles=FOO
	#KIND(L/N) NAME     ENABLED DESCRIPTION
	m/module   imported on      imported module


By specifying the `_role` regex flag multiple times with different
roles, you can assign multiple roles to a reference tag.
See the following input in the C language:

.. code-block:: C

   i += 1;

An ultra fine grained C parser may capture a variable `i` with
`lvalue` and `incremented`. You can do it with:

.. code-block:: perl

	--_roledef-C=v.lvalue,locator values
	--_roledef-C=v.incremented,incremented with += operator
	--regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *\+=/\1/v/{_role=lvalue}{_role=incremented}


Submitting an optlib file to the Universal-ctags project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You are encouraged to submit your :file:`.ctags` file to our github through
a pull request.

Universal-ctags provides a facility called "Option library".
Read "Option library" for the concept and usage first.

Here I will explain how to merge your .ctags into universal-ctags as
part of the option library. I assume you are considering contributing
an option library in which a regex based language parser is defined.
See `How to Add Support for a New Language to Exuberant Ctags (EXTENDING)`_
for how to write a regex based language parser. In this
section I explain the next step.

.. _`How to Add Support for a New Language to Exuberant Ctags (EXTENDING)`: http://ctags.sourceforge.net/EXTENDING.html

I use Swine as the name of the programming language which your parser
deals with. Assume source files written in the Swine language have the suffix
*.swn*. The file name of the option library is *swine.ctags*.


Copyright notice, contact mail address and license term
......................................................................

Put this information at the header of *swine.ctags*.

An example taken from *data/optlib/ctags.ctags* ::

	#
	#
	#  Copyright (c) 2014, Red Hat, Inc.
	#  Copyright (c) 2014, Masatake YAMATO
	#
	#  Author: Masatake YAMATO <yamato@redhat.com>
	#
	# This program is free software; you can redistribute it and/or
	# modify it under the terms of the GNU General Public License
	# as published by the Free Software Foundation; either version 2
	# of the License, or (at your option) any later version.
	#
	# This program is distributed in the hope that it will be useful,
	# but WITHOUT ANY WARRANTY; without even the implied warranty of
	# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	# GNU General Public License for more details.
	#
	# You should have received a copy of the GNU General Public License
	# along with this program; if not, write to the Free Software
	# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
	# USA.
	#
	#
	...

"GPL version 2 or later version" is needed here.  Option file is not
linked to the ``ctags`` command. However, I have a plan to write a translator
which generates a *.c* file from a given option file. As a result, the
*.c* file is built into the ``ctags`` command. In such a case "GPL version 2
or later version" may be required.

*Units* test cases
......................................................................

We, the universal-ctags developers, don't have enough time to learn all
the languages supported by ``ctags``. In other words, we cannot review the
code. Only test cases help us to know whether a contributed option
library works well or not. We may reject any contribution without
a test case.

Read "Using *Units*" about how to write *Units* test
cases.  Don't write one big test case. Several smaller cases help us
understand the intent of the contributor. Good examples of small test
cases are:

* *Units/sh-alias.d*
* *Units/sh-comments.d*
* *Units/sh-quotes.d*
* *Units/sh-statements.d*

Big test cases are good if smaller test cases exist.

See also *parser-m4.r/m4-simple.d*, especially *parser-m4.r/m4-simple.d/args.ctags*.
Your test cases need ``ctags`` to have already loaded your option
library, swine.ctags. You must specify loading it in the
test case's own *args.ctags*.

Assume your test name is *swine-simile.d*. Put ``--options=swine`` in
*Units/swine-simile.d/args.ctags*.

Makefile.in
......................................................................
Add your optlib file, *swine.ctags*, to the ``PRELOAD_OPTLIB`` variable of
*Makefile.in*.


If you don't want your optlib loaded automatically when ``ctags`` starts up,
put your optlib file into ``OPTLIB`` of *Makefile.in* instead of
``PRELOAD_OPTLIB``.

Verification
......................................................................

Let's verify all your work here.

1. Run the tests and check whether your test case passes or fails::

	$ make units

2. Verify your files are installed as expected::

	$ mkdir /tmp/tmp
	$ ./configure --prefix=/tmp/tmp
	$ make
	$ make install
	$ /tmp/tmp/bin/ctags -o - --options=swine something_input.swn


Pull-request
......................................................................

Please consider submitting your well-written optlib parser to
Universal-ctags. Your *.ctags* is a treasure and can be shared as a
first class software component in Universal-ctags.

Pull-requests are welcome.