.. _optlib:
Extending ctags with Regex parser (*optlib*)
---------------------------------------------------------------------
:Maintainer: Masatake YAMATO <yamato@redhat.com>
.. TODO:
review extras, fields, and roles sections
possibly restructure this file's section ordering
add documentation for --_mtable-extend-<LANG>
add documentation for tjump, treset, tquit flags
add a section on debugging
add a section on langdef base parser flag, including
shared/dedicated/bidirectional directions
----
.. Q: shouldn't the section about option files (preload especially) go in
their own section somewhere else in the docs? They're not specifically
for "Extending ctags" - they can be used for any command options that
you want to use permanently. It's really the new language parsers using
--regex-<LANG> and such that are about "Extending ctags", no?
Option files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An "option" file is a file in which command line options are written line
by line. ``ctags`` loads it and runs as if the options in the file were
passed on the command line.
The following file is an example of an option file.
.. code-block:: python
# Exclude directories that don't contain real code
--exclude=Units
# indentation is ignored
--exclude=tinst-root
--exclude=Tmain
``#`` can be used as the start marker of a line comment.
Whitespace at the start of lines is ignored during loading.
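The loading rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the actual ctags implementation; the function name ``parse_option_file`` is hypothetical, and skipping blank lines is an assumption.

```python
def parse_option_file(text):
    """Return the options found in an option file, one per line."""
    options = []
    for raw_line in text.splitlines():
        line = raw_line.lstrip()  # whitespace at the start of a line is ignored
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and line comments
        options.append(line)
    return options


sample = """\
# Exclude directories that don't contain real code
--exclude=Units
    # indentation is ignored
    --exclude=tinst-root
--exclude=Tmain
"""

print(parse_option_file(sample))
# ['--exclude=Units', '--exclude=tinst-root', '--exclude=Tmain']
```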
There are two categories of option files, though they both contain command
line options: **preload** and **optlib** option files.
.. Q: do we really want to call the non-preload option files "optlib"?
That name seems like an internal detail. Users of ctags never see that
name anywhere except in these docs, and it's weird. How about
"specified" option files, or "requested" or some such? (i.e., the file
is explicitly specified or requested when ctags is run)
Preload option file
......................................................................
Preload option files are option files loaded by ``ctags`` automatically
at start-up time. The set of files loaded at start-up time is very different
from that of Exuberant-ctags.
At start-up time, Universal-ctags loads files having :file:`.ctags` as a
file extension under the following statically defined directories:
#. :file:`$HOME/.ctags.d`
#. :file:`$HOMEDRIVE$HOMEPATH/.ctags.d` (in ``Windows``)
#. :file:`.ctags.d`
#. :file:`ctags.d`
``ctags`` visits the directories in the order listed above for preloading files.
``ctags`` loads files having :file:`.ctags` as file extension in alphabetical
order (strcmp(3) is used for comparing, so for example
:file:`.ctags.d/ZZZ.ctags` will be loaded *before* :file:`.ctags.d/aaa.ctags`).
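The effect of byte-wise ordering is easy to check with a small Python sketch. The file names are made up for illustration, and the helper ``preload_order`` is hypothetical; it only mimics the ordering rule described above.

```python
def preload_order(filenames):
    """Order *.ctags files the way a byte-wise strcmp(3) comparison would."""
    candidates = [name for name in filenames if name.endswith(".ctags")]
    # Comparing the encoded bytes mirrors strcmp: uppercase sorts before lowercase.
    return sorted(candidates, key=lambda name: name.encode("utf-8"))


names = ["aaa.ctags", "ZZZ.ctags", "10-base.ctags", "notes.txt"]
print(preload_order(names))
# ['10-base.ctags', 'ZZZ.ctags', 'aaa.ctags']
```

A numeric prefix such as ``10-`` is one simple way to make the load order explicit.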
Quoted from man page of Exuberant-ctags::
FILES
/ctags.cnf (on MSDOS, MSWindows only)
/etc/ctags.conf
/usr/local/etc/ctags.conf
$HOME/.ctags
$HOME/ctags.cnf (on MSDOS, MSWindows only)
.ctags
ctags.cnf (on MSDOS, MSWindows only)
If any of these configuration files exist, each will
be expected to contain a set of default options
which are read in the order listed when ctags
starts, but before the CTAGS environment variable is
read or any command line options are read. This
makes it possible to set up site-wide, personal or
project-level defaults. It is possible to compile
ctags to read an additional configuration file
before any of those shown above, which will be
indicated if the output produced by the --version
option lists the "custom-conf" feature. Options
appearing in the CTAGS environment variable or on
the command line will override options specified in
these files. Only options will be read from these
files. Note that the option files are read in
line-oriented mode in which spaces are significant
(since shell quoting is not possible). Each line of
the file is read as one command line parameter (as
if it were quoted with single quotes). Therefore,
use new lines to indicate separate command-line
arguments.
What follows explains the differences and their intentions...
Directory oriented configuration management
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Exuberant-ctags provides a way to customize ctags with options like
``--langdef=<LANG>`` and ``--regex-<LANG>``. These options are
powerful and make ctags popular for programmers.
Universal-ctags extends this idea; we have added new options for
defining a parser, and have extended existing options. Defining
a new parser with the options is more than "customizing" in
Universal-ctags.
To make it easier to maintain a parser defined using the options, you can put
each parser language in a different options file. Universal-ctags doesn't
preload a single file. Instead, Universal-ctags loads all files having the
:file:`.ctags` extension under the previously specified directories. If you have
multiple parser definitions, put them in different files.
Avoiding option incompatibility issues
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
The Universal-ctags options are different from those of Exuberant-ctags,
therefore Universal-ctags doesn't load any of the files Exuberant-ctags loads at
start-up. Otherwise there would be incompatibility issues if Exuberant-ctags
loaded an option file that used a newly introduced option in Universal-ctags,
and vice versa.
No system wide configuration
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
To make the preload path list short and because it was rarely ever used,
Universal-ctags does not load any option files for system wide configuration.
(i.e., no :file:`/etc/ctags.d`)
Use :file:`.ctags` for the file extension
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Extensions :file:`.cnf` and :file:`.conf` are obsolete.
Use the unified extension :file:`.ctags` only.
Optlib option file
......................................................................
From a syntax perspective, there is no difference between optlib option files
and preload option files; ``ctags`` options are written line by line in a file.
Optlib option files are option files not loaded at start-up time
automatically. To load an optlib option file, specify a pathname
for an optlib option file with ``--options=PATHNAME`` option
explicitly. The pathname can be just the filename if it's in the
current directory.
Exuberant-ctags has the ``--options`` option, but you can only specify a
single file to load. Universal-ctags extends the option in two ways: you
can specify a directory to load all files in that directory, and you can
specify a path search list to look in. See the following sections for details.
Specifying a directory
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
If you specify a directory instead of a file as the argument for the
``--options=PATHNAME``, Universal-ctags will load all files having a
:file:`.ctags` extension under the directory in alphabetical order.
Specifying an optlib path search list
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
For loading a file (or directory) specified in ``--options=PATHNAME``,
``ctags`` searches the "optlib path list" first if the option argument
(PATHNAME) doesn't start with '``/``' or '``.``'. If ``ctags`` finds a
file, ``ctags`` loads it.
If ``ctags`` doesn't find a file in the path list, ``ctags`` loads
a file (or directory) at the specified pathname.
By default, the optlib path list is empty. To set or add a directory
path to the list, use ``--optlib-dir=PATH``.
For setting (adding one after clearing)::
--optlib-dir=PATH
For adding::
--optlib-dir=+PATH
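The lookup described in this section can be sketched as follows. This is a rough model in Python, not the ctags source: the function names are hypothetical, and the order in which the listed directories are searched is an assumption (here, the order in which they were added).

```python
import os
import posixpath


def update_optlib_dirs(dirs, arg):
    """Apply one --optlib-dir=ARG: '+PATH' adds, plain 'PATH' clears then adds."""
    if arg.startswith("+"):
        return dirs + [arg[1:]]
    return [arg]


def resolve_options_path(pathname, optlib_dirs, exists=os.path.exists):
    """Return the path that --options=PATHNAME would load under this model."""
    if not pathname.startswith(("/", ".")):
        for directory in optlib_dirs:  # search order is an assumption
            candidate = posixpath.join(directory, pathname)
            if exists(candidate):
                return candidate
    # Not found in the path list (or the list was skipped): use the pathname as given.
    return pathname


dirs = update_optlib_dirs([], "/opt/a")       # --optlib-dir=/opt/a
dirs = update_optlib_dirs(dirs, "+/opt/b")    # --optlib-dir=+/opt/b
present = {"/opt/b/foo.ctags"}                # pretend file system
print(resolve_options_path("foo.ctags", dirs, present.__contains__))
# /opt/b/foo.ctags
```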
Tips for writing an option file
......................................................................
* Use ``--quiet --options=NONE`` to disable preloading.
.. IN MAN PAGE
* Two options are introduced for debugging the process of loading
option files.
``--_echo=MSG``
Prints MSG to standard error immediately.
``--_force-quit=[NUM]``
Exit immediately with the status of the specified NUM.
* Universal-ctags has an ``optlib2c`` script that translates an option file
into C source code. Your optlib parser can thus easily become a built-in parser
if you contribute it to the Universal-ctags project on GitHub. You could be famous!
Examples are in the ``optlib`` directory in the Universal-ctags source tree.
Regular expression (regex) engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Universal-ctags currently uses the same regex engine as Exuberant-ctags does:
the POSIX.2 regex engine in GNU glibc-2.10.1. By default it uses the Extended
Regular Expressions (ERE) syntax, as used by most engines today; however it does
*not* support many of the "modern" extensions such as lazy quantifiers,
non-capturing grouping, atomic grouping, possessive quantifiers, look-ahead/behind,
etc. It is also notoriously slow when backtracking, and has some known "quirks"
with respect to escaping special characters in bracket expressions.
For example, a pattern of ``[^\]]+`` is invalid in POSIX.2, because the backslash is
*not* special inside a bracket expression, and thus ``]`` should **not** be escaped.
Most regex engines ignore this subtle detail in POSIX.2, and instead allow
escaping it with ``\]`` inside the bracket expression and treat it as the
literal character ``]``. GNU glibc, however, does not generate an error but
instead considers it undefined behavior, and in fact it will match very odd
things. Instead you **must** use the more unintuitive ``[^]]+`` syntax. The same
is technically true of other special characters inside a bracket expression,
such as ``[^\)]+``, which should instead be ``[^)]+``. The ``[^\)]+`` will
appear to work usually, but only because what it is really doing is matching any
character but ``\`` *or* ``)``. The only exceptions for using ``\`` inside a
bracket expression are for ``\t`` and ``\n``, which ctags converts to their
single literal character control codes before passing the pattern to glibc.
Another detail to keep in mind is how the regex engine treats newlines.
Universal-ctags compiles the regular expressions in the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options with REG_NEWLINE set. What that means is documented
in the
`POSIX spec <http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html>`_.
One obvious effect is that the regex special dot any-character ``.`` does not match
newline characters, the ``^`` anchor *does* match right after a newline, and
the ``$`` anchor matches right before a newline. A more subtle issue is this text from the
`Regular Expressions chapter <http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html>`_:
"the use of literal <newline>s or any escape sequence equivalent produces undefined
results". What that means is using a regex pattern with ``[^\n]+`` is invalid,
and indeed in glibc produces very odd results. **Never** use ``\n`` in patterns
for ``--regex-<LANG>``, and never use them in non-matching bracket expressions
for ``--mline-regex-<LANG>`` patterns. For the experimental ``--_mtable-regex-<LANG>``
you can safely use ``\n`` because that regex is not compiled with REG_NEWLINE.
You should always test your regex patterns against test files with strings that
do and do not match. Pay particular attention to when it should *not* match, and
how *much* it matches when it should. A common error is forgetting that a
POSIX.2 ERE engine is always greedy; the ``*`` and ``+`` quantifiers match
as much as possible, before backtracking from the end of their match.
For example this pattern::
foo.*bar
Will match this **entire** string, not just the first part::
foobar, bar, and even more bar
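The greedy behavior can be reproduced quickly outside ctags. The sketch below uses Python's ``re`` module, which is *not* the POSIX.2 engine ctags uses, but its ``*`` quantifier is also greedy by default, so the outcome for this pattern is the same. The second pattern shows one common workaround that stays within ERE syntax: a negated bracket expression that stops at a delimiter (the comma is chosen here only for this sample string).

```python
import re

text = "foobar, bar, and even more bar"

greedy = re.search(r"foo.*bar", text)
print(greedy.group(0))
# foobar, bar, and even more bar

# ERE has no lazy quantifiers, so restrict what may appear between the ends.
bounded = re.search(r"foo[^,]*bar", text)
print(bounded.group(0))
# foobar
```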
Regex option argument flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many regex-based options described in this document support additional arguments
in the form of long flags. Long flags are specified with surrounding ``{`` and
``}``.
The general format and placement is as follows::
--regex-<LANG>=/<PATTERN>/<NAME>/[<KIND>/]LONGFLAGS
Some examples:
.. code-block:: perl
--regex-Pod=/^=head1[ \t]+(.+)/\1/c/
--regex-Foo=/set=([^;]+)/\1/v/{icase}
--regex-Man=/^\.TH[[:space:]]{1,}"([^"]{1,})".*/\1/t/{exclusive}{icase}{scope=push}
--regex-Gdbinit=/^#//{exclusive}
Note that the last example only has two ``/`` forward slashes following
the regex pattern, as a shortened form when no kind-spec exists.
The ``--mline-regex-<LANG>`` option also follows the above format. The
experimental ``--_mtable-regex-<LANG>`` option follows a slightly
modified version as well.
The ``--langdef=<LANG>`` option also supports long flags, but not using
forward-slash separators.
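To make the long-flag notation concrete, here is a small Python sketch that splits a flags string such as ``{exclusive}{icase}{scope=push}`` into name/value pairs. It only illustrates the notation; the function name ``parse_long_flags`` is hypothetical and this is not how ctags itself parses its options.

```python
import re

# One long flag: {name} or {name=value}
LONG_FLAG = re.compile(r"\{([^{}=]+)(?:=([^{}]*))?\}")


def parse_long_flags(flags):
    """Return (name, value) pairs; value is None for flags without '='."""
    return [(m.group(1), m.group(2)) for m in LONG_FLAG.finditer(flags)]


print(parse_long_flags("{exclusive}{icase}{scope=push}"))
# [('exclusive', None), ('icase', None), ('scope', 'push')]
```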
Regex control flags
......................................................................
.. Q: why even discuss the single-character version of the flags? Just
make everyone use the long form.
The regex matching can be controlled by adding flags to the ``--regex-<LANG>``,
``--mline-regex-<LANG>``, and experimental ``--_mtable-regex-<LANG>`` options.
This is done by either using the single-character short flags ``b``, ``e`` and
``i`` as explained in the *ctags.1* man page, or by using the long flags
described earlier. The long flags require more typing but are much more
readable.
The mapping between the older short flag names and long flag names is:
=========== =========== ===================================================
short flag  long flag   description
=========== =========== ===================================================
b           basic       POSIX basic regular expression syntax.
e           extend      POSIX extended regular expression syntax (default).
i           icase       Case-insensitive matching.
=========== =========== ===================================================
So the following ``--regex-<LANG>`` expression:
.. code-block:: perl
--regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d,definition/x
is the same as:
.. code-block:: perl
--regex-m4=/^m4_define\(\[([^]$\(]+).+$/\1/d,definition/{exclusive}
The characters ``{`` and ``}`` may not be suitable for command line
use, but long flags are mostly intended for option files.
Exclusive flag in regex
......................................................................
By default, lines read from the input files will be matched with **all** regular
expressions defined with ``--regex-<LANG>``. Each matched regular expression
will successfully emit a tag.
In some cases another policy, exclusive-matching, is preferable to the
all-matching policy. Exclusive-matching means that, for a given input line,
the remaining regular expressions are not tried once one of the regular
expressions has matched successfully.
For specifying exclusive-matching the flags ``exclusive`` (long) and ``x``
(short) were introduced. For example, this is used in
:file:`optlib/gdbinit.ctags` for ignoring comment lines in ``gdb`` files,
as follows:
.. code-block:: perl
--regex-Gdbinit=/^#//{exclusive}
Comments in gdb files start with ``#``, so the above line is the first regex
match line in :file:`gdbinit.ctags`. When it matches, subsequent regex matches
are not tried for the input line.
If an empty name pattern (``//``) is used for the ``--regex-<LANG>`` option,
ctags warns that the option is being used incorrectly. However, if the flag
``exclusive`` or ``x`` is specified, the warning is suppressed.
NOTE: This flag does not make sense in the multi-line ``--mline-regex-<LANG>``
option nor the multi-table ``--_mtable-regex-<LANG>`` option.
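The difference between the two policies can be modeled with a short Python sketch. The rule list, the ``define`` pattern, and the function ``tag_line`` are made up for illustration; only the first rule mirrors the ``--regex-Gdbinit=/^#//{exclusive}`` line shown above.

```python
import re

# (pattern, tag name template, kind letter, exclusive?)
RULES = [
    (re.compile(r"^#"), "", None, True),                        # comment line, no tag
    (re.compile(r"define[ \t]+([a-z_]+)"), r"\1", "d", False),  # hypothetical rule
]


def tag_line(line, rules):
    """Try rules in definition order; stop after a matching exclusive rule."""
    tags = []
    for pattern, name, kind, exclusive in rules:
        m = pattern.search(line)
        if m is None:
            continue
        if name:  # an empty name pattern emits no tag
            tags.append((m.expand(name), kind))
        if exclusive:
            break  # remaining rules are not tried for this line
    return tags


print(tag_line("# define foo", RULES))  # []
print(tag_line("define bar", RULES))    # [('bar', 'd')]
```

Without the exclusive flag on the first rule, the commented line would also reach the second rule and produce a tag for ``foo``.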
Experimental flags
......................................................................
.. note:: These flags are experimental. They apply to all regex option
types: basic ``--regex-<LANG>``, multi-line ``--mline-regex-<LANG>``,
and the experimental multi-table ``--_mtable-regex-<LANG>`` option.
``_extra``
This flag indicates the tag should only be generated if the given
'extra' type is enabled, as explained in :ref:`extras`.
``_field``
This flag allows a regex match to add additional custom fields to the
generated tag entry, as explained in :ref:`fields`.
``_role``
This flag allows a regex match to generate a reference tag entry and
specify the role of the reference, as explained in :ref:`roles`.
Ghost kind in regex parser
......................................................................
.. Q: what is the point of documenting this?
If a whitespace character is used as a kind letter, it is never printed when
ctags is called with the ``--list-kinds`` option. This kind is
automatically assigned to an empty name pattern.
Normally you don't need to know this.
Scope tracking in a regex parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. IN MAN PAGE
With the ``scope`` long flag, you can record/track scope context.
A stack is used for tracking the scope context.
``{scope=push}``
Push the tag captured with a regex pattern to the top of the stack.
If you don't want to record this tag but just push, use the
``placeholder`` long flag together with it.
``{scope=ref}``
Refer to the thing at the top of the stack as a scope where the tag captured
with a regex pattern is. The stack is not modified with this specification.
If the stack is empty, this flag is just ignored.
``{scope=pop}``
Pop the thing at the top of the stack.
If the stack is empty, this flag is just ignored.
``{scope=clear}``
Make the stack empty.
``{scope=set}``
Clear then push.
``{placeholder}``
Don't print a tag captured with a regex pattern to a tag file. This is
useful when you need to push non-named context information to the stack.
A well-known non-named scope in the C language is established with ``{``.
A non-named scope never appears in a tags file as a name or scope name. However,
pushing it is important to balance ``push`` and ``pop``.
Example 1:
.. code-block:: python
# in /tmp/input.foo
class foo:
def bar(baz):
print(baz)
class goo:
def gar(gaz):
print(gaz)
.. code-block:: perl
# in /tmp/foo.ctags:
--langdef=Foo
--map-Foo=+.foo
--regex-Foo=/^class[[:blank:]]+([[:alpha:]]+):/\1/c,class/{scope=set}
--regex-Foo=/^[[:blank:]]+def[[:blank:]]+([[:alpha:]]+).*:/\1/d,definition/{scope=ref}
.. code-block:: console
$ ctags --options=/tmp/foo.ctags -o - /tmp/input.foo
bar /tmp/input.foo /^ def bar(baz):$/;" d class:foo
foo /tmp/input.foo /^class foo:$/;" c
gar /tmp/input.foo /^ def gar(gaz):$/;" d class:goo
goo /tmp/input.foo /^class goo:$/;" c
Example 2:
.. code-block:: c
// in /tmp/input.pp
class foo {
int bar;
}
.. code-block:: perl
# in /tmp/pp.ctags:
--langdef=pp
--map-pp=+.pp
--regex-pp=/^[[:blank:]]*\}//{scope=pop}{exclusive}
--regex-pp=/^class[[:blank:]]*([[:alnum:]]+)[[:blank:]]*\{/\1/c,class,classes/{scope=push}
--regex-pp=/^[[:blank:]]*int[[:blank:]]*([[:alnum:]]+)/\1/v,variable,variables/{scope=ref}
.. code-block:: console
$ ctags --options=/tmp/pp.ctags -o - /tmp/input.pp
bar /tmp/input.pp /^ int bar;$/;" v class:foo
foo /tmp/input.pp /^class foo {$/;" c
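The stack handling in Example 2 can be traced with a simplified Python model. It is a sketch of the behavior described in this section, not the ctags implementation: the function ``apply_rules`` and the tuple layout are hypothetical, the POSIX character classes are rewritten for Python's ``re`` module, and the scope of a pushed tag itself (relevant for nesting) is deliberately left out.

```python
import re


def apply_rules(lines, rules):
    """Replay regex rules with {scope=...}, {placeholder} and {exclusive}."""
    stack, tags = [], []  # stack holds (kind, name) scope entries
    for line in lines:
        for regex, kind, action, placeholder, exclusive in rules:
            m = regex.search(line)
            if m is None:
                continue
            name = m.group(1) if regex.groups else ""
            # scope=ref: use the top of the stack, ignored if the stack is empty
            scope = stack[-1] if action == "ref" and stack else None
            if name and not placeholder:
                tags.append((name, kind, scope))
            if action in ("clear", "set"):
                stack.clear()
            if action in ("push", "set"):
                stack.append((kind, name))
            elif action == "pop" and stack:
                stack.pop()
            if exclusive:
                break
    return tags


PP_RULES = [
    (re.compile(r"^[ \t]*\}"), None, "pop", False, True),
    (re.compile(r"^class[ \t]*([A-Za-z0-9]+)[ \t]*\{"), "class", "push", False, False),
    (re.compile(r"^[ \t]*int[ \t]*([A-Za-z0-9]+)"), "variable", "ref", False, False),
]

result = apply_rules(["class foo {", "    int bar;", "}"], PP_RULES)
print(result)
# [('foo', 'class', None), ('bar', 'variable', ('class', 'foo'))]
```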
NOTE: This flag doesn't work well with ``--mline-regex-<LANG>=``.
Overriding the letter for file kind
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. Q: this was fixed in https://github.com/universal-ctags/ctags/pull/331
so can we remove this section?
One of the built-in tag kinds in Universal-ctags is the ``F`` file kind.
Overriding the letter for file kind is not allowed in Universal-ctags.
.. warning::
Don't use ``F`` as a kind letter in your parser. (See issue #317 on GitHub)
Generating fully qualified tags automatically from scope information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If scope fields are filled properly with `{scope=...}` regex flags,
you can use the field values for generating fully qualified tags.
About the `{scope=..}` flag itself, see "FLAGS FOR --regex-<LANG>
OPTION" section of `ctags-optlib(7)` man page or
`Universal-ctags parser definition language <https://github.com/universal-ctags/ctags/blob/master/man/ctags-optlib.7.rst.in>`_.
Specify ``{_autoFQTag}`` at the end of the ``--langdef=<LANG>`` option, like
``--langdef=Foo{_autoFQTag}``, to make ctags generate fully qualified
tags automatically.
``.`` is the default separator combining names into a fully qualified
tag. It is not customizable yet.
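Conceptually, a fully qualified tag is the chain of enclosing scope names joined with the separator, followed by the tag name. The Python sketch below illustrates only that idea; the function ``with_qualified_tags`` and the tuple layout are hypothetical and do not reflect ctags internals.

```python
def with_qualified_tags(tags, separator="."):
    """Return sorted tag names, adding a qualified name for each scoped tag."""
    names = []
    for name, scope_path in tags:
        names.append(name)
        if scope_path:  # only tags inside a scope get a qualified variant
            names.append(separator.join(scope_path + [name]))
    return sorted(names)


# A class X containing a variable y
tags = [("X", []), ("y", ["X"])]
print(with_qualified_tags(tags))
# ['X', 'X.y', 'y']
```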
input.foo::
class X
var y
end
foo.ctags::
--langdef=foo{_autoFQTag}
--map-foo=+.foo
--kinddef-foo=c,class,classes
--kinddef-foo=v,var,variables
--regex-foo=/class ([A-Z]*)/\1/c/{scope=push}
--regex-foo=/end///{placeholder}{scope=pop}
--regex-foo=/[ \t]*var ([a-z]*)/\1/v/{scope=ref}
Output::
$ u-ctags --quiet --options=NONE --options=./foo.ctags -o - input.foo
X input.foo /^class X$/;" c
y input.foo /^ var y$/;" v class:X
$ u-ctags --quiet --options=NONE --options=./foo.ctags --extras=+q -o - input.foo
X input.foo /^class X$/;" c
X.y input.foo /^ var y$/;" v class:X
y input.foo /^ var y$/;" v class:X
"X.y" is printed as a fully qualified tag when ``--extras=+q`` is given.
Multi-line pattern match
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We often need to scan multiple lines to generate a tag, whether due to
needing contextual information to decide whether to tag or not, or to
constrain generating tags to only certain cases, or to grab multiple
substrings to generate the tag name.
Universal-ctags has two ways to accomplish this: multi-line regex options,
and experimental multi-table regex options described later.
The newly introduced ``--mline-regex-<LANG>`` is similar to ``--regex-<LANG>``
except the pattern is applied to the whole file's contents, not line by line.
This example is based on issue #219 posted by @andreicristianpetcu:
.. code-block:: java
// in input.java:
@Subscribe
public void catchEvent(SomeEvent e)
{
return;
}
@Subscribe
public void
recover(Exception e)
{
return;
}
The above Java code is similar to code written for the Java `Spring <https://spring.io>`_
framework. The ``@Subscribe`` annotation is a keyword for the framework, and the
developer would like to have a tag generated for each method annotated with
``@Subscribe``, using the name of the method followed by a dash followed by the
type of the argument. For example the developer wants the tag name
``Event-SomeEvent`` generated for the first method shown above.
To accomplish this, the developer creates a :file:`spring.ctags` file with
the following:
.. code-block:: perl
# in spring.ctags:
--langdef=javaspring
--map-javaspring=+.java
--mline-regex-javaspring=/@Subscribe([[:space:]])*([a-z ]+)[[:space:]]*([a-zA-Z]*)\(([a-zA-Z]*)/\3-\4/s,subscription/{mgroup=3}
--fields=+ln
And now using :file:`spring.ctags` the tag file has this:
.. code-block:: console
$ ./ctags -o - --options=./spring.ctags input.java
Event-SomeEvent input.java /^public void catchEvent(SomeEvent e)$/;" s line:2 language:javaspring
recover-Exception input.java /^ recover(Exception e)$/;" s line:10 language:javaspring
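The matching can be approximated in Python to see why the first tag is named ``Event-SomeEvent``: the greedy ``[a-z ]+`` group consumes ``public void catch`` and leaves only ``Event`` for group 3. This is a sketch using Python's ``re`` module with ``\s`` standing in for ``[[:space:]]``; the sample text and its layout are assumptions, so its line numbers need not match the listing above. The line number is derived from the start offset of group 3, in the spirit of ``{mgroup=3}``.

```python
import re

SOURCE = """\
@Subscribe
public void catchEvent(SomeEvent e)
{
    return;
}

@Subscribe
public void
    recover(Exception e)
{
    return;
}
"""

PATTERN = re.compile(r"@Subscribe(\s)*([a-z ]+)\s*([a-zA-Z]*)\(([a-zA-Z]*)")


def subscription_tags(text, mgroup=3):
    """Match against the whole text; report the line where group `mgroup` starts."""
    tags = []
    for m in PATTERN.finditer(text):
        name = m.expand(r"\3-\4")
        line = text.count("\n", 0, m.start(mgroup)) + 1
        tags.append((name, line))
    return tags


print(subscription_tags(SOURCE))
# [('Event-SomeEvent', 2), ('recover-Exception', 9)]
```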
Multiline pattern flags
......................................................................
.. note:: These flags also apply to the experimental ``--_mtable-regex-<LANG>``
option described later.
``{mgroup=N}``
This flag indicates the pattern should be applied to the whole file
contents, not line by line. ``N`` is the number of a capture group in the
pattern, which is used to record the line number location of the tag. In the
above example ``3`` is specified. The start position of the regex capture
group 3, relative to the whole file is used.
.. warning:: You **must** add an ``{mgroup=N}`` flag to the multi-line
``--mline-regex-<LANG>`` option, even if the ``N`` is ``0`` (meaning the
start position of the whole regex pattern). You do not need to add it for
the multi-table ``--_mtable-regex-<LANG>``.
.. Q: isn't the above restriction really a bug? I think it is. I should fix it.
``{_advanceTo=N[start|end]}``
A regex pattern is applied to the whole file's contents iteratively. This long
flag specifies from where the pattern should be applied in the next
iteration for regex matching. When a pattern matches, the next pattern
matching starts from the start or end of capture group ``N``. By default it
advances to the end of the whole match (i.e., ``{_advanceTo=0end}`` is
the default).
Consider the following input
::
def def abc
Consider two sets of options, foo and bar.
.. code-block:: perl
# foo.ctags:
--langdef=foo
--langmap=foo:.foo
--kinddef-foo=a,something,something
--mline-regex-foo=/def *([a-z]+)/\1/a/{mgroup=1}
.. code-block:: perl
# bar.ctags:
--langdef=bar
--langmap=bar:.bar
--kinddef-bar=a,something,something
--mline-regex-bar=/def *([a-z]+)/\1/a/{mgroup=1}{_advanceTo=1start}
*foo.ctags* emits the following tags output::
def input.foo /^def def abc$/;" a
*bar.ctags* emits the following tags output::
def input-0.bar /^def def abc$/;" a
abc input-0.bar /^def def abc$/;" a
``_advanceTo=1start`` is specified in *bar.ctags*.
This allows ctags to capture "abc".
At the first iteration, the patterns of both
*foo.ctags* and *bar.ctags* match as follows
::

   0   1       (start)
   v   v
   def def abc
          ^
          0,1  (end)

"def" at group 1 is captured as a tag in
both languages. At the next iteration, the positions
at which pattern matching resumes differ between
the two languages.
*foo.ctags*
::
          0end (default)
          v
   def def abc
*bar.ctags*
::
       1start (as specified in _advanceTo long flag)
       v
   def def abc
This difference in positions explains the difference in the tags output.
A more relevant use-case is when ``{_advanceTo=N[start|end]}`` is used in
the experimental ``--_mtable-regex-<LANG>``, to "advance" back to the
beginning of a match, so that one can generate multiple tags for the same
input line(s).
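The iteration for *foo.ctags* and *bar.ctags* can be simulated in Python. This is a sketch of the idea only: the function ``mline_tags`` is hypothetical, Python's ``re`` stands in for the POSIX engine, and the guard that forces progress when the position would not advance is an assumption rather than documented ctags behavior.

```python
import re


def mline_tags(pattern, text, group=0, edge="end"):
    """Collect group 1 of each match, resuming at the start/end of `group`."""
    regex = re.compile(pattern)
    tags, pos = [], 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        tags.append(m.group(1))
        nxt = m.start(group) if edge == "start" else m.end(group)
        pos = nxt if nxt > pos else pos + 1  # assumed guard: always make progress
    return tags


text = "def def abc"
print(mline_tags(r"def *([a-z]+)", text))                         # default 0end
# ['def']
print(mline_tags(r"def *([a-z]+)", text, group=1, edge="start"))  # 1start
# ['def', 'abc']
```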
.. note:: This flag doesn't work well with scope related flags and ``exclusive`` flags.
.. Q: this was previously titled "Byte oriented pattern matching...", presumably
because it "matched against the input at the current byte position, not line".
But that's also true for --mline-regex-<LANG>, as far as I can tell.
Advanced pattern matching with multiple regex tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. note:: This is a highly experimental feature. This will not go into
the man page of 6.0. But let's be honest, it's the most exciting feature!
In some cases, the ``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options are not
sufficient to generate the tags for a particular language. Some of the common
reasons for this are:
* To ignore commented lines or sections for the language file, so that
tags aren't generated for symbols that are within the comments.
* To enter and exit scope, and use it for tagging based on contextual
state or with end-scope markers that are difficult to match to their
associated scope entry point.
* To support nested scopes.
* To change the pattern searched for, or the resultant tag for the same
pattern, based on scoping or contextual location.
* To break up an overly complicated ``--mline-regex-<LANG>`` pattern into
separate regex patterns, for performance or readability reasons.
To help handle such things, Universal-ctags has been enhanced with multi-table
regex matching. The feature is inspired by `lex`, the fast lexical analyzer
generator, which is a popular tool on Unix environments for writing parsers, and
`RegexLexer <http://pygments.org/docs/lexerdevelopment/>`_ of Pygments.
Knowledge about them will help you understand the new options.
The new options are:
``--_tabledef-<LANG>``
Declares a new regex matching table of a given name for the language,
as described in :ref:`tabledef`.
``--_mtable-regex-<LANG>``
Adds a regex pattern and associated tag generation information and flags, to
the given table, as described in :ref:`mtable_regex`.
``--_mtable-extend-<LANG>``
Includes a previously defined regex table in the named one.
The above will be discussed in more detail shortly.
First, let's explain the feature with an example. Consider an
imaginary language "`X`" with a syntax similar to JavaScript: "var" is
used for defining variable(s), and "/\* ... \*/" is used for block
comments.
Here is our input, :file:`input.x`:
.. code-block:: java
/* BLOCK COMMENT
var dont_capture_me;
*/
var a /* ANOTHER BLOCK COMMENT */, b;
We want ctags to capture ``a`` and ``b`` - but it is difficult to write a parser
that will ignore ``dont_capture_me`` in the comment with a classical regex
parser defined with ``--regex-<LANG>`` or ``--mline-regex-<LANG>``, because of
the block comments.
The ``--regex-<LANG>`` option only works on one line at a time, so it cannot know that
``dont_capture_me`` is within comments. The ``--mline-regex-<LANG>`` could
do it in theory, but due to the greedy nature of the regex engine it is
impractical and potentially inefficient to do so, given that there could be
multiple block comments in the file, with `*` inside them, etc.
A parser written with multi-table regex, on the other hand, can capture only
``a`` and ``b`` safely. But it is more complicated to understand.
Here is a 1st version of :file:`X.ctags`:
::
--langdef=X
--map-X=.x
--kinddef-X=v,var,variables
Not so interesting. It doesn't really *do* anything yet. It just creates a new
language named ``X``, for files ending with a :file:`.x` suffix, and defines a
new kind, ``var``, for variable tags.
When writing a multi-table parser, you have to think about the necessary states
of parsing. For the parser of language ``X``, we need the following states:
* `toplevel` (initial state)
* `comment` (inside comment)
* `vars` (var statements)
.. _tabledef:
Declaring a new regex table
......................................................................
Before adding regular expressions, you have to declare tables for each state
with the ``--_tabledef-<LANG>=<TABLE>`` option.
Here is the 2nd version of :file:`X.ctags` doing so:
::
--langdef=X
--map-X=.x
--kinddef-X=v,var,variables
--_tabledef-X=toplevel
--_tabledef-X=comment
--_tabledef-X=vars
For table names, only characters in the range ``[0-9a-zA-Z_]`` are acceptable.
For a given language, for each file's input the ctags multi-table parser begins
with the *first* declared table. For :file:`X.ctags`, ``toplevel`` is the one.
The other tables are only ever entered/checked if another table specified to do
so, starting with the first table. In other words, if the first declared table
does not find a match for the current input, and does not specify to go to
another table, the other tables for that language won't be used. The flags to go
to another table are ``{tenter}``, ``{tleave}``, and ``{tjump}``, as described
later.
.. _mtable_regex:
Adding a regex to a regex table
......................................................................
The new option to add a regex to a declared table is ``--_mtable-regex-<LANG>``,
and it follows this form:
.. code-block:: perl
--_mtable-regex-<LANG>=<TABLE>/<PATTERN>/<NAME>/[<KIND>]/LONGFLAGS
The parameters for ``--_mtable-regex-<LANG>`` look complicated. However,
``<PATTERN>``, ``<NAME>``, and ``<KIND>`` are the same as the parameters of the
``--regex-<LANG>`` and ``--mline-regex-<LANG>`` options. ``<TABLE>`` is simply
the name of a table previously declared with the ``--_tabledef-<LANG>`` option.
A regex pattern added to a parser with ``--_mtable-regex-<LANG>`` is matched
against the input at the current byte position, not line. Even if you do not
specify the ``^`` anchor at the start of the pattern, ``ctags`` adds ``^`` to
the pattern automatically. Unlike the ``--regex-<LANG>`` and
``--mline-regex-<LANG>`` options, a ``^`` anchor does not mean "beginning of
line" in ``--_mtable-regex-<LANG>``; instead it means the beginning of the
input string (i.e., the current byte position).
The ``LONGFLAGS`` include the already discussed flags for ``--regex-<LANG>`` and
``--mline-regex-<LANG>``: ``{scope=...}``, ``{mgroup=N}``, ``{_advanceTo=N}``,
``{basic}``, ``{extend}``, and ``{icase}``. The ``{exclusive}`` flag does not
make sense for multi-table regex.
In addition, several new flags are introduced exclusively for multi-table
regex use:
``{tenter}``
Push the current table on the stack, and enter another table.
``{tleave}``
Leave the current table, pop the stack, and go to the table that was
just popped from the stack.
``{tjump}``
Jump to another table, without affecting the stack.
``{treset}``
Clear the stack, and go to another table.
``{tquit}``
Clear the stack, and stop processing the current input file for this
language.
To explain the above new flags, we'll continue using our example in the
next section.
Skipping block comments
......................................................................
Let's continue with our example. Here is the 3rd version of :file:`X.ctags`:
.. code-block:: perl
--langdef=X
--map-X=.x
--kinddef-X=v,var,variables
--_tabledef-X=toplevel
--_tabledef-X=comment
--_tabledef-X=vars
--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
--_mtable-regex-X=toplevel/.//
--_mtable-regex-X=comment/\*\///{tleave}
--_mtable-regex-X=comment/.//
Four ``--_mtable-regex-X`` lines are added for skipping the block comments. Let's
discuss them one by one.
For each new file it scans, ``ctags`` always chooses the first pattern of the
first table of the parser. Even if it's an empty table, ``ctags`` will only try
the first declared table. (In such a case it would immediately fail to match
anything, and thus stop processing the input file and effectively do nothing.)
The first declared table (``toplevel``) has the following regex added to
it first:
.. code-block:: perl
--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
A pattern of ``\/\*`` is added to the ``toplevel`` table, to match the
beginning of a block comment. A backslash character is used in front of the
leading ``/`` to escape the separation character ``/`` that separates the fields
of ``--_mtable-regex-<LANG>``. Another backslash inside the pattern is used
before the asterisk ``*``, to make it a literal asterisk character in regex.
The last ``//`` means ``ctags`` should not tag something matching this pattern.
In ``--regex-<LANG>`` you never use ``//`` because it would be pointless to
match something and not tag it using a single-line ``--regex-<LANG>``; in
multi-line ``--mline-regex-<LANG>`` you rarely see it, because it would rarely
be useful. But in multi-table regex it's quite common, since you frequently
want to transition from one state to another (i.e., ``tenter`` or ``tjump``
from one table to another).
The long flag added to our first regex of our first table is ``tenter``, which
is a long flag for switching the table and pushing on the stack. ``{tenter=comment}``
means "switch the table from toplevel to comment".
So given the input file :file:`input.x` shown earlier, ``ctags`` will begin at
the ``toplevel`` table and try to match the first regex. It will succeed, and
thus push on the stack and go to the ``comment`` table.
It will begin at the top of the ``comment`` table (it always begins at the top
of a given table), and try each regex line in sequence until it finds a match.
If it fails to find a match, it will pop the stack and go to the table that was
just popped from the stack, and begin trying to match at the top of *that* table.
If it continues failing to find a match, and ultimately reaches the end of the
stack, it will stop processing for this file. For the next input file, it will
begin again from the top of the first declared table.
Getting back to our example, the top of the ``comment`` table has this regex:
.. code-block:: perl
--_mtable-regex-X=comment/\*\///{tleave}
Similar to the previous ``toplevel`` table pattern, this one for ``\*\/`` uses
a backslash to escape the separator ``/``, as well as one before the ``*`` to
make it a literal asterisk in regex. So what it's looking for, from a simple
string perspective, is the sequence ``*/``. Note that this means even though
you see three slashes ``///`` at the end, the first one is escaped and used
for the pattern itself, and the ``--_mtable-regex-X`` only has ``//`` to
separate the regex pattern from the long flags, instead of the usual ``///``.
Thus it's using the shorthand form of the ``--_mtable-regex-X`` option.
It could instead have been:
.. code-block:: perl
--_mtable-regex-X=comment/\*\////{tleave}
The above would have worked exactly the same.
Getting back to our example, remember we're looking at the :file:`input.x`
file, currently using the ``comment`` table, and trying to match the first
regex of that table, shown above, at the following location::
      ,ctags is trying to match starting here
      v
    /* BLOCK COMMENT
    var dont_capture_me;
    */
    var a /* ANOTHER BLOCK COMMENT */, b;
The pattern doesn't match for the position just after ``/*``, because that
position is a space character. So ``ctags`` tries the next pattern in the same
table:
.. code-block:: perl
--_mtable-regex-X=comment/.//
This pattern matches any one character, including newline; the current
position moves one character forward. Now the character at the current position is
``B``. The first pattern of the table, ``\*\/``, still does not match the input, so
``ctags`` falls through to the next pattern again. When the current position moves to the ``*/``
of the 3rd line of :file:`input.x`, it will finally match this:
.. code-block:: perl
--_mtable-regex-X=comment/\*\///{tleave}
In this pattern, the long flag ``{tleave}`` is specified. This triggers table
switching again. ``{tleave}`` makes ``ctags`` switch the table back to the last
table used before doing ``{tenter}``. In this case, ``toplevel`` is the table.
``ctags`` manages a stack where references to tables are put. ``{tenter}`` pushes
the current table to the stack. ``{tleave}`` pops the table at the top of the
stack and chooses it.
So now ``ctags`` is back to the ``toplevel`` table, and tries the first regex
of that table, which was this:
.. code-block:: perl
--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
It tries to match that against its current position, which is now the
newline on line 3, between the ``*/`` and the word ``var``::
/* BLOCK COMMENT
var dont_capture_me;
    */ <--- ctags is now at this newline (\n) character
var a /* ANOTHER BLOCK COMMENT */, b;
The first regex of the ``toplevel`` table does not match a newline, so it tries
the second regex:
.. code-block:: perl
--_mtable-regex-X=toplevel/.//
This matches a newline successfully, but has no actions to perform. So ``ctags``
moves one character forward (the newline it just matched), and goes back to the
top of the ``toplevel`` table, and tries the first regex again. Eventually we'll
reach the beginning of the second block comment, and do the same things as before.
When ``ctags`` finally reaches the end of the file (the position after ``b;``),
it will not be able to match either the first or second regex of the
``toplevel`` table, and quit processing the input file.
So far, we've successfully skipped over block comments for our new ``X``
language, but haven't generated any tags. The point of ``ctags`` is to generate
tags, not just keep your computer warm. So now let's move onto actually tagging
variables...
Capturing variables in a sequence
......................................................................
Here is the 4th version of :file:`X.ctags`:
.. code-block:: perl
--langdef=X
--map-X=.x
--kinddef-X=v,var,variables
--_tabledef-X=toplevel
--_tabledef-X=comment
--_tabledef-X=vars
--_mtable-regex-X=toplevel/\/\*//{tenter=comment}
# NEW
--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
--_mtable-regex-X=toplevel/.//
--_mtable-regex-X=comment/\*\///{tleave}
--_mtable-regex-X=comment/.//
# NEW
--_mtable-regex-X=vars/;//{tleave}
--_mtable-regex-X=vars/\/\*//{tenter=comment}
--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
--_mtable-regex-X=vars/.//
One pattern in ``toplevel`` was added, and a new table ``vars`` with four
patterns was also added.
The new regex in ``toplevel`` is this:
.. code-block:: perl
--_mtable-regex-X=toplevel/var[ \n\t]//{tenter=vars}
The purpose of this being in `toplevel` is to switch to the `vars` table when
the keyword ``var`` is found in the input stream. We need to switch states
(i.e., tables) because we can't simply capture the variables ``a`` and ``b``
with a single regex pattern in the ``toplevel`` table, because there might be
block comments inside the ``var`` statement (as there are in our
:file:`input.x`), and we also need to create *two* tags: one for ``a`` and one
for ``b``, even though the word ``var`` only appears once. In other words, we
need to "remember" that we saw the keyword ``var``, when we later encounter the
names ``a`` and ``b``, so that we know to tag each of them; and saving that
"in-variable-statement" state is accomplished by switching tables to the
``vars`` table.
The first regex in our new ``vars`` table is:
.. code-block:: perl
--_mtable-regex-X=vars/;//{tleave}
This pattern is used to match a single semi-colon ``;``, and if it matches
pop back to the ``toplevel`` table using the ``{tleave}`` long flag. We
didn't have to make this the first regex pattern, because it doesn't overlap
with any of the other ones other than the ``/.//`` last one (which must be
last for this example to work).
The second regex in our ``vars`` table is:
.. code-block:: perl
--_mtable-regex-X=vars/\/\*//{tenter=comment}
We need this because block comments can be in variable definitions::
var a /* ANOTHER BLOCK COMMENT */, b;
So to skip block comments in such a position, the pattern ``\/\*`` is used just
like it was used in the ``toplevel`` table: to find the literal ``/*`` beginning
of the block comment and enter the ``comment`` table. Because we're using
``{tenter}`` and ``{tleave}`` to push/pop from a stack of tables, we can
use the same ``comment`` table for both ``toplevel`` and ``vars`` to go to,
because ``ctags`` will "remember" the previous table and ``{tleave}`` will
pop back to the right one.
The third regex in our ``vars`` table is:
.. code-block:: perl
--_mtable-regex-X=vars/([a-zA-Z][a-zA-Z0-9]*)/\1/v/
This is nothing special, but is the one that actually tags something: it
captures the variable name and uses it for generating a tag of the ``var``
kind (letter ``v``).
The last regex in the ``vars`` table we've seen before:
.. code-block:: perl
--_mtable-regex-X=vars/.//
This makes ``ctags`` ignore any other characters, such as whitespace or the
comma ``,``.
Running our example
......................................................................
.. code-block:: console
$ cat input.x
/* BLOCK COMMENT
var dont_capture_me;
*/
var a /* ANOTHER BLOCK COMMENT */, b;
$ u-ctags -o - --fields=+n --options=X.ctags input.x
a input.x /^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;" v line:4
b input.x /^var a \/* ANOTHER BLOCK COMMENT *\/, b;$/;" v line:4
It works!
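To make the control flow of this section easier to follow, here is a small Python model of the table-switching logic walked through above. It is a sketch under assumptions, not how ``ctags`` is implemented: the names ``X_TABLES`` and ``run_tables`` are hypothetical, only ``{tenter}`` and ``{tleave}`` are modelled, Python's ``re`` replaces the regex engine of ``ctags``, and stopping on a ``{tleave}`` with an empty stack is a choice made for the sketch.

```python
import re

# Each rule: (pattern, group to tag or None, kind letter, action, target table).
# The rules mirror the 4th version of X.ctags.
X_TABLES = {
    "toplevel": [
        (r"/\*", None, None, "tenter", "comment"),
        (r"var[ \n\t]", None, None, "tenter", "vars"),
        (r".", None, None, None, None),
    ],
    "comment": [
        (r"\*/", None, None, "tleave", None),
        (r".", None, None, None, None),
    ],
    "vars": [
        (r";", None, None, "tleave", None),
        (r"/\*", None, None, "tenter", "comment"),
        (r"([a-zA-Z][a-zA-Z0-9]*)", 1, "v", None, None),
        (r".", None, None, None, None),
    ],
}


def run_tables(text, tables, first_table):
    """Return (name, kind, line) tuples found by the table-driven scan."""
    compiled = {
        name: [(re.compile(pat, re.DOTALL), grp, kind, act, tgt)
               for pat, grp, kind, act, tgt in rules]
        for name, rules in tables.items()
    }
    tags, stack, table, pos = [], [], first_table, 0
    while True:
        for regex, group, kind, action, target in compiled[table]:
            match = regex.match(text, pos)  # anchored at the current position
            if match is None:
                continue
            if group is not None:
                line = text.count("\n", 0, match.start(group)) + 1
                tags.append((match.group(group), kind, line))
            pos = match.end()
            if action == "tenter":
                stack.append(table)      # remember where we came from
                table = target
            elif action == "tleave":
                if not stack:            # assumption of this sketch: just stop
                    return tags
                table = stack.pop()
            break                        # restart at the top of the table
        else:
            # No rule of the current table matched: fall back to the stack.
            if not stack:
                return tags
            table = stack.pop()


INPUT_X = (
    "/* BLOCK COMMENT\n"
    "var dont_capture_me;\n"
    "*/\n"
    "var a /* ANOTHER BLOCK COMMENT */, b;\n"
)
x_tags = run_tables(INPUT_X, X_TABLES, "toplevel")
print(x_tags)
```

The model reports ``a`` and ``b`` on line 4 and never sees ``dont_capture_me``, because that text is consumed one character at a time while the ``comment`` table is active.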
You can find additional examples of multi-table regex in our GitHub repo, under
the ``optlib`` directory. For example, ``puppetManifest.ctags`` is a serious
example. It is the primary parser for testing multi-table regex parsers, and is
used in the actual ``ctags`` program for parsing Puppet manifest files.
.. this "extras" section should probably be moved up this document, as a
subsection in the "Regex option argument flags" section
.. _extras:
Conditional tagging with extras
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. NOT REVIEWED YET
If a matched pattern should only be tagged when an ``extra`` is enabled, mark
the pattern with ``{_extra=XNAME}``. ``XNAME`` is the name of the extra. You must
define an ``XNAME`` with the ``--_extradef-<LANG>=XNAME,DESCRIPTION`` option
before defining a regex option marked ``{_extra=XNAME}``.
.. code-block:: python
if __name__ == '__main__':
do_something()
To capture the above lines in a Python program (*input.py*), an extra can be used.
.. code-block:: perl
--_extradef-Python=main,__main__ entry points
--regex-Python=/^if __name__ == '__main__':/__main__/f/{_extra=main}
The above optlib (*python-main.ctags*) introduces the ``main`` extra to the Python parser.
The pattern matching is done only when the ``main`` extra is enabled.
.. code-block:: console
$ ./ctags --options=python-main.ctags -o - --extras-Python='+{main}' input.py
__main__ input.py /^if __name__ == '__main__':$/;" f
.. this "fields" section should probably be moved up this document, as a
subsection in the "Regex option argument flags" section
.. _fields:
Adding custom fields to the tag output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. NOT REVIEWED YET
Exuberant-ctags allows a group captured in a regex pattern to be
used as part of the name of a tagEntry. Universal-ctags also lets you put
the other groups in the regex pattern to use.
An optlib parser can have its own fields. The groups can be used as
the values of the fields of a tagEntry.
Consider *Unknown*, an imaginary language.
Here is a source file (``input.unknown``) written in *Unknown*::
public func foo(n, m);
protected func bar(n);
private func baz(n,...);
With `--regex-Unknown=...` Exuberant-ctags can capture `foo`, `bar`, and `baz`
as names. Universal-ctags can attach extra context information to the
names as values for fields. Let's focus on `bar`. `protected` is a
keyword to control how widely the identifier `bar` can be accessed.
`(n)` is the parameter list of `bar`. `protected` and `(n)` are
extra context information of `bar`.
With the following optlib file (``unknown.ctags``), ``ctags`` can attach
`protected` to the protection field and `(n)` to the signature field.
.. code-block:: perl
--langdef=unknown
--kinddef-unknown=f,func,functions
--map-unknown=+.unknown
--_fielddef-unknown=protection,access scope
--_fielddef-unknown=signature,signatures
--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}
--fields-unknown=+'{protection}{signature}'
For the line `protected func bar(n);` you will get the following tags output::
bar input.unknown /^protected func bar(n);$/;" f protection:protected signature:(n)
Let's see the detail of ``unknown.ctags``.
.. code-block:: perl
--_fielddef-unknown=protection,access scope
``--_fielddef-<LANG>=name,description`` defines a new field for a parser
specified by `<LANG>`. Before defining a new field for the parser,
the parser must be defined with ``--langdef=<LANG>``. `protection` is
the field name used in tags output. `access scope` is the description
used in the output of ``--list-fields`` and ``--list-fields=Unknown``.
.. code-block:: perl
--_fielddef-unknown=signature,signatures
This defines a field named `signature`.
.. code-block:: perl
--regex-unknown=/^((public|protected|private) +)?func ([^\(]+)\((.*)\)/\3/f/{_field=protection:\1}{_field=signature:(\4)}
This option requests making a tag for the name specified by group 3 of the
pattern, attaching group 1 as the value of the `protection` field of the tag, and attaching
group 4 as the value of the `signature` field of the tag. You can use the long regex flag
`_field` for attaching fields to a tag, with the following notation::
{_field=FIELDNAME:GROUP}
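To see which text each group of the pattern above holds, the same regular expression can be tried in Python. This is only an illustrative sketch: the helper ``describe`` is hypothetical, and Python's ``re`` is used in place of the regex engine of ``ctags``, which may differ in details.

```python
import re

# The pattern from unknown.ctags, unchanged.
PATTERN = re.compile(r"^((public|protected|private) +)?func ([^\(]+)\((.*)\)")


def describe(line):
    """Map the groups of one input line to the tag name and field values."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return {
        "name": match.group(3),                     # \3 is the tag name
        "protection": match.group(1),               # \1 feeds {_field=protection:\1}
        "signature": "(" + match.group(4) + ")",    # (\4) feeds {_field=signature:(\4)}
    }


bar = describe("protected func bar(n);")
print(bar)
```

Note that group 1, as the pattern is written, spans the keyword together with the spaces that follow it; the keyword alone is group 2. For a line without a keyword, such as ``func qux();``, group 1 does not take part in the match at all.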
``--fields-<LANG>=[+|-]{FIELDNAME}`` can be used to enable or disable the specified field.
A newly defined parser-own field is disabled by default. Enable the
field explicitly to use it. See :ref:`Parser own fields <parser-own-fields>`
for details on the `--fields-<LANG>` option.
The `passwd` parser is a simple example that uses the ``--fields-<LANG>`` option.
.. this "roles" section should probably be moved up this document, as a
subsection in the "Regex option argument flags" section
.. _roles:
Capturing reference tags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. NOT REVIEWED YET
To capture a reference tag with an optlib parser, specify a role with
the `_role` long regex flag. Let's see an example:
.. code-block:: perl
--langdef=FOO
--kinddef-FOO=m,module,modules
--_roledef-FOO=m.imported,imported module
--regex-FOO=/import[ \t]+([a-z]+)/\1/m/{_role=imported}
--extras=+r
--fields=+r
See the line `--regex-FOO=...`. In this parser `FOO`, the name of an
imported module is captured as a reference tag with the role `imported`.
A role must be defined before specifying it as the value of the `_role` flag.
The `--_roledef-<LANG>` option is for defining a role.
The parameter of the option consists of three components: a kind
letter, the name of the role, and the description of the role. The kind letter
comes first, followed by a period and then the role name. The period
indicates that the role is defined under the kind specified with the
kind letter. The description follows after a comma. In the example, the
`imported` role is defined under the `module` kind specified with `m`.
Of course, the kind specified with the kind letter must be defined
before using the `--_roledef-<LANG>` option. The `--kinddef-<LANG>` option
is for defining a kind.
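The shape of the ``--_roledef-<LANG>`` parameter can be illustrated by splitting the example value into its three components. The Python below is merely a sketch of the format as described in this section; ``RoleDef`` and ``parse_roledef`` are hypothetical names, and the code does not claim to mirror how ``ctags`` itself parses or validates the parameter.

```python
from typing import NamedTuple


class RoleDef(NamedTuple):
    kind_letter: str
    name: str
    description: str


def parse_roledef(param: str) -> RoleDef:
    """Split 'K.ROLENAME,DESCRIPTION' into its three components."""
    spec, _, description = param.partition(",")
    kind_letter, dot, name = spec.partition(".")
    if len(kind_letter) != 1 or dot != "." or not name:
        raise ValueError(f"expected 'K.ROLENAME,DESCRIPTION', got {param!r}")
    return RoleDef(kind_letter, name, description)


role = parse_roledef("m.imported,imported module")
print(role)
```

Only the first comma separates the role name from the description, so the description itself may contain further commas in this sketch.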
The roles are listed with `--list-roles=<LANG>`. The name and
description passed to `--_roledef-<LANG>` option are used in
the output like::
$ ./ctags --langdef=FOO --kinddef-FOO=m,module,modules \
--_roledef-FOO='m.imported,imported module' --list-roles=FOO
#KIND(L/N) NAME ENABLED DESCRIPTION
m/module imported on imported module
By specifying the `_role` regex flag multiple times with different
roles, you can assign multiple roles to a reference tag.
Consider the following C input:
.. code-block:: C
i += 1;
An ultra-fine-grained C parser may capture the variable `i` with
the roles `lvalue` and `incremented`. You can do it with:
.. code-block:: perl
    --_roledef-C=v.lvalue,locator values
    --_roledef-C=v.incremented,incremented with += operator
    --regex-C=/([a-zA-Z_][a-zA-Z_0-9]*) *\+=/\1/v/{_role=lvalue}{_role=incremented}
Submitting an optlib file to the Universal-ctags project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You are encouraged to submit your :file:`.ctags` file to our GitHub repository through
a pull request.
Universal-ctags provides a facility called "Option library".
Read "Option library" first for the concept and usage.
Here I will explain how to merge your .ctags file into Universal-ctags as
part of the option library. I assume you are considering contributing
an option library in which a regex-based language parser is defined.
See `How to Add Support for a New Language to Exuberant Ctags (EXTENDING)`_
for how to write a regex-based language parser. In this
section I explain the next step.
.. _`How to Add Support for a New Language to Exuberant Ctags (EXTENDING)`: http://ctags.sourceforge.net/EXTENDING.html
I use Swine as the name of the programming language which your parser
deals with. Assume source files written in the Swine language have the suffix
*.swn*. The file name of the option library is *swine.ctags*.
Copyright notice, contact mail address and license term
......................................................................
Put this information at the header of *swine.ctags*.
An example taken from *data/optlib/ctags.ctags* ::
#
#
# Copyright (c) 2014, Red Hat, Inc.
# Copyright (c) 2014, Masatake YAMATO
#
# Author: Masatake YAMATO <yamato@redhat.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
#
...
"GPL version 2 or later version" is needed here. Option file is not
linked into the ``ctags`` command. However, I plan to write a translator
that generates a *.c* file from a given option file. The resulting
*.c* file would be built into the ``ctags`` command, in which case "GPL version 2
or later version" may be required.
*Units* test cases
......................................................................
We, the Universal-ctags developers, don't have enough time to learn all the
languages supported by ``ctags``. In other words, we cannot review the
code. Only test cases help us know whether a contributed option
library works well or not. We may reject any contribution without
a test case.
Read "Using *Units*" about how to write *Units* test
cases. Don't write one big test case. Several smaller cases help us
understand the intent of the contributor.
* *Units/sh-alias.d*
* *Units/sh-comments.d*
* *Units/sh-quotes.d*
* *Units/sh-statements.d*
are good examples of small test cases.
Big test cases are good if smaller test cases exist.
See also *parser-m4.r/m4-simple.d* especially *parser-m4.r/m4-simple.d/args.ctags*.
Your test cases need ``ctags`` to have already loaded your option
library, swine.ctags. You must specify loading it in the
test case's own *args.ctags*.
Assume your test name is *swine-simile.d*. Put ``--options=swine`` in
*Units/swine-simile.d/args.ctags*.
Makefile.in
......................................................................
Add your optlib file, *swine.ctags* to ``PRELOAD_OPTLIB`` variable of
*Makefile.in*.
If you don't want your optlib loaded automatically when ``ctags`` starts up,
put your optlib file into ``OPTLIB`` of *Makefile.in* instead of
``PRELOAD_OPTLIB``.
Verification
......................................................................
Let's verify all your work here.
1. Run the tests and check whether your test case passes or fails::
$ make units
2. Verify your files are installed as expected::
$ mkdir /tmp/tmp
$ ./configure --prefix=/tmp/tmp
$ make
$ make install
    $ /tmp/tmp/bin/ctags -o - --options=swine something_input.swn
Pull-request
......................................................................
Please consider submitting your well-written optlib parser to
Universal-ctags. Your *.ctags* is a treasure and can be shared as a
first class software component in Universal-ctags.
Pull-requests are welcome.