===================================================================
Proposal for extended Vi tags file format
===================================================================

.. note::

    The content of the next section is a copy of the FORMAT file in the
    Exuberant Ctags source code, from its Subversion repository at
    sourceforge.net.

    We have made some modifications:

    * Exceptions introduced in Universal-ctags are explained with an
      "EXCEPTION" marker.

    * The `Exceptions in Universal-ctags`_ subsection summarizes the exceptions.

.. contents:: `Table of contents`
	:depth: 3
	:local:

----

:Version: 0.06 DRAFT
:Date: 1998 Feb 8
:Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>


Introduction
---------------------------------------------------------------------

The file format for the "tags" file, as used by Vi and many of its
descendants, has limited capabilities.

This additional functionality is desired:

1. Static or local tags.
   The scope of these tags is the file where they are defined.  The same tag
   can appear in several files, without really being a duplicate.
2. Duplicate tags.
   Allow the same tag to occur more than once.  They can be located in
   a different file and/or have a different command.
3. Support for C++.
   A tag is not only specified by its name, but also by the context (the
   class name).
4. Future extension.
   When even more additional functionality is desired, it must be possible to
   add this later, without breaking programs that don't support it.


From proposal to standard
-------------------------------------------------------------------------

To make this proposal into a standard for tags files, it needs to be supported
by most people working on versions of Vi, ctags, etc.  Currently this
standard is supported by:

Darren Hiebert <dhiebert at users.sourceforge.net>
	Exuberant ctags

Bram Moolenaar <Bram at vim.org>
	Vim (Vi IMproved)

These have been or will be asked to support this standard:

Nvi
		Keith Bostic <bostic at bsdi.com>

Vile
		Tom E. Dickey <dickey at clark.net>

NEdit
		Mark Edel <edel at ltx.com>

CRiSP
		Paul Fox <fox at crisp.demon.co.uk>

Lemmy
		James Iuliano <jai at accessone.com>

Zeus
		Jussi Jumppanen <jussij at ca.com.au>

Elvis
		Steve Kirkendall <kirkenda at cs.pdx.edu>

FTE
		Marko Macek <Marko.Macek at snet.fri.uni-lj.si>


Backwards compatibility
---------------------------------------------------------------------------

A tags file that is generated in the new format should still be usable by Vi.
This makes it possible to distribute tags files that are usable by all
versions and descendants of Vi.

This restricts the format to what Vi can handle.  The format is:

1. The tags file is a list of lines, each line in the format::

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}


   {tagname}
	Any identifier, not containing white space.

	EXCEPTION: Universal-ctags violates this item of the proposal;
	tagname may contain spaces. However, tabs are not allowed.

   <Tab>
	Exactly one TAB character (although many versions of Vi can
	handle any amount of white space).

   {tagfile}
	The name of the file where {tagname} is defined, relative to
	the current directory (or location of the tags file?).

   {tagaddress}
	Any Ex command.  When executed, it behaves as if 'magic' were
	not set.

2. The tags file is sorted on {tagname}.  This allows for a binary search in
   the file.

3. Duplicate tags are allowed, but which one is actually used is
   unpredictable (because of the binary search).
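
The sorting requirement in item 2 is what makes lookup cheap.  The following
is a minimal Python sketch of such a lookup, assuming the tag lines are
already in memory and sorted by byte value.  The function name ``find_tag``
and the sample data are illustrative only and are not part of any ctags tool.

```python
import bisect

def find_tag(lines, name):
    """Return every tag line whose {tagname} equals name.

    Hypothetical helper: `lines` must already be sorted on {tagname}.
    """
    # {tagname} is everything before the first <Tab>.
    # Extracting the names is linear; a real reader would bisect the file
    # itself and compare only the lines it probes.
    names = [line.split("\t", 1)[0] for line in lines]
    lo = bisect.bisect_left(names, name)
    hi = bisect.bisect_right(names, name)
    # Returning the whole run of duplicates avoids the "unpredictable"
    # choice described in item 3.
    return lines[lo:hi]

tags = [
    "DEBUG\tdefines.c\t89",
    "main\tmain.c\t/^main(argc, argv)$/",
    "main\tother.c\t/^main()$/",
]
print(find_tag(tags, "main"))
```

Returning all duplicates leaves the choice between them to the caller, which
is one possible way to handle item 3.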

The best way to add extra text to the line for the new functionality, without
breaking it for Vi, is to put a comment in the {tagaddress}.  This gives the
freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains::

	main	main.c	/^main(argc, argv)$/
	DEBUG	defines.c	89

The new lines can be::

	main	main.c	/^main(argc, argv)$/;"any additional text
	DEBUG	defines.c	89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then
the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number
or a search command is recognized.  I hope Posix can be adjusted.  Nvi suffers
from this.


Security
------------------------------------------------------------------

Vi allows the use of any Ex command in a tags file.  This creates the
potential for a Trojan horse security hole.

The proposal is to allow only Ex commands that position the cursor in a single
file.  Other commands, like editing another file, quitting the editor,
changing a file or writing a file, are not allowed.  It is therefore logical
to call the command a tagaddress.

Specifically, these two Ex commands are allowed:

* A decimal line number::

	89

* A search command.  It is a regular expression pattern, as used by Vi,
  enclosed in // or ??::

	/^int c;$/
	?main()?

There are two combinations possible:

* Concatenation of the above, with ';' in between.  The meaning is that the
  first line number or search command is used, the cursor is positioned in
  that line, and then the second search command is used (a line number would
  not be useful).  This can be done multiple times.  This is useful when the
  information in a single line is not unique, and the search needs to start
  in a specified line.
  ::

	/struct xyz {/;/int count;/
	389;/struct foo/;/char *s;/

* A trailing comment can be added, starting with ';"' (two characters:
  semi-colon and double-quote).  This is used below.
  ::

	89;" foo bar

This might be extended in the future.  What is currently missing is a way to
position the cursor in a certain column.
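
The restriction above can be checked mechanically.  The sketch below is one
possible validator, not the check used by Vim or any other editor.  The name
``is_safe_tagaddress`` is hypothetical, and it assumes that a search
delimiter inside a pattern is escaped with a backslash.

```python
import re

# One address element: a decimal line number, or a /pattern/ or ?pattern?
# search.  Assumption: the delimiter appears inside the pattern only when
# it is escaped with a backslash.
_ELEMENT = re.compile(r"""
    \d+                      # decimal line number
  | /(?:[^/\\]|\\.)*/        # forward search
  | \?(?:[^?\\]|\\.)*\?      # backward search
""", re.VERBOSE)

def is_safe_tagaddress(address):
    """Return True if address only positions the cursor (hypothetical helper)."""
    pos = 0
    while True:
        match = _ELEMENT.match(address, pos)
        if match is None:
            return False
        pos = match.end()
        if pos == len(address):
            return True              # nothing left after the last element
        if address.startswith(';"', pos):
            return True              # the rest is a trailing comment
        if address[pos] != ";":
            return False             # anything else is another Ex command
        pos += 1                     # skip ';' and expect another element

print(is_safe_tagaddress("389;/struct foo/;/char *s;/"))
print(is_safe_tagaddress("89|!rm -rf ~"))
```

For simplicity the sketch accepts a line number in any position, although the
text notes that a line number after the first element is not useful.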


Goals
--------

Now the usage of the comment text has to be defined.  The definition aims
to:

1. Keep the text short, because:

   * The line length that Vi can handle is limited to 512 characters.
   * Tags files can contain thousands of tags.  I have seen tags files of
     several Mbytes.
   * More text makes searching slower.

2. Keep the text readable, because:

   * It is often necessary to check the output of a new ctags program.
   * It should be possible to edit the file by hand.
   * It is easier to write a program that produces or parses the file.

3. Don't use special characters, because:

   * It should be possible to treat a tags file like any normal text file.

Proposal
-----------

Use a comment after the {tagaddress} field.  The format would be::

	{tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]


{tagname}
	Any identifier, not containing white space.

	EXCEPTION: Universal-ctags violates this item of the proposal;
	name may contain spaces. However, tabs are not allowed.
	The conversion rules for certain characters (including <Tab>)
	that are described for the tagfield "value" later in this
	section are also applied to the name.

<Tab>
	Exactly one TAB character (although many versions of Vi can
	handle any amount of white space).

{tagfile}
	The name of the file where {tagname} is defined, relative to
	the current directory (or location of the tags file?).

{tagaddress}
	Any Ex command.  When executed, it behaves as if 'magic' were
	not set.  It may be restricted to a line number or a search
	pattern (Posix).

Optionally:

;"
		semicolon + doublequote: Ends the tagaddress in a way that looks
		like the start of a comment to Vi.

{tagfield}
		See below.

A tagfield has a name, a colon, and a value: "name:value".

* The name consists only of alphabetical characters.  Upper and lower case
  are allowed.  Lower case is recommended.  Case matters ("kind:" and "Kind:"
  are different tagfields).

* The value may be empty.
  It cannot contain a <Tab>.

  - When a value contains a "\\t", this stands for a <Tab>.
  - When a value contains a "\\r", this stands for a <CR>.
  - When a value contains a "\\n", this stands for a <NL>.
  - When a value contains a "\\\\", this stands for a single '\\' character.

  Other use of the backslash character is reserved for future expansion.
  Warning: When a tagfield value holds an MS-DOS file name, the backslashes
  must be doubled!

  EXCEPTION: Universal-ctags introduces more conversion rules.

  - When a value contains a "\\a", this stands for a <BEL> (0x07).
  - When a value contains a "\\b", this stands for a <BS> (0x08).
  - When a value contains a "\\v", this stands for a <VT> (0x0b).
  - When a value contains a "\\f", this stands for a <FF> (0x0c).
  - Characters in the range 0x01 to 0x1F inclusive, 0x7F, and a leading space
    (0x20) or '!' (0x21) are converted to a "\\x"-prefixed hexadecimal number
    if they are not handled by the "value" rules above.
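
A reader has to reverse these conversions.  The sketch below shows one way to
do that in Python.  The name ``unescape_value`` is hypothetical, and the code
assumes that a "\\x" escape is always followed by exactly two hexadecimal
digits, which the text above does not state.  Backslash sequences that are
reserved for future expansion are left untouched.

```python
import re

# Single-character escapes from the proposal plus the Universal-ctags ones.
_SIMPLE = {
    "t": "\t", "r": "\r", "n": "\n", "\\": "\\",
    "a": "\a", "b": "\b", "v": "\v", "f": "\f",
}

# A backslash followed by either x + two hex digits (an assumption about
# the \x form) or by any single character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)

def unescape_value(value):
    """Decode the escapes in a tagfield value (hypothetical helper)."""
    def replace(match):
        body = match.group(1)
        if len(body) == 3:               # "x" plus two hex digits
            return chr(int(body[1:], 16))
        # Unknown sequences are reserved, so they are kept as written.
        return _SIMPLE.get(body, match.group(0))
    return _ESCAPE.sub(replace, value)

print(unescape_value(r"c:\\vim\\sap"))
```

Because the substitution runs left to right, a doubled backslash is consumed
as one unit, so the MS-DOS file name in the example keeps its "s" and is not
misread as another escape.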

Proposed tagfield names:

=============== =============================================================================
FIELD-NAME	DESCRIPTION
=============== =============================================================================
arity		Number of arguments for a function tag.

class		Name of the class for which this tag is a member or method.

enum		Name of the enumeration in which this tag is an enumerator.

file		Static (local) tag, with a scope of the specified file.  When
		the value is empty, {tagfile} is used.

function	Function in which this tag is defined.  Useful for local
		variables (and functions).  When functions nest (e.g., in
		Pascal), the function names are concatenated, separated with
		'/', so it looks like a path.

kind		Kind of tag.  The value depends on the language.  For C and
		C++ these kinds are recommended:

		c
			class name

		d
			define (from #define XXX)

		e
			enumerator

		f
			function or method name

		F
			file name

		g
			enumeration name

		m
			member (of structure or class data)

		p
			function prototype

		s
			structure name

		t
			typedef

		u
			union name

		v
			variable

		When this field is omitted, the kind of tag is undefined.

struct		Name of the struct in which this tag is a member.

union		Name of the union in which this tag is a member.
=============== =============================================================================


Note that these are mostly for C and C++.  When tags programs are written for
other languages, this list should be extended to include the field names they
use.  This will help users to be independent of the tags program used.

Examples::

	asdf	sub.cc	/^asdf()$/;"	new_field:some\svalue	file:
	foo_t	sub.h	/^typedef foo_t$/;"	kind:t
	func3	sub.p	/^func3()$/;"	function:/func1/func2	file:
	getflag	sub.c	/^getflag(arg)$/;"	kind:f	file:
	inc	sub.cc	/^inc()$/;"	file:	class:PipeBuf


The name of the "kind:" field can be omitted.  This is to reduce the size of
the tags file by about 15%.  A program reading the tags file can recognize the
"kind:" field by the missing ':'.  Examples::

	foo_t	sub.h	/^typedef foo_t$/;"	t
	getflag	sub.c	/^getflag(arg)$/;"	f	file:


Additional remarks:

* When a tagfield appears twice in a tag line, only the last one is used.
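
Putting the pieces together, a tag line can be split as sketched below.  This
is an illustration under stated assumptions, not the parser of any existing
tool.  The name ``parse_tag_line`` is hypothetical, the extension is assumed
to start at the first ``;"`` followed by a <Tab>, and the field values are
returned without decoding their backslash escapes.

```python
def parse_tag_line(line):
    """Split one extended tag line into its parts (hypothetical helper)."""
    line = line.rstrip("\r\n")
    # The first two <Tab>s delimit {tagname} and {tagfile}.
    tagname, tagfile, rest = line.split("\t", 2)
    # Assumption: the first ;" followed by <Tab> ends the {tagaddress}.
    address, found, extension = rest.partition(';"\t')
    fields = {}
    if found:
        for item in extension.split("\t"):
            if not item:
                continue
            name, colon, value = item.partition(":")
            if colon:
                fields[name] = value      # a later duplicate overwrites
            else:
                fields["kind"] = item     # no ':' means the "kind:" field
    return {"name": tagname, "file": tagfile,
            "address": address, "fields": fields}

entry = parse_tag_line('getflag\tsub.c\t/^getflag(arg)$/;"\tf\tfile:')
print(entry["fields"])
```

Storing the fields in a dictionary gives the "last one is used" rule for
free, because each assignment replaces the earlier value.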


Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single
linefeed character <NL>.  On MS-DOS and compatible systems <CR><NL> is the
standard line separator.  To increase portability, this line separator is also
supported.

On the Macintosh a single <CR> is used as the line separator.  Supporting this
on Unix systems causes problems, because most fgets() implementations don't
see the <CR> as a line separator.  Therefore the support for a <CR> as line
separator is limited to the Macintosh.

Summary:

==============  ======================  =========================
line separator	generated on		accepted on
==============  ======================  =========================
<LF>		Unix			Unix, MS-DOS, Macintosh
<CR>		Macintosh		Macintosh
<CR><LF>	MS-DOS			Unix, MS-DOS, Macintosh
==============  ======================  =========================

The characters <CR> and <LF> cannot be used inside a tag line.  This is not
mentioned elsewhere (because it's obvious).
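
A reader that follows the table above could split the file contents as in the
sketch below.  The function name ``split_tag_lines`` and its
``accept_lone_cr`` flag are hypothetical.  The flag stands in for "running on
a Macintosh", since the proposal accepts a lone <CR> only there.

```python
def split_tag_lines(data, accept_lone_cr=False):
    """Split the text of a tags file into tag lines (hypothetical helper)."""
    if accept_lone_cr:
        # Macintosh only: treat <CR><LF> and a lone <CR> like <LF>.
        data = data.replace("\r\n", "\n").replace("\r", "\n")
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                  # text after the final separator is empty
    # Accept <CR><LF> everywhere by dropping one trailing <CR> per line.
    return [line[:-1] if line.endswith("\r") else line for line in lines]

print(split_tag_lines("a\tb\t1\r\nc\td\t2\n"))
```

The sketch splits on <LF> explicitly rather than using ``str.splitlines()``,
which also breaks lines at other control characters such as <VT> and <FF>.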


Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the
filename from the tagaddress.  This would need to be allowed for backwards
compatibility.  However, all known programs that generate tags use a single
<Tab> to separate fields.

There is a problem for using file names with embedded white space in the
tagfile field.  To work around this, the same special characters could be used
as in the new fields, for example "\\s".  But, unfortunately, in MS-DOS the
backslash character is used to separate file names.  The file name
"c:\\vim\\sap" contains "\\s", but this is not a <Space>.  The number of
backslashes could be doubled, but that will add a lot of characters, and make
parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and
not support a file name or tagname that contains a <Tab> character.  This
means that we are not 100% Vi compatible.  However, there is no known tags
program that uses anything other than a <Tab> to separate the fields.
Problems could arise only when a user typed the tags file by hand, or wrote
their own program to generate a tags file.  To solve this, the tags file
should be filtered, to replace the arbitrary white space with a single <Tab>.
This Vi command can be used::

	:%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).
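
The same filter can be written outside Vi.  The following Python sketch is a
direct translation of the substitution above.  The function name
``normalize_separators`` is hypothetical.

```python
import re

# Same pattern as the Vi command: two runs of non-blank characters, each
# followed by any amount of spaces and tabs, at the start of the line.
_LEADING = re.compile(r"^([^ \t]*)[ \t]*([^ \t]*)[ \t]*")

def normalize_separators(line):
    """Replace the first two field separators with one <Tab> each."""
    return _LEADING.sub("\\1\t\\2\t", line)

print(normalize_separators("main   main.c  /^main(argc, argv)$/"))
```

Like the Vi command, this only rewrites the separators after the tagname and
the tagfile, so white space inside the tagaddress is left alone.  It is not
suitable for tags files in which a tagname contains spaces.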


TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding
details about its content (e.g. have the tags been sorted?, are the optional
tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g.
enable/disable binary searching) and provide general information (what version
of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure
that when sorted, they will always be located near the first lines of the tag
file.  The use of "!_TAG_" is recommended.  Note that a rare tag like "!"
can sort before these lines.  The program reading the tags file should be
smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of
information.

Tag lines providing information about the content of the tag file::

    !_TAG_FILE_FORMAT	{version-number}	/optional comment/
    !_TAG_FILE_SORTED	{0|1}			/0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of
"1" for tag files complying with the original UNIX vi/ctags format, and
reserves the value "2" for tag files complying with this proposal. This value
may be used to determine if the extended features described in this proposal
are present.

Tag lines providing information about the program used to generate the tag
file, and provided solely for documentation purposes::

    !_TAG_PROGRAM_AUTHOR	{author-name}	/{email-address}/
    !_TAG_PROGRAM_NAME	{program-name}	/optional comment/
    !_TAG_PROGRAM_URL	{URL}	/optional comment/
    !_TAG_PROGRAM_VERSION	{version-id}	/optional comment/
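
A reader can collect these lines before it looks at the ordinary tags.  The
sketch below is one way to do so.  The function name ``read_pseudo_tags`` and
the sample lines are illustrative only.

```python
def read_pseudo_tags(lines):
    """Map each pseudo-tag name to its value field (hypothetical helper)."""
    info = {}
    for line in lines:
        # Skip ordinary tags, including a rare tag such as "!" that can
        # sort before the pseudo-tag lines.
        if not line.startswith("!_TAG_"):
            continue
        parts = line.rstrip("\r\n").split("\t", 2)
        if len(parts) >= 2:
            info[parts[0]] = parts[1]   # the /comment/ part is ignored
    return info

sample = [
    "!\tweird.c\t1",
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/",
    "main\tmain.c\t/^main(argc, argv)$/",
]
info = read_pseudo_tags(sample)
print(info)
```

With this result, a value of "1" for ``!_TAG_FILE_SORTED`` tells the reader
that a binary search is possible, and a value of "2" for
``!_TAG_FILE_FORMAT`` tells it that the extension fields may be present.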

Exceptions in Universal-ctags
--------------------------------

Universal-ctags supports this proposal with some
exceptions.


Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. {tagname} in a tags file generated by Universal-ctags may contain
   spaces and several escape sequences. Parsers for documents like TeX and
   reStructuredText, or liberal languages such as JavaScript, need these
   exceptions. See the description of {tagname} in the Proposal section for
   more detail about the conversion.

.. _compat-output:

Compatible output and weakness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. NOT REVIEWED YET

The default behavior (the ``--output-format=u-ctags`` option) has the
exceptions.  On the other hand, with the ``--output-format=e-ctags`` option
ctags has no exceptions; the Universal-ctags command can then use the same
file format as Exuberant-ctags. However, ``--output-format=e-ctags`` throws
away any tag entry whose name includes a space or a tab
character. The ``TAG_OUTPUT_MODE`` pseudo tag tells which format was
used when ctags generated the tags file.