/*man-start*********************************************************************
========================================================================
APPENDIX 4 - SYNTAX HIGHLIGHTING IN THE
========================================================================
This appendix contains details on syntax highlighting in THE. Syntax
highlighting is the mechanism by which different tokens within a file,
usually one containing source code, are displayed in different colours.
The model THE uses for its syntax highlighting is based on the model
used by KEDIT for Windows from Mansfield Software. This model is extremely
configurable and flexible. While most of the KEDIT features are implemented,
THE also adds a couple of other features that make the syntax highlighting
even better.
This appendix concentrates on the format of THE language definition
files. For a description of the commands that manipulate other aspects
of syntax highlighting in THE, see the descriptions of the following
commands:
<SET AUTOCOLOR>, <SET COLORING>, <SET ECOLOR>, <SET PARSER>.
==================
Performance Impact
==================
Syntax highlighting in an editor comes at a cost: reduced performance.
Because of the extra processing required to determine which characters are
displayed in which colours, displaying the screen is slower. As THE
recalculates the display colours after every displayable key is pressed,
you may notice a reduction in responsiveness.
For those languages that allow paired comments (i.e. comments that can span
multiple lines), performance is impacted even more. This is because THE has
to determine whether the lines being displayed are within one of these
multi-line comment pairs, which may start before the first displayed line.
THE will incorrectly display syntax highlighting in certain circumstances.
This is because THE does not fully parse the complete file to determine
the correct colours; that would be too slow. Instead, THE checks the
currently displayed lines and determines the syntax highlighting based
on these lines.
THE gets syntax highlighting wrong in the following cases:
When all displayed lines are within a multi-line comment block and neither
the starting comment token nor the ending comment token is displayed; THE
then treats the displayed lines as code.
When the starting or ending comment token of a multi-line comment is
part of a language string.
Also bear in mind that excluding large portions of the file with ALL will
dramatically slow down the checking of multi-line comments.
================================
File Extensions Vs Magic Numbers
================================
A THE extension to the KEDIT syntax highlighting model is support for
<magic numbers> (see <SET AUTOCOLOR> for more details). For the default
<parser>s, where there might be a conflict between setting syntax highlighting
based on a file extension and a <magic number>, the file extension mapping
takes precedence.
=============================
THE Language Definition Files
=============================
THE Language Definition Files usually have a file extension of .tld.
THE comes with a small number of sample TLD files. Look at these files in
conjunction with the following descriptions to fully understand how to
write your own TLD files.
TLD files consist of several sections identified by header lines. Header
lines start with a colon in column one. The items belonging to a header
are listed on separate lines after the header to which they apply.
Blank lines are ignored, as are comment lines (lines whose first non-blank
character is '*'). Each item that can be repeated occurs on a separate line.
The layout of a TLD file described above is itself expressed in the TLD
file tld.tld.
The purpose of each header and the valid contents are explained below.
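As a rough sketch of that layout, a minimal TLD file for a hypothetical
language might look like the following. The file name and the choice of
sections are illustrative assumptions, not taken from a shipped TLD file;
each section is described in detail below.

```text
* sample.tld - hypothetical definition, for illustration only
:case
ignore

:identifier
[a-zA-Z] [a-zA-Z0-9]
```

The first line is a comment, the lines starting with a colon in column one
are headers, and each remaining non-blank line is an item belonging to the
header above it.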
===========
:identifier
===========
This section specifies, using regular expressions, how an identifier (such
as a keyword) in the language is formed. The single item line contains two or
three regular expressions separated by space characters.
Syntax:
first_char_re other_char_re [last_char_re]
Meaning of options:
first_char_re
This regular expression specifies the valid characters that an
identifier can begin with.
other_char_re
This regular expression specifies the valid characters that the
remainder of characters in an identifier can consist of.
last_char_re
This regular expression is optional. If specified, it states the
valid characters that an identifier can end with.
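For example, assuming the regular expressions accept bracketed character
classes, an identifier that starts with a letter or underscore and continues
with letters, digits or underscores could be described as:

```text
:identifier
[a-zA-Z_] [a-zA-Z0-9_]
```

Here the optional last_char_re is omitted, so no extra restriction is placed
on the final character.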
=====
:case
=====
This section defines whether identifiers in the language are case-sensitive
or not. Only one of the items below can be included.
Syntax:
RESPECT | IGNORE
Meaning of options:
respect
case is relevant. The keywords 'if', 'IF' and 'If' are different.
ignore
case is irrelevant. The keywords 'if' and 'IF' are treated as the
same identifier.
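As a sketch, a definition for a language such as Rexx, whose keywords are
not case-sensitive, might contain:

```text
:case
ignore
```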
=======
:option
=======
This section specifies different options that can affect other sections.
All of the options below can be included in a single TLD file.
Syntax:
REXX | PREPROCESSOR char | FUNCTION char [BLANK | NOBLANK]
Meaning of options:
rexx
specifies special processing for Rexx (not implemented)
preprocessor char
In languages like C, preprocessor identifiers usually begin with a
special character (specified by 'char') that differentiates these
keywords from others.
function char [blank | noblank]
This option determines how the keywords specified in the :function
section are recognised. 'char' specifies the character that is used to
start a function call, usually '('. The 'blank' or 'noblank' argument
determines whether blank characters can appear between the function
identifier and the function start character. For example, a Rexx function
call must be written without blanks between the function name and
the function start character: 'word('. In C, both 'word (' and 'word('
are valid syntax for a function call.
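As an illustration, a definition for a C-like language might combine the two
implemented options as follows. The characters '#' and '(' are the usual
choices for C, but this fragment is a sketch rather than a copy of a shipped
TLD file.

```text
:option
preprocessor #
function ( blank
```

A Rexx-like definition would instead use 'function ( noblank', since blanks
are not allowed before the opening parenthesis of a Rexx function call.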
=======
:number
=======
This section specifies the format of numbers in the language. Most languages
use one of a small number of generic number formats.
(This header is currently ignored)
Syntax:
REXX | C | COBOL
Meaning of options:
ECOLOR Value:
=======
:string
=======
This section specifies how strings within the language are defined.
Multiple values may be specified, as many languages use both single and
double quotes.
Syntax:
SINGLE | DOUBLE [BACKSLASH]
Meaning of options:
single
Specifies that the language uses single quotes to identify a string.
double
Specifies that the language uses double quotes to identify a string.
backslash
Some languages require a backslash character immediately preceding
either a single or double quote to allow the quote to be included
in the string.
ECOLOR Character:
For complete strings, the ECOLOUR character used is 'B'. For incomplete
strings, the ECOLOUR character used is 'S'.
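A sketch for a language that accepts both quote styles, assuming the
BACKSLASH modifier may follow either SINGLE or DOUBLE on its own item line:

```text
:string
single backslash
double backslash
```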
========
:comment
========
This section specifies the format of comments. Both paired and line
comments can be specified, as can multiple occurrences of each.
Syntax:
PAIRED open_string close_string [NEST | NONEST]
LINE comment_string ANY | FIRSTNONBLANK | COLUMN n
Meaning of options:
paired
These types of comments can span multiple lines. They have an opening
string and a closing string.
open_string
This defines the string that opens a paired comment.
close_string
This defines the string that closes a paired comment.
nest
Some languages allow paired comments to be nested. (not implemented)
nonest
Defining this indicates that the language does not allow nesting
of paired comments. With this option, the first close_string ends
the paired comment no matter how many open_string occurrences precede
it. (not implemented)
line
This type of comment cannot span multiple lines. Everything on the
line after the comment_string is considered part of the comment.
comment_string
The string that defines a line comment.
any
For line comments, this indicates that the comment_string can occur
anywhere on the line, and all characters following it are part of
the comment.
firstnonblank
For line comments, this indicates that the comment_string can only
occur as the first non-blank of the line.
column n
For line comments, this indicates that the comment_string must
start in the specified column.
ECOLOR Character:
Comments are displayed in the colour specified with ECOLOUR 'A'.
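The following sketch shows one paired comment and two line comments. The
delimiters are illustrative and drawn from different languages: '(*' and
'*)' as used by Pascal, '//' as used by C++, and '*' in column 7 as used by
fixed-format COBOL. A real TLD file would list only the forms its own
language supports.

```text
:comment
paired (* *)
line // any
line * column 7
```

The optional NEST and NONEST arguments are left out because they are not
implemented.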
========
:keyword
========
This section specifies all of the identifiers that are to be considered
language keywords. You must specify the :identifier section in the TLD
file before the :keyword section.
Syntax:
keyword [ALTernate x] [TYPE x]
Meaning of options:
keyword
This specifies the string that is considered to be a language
keyword.
alternate x
All keywords are displayed in the same colour, unless you use
this option to specify a different colour. There are 9 alternate
colours that can be used; ECOLOUR 1 through 9. 'alternate' can
be abbreviated to 'alt'.
type x
(not implemented)
ECOLOR Character:
Unless overridden by the 'alternate' option, the keyword is displayed
in the colour specified with ECOLOUR 'D'.
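A short sketch, using a handful of C keywords purely as sample data. The
:identifier section comes first, as required, and the two type names are
assigned alternate colour 1 (one written in full, one abbreviated):

```text
:identifier
[a-zA-Z_] [a-zA-Z0-9_]

:keyword
if
else
while
int alternate 1
char alt 1
```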
=========
:function
=========
This section specifies all of the identifiers that are to be considered
functions. Normally this is used for those functions that are built
into the language, but can be any identifier. You specify the function
identifier without the function char specified in the :option section.
You must specify the :option and the :identifier sections in the TLD
file before the :function section.
Syntax:
function [ALTernate x]
Meaning of options:
function
This specifies the string that is considered to be a language
function.
alternate x
All functions are displayed in the same colour, unless you use
this option to specify a different colour. There are 9 alternate
colours that can be used; ECOLOUR 1 through 9. 'alternate' can
be abbreviated to 'alt'.
ECOLOR Character:
Unless overridden by the 'alternate' option, the function is displayed
in the colour specified with ECOLOUR 'V'.
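A sketch using a few Rexx built-in function names as sample data. The
:option and :identifier sections precede :function, as required; the
identifier expressions shown are a simplification and not the full Rexx
symbol rules.

```text
:option
function ( noblank

:identifier
[a-zA-Z] [a-zA-Z0-9]

:function
length
substr
translate alt 2
```

Note that the function names are listed without the '(' given in the
:option section.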
=======
:header
=======
This section specifies the format of headers. Headers are lines within a file
that begin with a particular string and usually identify different parts of
the file. They are similar to labels.
Syntax:
LINE header_string ANY | FIRSTNONBLANK | COLUMN n
Meaning of options:
header_string
The string that defines a header.
any
This indicates that the header_string can occur anywhere on the line,
and all characters following it are part of the header.
firstnonblank
This indicates that the header_string can only occur as the first
non-blank of the line.
column n
This indicates that the header_string must start in the specified column.
ECOLOR Character:
Headers are displayed in the colour specified with ECOLOUR 'G'.
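For instance, the header lines of a TLD file itself start with a colon in
column one, so a definition for such files could plausibly contain the
following (a sketch, not copied from tld.tld):

```text
:header
line : column 1
```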
======
:label
======
This section specifies the format of labels. Labels are lines within a file
that end with a particular string. They are similar to headers.
Syntax:
DELIMITER label_string ANY | FIRSTNONBLANK | COLUMN n
COLUMN n (not implemented yet)
Meaning of options:
label_string
The string that defines a label.
any
This indicates that the label_string can occur anywhere on the line,
and all characters up to it are part of the label.
firstnonblank
This indicates that the label_string can only occur as the first
non-blank of the line.
column n
As part of a DELIMITER label, this indicates that the label_string
must start in the specified column. If specified by itself, then the
label does not require any special delimiter; the string that starts
(or ends??) in the specified column is regarded as a label.
ECOLOR Character:
Labels are displayed in the colour specified with ECOLOUR 'E'.
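A sketch for a hypothetical language whose labels end with a colon, where
the colon may appear anywhere on the line and everything up to it is treated
as the label:

```text
:label
delimiter : any
```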
======
:match
======
(Not implemented yet)
============
:postcompare
============
(Not implemented yet)
This section specifies items that are checked for after all other syntax
checking has been completed. This can be useful if you want to allow
user-defined datatypes or other code to be displayed in different colours.
===============
Builtin Parsers
===============
THE includes a number of builtin syntax highlighting <parser>s.
The following table lists the default <parser>s and the files they apply to:
+--------+-----------+----------------+
| Parser | Filemasks | "Magic Number" |
+--------+-----------+----------------+
| REXX | *.rex | rexx |
| - | *.rexx | regina |
| - | *.cmd | rxx |
| - | *.the | - |
| - | .therc | - |
| C | *.c | - |
| - | *.h | - |
| - | *.cc | - |
| - | *.hpp | - |
| - | *.cpp | - |
| SH | - | sh |
| - | - | ksh |
| - | - | bash |
| - | - | zsh |
| TLD | *.tld | - |
| HTML | *.html | - |
| - | *.htm | - |
+--------+-----------+----------------+
A Rexx macro, tld2c.rex, is provided to convert a .tld file into the C code
that can be embedded in default.c. This enables you to configure THE with the
default <parser>s that are most applicable to you.
**man-end**********************************************************************/