1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250
|
/*man-start*********************************************************************
========================================================================
APPENDIX 7 - REGULAR EXPRESSIONS IN THE
========================================================================
This appendix contains details on regular expression usage in THE. There are
two places where THE uses regular expressions; in targets in commands like
<LOCATE> and <ALL>, and in the specification of patterns in THE Language
Definition files used for syntax highlighting.
THE uses the GNU Regular Expression Library to implement regular expressions.
This library has several different regular expression syntaxes that can be used
when specifying targets.
Note that all pattern specifications used for syntax highlighting always uses
the EMACS regular expression syntax.
The following table lists the features of each of the regular expression
syntaxes that can be set via the <SET REGEXP> command. Each feature in the
table is explained later.
This appendix is not intended to explain everything about regular expressions.
If you want to find out more about GNU Regular Expressions, then view the on-line
documentation at <http://www.hessling-editor.sf.net/doc/regex/regex/>.
+------------------------+----------------------------+
| Syntax | Features |
+------------------------+----------------------------+
| EMACS | None set |
+------------------------+----------------------------+
| AWK | BACKSLASH_ESCAPE_IN_LISTS |
| | DOT_NOT_NULL |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_REFS |
| | NO_BACKSLASH_VBAR |
| | NO_EMPTY_RANGES |
| | UNMATCHED_RIGHT_PAREND_ORD |
+------------------------+----------------------------+
| POSIX_AWK | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | CONTEXT_INDEP_ANCHORS |
| | CONTEXT_INDEP_OPS |
| | NO_BACKSLASH_BRACES |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_VBAR |
| | UNMATCHED_RIGHT_PAREN_ORD |
| | BACKSLASH_ESCAPE_IN_LISTS |
+------------------------+----------------------------+
| GREP | BACKSLASH_PLUS_QM |
| | CHAR_CLASSES |
| | HAT_LISTS_NOT_NEWLINE |
| | INTERVALS |
| | NEWLINE_ALT |
+------------------------+----------------------------+
| EGREP | CHAR_CLASSES |
| | HAT_LISTS_NOT_NEWLINE |
| | NEWLINE_ALT |
| | CONTEXT_INDEP_ANCHORS |
| | CONTEXT_INDEP_OPS |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_VBAR |
+------------------------+----------------------------+
| POSIX_EGREP | CHAR_CLASSES |
| | HAT_LISTS_NOT_NEWLINE |
| | NEWLINE_ALT |
| | CONTEXT_INDEP_ANCHORS |
| | CONTEXT_INDEP_OPS |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_VBAR |
| | NO_BACKSLASH_BRACES |
| | INTERVALS |
+------------------------+----------------------------+
| SED | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | BACKSLASH_PLUS_QM |
+------------------------+----------------------------+
| POSIX_BASIC | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | BACKSLASH_PLUS_QM |
+------------------------+----------------------------+
| POSIX_MINIMAL_BASIC | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | LIMITED_OPS |
+------------------------+----------------------------+
| POSIX_EXTENDED | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | CONTEXT_INDEP_ANCHORS |
| | CONTEXT_INDEP_OPS |
| | NO_BACKSLASH_BRACES |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_VBAR |
| | UNMATCHED_RIGHT_PAREN_ORD |
+------------------------+----------------------------+
| POSIX_MINIMAL_EXTENDED | CHAR_CLASSES |
| | DOT_NEWLINE |
| | DOT_NOT_NULL |
| | INTERVALS |
| | NO_EMPTY_RANGES |
| | CONTEXT_INDEP_ANCHORS |
| | CONTEXT_INVALID_OPS |
| | NO_BACKSLASH_BRACES |
| | NO_BACKSLASH_PARENS |
| | NO_BACKSLASH_REFS |
| | NO_BACKSLASH_VBAR |
| | UNMATCHED_RIGHT_PAREN_ORD |
+------------------------+----------------------------+
============
BACKSLASH_ESCAPE_IN_LISTS
============
If this feature is not set, then \ inside a bracket expression is
literal.
If set, then such a \ quotes the following character.
============
BACKSLASH_PLUS_QM
============
If this feature is not set, then + and ? are operators, and \+ and \? are
literals.
If set, then \+ and \? are operators and + and ? are literals.
============
CHAR_CLASSES
============
If this feature is set, then character classes are supported.
They are:
[:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
If not set, then character classes are not supported.
============
CONTEXT_INDEP_ANCHORS
============
If this feature is set, then ^ and $ are always anchors (outside bracket
expressions, of course).
If this feature is not set, then it depends:
^ is an anchor if it is at the beginning of a regular
expression or after an open-group or an alternation operator;
$ is an anchor if it is at the end of a regular expression, or
before a close-group or an alternation operator.
This feature could be (re)combined with CONTEXT_INDEP_OPS, because
POSIX draft 11.2 says that * etc. in leading positions is undefined.
============
CONTEXT_INDEP_OPS
============
If this feature is set, then special characters are always special regardless
of where they are in the pattern.
If this feature is not set, then special characters are special only in some
contexts; otherwise they are ordinary. Specifically, * + ? and intervals
are only special when not after the beginning, open-group, or alternation operator.
============
CONTEXT_INVALID_OPS
============
If this feature is set, then *, +, ?, and { cannot be first in an RE or immediately
after an alternation or begin-group operator.
============
DOT_NEWLINE
============
If this feature is set, then . matches newline. If not set, then it does not.
============
DOT_NOT_NULL
============
If this feature is set, then . does not match NUL. If not set, then it does.
============
HAT_LISTS_NOT_NEWLINE
============
If this feature is set, nonmatching lists [^...] do not match newline.
If not set, they do.
============
INTERVALS
============
If this feature is set, either \{...\} or {...} defines an interval, depending
on NO_BACKSLASH_BRACES.
If not set, \{, \}, {, and } are literals.
============
LIMITED_OPS
============
If this feature is set, +, ? and | are not recognized as operators.
If not set, they are.
============
NEWLINE_ALT
============
If this feature is set, newline is an alternation operator.
If not set, newline is literal.
============
NO_BACKSLASH_BRACES
============
If this feature is set, then `{...}' defines an interval, and \{ and \} are literals.
If not set, then `\{...\}' defines an interval.
============
NO_BACKSLASH_PARENS
============
If this feature is set, (...) defines a group, and \( and \) are literals.
If not set, \(...\) defines a group, and ( and ) are literals.
============
NO_BACKSLASH_REFS
============
If this feature is set, then \<digit> matches <digit>. If not set, then \<digit>
is a back-reference.
============
NO_BACKSLASH_VBAR
============
If this feature is set, then | is an alternation operator, and \| is literal.
If not set, then \| is an alternation operator, and | is literal.
============
NO_EMPTY_RANGES
============
If this feature is set, then an ending range point collating higher than the starting
range point, as in [z-a], is invalid.
If not set, then when ending range point collates higher than the starting range
point, the range is ignored.
============
UNMATCHED_RIGHT_PAREN_ORD
============
If this feature is set, then an unmatched ) is ordinary.
If not set, then an unmatched ) is invalid.
**man-end**********************************************************************/
|