<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<!--Time-stamp: <2006-08-17 23:24:17 poser> -->
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
   <META NAME="Author" CONTENT="Bill Poser">

   <TITLE>Redet Reference Manual: User-Defined Character Classes</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFE2C0" VLINK="#0000EE" LINK="#AA0066" ALINK="#FF0000">

<H2><a name="charclass">User-Defined Character Classes</a></H2>
<P>
All programs that support true regular expressions provide a way to enter sets
of characters, usually with the notation [abc] or (a|b|c), sometimes both.
Such notations have two limitations, however. First, even if a set is predefined
in the initialization file so that it need not be typed repeatedly, once it has
been inserted into the regular expression it is easy to lose track of what it
contains. Second, some sets are large, so the regular expression becomes long
and unwieldy. <i>Redet</i> overcomes these limitations by providing
user-defined character classes.
</p>
<P>
Redet allows the user to define any number of named character sets.
This facility is disabled by default so that it does not confuse users
who expect the normal behavior of the selected program. You can enable it
interactively via the Tools:Classes menu or from your initialization file
by including the line:
</p>
<pre>
UserClassesEnabledP T
</pre>
<P>
It is automatically enabled if you define a class interactively.
</p>

<P>
Character classes may be defined interactively
with the command <i>Enter Character Class Definition</i>,
read from a file with the command <i>Load Character Class Definitions</i>,
or defined in the initialization file with the command <i>DefineCharacterClass</i>.
Definitions entered interactively may be saved for future use with the command
<i>Save Character Class Definitions to File</i>.
</p>
<p>
A definition consists of three parts:
</p>
<ul>
<li>A set name</li>
<li>The list of characters</li>
<li>A gloss explaining what the set contains. A gloss is useful because the set name
is usually kept short for convenience.</li>
</ul>
<p>
When a class is defined interactively, each of the three components is entered
separately, as shown below:
</p>
<br>
<div align="center">
<img src="Images/ClassEntry.jpg" width="80%" alt="A character class definition entered interactively" border="2">
</div>
<br clear="all">
<p>
A character class definition file contains one class per line, with the class name, the
characters themselves, and the gloss in that order in three fields separated by tabs, e.g.:
</p>
<pre>
vowels	aeiou	The English vowel letters
</pre>
<P>
Similarly, the initialization file command <i>DefineCharacterClass</i> takes three
arguments: the class name, the set of characters, and the gloss.
</p>
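<p>
The tab-separated file format described above is simple enough to read with a few lines
of code. The following Python sketch is purely illustrative and is not part of <i>Redet</i>;
the function name <i>load_class_definitions</i> and its handling of blank and malformed
lines are assumptions. As <i>Redet</i> does when it loads a file, it lets a later
definition of a class silently replace an earlier one.
</p>
<pre>
```python
import io


def load_class_definitions(stream, classes=None):
    """Read tab-separated character class definitions from a text stream.

    Each non-blank line must hold three fields: name, characters, gloss.
    Returns a dict mapping each class name to a (characters, gloss) tuple.
    A later definition of the same name silently replaces an earlier one.
    """
    if classes is None:
        classes = {}
    for lineno, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skipping blank lines is an assumption of this sketch
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated fields")
        name, chars, gloss = fields
        classes[name] = (chars, gloss)  # overrides any existing definition
    return classes


# The example line from this manual, followed by a hypothetical redefinition.
sample = ("vowels\taeiou\tThe English vowel letters\n"
          "vowels\taeiouy\tVowel letters including y\n")
defs = load_class_definitions(io.StringIO(sample))
print(defs["vowels"])  # ('aeiouy', 'Vowel letters including y')
```
</pre>
<p>
Splitting on the tab character alone means that the gloss, and even the character
list, may contain spaces.
</p>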
<P>
If a class loaded from a file has already been defined, the new definition silently
replaces the old one. If, however, a definition for an existing class is entered
interactively, Redet asks whether it should redefine the class.
</p>
<div align="center">
<img src="Images/RedefinitionConfirmation.jpg" width="80%" alt="A popup asking whether Redet should redefine an existing class definition" border="2">
</div>
<br clear="all">
<P>
If the user prefers not to redefine the existing class, the definition window reappears
with the character and gloss fields filled in as before but with the name field
empty, ready to receive a new name.
</P>
<div align="center">
<img src="Images/StageTwoDefinition.jpg" width="80%" alt="A stage two class definition popup" border="2">
</div>
<br clear="all">
<p>
Once defined, a character class may be included in a regular expression just like
any other component. User-defined character classes are listed in a palette,
from which they may be copied into the regular expression window with a mouse click,
just as entries are copied from the program palette. To see which characters belong
to a user-defined character class, double-click with the left mouse button on the
palette entry for the class.
</p>
<br>
<div align="center">
<img src="Images/ClassPalette.jpg" width="80%" alt="A palette of user-defined character classes" border="2">
</div>
<br clear="all">
<p>
One of the advantages of character classes defined in this way is that they
do not clutter up the regular expression window. However, sometimes it is
desirable to see exactly what is being executed. The command <i>Display Regular Expression Actually Executed</i> on the <i>Class</i> menu pops up a window showing the regular
expression as executed. When first popped
up, it shows the last regular expression executed. If left up, it is updated
each time a regular expression is executed.
</p>
<p>
User-defined character classes are entered using a fixed notation; <i>Redet</i>
automatically translates this notation into notation appropriate for the selected
program.
</p>
<br>
<div align="center">
<img src="Images/Actual01.jpg" width="80%" alt="A regular expression as actually executed" border="2">
</div>
<br clear="all">

<br>
<div align="center">
<img src="Images/Actual02.jpg" width="80%" alt="A regular expression as actually executed" border="2">
</div>
<br clear="all">
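<p>
How a class is rendered depends on the selected program. As an illustration for a single
target only, Python's <i>re</i> module, the sketch below turns the characters of a class
into a bracket expression. The function name <i>class_to_bracket</i> is hypothetical, and
this is not the translation code that <i>Redet</i> itself uses; other programs require
different escaping rules.
</p>
<pre>
```python
import re


def class_to_bracket(chars):
    """Render a set of characters as a bracket expression for Python's re.

    Duplicate characters are dropped (first occurrence kept) and each
    member is passed through re.escape, so ], ^, - and the backslash are
    matched literally instead of acting as bracket-expression syntax.
    """
    unique = "".join(dict.fromkeys(chars))
    if not unique:
        return "(?!)"  # an empty class matches nothing
    return "[" + "".join(re.escape(c) for c in unique) + "]"


print(class_to_bracket("aeiou"))  # [aeiou]
tricky = class_to_bracket("a-]^")
print(re.findall(tricky, "x-a]^z"))  # ['-', 'a', ']', '^']
```
</pre>
<p>
Escaping every member keeps a class that happens to contain metacharacters from
silently changing the meaning of the expanded regular expression.
</p>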
<P>
An additional extension provided by <i>Redet</i> allows user-defined named character
classes to be intersected. A sequence of two or more user-defined named character classes
enclosed within angle brackets is translated into the intersection of those character
classes.
</p>
<br>
<div align="center">
<img src="Images/ClassIntersection01.jpg" width="80%" alt="Intersection of character classes" border="2">
</div>
<br clear="all">

<P>
Here we see a regular expression matching a sequence of three characters. The class
names use the feature notation of linguistics. The first character is specified as
a voiced labial. The labials in English are the consonants <i>p, b, m, f, v</i> and <i>w</i>.
Of these, <i>p</i> and <i>f</i> are voiceless (that is, the vocal folds do not vibrate
during these sounds); the others are voiced. The second character is specified
as a front vowel and the third as a nasal.
</p>

<br>
<div align="center">
<img src="Images/IntersectionActual01.jpg" width="50%" alt="The regular expression actually executed as a result of character class intersection" border="2">
</div>
<br clear="all">
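<p>
The set computation behind an intersection can be sketched in a few lines of Python. The
class definitions here are hypothetical: <i>labial</i> uses the six letters listed above,
while <i>voiced</i> is a deliberately partial list of letters for voiced consonants. The
function name <i>intersect_classes</i> is likewise an assumption, not part of <i>Redet</i>.
</p>
<pre>
```python
def intersect_classes(classes, names):
    """Return the characters that belong to every class in `names`.

    `classes` maps class names to strings of characters.  The result keeps
    the order of the first class and contains no duplicates.  An unknown
    name raises KeyError.
    """
    if not names[1:]:  # fewer than two names
        raise ValueError("an intersection needs at least two classes")
    first, *rest = names
    others = [set(classes[name]) for name in rest]
    return "".join(c for c in dict.fromkeys(classes[first])
                   if all(c in other for other in others))


# Hypothetical definitions for illustration only.
classes = {
    "labial": "pbmfvw",
    "voiced": "bdgmnvwz",  # partial list of letters for voiced consonants
}
voiced_labial = intersect_classes(classes, ["voiced", "labial"])
print("[" + voiced_labial + "]")  # [bmvw]
```
</pre>
<p>
The result, <i>b, m, v</i> and <i>w</i>, is exactly the set of voiced labials described
above; the order in which the classes are named does not change which characters survive.
</p>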
<P>
Some pattern-matching engines use angle brackets as metacharacters, and
it is sometimes necessary to match literal angle brackets, so using angle brackets
as delimiters for character-class intersection can create conflicts.
One way to deal with this problem is to disable the user-defined character class
facility, which you may do either from the <i>User Class</i> submenu of the <i>Tools</i> menu
or by means of the initialization file command <i>UserClassesEnabledP</i>.
Another way is to redefine the intersection delimiters, which you may do by means
of the initialization file commands
<i>SetLeftUserClassIntersectionDelimiter</i>
and
<i>SetRightUserClassIntersectionDelimiter</i>.
</p>
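<p>
The effect of configurable delimiters can be illustrated with a small Python translator.
It is a sketch under stated assumptions, not <i>Redet</i>'s own code: the function
<i>expand_intersections</i> is hypothetical, the class names inside a group are assumed
to be separated by whitespace, and a group that does not consist of two or more known
class names is assumed to be left untouched. Its <i>left</i> and <i>right</i> parameters
play the role of the delimiters set by <i>SetLeftUserClassIntersectionDelimiter</i> and
<i>SetRightUserClassIntersectionDelimiter</i>.
</p>
<pre>
```python
import re


def expand_intersections(expr, classes, left="\u003c", right="\u003e"):
    """Replace each delimited group of class names with a bracket expression.

    Assumptions for this sketch: names inside a group are separated by
    whitespace, and a group that is not two or more known class names is
    left unchanged.  The defaults are the Unicode escapes for the left and
    right angle brackets; other delimiters can be passed instead.
    """
    group = re.compile(re.escape(left) + r"(.*?)" + re.escape(right))

    def replace(match):
        names = match.group(1).split()
        if not names[1:] or not all(name in classes for name in names):
            return match.group(0)  # not an intersection: keep the text as is
        first, rest = classes[names[0]], names[1:]
        chars = "".join(c for c in dict.fromkeys(first)
                        if all(c in classes[name] for name in rest))
        if not chars:
            return "(?!)"  # empty intersection: matches nothing
        return "[" + "".join(re.escape(c) for c in chars) + "]"

    return group.sub(replace, expr)


classes = {"labial": "pbmfvw", "voiced": "bdgmnvwz", "nasal": "mn"}  # hypothetical
print(expand_intersections("{{voiced labial}}e", classes, left="{{", right="}}"))  # [bmvw]e
# With redefined delimiters, a literal angle-bracket group passes through unchanged.
print(expand_intersections("\u003cb\u003e{{voiced nasal}}", classes, left="{{", right="}}"))
```
</pre>
<p>
Because the delimiters are matched literally, any string that never occurs in the
patterns you write can serve as a delimiter, which is what makes redefining them an
alternative to disabling the facility altogether.
</p>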

<br>
<center><a href="substitution.html">Next</a></center>
<br>
<center><a href="Manual.html">Back to Table of Contents</a></center>
</body>
</html>