File: xiphos-32-search-syntax.page

package info (click to toggle)
xiphos 4.2.1%2Bdfsg1-7
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 19,192 kB
  • sloc: ansic: 28,127; cpp: 13,020; python: 1,042; xml: 163; sh: 86; makefile: 29
file content (312 lines) | stat: -rw-r--r-- 11,504 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
<page id="xiphos-32-search-syntax" type="topic"
      xmlns="http://projectmallard.org/1.0/"
	 xmlns:its="http://www.w3.org/2005/11/its">

  <info>
    <desc>Search Syntax using Regular Expression.</desc>

    <link type="guide" xref="index#search-function"/>

    <revision pkgversion="4.1.0" date="2018-04-24" status="draft"/>
    <revision pkgversion="4.1.0" date="2018-05-30" status="candidate"/>

    <title type='link' role="trail"></title>
    <title type='text'>Xiphos</title>

    <credit type="author" its:translate="no">
      <name>Andy Piper</name>
    </credit>
    <credit type="author" its:translate="no">
      <name>Pierre Benz</name>
    </credit>
    <credit type="author" its:translate="no">
      <name>Dr Peter von Kaehne</name>
    </credit>
    <credit type="author" its:translate="no">
      <name>Karl Kleinpaste</name>
    </credit>
    <credit type="author" its:translate="no">
      <name>Matthew Talbert</name>
    </credit>

    <include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude"/>

  </info>

  <!-- id="hdbk-op-search-dialog-text-regexp" -->

  <title>Search Syntax</title>

  <p>Regular expression searches provide a way to do simple or complex searches
  for strings that match a pattern or set of patterns (branches) separated by
  vertical bars &quot;|&quot;. While a pattern can be built to look for a word
  or phrase, a simple pattern that consists of a word does not look for only
  that word but for any place the string of letters that make that word are
  found. A search for &quot;right&quot; will return verses that contain the
  word &quot;right&quot;, but also &quot;<em>right</em>eous&quot;,
  &quot;<em>right</em> eousness&quot;, &quot;un<em>right</em>eous&quot;,
  &quot;up<em>right</em>&quot; and even &quot;b<em>right</em>&quot;. A search
  for &quot;hall not&quot; is not a search for &quot;hall&quot; AND
  &quot;not&quot; but for the string &quot;hall not&quot; with a space between
  the second &quot;l&quot; and the &quot;n&quot;. The search for &quot;hall
  not&quot; will find occurrences of &quot;s<em>hall not</em>&quot;.</p>

  <p>The power of Regular Expressions is in the patterns (or templates) used to
  define a search. A pattern consists of ordinary characters and some special
  characters that are used and interpreted by a set of rules. Special characters
  include .\[^*$?+. Ordinary (or simple) characters are any characters that are
  not special. The backslash, &quot;\&quot;, is used to convert special
  characters to ordinary and ordinary characters to special.</p>

  <p>Example: the pattern &quot;<em>i. love\.</em>&quot; will find sentences
  that end with &quot;h<em>i</em>s <em>love</em>&quot; or &quot;<em>i</em>n
  <em>love</em>&quot; or &quot; <em>i</em>s <em>love</em>&quot; followed by a
  period. The first period in &quot;i. love \.&quot; is a special character that
  means allow any character in this position. The backslash in &quot;i.
  love\.&quot; means that the period following it is not to be considered a
  special character, but is an ordinary period.</p>

  <section id="hdbk-op-search-dialog-text-regexp-rules">

    <title>Rules for Regular Expression Search Requests</title>
    <list>
      <item>
	<p>. The period matches any character.</p>
      </item>
      <item>
	<p>* The asterisk matches 0 or more characters of the
	preceding: set, character or indicated character.</p>
      </item>
      <item>
	<p>+ The plus sign matches 1 or more characters of the
	preceding: set, character or indicated character.</p>
      </item>
      <item>
	<p>? The question mark matches 0 or 1 character of the
	preceding: set, character or indicated character.</p>
      </item>
      <item>
	<p>[ ] Square brackets match any one of the characters
	specified inside [ ].</p>
      </item>
      <item>
	<p>^ A caret as the first character inside [ ] means NOT.</p>
      </item>
      <item>
	<p>^ A caret beginning a pattern anchors the beginning of a
	line.</p>
      </item>
      <item>
	<p>$ A dollar at the end of a pattern anchors the end of a
	line.</p>
      </item>
      <item>
	<p>| A vertical bar means logical OR.</p>
      </item>
      <item>
	<p>( ) Parentheses enclose expressions for grouping.
	<em>Not supported!</em></p>
      </item>
      <item>
	<p>\ A backslash can be used prior to any special character
	to match that character.</p>
      </item>
      <item>
	<p>\ A backslash can be used prior to an ordinary character
	to make it a special character.</p>
      </item>
    </list>

    <section id="period">

      <title>The Period</title>

      <p>The Period &quot;.&quot; will match any single character even
      a space or other non-alphabet character.
      <em>s.t</em> matches <em>s</em>i<em>t</em>,
      <em>s</em>e<em>t</em>,<em> s</em>o<em>t</em>,
      etc., which could be located in <em>s</em>i<em>t</em>ting,
      compas<em>s</em>e<em>t</em>h and <em>s</em>o<em>t</em>tish
      <em>b..t</em> matches <em>b</em>oo<em>t</em>,
      <em>b</em>oa<em>t</em> and <em>b</em>ea<em>t
      foot.tool </em>matches <em>foot</em>s<em>tool </em>and
      <em>foot tool</em></p>
    </section>


    <section id="asterisk">

      <title>The Asterisk</title>

      <p>The asterisk &quot;*&quot; matches zero or more characters of the
      preceding: set, character or indicated character. Using a period asterisk
      combination &quot;.*&quot; after a commonly found pattern can cause the
      search to take a very long time, making the program seem to freeze.
      <em>be*n</em> matches<em> beeen, been, ben</em>, and <em>bn</em> which
      could locate Reu<em>ben</em> and She<em>bn</em>a.</p>
    </section>


    <section id="plus">

      <title>The Plus Sign</title>

      <p>The Plus Sign &quot;+&quot; matches one or more characters of the
      preceding: set, character or indicated character. Using a period and plus
      sign combination &quot;.+&quot; after a commonly found pattern can cause
      the search to take a very long time, making the program seem to freeze.
      <em>be+n</em> matches <em>beeen, been</em> and <em>ben</em>, but not
      <em>bn</em>.</p>
    </section>


    <section id="question">

      <title>The Question Mark</title>

      <p>The Question Mark &quot;?&quot;matches zero or one character of the
      preceding: set, character or indicated character. <em>be?n</em> matches
      <em>ben</em> and <em>bn</em> but not <em>been</em>.
      <em>trees?</em> matches <em>trees</em> or <em>tree</em>.</p>
    </section>


    <section id="bracket">

      <title>The Square Brackets </title>

      <p>The Square Brackets &quot;[]&quot; enclose a set of characters that can
      match. The period, asterisk, plus sign and question mark are not special
      inside the brackets. A minus sign can be used to indicate a range. If you
      want a caret &quot;^&quot; to be part of the range do not place it first
      after the left bracket or it will be a special character. To include a
      &quot;]&quot; in the set make it the first (or second after a special
      &quot;^&quot;) character in the set. To include a minus sign in the set
      make it the first (or second after a special &quot;^&quot;) or last
      character in the set.
      <em>s[eia]t</em> matches <em>set</em>, <em>sit</em>,
      and <em>sat</em>, but not <em>s</em>o<em>t</em>.
      <em>s[eia]+t </em>matches as above but also, <em>seat,
      seet, siet</em>, etc.
      <em>[a-d]</em> matches <em>a, b, c,</em> or <em>d</em>.
      <em>[A-Z]</em> matches any uppercase letter.
      [.;:?!] matches ., ;, :, ?, or ! but not a comma.
      [ ]^-] matches ] or ^ or -</p>
    </section>


    <section id="caret">

      <title>The Caret first in Square Brackets </title>

      <p>If the Caret is the first character after the left bracket
      (&quot;[^&quot;) it means NOT. <em>s[^io]t</em> matches <em>set,
      sat</em>, etc., but not <em>s</em>i<em>t</em> and
      <em>s</em>o<em>t</em>.</p>
    </section>

    <section id="caret-s">

      <title>The Caret as Start of Line Anchor </title>

      <p>If the Caret is the first character in a pattern (&quot;^xxx&quot;) it
      anchors the pattern to the start of a line. Any match must be at the
      beginning of a line. Because of unfiltered formatting characters in some
      texts, this feature does not always work, but may if a few periods are
      placed after the caret to account for the formatting characters. <em>^In
      the beginning</em> matches lines that start with &quot;<em>In the
      beginning</em>&quot;. (May need to use: <em>^.....In the
      beginning</em>)</p>
    </section>


    <section id="dollar">

      <title>The Dollar Sign as End of Line Anchor </title>

      <p>If the Dollar Sign is the last character (&quot;xxx$&quot;) in a
      pattern it anchors the pattern to the end of a line. Any match must be at
      the end of a line. Because of unfiltered formatting characters in some
      texts, this feature does not always work, but may if a few periods are
      placed before the dollar sign to account for the formatting characters.
      <em>Amen\.$</em> matches lines that end with &quot;<em>Amen.</em>&quot;
      (May need to use Amen\....$, Amen\..........$,
      or even Amen\....................$)</p>
    </section>


    <section id="bar">

      <title>The Vertical Bar </title>

      <p>The Vertical Bar &quot;|&quot; between patterns means OR.
      <em>John|Peter</em> matches <em>John</em> or <em>Peter.
      John .*Peter|Peter .*John</em> matches <em>John</em>
      ... <em>Peter</em> or <em>Peter</em> ... <em>John</em>.
      (.* slows a search)
      <em>pain|suffering|sorrow</em> matches <em>pain</em>,
      or <em>suffering</em>, or <em>sorrow</em>.</p>
    </section>


    <section id="parenth">

      <title>The Parentheses</title>

      <p><em>The use of Parentheses &quot;( )&quot; is not supported!</em></p>
    </section>


    <section id="backslash">

      <title>The Backslash Prior to a Special Character</title>
      <p>The Backslash prior to a special character (&quot;\*&quot;) indicates
      that the character is not being used in its special meaning, but is just
      to match itself. <em>amen\.</em> matches <em>amen.</em> but not
      <em>amen</em>t and will not locate firm<em>amen</em>t.</p>
    </section>

    <section id="backslash-o">

      <title>The Backslash Prior to an Ordinary Character </title>


      <p>The Backslash prior to an ordinary character (&quot;\o&quot;) indicates
      that the character is not being used to match itself, but has special
      meaning.</p>

      <list>
	<item>
	  <p>\b if use outside [ ] means word boundary. If used inside [ ] means
	  backspace. <em>\brighteous\b</em> matches <em>righteous</em> but not
	  un<em>righteous</em> or <em>righteous</em>ness</p>
	</item>
	<item>
	  <p>\B means non-word boundary.  <em>\Brighteous\B</em> matches
	  un<em>righteous</em>ness and un<em>righteous</em>ly but not
	  <em>righteous</em>, un<em>righteous</em> or
	  <em>righteous</em>ness.</p>
	</item>
	<item>
	  <p>\d means digit; same as [0-9].</p>
	</item>
	<item>
	  <p>\D means non-digit, same as [^0-9].</p>
	</item>
	<item>
	  <p>\s means space. </p>
	</item>
	<item>
	  <p>\S means not a space. </p>
	</item>
	<item>
	  <p>\w means alphanumeric; same as [a-zA-Z0-9_].</p>
	</item>
	<item>
	  <p>\W means not alphanumeric; same as [^a-zA-Z0-9_].</p>
	</item>
      </list>
    </section>
  </section>

</page>