<page id="xiphos-32-search-syntax" type="topic"
xmlns="http://projectmallard.org/1.0/"
xmlns:its="http://www.w3.org/2005/11/its">
<info>
<desc>Search syntax using regular expressions.</desc>
<link type="guide" xref="index#search-function"/>
<revision pkgversion="4.1.0" date="2018-04-24" status="draft"/>
<revision pkgversion="4.1.0" date="2018-05-30" status="candidate"/>
<title type='link' role="trail"></title>
<title type='text'>Xiphos</title>
<credit type="author" its:translate="no">
<name>Andy Piper</name>
</credit>
<credit type="author" its:translate="no">
<name>Pierre Benz</name>
</credit>
<credit type="author" its:translate="no">
<name>Dr Peter von Kaehne</name>
</credit>
<credit type="author" its:translate="no">
<name>Karl Kleinpaste</name>
</credit>
<credit type="author" its:translate="no">
<name>Matthew Talbert</name>
</credit>
<include href="legal.xml" xmlns="http://www.w3.org/2001/XInclude"/>
</info>
<!-- id="hdbk-op-search-dialog-text-regexp" -->
<title>Search Syntax</title>
<p>Regular expression searches provide a way to do simple or complex searches
for strings that match a pattern or a set of alternative patterns (branches) separated by
vertical bars "|". A pattern can be built to look for a whole word
or phrase, but a simple pattern that consists of a word does not find only
that word: it finds every place where that string of letters occurs.
A search for "right" will return verses that contain the
word "right", but also "<em>right</em>eous",
"<em>right</em>eousness", "un<em>right</em>eous",
"up<em>right</em>" and even "b<em>right</em>". A search
for "hall not" is not a search for "hall" AND
"not" but for the string "hall not" with a space between
the second "l" and the "n". The search for "hall
not" will therefore find occurrences of "s<em>hall not</em>".</p>
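<p>The substring behaviour described above can be sketched with Python's
<em>re</em> module. Python is used here only as a stand-in for illustration;
the word list and the sample sentence are made up for the example, and the
search engine inside Xiphos may differ in details.</p>

```python
import re

# Sample words, chosen for this illustration only.
words = ["right", "righteous", "unrighteous", "upright", "bright", "rite"]

# re.search looks for the pattern anywhere inside each string,
# so every word containing the letters "right" is a hit.
hits = [w for w in words if re.search("right", w)]
print(hits)

# A space in the pattern is matched literally: "hall not" is one string,
# not a search for "hall" AND "not".
phrase = re.search("hall not", "thou shall not kill")
print(phrase.group(0))
```

<p>Only "rite" is left out of the first result, because it does not contain
the letters "right" in sequence.</p>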
<p>The power of regular expressions is in the patterns (or templates) used to
define a search. A pattern consists of ordinary characters and some special
characters that are interpreted according to a set of rules. The special characters
are . \ [ ^ * $ ? + and the vertical bar. Ordinary (or simple) characters are any characters that are
not special. The backslash, "\", is used to convert special
characters to ordinary and ordinary characters to special.</p>
<p>Example: the pattern "<em>i. love\.</em>" will find text
that contains "h<em>i</em>s <em>love</em>", "<em>i</em>n
<em>love</em>" or "<em>i</em>s <em>love</em>" followed by a
period. The first period in "i. love\." is a special character that
means any character is allowed in this position. The backslash in "i.
love\." means that the period following it is not to be treated as a
special character, but as an ordinary period.</p>
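<p>As a rough check of this example, the same pattern can be tried in Python's
<em>re</em> module, which is assumed here to behave like the Xiphos search for
these two characters. The sample sentences are invented for the
illustration.</p>

```python
import re

# The first period is special (any character); the second is escaped
# with a backslash and therefore matches only a literal period.
pattern = r"i. love\."

samples = [
    "God is love.",
    "abide in his love.",
    "walk in love.",
    "is love, and",      # followed by a comma, so it should not match
]

matched = [s for s in samples if re.search(pattern, s)]
print(matched)
```

<p>The last sample is rejected because the escaped period requires a real
period after "love".</p>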
<section id="hdbk-op-search-dialog-text-regexp-rules">
<title>Rules for Regular Expression Search Requests</title>
<list>
<item>
<p>. The period matches any character.</p>
</item>
<item>
<p>* The asterisk matches zero or more occurrences of the
preceding character, set or backslash-escaped character.</p>
</item>
<item>
<p>+ The plus sign matches one or more occurrences of the
preceding character, set or backslash-escaped character.</p>
</item>
<item>
<p>? The question mark matches zero or one occurrence of the
preceding character, set or backslash-escaped character.</p>
</item>
<item>
<p>[ ] Square brackets match any one of the characters
specified inside [ ].</p>
</item>
<item>
<p>^ A caret as the first character inside [ ] means NOT.</p>
</item>
<item>
<p>^ A caret beginning a pattern anchors the beginning of a
line.</p>
</item>
<item>
<p>$ A dollar at the end of a pattern anchors the end of a
line.</p>
</item>
<item>
<p>| A vertical bar means logical OR.</p>
</item>
<item>
<p>( ) Parentheses enclose expressions for grouping.
<em>Not supported!</em></p>
</item>
<item>
<p>\ A backslash can be used prior to any special character
to match that character.</p>
</item>
<item>
<p>\ A backslash can be used prior to an ordinary character
to make it a special character.</p>
</item>
</list>
<section id="period">
<title>The Period</title>
<p>The period "." matches any single character, even
a space or other non-alphabetic character.
<em>s.t</em> matches <em>s</em>i<em>t</em>,
<em>s</em>e<em>t</em>, <em>s</em>o<em>t</em>,
etc., which could be located in <em>s</em>i<em>t</em>ting,
compas<em>s</em>e<em>t</em>h and <em>s</em>o<em>t</em>tish.
<em>b..t</em> matches <em>b</em>oo<em>t</em>,
<em>b</em>oa<em>t</em> and <em>b</em>ea<em>t</em>.
<em>foot.tool</em> matches <em>foot</em>s<em>tool</em> and
<em>foot tool</em>.</p>
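<p>The period examples can be reproduced with a small Python sketch. The
helper name <em>find_all</em> is hypothetical, and Python's <em>re</em> module
is only assumed to treat the period the same way as the Xiphos search.</p>

```python
import re

def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

# One period stands for exactly one character.
three_letters = find_all("s.t", "sitting compasseth sottish")
print(three_letters)

# Two periods stand for exactly two characters.
four_letters = find_all("b..t", "boot boat beat")
print(four_letters)

# The period also matches a space.
with_space = find_all("foot.tool", "footstool and foot tool")
print(with_space)
```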
</section>
<section id="asterisk">
<title>The Asterisk</title>
<p>The asterisk "*" matches zero or more occurrences of the
preceding character, set or backslash-escaped character. Using a period and asterisk
combination ".*" after a commonly found pattern can cause the
search to take a very long time, making the program seem to freeze.
<em>be*n</em> matches <em>beeen</em>, <em>been</em>, <em>ben</em> and <em>bn</em>, so it
could locate Reu<em>ben</em> and She<em>bn</em>a.</p>
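<p>A minimal Python sketch of the asterisk example follows. It assumes that
Python's <em>re</em> module handles "*" like the Xiphos search does; the
candidate list, including the extra word "bean", is made up for the
illustration.</p>

```python
import re

pattern = "be*n"   # "b", then zero or more "e", then "n"

candidates = ["beeen", "been", "ben", "bn", "Reuben", "Shebna", "bean"]

# re.search finds the pattern anywhere inside the word.
matched = [w for w in candidates if re.search(pattern, w)]
print(matched)
```

<p>"bean" is not found because an "a" sits between the "e" and the "n", and
the asterisk only repeats the "e".</p>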
</section>
<section id="plus">
<title>The Plus Sign</title>
<p>The plus sign "+" matches one or more occurrences of the
preceding character, set or backslash-escaped character. Using a period and plus
sign combination ".+" after a commonly found pattern can cause
the search to take a very long time, making the program seem to freeze.
<em>be+n</em> matches <em>beeen</em>, <em>been</em> and <em>ben</em>, but not
<em>bn</em>.</p>
</section>
<section id="question">
<title>The Question Mark</title>
<p>The question mark "?" matches zero or one occurrence of the
preceding character, set or backslash-escaped character. <em>be?n</em> matches
<em>ben</em> and <em>bn</em> but not <em>been</em>.
<em>trees?</em> matches <em>trees</em> or <em>tree</em>.</p>
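<p>The difference between the asterisk, the plus sign and the question mark
can be compared side by side. This is a sketch in Python, assumed to agree
with Xiphos on these three characters; the helper
<em>whole_word_matches</em> is a hypothetical name, and it tests whole words
rather than substrings so that the counts are easy to see.</p>

```python
import re

words = ["bn", "ben", "been", "beeen"]

def whole_word_matches(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

star = whole_word_matches("be*n")       # zero or more "e"
plus = whole_word_matches("be+n")       # one or more "e"
optional = whole_word_matches("be?n")   # zero or one "e"

print("be*n:", star)
print("be+n:", plus)
print("be?n:", optional)
```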
</section>
<section id="bracket">
<title>The Square Brackets </title>
<p>The square brackets "[]" enclose a set of characters, any one of which can
match. The period, asterisk, plus sign and question mark are not special
inside the brackets. A minus sign can be used to indicate a range. If you
want a caret "^" to be part of the set, do not place it first
after the left bracket or it will be a special character. To include a
"]" in the set, make it the first (or second after a special
"^") character in the set. To include a minus sign in the set,
make it the first (or second after a special "^") or last
character in the set.
<em>s[eia]t</em> matches <em>set</em>, <em>sit</em>
and <em>sat</em>, but not <em>s</em>o<em>t</em>.
<em>s[eia]+t</em> matches as above but also <em>seat</em>,
<em>seet</em>, <em>siet</em>, etc.
<em>[a-d]</em> matches <em>a</em>, <em>b</em>, <em>c</em> or <em>d</em>.
<em>[A-Z]</em> matches any uppercase letter.
<em>[.;:?!]</em> matches ., ;, :, ? or ! but not a comma.
<em>[]^-]</em> matches ] or ^ or -.</p>
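<p>The bracket rules can be tried out with the following Python sketch. It
assumes Python's <em>re</em> module follows the same placement rules for "]",
"^" and "-" inside a set; the helper name <em>matches</em> is hypothetical.
Note that the last set is written without a space after the left bracket.</p>

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

one_vowel = matches("s[eia]t", ["set", "sit", "sat", "sot"])
many_vowels = matches("s[eia]+t", ["set", "seat", "seet", "siet", "sot"])
letter_range = matches("[a-d]", list("abcde"))

# Period and question mark are ordinary characters inside brackets.
punctuation = matches("[.;:?!]", list(".;:?!,"))

# "]" first, "^" not first and "-" last are all taken literally.
literals = matches("[]^-]", ["]", "^", "-", "["])

print(one_vowel, many_vowels, letter_range, punctuation, literals)
```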
</section>
<section id="caret">
<title>The Caret first in Square Brackets </title>
<p>If the caret is the first character after the left bracket
("[^") it means NOT. <em>s[^io]t</em> matches <em>set</em>,
<em>sat</em>, etc., but not <em>s</em>i<em>t</em> or
<em>s</em>o<em>t</em>.</p>
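<p>A negated set accepts every character that is not listed, which includes
spaces and punctuation. The sketch below uses Python's <em>re</em> module as
an assumed stand-in for the Xiphos search, with a word list invented for the
example.</p>

```python
import re

words = ["set", "sat", "sit", "sot", "s t"]

# [^io] matches any single character except "i" and "o".
negated = [w for w in words if re.fullmatch("s[^io]t", w)]
print(negated)
```

<p>The entry with a space in the middle is accepted, because a space is
neither "i" nor "o".</p>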
</section>
<section id="caret-s">
<title>The Caret as Start of Line Anchor </title>
<p>If the Caret is the first character in a pattern ("^xxx") it
anchors the pattern to the start of a line. Any match must be at the
beginning of a line. Because of unfiltered formatting characters in some
texts, this feature does not always work, but may if a few periods are
placed after the caret to account for the formatting characters. <em>^In
the beginning</em> matches lines that start with "<em>In the
beginning</em>". (May need to use: <em>^.....In the
beginning</em>)</p>
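<p>The workaround with extra periods can be illustrated as follows. This is
only a sketch in Python: the five-character prefix "[v1] " is a hypothetical
stand-in for unfiltered formatting characters, and the real residue in a
given text may have a different length.</p>

```python
import re

clean = "In the beginning God created the heaven and the earth."
# Hypothetical formatting residue: exactly five characters before the text.
marked_up = "[v1] " + clean

plain = r"^In the beginning"
padded = r"^.....In the beginning"   # five periods absorb the residue

plain_hits = [bool(re.search(plain, t)) for t in (clean, marked_up)]
padded_hits = [bool(re.search(padded, t)) for t in (clean, marked_up)]

print("plain: ", plain_hits)
print("padded:", padded_hits)
```

<p>The padded pattern finds the marked-up line but no longer finds the clean
one, so the number of periods has to fit the text being searched.</p>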
</section>
<section id="dollar">
<title>The Dollar Sign as End of Line Anchor </title>
<p>If the Dollar Sign is the last character ("xxx$") in a
pattern it anchors the pattern to the end of a line. Any match must be at
the end of a line. Because of unfiltered formatting characters in some
texts, this feature does not always work, but may if a few periods are
placed before the dollar sign to account for the formatting characters.
<em>Amen\.$</em> matches lines that end with "<em>Amen.</em>".
(May need to use <em>Amen\....$</em>, <em>Amen\..........$</em>,
or even <em>Amen\....................$</em>.)</p>
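<p>The end-of-line anchor and its padded variant can be sketched in the same
way. Python's <em>re</em> module is assumed to be close enough for
illustration, the sample sentence is made up, and the three-character
suffix "{/}" is a hypothetical stand-in for trailing formatting
characters.</p>

```python
import re

clean = "The grace of our Lord be with you all. Amen."
# Hypothetical formatting residue: three characters after the final period.
trailing = clean + "{/}"

strict = r"Amen\.$"
padded = r"Amen\....$"   # escaped period, then three "any character" periods

strict_hits = [bool(re.search(strict, t)) for t in (clean, trailing)]
padded_hits = [bool(re.search(padded, t)) for t in (clean, trailing)]

print("strict:", strict_hits)
print("padded:", padded_hits)
```

<p>Each pattern finds only the text whose ending it was written for, which is
why several padded variants may need to be tried.</p>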
</section>
<section id="bar">
<title>The Vertical Bar </title>
<p>The vertical bar "|" between patterns means OR.
<em>John|Peter</em> matches <em>John</em> or <em>Peter</em>.
<em>John .*Peter|Peter .*John</em> matches <em>John</em>
... <em>Peter</em> or <em>Peter</em> ... <em>John</em>
(.* slows a search).
<em>pain|suffering|sorrow</em> matches <em>pain</em>,
<em>suffering</em> or <em>sorrow</em>.</p>
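<p>The two John and Peter patterns behave differently, as this Python sketch
shows. The sample sentences are invented for the example, and Python's
<em>re</em> module is assumed to treat the vertical bar like the Xiphos
search does.</p>

```python
import re

# Sample sentences, made up for this illustration.
verses = [
    "Peter and John went up together",
    "John answered and said",
    "James and Andrew followed",
]

# Either name is enough.
either = [v for v in verses if re.search("John|Peter", v)]

# Both names are required, in either order.
both = [v for v in verses if re.search("John .*Peter|Peter .*John", v)]

print(either)
print(both)
```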
</section>
<section id="parenth">
<title>The Parentheses</title>
<p><em>The use of Parentheses "( )" is not supported!</em></p>
</section>
<section id="backslash">
<title>The Backslash Prior to a Special Character</title>
<p>The Backslash prior to a special character ("\*") indicates
that the character is not being used in its special meaning, but is just
to match itself. <em>amen\.</em> matches <em>amen.</em> but not
<em>amen</em>t and will not locate firm<em>amen</em>t.</p>
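<p>The effect of the backslash is easiest to see by running the pattern with
and without it. The sketch uses Python's <em>re</em> module as an assumed
stand-in, with a short word list chosen for the example.</p>

```python
import re

texts = ["amen.", "firmament", "amen"]

# Escaped period: only a literal "." after "amen" is accepted.
escaped = [t for t in texts if re.search(r"amen\.", t)]

# Unescaped period: any character after "amen" is accepted,
# so the "t" in "firmament" counts as well.
unescaped = [t for t in texts if re.search("amen.", t)]

print(escaped)
print(unescaped)
```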
</section>
<section id="backslash-o">
<title>The Backslash Prior to an Ordinary Character </title>
<p>The Backslash prior to an ordinary character ("\o") indicates
that the character is not being used to match itself, but has special
meaning.</p>
<list>
<item>
<p>\b outside [ ] means a word boundary; inside [ ] it means
backspace. <em>\brighteous\b</em> matches <em>righteous</em> but not
un<em>righteous</em> or <em>righteous</em>ness.</p>
</item>
<item>
<p>\B means non-word boundary. <em>\Brighteous\B</em> matches
un<em>righteous</em>ness and un<em>righteous</em>ly but not
<em>righteous</em>, un<em>righteous</em> or
<em>righteous</em>ness.</p>
</item>
<item>
<p>\d means digit; same as [0-9].</p>
</item>
<item>
<p>\D means non-digit, same as [^0-9].</p>
</item>
<item>
<p>\s means a whitespace character, such as a space or a tab.</p>
</item>
<item>
<p>\S means any character that is not whitespace.</p>
</item>
<item>
<p>\w means an alphanumeric character or underscore; same as [a-zA-Z0-9_].</p>
</item>
<item>
<p>\W means not an alphanumeric character or underscore; same as [^a-zA-Z0-9_].</p>
</item>
</list>
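<p>A few of these backslash sequences are sketched below in Python. This
assumes that Python's <em>re</em> module agrees with the Xiphos search on
\b, \B and \d, which may not hold for every sequence; the sample strings are
made up.</p>

```python
import re

words = [
    "righteous",
    "unrighteous",
    "righteousness",
    "unrighteousness",
    "unrighteously",
]

# \b on both sides: "righteous" must stand as a whole word.
whole = [w for w in words if re.search(r"\brighteous\b", w)]

# \B on both sides: "righteous" must have letters before and after it.
inner = [w for w in words if re.search(r"\Brighteous\B", w)]

# \d matches a digit; with "+" it collects runs of digits.
digits = re.findall(r"\d+", "John 3:16")

print(whole)
print(inner)
print(digits)
```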
</section>
</section>
</page>