File: syntax_perl.html

package info (click to toggle)
boost 1.34.1-14
  • links: PTS
  • area: main
  • in suites: lenny
  • size: 116,412 kB
  • ctags: 259,566
  • sloc: cpp: 642,395; xml: 56,450; python: 17,612; ansic: 14,520; sh: 2,265; yacc: 858; perl: 481; makefile: 478; lex: 94; sql: 74; csh: 6
file content (626 lines) | stat: -rw-r--r-- 30,228 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
   <head>
      <title>Boost.Regex: Perl Regular Expression Syntax</title>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <LINK href="../../../boost.css" type="text/css" rel="stylesheet"></head>
   <body>
      <P>
         <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <td vAlign="top" width="300">
                  <h3><A href="../../../index.htm"><IMG height="86" alt="C++ Boost" src="../../../boost.png" width="277" border="0"></A></h3>
               </td>
               <TD width="353">
                  <H1 align="center">Boost.Regex</H1>
                  <H2 align="center">
                     Perl&nbsp;Regular Expression Syntax</H2>
               </TD>
               <td width="50">
                  <h3><A href="index.html"><IMG height="45" alt="Boost.Regex Index" src="uarrow.gif" width="43" border="0"></A></h3>
               </td>
            </TR>
         </TABLE>
      </P>
      <HR>
      <H3>Contents</H3>
      <dl class="index">
         <dt><A href="#synopsis">Synopsis</A> <dt><A href="#Perl">Perl&nbsp;Syntax</A> <dt><A href="#what">
                     What Gets Matched</A> <dt><A href="#variations">Variations</A>
                     <dd>
                        <dt><A href="#options">Options</A> <dt><A href="#mods">Modifiers</A> <dt><A href="#refs">References</A></dt>
      </dl>
      <H3><A name="synopsis"></A>Synopsis</H3>
      <P>The Perl regular expression syntax is based on that used by the programming 
         language <EM>Perl</EM> .&nbsp; Perl regular expressions are the default 
         behavior in Boost.Regex or you can&nbsp;pass the flag <EM>perl</EM> to the 
         regex constructor, for example:</P>
      <PRE>// e1 is a case sensitive Perl regular expression: 
// since Perl is the default option there's no need to explicitly specify the syntax used here:
boost::regex e1(my_expression);
// e2 a case insensitive Perl regular expression:
boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);</PRE>
      <H3>Perl&nbsp;Regular Expression Syntax<A name="Perl"></A></H3>
      <P>In&nbsp;Perl regular expressions, all characters match themselves except for 
         the following special characters:</P>
      <PRE>.[{()\*+?|^$</PRE>
      <H4>Wildcard:</H4>
      <P>The single character '.' when used outside of a character set will match any 
         single character except:</P>
      <P>The NULL character when the flag <EM>match_no_dot_null</EM> is passed to the 
         matching algorithms.</P>
      <P>The newline character when the flag <EM>match_not_dot_newline</EM> is passed to 
         the matching algorithms.</P>
      <H4>Anchors:</H4>
      <P>A '^' character shall match the start of a line.</P>
      <P>A '$' character shall match the end of a line.</P>
      <H4>Marked sub-expressions:</H4>
      <P>A section beginning ( and ending ) acts as a marked sub-expression.&nbsp; 
         Whatever matched the sub-expression is split out in a separate field by the 
         matching algorithms.&nbsp; Marked sub-expressions can also repeated, or 
         referred to by a back-reference.</P>
      <H4>Non-marking grouping:</H4>
      <P>A marked sub-expression is useful to lexically group part of a regular 
         expression, but has the side-effect of spitting out an extra field in the 
         result.&nbsp; As an alternative&nbsp;you can lexically group part of a regular 
         expression, without generating a marked sub-expression by using (?: and ) , for 
         example (?:ab)+ will repeat "ab" without splitting out any separate 
         sub-expressions.</P>
      <H4>Repeats:</H4>
      <P>Any atom (a single character, a marked sub-expression, or a character class) 
         can be repeated with the *, +, ?, and {}&nbsp;operators.</P>
      <P>The * operator will match the preceding atom zero or more times, for example 
         the expression a*b will match any of the following:</P>
      <PRE>b
ab
aaaaaaaab</PRE>
      <P>The + operator will match the preceding atom one or more times, for example the 
         expression a+b will match any of the following:</P>
      <PRE>ab
aaaaaaaab</PRE>
      <P>But will not match:</P>
      <PRE>b</PRE>
      <P>The ? operator will match the preceding atom zero or&nbsp;one times, for 
         example the expression ca?b will match any of the following:</P>
      <PRE>cb
cab</PRE>
      <P>But will not match:</P>
      <PRE>caab</PRE>
      <P>An atom can also be repeated with a bounded repeat:</P>
      <P>a{n}&nbsp; Matches 'a' repeated exactly <EM>n</EM> times.</P>
      <P>a{n,}&nbsp; Matches 'a' repeated <EM>n</EM> or more times.</P>
      <P>a{n, m}&nbsp; Matches 'a' repeated between <EM>n</EM> and <EM>m</EM> times 
         inclusive.</P>
      <P>For example:</P>
      <PRE>^a{2,3}$</PRE>
      <P>Will match either of:</P>
      <PRE>aa
aaa</PRE>
      <P>But neither of:</P>
      <PRE>a
aaaa</PRE>
      <P>It is an error to use a repeat operator, if the preceding construct can not be 
         repeated, for example:</P>
      <PRE>a(*)</PRE>
      <P>Will raise an error, as there is nothing for the * operator to be applied to.</P>
      <H4>Non greedy repeats</H4>
      <P>The normal repeat operators are "greedy", that is to say they will consume as 
         much input as possible.&nbsp; There are non-greedy versions available that will 
         consume as little input as possible while still producing a match.</P>
      <P>*? Matches the previous atom zero or more times, while consuming as little 
         input as possible.</P>
      <P>+? Matches the previous atom one or more times, while consuming as little input 
         as possible.</P>
      <P>?? Matches the previous atom zero or one times, while consuming as little input 
         as possible.</P>
      <P>{n,}? Matches the previous atom <EM>n</EM> or more times, while&nbsp;consuming 
         as little input as possible.</P>
      <P>{n,m}? Matches the previous atom between <EM>n</EM> and <EM>m</EM> times, 
         while&nbsp;consuming as little input as possible.</P>
      <H4>Back references:</H4>
      <P>An escape character followed by a digit <EM>n</EM>, where <EM>n </EM>is in the 
         range 1-9, matches the same string that was matched by sub-expression <EM>n</EM>.&nbsp; 
         For example the expression:</P>
      <PRE>^(a*).*\1$</PRE>
      <P>Will match the string:</P>
      <PRE>aaabbaaa</PRE>
      <P>But not the string:</P>
      <PRE>aaabba</PRE>
      <H4>Alternation</H4>
      <P>The | operator will match either of its arguments, so for example: abc|def will 
         match either "abc" or "def".&nbsp;
      </P>
      <P>Parenthesis can be used to group alternations, for example: ab(d|ef) will match 
         either of "abd" or "abef".</P>
      <P>Empty&nbsp;alternatives are not allowed (these are almost always a mistake), 
         but if you really want an empty alternative use (?:) as a placeholder, for 
         example:</P>
      <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
         <P>"|abc" is not a valid expression, but<BR>
            "(?:)|abc" is and is equivalent, also the expression:<BR>
            "(?:abc)??" has exactly the same effect.</P>
      </BLOCKQUOTE>
      <H4>Character sets:</H4>
      <P>A character set is a bracket-expression starting with [ and ending with ], it 
         defines a set of characters, and matches any single character that is a member 
         of that set.</P>
      <P>A bracket expression may contain any combination of the following:</P>
      <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
         <H5>Single characters:</H5>
         <P>For example [abc], will match any of the characters 'a', 'b', or 'c'.</P>
         <H5>Character ranges:</H5>
         <P>For example [a-c] will match any single character in the range 'a' to 
            'c'.&nbsp; By default, for POSIX-Perl regular expressions, a character <EM>x</EM>
            is within the range <EM>y</EM> to <EM>z</EM>, if it collates within that 
            range;&nbsp;this results in locale specific behavior.&nbsp; This behavior can 
            be turned off by unsetting the <EM><A href="syntax_option_type.html#Perl">collate</A></EM>
            option flag - in which case whether a character appears within a range is 
            determined by comparing the code points of the characters only</P>
         <H5>Negation:</H5>
         <P>If the bracket-expression begins with the ^ character, then it matches the 
            complement of the characters it contains, for example [^a-c] matches any 
            character that is not in the range a-c.</P>
         <H5>Character classes:</H5>
         <P>An expression of the form [[:name:]] matches the named character class "name", 
            for example [[:lower:]] matches any lower case character.&nbsp; See <A href="character_class_names.html">
               character class names</A>.</P>
         <H5>Collating Elements:</H5>
         <P>An expression of the form [[.col.] matches the collating element <EM>col</EM>.&nbsp; 
            A collating element is any single character, or any sequence of characters that 
            collates as a single unit.&nbsp; Collating elements may also be used as the end 
            point of a range, for example: [[.ae.]-c] matches the character sequence "ae", 
            plus any single character in the range "ae"-c, assuming that "ae" is treated as 
            a single collating element in the current locale.</P>
         <P>As an extension, a collating element may also be specified via it's <A href="collating_names.html">
               symbolic name</A>, for example:</P>
         <P>[[.NUL.]]</P>
         <P>matches a NUL character.</P>
         <H5>Equivalence classes:</H5>
         <P>
            An expression oftheform[[=col=]], matches any character or collating element 
            whose primary sort key is the same as that for collating element <EM>col</EM>, 
            as with colating elements the name <EM>col</EM> may be a <A href="collating_names.html">
               symbolic name</A>.&nbsp; A primary sort key is one that ignores case, 
            accentation, or locale-specific tailorings; so for example [[=a=]] matches any 
            of the characters: a, , , , , , , A, , , , ,  and .&nbsp; 
            Unfortunately implementation of this is reliant on the platform's collation and 
            localisation support; this feature can not be relied upon to work portably 
            across all platforms, or even all locales on one platform.</P>
         <H5>Escapes:</H5>
         <P>All the escape sequences that match a single character, or a single character 
            class are permitted within a character class definition, <EM>except</EM> the 
            negated character classes (\D \W etc).</P>
      </BLOCKQUOTE>
      <H5>Combinations:</H5>
      <P>All of the above can be combined in one character set declaration, for example: 
         [[:digit:]a-c[.NUL.]].</P>
      <H4>Escapes</H4>
      <P>Any special character preceded by an escape shall match itself.
      </P>
      <P>The following escape sequences are also supported:</P>
      <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
         <H5>Escapes matching a specific character</H5>
         <P>The following escape sequences are all synonyms for single characters:</P>
         <P>
            <TABLE id="Table7" cellSpacing="1" cellPadding="1" width="100%" border="1">
               <TR>
                  <TD><STRONG>Escape</STRONG></TD>
                  <TD><STRONG>Character</STRONG></TD>
               </TR>
               <TR>
                  <TD>\a</TD>
                  <TD>'\a'</TD>
               </TR>
               <TR>
                  <TD>\e</TD>
                  <TD>0x1B</TD>
               </TR>
               <TR>
                  <TD>\f</TD>
                  <TD>\f</TD>
               </TR>
               <TR>
                  <TD>\n</TD>
                  <TD>\n</TD>
               </TR>
               <TR>
                  <TD>\r</TD>
                  <TD>\r</TD>
               </TR>
               <TR>
                  <TD>\t</TD>
                  <TD>\t</TD>
               </TR>
               <TR>
                  <TD>\v</TD>
                  <TD>\v</TD>
               </TR>
               <TR>
                  <TD>\b</TD>
                  <TD>\b (but only inside a character class declaration).</TD>
               </TR>
               <TR>
                  <TD>\cX</TD>
                  <TD>An ASCII escape sequence - the character whose code point is X % 32</TD>
               </TR>
               <TR>
                  <TD>\xdd</TD>
                  <TD>A hexadecimal escape sequence - matches the single character whose code point 
                     is 0xdd.</TD>
               </TR>
               <TR>
                  <TD>\x{dddd}</TD>
                  <TD>A hexadecimal escape sequence - matches the single character whose code point 
                     is 0xdddd.</TD>
               </TR>
               <TR>
                  <TD>\0ddd</TD>
                  <TD>An octal escape sequence - matches the single character whose code point is 
                     0ddd.</TD>
               </TR>
               <TR>
                  <TD>\N{name}</TD>
                  <TD>Matches the single character which has the <A href="collating_names.html">symbolic 
                        name</A> <EM>name.&nbsp; </EM>For example \N{newline} matches the single 
                     character \n.</TD>
               </TR>
            </TABLE>
         </P>
         <H5>"Single character" character&nbsp;classes:</H5>
         <P>Any escaped character <EM>x</EM>, if <EM>x</EM> is the name of a character 
            class shall match any character that is a member of that class, and any escaped 
            character <EM>X</EM>, if <EM>x</EM> is the name of a character class, shall 
            match any character not in that class.</P>
         <P>The following are supported by default:</P>
         <P>
            <TABLE id="Table3" cellSpacing="1" cellPadding="1" width="300" border="1">
               <TR>
                  <TD><STRONG>Escape sequence</STRONG></TD>
                  <TD><STRONG>Equivalent to</STRONG></TD>
               </TR>
               <TR>
                  <TD>\d</TD>
                  <TD>[[:digit:]]</TD>
               </TR>
               <TR>
                  <TD>\l</TD>
                  <TD>[[:lower:]]</TD>
               </TR>
               <TR>
                  <TD>\s</TD>
                  <TD>[[:space:]]</TD>
               </TR>
               <TR>
                  <TD>\u</TD>
                  <TD>[[:upper:]]</TD>
               </TR>
               <TR>
                  <TD>\w</TD>
                  <TD>[[:word:]]</TD>
               </TR>
               <TR>
                  <TD>\D</TD>
                  <TD>[^[:digit:]]</TD>
               </TR>
               <TR>
                  <TD>\L</TD>
                  <TD>[^[:lower:]]</TD>
               </TR>
               <TR>
                  <TD>\S</TD>
                  <TD>[^[:space:]]</TD>
               </TR>
               <TR>
                  <TD>\U</TD>
                  <TD>[^[:upper:]]</TD>
               </TR>
               <TR>
                  <TD>\W</TD>
                  <TD>[^[:word:]]</TD>
               </TR>
            </TABLE>
         </P>
         <H5>Character Properties</H5>
         <P>The character property names in the following table are all equivalent to the <A href="character_class_names.html">
               names used in character classes</A>.</P>
         <P>
            <TABLE id="Table9" cellSpacing="1" cellPadding="1" width="100%" border="0">
               <TR>
                  <TD><STRONG>Form</STRONG></TD>
                  <TD><STRONG>Description</STRONG></TD>
                  <TD><STRONG>Equivalent character set form</STRONG></TD>
               </TR>
               <TR>
                  <TD>\pX</TD>
                  <TD>Matches any character that has the property X.</TD>
                  <TD>[[:X:]]</TD>
               </TR>
               <TR>
                  <TD>\p{Name}</TD>
                  <TD>Matches any character that has the property <EM>Name</EM>.</TD>
                  <TD>[[:Name:]]</TD>
               </TR>
               <TR>
                  <TD>\PX</TD>
                  <TD>Matches any character that does not have the property X.</TD>
                  <TD>[^[:X:]]</TD>
               </TR>
               <TR>
                  <TD>\P{Name}</TD>
                  <TD>Matches any character that does not have the property <EM>Name</EM>.</TD>
                  <TD>[^[:Name:]]</TD>
               </TR>
            </TABLE>
         </P>
         <H5>Word Boundaries</H5>
         <P>The following escape sequences match the boundaries of words:</P>
         <P>
            <TABLE id="Table4" cellSpacing="1" cellPadding="1" width="100%" border="1">
               <TR>
                  <TD>\&lt;</TD>
                  <TD>Matches the start of a word.</TD>
               </TR>
               <TR>
                  <TD>\&gt;</TD>
                  <TD>Matches the end of a word.</TD>
               </TR>
               <TR>
                  <TD>\b</TD>
                  <TD>Matches a word boundary (the start or end of a word).</TD>
               </TR>
               <TR>
                  <TD>\B</TD>
                  <TD>Matches only when not at a word boundary.</TD>
               </TR>
            </TABLE>
         </P>
         <H5>Buffer boundaries</H5>
         <P>The following match only at buffer boundaries: a "buffer" in this context is 
            the whole of the input text&nbsp;that is being matched against (note that ^ and 
            $ may match embedded newlines within the text).</P>
         <P>
            <TABLE id="Table5" cellSpacing="1" cellPadding="1" width="100%" border="1">
               <TR>
                  <TD>\`</TD>
                  <TD>Matches at the start of a buffer only.</TD>
               </TR>
               <TR>
                  <TD>\'</TD>
                  <TD>Matches at the end of a buffer only.</TD>
               </TR>
               <TR>
                  <TD>\A</TD>
                  <TD>Matches at the start of a buffer only (the same as \`).</TD>
               </TR>
               <TR>
                  <TD>\z</TD>
                  <TD>Matches at the end of a buffer only (the same as \').</TD>
               </TR>
               <TR>
                  <TD>\Z</TD>
                  <TD>Matches an optional sequence of newlines at the end of a buffer: equivalent to 
                     the regular expression \n*\z</TD>
               </TR>
            </TABLE>
         </P>
         <H5>Continuation Escape</H5>
         <P>The sequence \G matches only at the end of the last match found, or at the 
            start of the text being matched if no previous match was found.&nbsp; This 
            escape useful if you're iterating over the matches contained within a text, and 
            you want each subsequence match to start where the last one ended.</P>
         <H5>Quoting escape</H5>
         <P>The escape sequence \Q begins a "quoted sequence": all the subsequent 
            characters are treated as literals, until either the end of the regular 
            expression or \E is found.&nbsp; For example the expression: \Q\*+\Ea+ would 
            match either of:</P>
         <PRE>\*+a<BR>\*+aaa</PRE>
         <H5>Unicode escapes</H5>
         <P>
            <TABLE id="Table6" cellSpacing="1" cellPadding="1" width="100%" border="1">
               <TR>
                  <TD>\C</TD>
                  <TD>Matches a single code point: in Boost regex this has exactly the same effect 
                     as a "." operator.</TD>
               </TR>
               <TR>
                  <TD>\X</TD>
                  <TD>Matches a combining character sequence: that is any non-combining character 
                     followed by a sequence of zero or more combining characters.</TD>
               </TR>
            </TABLE>
         </P>
         <H5>Any other escape</H5>
         <P>Any other escape sequence matches the character that is escaped, for example \@ 
            matches a literal <A href="mailto:'@'">'@'</A>.</P>
      </BLOCKQUOTE>
      <H4 dir="ltr">Perl Extended Patterns</H4>
      <P dir="ltr">Perl-specific extensions to the regular expression syntax all start 
         with (?.</P>
      <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">
         <H5 dir="ltr">Comments</H5>
         <P dir="ltr">(?# ... ) is treated as a comment, it's contents are ignored.</P>
         <H5 dir="ltr">Modifiers</H5>
         <P dir="ltr">(?imsx-imsx ... ) alters which of the perl modifiers are in effect 
            within the pattern, changes take effect from the point that the block is first 
            seen and extend to any enclosing ).&nbsp; Letters before a '-' turn that perl 
            modifier on, letters afterward, turn it off.</P>
         <P dir="ltr">(?imsx-imsx:pattern) applies the specified modifiers to <EM>pattern</EM>
            only.</P>
         <H5 dir="ltr">Non-marking grouping</H5>
         <P dir="ltr">(?:pattern) lexically groups <EM>pattern</EM>, without generating an 
            additional sub-expression.</P>
         <H5 dir="ltr">Lookahead</H5>
         <P dir="ltr">(?=pattern) consumes zero characters, only if <EM>pattern</EM> matches.</P>
         <P dir="ltr">(?!pattern) consumes zero characters, only if <EM>pattern</EM> does 
            not match.</P>
         <P dir="ltr">Lookahead is typically used to create the logical AND of two regular 
            expressions, for example if a password must contain a lower case letter, an 
            upper case letter, a punctuation symbol, and be at least 6 characters long, 
            then the expression:</P>
         <PRE dir="ltr">(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}</PRE>
         <P dir="ltr">could be used to validate the password.</P>
         <H5 dir="ltr">Lookbehind</H5>
         <P dir="ltr">(?&lt;=pattern) consumes zero characters, only if <EM>pattern</EM> could 
            be matched against the characters preceding the current position (<EM>pattern</EM>
            must be of fixed length).</P>
         <P dir="ltr">(?&lt;!pattern) consumes zero characters, only if <EM>pattern</EM> could 
            not be matched against the characters preceding the current position (<EM>pattern</EM>
            must be of fixed length).</P>
         <H5 dir="ltr">Independent sub-expressions</H5>
         <P dir="ltr">(?&gt;pattern) <EM>pattern</EM> is matched independently of the 
            surrounding patterns, the expression will never backtrack into <EM>pattern</EM>.&nbsp; 
            Independent sub-expressions are typically used to improve performance; only the 
            best possible match for <EM>pattern</EM> will be considered, if this doesn't 
            allow the expression as a whole to match then no match is found at all.</P>
         <H5 dir="ltr">Conditional Expressions</H5>
         <P dir="ltr">(?(condition)yes-pattern|no-pattern) attempts to match <EM>yes-pattern</EM>
            if the <EM>condition </EM>is true, otherwise attempts to match <EM>no-pattern</EM>.</P>
         <P dir="ltr">(?(condition)yes-pattern) attempts to match <EM>yes-pattern</EM> if 
            the <EM>condition </EM>is true, otherwise fails.</P>
         <P dir="ltr"><EM>Condition</EM> may be either a forward lookahead assert, or the 
            index of a marked sub-expression (the condition becomes true if the 
            sub-expression has been matched).</P>
      </BLOCKQUOTE><A name="what">
         <H4>Operator precedence</H4>
         <P>&nbsp;The order of precedence for of operators is as shown in the following 
            table:</P>
         <P>
            <TABLE id="Table2" cellSpacing="1" cellPadding="1" width="100%" border="1">
               <TR>
                  <TD>Collation-related bracket symbols</TD>
                  <TD>[==] [::] [..]</TD>
               </TR>
               <TR>
                  <TD>Escaped characters
                  </TD>
                  <TD>\</TD>
               </TR>
               <TR>
                  <TD>Character set&nbsp;(bracket expression)
                  </TD>
                  <TD>[]</TD>
               </TR>
               <TR>
                  <TD>Grouping</TD>
                  <TD>()</TD>
               </TR>
               <TR>
                  <TD>Single-character-ERE duplication
                  </TD>
                  <TD>* + ? {m,n}</TD>
               </TR>
               <TR>
                  <TD>Concatenation</TD>
                  <TD></TD>
               </TR>
               <TR>
                  <TD>Anchoring</TD>
                  <TD>^$</TD>
               </TR>
               <TR>
                  <TD>Alternation</TD>
                  <TD>|</TD>
               </TR>
            </TABLE>
         </P>
      </A>
      <H3>What gets matched</H3>
      <P>If you view the regular expression as a directed (possibly cyclic) graph, then 
         the best match found is the first match found by a depth-first-search performed 
         on that graph, while matching the input text.</P>
      <P>Alternatively:</P>
      <P>the best match found is the leftmost match, with individual elements matched as 
         follows;</P>
      <P>
         <TABLE id="Table8" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <TD><STRONG>Construct</STRONG></TD>
               <TD><STRONG>What gets matches</STRONG></TD>
            </TR>
            <TR>
               <TD>AtomA AtomB</TD>
               <TD>Locates the best match for AtomA that has a following match for&nbsp;AtomB.</TD>
            </TR>
            <TR>
               <TD>Expression1 | Expression2</TD>
               <TD>If Expresion1 can be matched then returns that match, otherwise attempts to 
                  match Expression2.</TD>
            </TR>
            <TR>
               <TD>S{N}</TD>
               <TD>Matches S repeated exactly N times.</TD>
            </TR>
            <TR>
               <TD>S{N,M}</TD>
               <TD>Matches S repeated between N and M times, and as many times as possible.</TD>
            </TR>
            <TR>
               <TD>S{N,M}?</TD>
               <TD>Matches S repeated between N and M times, and as few times as possible.</TD>
            </TR>
            <TR>
               <TD><!--StartFragment --> S?, S*, S+</TD>
               <TD><!--StartFragment --> The same as <CODE>S{0,1}</CODE>, <CODE>S{0,UINT_MAX}</CODE>,
                  <CODE>S{1,UINT_MAX}</CODE> respectively.
               </TD>
            </TR>
            <TR>
               <TD>S??, S*?, S+?</TD>
               <TD>The same as <CODE>S{0,1}?</CODE>, <CODE>S{0,UINT_MAX}?</CODE>, <CODE>S{1,UINT_MAX}?</CODE>
                  respectively.
               </TD>
            </TR>
            <TR>
               <TD><!--StartFragment --> (?&gt;S)
               </TD>
               <TD>Matches the best match for S, and only that.</TD>
            </TR>
            <TR>
               <TD>
                  (?=S), (?&lt;=S)
               </TD>
               <TD>Matches only the best match for S (this is only visible if there are capturing 
                  parenthesis within S).</TD>
            </TR>
            <TR>
               <TD><!--StartFragment --> (?!S), (?&lt;!S)</TD>
               <TD>Considers only whether a match for S exists or not.</TD>
            </TR>
            <TR>
               <TD><!--StartFragment --> (?(condition)yes-pattern | no-pattern)</TD>
               <TD>If condition is <EM>true</EM>, then only <EM>yes-pattern</EM> is considered, 
                  otherwise only <EM>no-pattern</EM> is considered.</TD>
            </TR>
         </TABLE>
      </P>
      <H3><A name="variations"></A>Variations</H3>
      <P>The options <A href="syntax_option_type.html#perl"><EM>normal, ECMAScript, JavaScript</EM>
            and <EM>JScript</EM></A> are all synonyms for <EM>Perl</EM>.</P>
      <H3><A name="options"></A>Options</H3>
      <P>There are a <A href="syntax_option_type.html#Perl">variety of flags</A> that 
         may be combined with the <EM>Perl</EM> option when constructing the regular 
         expression, in particular note that the <A href="syntax_option_type.html#Perl">newline_alt</A>
         option alters the syntax, while the <A href="syntax_option_type.html#Perl">collate, 
            nosubs&nbsp;and icase</A> options modify how the case and locale sensitivity 
         are to be applied.</P>
      <H3><A name="mods"></A>Modifiers</H3>
      <P>The perl <EM>smix</EM> modifiers can either be applied using a (?smix-smix) 
         prefix to the regular expression, or with one of the regex-compile time flags <EM><A href="syntax_option_type.html#Perl">
               no_mod_m, mod_x, mod_s, and no_mod_s</A></EM>.
      </P>
      <H3><A name="refs">References</H3>
      <P><A href="http://perldoc.perl.org/perlre.html"> Perl 5.8.</A></P>
      <HR>
      <P></P>
      <p>Revised&nbsp; 
         <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> 
         21 Aug 2004&nbsp; 
         <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
      <P><I> Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;2004</I></P>
      <I>
         <P><I>Use, modification and distribution are subject to the Boost Software License, 
               Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
               or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>).</I></P>
      </I>
   </body>
</html>