1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736
|
.TH PCRE2SYNTAX 3 "27 November 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expression patterns that are
supported by PCRE2 are described in the
.\" HREF
\fBpcre2pattern\fP
.\"
documentation. This document contains a quick-reference summary of the pattern
syntax followed by the syntax of replacement strings in substitution function.
The full description of the latter is in the
.\" HREF
\fBpcre2api\fP
.\"
documentation.
.
.SH "QUOTING"
.rs
.sp
\ex where x is non-alphanumeric is a literal x
\eQ...\eE treat enclosed characters as literal
.sp
Note that white space inside \eQ...\eE is always treated as literal, even if
PCRE2_EXTENDED is set, causing most other white space to be ignored. Note also
that PCRE2's handling of \eQ...\eE has some differences from Perl's. See the
.\" HREF
\fBpcre2pattern\fP
.\"
documentation for details.
.
.
.SH "BRACED ITEMS"
.rs
.sp
With one exception, wherever brace characters { and } are required to enclose
data for constructions such as \eg{2} or \ek{name}, space and/or horizontal tab
characters that follow { or precede } are allowed and are ignored. In the case
of quantifiers, they may also appear before or after the comma. The exception
is \eu{...} which is not Perl-compatible and is recognized only when
PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript compatibility feature, and
follows ECMAScript's behaviour.
.
.
.SH "ESCAPED CHARACTERS"
.rs
.sp
This table applies to ASCII and Unicode environments. An unrecognized escape
sequence causes an error.
.sp
\ea alarm, that is, the BEL character (hex 07)
\ecx "control-x", where x is a non-control ASCII character
\ee escape (hex 1B)
\ef form feed (hex 0C)
\en newline (hex 0A)
\er carriage return (hex 0D)
\et tab (hex 09)
\e0dd character with octal code 0dd
\eddd character with octal code ddd, or backreference
\eo{ddd..} character with octal code ddd..
\eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
\exhh character with hex code hh
\ex{hh..} character with hex code hh..
.sp
\eN{U+hh..} is synonymous with \ex{hh..} but is not supported in environments
that use EBCDIC code (mainly IBM mainframes). Note that \eN not followed by an
opening curly bracket has a different meaning (see below).
.P
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
following are also recognized:
.sp
\eU the character "U"
\euhhhh character with hex code hhhh
\eu{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
.sp
When \ex is not followed by {, one or two hexadecimal digits are read,
but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
matches a literal "u".
.P
Note that \e0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
.\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
.\" </a>
"Non-printing characters"
.\"
in the
.\" HREF
\fBpcre2pattern\fP
.\"
documentation, where details of escape processing in EBCDIC environments are
also given.
.
.
.SH "CHARACTER TYPES"
.rs
.sp
. any character except newline;
in dotall mode, any character whatsoever
\eC one code unit, even in UTF mode (best avoided)
\ed a decimal digit
\eD a character that is not a decimal digit
\eh a horizontal white space character
\eH a character that is not a horizontal white space character
\eN a character that is not a newline
\ep{\fIxx\fP} a character with the \fIxx\fP property
\eP{\fIxx\fP} a character without the \fIxx\fP property
\eR a newline sequence
\es a white space character
\eS a character that is not a white space character
\ev a vertical white space character
\eV a character that is not a vertical white space character
\ew a "word" character
\eW a "non-word" character
\eX a Unicode extended grapheme cluster
.sp
\eC is dangerous because it may leave the current matching point in the middle
of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by
setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
with the use of \eC permanently disabled.
.P
By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
happening, \es and \ew may also match characters with code points in the range
128-255. If the PCRE2_UCP option is set, the behaviour of these escape
sequences is changed to use Unicode properties and they match many more
characters, but there are some option settings that can restrict individual
sequences to matching only ASCII characters.
.P
Property descriptions in \ep and \eP are matched caselessly; hyphens,
underscores, and ASCII white space characters are ignored, in accordance with
Unicode's "loose matching" rules. For example, \ep{Bidi_Class=al} is the same
as \ep{ bidi class = AL }.
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
C Other
Cc Control
Cf Format
Cn Unassigned
Co Private use
Cs Surrogate
.sp
L Letter
Lc Cased letter, the union of Ll, Lu, and Lt
L& Synonym of Lc
Ll Lower case letter
Lm Modifier letter
Lo Other letter
Lt Title case letter
Lu Upper case letter
.sp
M Mark
Mc Spacing mark
Me Enclosing mark
Mn Non-spacing mark
.sp
N Number
Nd Decimal number
Nl Letter number
No Other number
.sp
P Punctuation
Pc Connector punctuation
Pd Dash punctuation
Pe Close punctuation
Pf Final punctuation
Pi Initial punctuation
Po Other punctuation
Ps Open punctuation
.sp
S Symbol
Sc Currency symbol
Sk Modifier symbol
Sm Mathematical symbol
So Other symbol
.sp
Z Separator
Zl Line separator
Zp Paragraph separator
Zs Space separator
.sp
From release 10.45, when caseless matching is set, Ll, Lu, and Lt are all
equivalent to Lc.
.
.
.SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
Xan Alphanumeric: union of properties L and N
Xps POSIX space: property Z or tab, NL, VT, FF, CR
Xsp Perl space: property Z or tab, NL, VT, FF, CR
Xuc Universally-named character: one that can be
represented by a Universal Character Name
Xwd Perl word: property Xan or underscore
.sp
Perl and POSIX space are now the same. Perl added VT to its space character set
at release 5.18.
.
.
.SH "BINARY PROPERTIES FOR \ep AND \eP"
.rs
.sp
Unicode defines a number of binary properties, that is, properties whose only
values are true or false. You can obtain a list of those that are recognized by
\ep and \eP, along with their abbreviations, by running this command:
.sp
pcre2test -LP
.
.
.
.SH "SCRIPT MATCHING WITH \ep AND \eP"
.rs
.sp
Many script names and their 4-letter abbreviations are recognized in
\ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of
course). You can obtain a list of these scripts by running this command:
.sp
pcre2test -LS
.
.
.
.SH "THE BIDI_CLASS PROPERTY FOR \ep AND \eP"
.rs
.sp
\ep{Bidi_Class:<class>} matches a character with the given class
\ep{BC:<class>} matches a character with the given class
.sp
The recognized classes are:
.sp
AL Arabic letter
AN Arabic number
B paragraph separator
BN boundary neutral
CS common separator
EN European number
ES European separator
ET European terminator
FSI first strong isolate
L left-to-right
LRE left-to-right embedding
LRI left-to-right isolate
LRO left-to-right override
NSM non-spacing mark
ON other neutral
PDF pop directional format
PDI pop directional isolate
R right-to-left
RLE right-to-left embedding
RLI right-to-left isolate
RLO right-to-left override
S segment separator
WS white space
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
[...] positive character class
[^...] negative character class
[x-y] range (can be used for hex characters)
[[:xxx:]] positive POSIX named set
[[:^xxx:]] negative POSIX named set
.sp
alnum alphanumeric
alpha alphabetic
ascii 0-127
blank space or tab
cntrl control character
digit decimal digit
graph printing, excluding space
lower lower case letter
print printing, including space
punct printing, excluding alphanumeric
space white space
upper upper case letter
word same as \ew
xdigit hexadecimal digit
.sp
In PCRE2, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE2_UCP is set. You can use
\eQ...\eE inside a character class.
.P
When PCRE2_ALT_EXTENDED_CLASS is set, UTS#18 extended character classes may be
used, allowing nested character classes, combined using set operators.
.sp
[x&&[^y]] UTS#18 extended character class
.sp
x||y set union (OR)
x&&y set intersection (AND)
x--y set difference (AND NOT)
x~~y set symmetric difference (XOR)
.sp
.
.
.SH "PERL EXTENDED CHARACTER CLASSES"
.rs
.sp
(?[...]) Perl extended character class
(?[\ep{Thai} & \ep{Nd}]) operators; whitespace ignored
(?[(x - y) & z]) parentheses for grouping
.sp
(?[ [^3] & \ep{Nd} ]) [...] is a nested ordinary class
(?[ [:alpha:] - [z] ]) POSIX set is allowed outside [...]
(?[ \ed - [3] ]) backslash-escaped set is allowed outside [...]
(?[ !\en & [:ascii:] ]) backslash-escaped character is allowed outside [...]
all other characters or ranges must be enclosed in [...]
.sp
x|y, x+y set union (OR)
x&y set intersection (AND)
x-y set difference (AND NOT)
x^y set symmetric difference (XOR)
!x set complement (NOT)
.sp
Inside a Perl extended character class, [...] switches mode to be interpreted
as an ordinary character class. Outside of a nested [...], the only items
permitted are backslash-escapes, POSIX sets, operators, and parentheses. Inside
a nested ordinary class, ^ has its usual meaning (inverts the class when used
as the first character); outside of a nested class, ^ is the XOR operator.
.
.
.SH "QUANTIFIERS"
.rs
.sp
? 0 or 1, greedy
?+ 0 or 1, possessive
?? 0 or 1, lazy
* 0 or more, greedy
*+ 0 or more, possessive
*? 0 or more, lazy
+ 1 or more, greedy
++ 1 or more, possessive
+? 1 or more, lazy
{n} exactly n
{n,m} at least n, no more than m, greedy
{n,m}+ at least n, no more than m, possessive
{n,m}? at least n, no more than m, lazy
{n,} n or more, greedy
{n,}+ n or more, possessive
{n,}? n or more, lazy
{,m} zero up to m, greedy
{,m}+ zero up to m, possessive
{,m}? zero up to m, lazy
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
\eb word boundary
\eB not a word boundary
^ start of subject
also after an internal newline in multiline mode
(after any newline if PCRE2_ALT_CIRCUMFLEX is set)
\eA start of subject
$ end of subject
also before newline at end of subject
also before internal newline in multiline mode
\eZ end of subject
also before newline at end of subject
\ez end of subject
\eG first matching position in subject
.
.
.SH "REPORTED MATCH POINT SETTING"
.rs
.sp
\eK set reported start of match
.sp
From release 10.38 \eK is not permitted by default in lookaround assertions,
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
option is set, the previous behaviour is re-enabled. When this option is set,
\eK is honoured in positive assertions, but ignored in negative ones.
.
.
.SH "ALTERNATION"
.rs
.sp
expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
(...) capture group
(?<name>...) named capture group (Perl)
(?'name'...) named capture group (Perl)
(?P<name>...) named capture group (Python)
(?:...) non-capture group
(?|...) non-capture group; reset group numbers for
capture groups in each alternative
.sp
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
(?>...) atomic non-capture group
(*atomic:...) atomic non-capture group
.
.
.SH "COMMENT"
.rs
.sp
(?#....) comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
Changes of these options within a group are automatically cancelled at the end
of the group.
.sp
(?a) all ASCII options
(?aD) restrict \ed to ASCII in UCP mode
(?aS) restrict \es to ASCII in UCP mode
(?aW) restrict \ew to ASCII in UCP mode
(?aP) restrict all POSIX classes to ASCII in UCP mode
(?aT) restrict POSIX digit classes to ASCII in UCP mode
(?i) caseless
(?J) allow duplicate named groups
(?m) multiline
(?n) no auto capture
(?r) restrict caseless to either ASCII or non-ASCII
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) ignore white space except in classes or \eQ...\eE
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset the given option(s)
(?^) unset imnrsx options
.sp
(?aP) implies (?aT) as well, though this has no additional effect. However, it
means that (?-aP) also implies (?-aT) and disables all ASCII restrictions for
POSIX classes.
.P
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
.P
The following are recognized only at the very start of a pattern or after one
of the newline or \eR sequences or options with similar syntax. More than one
of them may appear. For the first three, d is a decimal number.
.sp
(*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d
(*CASELESS_RESTRICT) set PCRE2_EXTRA_CASELESS_RESTRICT when matching
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
(*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
(*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
(*NO_JIT) disable JIT optimization
(*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
(*TURKISH_CASING) set PCRE2_EXTRA_TURKISH_CASING when matching
(*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
.sp
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of
the limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP,
not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
application can lock out the use of (*UTF) and (*UCP) by setting the
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
.
.
.SH "NEWLINE CONVENTION"
.rs
.sp
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
.sp
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
(*NUL) the NUL character (binary zero)
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after option
setting with a similar syntax.
.sp
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
(?=...) )
(*pla:...) ) positive lookahead
(*positive_lookahead:...) )
.sp
(?!...) )
(*nla:...) ) negative lookahead
(*negative_lookahead:...) )
.sp
(?<=...) )
(*plb:...) ) positive lookbehind
(*positive_lookbehind:...) )
.sp
(?<!...) )
(*nlb:...) ) negative lookbehind
(*negative_lookbehind:...) )
.sp
Each top-level branch of a lookbehind must have a limit for the number of
characters it matches. If any branch can match a variable number of characters,
the maximum for each branch is limited to a value set by the caller of
\fBpcre2_compile()\fP or defaulted. The default is set when PCRE2 is built
(ultimate default 255). If every branch matches a fixed number of characters,
the limit for each branch is 65535 characters.
.
.
.SH "NON-ATOMIC LOOKAROUND ASSERTIONS"
.rs
.sp
These assertions are specific to PCRE2 and are not Perl-compatible.
.sp
(?*...) )
(*napla:...) ) synonyms
(*non_atomic_positive_lookahead:...) )
.sp
(?<*...) )
(*naplb:...) ) synonyms
(*non_atomic_positive_lookbehind:...) )
.
.
.SH "SUBSTRING SCAN ASSERTION"
.rs
This feature is not Perl-compatible.
.sp
(*scan_substring:(grouplist)...) scan captured substring
(*scs:(grouplist)...) scan captured substring
.sp
The comma-separated list may identify groups in any of the following ways:
.sp
n absolute reference
+n relative reference
-n relative reference
<name> name
'name' name
.sp
.
.
.SH "SCRIPT RUNS"
.rs
.sp
(*script_run:...) ) script run, can be backtracked into
(*sr:...) )
.sp
(*atomic_script_run:...) ) atomic script run
(*asr:...) )
.
.
.SH "BACKREFERENCES"
.rs
.sp
\en reference by number (can be ambiguous)
\egn reference by number
\eg{n} reference by number
\eg+n relative reference by number (PCRE2 extension)
\eg-n relative reference by number
\eg{+n} relative reference by number (PCRE2 extension)
\eg{-n} relative reference by number
\ek<name> reference by name (Perl)
\ek'name' reference by name (Perl)
\eg{name} reference by name (Perl)
\ek{name} reference by name (.NET)
(?P=name) reference by name (Python)
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
(?R) recurse whole pattern
(?n) call subroutine by absolute number
(?+n) call subroutine by relative number
(?-n) call subroutine by relative number
(?&name) call subroutine by name (Perl)
(?P>name) call subroutine by name (Python)
\eg<name> call subroutine by name (Oniguruma)
\eg'name' call subroutine by name (Oniguruma)
\eg<n> call subroutine by absolute number (Oniguruma)
\eg'n' call subroutine by absolute number (Oniguruma)
\eg<+n> call subroutine by relative number (PCRE2 extension)
\eg'+n' call subroutine by relative number (PCRE2 extension)
\eg<-n> call subroutine by relative number (PCRE2 extension)
\eg'-n' call subroutine by relative number (PCRE2 extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
(?(condition)yes-pattern)
(?(condition)yes-pattern|no-pattern)
.sp
(?(n) absolute reference condition
(?(+n) relative reference condition (PCRE2 extension)
(?(-n) relative reference condition (PCRE2 extension)
(?(<name>) named reference condition (Perl)
(?('name') named reference condition (Perl)
(?(name) named reference condition (PCRE2, deprecated)
(?(R) overall recursion condition
(?(Rn) specific numbered group recursion condition
(?(R&name) specific named group recursion condition
(?(DEFINE) define groups for reference
(?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition
.sp
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if the relevant named group exists.
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act immediately they
are reached:
.sp
(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
(*MARK:NAME) set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
.sp
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
.
.
.SH "CALLOUTS"
.rs
.sp
(?C) callout (assumed number 0)
(?Cn) callout with numerical data n
(?C"text") callout with string data
.sp
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
.
.
.SH "REPLACEMENT STRINGS"
.rs
.sp
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
\fBpcre2_substitute()\fP is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following forms:
.sp
$$ insert a dollar character
$n or ${n} insert the contents of group \fIn\fP
$<name> insert the contents of named group
$0 or $& insert the entire matched substring
$` insert the substring that precedes the match
$' insert the substring that follows the match
$_ insert the entire input string
$*MARK or ${*MARK} insert a control verb name
.sp
For ${n}, n can be a name or a number. If PCRE2_SUBSTITUTE_EXTENDED is set,
there is additional interpretation:
.P
1. Backslash is an escape character, and the forms described in "ESCAPED
CHARACTERS" above are recognized. Also:
.sp
\eQ...\eE can be used to suppress interpretation
\el force the next character to lower case
\eu force the next character to upper case
\eL force subsequent characters to lower case
\eU force subsequent characters to upper case
\eu\eL force next character to upper case, then all lower
\el\eU force next character to lower case, then all upper
\eE end \eL or \eU case forcing
\eb backspace character (note: as in character class in pattern)
\ev vertical tab character (note: not the same as in a pattern)
.sp
2. The Python form \eg<n>, where the angle brackets are part of the syntax and
\fIn\fP is either a group name or a number, is recognized as an alternative way
of inserting the contents of a group, for example \eg<3>.
.P
3. Capture substitution supports the following additional forms:
.sp
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
.sp
The substitution strings themselves are expanded. Backslash can be used to
escape colons and closing curly brackets.
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
\fBpcre2matching\fP(3), \fBpcre2\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
Retired from University Computing Service
Cambridge, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 27 November 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi
|