<h1>TRE API reference manual</h1>
<h2>The <tt>regcomp()</tt> functions</h2>
<a name="regcomp"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font>
<font class="func">regcomp</font>(<font
class="type">regex_t</font> *<font class="arg">preg</font>,
<font class="qual">const</font> <font class="type">char</font>
*<font class="arg">regex</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">char</font> *<font class="arg">regex</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<font class="type">int</font> <font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwcomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">int</font> <font
class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">size_t</font>
<font class="arg">len</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">void</font> <font
class="func">regfree</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>);
<br>
</code>
</div>
<p>
The <tt><font class="func">regcomp</font>()</tt> function compiles
the regex string pointed to by <tt><font
class="arg">regex</font></tt> to an internal representation and
stores the result in the pattern buffer structure pointed to by
<tt><font class="arg">preg</font></tt>. The <tt><font
class="func">regncomp</font>()</tt> function is like <tt><font
class="func">regcomp</font>()</tt>, but <tt><font
class="arg">regex</font></tt> is not terminated with the null
byte. Instead, the <tt><font class="arg">len</font></tt> argument
is used to give the length of the string, and the string may contain
null bytes. The <tt><font class="func">regwcomp</font>()</tt> and
<tt><font class="func">regwncomp</font>()</tt> functions work like
<tt><font class="func">regcomp</font>()</tt> and <tt><font
class="func">regncomp</font>()</tt>, respectively, but take a wide
character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string.
</p>
<p>
The <tt><font class="arg">cflags</font></tt> argument is the
bitwise inclusive OR of zero or more of the following flags (defined
in the header <tt>&lt;tre/regex.h&gt;</tt>):
</p>
<blockquote>
<dl>
<dt><tt>REG_EXTENDED</tt></dt>
<dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
compiling <tt><font class="arg">regex</font></tt>. The default
syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is
considered obsolete.</dd>
<dt><tt>REG_ICASE</tt></dt>
<dd>Ignore case. Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions using this
pattern buffer will be case insensitive.</dd>
<dt><tt>REG_NOSUB</tt></dt>
<dd>Do not report submatches. Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions will only
report whether a match was found or not and will not fill the submatch
array.</dd>
<dt><tt>REG_NEWLINE</tt></dt>
<dd>Normally the newline character is treated as an ordinary
character. When this flag is used, the newline character
(<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
<ol>
<li>The match-any-character operator (dot <tt>"."</tt> outside a
bracket expression) does not match a newline.</li>
<li>A non-matching list (<tt>[^...]</tt>) not containing a newline
does not match a newline.</li>
<li>The match-beginning-of-line operator <tt>^</tt> matches the empty
string immediately after a newline as well as the empty string at the
beginning of the string (but see the <code>REG_NOTBOL</code>
<code>regexec()</code> flag below).</li>
<li>The match-end-of-line operator <tt>$</tt> matches the empty
string immediately before a newline as well as the empty string at the
end of the string (but see the <code>REG_NOTEOL</code>
<code>regexec()</code> flag below).</li>
</ol>
</dd>
<dt><tt>REG_LITERAL</tt></dt>
<dd>Interpret the entire <tt><font class="arg">regex</font></tt>
argument as a literal string, that is, all characters will be
considered ordinary. This is a nonstandard extension, compatible with
but not specified by POSIX.</dd>
<dt><tt>REG_NOSPEC</tt></dt>
<dd>Same as <tt>REG_LITERAL</tt>. This flag is provided for
compatibility with BSD.</dd>
<dt><tt>REG_RIGHT_ASSOC</tt></dt>
<dd>By default, concatenation is left associative in TRE, as per
the grammar given in the <a
href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
This flag flips associativity of concatenation to right associative.
Associativity can have an effect on how a match is divided into
submatches, but does not change what is matched by the entire regexp.
</dd>
<dt><tt>REG_UNGREEDY</tt></dt>
<dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
</dl>
</blockquote>
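<p>
The effect of <tt>REG_NEWLINE</tt> on the anchors and on the dot operator
can be seen with a small helper. This is a minimal sketch, not part of
TRE: the function name <tt>line_match</tt> is hypothetical, and it
includes the system <tt>regex.h</tt> so that it builds without TRE
installed. It uses only the POSIX calls described in this manual, so
including the TRE header instead should behave the same way.
</p>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: with TRE installed, use <tre/regex.h> */

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `text`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int line_match(const char *pattern, const char *text, int use_newline)
{
    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB;   /* only a yes/no answer is needed */
    int rc;

    if (use_newline)
        cflags |= REG_NEWLINE;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
With the flag set, <tt>^b$</tt> finds the middle line of a three-line
string, and <tt>a.b</tt> no longer matches across a line break.
</p>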
<p>
After a successful call to <tt><font class="func">regcomp</font></tt>, the
<tt><font class="arg">preg</font></tt> pattern buffer can be used to
search for matches in strings (see below). Once the pattern buffer is no
longer needed, it should be passed to <tt><font
class="func">regfree</font></tt> to release the memory allocated for it.
</p>
<p>
The <tt><font class="type">regex_t</font></tt> structure has the
following fields that the application can read:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">size_t</font> <font
class="arg">re_nsub</font></tt></dt>
<dd>Number of parenthesized subexpressions in <tt><font
class="arg">regex</font></tt>.
</dd>
</dl>
</blockquote>
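<p>
Reading <tt>re_nsub</tt> after compilation is a convenient way to size
the submatch array. The following sketch uses a hypothetical helper name
(<tt>count_groups</tt>) and the system <tt>regex.h</tt> as a stand-in for
the TRE header; both are assumptions made so the example is
self-contained.
</p>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: with TRE installed, use <tre/regex.h> */

/* Hypothetical helper: returns the number of parenthesized subexpressions
 * in `pattern`, or (size_t)-1 if the pattern does not compile. */
size_t count_groups(const char *pattern)
{
    regex_t re;
    size_t n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (size_t)-1;
    n = re.re_nsub;      /* read the field before the buffer is freed */
    regfree(&re);
    return n;
}
```

<p>
Under POSIX, element 0 of the submatch array describes the whole match,
so an array of <tt>re_nsub + 1</tt> elements holds every submatch.
</p>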
<p>
The <tt><font class="func">regcomp</font></tt> function returns
zero if the compilation was successful, or one of the following error
codes if there was an error:
</p>
<blockquote>
<dl>
<dt><tt>REG_BADPAT</tt></dt>
<dd>Invalid regexp. TRE returns this only if a multibyte character
set is used in the current locale, and <tt><font
class="arg">regex</font></tt> contained an invalid multibyte
sequence.</dd>
<dt><tt>REG_ECOLLATE</tt></dt>
<dd>Invalid collating element referenced. TRE returns this whenever
equivalence classes or multicharacter collating elements are used in
bracket expressions (they are not supported yet).</dd>
<dt><tt>REG_ECTYPE</tt></dt>
<dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
<dt><tt>REG_EESCAPE</tt></dt>
<dd>The last character of <tt><font class="arg">regex</font></tt>
was a backslash (<tt>\</tt>).</dd>
<dt><tt>REG_ESUBREG</tt></dt>
<dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
invalid.</dd>
<dt><tt>REG_EBRACK</tt></dt>
<dd><tt>[]</tt> imbalance.</dd>
<dt><tt>REG_EPAREN</tt></dt>
<dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
<dt><tt>REG_EBRACE</tt></dt>
<dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
<dt><tt>REG_BADBR</tt></dt>
<dd><tt>{}</tt> content invalid: not a number, more than two numbers,
or first larger than second.</dd>
<dt><tt>REG_ERANGE</tt></dt>
<dd>Invalid character range, e.g. ending point is earlier in the
collating order than the starting point.</dd>
<dt><tt>REG_ESPACE</tt></dt>
<dd>Out of memory, or an internal limit exceeded.</dd>
<dt><tt>REG_BADRPT</tt></dt>
<dd>Invalid use of repetition operators: two or more repetition operators have
been chained in an undefined way.</dd>
<dt><tt>REG_BADMAX</tt></dt>
<dd>Maximum repetition in <tt>{}</tt> too large.</dd>
</dl>
</blockquote>
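<p>
A caller will usually branch on the value returned by
<tt>regcomp()</tt>. The sketch below shows one way to do that; the helper
names <tt>compile_status</tt> and <tt>status_label</tt> are hypothetical,
only a few of the codes listed above are handled, and the system
<tt>regex.h</tt> is included as a stand-in for the TRE header. Which code
a given malformed pattern produces may differ between implementations.
</p>

```c
#include <regex.h>   /* assumption: with TRE installed, use <tre/regex.h> */

/* Hypothetical helper: compiles `pattern` only to learn the status.
 * Returns 0 on success, otherwise the regcomp() error code. */
int compile_status(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0)
        regfree(&re);    /* free only a successfully compiled buffer */
    return rc;
}

/* Hypothetical helper: maps a few of the error codes to short labels. */
const char *status_label(int rc)
{
    switch (rc) {
    case 0:           return "ok";
    case REG_EPAREN:  return "unbalanced parentheses";
    case REG_EBRACK:  return "unbalanced brackets";
    case REG_EBRACE:  return "unbalanced braces";
    case REG_ESPACE:  return "out of memory";
    default:          return "other error";
    }
}
```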
<h2>The <tt>regexec()</tt> functions</h2>
<a name="regexec"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">regexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
</code>
</div>
<p>
The <tt><font class="func">regexec</font>()</tt> function matches
the null-terminated string against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions. The
<tt><font class="func">regnexec</font>()</tt> function is like
<tt><font class="func">regexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated with a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used
to give the length of the string, and the string may contain null
bytes. The <tt><font class="func">regwexec</font>()</tt> and
<tt><font class="func">regwnexec</font>()</tt> functions work like
<tt><font class="func">regexec</font>()</tt> and <tt><font
class="func">regnexec</font>()</tt>, respectively, but take a wide
character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string. The <tt><font
class="arg">eflags</font></tt> argument is a bitwise OR of zero or
more of the following flags:
</p>
<blockquote>
<dl>
<dt><code>REG_NOTBOL</code></dt>
<dd>
<p>
When this flag is used, the match-beginning-of-line operator
<tt>^</tt> does not match the empty string at the beginning of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt> the empty string
immediately after a newline character will still be matched.
</p>
</dd>
<dt><code>REG_NOTEOL</code></dt>
<dd>
<p>
When this flag is used, the match-end-of-line operator
<tt>$</tt> does not match the empty string at the end of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt> the empty string
immediately before a newline character will still be matched.
</p>
</dd>
</dl>
<p>
These flags are useful when different portions of a string are passed
to <code>regexec</code> and the beginning or end of the partial string
should not be interpreted as the beginning or end of a line.
</p>
</blockquote>
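<p>
The following sketch illustrates the situation described above: a string
is searched in successive portions, and every portion after the first is
passed with <tt>REG_NOTBOL</tt> so that <tt>^</tt> cannot match in the
middle of the original string. The function name <tt>count_matches</tt>
is hypothetical, and the system <tt>regex.h</tt> stands in for the TRE
header.
</p>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: with TRE installed, use <tre/regex.h> */

/* Hypothetical helper: counts successive matches of `pattern` in `text`.
 * Returns 0 if the pattern does not compile. */
size_t count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    size_t count = 0;
    const char *p = text;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > 0)
            p += m.rm_eo;        /* continue after the end of this match */
        else if (*p != '\0')
            p++;                 /* empty match at offset 0: step one char */
        else
            break;               /* empty match at the end of the string */
        eflags = REG_NOTBOL;     /* p no longer points at a line start */
    }
    regfree(&re);
    return count;
}
```

<p>
For the string <tt>aaa</tt>, the pattern <tt>a</tt> is counted three
times, while <tt>^a</tt> is counted once because the later portions are
searched with <tt>REG_NOTBOL</tt>.
</p>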
<p>
If <code>REG_NOSUB</code> was used when compiling <tt><font
class="arg">preg</font></tt>, <tt><font
class="arg">nmatch</font></tt> is zero, or <tt><font
class="arg">pmatch</font></tt> is <code>NULL</code>, then the
<tt><font class="arg">pmatch</font></tt> argument is ignored.
Otherwise, the submatches corresponding to the parenthesized
subexpressions are filled in the elements of <tt><font
class="arg">pmatch</font></tt>, which must be dimensioned to have
at least <tt><font class="arg">nmatch</font></tt> elements.
</p>
<p>
The <tt><font class="type">regmatch_t</font></tt> structure contains
at least the following fields:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_so</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
substring. </dd>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_eo</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
character after the substring. </dd>
</dl>
</blockquote>
<p>
The length of a submatch can be computed by subtracting <code>rm_so</code> from
<code>rm_eo</code>. If a parenthesized subexpression did not participate in a
match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
corresponding <code>pmatch</code> element are set to <code>-1</code>. Note
that when a multibyte character set is in effect, the submatch offsets are
given as byte offsets, not character offsets.
</p>
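<p>
As an illustration of the offsets, the sketch below copies the text of
one parenthesized group out of the searched string. The helper name
<tt>copy_group</tt> and the fixed limit of ten submatches are assumptions
made for the example, and the system <tt>regex.h</tt> stands in for the
TRE header.
</p>

```c
#include <stddef.h>
#include <string.h>
#include <regex.h>   /* assumption: with TRE installed, use <tre/regex.h> */

/* Hypothetical helper: copies the text matched by group number `group`
 * into `out` (capacity `outsz`, always null-terminated on success).
 * Returns the number of bytes copied, or -1 if there is no match, the
 * group did not participate in the match, or an argument is invalid. */
int copy_group(const char *pattern, const char *text, size_t group,
               char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[10];
    int len = -1;

    if (group >= 10 || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 10, pm, 0) == 0 && pm[group].rm_so != -1) {
        len = (int)(pm[group].rm_eo - pm[group].rm_so);  /* submatch length */
        if ((size_t)len >= outsz)
            len = (int)outsz - 1;                        /* truncate to fit */
        memcpy(out, text + pm[group].rm_so, (size_t)len);
        out[len] = '\0';
    }
    regfree(&re);
    return len;
}
```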
<p>
The <code>regexec()</code> functions return zero if a match was found,
otherwise they return <code>REG_NOMATCH</code> to indicate no match,
or <code>REG_ESPACE</code> to indicate that enough temporary memory
could not be allocated to complete the matching operation.
</p>
<h3>reguexec()</h3>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {
<br>
<font class="type">int</font> (*get_next_char)(<font
class="type">tre_char_t</font> *<font class="arg">c</font>, <font
class="type">unsigned int</font> *<font class="arg">pos_add</font>,
<font class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> (*rewind)(<font
class="type">size_t</font> <font class="arg">pos</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">int</font> (*compare)(<font
class="type">size_t</font> <font class="arg">pos1</font>, <font
class="type">size_t</font> <font class="arg">pos2</font>, <font
class="type">size_t</font> <font class="arg">len</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> *<font
class="arg">context</font>;
<br>
} <font class="type">tre_str_source</font>;
<br>
<br>
<font class="type">int</font> <font
class="func">reguexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">tre_str_source</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
</code>
</div>
<p>
The <tt><font class="func">reguexec</font>()</tt> function works just
like the other <tt>regexec()</tt> functions, except that the input
string is read through user-specified callback functions instead of
from a character array. This makes it possible, for example, to match
regexps over arbitrary user-defined data structures.
</p>
<p>
The <tt><font class="type">tre_str_source</font></tt> structure
contains the following fields:
</p>
<blockquote>
<dl>
<dt><tt>get_next_char</tt></dt>
<dd>This function must retrieve the next available character. If a
character is not available, the space pointed to by
<tt><font class="arg">c</font></tt> must be set to zero and the function
must return a nonzero value. If a character is available, it must be stored
in the space pointed to by
<tt><font class="arg">c</font></tt>, the integer pointed to by
<tt><font class="arg">pos_add</font></tt> must be set to the
number of units advanced in the input (the value must be
<tt>&gt;=1</tt>), and zero must be returned.</dd>
<dt><tt>rewind</tt></dt>
<dd>This function must rewind the input stream to the position
specified by <tt><font class="arg">pos</font></tt>. Unless the regexp
uses back references, <tt>rewind</tt> is not needed and can be set to
<tt>NULL</tt>.</dd>
<dt><tt>compare</tt></dt>
<dd>This function compares two substrings in the input stream
starting at the positions specified by <tt><font
class="arg">pos1</font></tt> and <tt><font
class="arg">pos2</font></tt> of length <tt><font
class="arg">len</font></tt>. If the substrings are equal,
<tt>compare</tt> must return zero, otherwise a nonzero value must be
returned. Unless the regexp uses back references, <tt>compare</tt> is
not needed and can be set to <tt>NULL</tt>.</dd>
<dt><tt>context</tt></dt>
<dd>This is a context variable, passed as the last argument to
all of the above functions for keeping track of the internal state of
the user's code.</dd>
</dl>
</blockquote>
<p>
The position in the input stream is measured in <tt><font
class="type">size_t</font></tt> units. The current position is the
sum of the increments reported through <tt><font
class="arg">pos_add</font></tt> (plus the position of the last
<tt>rewind</tt>, if any). The starting position is zero. Submatch
positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
array are given as positions computed in this way.
</p>
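<p>
The three callbacks can be sketched for the simplest possible source, a
byte buffer with a read cursor. Everything named <tt>buf_*</tt> here is
hypothetical. So that the sketch compiles without TRE, it also declares
its own stand-in for <tt>tre_char_t</tt>; that typedef is an assumption
and must be removed when the real header is included.
</p>

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Stand-in so this sketch compiles without TRE (an assumption); remove it
 * when including <tre/regex.h>, which supplies the real tre_char_t. */
typedef wchar_t tre_char_t;

/* Hypothetical context: a byte buffer plus the current read position. */
typedef struct {
    const char *buf;
    size_t len;
    size_t pos;
} buf_source;

/* get_next_char: deliver one byte and report one unit of progress. */
int buf_get_next_char(tre_char_t *c, unsigned int *pos_add, void *context)
{
    buf_source *src = context;

    if (src->pos >= src->len) {
        *c = 0;
        return -1;               /* nonzero: no character available */
    }
    *c = (unsigned char)src->buf[src->pos++];
    *pos_add = 1;
    return 0;
}

/* rewind: move the read position back (or forward) to `pos`. */
void buf_rewind(size_t pos, void *context)
{
    buf_source *src = context;
    src->pos = pos;
}

/* compare: zero if the two substrings of length `len` are equal. */
int buf_compare(size_t pos1, size_t pos2, size_t len, void *context)
{
    const buf_source *src = context;
    return memcmp(src->buf + pos1, src->buf + pos2, len) != 0;
}
```

<p>
To use callbacks like these, store them in the <tt>get_next_char</tt>,
<tt>rewind</tt>, and <tt>compare</tt> fields of a
<tt>tre_str_source</tt>, point <tt>context</tt> at the
<tt>buf_source</tt>, and pass the structure to <tt>reguexec()</tt>.
</p>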
<p>
For an example of how to use <tt>reguexec()</tt>, see the
<tt>tests/test-str-source.c</tt> file in the TRE source code
distribution.
</p>
<h2>The approximate matching functions</h2>
<a name="regaexec"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">int</font>
<font class="arg">cost_ins</font>;<br>
<font class="type">int</font>
<font class="arg">cost_del</font>;<br>
<font class="type">int</font>
<font class="arg">cost_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_cost</font>;<br><br>
<font class="type">int</font>
<font class="arg">max_ins</font>;<br>
<font class="type">int</font>
<font class="arg">max_del</font>;<br>
<font class="type">int</font>
<font class="arg">max_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_err</font>;<br>
} <font class="type">regaparams_t</font>;<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">size_t</font>
<font class="arg">nmatch</font>;<br>
<font class="type">regmatch_t</font>
*<font class="arg">pmatch</font>;<br>
<font class="type">int</font>
<font class="arg">cost</font>;<br>
<font class="type">int</font>
<font class="arg">num_ins</font>;<br>
<font class="type">int</font>
<font class="arg">num_del</font>;<br>
<font class="type">int</font>
<font class="arg">num_subst</font>;<br>
} <font class="type">regamatch_t</font>;<br>
<br>
<font class="type">int</font> <font
class="func">regaexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">reganexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regawexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font>
<font class="func">regawnexec</font>(
<font class="qual">const</font>
<font class="type">regex_t</font>
*<font class="arg">preg</font>,
<font class="qual">const</font>
<font class="type">wchar_t</font>
*<font class="arg">string</font>,
<font class="type">size_t</font>
<font class="arg">len</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
</code>
</div>
<p>
The <tt><font class="func">regaexec</font>()</tt> function searches for
the best match in <tt><font class="arg">string</font></tt>
against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
</p>
<p>
The <tt><font class="func">reganexec</font>()</tt> function is like
<tt><font class="func">regaexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated by a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used to
tell the length of the string, and the string may contain null
bytes. The <tt><font class="func">regawexec</font>()</tt> and
<tt><font class="func">regawnexec</font>()</tt> functions work like
<tt><font class="func">regaexec</font>()</tt> and <tt><font
class="func">reganexec</font>()</tt>, respectively, but take a wide
character (<tt><font class="type">wchar_t</font></tt>) string instead
of a byte string.
</p>
<p>
The <tt><font class="arg">eflags</font></tt> argument has the same
meaning as for the <tt>regexec()</tt> functions.
</p>
<p>
The <tt><font class="arg">params</font></tt> struct controls the
approximate matching parameters:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_ins</font></tt></dt>
<dd>The default cost of an inserted character, that is, an extra
character in <tt><font class="arg">string</font></tt>.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_del</font></tt></dt>
<dd>The default cost of a deleted character, that is, a character
missing from <tt><font class="arg">string</font></tt>.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_subst</font></tt></dt>
<dd>The default cost of a substituted character.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_cost</font></tt></dt>
<dd>The maximum allowed cost of a match. If this is set to zero,
an exact matching is searched for, and results equivalent to
those returned by the <tt>regexec()</tt> functions are
returned.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_ins</font></tt></dt>
<dd>Maximum allowed number of inserted characters.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_del</font></tt></dt>
<dd>Maximum allowed number of deleted characters.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_subst</font></tt></dt>
<dd>Maximum allowed number of substituted characters.</dd>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_err</font></tt></dt>
<dd>Maximum allowed number of errors (inserts + deletes +
substitutes).</dd>
</dl>
</blockquote>
<p>
The <tt><font class="arg">match</font></tt> argument points to a
<tt><font class="type">regamatch_t</font></tt> structure. The
<tt><font class="arg">nmatch</font></tt> and <tt><font
class="arg">pmatch</font></tt> fields must be filled in by the caller. If
<code>REG_NOSUB</code> was used when compiling the regexp, or
<code>match->nmatch</code> is zero, or
<code>match->pmatch</code> is <code>NULL</code>, the
<code>match->pmatch</code> field is ignored. Otherwise, the
submatches corresponding to the parenthesized subexpressions are
filled in the elements of <code>match->pmatch</code>, which must be
dimensioned to have at least <code>match->nmatch</code> elements.
The <code>match->cost</code> field is set to the cost of the match
found, and the <code>match->num_ins</code>,
<code>match->num_del</code>, and <code>match->num_subst</code>
fields are set to the number of inserts, deletes, and substitutes in
the match, respectively.
</p>
<p>
The <tt>regaexec()</tt> functions return zero if a match with cost
no greater than <code>params.max_cost</code> was found, otherwise
they return <code>REG_NOMATCH</code> to indicate no match, or
<code>REG_ESPACE</code> to indicate that enough temporary memory could
not be allocated to complete the matching operation.
</p>
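<p>
The way the cost and limit parameters interact can be worked through
with plain arithmetic. The sketch below does not call TRE. It assumes
that the cost of a match is the weighted sum of its insert, delete, and
substitute counts, which the parameter descriptions imply but do not
spell out, and that every limit is inclusive. The type
<tt>approx_limits</tt> and both function names are hypothetical; the
structure only mirrors the fields of <tt>regaparams_t</tt>.
</p>

```c
/* Hypothetical mirror of the regaparams_t fields, so that the arithmetic
 * can be shown without the TRE header. */
typedef struct {
    int cost_ins, cost_del, cost_subst, max_cost;
    int max_ins, max_del, max_subst, max_err;
} approx_limits;

/* Assumed cost model: each kind of error contributes its count times
 * its per-error cost. */
int approx_cost(const approx_limits *p, int ins, int del, int subst)
{
    return ins * p->cost_ins + del * p->cost_del + subst * p->cost_subst;
}

/* Returns 1 if a match with the given error counts stays within every
 * limit, 0 otherwise. */
int approx_within_limits(const approx_limits *p, int ins, int del, int subst)
{
    return approx_cost(p, ins, del, subst) <= p->max_cost
        && ins <= p->max_ins
        && del <= p->max_del
        && subst <= p->max_subst
        && ins + del + subst <= p->max_err;
}
```

<p>
For example, with costs of 1, 1, and 2 for inserts, deletes, and
substitutes and <tt>max_cost</tt> set to 3, one insert plus one
substitute costs 3 and is accepted, while two substitutes cost 4 and are
rejected.
</p>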
<h2>Miscellaneous</h2>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">tre_have_backrefs</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
<font class="type">int</font> <font
class="func">tre_have_approx</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
</code>
</div>
<p>
The <tt><font class="func">tre_have_backrefs</font>()</tt> and
<tt><font class="func">tre_have_approx</font>()</tt> functions return
1 if the compiled pattern has back references or uses approximate
matching, respectively, and 0 if not.
</p>
<h2>Checking build time options</h2>
<a name="tre_config"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">char</font> *<font
class="func">tre_version</font>(<font class="type">void</font>);
<br>
<font class="type">int</font> <font
class="func">tre_config</font>(<font class="type">int</font> <font
class="arg">query</font>, <font class="type">void</font> *<font
class="arg">result</font>);
<br>
</code>
</div>
<p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to find out which optional features have been compiled into the
TRE library, and to retrieve other parameters that may change between
releases.
</p>
<p>
The <tt><font class="arg">query</font></tt> argument is an integer
that specifies which information is requested. The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned. The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, <tt>REG_NOMATCH</tt> otherwise.
</p>
<p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>
<blockquote>
<dl>
<dt><tt>TRE_CONFIG_APPROX</tt></dt>
<dd>The result is an integer that is set to one if approximate
matching support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_WCHAR</tt></dt>
<dd>The result is an integer that is set to one if wide character
support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
<dd>The result is an integer that is set to one if multibyte character
set support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
<dd>The result is an integer that is set to one if TRE has been
compiled to be compatible with the system regex ABI, zero if not.</dd>
<dt><tt>TRE_CONFIG_VERSION</tt></dt>
<dd>The result is a pointer to a static character string that gives
the version of the TRE library.</dd>
</dl>
</blockquote>
<p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human readable character string which shows the software name,
version, and license.
</p>
<h2>Preprocessor definitions</h2>
<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>
<h3>Version information</h3>
<p>The following definitions may be useful for checking whether a new
enough version is being used. Note that it is recommended to use the
<tt>pkg-config</tt> tool for version and other checks in Autoconf
scripts.</p>
<blockquote>
<dl>
<dt><tt>TRE_VERSION</tt></dt>
<dd>The version string. </dd>
<dt><tt>TRE_VERSION_1</tt></dt>
<dd>The major version number (first part of version string).</dd>
<dt><tt>TRE_VERSION_2</tt></dt>
<dd>The minor version number (second part of version string).</dd>
<dt><tt>TRE_VERSION_3</tt></dt>
<dd>The micro version number (third part of version string).</dd>
</dl>
</blockquote>
<h3>Features</h3>
<p>The following definitions may be useful for checking whether all
necessary features are enabled. Use these only if compile time
checking suffices (linking statically with TRE). When linking
dynamically <a href="#tre_config"><tt>tre_config()</tt></a> should be used
instead.</p>
<blockquote>
<dl>
<dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled. The
prototypes for approximate matching functions are defined only if
<tt>TRE_APPROX</tt> is defined.</dd>
<dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide character support is enabled. The
prototypes for wide character matching functions are defined only if
<tt>TRE_WCHAR</tt> is defined.</dd>
<dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not defined, any locale settings are ignored, and the default
locale is used when parsing regexps and matching strings.</dd>
</dl>
</blockquote>