File: pattern.differences.xml

<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision: 297028 $ -->
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
<article xml:id="reference.pcre.pattern.differences" xmlns="http://docbook.org/ns/docbook">
 <title>Perl Differences</title>
 <titleabbrev>Differences From Perl</titleabbrev>
 <para>
  The differences described here are with respect to Perl 5.005.
  <orderedlist>
   <listitem>
    <simpara>
     By default, a whitespace character is any character that
     the C library function isspace() recognizes, though it is
     possible to compile PCRE with alternative character type
     tables. Normally isspace() matches space, formfeed, newline,
     carriage return, horizontal tab, and vertical tab. Perl 5 no
     longer includes vertical tab in its set of whitespace characters.
     The \v escape that was in the Perl documentation for
     a long time was never in fact recognized. However, the character
     itself was treated as whitespace at least up to 5.002.
     In 5.004 and 5.005 it does not match \s.
    </simpara>
   </listitem>
   <listitem>
    <simpara>
     PCRE does not allow repeat quantifiers on lookahead
     assertions. Perl permits them, but they do not mean what you
     might think. For example, (?!a){3} does not assert that the
     next three characters are not "a". It merely asserts three
     times that the next character is not "a".
    </simpara>
   </listitem>
   <listitem>
    <simpara>
     Capturing subpatterns that occur inside negative
     lookahead assertions are counted, but their entries in the
     offsets vector are never set. Perl sets its numerical
     variables from any such patterns that are matched before the
     assertion fails to match something (thereby succeeding), but
     only if the negative lookahead assertion contains just one
     branch.
    </simpara>
   </listitem>
   <listitem>
    <simpara>
     Though binary zero characters are supported in the subject string,
     they are not allowed in a pattern string because it is passed as a
     normal C string, terminated by a zero byte. The escape sequence
     "\x00" can be used in the pattern to represent a binary zero.
    </simpara>
    </listitem>
    <listitem>
    <simpara>
     The following Perl escape sequences are not supported:
     \l, \u, \L, \U. In fact these are implemented by
     Perl's general string-handling and are not part of its
     pattern matching engine.
    </simpara>
    </listitem>
    <listitem>
    <simpara>
     The Perl \G assertion is not supported as it is not
     relevant to single pattern matches.
    </simpara>
    </listitem>
    <listitem>
    <simpara>
     Fairly obviously, PCRE does not support the (?{code}) and (??{code})
     constructions. However, there is support for recursive patterns.
    </simpara>
    </listitem>
    <listitem>
    <simpara>
     At the time of writing there are some oddities in Perl
     5.005_02 concerned with the settings of captured strings
     when part of a pattern is repeated. For example, matching
     "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
     "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
     unset. However, if the pattern is changed to
     /^(aa(b(b))?)+$/ then $2 (and $3) get set.
     In Perl 5.004 $2 is set in both cases, and that is also true
     of PCRE. If in the future Perl changes to a consistent state
     that is different, PCRE may change to follow.
    </simpara>
    </listitem>
    <listitem>
    <simpara>
     Another as yet unresolved discrepancy is that in Perl
     5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
     "a", whereas in PCRE it does not. However, in both Perl and
     PCRE /^(a)?a/ matched against "a" leaves $1 unset.
    </simpara>
    </listitem>
    <listitem>
    <para>
     PCRE provides some extensions to the Perl regular
     expression facilities:
      <orderedlist>
       <listitem>
        <simpara>
         Although lookbehind assertions must match fixed length
         strings, each alternative branch of a lookbehind assertion
         can match a different length of string. Perl 5.005 requires
         them all to have the same length.
       </simpara>
      </listitem>
      <listitem>
       <simpara>
        If <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
        is set and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
        not set, the $ meta-character matches only at the very end of the
        string.
       </simpara>
      </listitem>
      <listitem>
       <simpara>
        If <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
        set, a backslash followed by a letter with no special meaning is
        treated as an error.
       </simpara>
      </listitem>
      <listitem>
       <simpara>
        If <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
        set, the greediness of the repetition quantifiers is inverted:
        by default they are not greedy, but if followed by a
        question mark they are.
       </simpara>
      </listitem>
     </orderedlist>
    </para>
   </listitem>
  </orderedlist>
 </para>
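 <para>
  Some of the behaviours listed above can be observed from PHP with
  <function>preg_match</function>. The following is only a minimal sketch,
  not taken from the PCRE documentation. It assumes that the
  <literal>D</literal> and <literal>U</literal> pattern modifiers are the
  PHP spellings of PCRE_DOLLAR_ENDONLY and PCRE_UNGREEDY. All patterns and
  subject strings are invented for illustration, and results for the other
  items in the list may vary between PCRE versions.
  <example>
   <title>Observing some PCRE behaviours from PHP (illustrative sketch)</title>
   <programlisting role="php">
<![CDATA[
<?php
// PCRE_DOLLAR_ENDONLY (assumed modifier: D).
// Without it, $ also matches just before a final newline.
var_dump(preg_match('/abc$/', "abc\n"));   // int(1)
var_dump(preg_match('/abc$/D', "abc\n"));  // int(0)

// PCRE_UNGREEDY (assumed modifier: U) inverts quantifier greediness.
preg_match('/a+/U', 'aaa', $lazy);     // "+" is now lazy
preg_match('/a+?/U', 'aaa', $greedy);  // "+?" is now greedy
var_dump($lazy[0]);                    // string(1) "a"
var_dump($greedy[0]);                  // string(3) "aaa"

// Lookbehind branches may have different fixed lengths.
var_dump(preg_match('/(?<=ab|c)d/', 'abd'));  // int(1)
var_dump(preg_match('/(?<=ab|c)d/', 'cd'));   // int(1)
var_dump(preg_match('/(?<=ab|c)d/', 'xd'));   // int(0)

// A binary zero in the pattern, written as the escape \x00.
// The pattern is single-quoted so that PCRE, not PHP, sees the escape.
var_dump(preg_match('/a\x00b/', "a\0b"));     // int(1)

// Captures inside a repeated group: group 2 keeps the "b" matched in the
// first iteration, even though it does not take part in the last one.
preg_match('/^(a(b)?)+$/', 'aba', $m);
var_dump($m[2]);                              // string(1) "b"

// /^(a)?a/ against "a" leaves group 1 unset, so only the whole match
// is reported.
preg_match('/^(a)?a/', 'a', $n);
var_dump(count($n));                          // int(1)
?>
]]>
   </programlisting>
  </example>
 </para>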
</article>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->