1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
|
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
Error -36 (bad UTF-8 offset)
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
XX\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
X\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
XX\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
/shortutf/utf
XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XX\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
\xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
\xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
No match
/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
Bra
/i \x{c1}+
/i \x{e1}
Ket
End
------------------------------------------------------------------
\x{c1}\x{c1}\x{c1}
0: \xc1\xc1\xc1
1: \xc1\xc1
\x{e1}\x{e1}\x{e1}
0: \xe1\xe1\xe1
1: \xe1\xe1
/\x{120}\x{c1}/i,ucp,no_start_optimize
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{e1}
/\x{120}\x{c1}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{e1}
/[^\x{120}]/i,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\x{121}
/[^\x{120}]/i,ucp,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}
/[^\x{120}]/i
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\x{121}
/[^\x{120}]/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}
/\x{120}{2}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{121}
/[^\x{120}]{2}/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}\x{121}
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
\x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
# ----------------------------------------------------
# End of testinput14
|