1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
|
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
No match
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
No match
XX\xef
No match
XXX\xef\x80
No match
X\xf7
No match
XX\xf7\x80
No match
XXX\xf7\x80\x80
No match
/shortutf/utf
XX\xdf\=ph
No match
XX\xef\=ph
No match
XX\xef\x80\=ph
No match
\xf7\=ph
No match
\xf7\x80\=ph
No match
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
No match
/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
Bra
/i \x{c1}+
/i \x{e1}
Ket
End
------------------------------------------------------------------
\x{c1}\x{c1}\x{c1}
0: \xc1\xc1\xc1
1: \xc1\xc1
\x{e1}\x{e1}\x{e1}
0: \xe1\xe1\xe1
1: \xe1\xe1
/\x{120}\x{c1}/i,ucp,no_start_optimize
\x{121}\x{e1}
0: \x{121}\xe1
/\x{120}\x{c1}/i,ucp
\x{121}\x{e1}
0: \x{121}\xe1
/[^\x{120}]/i,no_start_optimize
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp,no_start_optimize
\= Expect no match
\x{121}
No match
/[^\x{120}]/i
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp
\= Expect no match
\x{121}
No match
/\x{120}{2}/i,ucp
\x{121}\x{121}
0: \x{121}\x{121}
/[^\x{120}]{2}/i,ucp
\= Expect no match
\x{121}\x{121}
No match
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
\x{ffffffff}
0: \x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
0: K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
No match
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
No match
/k\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
0: K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
No match
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
No match
# ----------------------------------------------------
# End of testinput14
|