Description: treat scanned string as UTF-8 when compiled regex is UTF-8
 re::engine::RE2 is documented (in the BUGS section of its README)
 as not handling UTF-8 correctly.
 .
 Without this patch,
 scanning a Latin1 string with a UTF-8 regex reports wrong positions
 or potentially crashes,
 and misses e.g. "£" (which Perl's own regex engine matches correctly).
 .
 With this patch,
 scanning a UTF-8 string with a UTF-8 regex should behave correctly,
 but still misses e.g. "£".
 .
 Scanning should be safer and more correct for UTF-8 strings.
 The only known side effect is slower scanning of non-UTF-8 strings,
 because the string is always upgraded to UTF-8.
 For faster scanning of a string known to be ASCII, use an ASCII regex.
Origin: https://github.com/dgl/re-engine-RE2/pull/8
Author: Todd Richmond <trichmond@proofpoint.com>
Bug: https://rt.cpan.org/Public/Bug/Display.html?id=116747
Bug: https://rt.cpan.org/Public/Bug/Display.html?id=131618
Last-Update: 2023-06-21
---
This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
--- a/re2_xs.cc
+++ b/re2_xs.cc
@@ -101,10 +101,12 @@
     // XXX: Need to compile two versions?
     /* The pattern is not UTF-8. Tell RE2 to treat it as Latin1. */
 #ifdef RXf_UTF8
-    if (!(flags & RXf_UTF8))
+    if (flags & RXf_UTF8)
 #else
-    if (!SvUTF8(pattern))
+    if (SvUTF8(pattern))
 #endif
+        extflags |= RXf_MATCH_UTF8;
+    else
         options.set_encoding(RE2::Options::EncodingLatin1);
 
     options.set_log_errors(false);
@@ -311,7 +313,7 @@
     RE2::Options options;
     options.Copy(previous->options());
 
-    return new RE2 (re2::StringPiece(RX_WRAPPED(rx), RX_WRAPLEN(rx)), options);
+    return new RE2 (previous->pattern(), options);
 }
 
 SV *