File: 1003_utf8_flag.patch

libre-engine-re2-perl 0.18+ds-3

area: main
in suites: forky, sid
size: 440 kB
sloc: cpp: 270; perl: 80; makefile: 2; sh: 1

Description: force scanned string to be treated as UTF-8 when compiled regex is UTF-8
 re::engine::RE2 is documented (in the BUGS section of its README)
 as not handling UTF-8 correctly.
 .
 Without this patch,
 scanning a Latin1 string with a UTF-8 regex reports wrong positions
 or potentially crashes,
 and misses e.g. "£" (which the Perl re engine correctly matches).
 .
 With this patch,
 scanning a UTF-8 string with a UTF-8 regex should behave correctly,
 but still misses e.g. "£".
 .
 Scanning should be safer and more correct for UTF-8 strings;
 the only known side effect is slower scanning of non-UTF-8 strings,
 because the string is always upgraded to UTF-8.
 For faster scanning of a string known to be ASCII, use an ASCII regex.
Origin: https://github.com/dgl/re-engine-RE2/pull/8
Author: Todd Richmond <trichmond@proofpoint.com>
Bug: https://rt.cpan.org/Public/Bug/Display.html?id=116747
Bug: https://rt.cpan.org/Public/Bug/Display.html?id=131618
Last-Update: 2023-06-21
---
This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
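
Illustration (not part of the patch): a minimal, standard-library-only sketch
of why a Latin1 string scanned with a UTF-8 pattern can miss "£" and report
shifted positions, and what "upgrading" the string to UTF-8 changes.
The helper names latin1_to_utf8 and check_example are hypothetical; this is
not code from re::engine::RE2, RE2, or Perl, only an approximation of the
byte-level effect under the assumption that the input is valid Latin1.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encode a Latin1 (ISO-8859-1) byte string as UTF-8.
// Bytes below 0x80 are copied; bytes 0x80..0xFF become two-byte sequences.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical demo: search for the UTF-8 encoding of "£" in both forms.
void check_example() {
    // "café £5" in Latin1: "é" is byte 0xE9, "£" is byte 0xA3.
    const std::string latin1 = "caf\xE9 \xA3" "5";
    // "£" as a UTF-8 pattern would see it: bytes 0xC2 0xA3.
    const std::string pound_utf8 = "\xC2\xA3";

    // The raw Latin1 bytes never contain the UTF-8 sequence, so it is missed.
    assert(latin1.find(pound_utf8) == std::string::npos);

    // After upgrading, the sequence is found, at a different byte offset
    // (6 instead of 5) because "é" now occupies two bytes.
    const std::string upgraded = latin1_to_utf8(latin1);
    assert(upgraded.size() == 9);
    assert(upgraded.find(pound_utf8) == 6);
}
```

The copy made by the upgrade step is the kind of extra work the Description
refers to as the slowdown for non-UTF-8 strings; pure-ASCII input comes out
byte-for-byte unchanged.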
--- a/re2_xs.cc
+++ b/re2_xs.cc
@@ -101,10 +101,12 @@
     // XXX: Need to compile two versions?
     /* The pattern is not UTF-8. Tell RE2 to treat it as Latin1. */
 #ifdef RXf_UTF8
-    if (!(flags & RXf_UTF8))
+    if (flags & RXf_UTF8)
 #else
-    if (!SvUTF8(pattern))
+    if (SvUTF8(pattern))
 #endif
+        extflags |= RXf_MATCH_UTF8;
+    else
         options.set_encoding(RE2::Options::EncodingLatin1);
 
     options.set_log_errors(false);
@@ -311,7 +313,7 @@
     RE2::Options options;
     options.Copy(previous->options());
 
-    return new RE2 (re2::StringPiece(RX_WRAPPED(rx), RX_WRAPLEN(rx)), options);
+    return new RE2 (previous->pattern(), options);
 }
 
 SV *