1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
|
# -*- mode: org -*-
* Use ExtUtils::CppGuess's C++11 support, rather than doing it ourselves
(Probably needs some fixes in CppGuess for some platforms I tried though)
* Fix UTF-8 support
This turns out to be harder than I was thinking. The first step is to compile
two versions of the regexp, one for matching UTF-8 and one for matching
Latin1 (maybe on demand).
RE2 won't accept \x{...} escapes that are greater than the current character
set. I was hoping it would be possible to give a string containing these to
RE2 then let RE2 realise part of it won't match (e.g. (?:foo|\x{1234}) will
still match foo, even if the input string isn't UTF-8).
(I'm only talking about \x{...}; this is the only case I have to
care about, \p{...} *are* accepted by RE2 regardless. Due to Perl's
behaviour we can't have raw UTF-8 in the string if the UTF-8 flag
isn't on.)
The approach for now will probably be to replace \x{nnn} in strings (where
nnn>0xFF) with something that won't match (maybe [^\x00-\xff]), but allows
the other branches to match.
** Think about supporting perl 5.14's unicode regexp flags
At least at the top level, implementing within RE2 would be silly.
RE2 doesn't have all the behaviours perl does (i.e. /a is implied
for \d, etc.). Might just be a case of documenting what RE2 does,
once UTF-8 is working to some extent. An alternative could be to
make things explicit (e.g. you need to say "no feature
'unicode_strings'" if you happen to have enabled them to use RE2).
* Support more options
** never_nl could be useful for cpangrep optimisations
* Support RE2::Set functionality
i.e. a Regexp::RE2::Set class that can have RE2 regexps added into it
then a match method.
* Improve tests
** Improve performance comparisons
See maybe https://github.com/axiak/pyre2/blob/master/tests/performance.py
* Support /x (probably needs RE2 changes to do properly)
* Both Perl and RE2 store the stringification of the regexp, can we avoid this?
|