File: TODO

# -*- mode: org -*-

* Use ExtUtils::CppGuess's C++11 support, rather than doing it ourselves
  (CppGuess probably needs some fixes on some of the platforms I tried, though.)

* Fix UTF-8 support
  This turns out to be harder than I was thinking. The first step is to compile
  two versions of the regexp, one for matching UTF-8 and one for matching
  Latin1 (maybe on demand).

  RE2 won't accept \x{...} escapes whose value lies outside the current
  character set. I was hoping it would be possible to pass a pattern containing
  these to RE2 and let RE2 realise that part of it can't match (e.g.
  (?:foo|\x{1234}) would still match foo, even if the input string isn't UTF-8).

  (I'm only talking about \x{...}, which is the only case I have to
  care about; \p{...} escapes *are* accepted by RE2 regardless. Because
  of Perl's behaviour we can't have raw UTF-8 in the string if the
  UTF-8 flag isn't on.)

  The approach for now will probably be to replace each \x{nnn} in the pattern
  (where nnn > 0xFF) with something that can never match (maybe [^\x00-\xff])
  but still allows the other branches to match.
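
  A minimal sketch of that rewrite, written in Python rather than Perl purely
  for illustration. The names rewrite_wide_escapes and NEVER_MATCH are
  hypothetical and not part of this module. It assumes the escape does not
  occur inside a character class, where substituting a bracketed class would
  produce a broken pattern.

```python
import re

# Matches any backslash escape; group 1 captures the hex digits of \x{...}.
# Consuming every escape pair left to right keeps an escaped backslash
# followed by "x{...}" from being mistaken for a \x{...} escape.
_ESCAPE = re.compile(r'\\(?:x\{([0-9A-Fa-f]+)\}|.)', re.DOTALL)

# Pattern text that cannot match any single Latin-1 character,
# so the branch containing it is dead but still syntactically valid.
NEVER_MATCH = r'[^\x00-\xff]'


def rewrite_wide_escapes(pattern: str) -> str:
    """Replace \\x{nnn} escapes above 0xFF with a never-matching class.

    Assumption: the escape is not inside a character class.
    """
    def replace(m):
        digits = m.group(1)
        if digits is not None and int(digits, 16) > 0xFF:
            return NEVER_MATCH
        return m.group(0)  # leave every other escape untouched

    return _ESCAPE.sub(replace, pattern)


if __name__ == "__main__":
    rewritten = rewrite_wide_escapes(r'(?:foo|\x{1234})')
    print(rewritten)
    # The other branch can still match byte-oriented (Latin-1 style) input.
    print(bool(re.search(rewritten.encode('ascii'), b'foo')))
```

  Escapes at or below 0xFF, such as \x{41}, pass through unchanged, and an
  escaped backslash followed by x{1234} is left alone.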
** Think about supporting perl 5.14's unicode regexp flags
  At least at the top level, implementing within RE2 would be silly.

  RE2 doesn't have all the behaviours perl does (e.g. /a is effectively
  implied for \d, etc.). Might just be a case of documenting what RE2
  does, once UTF-8 is working to some extent. An alternative could be
  to make things explicit (e.g. to use RE2 you need to say "no feature
  'unicode_strings'" if you happen to have enabled that feature).
* Support more options
** never_nl could be useful for cpangrep optimisations
* Support RE2::Set functionality
  i.e. a Regexp::RE2::Set class that RE2 regexps can be added to, plus a
  match method that reports which of them matched.
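
  A rough sketch of the shape such an interface might take, in Python for
  illustration only. PatternSet, add and match are hypothetical names, and
  Python's re module stands in for RE2 here. This version simply loops over
  the patterns, whereas the point of RE2::Set is to test them all together.

```python
import re


class PatternSet:
    """Hypothetical stand-in for the proposed set class (uses re, not RE2)."""

    def __init__(self):
        self._patterns = []

    def add(self, pattern):
        """Compile and store a pattern; return its index within the set."""
        self._patterns.append(re.compile(pattern))
        return len(self._patterns) - 1

    def match(self, text):
        """Return the indices of every stored pattern found in text."""
        return [i for i, p in enumerate(self._patterns) if p.search(text)]


if __name__ == "__main__":
    s = PatternSet()
    s.add(r'foo')
    s.add(r'ba[rz]')
    print(s.match('foobaz'))
```

  Returning indices rather than match objects mirrors the idea of asking
  "which patterns matched?" without caring where in the string they matched.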
* Improve tests
** Improve performance comparisons
   See maybe https://github.com/axiak/pyre2/blob/master/tests/performance.py
* Support /x (probably needs RE2 changes to do properly)
* Both Perl and RE2 store the stringification of the regexp; can we avoid this?