# PCRE2-OCaml - Perl Compatible Regular Expressions for OCaml

Fork of the original [pcre-ocaml project](https://github.com/mmottl/pcre-ocaml)
for PCRE2 support.

These are the bindings as needed by the [Haxe
compiler](https://github.com/HaxeFoundation/haxe). I do not plan on maintaining
this repository.

This [OCaml](http://www.ocaml.org) library interfaces with the C library
[PCRE2](http://www.pcre.org), providing Perl-compatible regular expressions
for string matching.

## Features

PCRE2-OCaml offers:

- Pattern searching
- Subpattern extraction
- String splitting by patterns
- Pattern substitution
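
The four operations listed above can be sketched as follows. This is a minimal illustration, not taken from the library's own examples: it assumes the library is linked under the name `pcre2` (for instance `(libraries pcre2)` in a dune file), and the helper names `has_date`, `year_of`, `date_fields`, and `slashed` are hypothetical.

```ocaml
(* Sketch only: assumes the library is linked as `pcre2`.
   All helper names below are hypothetical. *)

let date_pat = "(\\d{4})-(\\d{2})-(\\d{2})"

(* Pattern searching: true if the pattern matches anywhere in the subject. *)
let has_date s = Pcre2.pmatch ~pat:date_pat s

(* Subpattern extraction: group 0 is the whole match, groups 1.. are the
   captures. Pcre2.exec raises Not_found when nothing matches. *)
let year_of s =
  match Pcre2.exec ~pat:date_pat s with
  | subs -> Some (Pcre2.get_substring subs 1)
  | exception Not_found -> None

(* String splitting by a pattern. *)
let date_fields s = Pcre2.split ~pat:"-" s

(* Pattern substitution: every match is replaced by the template. *)
let slashed s = Pcre2.replace ~pat:"-" ~templ:"/" s

let () =
  let subject = "released 2024-05-17" in
  Printf.printf "has date: %b\n" (has_date subject);
  (match year_of subject with
   | Some y -> Printf.printf "year: %s\n" y
   | None -> print_endline "no year found");
  Printf.printf "fields: %s\n" (String.concat "," (date_fields "2024-05-17"));
  Printf.printf "slashed: %s\n" (slashed "2024-05-17")
```

Passing `~pat` keeps the sketch short; code that runs these helpers repeatedly would normally compile the pattern once with `Pcre2.regexp` and pass `~rex` instead.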

Reasons to choose PCRE2-OCaml:

- The PCRE2 library by Philip Hazel is mature and stable, implementing nearly
  all Perl regular expression features. High-level OCaml functions (split,
  replace, etc.) are compatible with Perl functions, as much as OCaml allows.
  Some developers find Perl-style regex syntax more intuitive and powerful
  than the Emacs-style regex used in OCaml's `Str` module.

- PCRE2-OCaml is reentrant and thread-safe, unlike the `Str` module. This
  reentrancy offers convenience, eliminating concerns about library state.

- High-level replacement and substitution functions in OCaml are faster than
  those in the `Str` module. When compiled to native code, they can even
  outperform Perl's C-based functions.

- Returned data is unique, allowing safe destructive updates without side
  effects.

- The library interface uses labels and default arguments for enhanced
  programming comfort.

## Usage

To generate and browse the API documentation, run:

```
$ odig odoc pcre2
```

Or, from a source checkout (untested):

```
$ dune build @doc
```

Functions support two flag types:

1. **Convenience flags**: Readable and concise, translated internally on each
   call. Example:

   ```ocaml
   let rex = Pcre2.regexp ~flags:[`ANCHORED; `CASELESS] "some pattern" in
   (* ... *)
   ```

   These are easy to use but may incur overhead in loops. For performance
   optimization, consider the next approach.

2. **Internal flags**: Predefined and translated from convenience flags for
   optimal loop performance. Example:

   ```ocaml
   let iflags = Pcre2.cflags [`ANCHORED; `CASELESS] in
   for i = 1 to 1000 do
     let rex = Pcre2.regexp ~iflags "some pattern constructed at runtime" in
     (* ... *)
   done
   ```

   Translating flags outside the loop saves cycles. For the same reason,
   avoid compiling a constant pattern inside a loop; passing `~pat`
   recompiles it on every iteration:

   ```ocaml
   for i = 1 to 1000 do
     let chunks = Pcre2.split ~pat:"[ \t]+" "foo bar" in
     (* ... *)
   done
   ```

   Instead, predefine the regex:

   ```ocaml
   let rex = Pcre2.regexp "[ \t]+" in
   for i = 1 to 1000 do
     let chunks = Pcre2.split ~rex "foo bar" in
     (* ... *)
   done
   ```

Functions use optional arguments with intuitive defaults. For instance,
`Pcre2.split` defaults to whitespace as the pattern. The `examples` directory
contains applications demonstrating PCRE2-OCaml's functionality.
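
As a small illustration of the optional arguments, the sketch below calls `Pcre2.split` once with no pattern and once with an explicit `~pat`. The helper names `words` and `csv_fields` are hypothetical, and the code assumes the library is linked as `pcre2`.

```ocaml
(* Sketch only: helper names are hypothetical. *)

(* With neither ~pat nor ~rex, Pcre2.split uses its default whitespace
   pattern. *)
let words s = Pcre2.split s

(* Naming an optional argument overrides just that default. *)
let csv_fields s = Pcre2.split ~pat:"\\s*,\\s*" s

let () =
  List.iter print_endline (words "foo  bar\tbaz");
  List.iter print_endline (csv_fields "a , b,c")
```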

## Restartable (Partial) Pattern Matching

PCRE2 includes a DFA match function that can restart a partial match with new
input; the binding exposes it as `pcre2_dfa_match`. While it is not suitable
for extracting submatches or splitting strings, it is useful for streaming and
search tasks.

Example of a partial match restarted:

```ocaml
utop # open Pcre2;;
utop # let rex = regexp "12+3";;
val rex : regexp = <abstr>
utop # let workspace = Array.make 40 0;;
val workspace : int array =
  [|0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
    0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0|]
utop # pcre2_dfa_match ~rex ~flags:[`PARTIAL_SOFT] ~workspace "12222";;
Exception: Pcre2.Error Partial.
utop # pcre2_dfa_match ~rex ~flags:[`PARTIAL_SOFT; `DFA_RESTART] ~workspace "2222222";;
Exception: Pcre2.Error Partial.
utop # pcre2_dfa_match ~rex ~flags:[`PARTIAL_SOFT; `DFA_RESTART] ~workspace "2222222";;
Exception: Pcre2.Error Partial.
utop # pcre2_dfa_match ~rex ~flags:[`PARTIAL_SOFT; `DFA_RESTART] ~workspace "223xxxx";;
- : int array = [|0; 3; 0|]
```

Refer to the `pcre2_dfa_match` documentation and the `dfa_restart` example for
more information.
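
The transcript above can be wrapped in a small driver that feeds input chunk by chunk. This is a hedged sketch rather than the library's `dfa_restart` example: the function name `feed_chunks` and its result tags are hypothetical, it calls `pcre2_dfa_match` as in the transcript, and it assumes that a restarted match which fails should be retried on the same chunk as a fresh search.

```ocaml
(* Sketch only: `feed_chunks` and its result tags are hypothetical. *)
let feed_chunks rex chunks =
  (* The same workspace must be reused across restarted calls. *)
  let workspace = Array.make 40 0 in
  let rec go ~restart = function
    | [] -> if restart then `Partial else `No_match
    | chunk :: rest as all ->
      let flags =
        if restart then [`PARTIAL_SOFT; `DFA_RESTART] else [`PARTIAL_SOFT]
      in
      (match Pcre2.pcre2_dfa_match ~rex ~flags ~workspace chunk with
       | _offsets -> `Matched
       | exception Pcre2.Error Pcre2.Partial ->
         (* The chunk ended mid-match: continue with the next chunk. *)
         go ~restart:true rest
       | exception Not_found ->
         (* Assumption: after a failed restart, search this chunk afresh. *)
         if restart then go ~restart:false all else go ~restart:false rest)
  in
  go ~restart:false chunks

let () =
  let rex = Pcre2.regexp "12+3" in
  match feed_chunks rex ["12222"; "2222222"; "223xxxx"] with
  | `Matched -> print_endline "matched"
  | `Partial -> print_endline "input ended during a partial match"
  | `No_match -> print_endline "no match"
```

The driver reports only whether a match completed. As noted above, the DFA matcher does not provide submatches, so a caller that needs captures would have to rerun a normal match on the buffered input.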

## Contact Information and Contributing

Submit bug reports, feature requests, and contributions via the
[GitHub issue tracker](https://github.com/camlp5/pcre2-ocaml/issues).

For the latest information, visit: <https://github.com/camlp5/pcre2-ocaml>