1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
|
Metadata-Version: 2.1
Name: pcre2
Version: 0.4.0
Summary: Python bindings for the PCRE2 regular expression library
Home-page: https://github.com/grtetrault/pcre2.py
Author: Garrett Tetrault
License: BSD 3-Clause License
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: C
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
License-File: LICENSE
# PCRE2.py: Python bindings for the PCRE2 regular expression library
This project contains Python bindings for [PCRE2](https://github.com/PCRE2Project/pcre2).
PCRE2 is the revised API for the Perl-compatible regular expressions (PCRE) library created by Philip Hazel.
For original source code, see the [official PCRE2 repository](https://github.com/PCRE2Project/pcre2).
## Installation
From PyPI:
```
pip install pcre2
```
If a wheel is not available for your platform, the module will be built from source.
Building requires:
* `cmake`
* C compiler toolchain, such as `gcc` and `make`
* `libtool`
* Python headers
## Usage
Regular expressions are compiled with `pcre2.compile()` which accepts both unicode strings and bytes-like objects.
This returns a `Pattern` object.
Expressions can be compiled with a number of options (combined with the bitwise-or operator) and can be JIT compiled,
```python
>>> import pcre2
>>> expr = r'(?<head>\w+)\s+(?<tail>\w+)'
>>> patn = pcre2.compile(expr, options=pcre2.I, jit=True)
>>> # Patterns can also be JIT compiled after initialization.
>>> patn.jit_compile()
```
Inspection of `Pattern` objects is done as follows,
```python
>>> patn.jit_size
980
>>> patn.name_dict()
{1: 'head', 2: 'tail'}
>>> patn.options
524296
>>> # Deeper inspection into options is available.
>>> pcre2.CompileOption.decompose(patn.options)
[<CompileOption.CASELESS: 0x8>, <CompileOption.UTF: 0x80000>]
```
Once compiled, `Pattern` objects can be used to match against strings.
Matching return a `Match` object, which has several functions to view results,
```python
>>> subj = 'foo bar buzz bazz'
>>> match = patn.match(subj)
>>> match.substring()
'foo bar'
>>> match.start(), match.end()
(8, 17)
```
Substitution is also supported, both from `Pattern` and `Match` objects,
```python
>>> repl = '$2 $1'
>>> patn.substitute(repl, subj) # Global substitutions by default.
'bar foo bazz buzz'
>>> patn.substitute(repl, subj, suball=False)
'bar foo buzz bazz'
>>> match.expand(repl)
'bar foo buzz bazz'
```
Additionally, `Pattern` objects support scanning over subjects for all non-overlapping matches,
```python
>>> for match in patn.scan(subj):
... print(match.substring('head'))
...
foo
buzz
```
## Performance
PCRE2 provides a fast regular expression library, particularly with JIT compilation enabled.
Below are the `regex-redux` benchmark results included in this repository,
| Script | Number of runs | Total time | Real time | User time | System time |
| ------------------- | -------------- | ---------- | ---------- | ----------- | ------------- |
| `baseline.py` | 10 | 3.020 | 0.302 | 0.020 | 0.086 |
| `vanilla.py` | 10 | 51.380 | 5.138 | 11.408 | 0.529 |
| `hand_optimized.py` | 10 | 13.190 | 1.319 | 2.846 | 0.344 |
| `pcre2_module.py` | 10 | 13.670 | 1.367 | 2.269 | 0.532 |
Script descriptions are as follows,
| Script | Description |
| ------------------- | -------------------------------------------------------------------- |
| `baseline.py` | Reads input file and outputs stored expected output |
| `vanilla.py` | Pure Python version |
| `hand_optimized.py` | Manually written Python `ctypes` bindings for shared PCRE2 C library |
| `pcre2_module.py` | Implementation using Python bindings written here |
Tests were performed on an M2 Macbook Air.
Note that to run benchmarks locally, [Git LFS](https://git-lfs.com/) must be installed to download the input dataset.
Additionally, a Python virtual environment must be created, and the package built
with `make init` and `make build` respectively.
For more information on this benchmark, see [The Computer Language Benchmarks Game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/regexredux.html).
See source code of benchmark scripts for details and original sources.
|