File: README.md

package info (click to toggle)
pgpcre 0.20190509-2
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 140 kB
  • sloc: ansic: 262; sql: 181; makefile: 19; sh: 1
file content (129 lines) | stat: -rw-r--r-- 4,482 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# pgpcre

[![Build Status](https://secure.travis-ci.org/petere/pgpcre.png)](http://travis-ci.org/petere/pgpcre)

This is a module for PostgreSQL that exposes Perl-compatible regular expressions (PCRE) functionality as functions and operators.  It is based on the popular [PCRE library](http://www.pcre.org/).

## Installation

You need to have libpcre installed.  pkg-config will be used to find it.

To build and install this module:

    make
    make install

or selecting a specific PostgreSQL installation:

    make PG_CONFIG=/some/where/bin/pg_config
    make PG_CONFIG=/some/where/bin/pg_config install

And finally inside the database:

    CREATE EXTENSION pgpcre;

## Using

A regular expression is a separate data type, named `pcre`.  (This is different from how the built-in regular expressions in PostgreSQL work, which are simply values of type `text`.)

The supported regular expressions are documented on the [pcrepattern(3)](http://linux.die.net/man/3/pcrepattern) man page.

### Basic matching

Boolean operators are available for checking whether a pattern matches a string.  These operators return true or false, respectively.  They only return null when one of the operands is null.

Examples:

    SELECT 'foo' ~ pcre 'fo+';
    SELECT 'bar' !~ pcre 'fo+';

You can also write it the other way around:

    SELECT pcre 'fo+' ~ 'foo';
    SELECT pcre 'fo+' !~ 'bar';

This can be handy for writing things like

    SELECT pcre 'fo+' ~ ANY(ARRAY['foo', 'bar']);

For Perl nostalgia, you can also use this operator:

    SELECT 'foo' =~ pcre 'fo+';

And if this operator is unique (which it should be, unless you have
something else installed that uses it), you can also write:

    SELECT 'foo' =~ 'fo+';

(The `~` operator, by contrast, is not unique, of course, because it is used by the built-in regular expressions.)

To get case-insensitive matching, set the appropriate option in the pattern, for example:

    SELECT 'FOO' ~ pcre '(?i)fo+';

### Extracting the matched string

To extract the substring that was matched by the pattern, use the
function `pcre_match`.  It returns either a value of type text, or
null if the pattern did not match.  Examples:

    SELECT pcre_match('fo+', 'foobar');  --> 'foo'
    SELECT pcre_match('fo+', 'barbar');  --> NULL

There is no support for extracting multiple matches of a pattern in a
string, because PCRE does not (easily) support that.

### Extracting captured substrings

Captured substrings (parenthesized subexpressions) are extracted using
the function `pcre_captured_substrings`.  It returns either an array
of text, or null if the pattern did not match.  Examples:

    SELECT pcre_captured_substrings('(fo+)(b..)', 'foobar');  --> ARRAY['foo','bar']
    SELECT pcre_captured_substrings('(fo+)(b..)', 'abcdef');  --> NULL

Note that elements of the array can be null if a substring was not used, for example:

    SELECT pcre_captured_substrings('(a|(z))(bc)', 'abc');  --> ARRAY['a',NULL,'bc']

### Storing regular expressions

You can store regular expression values of type `pcre` in tables, like
any other data.  Note, however, that the binary representation of the
`pcre` values contains the compiled regular expression, which is tied
to the version of the PCRE library.  If you upgrade the PCRE library
and use a compiled value created by a different version, things might
not work or even crash (according to the PCRE documentation; I don't
know how likely that is).  pgpcre will warn if you attempt to use a
value that was compiled by a different version of the library.  If
that happens, it is advisable to recompile and rewrite all stored
`pcre` values by doing something like

    UPDATE ... SET pcre_col = pcre_col::text::pcre

(To be clear, storing regular expressions in tables is not a typical
use.  Normally, you store text in tables and match it against regular
expressions provided by your application.)

## Discussion

Some possible advantages over the regular expression support built into PostgreSQL:

- richer pattern language, more familiar to Perl and Python programmers
- complete Unicode support
- saner operators and functions

Some disadvantages:

- no repeated matching (`'g'` flag)
- no index optimization

You can workaround the lack of index optimization by manually augmenting queries like

    column =~ '^foo'

with

    AND column ~>=~ 'foo' AND column ~<~ 'fop'

and creating the appropriate `text_pattern_ops` index as you would for the built-in pattern matching.