1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
|
# pgpcre
[](http://travis-ci.org/petere/pgpcre)
This is a module for PostgreSQL that exposes Perl-compatible regular expressions (PCRE) functionality as functions and operators. It is based on the popular [PCRE library](http://www.pcre.org/).
## Installation
You need to have libpcre installed. pkg-config will be used to find it.
To build and install this module:
make
make install
or selecting a specific PostgreSQL installation:
make PG_CONFIG=/some/where/bin/pg_config
make PG_CONFIG=/some/where/bin/pg_config install
And finally inside the database:
CREATE EXTENSION pgpcre;
## Using
A regular expression is a separate data type, named `pcre`. (This is different from how the built-in regular expressions in PostgreSQL work, which are simply values of type `text`.)
The supported regular expressions are documented on the [pcrepattern(3)](http://linux.die.net/man/3/pcrepattern) man page.
### Basic matching
Boolean operators are available for checking whether a pattern matches a string. These operators return true or false, respectively. They only return null when one of the operands is null.
Examples:
SELECT 'foo' ~ pcre 'fo+';
SELECT 'bar' !~ pcre 'fo+';
You can also write it the other way around:
SELECT pcre 'fo+' ~ 'foo';
SELECT pcre 'fo+' !~ 'bar';
This can be handy for writing things like
SELECT pcre 'fo+' ~ ANY(ARRAY['foo', 'bar']);
For Perl nostalgia, you can also use this operator:
SELECT 'foo' =~ pcre 'fo+';
And if this operator is unique (which it should be, unless you have
something else installed that uses it), you can also write:
SELECT 'foo' =~ 'fo+';
(The `~` operator, by contrast, is not unique, of course, because it is used by the built-in regular expressions.)
To get case-insensitive matching, set the appropriate option in the pattern, for example:
SELECT 'FOO' ~ pcre '(?i)fo+';
### Extracting the matched string
To extract the substring that was matched by the pattern, use the
function `pcre_match`. It returns either a value of type text, or
null if the pattern did not match. Examples:
SELECT pcre_match('fo+', 'foobar'); --> 'foo'
SELECT pcre_match('fo+', 'barbar'); --> NULL
There is no support for extracting multiple matches of a pattern in a
string, because PCRE does not (easily) support that.
### Extracting captured substrings
Captured substrings (parenthesized subexpressions) are extracted using
the function `pcre_captured_substrings`. It returns either an array
of text, or null if the pattern did not match. Examples:
SELECT pcre_captured_substrings('(fo+)(b..)', 'foobar'); --> ARRAY['foo','bar']
SELECT pcre_captured_substrings('(fo+)(b..)', 'abcdef'); --> NULL
Note that elements of the array can be null if a substring was not used, for example:
SELECT pcre_captured_substrings('(a|(z))(bc)', 'abc'); --> ARRAY['a',NULL,'bc']
### Storing regular expressions
You can store regular expression values of type `pcre` in tables, like
any other data. Note, however, that the binary representation of the
`pcre` values contains the compiled regular expression, which is tied
to the version of the PCRE library. If you upgrade the PCRE library
and use a compiled value created by a different version, things might
not work or even crash (according to the PCRE documentation; I don't
know how likely that is). pgpcre will warn if you attempt to use a
value that was compiled by a different version of the library. If
that happens, it is advisable to recompile and rewrite all stored
`pcre` values by doing something like
UPDATE ... SET pcre_col = pcre_col::text::pcre
(To be clear, storing regular expressions in tables is not a typical
use. Normally, you store text in tables and match it against regular
expressions provided by your application.)
## Discussion
Some possible advantages over the regular expression support built into PostgreSQL:
- richer pattern language, more familiar to Perl and Python programmers
- complete Unicode support
- saner operators and functions
Some disadvantages:
- no repeated matching (`'g'` flag)
- no index optimization
You can workaround the lack of index optimization by manually augmenting queries like
column =~ '^foo'
with
AND column ~>=~ 'foo' AND column ~<~ 'fop'
and creating the appropriate `text_pattern_ops` index as you would for the built-in pattern matching.
|