File: TDFA.hs

haskell-regex-tdfa 1.3.2.2-2
{-|
Module: Text.Regex.TDFA
Copyright: (c) Chris Kuklewicz 2007-2009
SPDX-License-Identifier: BSD-3-Clause
Maintainer: Andreas Abel
Stability: stable

The "Text.Regex.TDFA" module provides a backend for regular
expressions. It provides instances for the classes defined and
documented in "Text.Regex.Base" and re-exported by this module.  If
you import this along with other backends then you should do so with
qualified imports (with renaming for convenience).

This regex-tdfa package implements, correctly, POSIX extended regular
expressions.  It is highly unlikely that the @regex-posix@ package on
your operating system is correct; see
<http://www.haskell.org/haskellwiki/Regex_Posix> for examples of your
OS's bugs.

= Importing and using

Declare a dependency on the @regex-tdfa@ library in your @.cabal@ file:

> build-depends: regex-tdfa ^>= 1.3.2

In Haskell modules where you want to use regexes simply @import@ /this/ module:

@
import "Text.Regex.TDFA"
@

= Basics

>>> let emailRegex = "[a-zA-Z0-9+._-]+\\@[-a-zA-Z]+\\.[a-z]+"
>>> "my email is first-name.lastname_1974@e-mail.com" =~ emailRegex :: Bool
True

>>> "invalid@mail@com" =~ emailRegex :: Bool
False

>>> "invalid@mail.COM" =~ emailRegex :: Bool
False

>>> "#@invalid.com" =~ emailRegex :: Bool
False

@
/-- non-monadic/
λ> \<to-match-against\> '=~' \<regex\>

/-- monadic, uses 'fail' on lack of match/
λ> \<to-match-against\> '=~~' \<regex\>
@

('=~') and ('=~~') are polymorphic in their return type. This is so that
regex-tdfa can pick the most efficient way to give you your result based on
what you need. For instance, if all you want is to check whether the regex
matched or not, there's no need to allocate a result string. If you only want
the first match, rather than all the matches, then the matching engine can stop
after finding a single hit.

This does mean, though, that you may sometimes have to explicitly specify the
type you want, especially if you're trying things out at the REPL.
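
The snippet below is a minimal sketch of this return-type polymorphism, not
part of the library: the same subject and regex are matched at several result
types. The names subject, wordRegex and the other top-level bindings are
hypothetical, chosen only for this illustration.

```haskell
import Text.Regex.TDFA

-- Hypothetical sample data, used only for this illustration.
subject, wordRegex :: String
subject = "alexis-de-tocqueville"
wordRegex = "[a-z]+"

-- Did the regex match anywhere?
matched :: Bool
matched = subject =~ wordRegex

-- Text of the first match (empty string when there is none).
firstMatch :: String
firstMatch = subject =~ wordRegex

-- Number of non-overlapping matches.
matchCount :: Int
matchCount = subject =~ wordRegex

-- Offset and length of the first match.
firstSpan :: (MatchOffset, MatchLength)
firstSpan = subject =~ wordRegex

-- Monadic variant: a failed match becomes Nothing in the Maybe monad.
noDigits :: Maybe String
noDigits = subject =~~ "[0-9]+"
```

Pinning the result type with a signature, as above, is usually enough to
resolve the ambiguity that shows up at the REPL.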

= Common use cases

== Get the first match

@
/-- returns empty string if no match/
a '=~' b :: String  /-- or ByteString, or Text.../
@

>>> "alexis-de-tocqueville" =~ "[a-z]+" :: String
"alexis"

>>> "alexis-de-tocqueville" =~ "[0-9]+" :: String
""

== Check if it matched at all

@
a '=~' b :: Bool
@

>>> "alexis-de-tocqueville" =~ "[a-z]+" :: Bool
True

== Get first match + text before/after

@
/-- if no match, will just return whole/
/-- string in the first element of the tuple/
a '=~' b :: (String, String, String)
@

>>> "alexis-de-tocqueville" =~ "de" :: (String, String, String)
("alexis-","de","-tocqueville")

>>> "alexis-de-tocqueville" =~ "kant" :: (String, String, String)
("alexis-de-tocqueville","","")

== Get first match + submatches

@
/-- same as above, but also returns a list of just submatches./
/-- submatch list is empty if regex doesn't match at all/
a '=~' b :: (String, String, String, [String])
@

>>> "div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]" :: (String, String, String, [String])
("","div[attr=1234]","",["attr","1234"])
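
As a sketch of how the submatch list might be consumed, the hypothetical
helper parseAttr below pattern-matches on the four-tuple and returns the two
captured groups, relying on the submatch list being empty when the regex does
not match. The helper name and its use of the regex from the example above
are assumptions made for illustration.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: extract the attribute name and value from text
-- shaped like "div[attr=1234]".
parseAttr :: String -> Maybe (String, String)
parseAttr input =
  case (input =~ "div\\[([a-z]+)=([^]]+)\\]" :: (String, String, String, [String])) of
    -- Exactly two submatches: the regex matched and both groups captured.
    (_, _, _, [key, value]) -> Just (key, value)
    -- Empty submatch list: the regex did not match at all.
    _ -> Nothing
```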

== Get /all/ matches

@
/-- can also return Data.Array instead of List/
'getAllTextMatches' (a '=~' b) :: [String]
@

>>> getAllTextMatches ("john anne yifan" =~ "[a-z]+") :: [String]
["john","anne","yifan"]

>>> getAllTextMatches ("* - . a + z" =~ "[--z]+") :: [String]
["-",".","a","z"]
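
Two related result types may also be useful when collecting all matches. The
sketch below assumes the getAllMatches accessor and the list-of-lists result
instance re-exported from Text.Regex.Base; the function names allSpans and
allWithGroups are hypothetical.

```haskell
import Text.Regex.TDFA

-- Offset and length of every match, in order of occurrence.
allSpans :: String -> String -> [(MatchOffset, MatchLength)]
allSpans regex text = getAllMatches (text =~ regex)

-- Every match together with its submatches: each inner list holds the
-- whole match first, followed by the captured groups.
allWithGroups :: String -> String -> [[String]]
allWithGroups regex text = text =~ regex
```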

= Feature support

This package does provide captured parenthesized subexpressions.

Unicode support depends on the type of the text being searched.
The @[Char]@, @Text@, @Text.Lazy@, and @(Seq Char)@ text types support Unicode.  The @ByteString@
and @ByteString.Lazy@ text types only support ASCII.

As of version 1.1.1 the following GNU extensions are recognized, all
anchors:

* \\\` at beginning of entire text
* \\\' at end of entire text
* \\\< at beginning of word
* \\\> at end of word
* \\b at either beginning or end of word
* \\B at neither beginning nor end of word

The above are controlled by the 'newSyntax' Bool in 'CompOption'.

Here a "word" boundary is a position between a character that is in
the [:word:] character class, which contains [a-zA-Z0-9_], and one that
is not.  Note that \\\< and \\b may match at the beginning of the entire
text and \\\> and \\b may match at the end of the entire text.
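
The following sketch shows one way the options record can be used: it
compiles a regex with a modified 'CompOption' through makeRegexOpts, and
separately uses the word-boundary anchor with the default options (which
are assumed here to have newSyntax enabled). The function names
caseInsensitive and hasWordCat are hypothetical.

```haskell
import Text.Regex.TDFA

-- Compile a regex with case sensitivity switched off; every other
-- option keeps its default value.
caseInsensitive :: String -> Regex
caseInsensitive =
  makeRegexOpts (defaultCompOpt { caseSensitive = False }) defaultExecOpt

-- True when "cat" occurs as a whole word, using the GNU word-boundary
-- anchor under the default options.
hasWordCat :: String -> Bool
hasWordCat text = text =~ "\\bcat\\b"
```

A compiled 'Regex' is applied with match, for example
match (caseInsensitive "hello") "Say HELLO" at type Bool.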

There is no locale support, so collating elements like [.ch.] are
simply ignored and equivalence classes like [=a=] are converted to
just [a].  The character classes like [:alnum:] are supported over
ASCII only; the valid classes are alnum, digit, punct, alpha, graph,
space, blank, lower, upper, cntrl, print, xdigit, and word.

>>> getAllTextMatches ("john anne yifan" =~ "[[:lower:]]+") :: [String]
["john","anne","yifan"]


This package does not provide "basic" regular expressions.  This
package does not provide back references inside regular expressions.

This package does not provide Perl-style regular expressions.  Please
look at the <http://hackage.haskell.org/package/regex-pcre regex-pcre>
and <http://hackage.haskell.org/package/pcre-light pcre-light> packages instead.

This package does not provide find-and-replace.
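
A replacement function can be layered on top of the match offsets. The
sketch below is not part of this package: replaceAll is a hypothetical
helper for String input only, built on getAllMatches, and it substitutes a
fixed string without any support for referring to captured groups.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: replace every non-overlapping match of the regex
-- in the text with a fixed replacement string.
replaceAll :: String -> String -> String -> String
replaceAll regex replacement text = go 0 text spans
  where
    -- Offset and length of each match, in order of occurrence.
    spans = getAllMatches (text =~ regex) :: [(MatchOffset, MatchLength)]

    -- pos is the offset in the original text at which rest begins.
    go _ rest [] = rest
    go pos rest ((off, len) : more) =
      let (before, fromMatch) = splitAt (off - pos) rest
      in before ++ replacement ++ go (off + len) (drop len fromMatch) more
```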

= Avoiding backslashes

If you find yourself writing a lot of regexes, take a look at
<http://hackage.haskell.org/package/raw-strings-qq raw-strings-qq>. It'll
let you write regexes without needing to escape all your backslashes.

@
\{\-\# LANGUAGE QuasiQuotes \#\-\}

import Text.RawString.QQ
import Text.Regex.TDFA

λ> "2 * (3 + 1) / 4" '=~' [r|\\([^)]+\\)|] :: String
"(3 + 1)"
@

-}

module Text.Regex.TDFA(getVersion_Text_Regex_TDFA
                      ,(=~),(=~~)
                      ,module Text.Regex.TDFA.Common
                      ,module Text.Regex.Base) where

import qualified Control.Monad.Fail as Fail
import Data.Version(Version)
import Text.Regex.Base
import Text.Regex.TDFA.String()
import Text.Regex.TDFA.ByteString()
import Text.Regex.TDFA.ByteString.Lazy()
import Text.Regex.TDFA.Text()
import Text.Regex.TDFA.Text.Lazy()
import Text.Regex.TDFA.Sequence()
import Text.Regex.TDFA.Common(Regex,CompOption(..),ExecOption(..))
--import Text.Regex.TDFA.Wrap(Regex,CompOption(..),ExecOption(..),(=~),(=~~))

import Paths_regex_tdfa(version)

getVersion_Text_Regex_TDFA :: Version
getVersion_Text_Regex_TDFA = version


-- | This is the pure functional matching operator.  If the target
-- cannot be produced then some empty result will be returned.  If
-- there is an error in processing, then 'error' will be called.
(=~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target)
     => source1 -> source -> target
(=~) x r = let make :: RegexMaker Regex CompOption ExecOption a => a -> Regex
               make = makeRegex
           in match (make r) x

-- | This is the monadic matching operator.  If a single match fails,
-- then 'fail' will be called.
(=~~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target, Fail.MonadFail m)
      => source1 -> source -> m target
(=~~) x r = do let make :: (RegexMaker Regex CompOption ExecOption a, Fail.MonadFail m) => a -> m Regex
                   make = makeRegexM
               q <- make r
               matchM q x