1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
|
bf(C++) itself provides facilities for handling regular expressions. Regular
expressions were already available in bf(C++) via its bf(C) heritage (as bf(C)
has always offered functions like ti(regcomp) and ti(regexec)), but the
dedicated regular expression facilities have a richer interface than the
traditional bf(C) facilities, and can be used in code using templates.
Before using the specific bf(C++) implementations of regular expressions the
header file tthi(regex) must be included.
Regular expressions are extensively documented elsewhere (e.g., bf(regex)(7),
i(Friedl, J.E.F)
url(Mastering Regular Expressions)(http://oreilly.com/catalog/), O'Reilly).
The reader is referred to these sources for a refresher on the topic of
regular expressions. In essence, regular expressions define a small
meta-language recognizing textual units (like `numbers', `identifiers', etc.).
They are extensively used in the context of em(lexical scanners) (cf. section
ref(Flexcpp)) when defining the sequence of input characters associated with
em(tokens). But they are also intensively used in other situations. Programs
like bf(sed)(1) and bf(grep)(1) use regular expressions to find pieces of text
in files having certain characteristics, and a program like bf(perl)(1) adds
some `sugar' to the regular expression language, simplifying the construction
of regular expressions. However, though extremely useful, it is also well
known that regular expressions tend to be very hard to read. Some even call
the regular expression language a em(write-only language): while specifying a
regular expression it's often clear why it's written in a particular way. But
the opposite, understanding what a regular expression is supposed to represent
if you lack the proper context, can be extremely difficult. That's why, from
the onset and as a emi(rule of thumb), it is stressed that an appropriate
comment should be provided, with em(each) regular expression, as to what it is
supposed to match.
In the upcoming sections first a short overview of the regular expression
language is provided, which is then followed by the facilities bf(C++) is
currently offering for using regular expressions. These facilities mainly
consist of classes helping you to specify regular expression, matching them to
text, and determining which parts of the text (if any) match (parts of) the
text being analyzed.
|