1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
|
<?xml version="1.0" encoding="UTF-8"?>
<section id="regexp-mod"><title>POSIX Regular Expressions</title>
<para>The ®exp-pac; module implements the &posix;
<ulink url="regexp.html">regular expressions</ulink>
matching by calling the standard &c-lang; system facilities.
The syntax of these ®-exp;s is described in many places,
such as your local ®ex; manual and &emacs; info pages.</para>
&base-module;
<simpara>When this module is present, &features-my;
contains the symbol <constant>:REGEXP</constant>.</simpara>
<variablelist id="regexp-api"><title>Regular Expression API</title>
<varlistentry id="re-match"><term><code>(&re-match; &pattern-r;
&string-r; &key-amp; (&start-k; 0) &end-k; &re-cflags-opts;
&re-eflags-opts;)</code></term>
<listitem><para>This macro returns as first value a &match-t; structure
containing the indices of the start and end of the first match for the
®-exp; &pattern-r; in &string-r;; or no values if there is no match.
Additionally, a &match-t; structure is returned for every matched
<literal>"\(...\)"</literal> group in &pattern-r;, in the
order that the open parentheses appear in &pattern-r;.
If &start-r; is non-&nil;, the search starts at that index in &string-r;.
If &end-r; is non-&nil;, only <code>(&subseq; &string-r; &start-r;
&end-r;)</code> is considered.
<example id="ex-re-match"><title>&re-match;</title>
<programlisting>
(&re-match; "quick" "The quick brown fox jumped quickly.")
<computeroutput>#S(&match-t; :START 4 :END 9)</computeroutput>
(&re-match; "quick" "The quick brown fox jumped quickly." :start 8)
<computeroutput>#S(&match-t; :START 27 :END 32)</computeroutput>
(&re-match; "quick" "The quick brown fox jumped quickly." :start 8 :end 30)
<computeroutput>&nil;</computeroutput>
(&re-match; "\\([a-z]*\\)[0-9]*\\(bar\\)" "foo12bar")
<computeroutput>#S(&match-t; :START 0 :END 8)</computeroutput> ;
<computeroutput>#S(&match-t; :START 0 :END 3)</computeroutput> ;
<computeroutput>#S(&match-t; :START 5 :END 8)</computeroutput>
</programlisting></example>
</para></listitem></varlistentry>
<varlistentry id="re-match-access">
<term><code>(&re-match-start; &match-r;)</code></term>
<term><code>(&re-match-end; &match-r;)</code></term>
<listitem><simpara>Return the start and end the &match-r;; &setf;-able.
</simpara></listitem></varlistentry>
<varlistentry id="re-match-string">
<term><code>(&re-match-string; &string-r; &match-r;)</code></term>
<listitem><simpara>Extracts the substring of &string-r; corresponding
to the given pair of start and end indices of &match-r;.
The result is shared with &string-r;.
If you want a &fresh; &string-t;, use ©-seq; or
&coerce; to &simple-string-t;.</simpara></listitem></varlistentry>
<varlistentry id="re-regexp-quote"><term><code>(&re-regexp-quote;
&string-r; &optional-amp; <replaceable>extended</replaceable>)</code></term>
<listitem><para>This function returns a ®-exp; &string-t;
that matches exactly &string-r; and nothing else.
This allows you to request an exact string match when calling a
function that wants a ®-exp;.
<example id="ex-re-quote"><title>&re-regexp-quote;</title>
<programlisting language="lisp">
(regexp-quote "^The cat$")
<computeroutput>"\\^The cat\\$"</computeroutput></programlisting></example>
One use of &re-regexp-quote; is to combine an exact string match with
context described as a ®-exp;.
When <replaceable>extended</replaceable> is non-&nil;, also
quote <keysym>#\+</keysym> and <keysym>#\?</keysym>.
</para></listitem></varlistentry>
<varlistentry id="re-regexp-compile"><term><code>(&re-regexp-compile;
&string-r; &key-amp; &re-cflags-opts;)</code></term>
<listitem><simpara>Compile the ®-exp; &string-r; into an
object suitable for &re-regexp-exec;.</simpara></listitem></varlistentry>
<varlistentry id="re-regexp-exec"><term><code>(&re-regexp-exec;
&pattern-r; &string-r; &key-amp; &ret-type-k;
(&start-k; 0) &end-k; &re-eflags-opts;)</code></term>
<listitem><simpara>Execute the &pattern-r;, which must be a compiled
®-exp; returned by &re-regexp-compile;, against the appropriate
portion of the &string-r;.</simpara>
<simpara>Returns &match-t; structures as &mul-val; (one for each
subexpression which successfully matched and one for the whole pattern).
</simpara><simpara>If &ret-type-k; is &list-t; (or &vector-t;), the
&match-t; structures are returned as a &list-t; (or a &vector-t;) instead.
Also, if there are more than &multiple-values-limit; &match-t; structures
to return, a &list-t; is returned instead of &mul-val;.
If &ret-type-k; is &boolean-t;, return &t; or &nil; as an indicator
of success or failure, but do ¬-e; allocate anything.
</simpara></listitem></varlistentry>
<varlistentry id="re-regexp-split"><term><code>(&re-regexp-split;
&pattern-r; &string-r; &key-amp; (&start-k; 0) &end-k;
&re-cflags-opts; &re-eflags-opts;)</code></term>
<listitem><simpara>Return a list of substrings of &string-r; (all
sharing the structure with &string-r;) separated by &pattern-r; (a
®-exp; &string-t; or a return value of &re-regexp-compile;)
</simpara></listitem></varlistentry>
<varlistentry id="re-with-loop-split"><term><code>(&re-with-loop-split;
(&var-r; &stream-r; &pattern-r; &key-amp; (&start-k; 0) &end-k;
&re-cflags-opts; &re-eflags-opts;) &body-amp; &body-r;)</code></term>
<listitem><simpara>Read lines from &stream-r;, split them with
&re-regexp-split; on &pattern-r;, and bind the resulting list to
&var-r;.</simpara></listitem></varlistentry>
<varlistentry id="re-cflags"><term>&re-cflags-opts;</term>
<listitem><simpara>These options control compilation of a pattern.
See ®ex; for their meaning.</simpara></listitem></varlistentry>
<varlistentry id="re-eflags"><term>&re-eflags-opts;</term>
<listitem><simpara>These options control execution of a pattern.
See ®ex; for their meaning.</simpara></listitem></varlistentry>
<varlistentry id="re-matcher"><term>&re-matcher;</term>
<listitem><simpara>A valid value for &apropos-matcher;.
This will work only when your &locale; is &utf-8;
because &clisp; uses &utf-8; internally and &posix; constrains
®ex; to use the current &locale;.</simpara></listitem></varlistentry>
</variablelist>
<example id="re-count-shell-users"><title>Count unix shell users</title>
<para>The following code computes the number of people who use a
particular shell:<programlisting language="lisp">
#!/usr/bin/clisp -C
(&use-package; "REGEXP")
(let ((h (make-hash-table :test #'equal :size 10)) (n 0))
(with-open-file (f (or (first &args;) "/etc/passwd"))
(with-loop-split (s f (s f (or (second &args;) ":")))
(incf (gethash (seventh s) h 0))))
(with-hash-table-iterator (i h)
(loop (multiple-value-bind (r k v) (i)
(unless r (return))
(format t "[~d] ~s~30t== ~5:d~%" (incf n) k v)))))
</programlisting></para>
<para>For comparison, the same (almost - except for the nice output formatting)
can be done by the following &perl;:<programlisting language="perl">
#!/usr/bin/perl -w
use diagnostics;
use strict;
my $IN = $ARGV[0] || "/etc/passwd";
open(INF,"≤ $IN") or die "$0: cannot read file [$IN]: $!\n";
my %hash;
while (≤INF≥) {
chop;
my @all = split($ARGV[1] || ":");
$hash{$all[6] || ""}++;
}
my $ii = 0;
for my $kk (keys(%hash)) {
print "[",++$ii,"] \"",$kk,"\" -- ",$hash{$kk},"\n";
}
close(INF);
</programlisting></para></example>
</section>
|