<?xml version="1.0" encoding="UTF-8"?>

<section id="regexp-mod"><title>POSIX Regular Expressions</title>

<para>The &regexp-pac; module implements &posix;
 <ulink url="regexp.html">regular expression</ulink>
matching by calling the standard &c-lang; system facilities.
The syntax of these &reg-exp;s is described in many places,
such as your local &regex; manual and the &emacs; info pages.</para>
&base-module;
<simpara>When this module is present, &features-my;
 contains the symbol <constant>:REGEXP</constant>.</simpara>

<variablelist id="regexp-api"><title>Regular Expression API</title>
<varlistentry id="re-match"><term><code>(&re-match; &pattern-r;
   &string-r; &key-amp; (&start-k; 0) &end-k; &re-cflags-opts;
   &re-eflags-opts;)</code></term>
 <listitem><para>This macro returns as first value a &match-t; structure
  containing the indices of the start and end of the first match for the
  &reg-exp; &pattern-r; in &string-r;; or no values if there is no match.
  Additionally, a &match-t; structure is returned for every matched
  <literal>"\(...\)"</literal> group in &pattern-r;, in the
  order that the open parentheses appear in &pattern-r;.
  If &start-r; is non-&nil;, the search starts at that index in &string-r;.
  If &end-r; is non-&nil;, only <code>(&subseq; &string-r; &start-r;
  &end-r;)</code> is considered.
  <example id="ex-re-match"><title>&re-match;</title>
   <programlisting>
(&re-match; "quick" "The quick brown fox jumped quickly.")
<computeroutput>#S(&match-t; :START 4 :END 9)</computeroutput>
(&re-match; "quick" "The quick brown fox jumped quickly." :start 8)
<computeroutput>#S(&match-t; :START 27 :END 32)</computeroutput>
(&re-match; "quick" "The quick brown fox jumped quickly." :start 8 :end 30)
<computeroutput>&nil;</computeroutput>
(&re-match; "\\([a-z]*\\)[0-9]*\\(bar\\)" "foo12bar")
<computeroutput>#S(&match-t; :START 0 :END 8)</computeroutput> ;
<computeroutput>#S(&match-t; :START 0 :END 3)</computeroutput> ;
<computeroutput>#S(&match-t; :START 5 :END 8)</computeroutput>
</programlisting></example>
</para></listitem></varlistentry>

<varlistentry id="re-match-access">
 <term><code>(&re-match-start; &match-r;)</code></term>
 <term><code>(&re-match-end; &match-r;)</code></term>
 <listitem><simpara>Return the start and end of the &match-r;; &setf;-able.
</simpara></listitem></varlistentry>

<varlistentry id="re-match-string">
 <term><code>(&re-match-string; &string-r; &match-r;)</code></term>
 <listitem><simpara>Extracts the substring of &string-r; corresponding
   to the given pair of start and end indices of &match-r;.
   The result is shared with &string-r;.
   If you want a &fresh; &string-t;, use &copy-seq; or
   &coerce; to &simple-string-t;.</simpara></listitem></varlistentry>

<varlistentry id="re-regexp-quote"><term><code>(&re-regexp-quote;
   &string-r; &optional-amp; <replaceable>extended</replaceable>)</code></term>
 <listitem><para>This function returns a &reg-exp; &string-t;
  that matches exactly &string-r; and nothing else.
  This allows you to request an exact string match when calling a
  function that wants a &reg-exp;.
  <example id="ex-re-quote"><title>&re-regexp-quote;</title>
   <programlisting language="lisp">
(regexp-quote "^The cat$")
<computeroutput>"\\^The cat\\$"</computeroutput></programlisting></example>
  One use of &re-regexp-quote; is to combine an exact string match with
  context described as a &reg-exp;.
  When <replaceable>extended</replaceable> is non-&nil;, also
  quote <keysym>#\+</keysym> and <keysym>#\?</keysym>.
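  <example id="ex-re-quote-context"><title>Combining &re-regexp-quote; with a &reg-exp; context</title>
   <para>A minimal sketch of the use described above: the literal text
    <literal>"1.5"</literal> is quoted and then embedded in a larger
    basic &reg-exp;.  The sample string and the variable names are
    illustrative assumptions, and the symbols are written with an
    explicit <literal>REGEXP:</literal> package prefix.</para>
   <programlisting language="lisp">
(let* ((text "ratio=1.5 ratiox=125")
       ;; quote the literal so that "." matches only a period
       (literal (regexp:regexp-quote "1.5"))
       ;; context: a run of lowercase letters followed by "="
       (pattern (concatenate 'string "[a-z]*=" literal))
       (m (regexp:match pattern text)))
  (when m
    ;; copy-seq gives a fresh string instead of one shared with TEXT
    (copy-seq (regexp:match-string text m))))
<computeroutput>"ratio=1.5"</computeroutput></programlisting></example>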
</para></listitem></varlistentry>

<varlistentry id="re-regexp-compile"><term><code>(&re-regexp-compile;
   &string-r; &key-amp; &re-cflags-opts;)</code></term>
 <listitem><simpara>Compile the &reg-exp; &string-r; into an
  object suitable for &re-regexp-exec;.</simpara></listitem></varlistentry>

<varlistentry id="re-regexp-exec"><term><code>(&re-regexp-exec;
   &pattern-r; &string-r; &key-amp; &ret-type-k;
   (&start-k; 0) &end-k; &re-eflags-opts;)</code></term>
 <listitem><simpara>Execute the &pattern-r;, which must be a compiled
   &reg-exp; returned by &re-regexp-compile;, against the appropriate
   portion of the &string-r;.</simpara>
  <simpara>Returns &match-t; structures as &mul-val; (one for each
   subexpression which successfully matched and one for the whole pattern).
  </simpara><simpara>If &ret-type-k; is &list-t; (or &vector-t;), the
   &match-t; structures are returned as a &list-t; (or a &vector-t;) instead.
   Also, if there are more than &multiple-values-limit; &match-t; structures
   to return, a &list-t; is returned instead of &mul-val;.
   If &ret-type-k; is &boolean-t;, return &t; or &nil; as an indicator
   of success or failure, but do &not-e; allocate anything.
</simpara></listitem></varlistentry>

<varlistentry id="re-regexp-split"><term><code>(&re-regexp-split;
     &pattern-r; &string-r; &key-amp; (&start-k; 0) &end-k;
     &re-cflags-opts; &re-eflags-opts;)</code></term>
 <listitem><simpara>Return a list of substrings of &string-r; (all
  sharing the structure with &string-r;) separated by &pattern-r; (a
  &reg-exp; &string-t; or a return value of &re-regexp-compile;).
 </simpara></listitem></varlistentry>

<varlistentry id="re-with-loop-split"><term><code>(&re-with-loop-split;
     (&var-r; &stream-r; &pattern-r; &key-amp; (&start-k; 0) &end-k;
     &re-cflags-opts; &re-eflags-opts;) &body-amp; &body-r;)</code></term>
 <listitem><simpara>Read lines from &stream-r;, split them with
  &re-regexp-split; on &pattern-r;, and bind the resulting list to
  &var-r;.</simpara></listitem></varlistentry>

<varlistentry id="re-cflags"><term>&re-cflags-opts;</term>
 <listitem><simpara>These options control compilation of a pattern.
   See &regex; for their meaning.</simpara></listitem></varlistentry>

<varlistentry id="re-eflags"><term>&re-eflags-opts;</term>
 <listitem><simpara>These options control execution of a pattern.
   See &regex; for their meaning.</simpara></listitem></varlistentry>

<varlistentry id="re-matcher"><term>&re-matcher;</term>
 <listitem><simpara>A valid value for &apropos-matcher;.
   This will work only when your &locale; is &utf-8;
   because &clisp; uses &utf-8; internally and &posix; constrains
   &regex; to use the current &locale;.</simpara></listitem></varlistentry>

</variablelist>
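
<example id="ex-re-compile-exec"><title>Compiling a pattern once and reusing it</title>

<para>The following sketch compiles a pattern once with
 &re-regexp-compile; and then applies it to several strings with
 &re-regexp-exec;.  It assumes that &ret-type-k; is the keyword
 <literal>:RETURN-TYPE</literal> and that the type names are passed as
 quoted symbols; the pattern and the sample lines are made up for
 illustration:<programlisting language="lisp">
(let ((re (regexp:regexp-compile "^[a-z][a-z]*:")))
  (dolist (line '("root:x:0:0" "# comment" "daemon:x:1:1"))
    ;; BOOLEAN only reports success or failure
    (if (regexp:regexp-exec re line :return-type 'boolean)
        ;; LIST collects the match structures; with no groups in the
        ;; pattern, the only element describes the whole match
        (let ((matches (regexp:regexp-exec re line :return-type 'list)))
          (format t "~a: matched ~s~%" line
                  (regexp:match-string line (first matches))))
        (format t "~a: no match~%" line))))
</programlisting></para></example>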

<example id="re-count-shell-users"><title>Count unix shell users</title>

<para>The following code counts, for each login shell, the number of
 users who have it:<programlisting language="lisp">
#!/usr/bin/clisp -C
(&use-package; "REGEXP")
(let ((h (make-hash-table :test #'equal :size 10)) (n 0))
  (with-open-file (f (or (first &args;) "/etc/passwd"))
    (with-loop-split (s f (or (second &args;) ":"))
      (incf (gethash (seventh s) h 0))))
  (with-hash-table-iterator (i h)
    (loop (multiple-value-bind (r k v) (i)
            (unless r (return))
            (format t "[~d] ~s~30t== ~5:d~%" (incf n) k v)))))
</programlisting></para>

<para>For comparison, almost the same thing (apart from the nicer output
formatting) can be done with the following &perl;:<programlisting language="perl">
#!/usr/bin/perl -w

use diagnostics;
use strict;

my $IN = $ARGV[0] || "/etc/passwd";
open(INF,"&lt; $IN") or die "$0: cannot read file [$IN]: $!\n";
my %hash;
while (&lt;INF&gt;) {
  chomp;
  my @all = split($ARGV[1] || ":");
  $hash{$all[6] || ""}++;
}
my $ii = 0;
for my $kk (keys(%hash)) {
  print "[",++$ii,"] \"",$kk,"\"  -- ",$hash{$kk},"\n";
}
close(INF);
</programlisting></para></example>
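
<example id="ex-re-split-line"><title>Splitting a single line</title>

<para>The per-line work done by &re-with-loop-split; can also be tried
 on a single string with &re-regexp-split;.  A small sketch, in which
 the sample <filename>/etc/passwd</filename> line is made up:<programlisting language="lisp">
(let ((fields (regexp:regexp-split
               ":" "alice:x:1000:1000:Alice:/home/alice:/bin/sh")))
  ;; the seventh field of a passwd entry is the login shell;
  ;; copy-seq detaches the result from the original line
  (copy-seq (seventh fields)))
<computeroutput>"/bin/sh"</computeroutput>
</programlisting></para></example>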
</section>