<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  <title>Usage</title>
</head>
<body bgcolor="#FFFFFF" >


<h1>Usage</h1>
<hr>

<h2>WARNING!!!</h2>
<p>
This library includes both Ruby and C versions of StringScanner.  Since the two
classes behave differently in important ways, please read this whole page before using them.
</p>

<h2>Purpose of StringScanner</h2>
<p>
StringScanner is a Ruby extension for fast scanning.
</p>
<p>
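Ruby's Regexp class cannot anchor a match at an arbitrary position inside a
string: \A always means the beginning of the whole string, even when a start
offset is given.  So scanning a sub-string requires first making a new String.
For example: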
</p>

<blockquote><pre>
p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )
</pre></blockquote>

<p>
This code prints "nil": the word starts at index 1, but \A only matches at
index 0.  The usual workaround is to cut off the matched part each time, like this:
</p>

<blockquote><pre>
str = " word word word"
while str.size > 0 do
  if /\A[ \t]+/ === str then
    str = $'
  elsif /\A\w+/ === str then
    str = $'
  end
end
</pre></blockquote>

<p>
But this method has a big performance problem: $' makes a new string EVERY time.
So, in the above example, all these strings are created:
</p>

<blockquote><pre>
" word word word"
"word word word"
" word word"
"word word"
" word"
"word"
""
</pre></blockquote>

<p>
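This results in a heavy load, because each step copies everything that remains.
If the length of 'str' is 50KB and tokens average 2.5 characters (as above),
nearly 50,000 ** 2 / 5 bytes, about 500MB, of memory is used in total.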
</p>
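<p>
To check the arithmetic above, the following sketch counts the bytes in the
strings that the $' loop creates.  It assumes, as this page does, that every $'
copies the whole rest of the string; current Ruby versions may share the
underlying buffer instead, so the result is the amount that would be copied,
not a measured allocation.  The method name copied_bytes is made up for this
illustration.
</p>

<blockquote><pre>
# Hypothetical helper: sums the lengths of the strings that the $' loop
# creates, assuming each $' is a full copy of the rest of the string.
def copied_bytes(str)
  total = 0
  while str.size > 0
    if /\A[ \t]+/ === str || /\A\w+/ === str
      str = $'              # the remaining text, as a new string
      total += str.size
    else
      raise ArgumentError, "unexpected character: #{str[0, 1]}"
    end
  end
  total
end

copied_bytes(" word word word")   # 42 (14 + 10 + 9 + 5 + 4 + 0)
n = 50_000
copied_bytes(" word" * (n / 5))   # 499_990_000, close to n ** 2 / 5
</pre></blockquote>

<p>
For a 50KB input the count is 499,990,000 bytes, which agrees with the
50KB ** 2 / 5 estimate, that is, about 500MB.
</p>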
<p>
StringScanner resolves this problem.<br>
StringScanner keeps a C string and a pointer into it.  When scanning,
StringScanner only advances the pointer, so the rest of the string is never
copied; only the matched tokens become new strings.
As a result, scanning is faster and uses much less memory.
</p>
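<p>
A small illustration of the pointer, assuming the strscan library bundled with
current Ruby (its details may differ from this release).  'pos' reports the
scan pointer as an index: it moves forward after each 'scan', while 'string'
still returns the whole original text rather than a shortened copy.
</p>

<blockquote><pre>
require 'strscan'

str = " word word word"
s = StringScanner.new(str)

first  = s.scan(/[ \t]+/)   # " "
pos1   = s.pos              # 1: only the pointer moved
second = s.scan(/\w+/)      # "word"
pos2   = s.pos              # 5
whole  = s.string           # " word word word": the full text is still there
</pre></blockquote>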


<h3>Simple examples and methods</h3>
<p>
Here are two short examples of scanning routines.<br>
The first one is easy to write but performs quite poorly.  The second is still
easy to write, but is FAST thanks to the code in the StringScanner class.
</p>
<p>
First example:
</p>

<blockquote><pre>
ATOM = /\A\w+/
SPACE = /\A[ \t]+/

while str.size > 0 do
  if ATOM === str then
    str = $'          # copies the rest of str
    return $&
  elsif SPACE === str then
    str = $'
    return $&
  else
    raise "unexpected character: #{str[0,1]}"   # avoid looping forever
  end
end
</pre></blockquote>

<p>
Second example:
</p>

<blockquote><pre>
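require 'strscan'

ATOM = /\A\w+/
SPACE = /\A[ \t]+/

s = StringScanner.new( str )
while s.rest? do
  if tmp = s.scan( ATOM ) then      # only the scan pointer moves
    return tmp
  elsif tmp = s.scan( SPACE ) then
    return tmp
  else
    raise "unexpected character: #{s.rest[0,1]}"   # avoid looping forever
  end
end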
</pre></blockquote>
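<p>
The second example is a fragment of a method body.  Here is one way to turn it
into a complete, runnable program, sketched against the strscan library bundled
with current Ruby.  The class name Tokenizer and its method next_token are
hypothetical; the point is that one StringScanner is created once and reused for
every token, so each call continues where the previous one stopped.  The
patterns omit \A, which the C version does not require.
</p>

<blockquote><pre>
require 'strscan'

# Tokenizer is a hypothetical wrapper for illustration;
# all of the scanning is done by StringScanner.
class Tokenizer
  ATOM  = /\w+/
  SPACE = /[ \t]+/

  def initialize(str)
    @s = StringScanner.new(str)   # one scanner for the whole input
  end

  # Returns the next token, or nil when the input is used up.
  def next_token
    return nil unless @s.rest?
    tok = @s.scan(ATOM) || @s.scan(SPACE)
    raise ArgumentError, "unexpected character at #{@s.pos}" unless tok
    tok
  end
end

t = Tokenizer.new(" word word")
tokens = []
while (tok = t.next_token)
  tokens.push(tok)
end
p tokens   # prints [" ", "word", " ", "word"]
</pre></blockquote>

<p>
Returning nil at the end of the input lets the caller loop with 'while', and
raising on an unknown character avoids spinning forever when no pattern matches.
</p>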

<p>
Using StringScanner is simple.<br>
First, create a StringScanner object.  Then call the 'scan' method.  It returns
the matched string (or nil if nothing matches) and at the same time advances its
internally maintained "scan pointer".  This is implemented using a pointer to
char (char*).<br>
The 'skip' method is similar to 'scan', but returns the length of the matched
string instead.
</p>

<blockquote><pre>
require 'strscan'
s = StringScanner.new( "abcdefg" )   # scan pointer is on 'a', index 0
puts s.scan( /a/ )        # returns 'a'. scan pointer is on 'b', index 1
puts s.skip( /bc/ )       # returns 2. scan pointer is on 'd', index 3
</pre></blockquote>

<p>
After calling 'scan' or 'skip', the previous "scan pointer" is preserved in the
StringScanner object.  So str[ prev pointer...current pointer ] is the "matched
string" (the string returned from 'scan'), and we can get it by calling the
'matched' method.  Here's an example:
</p>

<blockquote><pre>
puts s.matched            # returns 'bc'. scan pointer doesn't move
puts s.scan( /a/ )        # returns nil. again, scan pointer doesn't move.
puts s.matched            # returns 'bc'.
</pre></blockquote>

<p>
It is also possible to put the scan pointer back to its previous position. 
This can be accomplished by using the 'unscan' method.  However, 'unscan' can
only undo one 'scan' because the StringScanner object can only preserve one
"previous pointer" at a time.
</p>

<blockquote><pre>
puts s.scan( /de/ )       # returns 'de'. scan pointer is on 'f', index 5
s.unscan                  # scan pointer is on 'd', index 3
puts s.scan( /def/ )      # returns 'def'. scan pointer is on 'g', index 6
</pre></blockquote>
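<p>
The calls above, collected into one self-contained script.  It is written
against the strscan library bundled with current Ruby, whose behaviour after a
failed match may differ from this release, so the failed scan( /a/ ) is left
out.  It also checks that a second 'unscan' in a row fails, since only one
previous pointer is kept; the StringScanner::Error class raised there is the one
current Ruby uses and is an assumption for this release.
</p>

<blockquote><pre>
require 'strscan'

s = StringScanner.new("abcdefg")
a = s.scan(/a/)      # "a";   pointer at index 1
n = s.skip(/bc/)     # 2;     pointer at index 3
m = s.matched        # "bc":  skip records the match too
d = s.scan(/de/)     # "de";  pointer at index 5
s.unscan             # pointer back at index 3
e = s.scan(/def/)    # "def"; pointer at index 6
p [a, n, m, d, e, s.pos]   # prints ["a", 2, "bc", "de", "def", 6]

s.unscan             # undo "def": pointer back at index 3
second_undo_failed = false
begin
  s.unscan           # no second saved pointer to return to
rescue StringScanner::Error
  second_undo_failed = true
end
</pre></blockquote>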

<p>
For more details, see the <a href="reference.html">reference manual</a>.
But of course the source code is the most important documentation, I think :-)
</p>


<h2>Ruby version of strscan</h2>
<p>
The Ruby version of StringScanner (StringScanner_R) resembles the C version, but
has these requirements:
</p>
<ul>
<li>\A must exist at the beginning of EVERY regexp when using scan, skip ...
<li>\A must NOT exist when using scan_until, skip_until ...
</ul>
<p>
This is troublesome, but there's no resolution to this problem.
</p>
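<p>
Why the Ruby version needs these rules is easiest to see with a sketch.
NaiveScanner below is hypothetical and is NOT the StringScanner_R source; it
assumes only that a pure-Ruby scanner keeps the string plus an integer position
and matches each regexp against the unscanned rest.  Under that assumption,
nothing but a leading \A keeps 'scan' from skipping ahead, and a leading \A
stops 'scan_until' from searching ahead.
</p>

<blockquote><pre>
# NaiveScanner is hypothetical: it is NOT the StringScanner_R source.
class NaiveScanner
  attr_reader :pos

  def initialize(str)
    @str = str
    @pos = 0
  end

  # Intended to match only AT the pointer, but nothing here checks
  # that: only a leading \A in re keeps the match from starting later.
  def scan(re)
    m = re.match(@str[@pos..-1])
    return nil unless m
    @pos += m.end(0)
    m[0]
  end

  # Intended to search AHEAD of the pointer; a leading \A in re
  # pins the match to the pointer and defeats the search.
  def scan_until(re)
    rest = @str[@pos..-1]
    m = re.match(rest)
    return nil unless m
    @pos += m.end(0)
    rest[0, m.end(0)]
  end
end

s = NaiveScanner.new("abc def")
s.scan(/def/)           # "def", but "abc " was silently skipped (pos is 7)

s = NaiveScanner.new("abc def")
s.scan(/\Adef/)         # nil: correct, "def" is not at the pointer
s.scan_until(/\Adef/)   # nil: \A stops the search ahead
s.scan_until(/def/)     # "abc def"
</pre></blockquote>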
<p>
If you only want to use the C version, simply put this in your code:
</p>

<blockquote><pre>
StringScanner.must_C_version
</pre></blockquote>

<hr><p>
Copyright (c) 1999-2001 Minero Aoki
<a href="mailto:aamine@loveruby.net"> &lt;aamine@loveruby.net&gt;</a>


</body>
</html>