<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<title>Usage</title>
</head>
<body bgcolor="#FFFFFF" >
<h1>Usage</h1>
<hr>
<h2>WARNING!!!</h2>
<p>
This library includes both Ruby and C versions of StringScanner. Since the two
classes are completely different, please read this whole page before using them.
</p>
<h2>Purpose of StringScanner</h2>
<p>
StringScanner is a Ruby extension for fast scanning.
</p>
<p>
Ruby's Regexp class cannot anchor a match at a position in the middle of a
string, so scanning from such a position requires first making a new String.
For example:
</p>
<blockquote><pre>
p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )
</pre></blockquote>
<p>
This code will display "nil". Another way to match it is like this:
</p>
<blockquote><pre>
str = " word word word"
while str.size > 0 do
  if /\A[ \t]+/ === str then
    str = $'
  elsif /\A\w+/ === str then
    str = $'
  end
end
</pre></blockquote>
<p>
But this method has a big performance problem: $' makes a new string EVERY time.
So, in the above example, all these strings are created:
</p>
<blockquote><pre>
" word word word"
"word word word"
" word word"
"word word"
" word"
"word"
""
</pre></blockquote>
<p>
This results in a heavy load. If the length of 'str' is 50KB, roughly 50KB ** 2
/ 5 = 500MB of string data is created in total.
</p>
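<p>
As a rough check of this estimate, the sketch below runs the same loop on a
5,000-byte input and adds up the lengths of all the intermediate strings. The
method name 'total_copied_bytes' and the test input are made up for
illustration. The count covers string lengths only; how much memory a given
Ruby interpreter really allocates may differ.
</p>

```ruby
# Hypothetical helper: runs the $' loop and sums the sizes of the
# strings it produces along the way.
def total_copied_bytes(str)
  count = 0   # number of intermediate strings
  total = 0   # combined length of those strings
  while str.size > 0
    if /\A[ \t]+/ =~ str
      str = $'
    elsif /\A\w+/ =~ str
      str = $'
    else
      break   # neither pattern matches; stop instead of looping forever
    end
    count += 1
    total += str.size
  end
  [count, total]
end

count, total = total_copied_bytes(" word" * 1000)   # 5,000-byte input
puts "#{count} strings, #{total} bytes in total"
```

<p>
For this input the loop produces 2000 strings with a combined length of
4,999,000 bytes, which is close to 5000 ** 2 / 5.
</p>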
<p>
StringScanner resolves this problem.<br>
StringScanner has a C string and a pointer to it. When scanning, StringScanner
will only increment the pointer, so no new strings are created.
As a result, speed will increase and memory usage will decrease.
</p>
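<p>
The pointer movement can be observed with a small sketch. It assumes the
'strscan' library bundled with current Ruby, whose 'eos?' and 'pos' methods
are not described on this page; the helper name 'scan_positions' is
hypothetical.
</p>

```ruby
require 'strscan'

# Hypothetical helper: records the scan pointer after every token.
def scan_positions(str)
  s = StringScanner.new(str)
  positions = []
  until s.eos?
    # skip returns nil when nothing matches; stop in that case
    break unless s.skip(/[ \t]+/) || s.skip(/\w+/)
    positions << s.pos
  end
  positions
end

p scan_positions(" word word word")   # [1, 5, 6, 10, 11, 15]
```

<p>
Only the integer position changes between steps; the scanned string itself
is never replaced by a shorter copy.
</p>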
<h3>Simple examples and methods</h3>
<p>
Here are two short examples of scanning routines.<br>
The first one is easy to write but performs quite poorly. The second is still
easy to write, but is FAST thanks to the code in the StringScanner class.
</p>
<p>
First example:
</p>
<blockquote><pre>
ATOM = /\A\w+/
SPACE = /\A[ \t]+/
while str.size > 0 do
  if ATOM === str then
    str = $'
    return $&
  elsif SPACE === str then
    str = $'
    return $&
  end
end
</pre></blockquote>
<p>
Second example:
</p>
<blockquote><pre>
ATOM = /\A\w+/
SPACE = /\A[ \t]+/
s = StringScanner.new( str )
while s.rest? do
  if tmp = s.scan( ATOM ) then
    return tmp
  elsif tmp = s.scan( SPACE ) then
    return tmp
  end
end
</pre></blockquote>
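<p>
The fragments above use 'return', so they are meant to live inside a method.
A complete variant might look like the sketch below, which collects every
token instead of returning the first one. It assumes the 'strscan' library
bundled with current Ruby; the method name 'tokenize' and the [type, text]
pairs are invented for illustration.
</p>

```ruby
require 'strscan'

# Hypothetical tokenizer built on the second example.
def tokenize(str)
  atom  = /\w+/       # a run of word characters
  space = /[ \t]+/    # a run of blanks or tabs
  s = StringScanner.new(str)
  tokens = []
  while s.rest?
    if (tmp = s.scan(atom))
      tokens << [:atom, tmp]
    elsif (tmp = s.scan(space))
      tokens << [:space, tmp]
    else
      # Without this branch the loop would never end on unknown input.
      raise ArgumentError, "unexpected character at index #{s.pos}"
    end
  end
  tokens
end

p tokenize("ab  cd")   # [[:atom, "ab"], [:space, "  "], [:atom, "cd"]]
```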
<p>
Using StringScanner is simple.<br>
First, create a StringScanner object. Next, call the 'scan' method. It returns
the matched string and at the same time advances its internally maintained
"scan pointer", which is implemented as a pointer to char (char*).<br>
The 'skip' method is similar to 'scan', but returns the length of the matched
string instead of the string itself.
</p>
<blockquote><pre>
s = StringScanner.new( "abcdefg" ) # scan pointer is on 'a', index 0
puts s.scan( /a/ ) # returns 'a'. scan pointer is on 'b', index 1
puts s.skip( /bc/ ) # returns 2. scan pointer is on 'd', index 3
</pre></blockquote>
<p>
After calling 'scan' or 'skip', the previous "scan pointer" is preserved in the
StringScanner object. So, str[ prev pointer...current pointer ] is the "matched
string" (the string returned from 'scan') -- we can get it by calling the
'matched' method. Here's an example:
</p>
<blockquote><pre>
puts s.matched # returns 'bc'. scan pointer doesn't move
puts s.scan( /a/ ) # returns nil. again, scan pointer doesn't move.
puts s.matched # returns 'bc'.
</pre></blockquote>
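<p>
The relation between the two pointers and the matched string can be checked
with the following sketch. It assumes the 'strscan' library bundled with
current Ruby and its 'pos' method; the helper name is hypothetical. The slice
uses an exclusive range because the current pointer sits on the first
character after the match.
</p>

```ruby
require 'strscan'

# Hypothetical helper: compares a manual slice with 'matched'.
def matched_slice_demo
  str = "abcdefg"
  s = StringScanner.new(str)
  s.scan(/a/)
  prev = s.pos          # pointer before the next match
  s.skip(/bc/)
  cur = s.pos           # pointer after the match
  [prev, cur, str[prev...cur], s.matched]
end

p matched_slice_demo    # [1, 3, "bc", "bc"]
```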
<p>
It is also possible to put the scan pointer back to its previous position.
This can be accomplished by using the 'unscan' method. However, 'unscan' can
only undo one 'scan' because the StringScanner object can only preserve one
"previous pointer" at a time.
</p>
<blockquote><pre>
puts s.scan( /de/ ) # returns 'de'. scan pointer is on 'f', index 5
s.unscan # scan pointer is on 'd', index 3
puts s.scan( /def/ ) # returns 'def'. scan pointer is on 'g', index 6
</pre></blockquote>
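<p>
When more than one 'scan' has to be undone, one possible workaround is to
remember the position yourself. The sketch below assumes the 'strscan'
library bundled with current Ruby, which offers 'pos' and 'pos='; these may
not exist in the version described on this page. The helper name is
hypothetical.
</p>

```ruby
require 'strscan'

# Hypothetical helper: rewinds past two scans using a saved position.
def rewind_demo
  s = StringScanner.new("abcdefg")
  saved = s.pos            # remember the starting position (0)
  s.scan(/ab/)             # scan pointer moves to index 2
  s.scan(/cd/)             # scan pointer moves to index 4
  s.unscan                 # undoes only the last scan
  after_unscan = s.pos
  s.pos = saved            # go back to the remembered position
  [after_unscan, s.pos, s.scan(/abcd/)]
end

p rewind_demo              # [2, 0, "abcd"]
```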
<p>
For more details, see the <a href="reference.html">reference manual</a>.
But of course the source code is the most important documentation, I think :-)
</p>
<h2>Ruby version of strscan</h2>
<p>
The Ruby version of StringScanner (StringScanner_R) resembles the C version, but
has these requirements:
</p>
<ul>
<li>\A must exist at the beginning of EVERY regexp when using scan, skip ...
<li>\A must NOT exist when using scan_until, skip_until ...
</ul>
<p>
This is troublesome, but there is no way around it at present.
</p>
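<p>
To make the two method families concrete, this sketch calls 'scan' with a
pattern that starts with \A and 'scan_until' with a pattern that does not.
It is run against the 'strscan' library bundled with current Ruby, not
against StringScanner_R, so it only illustrates the calling convention. The
input string and the helper name are invented.
</p>

```ruby
require 'strscan'

# Hypothetical helper: one anchored scan, then one forward search.
def anchor_rule_demo
  s = StringScanner.new("key = value; rest")
  word = s.scan(/\A\w+/)       # anchored at the scan pointer
  upto = s.scan_until(/;/)     # searches forward, so no \A here
  [word, upto, s.pos]
end

p anchor_rule_demo             # ["key", " = value;", 12]
```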
<p>
If you only want to use the C version, simply put this in your code:
</p>
<blockquote><pre>
StringScanner.must_C_version
</pre></blockquote>
<hr><p>
Copyright (c) 1999-2001 Minero Aoki
<a href="mailto:aamine@loveruby.net">&lt;aamine@loveruby.net&gt;</a>
</body>
</html>