<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  <title>Usage</title>
</head>
<body bgcolor="#FFFFFF" >


<h1>Usage</h1>
<hr>

<h2>WARNING!!!</h2>
<p>
This library includes both Ruby and C versions of StringScanner.  Since the two
classes are completely different, please read this whole page before using them.
</p>

<h2>Purpose of StringScanner</h2>
<p>
StringScanner is a Ruby extension for fast scanning.
</p>
<p>
Ruby's Regexp class cannot anchor a match at an arbitrary offset inside a
string, so scanning from the middle of a string requires first making a new
String.  For example:
</p>

<blockquote><pre>
p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )
</pre></blockquote>

<p>
This code will display "nil", because \A matches only at the very beginning of
the string, not at the offset passed to 'index'.  Another way to match it is
like this:
</p>

<blockquote><pre>
str = " word word word"
while str.size > 0 do
  if /\A[ \t]+/ === str then
    str = $'
  elsif /\A\w+/ === str then
    str = $'
  end
end
</pre></blockquote>

<p>
But this method has a big performance problem.  $' makes a new string EVERY time.
So, in the above example, all these strings are created:
</p>

<blockquote><pre>
" word word word"
"word word word"
" word word"
"word word"
" word"
"word"
""
</pre></blockquote>

<p>
This results in a heavy load.  If 'str' is 50KB of such text, nearly
50,000 ** 2 / 5 = 500MB of string data is allocated over the course of the loop.
</p>
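<p>
The estimate can be checked with a small sketch.  remainder_bytes is a
hypothetical helper (not part of strscan) that runs the same $' loop and adds up
the size of every remainder string.  The 'else' branch is an added guard so that
unscannable input raises an error instead of looping forever.
</p>

<blockquote><pre>
```ruby
# Total size of all remainder strings created by the $' loop.
# remainder_bytes is a hypothetical helper, not part of strscan.
def remainder_bytes( str )
  total = 0
  while str.size > 0 do
    if /\A[ \t]+/ === str then
      str = $'
    elsif /\A\w+/ === str then
      str = $'
    else
      raise "cannot scan: #{str[ 0, 10 ].inspect}"
    end
    total += str.size     # size of the string that $' just created
  end
  total
end

small = remainder_bytes( " word" * 3 )        # the 15-byte example
large = remainder_bytes( " word" * 10_000 )   # a 50,000-byte string

puts small              # 42
puts large              # 499990000
puts 50_000 ** 2 / 5    # 500000000
```
</pre></blockquote>

<p>
For the 15-byte example the remainders add up to 42 bytes.  For 50,000 bytes of
the same text they add up to just under 50,000 ** 2 / 5 bytes.
</p>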
<p>
StringScanner resolves this problem.<br>
StringScanner holds a C string and a pointer into it.  When scanning,
StringScanner only advances the pointer, so no new strings are created for the
unscanned remainder.
As a result, speed increases and memory usage decreases.
</p>


<h3>Simple examples and methods</h3>
<p>
Here are two short examples of scanning routines.<br>
The first one is easy to write but performs quite poorly.  The second is still
easy to write, but is FAST thanks to the code in the StringScanner class.
</p>
<p>
First example:
</p>

<blockquote><pre>
ATOM = /\A\w+/
SPACE = /\A[ \t]+/

# fragment from inside a method; 'str' holds the text still to be scanned
while str.size > 0 do
  if ATOM === str then
    str = $'
    return $&
  elsif SPACE === str then
    str = $'
    return $&
  end
end
</pre></blockquote>

<p>
Second example:
</p>

<blockquote><pre>
ATOM = /\A\w+/
SPACE = /\A[ \t]+/

# fragment from inside a method; 'str' holds the whole text to be scanned
s = StringScanner.new( str )
while s.rest? do
  if tmp = s.scan( ATOM ) then
    return tmp
  elsif tmp = s.scan( SPACE ) then
    return tmp
  end
end
</pre></blockquote>
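<p>
As a self-contained variant of the second example, here is a minimal sketch
that collects every token instead of returning the first one.  It assumes the
strscan library bundled with current Ruby (loaded with require 'strscan'), which
may differ from version 0.6.5.  The method name 'tokenize', the :atom and :space
labels, and the 'else' guard are illustrative additions.
</p>

<blockquote><pre>
```ruby
require 'strscan'

ATOM  = /\A\w+/
SPACE = /\A[ \t]+/

# tokenize is a hypothetical wrapper around the loop of the second example.
def tokenize( str )
  tokens = []
  s = StringScanner.new( str )
  until s.eos? do
    if tmp = s.scan( ATOM ) then
      tokens.push( [ :atom, tmp ] )
    elsif tmp = s.scan( SPACE ) then
      tokens.push( [ :space, tmp ] )
    else
      # neither pattern matches here; stop instead of looping forever
      raise "cannot scan at index #{s.pos}"
    end
  end
  tokens
end

result = tokenize( " word word" )
p result
```
</pre></blockquote>

<p>
Each call to 'scan' either returns a token and advances the scan pointer, or
returns nil and leaves the pointer where it was, so the loop always makes
progress or raises.
</p>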

<p>
Using StringScanner is simple.<br>
First, create a StringScanner object.  Next, call the 'scan' method.  It returns
the matched string and at the same time advances its internally maintained
"scan pointer".  This is implemented using a pointer to char (char*).<br>
The 'skip' method is similar to 'scan', but returns the length of the matched
string instead of the string itself.
</p>

<blockquote><pre>
s = StringScanner.new( "abcdefg" )   # scan pointer is on 'a', index 0
puts s.scan( /a/ )        # returns 'a'. scan pointer is on 'b', index 1
puts s.skip( /bc/ )       # returns 2. scan pointer is on 'd', index 3
</pre></blockquote>

<p>
After calling 'scan' or 'skip', the previous "scan pointer" is preserved in the
StringScanner object.  So, str[ prev pointer...current pointer ] (end exclusive)
is the "matched string" (the string returned from 'scan') -- we can get it by
calling the 'matched' method.  Here's an example:
</p>

<blockquote><pre>
puts s.matched            # returns 'bc'. scan pointer doesn't move
puts s.scan( /a/ )        # returns nil. again, scan pointer doesn't move.
puts s.matched            # returns 'bc'.
</pre></blockquote>
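<p>
The relationship between the two pointers and 'matched' can be sketched as
follows.  This assumes the strscan library bundled with current Ruby, where the
scan pointer is readable as 'pos'; that accessor name is an assumption for
version 0.6.5.  The slice uses an exclusive range because the current pointer
sits just past the match.  The sketch stops before any failed scan, since what
'matched' returns after a failure may differ between versions.
</p>

<blockquote><pre>
```ruby
require 'strscan'

str = "abcdefg"
s = StringScanner.new( str )

s.scan( /a/ )
prev = s.pos               # pointer before the next match: index 1
s.skip( /bc/ )
cur = s.pos                # pointer after the match: index 3

puts str[ prev...cur ]     # bc
puts s.matched             # bc
```
</pre></blockquote>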

<p>
It is also possible to put the scan pointer back to its previous position. 
This can be accomplished by using the 'unscan' method.  However, 'unscan' can
only undo one 'scan' because the StringScanner object can only preserve one
"previous pointer" at a time.
</p>

<blockquote><pre>
puts s.scan( /de/ )       # returns 'de'. scan pointer is on 'f', index 5
s.unscan                  # scan pointer is on 'd', index 3
puts s.scan( /def/ )      # returns 'def'. scan pointer is on 'g', index 6
</pre></blockquote>
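<p>
The effect of 'unscan' on the scan pointer can be observed directly.  This is a
minimal sketch assuming the strscan library bundled with current Ruby and its
'pos' accessor, which may not match version 0.6.5 exactly.  The variable names
are illustrative.
</p>

<blockquote><pre>
```ruby
require 'strscan'

s = StringScanner.new( "abcdefg" )
s.scan( /abc/ )
before = s.pos             # 3, scan pointer is on 'd'

s.scan( /de/ )
after_scan = s.pos         # 5, scan pointer is on 'f'

s.unscan                   # undo the last scan only
after_unscan = s.pos       # 3, back on 'd'

word = s.scan( /def/ )     # "def", scan pointer is on 'g', index 6
puts before, after_scan, after_unscan, word
```
</pre></blockquote>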

<p>
For more details, see the <a href="reference.html">reference manual</a>.
But of course the source code is the most important documentation, I think :-)
</p>


<h2>Ruby version of strscan</h2>
<p>
The Ruby version of StringScanner (StringScanner_R) resembles the C version, but
has these requirements:
</p>
<ul>
<li>\A must exist at the beginning of EVERY regexp when using scan, skip ...
<li>\A must NOT exist when using scan_until, skip_until ...
</ul>
<p>
This is troublesome, but there is no way around it.
</p>
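<p>
To see why such rules could arise, consider one plausible way a pure-Ruby
scanner might work: matching each regexp against the unscanned remainder.
TinyScanner below is a hypothetical illustration written for this page, not the
actual StringScanner_R source.  In this design nothing but a leading \A keeps
'scan' from matching further along, and a leading \A stops 'scan_until' from
searching forward at all.
</p>

<blockquote><pre>
```ruby
# TinyScanner is a hypothetical illustration, not StringScanner_R itself.
class TinyScanner
  attr_reader :pos

  def initialize( str )
    @str = str
    @pos = 0
  end

  # Tries 're' on the remainder.  Only a leading \A in 're' forces the
  # match to begin exactly at the scan pointer.
  def scan( re )
    m = re.match( @str[ @pos..-1 ] )
    return nil unless m
    @pos += m.end( 0 )
    m[ 0 ]
  end

  # Searches forward through the remainder, so 're' must not carry \A.
  def scan_until( re )
    m = re.match( @str[ @pos..-1 ] )
    return nil unless m
    @pos += m.end( 0 )
    m.pre_match + m[ 0 ]
  end
end

t1 = TinyScanner.new( "ab cd" )
r1 = t1.scan( /\Acd/ )         # nil: "cd" is not at the scan pointer
r2 = t1.scan( /cd/ )           # "cd": without \A, "ab " is silently skipped

t2 = TinyScanner.new( "ab cd" )
r3 = t2.scan_until( /\Acd/ )   # nil: \A prevents searching forward
r4 = t2.scan_until( /cd/ )     # "ab cd"

p r1, r2, r3, r4
```
</pre></blockquote>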
<p>
If you only want to use the C version, simply put this in your code:
</p>

<blockquote><pre>
StringScanner.must_C_version
</pre></blockquote>

<hr><p>
Copyright (c) 1999-2001 Minero Aoki
<a href="mailto:aamine@loveruby.net"> &lt;aamine@loveruby.net&gt;</a>


</body>
</html>