=begin
= Sary Ruby Binding Reference
== Search with Suffix Array
This example searches each file named on the command line for a
pattern, sorts the search results in occurrence order, and prints
them line by line.
  #!/usr/local/bin/ruby
  require 'sary'
  pattern = ARGV.shift
  ARGV.each {|filename|
    searcher = Sary::Searcher.new(filename)
    if searcher.search(pattern)
      searcher.sort_occurrences
      searcher.each_context_line {|text| print text, "\n" }
    end
  }
== Construction of Suffix Array
This example shows how to assign index points line by line.
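  #!/usr/local/bin/ruby
  require 'sary'
  filename = ARGV.shift
  arrayname = filename + ".ary"
  # Open both files in binary mode so that offsets are byte offsets.
  text = open(filename, "rb")
  array = open(arrayname, "wb")
  offset = 0
  text.each {|line|
    # Write the start offset of each line as a 32-bit big-endian integer.
    array.print [offset].pack('N')
    # Count bytes; String#length counts characters since Ruby 1.9.
    offset += line.bytesize
  }
  text.close
  array.close
  builder = Sary::Builder.new(filename, arrayname)
  builder.block_sort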
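The index file written above is a plain sequence of 32-bit big-endian integers, one per index point. As a minimal sketch in plain Ruby (no Sary calls; the helper names line_index_points and read_index_points are hypothetical), the following computes the same line-start offsets for an in-memory string and decodes a packed index again:

```ruby
# Hypothetical helpers for illustration; they are not part of the Sary API.

# Return the byte offset at which each line of +text+ starts.
def line_index_points(text)
  offset = 0
  text.each_line.map do |line|
    start = offset
    offset += line.bytesize   # byte offsets, not character counts
    start
  end
end

# Decode a packed index (32-bit big-endian integers) into an Array.
def read_index_points(packed)
  packed.unpack('N*')
end

points = line_index_points("ab\ncde\nf")
packed = points.pack('N*')    # same encoding as [offset].pack('N') per point
p read_index_points(packed)   # => [0, 3, 7]
```

Reading the index back this way can be a quick check that a hand-written index file contains the offsets you expect before it is sorted.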
== Sary module
This module contains the Sary::Searcher and Sary::Builder classes.
== Searcher class
Searcher stands for suffix array searcher.
All instance methods except Searcher#count_occurrences are
destructive; they modify the state of the Searcher object.
--- Searcher.new(filename[,arrayname])
Create a Searcher object for filename and arrayname.
It handles searches and their results. If arrayname is omitted,
((|filename|)).ary is used.
Raise (({IOError})) on error.
--- Searcher#search(pattern)
Search for the pattern.
Return (({true})) if success. Return (({false})) if failed.
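For readers new to suffix arrays, the idea behind this kind of search can be sketched in plain Ruby. This is a conceptual illustration only, not the Sary implementation or API: the helpers build_suffix_array and search_range are hypothetical, and the naive construction is only suitable for tiny strings.

```ruby
# Conceptual sketch only; not the Sary implementation or API.

# Build a naive suffix array: every byte offset, ordered by the
# suffix that starts at that offset.
def build_suffix_array(text)
  (0...text.bytesize).sort_by { |i| text.byteslice(i..-1) }
end

# Return the range of suffix-array slots whose suffixes start with +pattern+.
# Two binary searches find the first slot >= pattern and the first slot > pattern.
def search_range(text, sa, pattern)
  len = pattern.bytesize
  lo = sa.bsearch_index { |i| text.byteslice(i, len) >= pattern } || sa.size
  hi = sa.bsearch_index { |i| text.byteslice(i, len) > pattern } || sa.size
  lo...hi
end

text = "banana"
sa = build_suffix_array(text)
hits = sa[search_range(text, sa, "ana")]
p hits        # => [3, 1]  (suffix order)
p hits.sort   # => [1, 3]  (occurrence order)
```

The hits come out in suffix order, which is why a separate sorting step is needed to list them in the order they occur in the file.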
--- Searcher#isearch(pattern, len)
This method does efficient incremental searches. Each search
is performed within the range of the previous search results.
To search incrementally, call this method repeatedly with the
same pattern, increasing len step by step up to the length of
the pattern.
((<Searcher#sort_occurrences>)) MUST NOT be used together with this method.
Return (({true})) if success. Return (({false})) if failed.
--- Searcher#isearch_reset
Reset the internal state stored for ((<Searcher#isearch>)).
Call this method before using ((<Searcher#isearch>)) with
another pattern.
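The calling pattern for incremental search can be sketched as follows. The helper incremental_search is hypothetical; it only uses the two methods documented above and accepts any object that responds to them. It assumes that len is counted in bytes, which this reference does not state.

```ruby
# Drive an incremental search one step at a time.
# +searcher+ is any object responding to isearch_reset and isearch,
# for example a Sary::Searcher.
# Assumption: +len+ is a byte count.
def incremental_search(searcher, pattern)
  searcher.isearch_reset          # drop state left over from an earlier pattern
  1.upto(pattern.bytesize) do |len|
    # Stop as soon as a prefix of the pattern no longer matches.
    return false unless searcher.isearch(pattern, len)
  end
  true
end
```

In an interactive program the loop body would typically run once per keystroke instead of all at once.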
--- Searcher#icase_search(pattern)
Do a case-insensitive search for the pattern.
Return (({true})) if success. Return (({false})) if failed.
--- Searcher#multi_search(pattern_array)
Search for the patterns in the ((|pattern_array|)) at once.
--- Searcher#get_next_context_line
Get the next search result as a line.
All results can be retrieved by calling this method
repeatedly. Return (({nil})) when there are no more results.
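Because the method returns nil at the end of the results, it fits a plain while loop. A minimal sketch, assuming only the behavior described above (the helper name collect_context_lines is hypothetical):

```ruby
# Collect every remaining result line from +searcher+ into an Array.
# +searcher+ is any object whose get_next_context_line returns a String
# per result and nil when the results are exhausted, for example a
# Sary::Searcher after a successful search.
def collect_context_lines(searcher)
  lines = []
  while (line = searcher.get_next_context_line)
    lines << line
  end
  lines
end
```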
--- Searcher#get_next_context_lines([bkwrd, frwrd])
Get the next search result as context lines:
((|bkwrd|)) lines before the match and ((|frwrd|)) lines after it.
All results can be retrieved by calling this method
repeatedly. Return (({nil})) when there are no more results.
--- Searcher#each_context_line{|text| ...}
The iterator for retrieving search results line by line.
--- Searcher#each_context_lines([bkwrd, frwrd]){|text| ...}
The iterator for retrieving search results as context lines.
--- Searcher#get_next_context_region(start_tag, end_tag)
Get the next search result as a tagged region between
((|start_tag|)) and ((|end_tag|)), including the tags themselves.
All results can be retrieved by calling this method
repeatedly. Return (({nil})) when there are no more results.
--- Searcher#get_offsets
Get the results as an Array of file offsets, counted from the
beginning of the file.
--- Searcher#get_line_by_offset(offset)
Get the line at the given offset as a String.
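What looking up a line by offset means can be sketched in plain Ruby on an in-memory string. This is a conceptual illustration, not the Sary implementation; the helper line_at_offset is hypothetical, and returning the line without its trailing newline is an assumption.

```ruby
# Conceptual sketch only; not the Sary implementation.
# Return the line of +text+ that contains byte +offset+, without its newline.
def line_at_offset(text, offset)
  data = text.b                                   # work on raw bytes
  prev_nl = offset.zero? ? nil : data.rindex("\n", offset - 1)
  start = prev_nl ? prev_nl + 1 : 0               # first byte of the line
  finish = data.index("\n", offset) || data.bytesize
  data[start...finish].force_encoding(text.encoding)
end

p line_at_offset("one\ntwo\nthree", 5)   # => "two"
```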
--- Searcher#get_ranges
Get the results as an Array of Range objects.
--- Searcher#get_line_by_range(range)
Get the line for the given range of the file as a String.
((|range|)) is a Range object.
--- Searcher#count_occurrences
Return the number of hits of the search.
--- Searcher#sort_occurrences
Sort the search results in occurrence order.
--- Searcher#enable_cache
Enable the cache engine. Cache the search results and reuse
them for the same pattern later.
== Builder class
Builder stands for suffix array maker.
--- Builder.new(filename[,arrayname])
Create a Builder object for filename and arrayname.
It handles the construction of the suffix array. If arrayname is omitted,
((|filename|)).ary is used.
Raise (({IOError})) on error.
--- Builder#sort [{|task, current, total, is_finished|...}]
Sort a suffix array.
If a block is given, it is called as a progress callback, for
example to print a progress bar.
Raise (({RuntimeError})) on error.
--- Builder#block_sort [{|task, current, total, is_finished|...}]
Sort a suffix array by memory-saving block sorting.
If a block is given, it is called as a progress callback, for
example to print a progress bar.
Raise (({RuntimeError})) on error.
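A progress callback for either sort method might look like the sketch below. The helper sort_with_progress is hypothetical. The types of the block arguments are assumptions that this reference does not spell out: task is taken to be a printable label, current and total to be integers, and is_finished to be true or false.

```ruby
# Run a sort on +builder+ and report progress to +out+.
# +builder+ is any object whose #sort yields
# (task, current, total, is_finished), for example a Sary::Builder.
# Assumed argument types: label, Integer, Integer, true/false.
def sort_with_progress(builder, out = $stderr)
  builder.sort do |task, current, total, is_finished|
    percent = total > 0 ? current * 100 / total : 100
    out.print "\r#{task}: #{percent}%"   # redraw the same terminal line
    out.print "\n" if is_finished
  end
end
```

The same block could be passed to block_sort, since both methods are documented with the same block parameters.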
--- Builder#set_block_size(size)
Set the block size for ((<Builder#block_sort>)).
--- Builder#set_nthread(n)
Set the number of threads for ((<Builder#block_sort>)).
Performance will improve if your machine has two or more CPUs.
=end
$Id: Reference.en.rd,v 1.2 2005/03/29 04:20:50 knok Exp $