-*- mode: outline -*-
* Introduction
This is a Ruby extension module for using SUFARY. Its interface
follows that of the Perl module for SUFARY.
* Usage
Add the following line to your script.
require 'sufary'
* Sufary Class
This is a class to access SUFARY files. The usage of this class is
similar to the usage of the Perl module for SUFARY. Each get_all_xxx
method can be used as an iterator. The length and lookup methods are
added.
**Super Class
Object
**Class Method
***new(text_file[, array_file])
***open(text_file[, array_file])
Creates a SUFARY object from text_file and array_file. If array_file
is omitted, text_file+'.ary' is used as the default name.
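The default naming rule above can be sketched in plain Ruby. The helper
name default_array_file is hypothetical and is not part of the
extension; it only mirrors the documented rule.

```ruby
# Hypothetical helper (not part of the sufary extension): mirrors the
# documented rule that array_file defaults to text_file + '.ary'.
def default_array_file(text_file, array_file = nil)
  array_file || text_file + '.ary'
end

puts default_array_file("sample.txt")             # prints sample.txt.ary
puts default_array_file("sample.txt", "idx.ary")  # prints idx.ary
```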
**Method
***reopen(text_file[, array_file])
Switches the SUFARY object to a different SUFARY file.
***close
Closes a SUFARY file.
***init
Resets the search area to the whole text.
***search0(key)
Additional search. Searches for the string key within the current
search area and returns the number of matches.
***search(key)
Normal search. Searches for the string key and returns the number of
matches. Equivalent to init followed by search0.
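For readers unfamiliar with suffix arrays, the following is a minimal
plain-Ruby sketch of how a search can count matches. It illustrates the
general technique only and is not the extension's implementation. The
names build_suffix_array and match_range are hypothetical, and the
sketch does not model the narrowing behaviour of search0.

```ruby
# Sketch of a suffix-array search in plain Ruby (no sufary extension
# needed). Both method names are hypothetical, chosen for illustration.

# Returns the start positions of all suffixes of text, ordered by suffix.
def build_suffix_array(text)
  (0...text.length).sort_by { |i| text[i..-1] }
end

# Returns the half-open range of suffix-array slots whose suffix begins
# with key, located with two binary searches.
def match_range(text, sa, key)
  first = sa.bsearch_index { |pos| text[pos, key.length] >= key } || sa.length
  last  = sa.bsearch_index { |pos| text[pos, key.length] > key } || sa.length
  first...last
end

text = "banana\nbandana\n"
sa   = build_suffix_array(text)
hits = match_range(text, sa, "an")
puts hits.size      # number of matches
p sa[hits].sort     # match positions in text order
```

The size of the range plays the role of the count returned by a search,
and the slots inside it hold the match positions.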
***self[key]
***line(num)
Returns the line that contains the num-th search result. Returns nil
if num is invalid.
***lid(num)
Returns the ID number of the line that contains the num-th search
result. The ID number of a line is its starting position in the file.
Returns nil if num is invalid.
***get_all_pos
***get_all_pos {...}
Returns an array of the positions of the matched texts. If a block is
given, each position is passed to it.
***get_all_line
***get_all_line {...}
Returns an array of the matched lines. If a block is given, each line
is passed to it.
***get_all_lid
***get_all_lid {...}
Returns an array of the ID numbers of the matched lines. If a block is
given, each ID number is passed to it.
***getstr(idx, len)
Returns the string of length len that starts at the idx-th letter of
the text.
***id2line(idx)
Returns the line that contains the idx-th letter of the text.
***get_block(idx, d1, d2)
Returns the block that contains the idx-th letter of the text, starts
with the string d1, and ends with the string d2.
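The block extraction described here can be sketched in plain Ruby as
follows. The helper block_around is hypothetical and only approximates
the documented behaviour. Returning nil when a delimiter is missing is
an assumption, not something this document specifies.

```ruby
# Hypothetical helper, not the extension's implementation: returns the
# substring that starts at the last d1 at or before idx and ends with
# the first d2 at or after idx. Returns nil when either delimiter is
# missing (an assumption made for this sketch).
def block_around(text, idx, d1, d2)
  start = text.rindex(d1, idx)
  stop  = text.index(d2, idx)
  return nil if start.nil? || stop.nil?
  text[start...(stop + d2.length)]
end

text = "<DOC>alpha</DOC><DOC>beta History</DOC>"
idx  = text.index("History")
puts block_around(text, idx, '<DOC>', '</DOC>')
```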
***pid2lid(idx)
Returns the ID of the line that contains the idx-th letter of the text.
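The relation between a position, its line, and the line ID (the
starting position of the line) can be sketched in plain Ruby. The
helpers line_start and line_at are hypothetical stand-ins for the
behaviour described for pid2lid and id2line. Dropping the trailing
newline is an assumption.

```ruby
# Hypothetical helpers illustrating the described behaviour in plain Ruby.

# Start offset of the line that contains position idx (the "line ID").
def line_start(text, idx)
  return 0 if idx.zero?
  nl = text.rindex("\n", idx - 1)
  nl ? nl + 1 : 0
end

# The line that contains position idx, without its trailing newline
# (dropping the newline is an assumption of this sketch).
def line_at(text, idx)
  start = line_start(text, idx)
  stop  = text.index("\n", idx) || text.length
  text[start...stop]
end

text = "first line\nsecond History line\nthird\n"
idx  = text.index("History")
puts line_start(text, idx)
puts line_at(text, idx)
```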
***common_prefix(key, sep)
Performs a common prefix search with the search key key and the
separator sep. Returns an array of the positions of the matched texts.
If a block is given, each position is passed to it.
***length
***size
Returns the number of matches found by the previous search.
***lookup(key)
Searches for key and returns an array of the matched lines. If a block
is given, each matched line is passed to it.
Let s be a SUFARY object. Then,
s.lookup(key).each { |l| print l, "\n" }
and
s.search(key); s.get_all_pos.sort.each { |l| print s.id2line(l), "\n"}
are equivalent.
* Did Class
This is a class to access DID files. The usage of this class is
similar to the usage of the Perl module for SUFARY.
**Super Class
Object
**Class Method
***new(docid_file)
***open(docid_file)
Creates a DID object from docid_file.
**Method
***reopen(text_file[, array_file])
Switches the DID object to a different DocID file.
***close
Closes a DocID file.
***did_size
***length
***size
Returns the number of articles in the DocID file.
***didsearch(idx)
***search(idx)
Searches for the article that contains the idx-th letter of the text
and returns a list consisting of the article number, its start
position, and its length. Returns nil if no article matches.
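The position-to-article lookup described above can be sketched in plain
Ruby. The helper find_article and the in-memory articles table are
hypothetical and do not reflect the DocID file format. Zero-based
article numbers are also an assumption.

```ruby
# Hypothetical stand-in for a DocID lookup. articles is a list of
# [start, size] pairs sorted by start; article numbers are taken to be
# zero-based indexes into that list (an assumption of this sketch).
def find_article(articles, idx)
  # Index of the first article that starts after idx; the candidate
  # article is the one just before it.
  after = articles.bsearch_index { |start, _size| start > idx } || articles.length
  return nil if after.zero?
  doc_no = after - 1
  start, size = articles[doc_no]
  idx < start + size ? [doc_no, start, size] : nil
end

articles = [[0, 16], [16, 23], [50, 10]]
doc_no, start, size = *find_article(articles, 26)
puts "article #{doc_no} starts at #{start} and is #{size} letters long"
```

As in the AND search sample, splatting a nil result leaves doc_no, start
and size all nil, so callers can test doc_no before using the others.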
* Sample Program
** Basic Search
#!/usr/bin/ruby
require 'sufary'
x = Sufary.new("sample.txt")
nx = x.search("History")
print "FOUND #{nx}\n"
for i in 0..nx-1
  print "[#{i}]\t#{x.line(i)}\n"
  print "\tThe above line starts from the #{x.lid(i)} letter.\n"
end
x.get_all_pos do |i|
  print ">>>\t#{x.id2line(i)}\n"
  print "\tThe above line starts from the #{x.pid2lid(i)} letter.\n"
  print "\tThe keyword starts from the #{i} letter\n"
  print "\tThe above line is included in the following article\n", x.get_block(i,'<DOC>','</DOC>'), "\n"
end
x.close
**Search using DocID
#!/usr/bin/ruby
require 'sufary'
x = Sufary.new("sample.txt")
d = Did.new("sample.txt.did")
# AND search
check = []
x.search("Nature")
x.get_all_pos do |i|
  doc_no, start, size = *d.didsearch(i)
  check[doc_no] = true if doc_no
end
x.search("History")
x.get_all_pos do |i|
  doc_no, start, size = *d.didsearch(i)
  if doc_no and check[doc_no]
    print x.getstr(start, size), "\n"
    check[doc_no] = false
  end
end
d.close
x.close
**One liner
#!/usr/bin/ruby
require 'sufary'
Sufary.new("foo").lookup("bar") { |l| print l, "\n" }.close