REL(1) REL(1)
NAME
rel - order text documents by their relevance to a set of
search criteria
SYNOPSIS
rel [options] patterns paths ...
DESCRIPTION
Rel is a program that determines the relevance of text
documents to a set of keywords expressed in boolean infix
notation. The names of the relevant files are printed to
the standard output, in order of relevance. The boolean
operators supported are logical or, logical and, and
logical not. These operators are represented by the
symbols "|", "&", and "!", respectively, and left and
right parentheses, "(" and ")", are used as the grouping
operators. The paths can be files and/or directories; if a
path is a directory, the program will recursively descend
into it, searching all files and directories contained in
it.
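The recursive descent over the path arguments can be
illustrated with a minimal Python sketch. It is not taken from
the rel source; the function name collect_files is
hypothetical, and skipping anything that is not a regular file
is an assumption based on the DIAGNOSTICS section.

```python
import os

def collect_files(paths):
    """Return the regular files named by paths, descending into directories.

    Hypothetical helper for illustration only; rel itself may traverse
    directories in a different order or report errors for skipped paths.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            # os.walk visits the directory and every directory below it.
            for dirpath, dirnames, filenames in os.walk(path):
                dirnames.sort()  # make the traversal order deterministic
                for name in sorted(filenames):
                    full = os.path.join(dirpath, name)
                    if os.path.isfile(full):
                        found.append(full)
        elif os.path.isfile(path):
            found.append(path)
        # Missing or non-regular paths are silently skipped in this sketch.
    return found
```

Each returned file name would then be a candidate for the
relevance search.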
For example, the command:
rel "(directory & listing)" /usr/share/man/cat1
(i.e., find the order of relevance of all files that contain
both of the words "directory" and "listing" in the catman
directory) will list a few tens of files, out of the
hundreds of catman files, of which "ls.1" is among the most
relevant. In other words, to find the command that lists
directories in a Unix system, the "literature search" was
reduced, on average, by about 98%, which is a considerable
saving compared with browsing through the files in the
directory. Although this example is elementary, a similar
saving can be demonstrated in searching for documents in
email repositories and text archives.
Additional applications include information robots (i.e.,
"mailbots" or "infobots"), where the disposition (i.e.,
delivery, filing, or viewing) of text documents can be
determined dynamically, based on the relevance of the
document to a set of criteria framed in boolean infix
notation. In other words, the program can be used to order,
or rank, text documents based on a "context" specified in a
general mathematical language, similar to that used in
calculators.
The words in the query are case insensitive, and either
upper or lower case can be used.
Associativity of operators is left to right, and the
precedence of operators is identical to 'C':
September 13, 1996 1
precedence      operator
high            ! = not
middle          & = and
lowest          | = or
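The precedence and left-to-right associativity described above
can be made concrete with a small recursive-descent evaluator.
This is a sketch under stated assumptions, not the parser used
by rel: the names tokenize and matches are hypothetical, a
keyword is treated as a case-insensitive substring of the text,
and the result is only a yes/no match, since this page does not
describe how relevance is scored.

```python
def tokenize(query):
    """Split a query into ("op", symbol) and ("word", keyword) tokens."""
    tokens, word, i = [], [], 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            word.append(query[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if ch in "|&!()" or ch.isspace():
            if word:
                tokens.append(("word", "".join(word).lower()))
                word = []
            if not ch.isspace():
                tokens.append(("op", ch))
        else:
            word.append(ch)
        i += 1
    if word:
        tokens.append(("word", "".join(word).lower()))
    return tokens

def matches(query, text):
    """Evaluate a boolean infix query against text (assumed substring match)."""
    tokens = tokenize(query)
    text = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():  # lowest precedence: |
        value = parse_and()
        while peek() == ("op", "|"):
            advance()
            rhs = parse_and()  # always parse the operand, then combine
            value = value or rhs
        return value

    def parse_and():  # middle precedence: &
        value = parse_not()
        while peek() == ("op", "&"):
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence: !
        if peek() == ("op", "!"):
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        advance()
        if tok == ("op", "("):
            value = parse_or()
            if peek() != ("op", ")"):
                raise ValueError("missing ')'")
            advance()
            return value
        if tok[0] == "word":
            return tok[1] in text
        raise ValueError("unexpected token %r" % (tok[1],))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in query")
    return result
```

With these rules, the query "a | b & c" is read as
"a | (b & c)", so a text containing only "a" matches it, while
"(a | b) & c" does not match the same text.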
The operator symbols can be escaped with the "\" character
to include the symbol in a search pattern. The "escape
space" character sequence represents one or more space
characters in a search pattern; each instance will match
one or more consecutive whitespace characters (as defined
by isspace(3) in ctype.h and/or locale.h), which allows
phrases to be searched for. The "many to one" whitespace
character translation occurs in both the keyword arguments
and the text document(s). Multiple consecutive instances of
the "escape space" character sequence should not be used in
keyword search phrases, and single instances are
appropriate only when necessary to specify a consecutive
sequence of keywords; the logical and operator is the
preferred construct when searching for documents that
contain set(s) of keywords.
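The "many to one" whitespace translation can be sketched as
follows. This is an illustration only, not the rel
implementation: the function names are hypothetical, and the
whitespace set is assumed to be the one isspace(3) accepts in
the C locale (space, tab, newline, vertical tab, form feed,
carriage return).

```python
import re

# Runs of C-locale whitespace (assumption: other locales may differ).
_SPACE_RUN = re.compile(r"[ \t\n\v\f\r]+")

def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return _SPACE_RUN.sub(" ", text)

def contains_phrase(phrase, text):
    """Case-insensitive phrase test after collapsing whitespace on both sides."""
    return collapse_whitespace(phrase).lower() in collapse_whitespace(text).lower()
```

Because both the phrase and the document are normalized the
same way, a phrase still matches when the document breaks it
across lines or pads it with tabs.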
Hyphenation issues are addressed by deleting hyphens and
any following sequence of whitespace characters (as defined
by isspace(3)) in both the keyword arguments and the text
document(s).
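A minimal sketch of this hyphen handling is given below. The
function name remove_hyphenation is hypothetical, and the
whitespace set is again assumed to be that of isspace(3) in the
C locale; the rel source may differ in detail.

```python
import re

# A hyphen followed by zero or more C-locale whitespace characters.
_HYPHEN_BREAK = re.compile(r"-[ \t\n\v\f\r]*")

def remove_hyphenation(text):
    """Delete each hyphen together with any whitespace that follows it."""
    return _HYPHEN_BREAK.sub("", text)
```

A word split across lines, such as "con-" followed by
"tained", is rejoined as "contained". A genuinely hyphenated
word such as "well-known" becomes "wellknown", which is
harmless because the same rule is applied to the keywords.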
Backspace character issues are addressed by overwriting
the character before the backspace with the character
after the backspace, so that the last character in a run of
consecutive backspace/character combinations is the one
retained. This is specifically for catman pages, which use
underscore/backspace/character combinations for
underlining, in addition to backspace/character
combinations for bold (overstrike) representation. Note
that for this process to be successful, a single underscore
(used for underlining) must precede a single character in
the sequence.
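The backspace rule can be sketched in a few lines of Python.
This illustrates the behavior described above and is not the
rel implementation; the function name strip_overstrikes is
hypothetical.

```python
def strip_overstrikes(text):
    """Resolve backspace sequences by letting the later character win."""
    out = []
    for ch in text:
        if ch == "\b":
            if out:
                out.pop()  # the next character overwrites this position
        else:
            out.append(ch)
    return "".join(out)
```

An underlined "ls" encoded as underscore, backspace, "l",
underscore, backspace, "s" reduces to "ls", and a bold "N"
encoded as "N", backspace, "N" reduces to "N". If the
underscore follows the character instead of preceding it, the
underscore is what survives, which is why the ordering noted
above matters.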
OPTIONS
-v Print the version and copyright banner of the program.
WARNINGS
In the interest of performance, memory is allocated to
hold the entire file to be searched. Large files may create
resource issues.
The "not" boolean operator, '!', can NOT be used to find
the list of documents that do NOT contain a keyword or
phrase (unless used in conjunction with a preceding boolean
construct that will syntactically define an intermediate
accept criterion for the documents). The rationale is that
the relevance of a set of documents that do NOT
contain a phrase or keyword is ambiguous, and has no
meaning; i.e., how can documents be ordered that do not
contain something? Whether this is a bug, or not, depends on
one's point of view.
SEE ALSO
egrep(1), agrep(1)
DIAGNOSTICS
Error messages are issued for illegal or incompatible
search patterns, for non-regular, missing, or inaccessible
files and directories, for (unlikely) memory allocation
failures, and for signal errors.
AUTHORS
----------------------------------------------------------------------
A license is hereby granted to reproduce this software source code and
to create executable versions from this source code for personal,
non-commercial use. The copyright notice included with the software
must be maintained in all copies produced.
THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE
AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE
INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
Copyright (c) 1995, 1996, John Conover, All Rights Reserved.
Comments and/or bug reports should be addressed to:
john@johncon.com (John Conover)
----------------------------------------------------------------------