<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>
<HEAD>

<TITLE>SWISH Bug Fixes and Enhancements - Digital Library SunSITE</TITLE>

</HEAD>
<BODY VLINK ="#FF0000" LINK="#FF0000" ALINK="#FF0000" BGCOLOR="#FFFFFF">

<P ALIGN="CENTER"><A HREF="/cgi-bin/imagemap/newhead">
<IMG BORDER="0" ALT="Berkeley Digital Library SunSITE" 
SRC="/Images/newhead.gif" WIDTH="510" HEIGHT="50" ISMAP></A></P>

<P ALIGN="CENTER">
<A HREF="/SWISH-E/"><IMG ALT="SWISH-E" WIDTH="112"
HEIGHT="49" BORDER="0" SRC="/Images/swish-e.gif"></A><BR><IMG
SRC="/Images/swishbanner2.gif"></P>
<P ALIGN="CENTER"><IMG ALT="" SRC="/Images/dotrule1.gif"></P>

<H1 ALIGN="CENTER">SWISH Bug Fixes and 
Enhancements</H1>

<H3>Bug Fixes</H3>

<P>The following bugs have been fixed in SWISH-E:</P>
<DL>
<DT>Wild card *
<DD>  problem before fix: in a multi-word search, the results varied
  with the position of the term containing the asterisk in the query.

<DT>Merge option -M
  <DD>problem before fix: the merged index file was not created in the
  right format; consequently, any search on that index would cause swish
  to hang.

<DT>Unary operator "not"
  <DD>problem before fix: unreliable results

<DT>Explicit nested boolean
 <DD> problem before fix: unreliable results
</DL>

<H3>New Features</H3>

<PRE>
- Ignore specified characters in final position
-----------------------------------------------
  It is sometimes convenient to treat certain characters as normal
characters in the middle of a word but to disregard them in final
position. To enable this option, the config.h file should contain
the following lines:

#define IGNORELAST 1
#define IGNORELASTCHAR "&lt;list of char&gt;"

For example if "." is listed in the IGNORELASTCHAR variable, words
will be indexed as follows:
Word            Indexed as
z39.50          z39.50
z39.50.         z39.50

  Note that the characters listed in the IGNORELASTCHAR variable must
also be listed in the ENDCHARS variable; otherwise the word is discarded
as invalid. The characters in the list are written in sequence within
the quotes, with no separator between them.
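
  As an illustration only, the rule can be sketched in Python. This is
not the SWISH-E source, which is written in C: the function name
strip_ignored_last is hypothetical, and the sketch assumes that every
trailing character found in the list is dropped, not just the last one.

```python
def strip_ignored_last(word, ignore_last_chars):
    """Drop trailing characters that appear in ignore_last_chars."""
    # Assumption: repeated trailing characters are all removed.
    while word and word[-1] in ignore_last_chars:
        word = word[:-1]
    return word


print(strip_ignored_last("z39.50", "."))   # z39.50
print(strip_ignored_last("z39.50.", "."))  # z39.50
```

Characters in the middle of the word are left untouched, which is why
"z39.50" keeps its inner period.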

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Printing of removed common words
----------------------------------
  This new swish version automatically prints out all the words that
are not indexed because they are too common according to the limits set
by the PLIMIT and FLIMIT variables in the config.h file.
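
  The test for a word being too common can be sketched as follows, based
on the description of PLIMIT and FLIMIT in the config.h example at the
end of this document (a word occurring in over PLIMIT percent of all
indexed files and in over FLIMIT files). The function name and its
arguments are hypothetical; this is an illustrative Python sketch, not
the SWISH-E code.

```python
def is_too_common(files_with_word, total_files, plimit=80, flimit=256):
    """Return True when a word exceeds both the PLIMIT and FLIMIT limits."""
    # Integer arithmetic avoids rounding: n / total > plimit / 100.
    over_percent = files_with_word * 100 > plimit * total_files
    over_count = files_with_word > flimit
    return over_percent and over_count


print(is_too_common(300, 320))   # True: over 80% and over 256 files
print(is_too_common(90, 100))    # False: over 80% but only 90 files
```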

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

- META data tag support
-----------------------
  It is now possible to search in META tags for words associated with a
particular metaName.

  There are two ways to associate a word with a metaName:

1) &lt;META NAME="metaName" CONTENT="words"&gt;, the usual HTML tag used
   within &lt;HEAD&gt;&lt;/HEAD&gt;

2) &lt;!--META START NAME="metaName" --&gt;
    some text of any length
   &lt;!--META END --&gt;

In this way it is possible to mark almost any part of the text. Note,
however, that words associated with a metaName are not searchable
in a plain search.

NOTE: Nested or overlapping META tags are not allowed and will lead to
unpredictable search results.

Step by Step indexing and search:
---------------------------------
Indexing:
The user configuration file has a new variable, MetaNames, containing
the metaNames that will be used in the files (see the user configuration
file example at the end of this document). After adding the list of
metaNames to the file, indexing proceeds as usual:
%swish-e -c &lt;config.file&gt;

If, during indexing, a metaName found in a file is not listed in the
configuration file, SWISH-E can either abort the indexing with an error,
or issue a warning that names the metaName and the file that contains it
and then continue building the index, in which case the words are not
associated with any metaName. To make this choice, set the OKNOMETA
variable in the config.h file (see the config.h example at the end).

Meta names are case insensitive, so they can be written in any
combination of upper and lower case.
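
The lookup and the OKNOMETA choice described above can be sketched in
Python. All names here (resolve_meta_name, UnknownMetaName, the
arguments) are hypothetical and chosen for illustration; the actual
SWISH-E implementation is in C and may differ.

```python
class UnknownMetaName(Exception):
    """Raised when indexing must stop on an unlisted metaName."""


def resolve_meta_name(name, listed_names, ok_no_meta, source_file):
    """Return the normalized metaName, or None to index words plainly."""
    # Meta names are case insensitive, so compare lower-cased forms.
    known = {listed.lower() for listed in listed_names}
    key = name.lower()
    if key in known:
        return key
    message = "metaName %r in %s is not listed" % (name, source_file)
    if ok_no_meta:
        # OKNOMETA set: warn and index the words with no metaName.
        print("Warning: " + message)
        return None
    # OKNOMETA not set: abort the indexing with an error.
    raise UnknownMetaName(message)
```

With the MetaNames list "NaMe1 nAme2" from the sample configuration
file, a tag using NAME="NAME1" resolves to the same metaName.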

Search:
The search query has a slightly different syntax, of the form:
%swish-e -w "metaName = word" -f &lt;index.file&gt;

The equal sign indicates the presence of a metaName, and the search
results are all the files in which the META tag with NAME="metaName" has
CONTENT="word" (or in which "word" is contained in the area marked by
the &lt;!--META START ...&gt; and &lt;!--META END ...&gt; tags).

It is not necessary to have spaces on either side of the '=';
consequently, the following are equivalent:
%swish-e -w "metaName = word" -f &lt;index.file&gt;
%swish-e -w "metaName=word" -f &lt;index.file&gt;
%swish-e -w "metaName= word" -f &lt;index.file&gt;

To search for a word that contains an '=', precede the '=' with a '/':
%swish-e -w "test/=3 = x/=4 or y/=5" -f &lt;index.file&gt;
This query returns the files in which the word "x=4" is associated with
the metaName "test=3", or that contain the word "y=5" not associated
with any metaName.
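
How a single query term splits into a metaName and a word can be
sketched in Python. This is a simplified illustration, not the SWISH-E
parser: the function names are hypothetical, boolean operators and
parentheses are not handled, and the metaName is lower-cased on the
assumption that this mirrors the case-insensitive matching described
above.

```python
def unescape(text):
    """Turn the escaped form '/=' back into a literal '='."""
    return text.replace("/=", "=").strip()


def split_meta_term(term):
    """Return (metaName, word); metaName is None for a plain word."""
    previous = ""
    for position, char in enumerate(term):
        if char == "=" and previous != "/":
            # First unescaped '=' separates the metaName from the word.
            name = unescape(term[:position])
            word = unescape(term[position + 1:])
            return name.lower(), word
        previous = char
    return None, unescape(term)


print(split_meta_term("metaName= word"))   # ('metaname', 'word')
print(split_meta_term("test/=3 = x/=4"))   # ('test=3', 'x=4')
print(split_meta_term("y/=5"))             # (None, 'y=5')
```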

Queries can also be constructed using any of the usual search features;
moreover, metaName and plain searches can be mixed in a single query,
e.g.
%swish-e -w "metaName1 = (a1 or a4) not (a3 and a7)"  -f yyy

This query will retrieve all the files in which "metaName1" is
associated with either "a1" or "a4" and that do not contain the words
"a3" and "a7", where "a3" and "a7" are not associated with any
metaName.

###################################################################

config.h  example
-----------------
/*
** SWISH Default Configuration File
**
** Kevin Hughes, kevinh@eit.com 
** 3/11/94
**
** Two variables added: IGNORELAST and IGNORELASTCHAR
**        G. Hill 3/12/97 ghill@library.berkeley.edu
**
**
** Added OKNOMETA to allow no failing in case the META name is
** not listed in the config.h
**        G. Hill 4/15/97 ghill@library.berkeley.edu
**
** The following are user-definable options that you can change
** to fine-tune SWISH's default options.
*/

/* #define NEXTSTEP */

/* You may need to define this if compiling on a NeXTstep machine.
*/

#define INDEXPERMS 0644

/* After SWISH generates an index file, it changes the permissions
** of the file to this mode. Change to the mode you like
** (note that it must be an octal number). If you don't want
** permissions to be changed for you, comment out this line.
*/

#define PLIMIT 80
#define FLIMIT 256

/* SWISH uses these parameters to automatically mark words as
** being too common while indexing. For instance, if I defined PLIMIT
** as 80 and FLIMIT as 256, SWISH would define a common word as
** a word that occurs in over 80% of all indexed files and over
** 256 files. Making these numbers lower will most likely make your
** index files smaller. Making PLIMIT and FLIMIT small will also
** ensure that searching consumes only so much CPU resources.
*/

#define VERBOSE 2

/* You can define VERBOSE to be a number from 0 to 3. 0 is totally
** silent operation; 3 is very verbose.
*/

#define MAXHITS 500

/* MAXHITS is the maximum number of results to return from a search.
*/

#define DEFAULT_RULE AND_RULE

/* If a list of search words is specified without booleans,
** SWISH will assume they are connected by a default rule.
** This can be AND_RULE or OR_RULE.
*/

#define TITLETOPLINES 12

/* This is how many lines deep SWISH will look into an HTML file to
** attempt to find a &lt;TITLE&gt; tag.
*/

#define EMPHASIZECOMMENTS 0

/* Normally, words within HTML comments are not assigned a higher
** relevance rank. If you're including keywords in comments
** define this as 1 so matching results will rise to the top
** of search results.
*/

#define MINWORDLIMIT 2

/* This is the minimum length of a word. Anything shorter will not
** be indexed.
*/

#define MAXWORDLIMIT 40

/* This is the maximum length of a word. Anything longer will not
** be indexed.
*/

#define ASCIIENTITIES 1

/* If defined as 1, all entities in search words and indexed
** words will be converted to an ASCII equivalent. For instance,
** with this feature you can index the word "resum&amp;eacute;" or
** "resum&amp;#233;" and it will be indexed as the word "resume".
** If defined as 0, only numerical entities will be converted
** to named entities, if they exist.
*/

#define IGNOREALLV 0
#define IGNOREALLC 0
#define IGNOREALLN 0

/* If IGNOREALLV is 1, words containing all vowels won't be indexed.
** If IGNOREALLC is 1, words containing all consonants won't be indexed.
** If IGNOREALLN is 1, words containing all digits won't be indexed.
** Define as 0 to allow words with consistent characters.
** Vowels are defined as "aeiou", digits are "0123456789".
*/

#define IGNOREROWV 6
#define IGNOREROWC 8
#define IGNOREROWN 7

/* IGNOREROWV is the maximum number of consecutive vowels a word can have.
** IGNOREROWC is the maximum number of consecutive consonants a word can have.
** IGNOREROWN is the maximum number of consecutive digits a word can have.
** Vowels are defined as "aeiou", digits are "0123456789".
*/

#define IGNORESAME 15

/* IGNORESAME is the maximum times a character can repeat in a word.
*/

#define WORDCHARS "abcdefghijklmnopqrstuvwxyz=&#;0123456789.@\|/-"

/* WORDCHARS is a string of characters which SWISH permits to
** be in words. Any strings which do not include these characters
** will not be indexed. You can choose from any character in
** the following string:
**
** abcdefghijklmnopqrstuvwxyz0123456789_\|/-+=?!@$%^'\"`~,.[]{}()
**
** Note that if you omit "0123456789&#;" you will not be able to
** index HTML entities. DO NOT use the asterisk (*), less-than
** and greater-than signs (&lt;), (&gt;), or colon (:).
**
** Including any of these four characters may cause funny things to happen.
** If you have a pressing need to index 8-bit characters, please contact
** me for possible user testing in the future.
**
** Also note that if you specify the backslash character (\) or
** double quote (") you need to type a backslash before them to
** make the compiler understand them.
*/

#define BEGINCHARS "abcdefghijklmnopqrstuvwxyz&0123456789"

/* Of the characters that you decide can go into words, this is
** a list of characters that words can begin with. It should be
** a subset of (or equal to) WORDCHARS.
*/

#define ENDCHARS "abcdefghijklmnopqrstuvwxyz;0123456789,."

/* This is the same as BEGINCHARS, except you're testing for
** valid characters at the ends of words.
*/

/* Note that if you really want to edit the default stopwords, (words
** that are deemed too common to be indexed) then you can do so in the
** file "swish.h". They don't have to be in alphabetical order.
*/

#define IGNORELAST 1

/* Variable that, if set to 1, will cause the characters in IGNORELASTCHAR
** to be disregarded when in the final position in a word. This variable was
** introduced to solve the z39.50 problem: to have certain characters valid
** in the middle of a word but disregarded at the end, e.g. a period.
** The default is false.
*/

#define IGNORELASTCHAR ".,"

/* Array that contains the characters that, although considered valid in
** the middle of a word, need to be disregarded when at the end. It is
** important to also list these characters in the ENDCHARS array; otherwise
** the word will not be indexed because it is considered invalid.
*/


#define OKNOMETA 1
/* Variable that defines whether it is OK not to fail when a META name is
** not listed in the MetaNames variable. A value of 1 causes the words to be
** indexed as regular words with no metaName attached, and only a warning
** listing the meta name and the file in which it was found is issued.
*/

#define INDEXTAGS 0

/* Normally, all data in tags in HTML files (except for words in
** comments) is ignored. If you want to index HTML files with the
** text within tags and all, define this to be 1 and not 0.
*/

######################################################################

User configuration file example
--------------------------------

# Sample SWISH configuration file
# Kevin Hughes, kevinh@eit.com, 3/11/95
#
# Added MetaNames variable to support META tags
# G.Hill ghill@library.berkeley.edu 4/97

IndexDir /home/ghill/swish/dir5/records
# This is a space-separated list of files and
# directories you want indexed. You can specify
# more than one of these directives.

IndexFile /home/ghill/swish/dir5/myindex5
# This is what the generated index file will be.

MetaNames NaMe1 nAme2
# List of metaNames used in the files to index; names
# are case insensitive.

IndexName "Improvement index"
IndexDescription "This is an index to test bug fixes in swish." 
IndexPointer "http://xxxx"
IndexAdmin "Name, (e-mail address)"
# Extra information you can include in the index file.

IndexOnly .html
# Only files with these suffixes will be indexed.

IndexReport 3
# This is how detailed you want reporting. You can specify numbers
# 0 to 3 - 0 is totally silent, 3 is the most verbose.

FollowSymLinks no
# Put "yes" to follow symbolic links in indexing, else "no".

NoContents .gif .xbm .au .mov .mpg .pdf .ps
# Files with these suffixes will not have their contents indexed -
# only their file names will be indexed.

#ReplaceRules replace "/home/cleita/public_html/index/links" "http://sunsite.berkeley.edu/InternetIndex/Data"
# ReplaceRules allow you to make changes to file pathnames
# before they're indexed.

FileRules pathname contains admin testing demo trash construction confidential
FileRules filename contains # % ~ .bak .orig .old old.
FileRules title contains construction example pointers
FileRules directory contains .htaccess
# Files matching the above criteria will *not* be indexed.

IgnoreLimit 50 1000
# This automatically omits words that appear too often in the files
# (these words are called stopwords). Specify a whole percentage
# and a number, such as "80 256". This omits words that occur in
# over 80% of the files and appear in over 256 files. Comment out
# to turn off auto-stopwording.

#IgnoreWords SwishDefault
# The IgnoreWords option allows you to specify words to ignore.
# Comment out for no stopwords; the word "SwishDefault" will
# include a list of default stopwords. Words should be separated by spaces
# and may span multiple directives.



</PRE>

<P ALIGN="CENTER">
<IMG ALT="" WIDTH="470" HEIGHT="10"
SRC="/Images/dotrule.gif"></P>
<P ALIGN="CENTER">
SWISH is Copyright &copy; 1989, 1991 Free Software Foundation, Inc. <BR> 
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA
<BR>SWISH-E is distributed with <B>no warranty</B> under the terms of the <A 
HREF="http://www.fsf.org/copyleft/gpl.html">GNU General Public License</A>.<BR>
Public questions may be posted to 
the <A HREF="mailto:swish-e@sunsite.berkeley.edu">SWISH-E Discussion</A>.

<BR>Document maintained at http://sunsite.berkeley.edu/SWISH-E/changes.html
by the SunSITE Manager.
<BR>Last update 8/12/97. SunSITE Manager:
<A HREF="mailto:manager@sunsite.berkeley.edu">
manager@sunsite.berkeley.edu</A></P>
</BODY>
</HTML>