File: README

	The improvements over SWISH-E:
	------------------------------

	1. 8-10 times faster at indexing.  It achieves this speed by using:
		a) mmap(2) instead of stdio to read files
		b) very little explicit dynamic memory allocation
		c) more inlining and fewer function calls in inner loops
		d) better data structures and algorithms by virtue of
		   using STL (The C++ Standard Template Library), e.g.,
		   maps rather than linked lists
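
	   To illustrate points (a) and (d), here is a minimal sketch
	   (not the actual SWISH++ source) that maps a file with
	   mmap(2) and tallies its words in an STL map.  The names
	   count_words and count_words_in_file are hypothetical, and
	   the rule that a word is a run of alphabetic characters is
	   an assumption made only for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

#include <fcntl.h>     // open(2)
#include <sys/mman.h>  // mmap(2), munmap(2)
#include <sys/stat.h>  // fstat(2)
#include <unistd.h>    // close(2)

// Hypothetical helper: counts lower-cased alphabetic words in [begin, end).
std::map<std::string, int> count_words(const char* begin, const char* end) {
    std::map<std::string, int> counts;
    std::string word;
    for (const char* p = begin; p != end; ++p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            ++counts[word];  // O(log n) lookup/insert in a balanced tree
            word.clear();
        }
    }
    if (!word.empty()) ++counts[word];  // word that runs up to the end
    return counts;
}

// Hypothetical helper: maps the file read-only instead of reading it via
// stdio.  Returns an empty map if the file is empty or cannot be mapped.
std::map<std::string, int> count_words_in_file(const char* path) {
    std::map<std::string, int> counts;
    const int fd = open(path, O_RDONLY);
    if (fd == -1) return counts;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return counts;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) return counts;

    const char* data = static_cast<const char*>(addr);
    counts = count_words(data, data + size);
    munmap(addr, size);
    return counts;
}
```

	   The file's bytes are scanned in place, so no per-line
	   buffers are allocated; only the words themselves are
	   copied into the map.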

	2. A better results format:

		rank path_name file_size file_title

	   By placing the file_title, which may contain spaces, last,
	   you can easily parse it, e.g.:

		($rank,$path,$size,$title) = split( / /, $_, 4 );
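
	   The same parse can be sketched in C++.  The struct and
	   function names are hypothetical, and treating rank and
	   file_size as integers is an assumption; like the Perl
	   split above, it also assumes path_name contains no spaces.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one line of search results.
struct SearchResult {
    int         rank;
    std::string path;
    long        size;
    std::string title;  // may contain spaces, so it comes last
};

// Parses "rank path_name file_size file_title".  Returns false if any of
// the first three fields is missing or malformed.
bool parse_result(const std::string& line, SearchResult& out) {
    std::istringstream in(line);
    out.title.clear();
    if (!(in >> out.rank >> out.path >> out.size)) return false;
    in.get();                     // skip the single separating space
    std::getline(in, out.title);  // the rest of the line is the title
    return true;
}
```

	   Because the title is last, everything after the third
	   field can be taken verbatim without any quoting rules.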

	3. Automatically splits and remerges large file sets.

	4. Parses hexadecimal numeric character entity references of
	   the form "&#xhhh;" in addition to decimal ones.
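
	   As an illustration only (this is not the SWISH++ parser),
	   a numeric character reference in either form could be
	   decoded as follows.  The function name parse_char_ref, the
	   -1 error value, and the 0x10FFFF upper limit are choices
	   made for this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: takes a complete reference such as "&#233;" or
// "&#xE9;" and returns its code point, or -1 if it is malformed.
long parse_char_ref(const std::string& ref) {
    const std::size_t n = ref.size();
    if (n < 4 || ref[0] != '&' || ref[1] != '#' || ref[n - 1] != ';')
        return -1;

    std::size_t i = 2;
    int base = 10;
    if (ref[i] == 'x' || ref[i] == 'X') {  // hexadecimal form
        base = 16;
        ++i;
    }
    if (i == n - 1) return -1;  // no digits between the prefix and ';'

    long value = 0;
    for (; i < n - 1; ++i) {
        const unsigned char c = static_cast<unsigned char>(ref[i]);
        int digit;
        if (std::isdigit(c))
            digit = c - '0';
        else if (base == 16 && std::isxdigit(c))
            digit = std::tolower(c) - 'a' + 10;
        else
            return -1;  // character not valid in this base
        value = value * base + digit;
        if (value > 0x10FFFF) return -1;  // beyond the Unicode range
    }
    return value;
}
```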

	5. A separate text-extraction utility is included to assist in
	   indexing non-text files.  It can automatically uncompress
	   and recompress files on the fly.

	6. Searches are practically instantaneous because the index
	   file is mmap(2)'ed and binary-searchable immediately.
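
	   The idea can be sketched as below.  The layout used here
	   (a blob of sorted, NUL-terminated words plus a table of
	   byte offsets) is purely hypothetical and is NOT the real
	   SWISH++ index format; it only shows why a mapped, sorted
	   index needs no loading or parsing step before it can be
	   searched.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical index view.  In practice both pointers would point into a
// region returned by mmap(2); here they may point at any memory.
struct WordIndex {
    const char*        blob;     // sorted, NUL-terminated words
    const std::size_t* offsets;  // byte offset of each word within blob
    std::size_t        count;    // number of words
};

// Binary-searches the offset table; no data is copied or deserialized.
bool contains(const WordIndex& idx, const char* word) {
    const std::size_t* first = idx.offsets;
    const std::size_t* last  = idx.offsets + idx.count;
    const std::size_t* it = std::lower_bound(
        first, last, word,
        [&idx](std::size_t off, const char* w) {
            return std::strcmp(idx.blob + off, w) < 0;
        });
    return it != last && std::strcmp(idx.blob + *it, word) == 0;
}
```

	   A lookup touches only O(log n) entries, so the operating
	   system pages in just the parts of the file that are
	   actually compared.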

	7. The source code is smaller and more clearly written with
	   lots of comments including references to other works.  (The
	   source for SWISH-E is rather amateurishly written.)

	8. Everything is fully documented including the index file
	   format.


	Things not implemented:
	-----------------------

	Note: I wrote SWISH++ to solve my immediate indexing problems;
	therefore, I implemented only those features useful to me.  If
	others can also benefit from the work, great.  I may implement
	other features as time permits.

	1. META tags.  I didn't have a need for them.

	2. Configuration files.  SWISH++ allows everything to be
	   specified on the command line.  (If you don't like doing a
	   lot of repetitive typing, use a Makefile or cron job.)  The
	   code to parse configuration files simply isn't worth adding.

		IgnoreWords: I may add the ability to specify a file.

		IgnoreLimit: Use the -p and -f options.

		IndexFile, IndexDir: Use the command line.

		IndexOnly: Use the -e option.

		IndexReport: Use the -v option.

		IndexName, IndexDescription, IndexPointer, IndexAdmin:
			These serve no purpose as far as I'm concerned.

		FollowSymLinks: Use the -l option.

		NoContents: Extensions must be specified explicitly.

		FileRules: I may add this in the future.

		ReplaceRules: It's a lot simpler to replace the
			pathnames in a Perl CGI script.  The code to
			implement ReplaceRules isn't worth adding.

	3. Searching within specific HTML tags, the SWISH-E "-t"
	   option.  I've never seen any other web search engine allow
	   this to this extent.  Alta Vista only allows you to look in
	   the title.

	4. The "crash and burn on a certain file" feature.  SWISH++
	   should not crash on any file.  Period.  If it does, there's
	   a bug and I'll fix it.


	- Paul J. Lucas
	  pjl@best.com