serpento (for lack of a better name) is a dict (RFC 2229) server
written in Python.

License: GPL, with the addition: It can be linked with
whatever you want, without any restrictions.
Welcome to the world of "your opensource license is better 
than mine" :-)


Requirements:
- Python 2.0 (does NOT work with Python 1.5; could probably
  be forced to work with Python 1.6 with some effort)
- Unix-like operating system (so far tested only on Linux)
- some programs in the tools/ directory rely on konwert to
  convert between different encodings
  (http://www.kki.net.pl/qrczak/programy/linux/konwert/)

Features:
- full UNICODE support (well, not full yet :-))
- can use a raw dict file (the one with %h %d) and automatically
  format the output as plain text
- dictionaries can be compressed with dictzip(1)
- uses the same index file as dictd
- supports the following strategies:
    exact      Match words exactly
    prefix     Match prefixes
    suffix     Match suffixes
    substring  Match substring occurring anywhere in word
    re         POSIX 1003.2 regular expressions (-)
    fnmatch    fnmatch-like (* ? as wildcards) (-) 
    soundex    Match using SOUNDEX algorithm (--)
    metaphone  metaphone algorithm (--)
    lev        Match words within Levenshtein distance one (-)
    (-) : does not work correctly with UNICODE characters outside 
          ASCII range (yet)
    (--): cannot in principle work correctly with UNICODE characters 
          outside ASCII range, because it is designed for English words only.
- easily extensible with new types of databases
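
To illustrate what the simpler strategies listed above mean, here is a
minimal sketch in present-day Python 3. It is not serpento's own code:
the names STRATEGIES, lev1 and match are made up for this example, and
"lev" is taken here to mean "distance at most one". All comparisons are
case sensitive.

```python
import fnmatch

def lev1(a, b):
    """Return True if a and b are within Levenshtein distance one."""
    if a == b:
        return True
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    # Skip the common prefix; the single allowed edit must happen at i.
    i = 0
    while i < min(la, lb) and a[i] == b[i]:
        i += 1
    if la == lb:
        return a[i + 1:] == b[i + 1:]   # one substitution
    if la > lb:
        return a[i + 1:] == b[i:]       # one extra character in a
    return a[i:] == b[i + 1:]           # one extra character in b

# Each entry takes (query, headword) and says whether the headword matches.
STRATEGIES = {
    "exact":     lambda q, w: w == q,
    "prefix":    lambda q, w: w.startswith(q),
    "suffix":    lambda q, w: w.endswith(q),
    "substring": lambda q, w: q in w,
    "fnmatch":   lambda q, w: fnmatch.fnmatchcase(w, q),
    "lev":       lev1,
}

def match(strategy, query, headwords):
    """Return the headwords that match query under the given strategy."""
    test = STRATEGIES[strategy]
    return [w for w in headwords if test(query, w)]

words = ["cat", "cats", "Cat", "concat", "dog"]
prefix_hits = match("prefix", "cat", words)
glob_hits = match("fnmatch", "c?t*", words)
lev_hits = match("lev", "cat", words)
```

A linear scan over all headwords like this is only meant to show the
semantics; a real server would use the sorted index where it can.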

"Features":
- case sensitive

Drawbacks:
- early version
- no documentation (see comments in source :-))
- startup takes significant time


On case sensitivity:

It is rather difficult to provide good case-insensitive searching if you
deal with languages other than English (just look at the upper case of
LATIN SMALL LETTER SHARP S (U+00DF)). Besides, in some scripts the
distinction in case is significant (romanized tlhIngan Hol, IPA, ...), and
it is often desirable to search while suppressing other distinctions
(e.g. diacritics, vowel marks in some languages, etc.).
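
The sharp s problem can be seen with a few lines of present-day
Python 3 (not the Python 2.0 that serpento targets). This is only an
illustration using the built-in string methods:

```python
# Upper-casing U+00DF yields two letters, so case mapping is not one-to-one.
sharp_s = "\u00df"
upper = sharp_s.upper()        # "SS"
round_trip = upper.lower()     # "ss", not the original sharp s

# Folding case also erases distinctions that some orthographies rely on.
klingon = "tlhIngan Hol"
folded = klingon.lower()       # "tlhingan hol": the capital I is lost
```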

Therefore, for serpento, case sensitivity is declared to be a feature.
Case-insensitive search can be achieved with a special index file whose
entries are in (let's say) lowercase, stripped of all diacritics and the
like (e.g. transliterated, in the case of the Cyrillic alphabet). It is
then the client's responsibility to convert the query the same way:
lowercase it, strip diacritics, transliterate, and so on.

tools/pyindex2lower.py can be used to convert a UTF-8 index file
to lowercase ASCII.
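
The idea can be sketched as follows in present-day Python 3. This is
not the code of tools/pyindex2lower.py: the names fold and fold_index
are made up for the example, and it assumes dictd-style index lines of
the form "headword TAB offset TAB length". The same fold() has to be
applied by the client to the query, otherwise the two sides will not
agree.

```python
import unicodedata

def fold(word):
    """Lowercase a word and strip diacritics, keeping only ASCII.

    Characters with no ASCII decomposition (sharp s, Cyrillic, ...) are
    simply dropped here; a real tool would transliterate them instead.
    """
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return decomposed.encode("ascii", "ignore").decode("ascii")

def fold_index(lines):
    """Fold the headword column of index lines and re-sort by headword."""
    out = []
    for line in lines:
        headword, rest = line.rstrip("\n").split("\t", 1)
        out.append(fold(headword) + "\t" + rest)
    # Folding can change the order, so sort again. Plain string order is
    # an assumption; the collation a server expects may differ.
    out.sort(key=lambda entry: entry.split("\t", 1)[0])
    return out

# Offsets and lengths below are placeholders, not real index data.
index = ["Zebra\tBq\tM\n", "Caf\u00e9\tA\tBp\n"]
folded_index = fold_index(index)
query = fold("CAF\u00c9")
```

With the folded index in place, a client looking for "CAFÉ" sends the
folded query "cafe" and uses the ordinary case-sensitive strategies.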