serpento (for lack of a better name) is a dict (RFC 2229) server
written in Python.
License: GPL, with the addition: It can be linked with
whatever you want, without any restrictions.
Welcome to the world of "your opensource license is better
than mine" :-)
Requirements:
- Python 2.0 (does NOT work with Python 1.5, could probably
be forced to work with Python 1.6 with some effort)
- unix-like operating system (so far tested only on Linux)
- some programs in the tools/ directory rely on konwert to
convert between different encodings
(http://www.kki.net.pl/qrczak/programy/linux/konwert/)
Features:
- full UNICODE support (well, not full yet :-))
- can use a raw dict file (the one with %h %d) and automatically
format the output as plain text
- dictionaries can be compressed with dictzip(1)
- uses the same index file as dictd
- supports the following strategies:
    exact      Match words exactly
    prefix     Match prefixes
    suffix     Match suffixes
    substring  Match substring occurring anywhere in word
    re         POSIX 1003.2 regular expressions (-)
    fnmatch    fnmatch-like (* ? as wildcards) (-)
    soundex    Match using SOUNDEX algorithm (--)
    metaphone  Match using metaphone algorithm (--)
    lev        Match words within Levenshtein distance one (-)
  (-) : does not work correctly with UNICODE characters outside
        the ASCII range (yet)
  (--): cannot in principle work correctly with UNICODE characters
        outside the ASCII range, because it is designed for English
        words only.
- easily extensible with new types of databases
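
To illustrate what the simpler strategies mean, here is a minimal sketch
in modern Python 3. It is NOT serpento's code: the names STRATEGIES, lev1
and match are made up for this example, Python's re module stands in for
POSIX 1003.2 regular expressions (the syntax differs in places), and lev1
is plain Levenshtein distance without transpositions. All matching is
case sensitive.

```python
import fnmatch
import re


def lev1(word, query):
    """Return True if word and query are within Levenshtein distance one."""
    if word == query:
        return True
    if abs(len(word) - len(query)) > 1:
        return False
    # Skip the common prefix, then compare what is left after one edit.
    i = 0
    while i < len(word) and i < len(query) and word[i] == query[i]:
        i += 1
    return (word[i + 1:] == query[i + 1:]    # one character replaced
            or word[i:] == query[i + 1:]     # query has one extra character
            or word[i + 1:] == query[i:])    # word has one extra character


# Hypothetical table: strategy name -> predicate(headword, query).
STRATEGIES = {
    "exact": lambda word, query: word == query,
    "prefix": lambda word, query: word.startswith(query),
    "suffix": lambda word, query: word.endswith(query),
    "substring": lambda word, query: query in word,
    "re": lambda word, query: re.search(query, word) is not None,
    "fnmatch": lambda word, query: fnmatch.fnmatchcase(word, query),
    "lev": lev1,
}


def match(strategy, query, headwords):
    """Return the headwords that match query under the given strategy."""
    test = STRATEGIES[strategy]
    return [word for word in headwords if test(word, query)]
```

For example, match("prefix", "ca", ["cat", "cart", "dog"]) keeps only the
first two headwords. A real server would search the index instead of
scanning every headword.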
"Features":
- case sensitive
Drawbacks:
- early version
- no documentation (see comments in source :-))
- startup takes significant time
On case sensitivity:
It is rather difficult to provide good case-insensitive searching if you
deal with languages other than English (just see what the upper case
of LATIN SMALL LETTER SHARP S (U+00DF) is). Besides, in some scripts the
distinction in case is important (romanized tlhIngan Hol, IPA...), and
often it is desirable to search while suppressing other distinctions
(e.g. diacritics, vowel marks in some languages etc.).
Therefore, for serpento, case sensitivity is declared to be a feature.
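
The sharp s problem is easy to see in a modern Python 3 interpreter
(this is only an illustration, not part of serpento): upper-casing
changes the length of the string, and lower-casing the result does not
give back the original letter.

```python
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()      # full case mapping yields two letters
round_trip = upper.lower()   # lower-casing does not restore U+00DF

print(repr(upper), repr(round_trip), round_trip == sharp_s)
```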
Case-insensitive search can be achieved with a special index file whose
entries are in (let's say) lowercase, stripped of all diacritics and the
like (e.g. transliterated, in the case of the Cyrillic alphabet); it is
then the client's responsibility to convert the query to lowercase, strip
diacritics, transliterate and so on.
tools/pyindex2lower.py can be used to convert a UTF-8 index file
to lowercase ASCII.
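
On the client side, the query could be normalized along these lines.
This is a rough sketch in modern Python 3 and does not reproduce what
tools/pyindex2lower.py does: normalize_query and the EXTRA table are
names invented for this example, the table covers only two letters, and
transliteration (e.g. of Cyrillic) is left out entirely. Whatever the
client does must of course agree with how the index was built.

```python
import unicodedata

# Hypothetical table for letters that have no Unicode decomposition;
# a real one would need many more entries.
EXTRA = str.maketrans({"\u0142": "l", "\u00df": "ss"})


def normalize_query(text):
    """Lowercase the query and strip diacritics from it."""
    lowered = text.lower().translate(EXTRA)
    # NFKD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Drop the combining marks, keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, a query such as "Café" would be sent to the server as "cafe".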