File: trietool.1

libdatrie 0.2.14-1
.\"                                      Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH TRIETOOL 1 "DECEMBER 2008"
.\" Please adjust this date whenever revising the manpage.
.\"
.\" Some roff macros, for reference:
.\" .nh        disable hyphenation
.\" .hy        enable hyphenation
.\" .ad l      left justify
.\" .ad b      justify to both left and right margins
.\" .nf        disable filling
.\" .fi        enable filling
.\" .br        insert line break
.\" .sp <n>    insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
trietool \- trie manipulation tool
.SH SYNOPSIS
\fBtrietool\fP [ \fIoptions\fP ] \fItrie command arg\fP ...
.SH DESCRIPTION
\fBtrietool\fP is the command-line tool for manipulating double-array trie 
data.  It can be used to query, add and remove words in a trie.
.P
.SS The Trie
The \fItrie\fP argument specifies the name of the trie to manipulate.
A trie is stored in a file with a `.tri' extension.  However, to create a new
trie, one first needs to prepare a file with an `.abm' extension that describes
the Unicode ranges making up the alphabet set of the trie.  The ABM defines a
set of vectors that map Unicode characters onto a contiguous range of integers.
The mapped integers are used as the internal alphabet of the trie.
Such a mapping can improve space allocation within the trie data even when
the character set being used is not contiguous, because the mapped range
always is.
.P
The ABM file is a plain text file in which each line lists a range of 32-bit
Unicode code points to be added to the alphabet set, in the format:
.IP
[0xSSSS,0xTTTT]
.P
where `0xSSSS' and `0xTTTT' are the hexadecimal values of the starting and
ending character codes of the range, respectively.
.P
For example, for a dictionary that contains only English words without any
punctuation, one may prepare `\fItrie\fP.abm' as:
.IP
[0x0041,0x005a]
.br
[0x0061,0x007a]
.P
The first line lists the ASCII codes for A-Z, and the second for a-z.
.P
The alphabet set of a trie may contain no more than 255 characters.
.P
The created `.tri' file incorporates the ABM data, so the `.abm' file is
needed only when the trie is first created and is ignored afterwards.
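.P
The mapping described above can be illustrated with a short Python sketch.
It is not the code used by libdatrie: the function names \fBparse_abm\fP and
\fBbuild_alphabet_map\fP are hypothetical, and numbering the mapped integers
from 0 is an assumption made only for the illustration.
The sketch reads ABM text, assigns consecutive integers to the listed
characters, and checks the 255-character limit.
.P
.nf
```python
MAX_ALPHABET_SIZE = 255  # limit stated in this manual page


def parse_abm(text):
    """Return a list of (start, end) code point ranges from ABM text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        if not (line.startswith("[") and line.endswith("]")):
            raise ValueError("malformed range: " + line)
        start_text, end_text = line[1:-1].split(",")
        start, end = int(start_text, 16), int(end_text, 16)
        if start > end:
            raise ValueError("start exceeds end: " + line)
        ranges.append((start, end))
    return ranges


def build_alphabet_map(ranges):
    """Map each code point in the ranges to consecutive integers from 0."""
    mapping = {}
    for start, end in sorted(ranges):
        for code in range(start, end + 1):
            if code not in mapping:
                mapping[code] = len(mapping)
    if len(mapping) > MAX_ALPHABET_SIZE:
        raise ValueError("alphabet set is larger than 255 characters")
    return mapping


# The English example from this section: A-Z followed by a-z.
ENGLISH_ABM = """
[0x0041,0x005a]
[0x0061,0x007a]
"""

mapping = build_alphabet_map(parse_abm(ENGLISH_ABM))
print(len(mapping), mapping[ord("A")], mapping[ord("z")])
```
.fi
.P
With the two English ranges, A to Z map to 0 through 25 and a to z map to
26 through 51, so the 52 characters occupy one unbroken range although the
code points 0x5b to 0x60 between them are not part of the alphabet set.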
.SH COMMANDS
Available commands are:
.TP
\fBadd\fP \fIword data\fP ...
Add \fIword\fP to the trie, associated with the integer \fIdata\fP.  An
arbitrary number of word-data pairs can be given.  Two arguments are read at
a time: the first is treated as \fIword\fP, and the second as \fIdata\fP.
.TP
\fBadd-list\fP [ \fIoptions\fP ] \fIlist-file\fP
Add the words listed in \fIlist-file\fP, with their associated data, to the
trie.  The \fIlist-file\fP must be a text file listing one word per line.
The associated data can be put after the word on the same line, separated by
a tab (`\\t') character.  If the data field is omitted, a default value (\-1)
is used instead.
.TP
.B " "
\fIOptions\fP are available for this command:
.RS
.TP
.B \-e, \-\-encoding \fIenc\fP
Specify the character encoding of the \fIlist-file\fP contents, such as `UTF-8'.
If omitted, the codeset of the current locale is assumed.
.RE
.TP
\fBdelete\fP \fIword\fP ...
Delete \fIword\fP from the trie.  An arbitrary number of words to delete can
be given.
.TP
\fBdelete-list\fP [ \fIoptions\fP ] \fIlist-file\fP
Delete the words listed in \fIlist-file\fP from the trie.  The \fIlist-file\fP
must be a text file listing one word per line.
.TP
.B " "
\fIOptions\fP are available for this command:
.RS
.TP
.B \-e, \-\-encoding \fIenc\fP
Specify the character encoding of the \fIlist-file\fP contents, such as `UTF-8'.
If omitted, the codeset of the current locale is assumed.
.RE
.TP
\fBquery\fP \fIword\fP
Search for \fIword\fP in the trie.  If \fIword\fP exists, its associated data
is printed to standard output.  Otherwise, an error message is printed to
standard error, and nothing is printed to standard output.
.TP
\fBlist\fP
List all words in the trie to standard output.  The output lists one word-data
pair per line, separated by a tab (`\\t') character, a format suitable for use
as the \fIlist-file\fP of the \fBadd-list\fP command.
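.P
The \fIlist-file\fP format used by \fBadd-list\fP and produced by \fBlist\fP
can be read and written with a few lines of Python.
The following is a minimal sketch and not the parser used by \fBtrietool\fP:
the function names are hypothetical, and skipping blank lines is an
assumption.
The tab and newline characters are written as chr(9) and chr(10).
.P
.nf
```python
TAB = chr(9)        # separator between word and data
NEWLINE = chr(10)   # line terminator
DEFAULT_DATA = -1   # value this manual page gives for an omitted data field


def parse_list_file(text):
    """Return (word, data) pairs from list-file text, one word per line."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        word, sep, data = line.partition(TAB)
        if sep and data.strip():
            pairs.append((word, int(data)))
        else:
            pairs.append((word, DEFAULT_DATA))
    return pairs


def format_list_file(pairs):
    """Render (word, data) pairs as one tab-separated pair per line."""
    return "".join(word + TAB + str(data) + NEWLINE for word, data in pairs)


# Hypothetical sample: "apple" carries data, "pear" omits it.
sample = "apple" + TAB + "1" + NEWLINE + "pear" + NEWLINE
pairs = parse_list_file(sample)
print(pairs)
```
.fi
.P
Writing the pairs back with format_list_file() fills in the default value
explicitly, so every line of the result carries both a word and its data.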
.SH OPTIONS
This program follows the usual GNU command line syntax, with long
options starting with two dashes (`\-\-').
A summary of options is included below.
.TP
.B \-p, \-\-path \fIdir\fP
Set trie directory to \fIdir\fP [default=`.']
.TP
.B \-h, \-\-help
Show summary of options.
.TP
.B \-V, \-\-version
Show version of program.
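.P
The order of arguments in the SYNOPSIS can be made concrete with a small
Python sketch that assembles a command line for \fBtrietool\fP.
The helper names, the trie name `words', the directory `dict' and the file
`words.txt' are hypothetical.
Only the options and commands documented in this manual page are used.
.P
.nf
```python
import shutil
import subprocess


def trietool_argv(trie, command, args=(), path=None):
    """Build an argument list in the order given by the SYNOPSIS."""
    argv = ["trietool"]
    if path is not None:
        argv += ["-p", path]  # program options come before the trie name
    argv += [trie, command]
    argv += list(args)        # command options and arguments come last
    return argv


def run_trietool(argv):
    """Run the command if trietool is installed; return None otherwise."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True)


# Add the words in a UTF-8 list file to the trie "words" kept in "dict".
argv = trietool_argv("words", "add-list", ["-e", "UTF-8", "words.txt"],
                     path="dict")
print(" ".join(argv))
```
.fi
.P
The sketch only prints the assembled command.
Passing the list to run_trietool() would execute it on a system where
\fBtrietool\fP is installed.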
.SH AUTHOR
libdatrie was written by Theppitak Karoonboonyanan.
.PP
This manual page was written by Theppitak Karoonboonyanan <theppitak@gmail.com>.