File: genpyt.pod

=head1 NAME

genpyt - generate the PINYIN lexicon

=head1 SYNOPSIS

B<genpyt> I<lexicon-file> I<result-file> I<log-file> I<slm-file>

=head1 DESCRIPTION

B<genpyt> generates the binary PINYIN lexicon.
It only works in the zh_CN.UTF-8 locale.

=head1 ARGUMENTS

=over 4

=item I<lexicon-file>

Specify a dictionary file. It should be a line-based text file in UTF-8
encoding. Each line looks like:

   CCC  id  [pinyin'pinyin'pinyin]*

A default dictionary file can be found at F</usr/share/sunpinyin/dict.utf8>.


=item I<result-file>

The output binary PINYIN lexicon file. This lexicon contains a trie representing the key tree of PINYIN. All candidate words are sorted using the unigram information in I<slm-file>. This file can be used with sunpinyin input method engines.


=item I<log-file>

Specify the file to which the log is written. The I<log-file> can be seen as a human-readable representation of the binary output file.


=item I<slm-file>

The language model from which the unigram information is retrieved. Typically, the I<slm-file> is generated by B<slmthread>.

=back
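
As a rough illustration of the I<lexicon-file> line format described above, the following Python sketch splits one line into its fields. It is not part of sunpinyin and does not reflect how B<genpyt> itself reads the file. It assumes that fields are separated by whitespace, that the id is an integer, and that each pinyin field holds one reading whose syllables are joined by apostrophes. The names C<LexiconEntry> and C<parse_lexicon_line> and the sample line are hypothetical.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    word: str                  # the "CCC" field: the Chinese word
    word_id: int               # the "id" field, assumed to be an integer
    readings: List[List[str]]  # one list of syllables per pinyin field


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse one line of the form: CCC  id  [pinyin'pinyin'pinyin]*"""
    fields = line.split()      # assumption: fields are whitespace-separated
    if len(fields) < 2:
        raise ValueError("expected at least a word and an id: %r" % line)
    word, raw_id = fields[0], fields[1]
    # Each remaining field is one reading; its syllables are joined by "'".
    readings = [field.split("'") for field in fields[2:]]
    return LexiconEntry(word, int(raw_id), readings)


# Hypothetical sample line; the id 100 is made up for illustration.
entry = parse_lexicon_line("中国  100  zhong'guo")
print(entry.word, entry.word_id, entry.readings)
```

A line with no pinyin fields yields an empty list of readings, which matches the C<*> (zero or more) in the format description.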

=head1 AUTHOR

Originally written by Phill.Zhang E<lt>phill.zhang@sun.comE<gt>.
Currently maintained by Kov.Chai E<lt>tchaikov@gmail.comE<gt>.

=head1 SEE ALSO

B<slmthread>(1).

=for comment
-*- indent-tabs-mode: nil -*- vim:et:ts=4