=head1 NAME
genpyt - generate the PINYIN lexicon
=head1 SYNOPSIS
B<genpyt> I<lexicon-file> I<result-file> I<log-file> I<slm-file>
=head1 DESCRIPTION
B<genpyt> is used to generate the PINYIN lexicon.
It only works in the zh_CN.UTF-8 locale.
=head1 ARGUMENTS
=over 4
=item I<lexicon-file>
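Specifies a dictionary file. It should be a line-based text file in UTF-8 encoding. Each line looks like:
    CCC id [pinyin'pinyin'pinyin]*
where CCC is the word in Chinese characters, id is its word ID, and each optional pinyin'pinyin'pinyin field gives one reading of the word, with syllables separated by apostrophes.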
A default dictionary file can be found at F</usr/share/sunpinyin/dict.utf8>.
=item I<result-file>
The output binary PINYIN lexicon file. This lexicon contains a trie representing the key tree of PINYIN, and all candidate words are sorted by the unigram probabilities taken from I<slm-file>. This file can be used with sunpinyin input method engines.
=item I<log-file>
Specifies the file to which the log is written. The I<log-file> can be seen as a human-readable presentation of the binary output file.
=item I<slm-file>
The language model from which the unigram information is retrieved. Typically, the I<slm-file> is generated by B<slmthread>.
=back
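To illustrate the I<lexicon-file> line format and the idea of a PINYIN key trie whose candidate words are sorted by unigram, here is a minimal Python sketch. It is not B<genpyt>'s actual implementation and does not produce the binary I<result-file>; the names parse_lexicon_line, TrieNode, build_trie and lookup are hypothetical, and the unigram is assumed to be a plain mapping from word ID to probability rather than being read from a real I<slm-file>.

```python
# Hypothetical sketch, not genpyt's real code: parse lexicon lines of the
# form "CCC id [pinyin'pinyin'pinyin]*" and build a syllable trie whose
# candidate lists are sorted by an assumed word_id -> probability mapping.

def parse_lexicon_line(line):
    """Return (word, word_id, readings) or None for blank/short lines."""
    fields = line.split()
    if len(fields) < 2:
        return None
    word, word_id = fields[0], int(fields[1])
    # Each remaining field is one reading; syllables are separated by "'".
    readings = [field.split("'") for field in fields[2:]]
    return word, word_id, readings


class TrieNode:
    def __init__(self):
        self.children = {}  # syllable -> TrieNode
        self.words = []     # (word, word_id) pairs whose reading ends here


def build_trie(lines, unigram):
    """Build the syllable trie; `unigram` maps word_id -> probability."""
    root = TrieNode()
    for line in lines:
        parsed = parse_lexicon_line(line)
        if parsed is None:
            continue
        word, word_id, readings = parsed
        for syllables in readings:
            node = root
            for syl in syllables:
                node = node.children.setdefault(syl, TrieNode())
            node.words.append((word, word_id))
    # Sort the candidates at every node by descending unigram probability;
    # IDs missing from the mapping are treated as probability 0.
    stack = [root]
    while stack:
        node = stack.pop()
        node.words.sort(key=lambda w: unigram.get(w[1], 0.0), reverse=True)
        stack.extend(node.children.values())
    return root


def lookup(root, syllables):
    """Return the sorted candidate words for a complete syllable sequence."""
    node = root
    for syl in syllables:
        node = node.children.get(syl)
        if node is None:
            return []
    return [word for word, _ in node.words]
```

For example, with the lines "种果 101 zhong'guo" and "中国 100 zhong'guo" and a mapping that gives ID 100 the higher probability, lookup(root, ["zhong", "guo"]) lists 中国 before 种果 regardless of input order.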
=head1 AUTHOR
Originally written by Phill.Zhang E<lt>phill.zhang@sun.comE<gt>.
Currently maintained by Kov.Chai E<lt>tchaikov@gmail.comE<gt>.
=head1 SEE ALSO
B<slmthread>(1).
=for comment
-*- indent-tabs-mode: nil -*- vim:et:ts=4