.TH PATGEN 1 "16 June 2015" "Web2C 2022"
.\"=====================================================================
.if t .ds TX \fRT\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X\fP
.if n .ds TX TeX
.ie t .ds OX \fIT\v'+0.25m'E\v'-0.25m'X\fP
.el .ds OX TeX
.\" BX definition must follow TX so BX can use TX
.if t .ds BX \fRB\s-2IB\s0\fP\*(TX
.if n .ds BX BibTeX
.\" LX definition must follow TX so LX can use TX
.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX
.if n .ds LX LaTeX
.\"=====================================================================
.SH NAME
patgen \- generate patterns for TeX hyphenation
.SH SYNOPSIS
.B patgen
.I dictionary_file pattern_file patout_file translate_file
.\"=====================================================================
.SH DESCRIPTION
This manual page is not meant to be exhaustive.
See also the Info file or manual
.I "Web2C: A TeX implementation"
available as part of the TeX Live distribution or at
.IR http://tug.org/web2c .
.PP
The
.I patgen
program reads the
.I dictionary_file
containing a list of hyphenated words and the
.I pattern_file
containing previously-generated patterns (if any) for a particular
language (not a complete TeX source file; see below), and produces the
.I patout_file
with (previously- plus newly-generated) hyphenation patterns for that
language. The
.I translate_file
defines language-specific values for the parameters
.IR left_hyphen_min " and " right_hyphen_min
used by \*(TX's hyphenation algorithm and the external representation
of the lower and upper case version(s) of all \`letters' of that
language. Further details of the pattern generation process such as
hyphenation levels and pattern lengths are requested interactively from
the user's terminal. Optionally,
.I patgen
creates a new dictionary file
.BI pattmp. n
showing the good and bad hyphens found by the generated patterns, where
.I n
is the highest hyphenation level.
.PP
The patterns generated by
.I patgen
can be read by
.B initex
for use in hyphenating words. For a real-life example of
.IR patgen 's
output, see
.IR $TEXMFMAIN/tex/generic/hyphen/hyphen.tex ,
which contains the patterns \*(TX uses for English by default.
At some sites, patterns for (many) other languages may be available,
and the local
.B tex
programs may have them preloaded.
.PP
All filenames must be complete: no default extensions are added and no
path searching is done.
.\"=====================================================================
.SH FILE FORMATS
.TP \w'@@'u+2n
.B Letters
When
.B initex
digests hyphenation patterns, \*(TX first expands macros, and the result
must consist entirely of digits (hyphenation levels), dots (\`.', edge
of a word), and letters. In pattern files for non-English languages
letters are often represented by macros or other expandable constructs.
For the purpose of
.I patgen
these are just character sequences, subject to the condition that no
such sequence is a prefix of another one.
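.IP
As an illustration only, the prefix condition can be checked before
running
.IR patgen .
The following Python sketch is not part of
.IR patgen ;
the function name
.B find_prefix_conflicts
is hypothetical, and the input is assumed to be a plain list of letter
representations collected by the user.
.nf
```python
def find_prefix_conflicts(representations):
    # Hypothetical helper, not part of patgen.
    # After sorting, a string that is a prefix of any other string
    # is also a prefix of its immediate successor, so comparing
    # neighbours finds at least one conflict whenever one exists.
    reps = sorted(set(representations))
    conflicts = []
    for shorter, longer in zip(reps, reps[1:]):
        if longer.startswith(shorter):
            conflicts.append((shorter, longer))
    return conflicts
```
.fi
An empty result means that no representation in the list is a prefix of
another one.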
.TP \w'@@'u+2n
.B Dictionary file
A dictionary file contains a weighted list of hyphenated words, one word
per line starting in column 1. A digit in column 1 indicates a global
word weight (initially 1) applicable to all following words up to the
next global word weight. A digit at some intercharacter position
indicates a weight for that position only.
The hyphens in a word are indicated by \`-', \`*', or \`.' (or their
replacements as defined in the translate file) for hyphens yet to be
found, \`good' hyphens (correctly found by the patterns), and \`bad'
hyphens (erroneously found by the patterns) respectively; when reading a
dictionary file \`*' is treated like \`-' and \`.' is ignored.
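.IP
The reading rules for a single dictionary line can be sketched in
Python as follows.
This is a simplified illustration, not the code used by
.IR patgen :
the function name
.B parse_dictionary_word
is hypothetical, the default hyphen characters and the default
lower/upper case letters are assumed, the word is assumed to follow a
column-1 weight digit on the same line, and per-position weights are
skipped rather than recorded.
.nf
```python
def parse_dictionary_word(line, global_weight=1):
    # Hypothetical helper, not part of patgen.
    # Returns (letters, hyphen_positions, global_weight).
    word = line.rstrip()
    if word and word[0].isdigit():
        # A digit in column 1 sets the global word weight.
        global_weight = int(word[0])
        word = word[1:]
    letters = []
    hyphens = []
    for ch in word:
        if ch in "-*":
            # "*" (good hyphen) is treated like "-" on input.
            hyphens.append(len(letters))
        elif ch == "." or ch.isdigit():
            # "." (bad hyphen) is ignored; position weights are
            # skipped in this simplified sketch.
            continue
        else:
            letters.append(ch.lower())
    return "".join(letters), hyphens, global_weight
```
.fi
Each hyphen position is the number of letters that precede the hyphen.
The returned weight can be passed back in as
.B global_weight
for the next line, so that it stays in force until the next column-1
digit.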
.TP
.B Pattern file
A pattern file contains only patterns in the format above, e.g., from a
previous run of patgen. It may \fInot\fR contain any \*(TX comments or
control sequences. For instance, this is not a valid pattern file:
.nf
% this is a pattern file read by TeX.
\\patterns{%
\&.\|.\|.
}
.fi
It may contain only the actual patterns, i.e., the \`.\|.\|.'.
.TP
.B Translate file
A translate file starts with a line containing the values of
.I left_hyphen_min
in columns 1-2,
.I right_hyphen_min
in columns 3-4, and either a blank or the replacement for one of the
"hyphen" characters \`-', \`*', and \`.' in columns 5, 6, and 7. (Input
lines are padded with blanks as for many \*(TX related programs.)
Each following line defines one \`letter': an arbitrary delimiter
character in column 1, followed by one or more external representations
of that character (first the \`lower' case one used for output), each
one terminated by the delimiter and the whole sequence terminated by
another delimiter.
If the translate file is empty, the values
.IR left_hyphen_min "=2, " right_hyphen_min "=3,"
and the 26 lower case letters
.BR a .\|.\|. z
with their upper case representations
.BR A .\|.\|. Z
are assumed.
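.IP
The layout described above can be read with a short Python sketch.
It is an illustration under stated assumptions, not the parser used by
.IR patgen :
the function name
.B parse_translate
is hypothetical, the whole file is passed in as one string, and no
validation of the numeric fields is attempted.
.nf
```python
def parse_translate(text):
    # Hypothetical helper, not part of patgen.
    lines = text.splitlines()
    if not lines:
        # An empty translate file selects the documented defaults.
        letters = [[c, c.upper()] for c in "abcdefghijklmnopqrstuvwxyz"]
        return 2, 3, "-*.", letters
    first = lines[0].ljust(7)  # pad the first line with blanks
    left_hyphen_min = int(first[0:2])   # columns 1-2
    right_hyphen_min = int(first[2:4])  # columns 3-4
    # Columns 5-7: a blank keeps the default hyphen character.
    hyphen_chars = ""
    for new, old in zip(first[4:7], "-*."):
        hyphen_chars += old if new == " " else new
    letters = []
    for line in lines[1:]:
        if not line:
            continue
        delim = line[0]  # arbitrary delimiter in column 1
        reps = []
        for rep in line[1:].split(delim):
            if rep == "":
                break  # an empty field marks the closing delimiter
            reps.append(rep)
        if reps:
            letters.append(reps)
    return left_hyphen_min, right_hyphen_min, hyphen_chars, letters
```
.fi
The first entry of each inner list is the \`lower' case representation
used for output.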
.TP
.B Terminal input
After reading the
.I translate_file
and any previously-generated patterns from
.IR pattern_file ,
.I patgen
requests input from the user's terminal.
It first asks for the integer values of
.IR hyph_start " and " hyph_finish ,
the lowest and highest hyphenation levels for which patterns are to be
generated. The value of
.I hyph_start
should be larger than any hyphenation level already present in
.IR pattern_file .
Then, for each hyphenation level, it asks for the integer values of
.IR pat_start " and " pat_finish ,
the smallest and largest pattern lengths to be analyzed, as well as
.IR "good weight" ", " "bad weight" ", and " threshold ,
the weights for good and bad hyphens and a weight threshold for useful
patterns.
Finally, it asks whether to produce a hyphenated word list: \`y' or
\`Y' means yes, anything else means no.
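.IP
For unattended runs it can help to prepare the answers in advance.
The Python sketch below only lists them in the order given above.
It is not part of
.IR patgen :
the function name
.B patgen_answers
is hypothetical, and the layout (one line per request, values separated
by blanks) is an assumption that should be checked against the prompts
of the installed program.
.nf
```python
def patgen_answers(hyph_start, hyph_finish, levels, word_list=False):
    # Hypothetical helper, not part of patgen.
    # levels holds one tuple per hyphenation level:
    # (pat_start, pat_finish, good_weight, bad_weight, threshold)
    if len(levels) != hyph_finish - hyph_start + 1:
        raise ValueError("need one tuple per hyphenation level")
    lines = ["%d %d" % (hyph_start, hyph_finish)]
    for pat_start, pat_finish, good, bad, threshold in levels:
        lines.append("%d %d" % (pat_start, pat_finish))
        lines.append("%d %d %d" % (good, bad, threshold))
    # Final answer: "y" requests the hyphenated word list.
    lines.append("y" if word_list else "n")
    return lines
```
.fi
The returned lines can be joined with newlines and supplied on standard
input in place of typing them at the terminal.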
.\"=====================================================================
.SH FILES
.TP \w'@@'u+2n
.I $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
The original hyphenation patterns for English, by Donald Knuth and Frank
Liang.
.TP
.I http://www.ctan.org/pkg/ushyph
Additional hyphenation patterns for English, extended by Gerard Kuiken.
.TP
.I http://www.ctan.org/pkg/hyph-utf8
Collected hyphenation patterns for many languages in many formats.
.TP
.I http://www.ctan.org/tex-archive/language/
General CTAN directory for patterns and support for many other languages.
.\"=====================================================================
.SH "SEE ALSO"
Frank Liang and Peter Breitenlohner,
patgen.web.
.PP
Frank Liang,
.IR "Word hy-phen-a-tion by com-puter" ,
STAN-CS-83-977,
Stanford University Ph.D. thesis, 1983,
http://tug.org/docs/liang.
.PP
Donald E. Knuth,
.IR "The \*(OXbook" ,
Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.
.\"=====================================================================
.SH AUTHORS
Frank Liang wrote the first version of this program. Peter
Breitenlohner made a
substantial revision in 1991 for \*(TX 3.
The first version was published as the appendix to the
.I \*(OXware
technical report. Howard Trickey originally ported it to Unix.