.TH PRECONV 1 "7 November 2018" "Groff Version 1.22.3"
.SH NAME
preconv \- convert encoding of input files to something GNU troff understands
.
.
.\" license (copying)
.de co
Copyright \[co] 2006-2014 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be included in
translations approved by the Free Software Foundation instead of in
the original English.
..
.
.\" --------------------------------------------------------------------
.SH SYNOPSIS
.\" --------------------------------------------------------------------
.
.SY preconv
.OP \-dr
.OP \-D encoding
.OP \-e encoding
.RI [ files
.IR .\|.\|. ]
.
.SY preconv
.B \-h
|
.B \-\-help
.
.SY preconv
.B \-v
|
.B \-\-version
.YS
.
.
.PP
It is possible to have whitespace between the
.B \-e
command line option and its parameter.
.
.
.\" --------------------------------------------------------------------
.SH DESCRIPTION
.\" --------------------------------------------------------------------
.
.B preconv
reads
.I files
and converts their encoding(s) to a form GNU
.BR troff (1)
can process, sending the data to standard output.
.
Currently, this means ASCII characters and \[oq]\e[uXXXX]\[cq]
entities, where \[oq]XXXX\[cq] is a hexadecimal number with four to
six digits, representing a Unicode input code.
.
Normally,
.B preconv
should be invoked with the
.B \-k
and
.B \-K
options of
.BR groff .
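.
.
.PP
A minimal sketch of such an invocation follows; the
.B \-man
macro option and the file name
.I input.man
are assumptions made for illustration and are not taken from this page.
.
.RS
.PP
.EX
# Let groff run preconv, which then determines the encoding itself.
groff \-k \-man input.man
# Let groff run preconv with an explicitly given input encoding.
groff \-Kutf-8 \-man input.man
.EE
.RE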
.
.
.\" --------------------------------------------------------------------
.SH OPTIONS
.\" --------------------------------------------------------------------
.
.TP
.B \-d
Emit debugging messages to standard error (mainly the encoding used).
.
.TP
.BI \-D encoding
Specify the default encoding to use if all other methods fail (see below).
.
.TP
.BI \-e encoding
Specify input encoding explicitly, overriding all other methods.
.
This corresponds to
.BR groff \[aq]s
.BI \-K encoding
option.
.
Without this switch,
.B preconv
uses the algorithm described below to select the input encoding.
.
.TP
.B \-\-help
.TQ
.B \-h
Print help message.
.
.TP
.B \-r
Do not add \&.lf requests.
.
.TP
.B \-\-version
.TQ
.B \-v
Print version number.
.
.
.\" --------------------------------------------------------------------
.SH USAGE
.\" --------------------------------------------------------------------
.
.B preconv
tries to find the input encoding with the following algorithm.
.
.IP 1.
If the input encoding has been explicitly specified with option
.BR \-e ,
use it.
.
.IP 2.
Otherwise, check whether the input starts with a
.I Byte Order Mark
(BOM, see below).
.
If found, use it.
.
.IP 3.
Otherwise, check whether there is a known
.I coding tag
(see below) in either the first or second input line.
.
If found, use it.
.
.IP 4.
If all of the above fail, use a default encoding: the one given with option
.BR \-D ,
the one of the current locale, or \[oq]latin1\[cq] if the locale is set to
\[oq]C\[cq], \[oq]POSIX\[cq], or empty (in that order).
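.
.
.PP
As a hedged illustration of the steps above, the following invocations
sketch how the encoding might be forced, given a fallback, or reported;
the file name
.I input.man
is hypothetical.
.
.RS
.PP
.EX
# Step 1: force the input encoding; no other method is consulted.
preconv \-e utf-8 input.man
# Step 4: supply a fallback for input without BOM or coding tag.
preconv \-Dlatin1 input.man
# Emit debugging messages (mainly the encoding used) to standard error.
preconv \-d input.man
.EE
.RE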
.
.
.PP
Note that the
.B groff
program supports a
.B GROFF_ENCODING
environment variable, which is ultimately expanded to option
.BR \-k .
.
.
.\" --------------------------------------------------------------------
.SS "Byte Order Mark"
.\" --------------------------------------------------------------------
.
The Unicode Standard defines character U+FEFF as the Byte Order Mark
(BOM).
.
On the other hand, value U+FFFE is guaranteed not to be a Unicode character
at all.
.
This makes it possible to detect the byte order within the data stream
(either big-endian or little-endian), and the MIME encodings
\%\[oq]UTF-16\[cq]
and \%\[oq]UTF-32\[cq] mandate that the data stream starts with U+FEFF.
.
Similarly, the data stream encoded as \%\[oq]UTF-8\[cq] might start
with a BOM (to ease the conversion from and to \%UTF-16 and \%UTF-32).
.
In all cases, the byte order mark is
.I not
part of the data but part of the encoding protocol; in other words,
.BR preconv \[aq]s
output doesn\[aq]t contain it.
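.
.
.PP
As a worked illustration (these are general Unicode facts, not a
statement about which signatures
.B preconv
itself checks), U+FEFF is serialized as the following byte sequences,
given in hexadecimal.
.
.RS
.PP
.EX
UTF-8                  EF BB BF
UTF-16, big-endian     FE FF
UTF-16, little-endian  FF FE
UTF-32, big-endian     00 00 FE FF
UTF-32, little-endian  FF FE 00 00
.EE
.RE
.
.PP
A \%UTF-16 stream that starts with the bytes FF FE can thus only be
little-endian, since reading them in big-endian order would yield
U+FFFE, which is not a character.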
.
.
.PP
Note that a U+FEFF that does not appear at the start of the input data
is emitted; it then has the meaning of a \[oq]zero width no-break
space\[cq] character \[en] something not needed normally in
.BR groff .
.
.
.\" --------------------------------------------------------------------
.SS "Coding Tags"
.\" --------------------------------------------------------------------
.
Editors which support more than a single character encoding need tags
within the input files to mark the file\[aq]s encoding.
.
While heuristic algorithms can guess the right input encoding for data
that contains a sufficiently large amount of natural-language text,
the result is still just a guess.
.
Additionally, all algorithms fail easily for input which is either too short
or doesn\[aq]t represent a natural language.
.
.
.PP
For these reasons,
.B preconv
supports the coding tag convention (with some restrictions) as used by
.B "GNU Emacs"
and
.B XEmacs
(and probably other programs too).
.
.
.PP
Coding tags in
.B "GNU Emacs"
and
.B XEmacs
are stored in so-called
.IR "File Variables" .
.
.B preconv
recognizes the following syntax form which must be put into a troff comment
in the first or second line.
.
.RS
.PP
\-*\-
.IR tag1 :
.IR value1 ;
.IR tag2 :
.IR value2 ;
\&.\|.\|.\& \-*\-
.RE
.
.
.PP
The only relevant tag for
.B preconv
is \[oq]coding\[cq] which can take the values listed below.
.
Here is an example line that tells
.B Emacs
to edit a file in troff mode and to use \%latin-2 as its encoding.
.
.RS
.PP
.EX
\&.\[rs]" \-*\- mode: troff; coding: latin-2 \-*\-\""
.EE
.RE
.
.
.PP
The following list gives all MIME coding tags (either lowercase or
uppercase) supported by
.BR preconv ;
this list is hard-coded in the source.
.
.RS
.PP
.ad l
\%big5, \%cp1047, \%euc-jp, \%euc-kr, \%gb2312, \%iso-8859-1,
\%iso-8859-2, \%iso-8859-5, \%iso-8859-7, \%iso-8859-9, \%iso-8859-13,
\%iso-8859-15, \%koi8-r, \%us-ascii, \%utf-8, \%utf-16, \%utf-16be,
\%utf-16le
.ad
.RE
.
.
.PP
In addition, the following hard-coded list of other tags is recognized;
each of them maps to a value from the list above.
.
.RS
.PP
.ad l
\%ascii, \%chinese-big5, \%chinese-euc, \%chinese-iso-8bit, \%cn-big5,
\%cn-gb, \%cn-gb-2312, \%cp878, \%csascii, \%csisolatin1,
\%cyrillic-iso-8bit, \%cyrillic-koi8, \%euc-china, \%euc-cn,
\%euc-japan, \%euc-japan-1990, \%euc-korea, \%greek-iso-8bit,
\%iso-10646/utf8, \%iso-10646/utf-8, \%iso-latin-1, \%iso-latin-2,
\%iso-latin-5, \%iso-latin-7, \%iso-latin-9, \%japanese-euc,
\%japanese-iso-8bit, \%jis8, \%koi8, \%korean-euc, \%korean-iso-8bit,
\%latin-0, \%latin1, \%latin-1, \%latin-2, \%latin-5, \%latin-7,
\%latin-9, \%mule-utf-8, \%mule-utf-16, \%mule-utf-16be,
\%mule-utf-16-be, \%mule-utf-16be-with-signature, \%mule-utf-16le,
\%mule-utf-16-le, \%mule-utf-16le-with-signature, \%utf8, \%utf-16-be,
\%utf-16-be-with-signature, \%utf-16be-with-signature, \%utf-16-le,
\%utf-16-le-with-signature, \%utf-16le-with-signature
.ad
.RE
.
.
.PP
Those tags are taken from
.B "GNU Emacs"
and
.BR XEmacs ,
together with some aliases.
.
Trailing \%\[oq]-dos\[cq], \%\[oq]-unix\[cq], and \%\[oq]-mac\[cq]
suffixes of coding tags (which give the end-of-line convention used in
the file) are stripped off before the comparison with the above tags
happens.
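.
.
.PP
For instance, assuming a file that starts with the hypothetical comment
line below, the \%\[oq]-unix\[cq] suffix would be stripped first, and
the remaining tag \%utf-8 would then be compared with the lists above.
.
.RS
.PP
.EX
\&.\[rs]" \-*\- coding: utf-8-unix \-*\-
.EE
.RE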
.
.SS "Iconv Issues"
.B preconv
by itself only supports three encodings: \%latin-1, cp1047, and \%UTF-8;
all other encodings are passed to the
.B iconv
library functions.
.
At compile time, the build process searches for and verifies a working
.B iconv
implementation; a call to \[oq]preconv \-\-version\[cq] shows whether
.B iconv
is used.
.
.
.\" --------------------------------------------------------------------
.SH BUGS
.\" --------------------------------------------------------------------
.
.B preconv
doesn\[aq]t support
.I "local variable lists"
yet.
.
This is a different syntax form for specifying local variables at the end of a
file.
.
.
.\" --------------------------------------------------------------------
.SH "SEE ALSO"
.\" --------------------------------------------------------------------
.
.BR groff (1)
.br
the
.B "GNU Emacs"
and
.B XEmacs
info pages
.
.
.\" --------------------------------------------------------------------
.SH COPYING
.\" --------------------------------------------------------------------
.co
.
.
.\" Emacs setting
.\" Local Variables:
.\" mode: nroff
.\" End: