<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
<title>docbook2X: utf8trans</title>
<link rel="stylesheet" href="docbook2X.css" type="text/css" />
<link rev="made" href="mailto:stevecheng@users.sourceforge.net" />
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1" />
<link rel="start" href="docbook2X.html" title=
"docbook2X: Documentation Table of Contents" />
<link rel="up" href="charsets.html" title=
"docbook2X: Character set conversion" />
<link rel="prev" href="charsets.html" title=
"docbook2X: Character set conversion" />
<link rel="next" href="faq.html" title="docbook2X: FAQ" />
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center"><span><strong class=
"command">utf8trans</strong></span></th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href=
"charsets.html">&lt;&lt; Previous</a>&nbsp;</td>
<th width="60%" align="center">Character set conversion</th>
<td width="20%" align="right">&nbsp;<a accesskey="n" href=
"faq.html">Next &gt;&gt;</a></td>
</tr>
</table>
<hr /></div>
<div class="refentry" lang="en" xml:lang="en"><a id="utf8trans"
name="utf8trans"></a>
<div class="titlepage"></div>
<a id="id2538852" class="indexterm" name="id2538852"></a><a id=
"id2538859" class="indexterm" name="id2538859"></a><a id=
"id2538866" class="indexterm" name="id2538866"></a><a id=
"id2538873" class="indexterm" name="id2538873"></a><a id=
"id2538883" class="indexterm" name="id2538883"></a><a id=
"id2538890" class="indexterm" name="id2538890"></a>
<div class="refnamediv">
<h2>Name</h2>
<p><span><strong class="command">utf8trans</strong></span> &mdash;
Transliterate UTF-8 characters according to a table</p>
</div>
<div class="refsynopsisdiv">
<h2>Synopsis</h2>
<div class="cmdsynopsis">
<p><code class="command">utf8trans</code> <em class=
"replaceable"><code>charmap</code></em> [<em class=
"replaceable"><code>file</code></em>...]</p>
</div>
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2538961"
name="id2538961"></a>
<h2>Description</h2>
<a id="id2538967" class="indexterm" name="id2538967"></a>
<p><span><strong class="command">utf8trans</strong></span>
transliterates characters in the specified files (or standard
input, if no files are specified) and writes the result to standard
output. All input and output is in the UTF-8 encoding.</p>
<p>This program is usually used to render characters in Unicode
text files as markup escapes or ASCII transliterations. (It is
not intended for general charset conversions.) It provides
functionality similar to the character maps in XSLT 2.0 (XSL
Transformations, version 2.0).</p>
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539001"
name="id2539001"></a>
<h2>Options</h2>
<div class="variablelist">
<dl>
<dt><span class="term"><code class="option">-m</code>,</span>
<span class="term"><code class="option">--modify</code></span></dt>
<dd>
<p>Modifies the given files in place, replacing their contents with
the transliterated output instead of sending it to standard output.</p>
<p>This option is useful for efficient transliteration of many
files at once.</p>
</dd>
<dt><span class="term"><code class=
"option">--help</code></span></dt>
<dd>
<p>Show brief usage information and exit.</p>
</dd>
<dt><span class="term"><code class=
"option">--version</code></span></dt>
<dd>
<p>Show version and exit.</p>
</dd>
</dl>
</div>
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539071"
name="id2539071"></a>
<h2>Usage</h2>
<p>The transliteration follows the rules in the
&ldquo;<span class="quote">character map</span>&rdquo;, which is
read from the file <em class="replaceable"><code>charmap</code></em>.
The character map has the following format:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Each line represents a translation entry, except for blank lines
and comment lines, which are ignored.</p>
</li>
<li>
<p>Any amount of whitespace (space or tab) may precede the start of
an entry.</p>
</li>
<li>
<p>Comment lines begin with <code class="literal">#</code>.
Everything else on such a line is ignored.</p>
</li>
<li>
<p>Each entry consists of the Unicode codepoint of the character to
translate, in hexadecimal, followed by <span class=
"emphasis"><em>one</em></span> space or tab, followed by the
translation string, up to the end of the line.</p>
</li>
<li>
<p>The translation string is taken literally, including any leading
and trailing spaces (except the delimiter between the codepoint and
the translation string), and all types of characters. The newline
at the end is not included.</p>
</li>
</ol>
</div>
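<p>As an illustration only, the format rules above can be sketched
in Python. This is not the <span><strong class=
"command">utf8trans</strong></span> source: the names <code class=
"literal">parse_charmap</code>, <code class=
"literal">transliterate</code> and <code class=
"literal">SAMPLE</code> are hypothetical, and rejecting an entry
that has no delimiter is an assumption, since the rules do not say
how such a line is treated.</p>

```python
import re

# Hexadecimal codepoint, exactly one space or tab, then the literal string.
ENTRY = re.compile(r"([0-9A-Fa-f]+)[ \t](.*)")


def parse_charmap(text):
    """Return a dict mapping codepoints (int) to translation strings."""
    table = {}
    for line in text.split("\n"):
        entry = line.lstrip(" \t")        # leading whitespace may precede an entry
        if not entry or entry.startswith("#"):
            continue                      # blank lines and comment lines are ignored
        match = ENTRY.fullmatch(entry)
        if match is None:
            # Assumption: this sketch rejects lines without a delimiter.
            raise ValueError("malformed charmap entry: " + repr(line))
        codepoint, translation = match.groups()
        # The translation is kept literally, including leading or trailing spaces.
        table[int(codepoint, 16)] = translation
    return table


def transliterate(text, table):
    """Replace every character listed in the table; leave the rest unchanged."""
    return text.translate(table)


# Hypothetical sample map: copyright sign, em dash, no-break space.
# The last entry has two spaces, so its translation starts with a space.
SAMPLE = "# sample character map\nA9 (C)\n\t2014 --\n\nA0  ~\n"

table = parse_charmap(SAMPLE)
print(transliterate("\u00a9 2005\u2014docbook2X", table))
```

<p>In this sketch the entry for U+00A0 maps to a space followed by
<code class="literal">~</code>, because only the first space after
the codepoint acts as the delimiter.</p>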
<p>The above format is intentionally restrictive, to keep
<span><strong class="command">utf8trans</strong></span> simple. If
an XML-based format is desired, the docbook2X distribution includes
an <code class="filename">xmlcharmap2utf8trans</code> script that
converts character maps in XSLT 2.0
format to the <span><strong class=
"command">utf8trans</strong></span> format.</p>
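<p>To show how the two formats relate, here is a rough Python
sketch of such a conversion. It is not the <code class=
"filename">xmlcharmap2utf8trans</code> script itself. It assumes
the XSLT 2.0 convention of <code class=
"literal">xsl:output-character</code> elements that carry
<code class="literal">character</code> and <code class=
"literal">string</code> attributes, and the function name
<code class="literal">charmap_lines</code> is hypothetical. The
sample map is built in memory instead of being parsed from a
stylesheet.</p>

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
OUTPUT_CHARACTER = "{%s}output-character" % XSL_NS


def charmap_lines(character_map):
    """Yield one utf8trans entry per xsl:output-character element."""
    for item in character_map.iter(OUTPUT_CHARACTER):
        char = item.get("character")
        string = item.get("string")
        if "\n" in string:
            # The utf8trans format has no way to express a newline.
            raise ValueError("newline in translation for %r" % char)
        # Hexadecimal codepoint, one space as the delimiter, then the string.
        yield "%X %s" % (ord(char), string)


# Hypothetical in-memory stand-in for an xsl:character-map element.
cmap = ET.Element("{%s}character-map" % XSL_NS, {"name": "example"})
ET.SubElement(cmap, OUTPUT_CHARACTER, {"character": "\u00a0", "string": "~"})
ET.SubElement(cmap, OUTPUT_CHARACTER, {"character": "\u00a9", "string": "(C)"})

entries = list(charmap_lines(cmap))
print("\n".join(entries))
```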
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539164"
name="id2539164"></a>
<h2>Limitations</h2>
<div class="itemizedlist">
<ul>
<li>
<p><span><strong class="command">utf8trans</strong></span> does not
work with binary files, because malformed UTF-8 sequences in the
input are substituted with U+FFFD characters. However, null
characters in the input are handled correctly. This limitation may
be removed in the future.</p>
</li>
<li>
<p>There is no way to include a newline or null in the substitution
string.</p>
</li>
</ul>
</div>
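<p>The effect of the first limitation can be reproduced with
Python's built-in UTF-8 decoder, used here only as a stand-in for
<span><strong class="command">utf8trans</strong></span>. The number
of U+FFFD characters that <span><strong class=
"command">utf8trans</strong></span> itself emits for a given
malformed sequence may differ. The variable names are
hypothetical.</p>

```python
# Bytes 0xFF and 0xFE can never appear in well-formed UTF-8.
binary = b"ok\x00\xff\xfeend"

# errors="replace" substitutes U+FFFD for each malformed sequence.
text = binary.decode("utf-8", errors="replace")

# Re-encoding cannot restore the original bytes, so binary data is damaged.
round_trip = text.encode("utf-8")

print(repr(text))
print(round_trip == binary)
```

<p>The null byte passes through unchanged, but the two invalid
bytes are lost, which is why binary input cannot be processed
safely.</p>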
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href=
"charsets.html">&lt;&lt; Previous</a>&nbsp;</td>
<td width="20%" align="center"><a accesskey="u" href=
"charsets.html">Up</a></td>
<td width="40%" align="right">&nbsp;<a accesskey="n" href=
"faq.html">Next &gt;&gt;</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Character set
conversion&nbsp;</td>
<td width="20%" align="center"><a accesskey="h" href=
"docbook2X.html">Table of Contents</a></td>
<td width="40%" align="right" valign="top">&nbsp;FAQ</td>
</tr>
</table>
</div>
<p class="footer-homepage"><a href=
"http://docbook2x.sourceforge.net/" title=
"docbook2X: Home page">docbook2X home page</a></p>
</body>
</html>