<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
<title>docbook2X: utf8trans</title>
<link rel="stylesheet" href="docbook2X.css" type="text/css" />
<link rev="made" href="mailto:stevecheng@users.sourceforge.net" />
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1" />
<link rel="start" href="docbook2X.html" title=
"docbook2X: Documentation Table of Contents" />
<link rel="up" href="charsets.html" title=
"docbook2X: Character set conversion" />
<link rel="prev" href="charsets.html" title=
"docbook2X: Character set conversion" />
<link rel="next" href="faq.html" title="docbook2X: FAQ" />
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center"><span><strong class=
"command">utf8trans</strong></span></th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href=
"charsets.html">&lt;&lt; Previous</a> </td>
<th width="60%" align="center">Character set conversion</th>
<td width="20%" align="right"> <a accesskey="n" href=
"faq.html">Next >></a></td>
</tr>
</table>
<hr /></div>
<div class="refentry" lang="en" xml:lang="en"><a id="utf8trans"
name="utf8trans"></a>
<div class="titlepage"></div>
<a id="id2538852" class="indexterm" name="id2538852"></a><a id=
"id2538859" class="indexterm" name="id2538859"></a><a id=
"id2538866" class="indexterm" name="id2538866"></a><a id=
"id2538873" class="indexterm" name="id2538873"></a><a id=
"id2538883" class="indexterm" name="id2538883"></a><a id=
"id2538890" class="indexterm" name="id2538890"></a>
<div class="refnamediv">
<h2>Name</h2>
<p><span><strong class="command">utf8trans</strong></span> —
Transliterate UTF-8 characters according to a table</p>
</div>
<div class="refsynopsisdiv">
<h2>Synopsis</h2>
<div class="cmdsynopsis">
<p><code class="command">utf8trans</code> <em class=
"replaceable"><code>charmap</code></em> [<em class=
"replaceable"><code>file</code></em>...]</p>
</div>
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2538961"
name="id2538961"></a>
<h2>Description</h2>
<a id="id2538967" class="indexterm" name="id2538967"></a>
<p><span><strong class="command">utf8trans</strong></span>
transliterates characters in the specified files (or in standard
input, if no files are specified) and writes the output to standard
output. All input and output is in the UTF-8 encoding.</p>
<p>This program is usually used to render characters in Unicode
text files as markup escapes or ASCII transliterations. (It is
not intended for general charset conversion.) It provides
functionality similar to the character maps in XSLT 2.0 (Extensible
Stylesheet Language Transformations, version 2.0).</p>
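<p>The following Python sketch illustrates what transliteration means in practice. It assumes the character map has already been loaded into a dictionary keyed by code point; the function name <code>transliterate</code> and the sample mappings are hypothetical and are not taken from utf8trans or from any character map shipped with docbook2X.</p>

```python
# Minimal sketch of the transliteration step. Assumption: the character map
# has already been loaded into a dict from code point (int) to replacement
# string; the function name is illustrative, not part of utf8trans.
from typing import Dict


def transliterate(text: str, charmap: Dict[int, str]) -> str:
    """Replace every character that has a map entry; copy all others unchanged."""
    # str.translate looks up each code point in the dict; code points
    # without an entry are left as they are.
    return text.translate(charmap)


if __name__ == "__main__":
    # Hypothetical map: LATIN SMALL LETTER E WITH ACUTE -> "e", EM DASH -> "--"
    demo_map = {0x00E9: "e", 0x2014: "--"}
    print(transliterate("caf\u00e9\u2014ok", demo_map))  # prints cafe--ok
```

<p>Mapping a code point to an empty string deletes that character, while characters with no entry pass through unchanged.</p>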
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539001"
name="id2539001"></a>
<h2>Options</h2>
<div class="variablelist">
<dl>
<dt><span class="term"><code class="option">-m</code>,</span>
<span class="term"><code class="option">--modify</code></span></dt>
<dd>
<p>Modifies the given files in place, replacing their contents with
the transliterated output instead of writing it to standard output.</p>
<p>This option is useful for transliterating many files efficiently
in a single invocation.</p>
</dd>
<dt><span class="term"><code class=
"option">--help</code></span></dt>
<dd>
<p>Show brief usage information and exit.</p>
</dd>
<dt><span class="term"><code class=
"option">--version</code></span></dt>
<dd>
<p>Show version and exit.</p>
</dd>
</dl>
</div>
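<p>A rough Python sketch of an in-place rewrite follows. It assumes a write-to-temporary-file-then-rename strategy, which this page does not document for utf8trans; the helper names <code>modify_in_place</code> and <code>modify_files</code> are hypothetical.</p>

```python
# Minimal sketch of in-place rewriting in the spirit of -m/--modify.
# Assumption: output goes to a temporary file in the same directory, which is
# then renamed over the original. The real utf8trans may work differently.
import os
import shutil
import tempfile
from typing import Callable, Iterable


def modify_in_place(path: str, transform: Callable[[str], str]) -> None:
    with open(path, encoding="utf-8", newline="") as src:
        text = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst:
            dst.write(transform(text))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # rename over the original file
    except BaseException:
        os.unlink(tmp_path)              # never leave a stray temporary file
        raise


def modify_files(paths: Iterable[str], transform: Callable[[str], str]) -> None:
    # One process handles every file, instead of one invocation per file.
    for path in paths:
        modify_in_place(path, transform)
```

<p>Writing to a separate file before renaming means an interrupted run leaves the original file intact rather than half-written.</p>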
</div>
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539071"
name="id2539071"></a>
<h2>Usage</h2>
<p>Characters are translated according to the rules in the
“<span class="quote">character map</span>” contained in
the file <em class="replaceable"><code>charmap</code></em>. That file
has the following format:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Each line represents a translation entry, except for blank lines
and comment lines, which are ignored.</p>
</li>
<li>
<p>Any amount of whitespace (space or tab) may precede the start of
an entry.</p>
</li>
<li>
<p>Comment lines begin with <code class="literal">#</code>.
The entire line is ignored.</p>
</li>
<li>
<p>Each entry consists of the Unicode code point of the character to
translate, in hexadecimal, followed by exactly <span class=
"emphasis"><em>one</em></span> space or tab, followed by the
translation string, up to the end of the line.</p>
</li>
<li>
<p>The translation string is taken literally, including any leading
and trailing spaces (other than the single delimiter between the code
point and the translation string) and characters of any kind. The
newline at the end of the line is not included.</p>
</li>
</ol>
</div>
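<p>The rules above can be sketched as a small Python parser. The function <code>parse_charmap</code> is hypothetical, and it fills in details the format description leaves open: whitespace-only lines are treated as blank, comment lines may be indented, and a line with no delimiter after the code point raises <code>ValueError</code>.</p>

```python
# Minimal parser sketch for the character-map format listed above.
# Assumptions not stated in the format description: lines containing only
# whitespace count as blank, comment lines may be indented, and an entry with
# no delimiter after the code point is rejected with ValueError.
import re
from typing import Dict, Iterable

# Hex code point, exactly one space or tab, then the literal translation string.
ENTRY = re.compile(r"([0-9A-Fa-f]+)[ \t](.*)")


def parse_charmap(lines: Iterable[str]) -> Dict[int, str]:
    charmap: Dict[int, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")       # the trailing newline is not part of the string
        body = line.lstrip(" \t")     # whitespace may precede an entry
        if body == "" or body.startswith("#"):
            continue                  # blank line or comment line
        match = ENTRY.fullmatch(body)
        if match is None:
            raise ValueError(f"line {lineno}: malformed entry {line!r}")
        # Only the single delimiter is consumed; any further leading or
        # trailing spaces belong to the translation string.
        charmap[int(match.group(1), 16)] = match.group(2)
    return charmap


if __name__ == "__main__":
    sample = ["# hypothetical sample map\n", "\n", "  2014 --\n", "E9  e \n"]
    print(parse_charmap(sample))
```

<p>In the sample, the <code>E9</code> entry has two spaces after the code point, so its translation string begins with a space: only the first space is treated as the delimiter.</p>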
<p>The character-map format is intentionally restrictive, to keep
<span><strong class="command">utf8trans</strong></span> simple. If
an XML-based format is preferred, the docbook2X distribution includes
a script, <code class="filename">xmlcharmap2utf8trans</code>, that
converts character maps in XSLT 2.0 format to the <span><strong class=
"command">utf8trans</strong></span> format.</p>
</div>
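<p>To show how the two formats correspond, here is a hypothetical Python sketch that reads the <code>xsl:output-character</code> elements of an XSLT 2.0 character map and emits utf8trans entries. It is not the <code class="filename">xmlcharmap2utf8trans</code> script itself, whose implementation is not described here; the function name is illustrative.</p>

```python
# Hypothetical converter sketch from an XSLT 2.0 character map to utf8trans
# entries. This is NOT the xmlcharmap2utf8trans script shipped with docbook2X;
# it only illustrates how the two formats line up.
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI of XSLT, in ElementTree's {uri}localname notation.
XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"


def xsl_charmap_to_utf8trans(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text)
    entries: List[str] = []
    for oc in root.iter(XSL_NS + "output-character"):
        char = oc.get("character")
        string = oc.get("string")
        if char is None or string is None or len(char) != 1:
            raise ValueError("xsl:output-character needs one character and a string")
        # A utf8trans entry ends at the newline, so a newline cannot be expressed.
        if "\n" in string:
            raise ValueError(f"U+{ord(char):04X}: string contains a newline")
        # Hex code point, one space delimiter, then the literal string.
        entries.append(f"{ord(char):X} {string}")
    return entries
```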
<div class="refsect1" lang="en" xml:lang="en"><a id="id2539164"
name="id2539164"></a>
<h2>Limitations</h2>
<div class="itemizedlist">
<ul>
<li>
<p><span><strong class="command">utf8trans</strong></span> does not
work with binary files, because malformed UTF-8 sequences in the
input are substituted with U+FFFD characters. However, null
characters in the input are handled correctly. This limitation may
be removed in the future.</p>
</li>
<li>
<p>There is no way to include a newline or null character in the
translation string.</p>
</li>
</ul>
</div>
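<p>The first limitation can be reproduced with Python's own UTF-8 decoder, used here only as a stand-in: it replaces malformed sequences with U+FFFD and passes null characters through, which matches the behavior described above, although the number of U+FFFD characters produced for a given malformed sequence may differ from utf8trans. The function names are hypothetical.</p>

```python
# Stand-in illustration only: Python's decoder is used in place of utf8trans's
# own input handling, so the number of U+FFFD characters per malformed
# sequence may not match utf8trans exactly.


def decode_like_utf8trans(data: bytes) -> str:
    # Malformed UTF-8 becomes U+FFFD; NUL (0x00) decodes as an ordinary character.
    return data.decode("utf-8", errors="replace")


def round_trip(data: bytes) -> bytes:
    # Re-encoding cannot restore the bytes that were replaced, which is why
    # binary files are corrupted.
    return decode_like_utf8trans(data).encode("utf-8")


if __name__ == "__main__":
    print(round_trip(b"A\x00\xffB"))
```

<p>The null byte survives unchanged, but the invalid byte 0xFF comes back as the three-byte UTF-8 encoding of U+FFFD, so the output no longer matches the input.</p>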
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href=
"charsets.html">&lt;&lt; Previous</a> </td>
<td width="20%" align="center"><a accesskey="u" href=
"charsets.html">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href=
"faq.html">Next >></a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Character set
conversion </td>
<td width="20%" align="center"><a accesskey="h" href=
"docbook2X.html">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> FAQ</td>
</tr>
</table>
</div>
<p class="footer-homepage"><a href=
"http://docbook2x.sourceforge.net/" title=
"docbook2X: Home page">docbook2X home page</a></p>
</body>
</html>