File: charsets.html

package info (click to toggle)
docbook2x 0.8.8-18
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 4,740 kB
  • sloc: xml: 16,229; sh: 3,674; perl: 3,461; ansic: 639; makefile: 409; sed: 11
file content (143 lines) | stat: -rw-r--r-- 6,069 bytes parent folder | download | duplicates (6)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
<title>docbook2X: Character set conversion</title>
<link rel="stylesheet" href="docbook2X.css" type="text/css" />
<link rev="made" href="mailto:stevecheng@users.sourceforge.net" />
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1" />
<meta name="description" content=
"Discussion on reproducing non-ASCII characters in the   converted output" />
<link rel="start" href="docbook2X.html" title=
"docbook2X: Documentation Table of Contents" />
<link rel="up" href="docbook2X.html" title=
"docbook2X: Documentation Table of Contents" />
<link rel="prev" href="sgml2xml-isoent.html" title=
"docbook2X: sgml2xml-isoent" />
<link rel="next" href="utf8trans.html" title=
"docbook2X: utf8trans" />
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Character set conversion</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href=
"sgml2xml-isoent.html">&lt;&lt; Previous</a>&nbsp;</td>
<th width="60%" align="center">&nbsp;</th>
<td width="20%" align="right">&nbsp;<a accesskey="n" href=
"utf8trans.html">Next &gt;&gt;</a></td>
</tr>
</table>
<hr /></div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="charsets" name="charsets"></a>Character
set conversion</h2>
</div>
</div>
</div>
<a id="id2537977" class="indexterm" name="id2537977"></a><a id=
"id2537983" class="indexterm" name="id2537983"></a><a id=
"id2537990" class="indexterm" name="id2537990"></a><a id=
"id2537997" class="indexterm" name="id2537997"></a><a id=
"id2538612" class="indexterm" name="id2538612"></a><a id=
"id2538619" class="indexterm" name="id2538619"></a><a id=
"id2538625" class="indexterm" name="id2538625"></a><a id=
"id2538632" class="indexterm" name="id2538632"></a><a id=
"id2538639" class="indexterm" name="id2538639"></a><a id=
"id2538649" class="indexterm" name="id2538649"></a><a id=
"id2538656" class="indexterm" name="id2538656"></a>
<p>When translating XML to legacy ASCII-based formats with poor
support for Unicode, such as man pages and Texinfo, there is always
the problem that Unicode characters in the source document also
have to be translated somehow.</p>
<p>A straightforward character set conversion from Unicode does not
suffice, because the target character set, usually US-ASCII or ISO
Latin-1, do not contain common characters such as dashes and
directional quotation marks that are widely used in XML documents.
But document formatters (man and Texinfo) allow such characters to
be entered by a markup escape: for example, <span class=
"markup">\(lq</span> for the left directional quote <code class=
"literal">&ldquo;</code>. And if a markup-level escape is not
available, an ASCII transliteration might be used: for example,
using the ASCII less-than sign <span class="markup">&lt;</span> for
the angle quotation mark <span class="markup">&lang;</span>.</p>
<p>So the Unicode character problem can be solved in two steps:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p><a href="utf8trans.html"><span><strong class=
"command">utf8trans</strong></span></a>, a program included in
docbook2X, maps Unicode characters to markup-level escapes or
transliterations.</p>
<p>Since there is not necessarily a fixed, official mapping of
Unicode characters, <span><strong class=
"command">utf8trans</strong></span> can read in user-modifiable
character mappings expressed in text files and apply them. (Unlike
most character set converters.)</p>
<p>In <code class="filename">charmaps/man/roff.charmap</code> and
<code class="filename">charmaps/man/texi.charmap</code> are
character maps that may be used for man-page and Texinfo
conversion. The programs <a href=
"db2x_manxml.html"><span><strong class=
"command">db2x_manxml</strong></span></a> and <a href=
"db2x_texixml.html"><span><strong class=
"command">db2x_texixml</strong></span></a> will apply these
character maps, or another character map specified by the user,
automatically.</p>
</li>
<li>
<p>The rest of the Unicode text is converted to some other
character set (encoding). For example, a French document with
accented characters (such as <code class="literal">&eacute;</code>)
might be converted to ISO Latin 1.</p>
<p>This step is applied after <span><strong class=
"command">utf8trans</strong></span> character mapping, using the
<span class="citerefentry"><span class=
"refentrytitle"><span><strong class=
"command">iconv</strong></span></span></span> encoding conversion
tool. Both <a href="db2x_manxml.html"><span><strong class=
"command">db2x_manxml</strong></span></a> and <a href=
"db2x_texixml.html"><span><strong class=
"command">db2x_texixml</strong></span></a> can call <span class=
"citerefentry"><span class="refentrytitle"><span><strong class=
"command">iconv</strong></span></span></span> automatically when
producing their output.</p>
</li>
</ol>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href=
"sgml2xml-isoent.html">&lt;&lt; Previous</a>&nbsp;</td>
<td width="20%" align="center">&nbsp;</td>
<td width="40%" align="right">&nbsp;<a accesskey="n" href=
"utf8trans.html">Next &gt;&gt;</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top"><span><strong class=
"command">sgml2xml-isoent</strong></span>&nbsp;</td>
<td width="20%" align="center"><a accesskey="h" href=
"docbook2X.html">Table of Contents</a></td>
<td width="40%" align="right" valign="top">
&nbsp;<span><strong class="command">utf8trans</strong></span></td>
</tr>
</table>
</div>
<p class="footer-homepage"><a href=
"http://docbook2x.sourceforge.net/" title=
"docbook2X: Home page">docbook2X home page</a></p>
</body>
</html>