<html lang="en">
<head>
<title>UTF-8 Support - ne's manual</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="ne's manual">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Reference.html#Reference" title="Reference">
<link rel="prev" href="Emergency-Save.html#Emergency-Save" title="Emergency Save">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
</head>
<body>
<div class="node">
<a name="UTF-8-Support"></a>
<a name="UTF_002d8-Support"></a>
<hr>
</div>
<h3 class="section">3.11 UTF-8 Support</h3>
<p><a name="index-UTF_002d8-Support-75"></a>
Since version 1.30, <code>ne</code> can manipulate UTF-8 files and supports
UTF-8 when communicating with the user. At startup, <code>ne</code> fetches
the system locale description and checks whether it contains the string
‘<samp><span class="samp">utf8</span></samp>’ or ‘<samp><span class="samp">utf-8</span></samp>’. If it does, <code>ne</code> communicates with
the user using UTF-8. This behaviour can be changed either with a
suitable command line option (see <a href="Arguments.html#Arguments">Arguments</a>) or with
<a href="UTF8IO.html#UTF8IO">UTF8IO</a>. UTF-8 input/output makes it possible to display, and to read
from the keyboard, a wide range of characters.
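<p>The startup check described above can be sketched in a few lines of
Python. This is only an illustration, not <code>ne</code>'s actual code: the
function name <code>locale_suggests_utf8</code> is hypothetical, and the
case-insensitive comparison is an assumption made so that locale names
such as <code>en_US.UTF-8</code> are recognized.

<pre class="example">
```python
def locale_suggests_utf8(description):
    """Return True if a locale description mentions utf8 or utf-8.

    Assumption: the comparison ignores case.
    """
    lowered = description.lower()
    return "utf8" in lowered or "utf-8" in lowered


# A few locale descriptions as they might be reported by a system.
for name in ("en_US.UTF-8", "it_IT.utf8", "ru_RU.KOI8-R", "C"):
    print(name, locale_suggests_utf8(name))
```
</pre>

<p>A locale such as <code>ru_RU.KOI8-R</code> does not contain either string, so
under this check I/O would not be treated as UTF-8 unless the user
overrides the default.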
<p>Independently of the input/output encoding, <code>ne</code> keeps track of the
encoding of each buffer. <code>ne</code> does not try to select a particular
encoding for a buffer unless it is forced to do so, for instance because a
certain character is inserted. Once a buffer has a definite encoding,
however, it keeps that encoding from then on.
<p>More precisely, every buffer may be in one of three <em>encoding
modes</em>: US-ASCII, when it is entirely composed of US-ASCII characters;
8-bit, when it also contains other characters but is not UTF-8
encoded; and UTF-8, when it is UTF-8 encoded.
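<p>To make the three modes concrete, the following Python sketch classifies
a sequence of bytes by content alone. It is an illustration under stated
assumptions: the function name <code>encoding_mode</code> is hypothetical, and
the sketch does not reproduce how <code>ne</code> itself decides when to
commit a buffer to an encoding.

<pre class="example">
```python
def encoding_mode(data):
    """Classify bytes as 'US-ASCII', 'UTF-8' or '8-bit' by content."""
    if data.isascii():
        # Every byte is below 128 (an empty buffer also lands here).
        return "US-ASCII"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Non-ASCII bytes that do not form valid UTF-8 sequences.
        return "8-bit"
    return "UTF-8"


print(encoding_mode(b"plain text"))
print(encoding_mode("caff\u00e8".encode("utf-8")))
print(encoding_mode("caff\u00e8".encode("iso-8859-1")))
```
</pre>

<p>The same word is classified differently depending on how it was encoded:
the UTF-8 bytes form valid multi-byte sequences, while the ISO-8859-1
bytes do not.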
<p>The behaviour of <code>ne</code> in US-ASCII and 8-bit mode is similar to
previous versions: each byte in the buffer is considered a separate
character.
<p>There are, however, two important differences. First, if I/O is not
UTF-8 encoded, <em>any</em> encoding of the ISO-8859 family will work
flawlessly, as <code>ne</code> merely reads bytes from the keyboard and
displays bytes on the screen. With UTF-8 input/output, by contrast,
<code>ne</code> must decide which encoding is used
for non-UTF-8 buffers, and presently this is hardwired to ISO-8859-1.
Second, since version 1.34, 8-bit buffers use localized casing and
character type functions. This means that case-insensitive searches and
case foldings will work with, say, Cyrillic characters, provided that
your locale is set correctly.
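<p>The consequence of the hardwired ISO-8859-1 choice can be illustrated
with a short Python sketch. The function <code>display_bytes</code> is
hypothetical and only models the idea of interpreting an 8-bit buffer as
ISO-8859-1 before sending it to a UTF-8 terminal; it is not taken from
<code>ne</code>.

<pre class="example">
```python
def display_bytes(buffer_bytes):
    """Re-encode an 8-bit buffer for a UTF-8 terminal, assuming ISO-8859-1."""
    return buffer_bytes.decode("iso-8859-1").encode("utf-8")


# A buffer that really is ISO-8859-1: byte 0xE8 is e with grave accent.
print(display_bytes(b"\xe8").decode("utf-8"))

# A buffer written in ISO-8859-5: byte 0xD0 is the Cyrillic small letter a,
# but the ISO-8859-1 interpretation turns it into the letter Eth.
cyrillic = "\u0430".encode("iso-8859-5")
print(display_bytes(cyrillic).decode("utf-8"))
```
</pre>

<p>Under this assumption, an ISO-8859-1 buffer is shown correctly on a
UTF-8 terminal, whereas a buffer in another ISO-8859 encoding is shown
with the wrong glyphs even though its bytes are unchanged.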
<p>In UTF-8 mode, instead, <code>ne</code> interprets the bytes in the buffer in
a different way: several bytes may encode a single character. The whole
process is completely transparent to the user, but if you really want to
look at the raw buffer content, you can switch to 8-bit mode
(see <a href="UTF8.html#UTF8">UTF8</a>).
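<p>The difference between counting bytes and counting characters can be
seen in a small Python example. The helper <code>utf8_widths</code> is a
hypothetical name used only for this illustration.

<pre class="example">
```python
def utf8_widths(text):
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


sample = "na\u00efve \u20ac"          # contains i with diaeresis and the euro sign
encoded = sample.encode("utf-8")

print(len(sample))     # characters, as seen in UTF-8 mode
print(len(encoded))    # bytes, as seen in 8-bit mode
print(utf8_widths(sample))
```
</pre>

<p>The seven characters of the sample occupy ten bytes, which is what an
8-bit view of the same buffer would show as ten separate characters.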
<p>For most operations, UTF-8 support should be transparent. However, in
some cases, in particular when mixing buffers with different encodings,
<code>ne</code> will refuse to perform certain operations because of
incompatible encodings.
<p>The main limitation of UTF-8 buffers is that when searching for a
regular expression in a UTF-8 text, character sets may only contain
US-ASCII characters (see <a href="Regular-Expressions.html#Regular-Expressions">Regular Expressions</a>). You can, of
course, partially emulate a full UTF-8 character set implementation
by listing the possible alternatives separated by ‘<samp><span class="samp">|</span></samp>’, although
ranges are not available this way.
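<p>The workaround amounts to rewriting a set of characters as a list of
alternatives. The Python sketch below builds such a pattern; the helper
<code>alternation</code> is hypothetical, the surrounding parentheses are an
assumption about grouping syntax, and Python's <code>re</code> module stands
in for <code>ne</code>'s own regular expression engine purely for
demonstration.

<pre class="example">
```python
import re


def alternation(chars):
    """Build '(a|b|c)' from the characters in chars, escaping regex specials."""
    return "(" + "|".join(re.escape(ch) for ch in chars) + ")"


# a, e and i with grave accents, written as an explicit list of alternatives.
vowels = alternation("\u00e0\u00e8\u00ec")
pattern = re.compile("citt" + vowels)

print(vowels)
print(bool(pattern.search("citt\u00e0")))
print(bool(pattern.search("citt\u00e9")))
```
</pre>

<p>Each alternative has to be written out explicitly, so a range of
non-ASCII characters must be expanded by hand.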
</body></html>