<html lang="en">
<head>
<title>UTF-8 Support - ne's manual</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="ne's manual">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Reference.html#Reference" title="Reference">
<link rel="prev" href="Emergency-Save.html#Emergency-Save" title="Emergency Save">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; } 
  span.sansserif { font-family:sans-serif; font-weight:normal; } 
--></style>
</head>
<body>
<div class="node">
<a name="UTF-8-Support"></a>
<a name="UTF_002d8-Support"></a>
<p>
Previous:&nbsp;<a rel="previous" accesskey="p" href="Emergency-Save.html#Emergency-Save">Emergency Save</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Reference.html#Reference">Reference</a>
<hr>
</div>

<h3 class="section">3.11 UTF-8 Support</h3>

<p><a name="index-UTF_002d8-Support-75"></a>
Since version 1.30, <code>ne</code> can manipulate UTF-8 files and supports
UTF-8 when communicating with the user. At startup, <code>ne</code> fetches
the system locale description and checks whether it contains the string
&lsquo;<samp><span class="samp">utf8</span></samp>&rsquo; or &lsquo;<samp><span class="samp">utf-8</span></samp>&rsquo;. If it does, <code>ne</code> communicates with
the user in UTF-8. This behaviour can be overridden either with a
suitable command line option (see <a href="Arguments.html#Arguments">Arguments</a>) or with
<a href="UTF8IO.html#UTF8IO">UTF8IO</a>. UTF-8 input/output makes it possible to display and read from the
keyboard a wide range of characters.
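
   <p>The startup check described above can be sketched in a few lines of Python. This is only an illustration, not <code>ne</code>'s own code: the function name <code>locale_wants_utf8</code> is hypothetical, and matching without regard to case is an assumption.

<pre class="example">
```python
def locale_wants_utf8(locale_description):
    """Return True if the locale description mentions utf8 or utf-8.

    Hypothetical helper; the case-insensitive match is an assumption.
    """
    lowered = locale_description.lower()
    return "utf8" in lowered or "utf-8" in lowered


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "it_IT.utf8", "ru_RU.ISO-8859-5", "C"):
        print(name, locale_wants_utf8(name))
```
</pre>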

   <p>Independently of the input/output encoding, <code>ne</code> keeps track of the
encoding of each buffer. <code>ne</code> does not try to assign a particular
encoding to a buffer unless it is forced to do so, for instance because a
certain character is inserted. Once a buffer has a definite encoding,
however, it keeps it forever.

   <p>More precisely, every buffer may be in one of three <em>encoding
modes</em>: US-ASCII, when it is entirely composed of US-ASCII characters;
8-bit, when it also contains other characters but is not UTF-8
encoded; and UTF-8, when it is UTF-8 encoded.
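
   <p>As a rough illustration of the three definitions, the Python sketch below classifies a finished byte string. It is not how <code>ne</code> decides: <code>ne</code> settles on an encoding only when forced to, whereas the hypothetical function <code>encoding_mode</code> simply applies the definitions to whatever bytes it is given.

<pre class="example">
```python
def encoding_mode(data):
    """Classify a byte string as 'US-ASCII', 'UTF-8' or '8-bit'.

    Hypothetical helper that mirrors the three definitions in the text.
    """
    if data.isascii():
        return "US-ASCII"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "8-bit"
    return "UTF-8"


if __name__ == "__main__":
    print(encoding_mode(b"plain text"))
    print(encoding_mode("caff\u00e8".encode("utf-8")))
    print(encoding_mode("caff\u00e8".encode("iso-8859-1")))
```
</pre>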

   <p>The behaviour of <code>ne</code> in US-ASCII and 8-bit mode is similar to
previous versions: each byte in the buffer is considered a separate
character.

   <p>There are, however, two important differences. First, if I/O is not
UTF-8 encoded, <em>any</em> encoding of the ISO-8859 family will work
flawlessly, as <code>ne</code> merely reads bytes from the keyboard and
displays bytes on the screen. By contrast, with UTF-8
input/output <code>ne</code> must decide which encoding is used
for non-UTF-8 buffers, and presently this is hardwired to ISO-8859-1.
Second, since version 1.34, 8-bit buffers use localized casing and
character type functions. This means that case-insensitive searches and
case foldings will work with, say, Cyrillic characters, provided that
your locale is set correctly.
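
   <p>To see why a choice of encoding is needed once I/O is UTF-8, consider how a single byte from an 8-bit buffer reads under two members of the ISO-8859 family. The sketch below uses Python's standard codecs purely as an illustration; it says nothing about how <code>ne</code> performs the conversion internally.

<pre class="example">
```python
raw = bytes([0xE9])  # one byte taken from an 8-bit buffer

# The same byte denotes different characters in different encodings.
as_latin1 = raw.decode("iso-8859-1")    # U+00E9, e with acute accent
as_cyrillic = raw.decode("iso-8859-5")  # U+0449, Cyrillic letter shcha

print(hex(ord(as_latin1)), hex(ord(as_cyrillic)))
```
</pre>

   <p>With UTF-8 input/output, the first interpretation is the one that applies, since the choice is hardwired to ISO-8859-1.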

   <p>In UTF-8 mode, <code>ne</code> interprets the bytes in the buffer in
a different way&mdash;several bytes may encode a single character. The whole
process is completely transparent to the user, but if you really want to
look at the raw bytes of the buffer, you can switch to 8-bit mode
(see <a href="UTF8.html#UTF8">UTF8</a>).
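
   <p>The difference between counting characters and counting bytes can be shown with a short Python sketch. The sample word is arbitrary, and the code illustrates UTF-8 itself rather than anything specific to <code>ne</code>.

<pre class="example">
```python
text = "na\u00efve"  # five characters; U+00EF is i with diaeresis
encoded = text.encode("utf-8")

print(len(text))     # number of characters
print(len(encoded))  # number of bytes; U+00EF takes two of them
print([hex(b) for b in encoded])
```
</pre>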

   <p>For most operations, UTF-8 support should be transparent. However, in
some cases, in particular when mixing buffers with different encodings,
<code>ne</code> will refuse to perform certain operations because of
incompatible encodings.

   <p>The main limitation of UTF-8 buffers is that when searching for a
regular expression in a UTF-8 text, character sets may only contain
US-ASCII characters (see <a href="Regular-Expressions.html#Regular-Expressions">Regular Expressions</a>). You can, of
course, partially emulate a full UTF-8 character set implementation
by specifying the possible alternatives using &lsquo;<samp><span class="samp">|</span></samp>&rsquo;, although
ranges are not available.
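
   <p>The workaround amounts to replacing a character set with a group of alternatives. The Python sketch below builds such a group and checks it with Python's <code>re</code> module, which stands in here only to show the equivalence; the regular expression syntax accepted by <code>ne</code> may differ, and the helper <code>alternation</code> is hypothetical.

<pre class="example">
```python
import re


def alternation(chars):
    """Build a group such as '(a|b|c)' from single characters."""
    return "(" + "|".join(re.escape(c) for c in chars) + ")"


# Stands in for a character set holding three accented vowels.
pattern = "c" + alternation("\u00e0\u00e8\u00ec") + "t"

if __name__ == "__main__":
    print(bool(re.fullmatch(pattern, "c\u00e8t")))
    print(bool(re.fullmatch(pattern, "cat")))
```
</pre>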

   </body></html>