@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1998 Free Software Foundation, Inc. 
@c See the file elisp.texi for copying conditions.
@setfilename ../info/characters
@node Non-ASCII Characters, Searching and Matching, Text, Top
@chapter Non-ASCII Characters
@cindex multibyte characters
@cindex non-ASCII characters

  This chapter covers the special issues relating to non-@sc{ASCII}
characters and how they are stored in strings and buffers.

@menu
* Text Representations::
* Converting Representations::
* Selecting a Representation::
* Character Codes::
* Character Sets::
* Chars and Bytes::
* Splitting Characters::
* Scanning Charsets::
* Translation of Characters::
* Coding Systems::
* Input Methods::
@end menu

@node Text Representations
@section Text Representations
@cindex text representations

  Emacs has two @dfn{text representations}, that is, two ways to
represent text in a string or buffer.  These are called @dfn{unibyte}
and @dfn{multibyte}.  Each string, and each buffer, uses one of these
two representations.  For most purposes, you can ignore the issue of
representations, because Emacs converts text between them as
appropriate.  Occasionally in Lisp programming you will need to pay
attention to the difference.

@cindex unibyte text
  In unibyte representation, each character occupies one byte and
therefore the possible character codes range from 0 to 255.  Codes 0
through 127 are @sc{ASCII} characters; the codes from 128 through 255
are used for one non-@sc{ASCII} character set (you can choose which
character set by setting the variable @code{nonascii-insert-offset}).

@cindex leading code
@cindex multibyte text
@cindex trailing codes
  In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored.  The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237).  These values are called
@dfn{leading codes}.  The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are @dfn{trailing codes}.

  In a buffer, the buffer-local value of the variable
@code{enable-multibyte-characters} specifies the representation used.
The representation for a string is determined based on the string
contents when the string is constructed.

@defvar enable-multibyte-characters
@tindex enable-multibyte-characters
This variable specifies the current buffer's text representation.
If it is non-@code{nil}, the buffer contains multibyte text; otherwise,
it contains unibyte text.

You cannot set this variable directly; instead, use the function
@code{set-buffer-multibyte} to change a buffer's representation.
@end defvar

@defvar default-enable-multibyte-characters
@tindex default-enable-multibyte-characters
This variable's value is entirely equivalent to @code{(default-value
'enable-multibyte-characters)}, and setting this variable changes that
default value.  Setting the local binding of
@code{enable-multibyte-characters} in a specific buffer is not allowed,
but changing the default value is supported, and it is a reasonable
thing to do, because it has no effect on existing buffers.

The @samp{--unibyte} command line option does its job by setting the
default value to @code{nil} early in startup.
@end defvar

@defun multibyte-string-p string
@tindex multibyte-string-p
Return @code{t} if @var{string} contains multibyte characters.
@end defun
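
  As a minimal sketch of how these pieces fit together, the following
expressions inspect the representation of the current buffer and of a
string.  The first two values depend on the session and are therefore
not shown; the string @code{"abc"} is assumed here to be unibyte,
because it contains only @sc{ASCII} characters.

@example
;; Non-nil means the current buffer holds multibyte text.
enable-multibyte-characters
;; The default used for buffers created from now on.
(default-value 'enable-multibyte-characters)
;; An all-ASCII string literal, assumed to be unibyte.
(multibyte-string-p "abc")
     @result{} nil
@end example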

@node Converting Representations
@section Converting Text Representations

  Emacs can convert unibyte text to multibyte; it can also convert
multibyte text to unibyte, though this conversion loses information.  In
general these conversions happen when inserting text into a buffer, or
when putting text from several strings together in one string.  You can
also explicitly convert a string's contents to either representation.

  Emacs chooses the representation for a string based on the text that
it is constructed from.  The general rule is to convert unibyte text to
multibyte text when combining it with other multibyte text, because the
multibyte representation is more general and can hold whatever
characters the unibyte text has.

  When inserting text into a buffer, Emacs converts the text to the
buffer's representation, as specified by
@code{enable-multibyte-characters} in that buffer.  In particular, when
you insert multibyte text into a unibyte buffer, Emacs converts the text
to unibyte, even though this conversion cannot in general preserve all
the characters that might be in the multibyte text.  The other natural
alternative, to convert the buffer contents to multibyte, is not
acceptable because the buffer's representation is a choice made by the
user that cannot be overridden automatically.

  Converting unibyte text to multibyte text leaves @sc{ASCII} characters
unchanged, and likewise 128 through 159.  It converts the non-@sc{ASCII}
codes 160 through 255 by adding the value @code{nonascii-insert-offset}
to each character code.  By setting this variable, you specify which
character set the unibyte characters correspond to (@pxref{Character
Sets}).  For example, if @code{nonascii-insert-offset} is 2048, which is
@code{(- (make-char 'latin-iso8859-1) 128)}, then the unibyte
non-@sc{ASCII} characters correspond to Latin 1.  If it is 2688, which
is @code{(- (make-char 'greek-iso8859-7) 128)}, then they correspond to
Greek letters.

  Converting multibyte text to unibyte is simpler: it performs
logical-and of each character code with 255.  If
@code{nonascii-insert-offset} has a reasonable value, corresponding to
the beginning of some character set, this conversion is the inverse of
the other: converting unibyte text to multibyte and back to unibyte
reproduces the original unibyte text.
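
  The arithmetic described above can be sketched with ordinary integer
operations.  This is only an illustration of the rule, not the
conversion code itself; it assumes the Latin 1 offset of 2048 quoted
above and uses the unibyte code 200 as an arbitrary sample.

@example
;; Unibyte to multibyte: add the offset to a code in 160..255.
(+ 200 2048)
     @result{} 2248
;; Multibyte to unibyte: logical-and of the character code with 255.
(logand 2248 255)
     @result{} 200
@end example

@noindent
For this offset the round trip returns the original code, which is the
inverse property stated above.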

@defvar nonascii-insert-offset
@tindex nonascii-insert-offset
This variable specifies the amount to add to a non-@sc{ASCII} character
when converting unibyte text to multibyte.  It also applies when
@code{self-insert-command} inserts a character in the unibyte
non-@sc{ASCII} range, 128 through 255.  However, the function
@code{insert-char} does not perform this conversion.

The right value to use to select character set @var{cs} is @code{(-
(make-char @var{cs}) 128)}.  If the value of
@code{nonascii-insert-offset} is zero, then conversion actually uses the
value for the Latin 1 character set, rather than zero.
@end defvar

@defvar nonascii-translation-table
@tindex nonascii-translation-table
This variable provides a more general alternative to
@code{nonascii-insert-offset}.  You can use it to specify independently
how to translate each code in the range of 128 through 255 into a
multibyte character.  The value should be a vector, or @code{nil}.
If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
@end defvar

@defun string-make-unibyte string
@tindex string-make-unibyte
This function converts the text of @var{string} to unibyte
representation, if it isn't already, and returns the result.  If
@var{string} is a unibyte string, it is returned unchanged.
@end defun

@defun string-make-multibyte string
@tindex string-make-multibyte
This function converts the text of @var{string} to multibyte
representation, if it isn't already, and returns the result.  If
@var{string} is a multibyte string, it is returned unchanged.
@end defun
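
  A minimal sketch of the explicit conversions, relying only on the
fact that @sc{ASCII} characters are left unchanged in both directions;
the sample string is an arbitrary choice:

@example
(string-make-unibyte (string-make-multibyte "abc"))
     @result{} "abc"
@end example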

@node Selecting a Representation
@section Selecting a Representation

  Sometimes it is useful to examine an existing buffer or string as
multibyte when it was unibyte, or vice versa.

@defun set-buffer-multibyte multibyte
@tindex set-buffer-multibyte
Set the representation type of the current buffer.  If @var{multibyte}
is non-@code{nil}, the buffer becomes multibyte.  If @var{multibyte}
is @code{nil}, the buffer becomes unibyte.

This function leaves the buffer contents unchanged when viewed as a
sequence of bytes.  As a consequence, it can change the contents viewed
as characters; a sequence of two bytes which is treated as one character
in multibyte representation will count as two characters in unibyte
representation.

This function sets @code{enable-multibyte-characters} to record which
representation is in use.  It also adjusts various data in the buffer
(including overlays, text properties and markers) so that they cover the
same text as they did before.
@end defun
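
  As a hedged illustration, the following switches a scratch buffer to
unibyte and reads back the variable that records the choice.  It assumes
that @code{with-temp-buffer} is available, so that no existing buffer is
disturbed.

@example
(with-temp-buffer
  ;; Make the temporary buffer unibyte.
  (set-buffer-multibyte nil)
  ;; The buffer-local value now records that choice.
  enable-multibyte-characters)
     @result{} nil
@end example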

@defun string-as-unibyte string
@tindex string-as-unibyte
This function returns a string with the same bytes as @var{string} but
treating each byte as a character.  This means that the value may have
more characters than @var{string} has.

If @var{string} is unibyte already, then the value is @var{string}
itself.
@end defun

@defun string-as-multibyte string
@tindex string-as-multibyte
This function returns a string with the same bytes as @var{string} but
treating each multibyte sequence as one character.  This means that the
value may have fewer characters than @var{string} has.

If @var{string} is multibyte already, then the value is @var{string}
itself.
@end defun
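
  The ``value is @var{string} itself'' case can be checked with
@code{eq}.  This sketch assumes that the literal @code{"abc"}, which
holds only @sc{ASCII} characters, is a unibyte string:

@example
(let ((s "abc"))
  ;; No copy is made when the string is already unibyte.
  (eq (string-as-unibyte s) s))
     @result{} t
@end example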

@node Character Codes
@section Character Codes
@cindex character codes

  The unibyte and multibyte text representations use different character
codes.  The valid character codes for unibyte representation range from
0 to 255, the values that can fit in one byte.  The valid character
codes for multibyte representation range from 0 to 524287, but not all
values in that range are valid.  In particular, the values 128 through
255 are not legitimate in multibyte text (though they can occur in ``raw
bytes''; @pxref{Explicit Encoding}).  Only the @sc{ASCII} codes 0
through 127 are fully legitimate in both representations.

@defun char-valid-p charcode
This returns @code{t} if @var{charcode} is valid for either one of the two
text representations.

@example
(char-valid-p 65)
     @result{} t
(char-valid-p 256)
     @result{} nil
(char-valid-p 2248)
     @result{} t
@end example
@end defun

@node Character Sets
@section Character Sets
@cindex character sets

  Emacs classifies characters into various @dfn{character sets}, each of
which has a name which is a symbol.  Each character belongs to one and
only one character set.

  In general, there is one character set for each distinct script.  For
example, @code{latin-iso8859-1} is one character set,
@code{greek-iso8859-7} is another, and @code{ascii} is another.  An
Emacs character set can hold at most 9025 characters; therefore, in some
cases, characters that would logically be grouped together are split
into several character sets.  For example, one set of Chinese
characters, generally known as Big 5, is divided into two Emacs
character sets, @code{chinese-big5-1} and @code{chinese-big5-2}.

@defun charsetp object
@tindex charsetp
Return @code{t} if @var{object} is a character set name symbol,
@code{nil} otherwise.
@end defun

@defun charset-list
@tindex charset-list
This function returns a list of all defined character set names.
@end defun

@defun char-charset character
@tindex char-charset
This function returns the name of the character
set that @var{character} belongs to.
@end defun
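
  A short sketch of these functions, using the character codes 65 and
2248 that serve as samples elsewhere in this chapter.  The results
assume the same character numbering as those other examples, in which
2248 belongs to @code{latin-iso8859-1}.

@example
(charsetp 'ascii)
     @result{} t
(char-charset 65)
     @result{} ascii
(char-charset 2248)
     @result{} latin-iso8859-1
@end example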

@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters

@cindex introduction sequence
@cindex dimension (of character set)
  In multibyte representation, each character occupies one or more
bytes.  Each character set has an @dfn{introduction sequence}, which is
normally one or two bytes long.  (Exception: the @sc{ASCII} character
set has a zero-length introduction sequence.)  The introduction sequence
is the beginning of the byte sequence for any character in the character
set.  The rest of the character's bytes distinguish it from the other
characters in the same character set.  Depending on the character set,
there are either one or two distinguishing bytes; the number of such
bytes is called the @dfn{dimension} of the character set.

@defun charset-dimension charset
@tindex charset-dimension
This function returns the dimension of @var{charset};
at present, the dimension is always 1 or 2.
@end defun

  This is the simplest way to determine the byte length of a character
set's introduction sequence:

@example
(- (char-bytes (make-char @var{charset}))
   (charset-dimension @var{charset}))
@end example
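
  As a worked instance, take the Latin 1 character 2248 used as a sample
in this chapter.  The sketch assumes the values shown for it elsewhere:
it occupies two bytes, and it is identified within its character set by
a single byte value, so the dimension of the set is 1.

@example
(- (char-bytes 2248)
   (charset-dimension 'latin-iso8859-1))
     @result{} 1
@end example

@noindent
That is, the introduction sequence of @code{latin-iso8859-1} is one byte
long under these assumptions.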

@node Splitting Characters
@section Splitting Characters

  The functions in this section convert between characters and the byte
values used to represent them.  For most purposes, there is no need to
be concerned with the sequence of bytes used to represent a character,
because Emacs translates automatically when necessary.

@defun char-bytes character
@tindex char-bytes
This function returns the number of bytes used to represent the
character @var{character}.  This depends only on the character set that
@var{character} belongs to; it equals the dimension of that character
set (@pxref{Chars and Bytes}), plus the length of its introduction
sequence.

@example
(char-bytes 2248)
     @result{} 2
(char-bytes 65)
     @result{} 1
(char-bytes 192)
     @result{} 1
@end example

The reason this function can give correct results for both multibyte and
unibyte representations is that the non-@sc{ASCII} character codes used
in those two representations do not overlap.
@end defun

@defun split-char character
@tindex split-char
Return a list containing the name of the character set of
@var{character}, followed by one or two byte values (integers) which
identify @var{character} within that character set.  The number of byte
values is the character set's dimension.

@example
(split-char 2248)
     @result{} (latin-iso8859-1 72)
(split-char 65)
     @result{} (ascii 65)
@end example

Unibyte non-@sc{ASCII} characters are considered as part of
the @code{ascii} character set:

@example
(split-char 192)
     @result{} (ascii 192)
@end example
@end defun

@defun make-char charset &rest byte-values
@tindex make-char
This function returns the character in character set @var{charset}
identified by @var{byte-values}.  This is roughly the inverse of
@code{split-char}.  Normally, you should specify either one or two
@var{byte-values}, according to the dimension of @var{charset}.  For
example,

@example
(make-char 'latin-iso8859-1 72)
     @result{} 2248
@end example
@end defun
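
  The inverse relationship between @code{split-char} and
@code{make-char} can be sketched by feeding the list returned by one
directly to the other.  The result assumes the values shown in the
examples above for character 2248.

@example
;; (split-char 2248) is (latin-iso8859-1 72), which is exactly
;; the argument list that make-char expects.
(apply 'make-char (split-char 2248))
     @result{} 2248
@end example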

@cindex generic characters
  If you call @code{make-char} with no @var{byte-values}, the result is
a @dfn{generic character} which stands for @var{charset}.  A generic
character is an integer, but it is @emph{not} valid for insertion in the
buffer as a character.  It can be used in @code{char-table-range} to
refer to the whole character set (@pxref{Char-Tables}).
@code{char-valid-p} returns @code{nil} for generic characters.
For example:

@example
(make-char 'latin-iso8859-1)
     @result{} 2176
(char-valid-p 2176)
     @result{} nil
(split-char 2176)
     @result{} (latin-iso8859-1 0)
@end example

@node Scanning Charsets
@section Scanning for Character Sets

  Sometimes it is useful to find out which character sets appear in a
part of a buffer or a string.  One use for this is in determining which
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.

@defun find-charset-region beg end &optional translation
@tindex find-charset-region
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.

The optional argument @var{translation} specifies a translation table to
be used in scanning the text (@pxref{Translation of Characters}).  If it
is non-@code{nil}, then each character in the region is translated
through this table, and the value returned describes the translated
characters instead of the characters actually in the buffer.
@end defun

@defun find-charset-string string &optional translation
@tindex find-charset-string
This function returns a list of the character sets
that appear in the string @var{string}.

The optional argument @var{translation} specifies a
translation table; see @code{find-charset-region}, above.
@end defun
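
  One way to use the returned list is to test for a particular character
set with @code{memq}.  The helper below is a hypothetical convenience
written for this illustration; its name, @code{my-string-uses-charset-p},
is not part of Emacs.

@example
;; Hypothetical helper, not an Emacs function.
(defun my-string-uses-charset-p (string charset)
  "Return non-nil if STRING has a character from CHARSET."
  (memq charset (find-charset-string string)))

;; Usage: does the string contain any ASCII characters?
(my-string-uses-charset-p "abc" 'ascii)
@end example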

@node Translation of Characters
@section Translation of Characters
@cindex character translation tables
@cindex translation tables

  A @dfn{translation table} specifies a mapping of characters
into characters.  These tables are used in encoding and decoding, and
for other purposes.  Some coding systems specify their own particular
translation tables; there are also default translation tables which
apply to all other coding systems.

@defun make-translation-table translations
This function returns a translation table based on the argument
@var{translations}.  Each element of @var{translations} should be a
list of the form @code{(@var{from} . @var{to})}; this says to translate
the character @var{from} into @var{to}.

You can also map one whole character set into another character set with
the same dimension.  To do this, you specify a generic character (which
designates a character set) for @var{from} (@pxref{Splitting Characters}).
In this case, @var{to} should also be a generic character, for another
character set of the same dimension.  Then the translation table
translates each character of @var{from}'s character set into the
corresponding character of @var{to}'s character set.
@end defun

  In decoding, the translation table's translations are applied to the
characters that result from ordinary decoding.  If a coding system has
the property @code{character-translation-table-for-decode}, that
specifies the translation table to use.  Otherwise, if
@code{standard-character-translation-table-for-decode} is
non-@code{nil}, decoding uses that table.

  In encoding, the translation table's translations are applied to the
characters in the buffer, and the result of the translation is what
gets encoded.  If a coding system has the property
@code{character-translation-table-for-encode}, that specifies the
translation table to use.  Otherwise the variable
@code{standard-character-translation-table-for-encode} specifies the
translation table.

@defvar standard-character-translation-table-for-decode
This is the default translation table for decoding, for
coding systems that don't specify any other translation table.
@end defvar

@defvar standard-character-translation-table-for-encode
This is the default translation table for encoding, for
coding systems that don't specify any other translation table.
@end defvar

@node Coding Systems
@section Coding Systems

@cindex coding system
  When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified
by a particular @dfn{coding system}.

@menu
* Coding System Basics::
* Encoding and I/O::
* Lisp and Coding Systems::
* User-Chosen Coding Systems::
* Default Coding Systems::
* Specifying Coding Systems::
* Explicit Encoding::
* Terminal I/O Encoding::
* MS-DOS File Types::
@end menu

@node Coding System Basics
@subsection Basic Concepts of Coding Systems

@cindex character code conversion
  @dfn{Character code conversion} involves conversion between the encoding
used inside Emacs and some other encoding.  Emacs supports many
different encodings, in that it can convert to and from them.  For
example, it can convert text to or from encodings such as Latin 1, Latin
2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022.  In some
cases, Emacs supports several alternative encodings for the same
characters; for example, there are three coding systems for the Cyrillic
(Russian) alphabet: ISO, Alternativnyj, and KOI8.

  Most coding systems specify a particular character code for
conversion, but some of them leave this unspecified, to be chosen
heuristically based on the data.

@cindex end of line conversion
  @dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files.  The Unix
convention is to use the linefeed character (also called newline).  The
DOS convention is to use the two character sequence, carriage-return
linefeed, at the end of a line.  The Mac convention is to use just
carriage-return.

@cindex base coding system
@cindex variant coding system
  @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line
conversion unspecified, to be chosen based on the data.  @dfn{Variant
coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and
@code{latin-1-mac} specify the end-of-line conversion explicitly as
well.  Most base coding systems have three corresponding variants whose
names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}.

  The coding system @code{raw-text} is special in that it prevents
character code conversion, and causes the buffer visited with that
coding system to be a unibyte buffer.  It does not specify the
end-of-line conversion, allowing that to be determined as usual by the
data, and has the usual three variants which specify the end-of-line
conversion.  @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.

  The coding system @code{emacs-mule} specifies that the data is
represented in the internal Emacs encoding.  This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.

@defun coding-system-get coding-system property
@tindex coding-system-get
This function returns the specified property of the coding system
@var{coding-system}.  Most coding system properties exist for internal
purposes, but one that you might find useful is @code{mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write.  Examples:

@example
(coding-system-get 'iso-latin-1 'mime-charset)
     @result{} iso-8859-1
(coding-system-get 'iso-2022-cn 'mime-charset)
     @result{} iso-2022-cn
(coding-system-get 'cyrillic-koi8 'mime-charset)
     @result{} koi8-r
@end example

The value of the @code{mime-charset} property is also defined
as an alias for the coding system.
@end defun

@node Encoding and I/O
@subsection Encoding and I/O

  The principal purpose of coding systems is for use in reading and
writing files.  The function @code{insert-file-contents} uses
a coding system for decoding the file data, and @code{write-region}
uses one to encode the buffer contents.

  You can specify the coding system to use either explicitly
(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
mechanism (@pxref{Default Coding Systems}).  But these methods may not
completely specify what to do.  For example, they may choose a coding
system such as @code{undecided} which leaves the character code
conversion to be determined from the data.  In these cases, the I/O
operation finishes the job of choosing a coding system.  Very often
you will want to find out afterwards which coding system was chosen.

@defvar buffer-file-coding-system
@tindex buffer-file-coding-system
This variable records the coding system that was used for visiting the
current buffer.  It is used for saving the buffer, and for writing part
of the buffer with @code{write-region}.  When those operations ask the
user to specify a different coding system,
@code{buffer-file-coding-system} is updated to the coding system
specified.
@end defvar

@defvar save-buffer-coding-system
@tindex save-buffer-coding-system
This variable specifies the coding system for saving the buffer, but it
is not used for @code{write-region}.  When saving the buffer asks the
user to specify a different coding system, and
@code{save-buffer-coding-system} was used, then it is updated to the
coding system that was specified.
@end defvar

@defvar last-coding-system-used
@tindex last-coding-system-used
I/O operations for files and subprocesses set this variable to the
coding system name that was used.  The explicit encoding and decoding
functions (@pxref{Explicit Encoding}) set it too.

@strong{Warning:} Since receiving subprocess output sets this variable,
it can change whenever Emacs waits; therefore, you should copy the
value shortly after the function call which stores the value you are
interested in.
@end defvar
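
The warning above suggests a simple pattern: read the variable
immediately after the call you care about.  Here is a minimal sketch of
that pattern; the function name @code{my-insert-file-noting-coding} is
hypothetical and not part of Emacs.

@example
(defun my-insert-file-noting-coding (file)
  "Insert FILE at point; return the coding system used to decode it."
  (insert-file-contents file)
  ;; Copy the value at once, before Emacs gets a chance to wait
  ;; and subprocess output overwrites it.
  (let ((coding last-coding-system-used))
    coding))
@end example

The returned symbol can then be stored or compared at leisure, since
it no longer depends on the variable.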

  The variable @code{selection-coding-system} specifies how to encode
selections for the window system.  @xref{Window System Selections}.

@node Lisp and Coding Systems
@subsection Coding Systems in Lisp

  Here are Lisp facilities for working with coding systems:

@defun coding-system-list &optional base-only
@tindex coding-system-list
This function returns a list of all coding system names (symbols).  If
@var{base-only} is non-@code{nil}, the value includes only the
base coding systems.  Otherwise, it includes variant coding systems as well.
@end defun

@defun coding-system-p object
@tindex coding-system-p
This function returns @code{t} if @var{object} is a coding system
name.
@end defun

@defun check-coding-system coding-system
@tindex check-coding-system
This function checks the validity of @var{coding-system}.
If that is valid, it returns @var{coding-system}.
Otherwise it signals an error with condition @code{coding-system-error}.
@end defun
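
As an illustration of the two validity checks above, the following
sketch tests a symbol with @code{coding-system-p} and traps the error
from @code{check-coding-system}.  The function name and the symbol
@code{my-no-such-coding} are made up for the example, on the assumption
that no coding system of that name is defined.

@example
(defun my-valid-coding-or-nil (name)
  "Return NAME if it is a valid coding system, else nil."
  (condition-case nil
      (check-coding-system name)
    ;; Signaled when NAME is not a coding system.
    (coding-system-error nil)))

;; Both forms should yield nil for an undefined name.
(coding-system-p 'my-no-such-coding)
(my-valid-coding-or-nil 'my-no-such-coding)
@end example

Trapping the error this way is convenient when the name comes from
user input or from a file and may be misspelled.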

@defun coding-system-change-eol-conversion coding-system eol-type
@tindex coding-system-change-eol-conversion
This function returns a coding system which is like @var{coding-system}
except for its eol conversion, which is specified by @var{eol-type}.
@var{eol-type} should be @code{unix}, @code{dos}, @code{mac}, or
@code{nil}.  If it is @code{nil}, the returned coding system determines
the end-of-line conversion from the data.
@end defun

@defun coding-system-change-text-conversion eol-coding text-coding
@tindex coding-system-change-text-conversion
This function returns a coding system which uses the end-of-line
conversion of @var{eol-coding}, and the text conversion of
@var{text-coding}.  If @var{text-coding} is @code{nil}, it returns
@code{undecided}, or one of its variants according to @var{eol-coding}.
@end defun
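
To make the relationship between a base coding system and its
variants concrete, here is a small sketch combining the two functions
above.  It assumes that @code{iso-latin-1} and @code{iso-latin-2} are
defined together with their usual @samp{-unix}, @samp{-dos} and
@samp{-mac} variants; the variable names are hypothetical.

@example
;; Ask for the DOS end-of-line variant of a base coding system.
(defvar my-latin-1-dos
  (coding-system-change-eol-conversion 'iso-latin-1 'dos))

;; Keep that end-of-line convention, but switch the text
;; conversion to Latin 2.
(defvar my-latin-2-dos
  (coding-system-change-text-conversion my-latin-1-dos 'iso-latin-2))
@end example

Under those assumptions, the first variable should name
@code{iso-latin-1-dos} and the second @code{iso-latin-2-dos}.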

@defun find-coding-systems-region from to
@tindex find-coding-systems-region
This function returns a list of coding systems that could be used to
encode the text between @var{from} and @var{to}.  All coding systems in
the list can safely encode any multibyte characters in that portion of
the text.

If the text contains no multibyte characters, the function returns the
list @code{(undecided)}.
@end defun

@defun find-coding-systems-string string
@tindex find-coding-systems-string
This function returns a list of coding systems that could be used to
encode the text of @var{string}.  All coding systems in the list can
safely encode any multibyte characters in @var{string}.  If the text
contains no multibyte characters, this returns the list
@code{(undecided)}.
@end defun
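
One possible use of this list is to check whether a given coding
system is a safe choice for a string.  The sketch below is only an
illustration: the function name is hypothetical, and it assumes that
@var{coding} is given under the same name that appears in the returned
list.

@example
(defun my-string-encodable-p (string coding)
  "Return non-nil if CODING appears able to encode STRING."
  (let ((candidates (find-coding-systems-string string)))
    ;; The list (undecided) means the text places no restriction
    ;; on the choice, so any coding system will do.
    (or (equal candidates '(undecided))
        (memq coding candidates))))
@end example

The same test works for buffer text if you call
@code{find-coding-systems-region} instead.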

@defun find-coding-systems-for-charsets charsets
@tindex find-coding-systems-for-charsets
This function returns a list of coding systems that could be used to
encode all the character sets in the list @var{charsets}.
@end defun

@defun detect-coding-region start end &optional highest
@tindex detect-coding-region
This function chooses a plausible coding system for decoding the text
from @var{start} to @var{end}.  This text should be ``raw bytes''
(@pxref{Explicit Encoding}).

Normally this function returns a list of coding systems that could
handle decoding the text that was scanned.  They are listed in order of
decreasing priority.  But if @var{highest} is non-@code{nil}, then the
return value is just one coding system, the one that is highest in
priority.

If the region contains only @sc{ASCII} characters, the value
is @code{undecided} or @code{(undecided)}.
@end defun

@defun detect-coding-string string &optional highest
@tindex detect-coding-string
This function is like @code{detect-coding-region} except that it
operates on the contents of @var{string} instead of bytes in the buffer.
@end defun
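
The difference between the list result and the single result can be
seen in a short sketch.  The wrapper name @code{my-guess-coding} is
hypothetical; it simply passes a non-@code{nil} second argument.

@example
(defun my-guess-coding (string)
  "Return the most plausible coding system for decoding STRING."
  ;; A non-nil second argument asks for one coding system
  ;; instead of a list in order of decreasing priority.
  (detect-coding-string string t))

;; Without the second argument, the value is a list.
(detect-coding-string "plain text")
@end example

Since detection is a heuristic, treat the result as a guess rather
than a guarantee, especially for short strings.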

  @xref{Process Information}, for how to examine or set the coding
systems used for I/O to a subprocess.

@node User-Chosen Coding Systems
@subsection User-Chosen Coding Systems

@tindex select-safe-coding-system
@defun select-safe-coding-system from to &optional preferred-coding-system
This function selects a coding system for encoding the text between
@var{from} and @var{to}, asking the user to choose if necessary.

The optional argument @var{preferred-coding-system} specifies a coding
system to try first.  If that one can handle the text in the specified
region, then it is used.  If this argument is omitted, the current
buffer's value of @code{buffer-file-coding-system} is tried first.

If the region contains some multibyte characters that the preferred
coding system cannot encode, this function asks the user to choose from
a list of coding systems which can encode the text, and returns the
user's choice.

One other kludgy feature: if @var{from} is a string, the string is the
target text, and @var{to} is ignored.
@end defun
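
A typical call covers the whole accessible portion of the buffer.  The
following sketch shows that call; the function name is hypothetical,
and whether the user is actually prompted depends on the text and on
the coding system tried first.

@example
(defun my-coding-for-whole-buffer ()
  "Return a coding system able to encode the current buffer."
  ;; No preferred coding system is given, so the buffer's own
  ;; buffer-file-coding-system is tried first.
  (select-safe-coding-system (point-min) (point-max)))
@end example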

  Here are two functions you can use to let the user specify a coding
system, with completion.  @xref{Completion}.

@defun read-coding-system prompt &optional default
@tindex read-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol.  If
the user enters null input, @var{default} specifies which coding system
to return.  It should be a symbol or a string.
@end defun

@defun read-non-nil-coding-system prompt
@tindex read-non-nil-coding-system
This function reads a coding system using the minibuffer, prompting with
string @var{prompt}, and returns the coding system name as a symbol.  If
the user tries to enter null input, it asks the user to try again.
@xref{Coding Systems}.
@end defun
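
These functions fit naturally into an @code{interactive} form.  The
command below is a hypothetical sketch, not part of Emacs: it reads a
coding system with completion, falls back to @code{iso-latin-1} on
empty input, and reports the @code{mime-charset} property.

@example
(defun my-show-coding-mime-name (coding)
  "Read a coding system and display its MIME charset, if any."
  (interactive
   (list (read-coding-system
          "Coding system (default iso-latin-1): "
          'iso-latin-1)))
  (message "%s: %s"
           coding
           (coding-system-get coding 'mime-charset)))
@end example

Use @code{read-non-nil-coding-system} in place of
@code{read-coding-system} when empty input should not be accepted at all.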

@node Default Coding Systems
@subsection Default Coding Systems

  This section describes variables that specify the default coding
system for certain files or when running certain subprograms, and the
function that I/O operations use to access them.

  The idea of these variables is that you set them once and for all to the
defaults you want, and then do not change them again.  To specify a
particular coding system for a particular operation in a Lisp program,
don't change these variables; instead, override them using
@code{coding-system-for-read} and @code{coding-system-for-write}
(@pxref{Specifying Coding Systems}).
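
A minimal sketch of such an override follows.  The function name is
hypothetical, and the choice of @code{iso-latin-1-unix} is only an
example.

@example
(defun my-insert-file-as-latin-1 (file)
  "Insert FILE at point, decoding it as Latin 1 with Unix line ends."
  ;; The binding lasts only for this call; the default variables
  ;; described in this section are left untouched.
  (let ((coding-system-for-read 'iso-latin-1-unix))
    (insert-file-contents file)))
@end example

Because the override is a @code{let} binding, it disappears as soon as
the function returns, so other I/O operations keep using the defaults.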

@defvar file-coding-system-alist
@tindex file-coding-system-alist
This variable is an alist that specifies the coding systems to use for
reading and writing particular files.  Each element has the form
@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular
expression that matches certain file names.  The element applies to file
names that match @var{pattern}.

The @sc{cdr} of the element, @var{coding}, should be either a coding
system, a cons cell containing two coding systems, or a function symbol.
If @var{coding} is a coding system, that coding system is used for both
reading the file and writing it.  If @var{coding} is a cons cell containing
two coding systems, its @sc{car} specifies the coding system for
decoding, and its @sc{cdr} specifies the coding system for encoding.

If @var{coding} is a function symbol, the function must return a coding
system or a cons cell containing two coding systems.  This value is used
as described above.
@end defvar

@defvar process-coding-system-alist
@tindex process-coding-system-alist
This variable is an alist specifying which coding systems to use for a
subprocess, depending on which program is running in the subprocess.  It
works like @code{file-coding-system-alist}, except that @var{pattern} is
matched against the program name used to start the subprocess.  The coding
system or systems specified in this alist are used to initialize the
coding systems used for I/O to the subprocess, but you can specify
other coding systems later using @code{set-process-coding-system}.
@end defvar

  @strong{Warning:} Coding systems such as @code{undecided} which
determine the coding system from the data do not work entirely reliably
with asynchronous subprocess output.  This is because Emacs handles
asynchronous subprocess output in batches, as it arrives.  If the coding
system leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the proper
conversion from one batch at a time, and this does not always work.

  Therefore, with an asynchronous subprocess, if at all possible, use a
coding system which determines both the character code conversion and
the end of line conversion; that is, one like @code{latin-1-unix},
rather than @code{undecided} or @code{latin-1}.
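
As a sketch of this advice, the function below starts an asynchronous
subprocess and immediately fixes both conversions with
@code{set-process-coding-system}.  The function name and process name
are hypothetical, and the example assumes a @code{cat} program is
available on the system.

@example
(defun my-start-cat ()
  "Start a `cat' subprocess with fully specified coding systems."
  (let ((proc (start-process "my-cat" nil "cat")))
    ;; Both the character code and the end-of-line conversion
    ;; are fixed, so nothing is guessed from batches of output.
    (set-process-coding-system proc
                               'iso-latin-1-unix
                               'iso-latin-1-unix)
    proc))
@end example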

@defvar network-coding-system-alist
@tindex network-coding-system-alist
This variable is an alist that specifies the coding system to use for
network streams.  It works much like @code{file-coding-system-alist},
with the difference that the @var{pattern} in an element may be either a
port number or a regular expression.  If it is a regular expression, it
is matched against the network service name used to open the network
stream.
@end defvar

@defvar default-process-coding-system
@tindex default-process-coding-system
This variable specifies the coding systems to use for subprocess (and
network stream) input and output, when nothing else specifies what to
do.

The value should be a cons cell of the form @code{(@var{input-coding}
. @var{output-coding})}.  Here @var{input-coding} applies to input from
the subprocess, and @var{output-coding} applies to output to it.
@end defvar

@defun find-operation-coding-system operation &rest arguments
@tindex find-operation-coding-system
This function returns the coding system to use (by default) for
performing @var{operation} with @var{arguments}.  The value has this
form:

@example
(@var{decoding-system} . @var{encoding-system})
@end example

The first element, @var{decoding-system}, is the coding system to use
for decoding (in case @var{operation} does decoding), and
@var{encoding-system} is the coding system for encoding (in case
@var{operation} does encoding).

The argument @var{operation} should be an Emacs I/O primitive:
@code{insert-file-contents}, @code{write-region}, @code{call-process},
@code{call-process-region}, @code{start-process}, or
@code{open-network-stream}.

The remaining arguments should be the same arguments that might be given
to that I/O primitive.  Depending on which primitive, one of those
arguments is selected as the @dfn{target}.  For example, if
@var{operation} does file I/O, whichever argument specifies the file
name is the target.  For subprocess primitives, the process name is the
target.  For @code{open-network-stream}, the target is the service name
or port number.

This function looks up the target in @code{file-coding-system-alist},
@code{process-coding-system-alist}, or
@code{network-coding-system-alist}, depending on @var{operation}.
@xref{Default Coding Systems}.
@end defun
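
As a rough illustration of the lookup, the sketch below binds
@code{file-coding-system-alist} to a one-entry alist and asks which
coding system @code{insert-file-contents} would use by default.  The
helper name @code{my-default-decoding-for-file} and the file name are
hypothetical.  The code reads only the first element of the result with
@code{car}, so it assumes nothing about the shape of the rest of the
value.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-default-decoding-for-file (file)
  "Return the default decoding system for reading FILE, or nil."
  (car (find-operation-coding-system 'insert-file-contents file)))

;; Pretend that every .txt file is Latin-1, then look one up.
(let ((file-coding-system-alist '(("\\.txt\\'" . iso-latin-1))))
  (my-default-decoding-for-file "notes.txt"))
@end example

The file does not need to exist: only its name is matched against the
alist.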

@node Specifying Coding Systems
@subsection Specifying a Coding System for One Operation

You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@code{coding-system-for-write}.

@defvar coding-system-for-read
@tindex coding-system-for-read
If this variable is non-@code{nil}, it specifies the coding system to
use for reading a file, or for input from a synchronous subprocess.

It also applies to any asynchronous subprocess or network stream, but in
a different way: the value of @code{coding-system-for-read} when you
start the subprocess or open the network stream specifies the input
decoding method for that subprocess or network stream.  It remains in
use for that subprocess or network stream unless and until overridden.

The right way to use this variable is to bind it with @code{let} for a
specific I/O operation.  Its global value is normally @code{nil}, and
you should not globally set it to any other value.  Here is an example
of the right way to use the variable:

@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume @sc{crlf} represents end-of-line.}
(let ((coding-system-for-read 'emacs-mule-dos))
  (insert-file-contents filename))
@end example

When its value is non-@code{nil}, @code{coding-system-for-read} takes
precedence over all other methods of specifying a coding system to use for
input, including @code{file-coding-system-alist},
@code{process-coding-system-alist} and
@code{network-coding-system-alist}.
@end defvar

@defvar coding-system-for-write
@tindex coding-system-for-write
This works much like @code{coding-system-for-read}, except that it
applies to output rather than input.  It affects writing to files,
subprocesses, and net connections.

When a single operation does both input and output, as do
@code{call-process-region} and @code{start-process}, both
@code{coding-system-for-read} and @code{coding-system-for-write}
affect it.
@end defvar
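
The following sketch shows one way to bind
@code{coding-system-for-write} around a single @code{write-region}
call.  The function name @code{my-write-buffer-as} is hypothetical and
serves only to illustrate the binding pattern.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-write-buffer-as (file coding-system)
  "Write the whole current buffer to FILE, encoded with CODING-SYSTEM."
  ;; The binding lasts only for this one output operation.
  (let ((coding-system-for-write coding-system))
    (write-region (point-min) (point-max) file)))
@end example

For instance, @code{(my-write-buffer-as "out.txt" 'iso-latin-1-dos)}
should write the buffer as Latin-1 with @sc{crlf} line endings,
whatever the buffer's own @code{buffer-file-coding-system} says.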

@defvar inhibit-eol-conversion
@tindex inhibit-eol-conversion
When this variable is non-@code{nil}, no end-of-line conversion is done,
no matter which coding system is specified.  This applies to all the
Emacs I/O and subprocess primitives, and to the explicit encoding and
decoding functions (@pxref{Explicit Encoding}).
@end defvar
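
As a sketch of how this variable combines with
@code{coding-system-for-read}, the hypothetical helper below decodes
the characters of a file while leaving its line endings untouched.  The
function name is an assumption made for this example.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-insert-file-keeping-eols (file coding-system)
  "Insert FILE decoded with CODING-SYSTEM, leaving line endings as is."
  (let ((coding-system-for-read coding-system)
        ;; Suppress end-of-line conversion for this operation only.
        (inhibit-eol-conversion t))
    (insert-file-contents file)))
@end example

With this binding in effect, a carriage return at the end of a line
should stay in the buffer even if @var{coding-system} names a DOS
variant such as @code{iso-latin-1-dos}.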

@node Explicit Encoding
@subsection Explicit Encoding and Decoding
@cindex encoding text
@cindex decoding text

All the operations that transfer text in and out of Emacs have the
ability to use a coding system to encode or decode the text.
You can also explicitly encode and decode text using the functions
in this section.

@cindex raw bytes
The result of encoding, and the input to decoding, are not ordinary
text.  They are ``raw bytes'': bytes that represent text in the same
way that an external file would.  When a buffer contains raw bytes, it
is most natural to mark that buffer as using unibyte representation,
using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
but this is not required.  If the buffer's contents are only temporarily
raw, leave the buffer multibyte, which will be correct after you decode
them.

The usual way to get raw bytes in a buffer, for explicit decoding, is
to read them from a file with @code{insert-file-contents-literally}
(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
argument when visiting a file with @code{find-file-noselect}.

The usual way to use the raw bytes that result from explicitly
encoding text is to copy them to a file or process.  For example, you
can write them with @code{write-region} (@pxref{Writing to Files}), and
suppress encoding for that @code{write-region} call by binding
@code{coding-system-for-write} to @code{no-conversion}.
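
A minimal sketch of that pattern follows.  The helper name
@code{my-write-encoded} is hypothetical.  The sketch places the raw
bytes in a temporary unibyte buffer, which is one reasonable choice
rather than the only one.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-write-encoded (text file coding-system)
  "Encode TEXT with CODING-SYSTEM and write the raw bytes to FILE."
  (let ((raw (encode-coding-string text coding-system)))
    (with-temp-buffer
      ;; Raw bytes fit most naturally in a unibyte buffer.
      (set-buffer-multibyte nil)
      (insert raw)
      ;; The text is already encoded, so suppress a second encoding.
      (let ((coding-system-for-write 'no-conversion))
        (write-region (point-min) (point-max) file)))))
@end example

Without the @code{no-conversion} binding, @code{write-region} would
apply its default coding system to bytes that are already encoded.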

Raw bytes sometimes contain overlong byte sequences that look like a
proper multibyte character plus extra bytes containing trailing codes.
For most purposes, Emacs treats such a sequence in a buffer or string as
a single character, and if you look at its character code, you get the
value that corresponds to the multibyte character sequence; the extra
bytes are disregarded.  This behavior is not quite clean, but raw bytes
are used only in limited places in Emacs, so as a practical matter
problems can be avoided.

@defun encode-coding-region start end coding-system
@tindex encode-coding-region
This function encodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}.  The encoded text replaces the
original text in the buffer.  The result of encoding is ``raw bytes,''
but the buffer remains multibyte if it was multibyte before.
@end defun

@defun encode-coding-string string coding-system
@tindex encode-coding-string
This function encodes the text in @var{string} according to coding
system @var{coding-system}.  It returns a new string containing the
encoded text.  The result of encoding is a unibyte string of ``raw bytes.''
@end defun

@defun decode-coding-region start end coding-system
@tindex decode-coding-region
This function decodes the text from @var{start} to @var{end} according
to coding system @var{coding-system}.  The decoded text replaces the
original text in the buffer.  To make explicit decoding useful, the text
before decoding ought to be ``raw bytes.''
@end defun
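
The two region functions can be sketched together as a round trip in a
temporary buffer.  The function name @code{my-region-round-trip} is
hypothetical, and the example sticks to @sc{ASCII} text with a DOS
end-of-line variant so that the effect of encoding is easy to see.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-region-round-trip (text coding-system)
  "Encode and then decode TEXT in a temporary buffer using CODING-SYSTEM.
Return a list (ENCODED DECODED) of the buffer contents after each step."
  (with-temp-buffer
    (insert text)
    ;; Encode in place: the buffer now holds raw bytes.
    (encode-coding-region (point-min) (point-max) coding-system)
    (let ((encoded (buffer-string)))
      ;; Decode in place: the buffer holds ordinary text again.
      (decode-coding-region (point-min) (point-max) coding-system)
      (list encoded (buffer-string)))))

(my-region-round-trip "one\ntwo\n" 'iso-latin-1-dos)
@end example

With @code{iso-latin-1-dos}, the encoded element should contain a
carriage return before each newline, and the decoded element should
equal the original text.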

@defun decode-coding-string string coding-system
@tindex decode-coding-string
This function decodes the text in @var{string} according to coding
system @var{coding-system}.  It returns a new string containing the
decoded text.  To make explicit decoding useful, the contents of
@var{string} ought to be ``raw bytes.''
@end defun
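
The string functions work the same way without touching any buffer.
The sketch below is a hypothetical helper, not part of Emacs.  It
encodes a string, checks whether the result is unibyte as described
above, and decodes it again.

@example
;; Hypothetical helper, not part of Emacs.
(defun my-string-round-trip (text coding-system)
  "Encode and decode TEXT with CODING-SYSTEM.
Return a list (RAW UNIBYTE-P DECODED)."
  (let* ((raw (encode-coding-string text coding-system))
         (decoded (decode-coding-string raw coding-system)))
    (list raw (not (multibyte-string-p raw)) decoded)))

(my-string-round-trip "one\ntwo\n" 'iso-latin-1-dos)
@end example

Neither call modifies its argument: each returns a new string.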

@node Terminal I/O Encoding
@subsection Terminal I/O Encoding

Emacs can decode keyboard input using a coding system, and encode
terminal output.  This is useful for terminals that transmit or display
text using a particular encoding such as Latin-1.  Emacs does not set
@code{last-coding-system-used} for encoding or decoding for the
terminal.

@defun keyboard-coding-system
@tindex keyboard-coding-system
This function returns the coding system that is in use for decoding
keyboard input, or @code{nil} if no coding system is to be used.
@end defun

@defun set-keyboard-coding-system coding-system
@tindex set-keyboard-coding-system
This function specifies @var{coding-system} as the coding system to
use for decoding keyboard input.  If @var{coding-system} is @code{nil},
that means do not decode keyboard input.
@end defun

@defun terminal-coding-system
@tindex terminal-coding-system
This function returns the coding system that is in use for encoding
terminal output, or @code{nil} for no encoding.
@end defun

@defun set-terminal-coding-system coding-system
@tindex set-terminal-coding-system
This function specifies @var{coding-system} as the coding system to use
for encoding terminal output.  If @var{coding-system} is @code{nil},
that means do not encode terminal output.
@end defun
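
As a configuration sketch, an init file for a terminal that both sends
and displays Latin-1 might contain something like the following.  The
choice of @code{iso-latin-1} is an assumption made for this example;
use whatever encoding your terminal actually expects.

@example
;; Decode keyboard input and encode terminal output as Latin-1.
(set-keyboard-coding-system 'iso-latin-1)
(set-terminal-coding-system 'iso-latin-1)

;; Inspect the settings now in effect.
(list (keyboard-coding-system) (terminal-coding-system))
@end example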

@node MS-DOS File Types
@subsection MS-DOS File Types
@cindex DOS file types
@cindex MS-DOS file types
@cindex Windows file types
@cindex file types on MS-DOS and Windows
@cindex text files and binary files
@cindex binary files and text files

Emacs on MS-DOS and on MS-Windows recognizes certain file names as
text files or binary files.  By ``binary file'' we mean a file of
literal byte values that are not necessarily meant to be characters.
Emacs does no end-of-line conversion and no character code conversion
for a binary file.  Meanwhile, when you create a new file which is
marked by its name as a ``text file'', Emacs uses DOS end-of-line
conversion.

@defvar buffer-file-type
This variable, automatically buffer-local in each buffer, records the
file type of the buffer's visited file.  When a buffer does not specify
a coding system with @code{buffer-file-coding-system}, this variable is
used to determine which coding system to use when writing the contents
of the buffer.  It should be @code{nil} for text, @code{t} for binary.
If it is @code{t}, the coding system is @code{no-conversion}.
Otherwise, @code{undecided-dos} is used.

Normally this variable is set by visiting a file; it is set to
@code{nil} if the file was visited without any actual conversion.
@end defvar

@defopt file-name-buffer-file-type-alist
This variable holds an alist for recognizing text and binary files.
Each element has the form @code{(@var{regexp} . @var{type})}, where
@var{regexp} is matched against the file name, and @var{type} may be
@code{nil} for text, @code{t} for binary, or a function to call to
compute which.  If it is a function, then it is called with a single
argument (the file name) and should return @code{t} or @code{nil}.

When running on MS-DOS or MS-Windows, Emacs checks this alist to decide
which coding system to use when reading a file.  For a text file,
@code{undecided-dos} is used.  For a binary file, @code{no-conversion}
is used.

If no element in this alist matches a given file name, then
@code{default-buffer-file-type} says how to treat the file.
@end defopt

@defopt default-buffer-file-type
This variable says how to handle files for which
@code{file-name-buffer-file-type-alist} says nothing about the type.

If this variable is non-@code{nil}, then these files are treated as
binary: the coding system @code{no-conversion} is used.  Otherwise,
nothing special is done for them; the coding system is deduced solely
from the file contents, in the usual Emacs fashion.
@end defopt
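
As a configuration sketch, the following adds one entry to
@code{file-name-buffer-file-type-alist} so that files ending in
@file{.dat} are treated as binary.  The @file{.dat} extension is a
hypothetical choice, and the @code{boundp} test is a precaution for
Emacs builds that do not define the variable.

@example
;; Treat *.dat files as binary on MS-DOS and MS-Windows.
(when (boundp 'file-name-buffer-file-type-alist)
  (setq file-name-buffer-file-type-alist
        (cons '("\\.dat\\'" . t)
              file-name-buffer-file-type-alist)))
@end example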

@node Input Methods
@section Input Methods
@cindex input methods

@dfn{Input methods} provide convenient ways of entering non-@sc{ASCII}
characters from the keyboard.  Unlike coding systems, which translate
non-@sc{ASCII} characters to and from encodings meant to be read by
programs, input methods provide human-friendly commands.  (@xref{Input
Methods,,, emacs, The GNU Emacs Manual}, for information on how users
use input methods to enter text.)  How to define input methods is not
yet documented in this manual, but here we describe how to use them.

Each input method has a name, which is currently a string;
in the future, symbols may also be usable as input method names.

@tindex current-input-method
@defvar current-input-method
This variable holds the name of the input method now active in the
current buffer.  (It automatically becomes local in each buffer when set
in any fashion.)  It is @code{nil} if no input method is active in the
buffer now.
@end defvar

@tindex default-input-method
@defvar default-input-method
This variable holds the default input method for commands that choose an
input method.  Unlike @code{current-input-method}, this variable is
normally global.
@end defvar

@tindex set-input-method
@defun set-input-method input-method
This function activates input method @var{input-method} for the current
buffer.  It also sets @code{default-input-method} to @var{input-method}.
If @var{input-method} is @code{nil}, this function deactivates any input
method for the current buffer.
@end defun
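
The sketch below shows how @code{set-input-method} and
@code{current-input-method} might be combined into a simple toggle
command.  The command name @code{my-toggle-latin-1-prefix} is
hypothetical, and the sketch assumes that an input method named
@code{"latin-1-prefix"} is installed.

@example
;; Hypothetical command, not part of Emacs.
(defun my-toggle-latin-1-prefix ()
  "Turn off any active input method, else activate \"latin-1-prefix\"."
  (interactive)
  (if current-input-method
      ;; Some input method is active in this buffer: deactivate it.
      (set-input-method nil)
    (set-input-method "latin-1-prefix")))
@end example

Because @code{current-input-method} is local to each buffer, the toggle
affects only the buffer in which it is called.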

@tindex read-input-method-name
@defun read-input-method-name prompt &optional default inhibit-null
This function reads an input method name with the minibuffer, prompting
with @var{prompt}.  If @var{default} is non-@code{nil}, that is returned
by default, if the user enters empty input.  However, if
@var{inhibit-null} is non-@code{nil}, empty input signals an error.

The returned value is a string.
@end defun

@tindex input-method-alist
@defvar input-method-alist
This variable defines all the supported input methods.
Each element defines one input method, and should have the form:

@example
(@var{input-method} @var{language-env} @var{activate-func}
 @var{title} @var{description} @var{args}...)
@end example

Here @var{input-method} is the input method name, a string;
@var{language-env} is another string, the name of the language
environment this input method is recommended for.  (That serves only for
documentation purposes.)

@var{title} is a string to display in the mode line while this method is
active.  @var{description} is a string describing this method and what
it is good for.

@var{activate-func} is a function to call to activate this method.  The
@var{args}, if any, are passed as arguments to @var{activate-func}.  All
told, the arguments to @var{activate-func} are @var{input-method} and
the @var{args}.
@end defvar
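
Given the element layout described above, the alist can be queried with
ordinary list functions.  The two helpers below are hypothetical
illustrations that rely only on the first two fields of each element.

@example
;; Hypothetical helpers, not part of Emacs.
(defun my-input-method-language (name)
  "Return the language environment recorded for input method NAME, or nil."
  (nth 1 (assoc name input-method-alist)))

(defun my-input-methods-for (language)
  "Return the names of the input methods recorded for LANGUAGE."
  (let (names)
    (dolist (entry input-method-alist (nreverse names))
      (when (equal (nth 1 entry) language)
        (push (car entry) names)))))
@end example

For example, @code{(my-input-methods-for (my-input-method-language
"latin-1-prefix"))} should return a list that includes
@code{"latin-1-prefix"} itself, provided that input method is installed.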

The fundamental interface to input methods is through the
variable @code{input-method-function}.  @xref{Reading One Event}.