.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright (C) Markus Kuhn, 1995
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, write to the Free
.\" Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139,
.\" USA.
.\"
.\" 1995-11-26 Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>
.\" First version written
.\" Chinese translation copyright mapping, Laser, www.linuxforum.net 2000
.\"
.TH UNICODE 7 "1995-12-27" "Linux" "Linux Programmer's Manual"
.SH NAME
Unicode \- the unified 16-bit super character set
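.SH DESCRIPTION
The international standard
.B ISO 10646
defines the
.BR "Universal Character Set (UCS)" .
.B UCS
contains all characters of all other character set standards.
It also guarantees
.BR "round-trip compatibility" ;
that is, conversion tables can be built such that no information is
lost when a string is converted from any other encoding to
.B UCS
and back.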
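.PP
.B UCS
contains the characters required to represent almost all known
languages.
Apart from the many languages that use extensions of the Latin script,
this includes the following scripts and languages: Greek,
Cyrillic, Hebrew, Arabic, Armenian, Georgian, Japanese,
Chinese, Hiragana, Katakana, Korean, Hangul, Devanagari,
Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada, Malayalam, Thai, Lao, Bopomofo, and a number of others.
Work is under way to include further scripts such as
Tibetan, Khmer, Runic, Ethiopian, Hieroglyphics,
and various Indo-European scripts; for many of these it was not yet
clear how they could best be encoded when the standard was published
in 1993.
In addition to the characters required by these scripts, UCS also
includes a large number of graphical, typographical, mathematical, and
scientific symbols, such as those provided by
TeX, PostScript, MS-DOS, Macintosh, Videotext, OCR, and many
word-processing systems, as well as special codes that guarantee
round-trip compatibility with all other existing character set
standards.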
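.PP
The
.B UCS
standard (ISO 10646) describes a 31-bit character set architecture.
However, only the first 65534 code positions (0x0000 to 0xfffd),
which are called the
.BR "Basic Multilingual Plane (BMP)" ,
have been assigned characters so far, and it is expected that only very
exotic characters (e.g., Hieroglyphics) for special scientific purposes
will ever get a place outside this 16-bit BMP.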
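.PP
As a small illustration of these ranges, the following C sketch tests
whether a code value lies in the 31-bit UCS code space and whether it
falls in the first 65534 code positions that this page describes as
assigned.
The helper names ucs_is_valid and ucs_in_bmp are hypothetical and not
part of any library.
.nf
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: UCS is a 31-bit code space, so a 32-bit value
   with its top bit set is not a UCS code value. */
bool ucs_is_valid(uint32_t c)
{
    return c <= 0x7fffffffu;
}

/* Hypothetical helper: true for the first 65534 code positions,
   0x0000 to 0xfffd, which this page calls the assigned BMP. */
bool ucs_in_bmp(uint32_t c)
{
    return c <= 0xfffdu;
}
```
.fi
Because 0xfffd is below 0x10000, every value accepted by ucs_in_bmp
fits in 16 bits.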
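.PP
The
.B UCS
characters 0x0000 to 0x007f are identical to those of the classic
.B US-ASCII
character set, and the characters in the range 0x0000 to 0x00ff
are identical to those in the
.B ISO 8859-1 Latin-1
character set.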
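.PP
Because the first 256 UCS code values coincide with ISO 8859-1,
converting Latin-1 text to UCS and back is only a widening and
narrowing of each value, and every Latin-1 byte survives the round
trip.
The following C sketch shows this; the function names latin1_to_ucs
and ucs_to_latin1 are hypothetical.
.nf
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: UCS 0x0000-0x00ff is identical to ISO 8859-1,
   so a Latin-1 byte converts to UCS unchanged. */
uint32_t latin1_to_ucs(unsigned char b)
{
    return (uint32_t)b;
}

/* Hypothetical helper: the reverse direction succeeds only for code
   values 0x0000-0x00ff; higher values have no Latin-1 byte. */
bool ucs_to_latin1(uint32_t c, unsigned char *out)
{
    if (c > 0xffu)
        return false;
    *out = (unsigned char)c;
    return true;
}
```
.fi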
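.SH COMBINING CHARACTERS
Some code points in
.B UCS
have been assigned to
.BR "combining characters" .
These are similar to the non-spacing accent keys on a typewriter:
a combining character just adds an accent to the previous character.
The most important accented characters have codes of their own in
.BR UCS ;
however, the combining character mechanism allows accents and other
diacritical marks to be added to any character.
Combining characters always follow the character that they modify.
For example, the German character Umlaut-A
("Latin capital letter A with diaeresis") can either be represented
by the precomposed
.B UCS
code 0x00c4, or alternatively as a normal "Latin capital letter A"
followed by a "combining diaeresis": 0x0041 0x0308.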
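.PP
The Umlaut-A example can be sketched in C as a lookup between the
precomposed code and the base-plus-combining pair.
This minimal sketch assumes a hypothetical one-entry table that holds
only the example from the text; real software would need the complete
Unicode composition data.
The names compose_pair and decompose are hypothetical.
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-entry table holding only the Umlaut-A example;
   real software needs the complete Unicode composition data. */
struct composition {
    uint32_t base;        /* e.g. Latin capital letter A */
    uint32_t combining;   /* e.g. combining diaeresis */
    uint32_t precomposed; /* e.g. Umlaut-A */
};

static const struct composition compositions[] = {
    { 0x0041, 0x0308, 0x00c4 },
};

#define N_COMPOSITIONS (sizeof compositions / sizeof compositions[0])

/* Returns the precomposed code for base followed by combining,
   or 0 if the pair is not in the table. */
uint32_t compose_pair(uint32_t base, uint32_t combining)
{
    for (size_t i = 0; i < N_COMPOSITIONS; i++)
        if (compositions[i].base == base
            && compositions[i].combining == combining)
            return compositions[i].precomposed;
    return 0;
}

/* Splits a precomposed code into its base and combining character.
   Returns 1 on success, 0 if the code is not in the table. */
int decompose(uint32_t precomposed, uint32_t *base, uint32_t *combining)
{
    for (size_t i = 0; i < N_COMPOSITIONS; i++)
        if (compositions[i].precomposed == precomposed) {
            *base = compositions[i].base;
            *combining = compositions[i].combining;
            return 1;
        }
    return 0;
}
```
.fi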
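.SH IMPLEMENTATION LEVELS
Because not all systems are expected to support advanced mechanisms
such as combining characters, ISO 10646 specifies the following three
implementation levels of
.BR UCS :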
.TP 0.9i
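Level 1
Combining characters and Hangul Jamo characters (a special, more
complicated encoding of the Korean script, in which Hangul syllables
are coded as two or three subcharacters) are not supported.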
.TP
Level 2
In addition to level 1, combining characters are allowed for some
languages where they are essential (e.g., Hebrew, Arabic, Devanagari,
Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam,
Thai, and Lao).
.TP
Level 3
All
.B UCS
characters are supported.
.PP
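The Unicode 1.1 standard published by the Unicode Consortium contains
exactly the
.B UCS
Basic Multilingual Plane at implementation level 3.
In addition, Unicode 1.1 adds semantic definitions for some
characters to the definitions of ISO 10646.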
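.PP
A Level 1 implementation must reject combining characters and Hangul
Jamo.
The C sketch below approximates that check under a deliberate
simplification.
It treats only the Combining Diacritical Marks block (0x0300 to 0x036f)
as combining and only the Hangul Jamo block (0x1100 to 0x11ff) as Jamo.
Combining characters also occur in many other blocks, so this is not a
complete Level 1 filter.
The function names are hypothetical.
.nf
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical check: only the Combining Diacritical
   Marks block (0x0300-0x036f) is treated as combining. */
static bool is_combining_simplified(uint32_t c)
{
    return c >= 0x0300 && c <= 0x036f;
}

/* Hangul Jamo block (0x1100-0x11ff). */
static bool is_hangul_jamo(uint32_t c)
{
    return c >= 0x1100 && c <= 0x11ff;
}

/* Returns true if no code in s[0..n-1] is rejected at implementation
   level 1, under the simplifications described above. */
bool level1_acceptable(const uint32_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (is_combining_simplified(s[i]) || is_hangul_jamo(s[i]))
            return false;
    return true;
}
```
.fi
Under these assumptions, the precomposed Umlaut-A (0x00c4) passes, but
the decomposed pair 0x0041 0x0308 is rejected.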
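.SH UNICODE UNDER LINUX
Under Linux, only the
.B BMP
at level 1 should be used for now.
This keeps the implementation complexity of combining characters low.
The higher implementation levels suit special word-processing formats,
but they are not suitable as a generic system character set.
The C type
.B wchar_t
is on Linux a signed 32-bit integer type, and its values are always
interpreted as
.B UCS-4
codes.
The locale setting specifies whether the system character encoding is,
for example,
.B UTF-8
or
.BR "ISO 8859-1" .
Library functions such as
.BR wctomb (3),
.BR mbtowc (3),
or
.BR wprintf (3)
can be used to transform the internal
.B wchar_t
characters and strings into the system character encoding and back.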
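.PP
The following minimal C sketch uses wctomb(3) and mbtowc(3) to convert
between wchar_t strings and the multibyte encoding of the current
locale.
The helper names wide_to_system and system_to_wide, and their
buffer-size conventions, are hypothetical.
.nf
```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the null-terminated wide string ws
   into the multibyte encoding of the current locale with wctomb().
   At most cap bytes, including the terminating null byte, are written
   to out.  Returns the number of bytes stored (without the terminator),
   or (size_t)-1 if a character is not representable or out is too small. */
size_t wide_to_system(const wchar_t *ws, char *out, size_t cap)
{
    char buf[MB_LEN_MAX];
    size_t used = 0;

    wctomb(NULL, 0);                /* reset the shift state */
    for (; *ws != 0; ws++) {
        int len = wctomb(buf, *ws);
        /* fail on unconvertible characters; keep room for the null byte */
        if (len < 0 || used + (size_t)len + 1 > cap)
            return (size_t)-1;
        for (int i = 0; i < len; i++)
            out[used++] = buf[i];
    }
    if (used + 1 > cap)
        return (size_t)-1;
    out[used] = 0;
    return used;
}

/* Hypothetical helper: the reverse direction with mbtowc().  Reads at
   most n bytes from s (stopping at a null byte) and stores at most cap
   wide characters, including a terminating 0, in out.  Returns the
   number of wide characters stored, or (size_t)-1 on error. */
size_t system_to_wide(const char *s, size_t n, wchar_t *out, size_t cap)
{
    size_t count = 0;

    mbtowc(NULL, NULL, 0);          /* reset the shift state */
    while (n > 0 && *s != 0) {
        wchar_t wc;
        int len = mbtowc(&wc, s, n);
        /* fail on invalid input; keep room for the terminating 0 */
        if (len <= 0 || count + 1 >= cap)
            return (size_t)-1;
        out[count++] = wc;
        s += len;
        n -= (size_t)len;
    }
    if (count >= cap)
        return (size_t)-1;
    out[count] = 0;
    return count;
}
```
.fi
A real program would normally call setlocale(LC_ALL, "") first, so
that the conversion follows the user's locale.
If it does not, the standard "C" locale stays in effect.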
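.SH PRIVATE AREA
In the
.BR BMP ,
the range 0xe000 to 0xf8ff will never be assigned to any characters by
the standard and is reserved for private usage.
For the Linux community, this private area has been further divided
into two parts.
The range 0xe000 to 0xefff can be used individually by any end user.
The Linux zone, 0xf000 to 0xf8ff, holds extensions that are coordinated
among all Linux users.
The registry of the characters assigned to the Linux zone is currently
maintained by H. Peter Anvin <Peter.Anvin@linux.org>, Yggdrasil
Computing, Inc.
It contains some DEC VT100 graphics characters missing in Unicode,
gives direct access to the characters in the console font buffer, and
contains the characters used by a few advanced scripts such as Klingon.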
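.PP
The subdivision of the private area can be expressed as a simple range
classification.
The enum and function names in this C sketch are hypothetical.
.nf
```c
#include <stdint.h>

/* Hypothetical classification of BMP code values by the private-use
   subdivision described above. */
enum private_zone {
    NOT_PRIVATE,        /* outside 0xe000-0xf8ff */
    PRIVATE_END_USER,   /* 0xe000-0xefff: free for any end user */
    PRIVATE_LINUX_ZONE  /* 0xf000-0xf8ff: coordinated Linux zone */
};

enum private_zone classify_private(uint32_t c)
{
    if (c >= 0xe000 && c <= 0xefff)
        return PRIVATE_END_USER;
    if (c >= 0xf000 && c <= 0xf8ff)
        return PRIVATE_LINUX_ZONE;
    return NOT_PRIVATE;
}
```
.fi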
.SH LITERATURE
.TP 0.2i
*
Information technology \- Universal Multiple-Octet Coded Character
Set (UCS) \- Part 1: Architecture and Basic Multilingual Plane.
International Standard ISO 10646-1, International Organization
for Standardization, Geneva, 1993.
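This is the official specification of
.BR UCS .
Pretty official, pretty thick, and pretty expensive.
For ordering information, check www.iso.ch.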
.TP
*
The Unicode Standard \- Worldwide Character Encoding Version 1.0.
The Unicode Consortium, Addison-Wesley,
Reading, MA, 1991.
Unicode 1.1.4 is already available; the changes to the 1.0 book are
available from ftp.unicode.org.
Unicode 2.0 will be published again as a book in 1996.
.TP
*
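S. Harbison, G. Steele. C \- A Reference Manual. Fourth edition,
Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.
A good reference book about the C programming language.
The fourth edition now also covers the 1994 Amendment 1 to the ISO C
standard (ISO/IEC 9899:1990), which adds a large number of new C
library functions for handling wide and multibyte character encodings.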
.SH BUGS
When this man page was written, the Linux C library support for
.B UCS
was far from complete.
.SH AUTHOR
Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>
.SH "SEE ALSO"
.BR utf-8 (7),
.B http://www.linuxforum.net/books/UTF-8-Unicode.html
.SH "[Chinese translation maintainer]"
mapping Email: mapping@263.net
.SH "[Chinese translation last updated]"
2000/11/06
.SH "[China Linux Forum man page translation project]"
.TP
.BI www.cmpp.net/