.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright (C) Markus Kuhn, 1995, 2001
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, write to the Free
.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111,
.\" USA.
.\"
.\" 1995-11-26 Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>
.\" First version written
.\" 2001-05-11 Markus Kuhn <mgk25@cl.cam.ac.uk>
.\" Update
.\"
.\" Japanese Version Copyright (c) 1997 HANATAKA Shinya
.\" all rights reserved.
.\" Translated Thu Jun 3 20:36:31 JST 1997
.\" by HANATAKA Shinya <hanataka@abyss.rim.or.jp>
.\" Updated & Modified Sat Jun 23 07:30:09 JST 2001
.\" by Yuichi SATO <ysato@h4.dion.ne.jp>
.\"
.\"
.TH UNICODE 7 2001-05-11 "GNU" "Linux Programmer's Manual"
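.SH NAME
Unicode \- the Universal Character Set
.SH DESCRIPTION
The international standard
.B ISO 10646
defines the
.BR "Universal Character Set (UCS)" .
UCS contains all characters of all other character set standards.
It also guarantees
.BR "round-trip compatibility" :
conversion tables can be built such that no information is lost
when a string is converted from any other encoding to UCS and back.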
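.PP
UCS contains the characters required to represent practically all known
languages.
This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic,
Armenian, and Georgian scripts, but also the Chinese, Japanese, and Korean
Han ideographs, as well as scripts such as Hiragana, Katakana, Hangul,
Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
Malayalam, Thai, Lao, Khmer, Bopomofo, Tibetan, Runic, Ethiopic,
Canadian Syllabics, Cherokee, Mongolian, Ogham, Myanmar, Sinhala,
Thaana, Yi, and others.
For scripts not yet covered, research on how best to encode them for
computer use is still going on, and they will be added eventually.
This might include not only hieroglyphs and various historic
Indo-European languages, but even some selected artistic scripts such as
Tengwar, Cirth, and Klingon.
UCS also covers a large number of graphical, typographical,
mathematical, and scientific symbols, including those provided by
TeX, PostScript, APL, MS-DOS, MS-Windows, Macintosh, OCR fonts, and many
word processing and publishing systems; more are being added.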
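.PP
The UCS standard (ISO 10646) describes a
.I "31-bit character set architecture"
consisting of 128 24-bit
.IR groups ,
each divided into 256 16-bit
.IR planes ,
which are in turn made up of 256 8-bit
.I rows
with 256
.I column
positions, one for each character.
Part 1 of the standard
.RB ( "ISO 10646-1" )
defines the first 65534 code positions (0x0000 to 0xfffd), which form the
.IR "Basic Multilingual Plane (BMP)" ,
that is, plane 0 in group 0.
Part 2 of the standard
.RB ( "ISO 10646-2" )
adds characters to group 0 outside the BMP in several
.I "supplementary planes"
in the range 0x10000 to 0x10ffff.
There are no plans to add characters beyond 0x10ffff to the standard,
so of the entire code space only a small fraction of group 0 will
actually be used in the foreseeable future.
The BMP contains all characters found in the other commonly used
character sets.
The supplementary planes added by ISO 10646-2 cover only more exotic
characters for special scientific, dictionary printing, publishing
industry, higher-level protocol, and enthusiast needs.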
.PP
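The representation of each UCS character as a 2-byte word is referred to as the
.B UCS-2
form (only for BMP characters), whereas
.B UCS-4
is the representation of each character by a 4-byte word.
In addition, there exist two encoding forms:
.B UTF-8
for backward compatibility with ASCII processing software, and
.B UTF-16
for the backward-compatible handling of non-BMP characters up to
0x10ffff by UCS-2 software.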
.PP
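The UCS characters 0x0000 to 0x007f are identical to those of the classic
.B US-ASCII
character set, and the characters in the range 0x0000 to 0x00ff
are identical to those in
.BR "ISO 8859-1 Latin-1" .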
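.SS "Combining characters"
Some code points in
.B UCS
have been assigned to
.IR "combining characters" .
These are similar to the nonspacing accent keys on a typewriter.
A combining character just adds an accent to the previous character.
The most important accented characters have codes of their own in UCS;
however, the combining character mechanism allows accents and other
diacritical marks to be added to any character.
The combining characters always follow the character that they modify.
For example, the German character Umlaut-A
("Latin capital letter A with diaeresis") can be represented either by
the precomposed UCS code 0x00c4, or alternatively as the combination of
a normal "Latin capital letter A" followed by a "combining diaeresis":
0x0041 0x0308.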
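.PP
Combining characters are essential, for instance, for encoding the Thai
script, for mathematical typesetting, and for users of the International
Phonetic Alphabet.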
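.SS "Implementation levels"
As not all systems are expected to support advanced mechanisms such as
combining characters, ISO 10646-1 specifies the following three
implementation levels of UCS: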
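.TP 0.9i
Level 1
Combining characters and
.B Hangul Jamo
(a variant encoding of the Korean script, in which a Hangul syllable
glyph is coded as a triplet or pair of vowel/consonant codes) are not
supported.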
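.TP
Level 2
In addition to level 1, combining characters are allowed for some
languages where they are essential (e.g., Thai, Lao, Hebrew, Arabic,
Devanagari, Malayalam).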
.TP
Level 3
All
.B UCS
characters are supported.
.PP
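The
.B Unicode 3.0 Standard
published by the
.B Unicode Consortium
contains exactly the
.B UCS Basic Multilingual Plane
at implementation level 3, as described in ISO 10646-1:2000.
.B Unicode 3.1
added the supplementary planes of ISO 10646-2.
The Unicode standard and the technical reports published by the Unicode
Consortium provide much additional information on the semantics and
recommended usage of various characters.
They provide guidelines and algorithms for editing, sorting, comparing,
normalizing, converting, and displaying Unicode strings.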
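.SS "Unicode under Linux"
Under GNU/Linux, the C type
.B wchar_t
is a signed 32-bit integer type.
Its values are always interpreted by the C library as
.B UCS
code values (in all locales), a convention that the GNU C library
signals to applications by defining the constant
.BR __STDC_ISO_10646__ ,
as specified in the ISO C99 standard.
.PP
UCS/Unicode can be used just like ASCII in input/output streams,
terminal communication, plain-text files, filenames, and environment
variables in the ASCII-compatible
.B UTF-8
multibyte encoding.
To signal the use of UTF-8 as the character encoding to all
applications, a suitable
.I locale
has to be selected via environment variables (e.g., "LANG=en_GB.UTF-8").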
.PP
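The
.B nl_langinfo(CODESET)
function returns the name of the selected encoding.
Library functions such as
.BR wctomb (3)
and
.BR mbsrtowcs (3)
can be used to transform the internal
.I wchar_t
characters and strings into the system character encoding and back, and
.BR wcwidth (3)
tells how many positions (0\(en2) the cursor is advanced by the output
of a character.
.PP
Under Linux, in general only the BMP at implementation level 1 should
be used at the moment.
Up to two combining characters per base character for certain scripts
(in particular Thai) are also supported by some UTF-8 terminal emulators
and ISO 10646 fonts (level 2), but in general precomposed characters
should be preferred where available (Unicode calls this
.BR "Normalization Form C" ).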
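.SS "Private area"
In the
.BR BMP ,
the range 0xe000 to 0xf8ff will never be assigned to any characters by
the standard and is reserved for private usage.
For the Linux community, this private area has been subdivided further
into the range 0xe000 to 0xefff, which can be used individually by any
end user, and the Linux zone in the range 0xf000 to 0xf8ff, where
extensions are coordinated among all Linux users.
The registry of the characters assigned to the Linux zone is currently
maintained by H. Peter Anvin <Peter.Anvin@linux.org>.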
.SS Literature
.TP 0.2i
*
Information technology \(em Universal Multiple-Octet Coded Character
Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
International Standard ISO/IEC 10646-1, International Organization
for Standardization, Geneva, 2000.
.br
This is the official specification of
.BR UCS .
Available as a PDF file on CD-ROM from http://www.iso.ch/.
.TP
*
The Unicode Standard, Version 3.0.
The Unicode Consortium, Addison-Wesley,
Reading, MA, 2000, ISBN 0-201-61633-5.
.TP
*
S. Harbison, G. Steele. C: A Reference Manual. Fourth edition,
Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.
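.br
A good reference book about the C programming language.
The fourth edition covers the 1994 Amendment 1 to the ISO C90 standard,
which adds a large number of new C library functions for handling wide
and multibyte character encodings, but it does not yet cover ISO C99,
which improved wide and multibyte character support even further.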
.TP
*
Unicode Technical Reports.
.RS
http://www.unicode.org/unicode/reports/
.RE
.TP
*
Markus Kuhn: UTF-8 and Unicode FAQ for Unix/Linux.
.RS
http://www.cl.cam.ac.uk/~mgk25/unicode.html
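.br
Provides subscription information for the
.I linux-utf8
mailing list, which is the best place to look for advice on using
Unicode under Linux.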
.RE
.TP
*
Bruno Haible: Unicode HOWTO.
.RS
ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html
.RE
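.SH BUGS
When this manual page was last revised, the GNU C library support for
.B UTF-8
locales was mature and XFree86 support was in an advanced state, but
work on making applications (most notably editors) suitable for use in
.B UTF-8
locales was still fully in progress.
Current general
.B UCS
support under Linux usually provides for CJK double-width characters
and sometimes even simple overstriking combining characters, but usually
does not include support for scripts with right-to-left writing
direction or ligature substitution requirements, such as Hebrew,
Arabic, or the Indic scripts.
These scripts are currently supported only in certain GUI applications
(HTML viewers, word processors) with sophisticated text rendering
engines.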
.\" .SH AUTHOR
.\" Markus Kuhn <mgk25@cl.cam.ac.uk>
.SH "SEE ALSO"
.BR setlocale (3),
.BR charsets (7),
.BR utf-8 (7)