|
Unicode Conversion Extension Module
version 0.5.3
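- Overview

This extension module converts between ISO/IEC 10646 (Unicode)
strings and Japanese strings.

The supported encodings are UCS-4, UTF-16, UTF-8, EUC-JP, and
CP932 (the variant of Shift_JIS used on Windows). There is no
automatic encoding detection.

Note that the conversion table between EUC-JP and Unicode has
changed from the one used up to version 0.3.x.
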
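- Installation

Operation has been confirmed with Ruby 1.6 and later. Ruby 1.6.7
or later is recommended.

Extract the uconv source in a suitable directory.
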
gzip -dc < uconv-version.tar.gz | tar xvf -
cd uconv
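
Because it contains conversion tables for EUC-JP and CP932, the
module is fairly large. If you use only one of EUC-JP or CP932,
you can comment out USE_EUC or USE_SJIS in extconf.rb. On
Windows, specifying USE_WIN32API makes the CP932 conversion use
the Win32 API, which makes the module smaller.

Follow the usual extension module installation procedure. If
dynamic linking is supported, the steps are as follows.
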
ruby extconf.rb
make
make install

- Usage

If the module was not statically linked when Ruby was built,
load it with

require "uconv"

before using it.
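
As a minimal sketch of basic usage, the following converts an EUC-JP byte string to UTF-8 with Uconv.euctou8, one of the module functions documented in this README. The helper name euc_to_utf8 and the fallback to Ruby's built-in String#encode when uconv is not installed are assumptions added for illustration, not part of uconv.

```ruby
# Load uconv if it is available; otherwise fall back to Ruby's own
# transcoder (assumption for illustration, not part of uconv).
HAVE_UCONV = begin
  require "uconv"
  true
rescue LoadError
  false
end

# Hypothetical helper: EUC-JP bytes in, UTF-8 string out.
def euc_to_utf8(euc)
  if HAVE_UCONV
    Uconv.euctou8(euc).force_encoding("UTF-8")
  else
    euc.dup.force_encoding("EUC-JP").encode("UTF-8")
  end
end

euc  = "\xC6\xFC\xCB\xDC\xB8\xEC".b  # U+65E5 U+672C U+8A9E in EUC-JP
utf8 = euc_to_utf8(euc)
puts utf8
```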
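
- Module functions

Except for u16swap (u2swap) and u4swap, UTF-16 and UCS-4 strings
must be little-endian.

Functions that used to handle UCS-2 have been changed to handle
UTF-16.

Some functions treat every ZERO WIDTH NO-BREAK SPACE (U+FEFF) as
a BYTE ORDER MARK (BOM) and remove it.
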
         | To
From     | EUC-JP     CP932      UTF-8      UTF-16     UCS-4
---------+------------------------------------------------------
EUC-JP   | \          -          euctou8    euctou16   -
CP932    | -          \          sjistou8   sjistou16  -
UTF-8    | u8toeuc    u8tosjis   \          u8tou16    u8tou4
UTF-16   | u16toeuc   u16tosjis  u16tou8    u16swap    u16tou4
UCS-4    | -          -          u4tou8     u4tou16    u4swap

utf16 = Uconv.u16swap(utf16)
ucs2 = Uconv.u2swap(ucs2)
utf16 = Uconv.u16swap!(utf16)
ucs2 = Uconv.u2swap!(ucs2)
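Byte-swaps a UTF-16 string. A little-endian string is converted
to big-endian. The functions with a trailing ! modify the
argument string in place.
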
ucs4 = Uconv.u4swap(ucs4)
ucs4 = Uconv.u4swap!(ucs4)
Byte-swaps a UCS-4 string. Byte order 1234 is converted to 4321.
The functions with a trailing ! modify the argument string in
place.

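
The byte swaps can be pictured with a pure-Ruby sketch built on String#unpack and Array#pack. The helper names swap16 and swap32 are hypothetical and are not uconv functions; the sketch assumes the input length is a multiple of the unit size.

```ruby
# Swap the two bytes of every 16-bit code unit (UTF-16LE <-> UTF-16BE).
def swap16(str)
  str.unpack("n*").pack("v*")
end

# Reverse the four bytes of every 32-bit unit (byte order 1234 -> 4321).
def swap32(str)
  str.unpack("N*").pack("V*")
end

utf16le = "\x42\x30".b                   # U+3042 in little-endian UTF-16
utf16be = swap16(utf16le)                # bytes 0x30 0x42
ucs4    = swap32("\x01\x02\x03\x04".b)   # bytes 0x04 0x03 0x02 0x01
```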
utf16 = Uconv.u8tou16(utf8)
ucs2 = Uconv.u8tou2(utf8)
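Converts a UTF-8 string to a UTF-16 string. An invalid UTF-8
sequence raises an exception. A character outside the range
that UTF-16 can represent (U-00000000 to U-0010FFFF) also raises
an exception.
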
utf8 = Uconv.u16tou8(utf16)
utf8 = Uconv.u2tou8(ucs2)
Converts a UTF-16 string to a UTF-8 string. By default, ZWNBSP
(U+FEFF) is removed. An invalid surrogate pair raises an
exception.

utf8 = Uconv.u4tou8(ucs4)
Converts a UCS-4 string to a UTF-8 string. By default, ZWNBSP
(U+FEFF) is removed.

ucs4 = Uconv.u8tou4(utf8)
Converts a UTF-8 string to a UCS-4 string. An invalid UTF-8
sequence raises an exception.

utf16 = Uconv.u4tou16(ucs4)
Converts a UCS-4 string to a UTF-16 string. A character outside
the range that UTF-16 can represent (U-00000000 to U-0010FFFF)
raises an exception.

ucs4 = Uconv.u16tou4(utf16)
Converts a UTF-16 string to a UCS-4 string. An invalid surrogate
pair raises an exception.

euc = Uconv.u16toeuc(utf16)
euc = Uconv.u2toeuc(ucs2)
Converts a UTF-16 string to an EUC-JP string. A character that
cannot be converted becomes '?' when
Uconv.unknown_unicode_handler is undefined.

utf16 = Uconv.euctou16(euc)
ucs2 = Uconv.euctou2(euc)
Converts an EUC-JP string to a UTF-16 string.

euc = Uconv.u8toeuc(utf8)
Converts a UTF-8 string to an EUC-JP string.
Equivalent to u16toeuc(u8tou16(utf8)).

utf8 = Uconv.euctou8(euc)
Converts an EUC-JP string to a UTF-8 string.
Equivalent to u16tou8(euctou16(euc)).

sjis = Uconv.u16tosjis(utf16)
sjis = Uconv.u2tosjis(ucs2)
Converts a UTF-16 string to a CP932 string. A character that
cannot be converted becomes '?' when
Uconv.unknown_unicode_handler is undefined.

utf16 = Uconv.sjistou16(sjis)
ucs2 = Uconv.sjistou2(sjis)
Converts a CP932 string to a UTF-16 string.

sjis = Uconv.u8tosjis(utf8)
Converts a UTF-8 string to a CP932 string.
Equivalent to u16tosjis(u8tou16(utf8)).

utf8 = Uconv.sjistou8(sjis)
Converts a CP932 string to a UTF-8 string.
Equivalent to u16tou8(sjistou16(sjis)).

euc = Uconv.unknown_unicode_handler(unicode)
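** deprecated **
Handler called when a UCS character that cannot be converted is
found during conversion from UTF-16 or UTF-8 to EUC-JP or CP932.
The Unicode character code is passed as the argument. Return a
string as the result. Undefined by default.
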
euc = Uconv.unknown_unicode_euc_handler(unicode)
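Handler called when a UCS character that cannot be converted is
found during conversion from UTF-16 or UTF-8 to EUC-JP. The
Unicode character code is passed as the argument. Return a
string as the result. Undefined by default.
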
sjis = Uconv.unknown_unicode_sjis_handler(unicode)
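Handler called when a UCS character that cannot be converted is
found during conversion from UTF-16 or UTF-8 to CP932. The
Unicode character code is passed as the argument. Return a
string as the result. Undefined by default.
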
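
The handler contract (a Unicode character code in, a replacement string out) can be illustrated without uconv. The sketch below is a stand-in built on the fallback option of Ruby's String#encode; the names handler and to_euc_with_handler are hypothetical, and it assumes the character code is delivered as an Integer.

```ruby
# Hypothetical handler: receives a Unicode character code (assumed to
# be an Integer) and returns the string to emit in its place.
handler = ->(code) { format("&#x%X;", code) }

# Stand-in for the conversion using Ruby's built-in transcoder.
# uconv itself is not involved here.
def to_euc_with_handler(utf8, handler)
  utf8.encode("EUC-JP", fallback: ->(ch) { handler.call(ch.ord) })
end

# U+1F600 has no EUC-JP mapping, so the handler supplies the output.
euc = to_euc_with_handler("A\u{1F600}B", handler)
puts euc
```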
unicode = Uconv.unknown_euc_handler(euc)
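Handler called when a character undefined in JIS X 0208 and
JIS X 0212 is found during conversion from EUC-JP to UTF-16 or
UTF-8. An EUC-JP string of 1 to 3 bytes is passed as the
argument. Return a 31-bit character code as the result.
Undefined by default.
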
unicode = Uconv.unknown_sjis_handler(sjis)
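Handler called when a character undefined in CP932 is found
during conversion from CP932 to UTF-16 or UTF-8. A CP932 string
of 1 or 2 bytes is passed as the argument. Return a 31-bit
character code as the result. Undefined by default.
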
unicode = Uconv.euc_hook(euc)
unicode = Uconv.sjis_hook(sjis)
euc = Uconv.unicode_euc_hook(unicode)
sjis = Uconv.unicode_sjis_hook(unicode)
flag = Uconv::eliminate_zwnbsp
Uconv::eliminate_zwnbsp = flag
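Gets or sets the ZWNBSP elimination flag. Specify true or false
for flag. The default is true. When true, the u4tou8 and u16tou8
functions remove ZWNBSP characters. When false, ZWNBSP
characters are preserved.
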
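
The effect of the flag can be sketched in pure Ruby. The helper utf16le_to_utf8 and its keyword argument are hypothetical and only imitate the documented behavior with Ruby's built-in transcoder; they are not uconv's implementation.

```ruby
# Hypothetical helper imitating the documented effect of the flag:
# convert little-endian UTF-16 to UTF-8, optionally dropping U+FEFF.
def utf16le_to_utf8(utf16le, eliminate_zwnbsp: true)
  utf8 = utf16le.dup.force_encoding("UTF-16LE").encode("UTF-8")
  eliminate_zwnbsp ? utf8.delete("\uFEFF") : utf8
end

bom_a = "\xFF\xFE\x41\x00".b  # U+FEFF followed by "A", little-endian

stripped = utf16le_to_utf8(bom_a)                           # "A"
kept     = utf16le_to_utf8(bom_a, eliminate_zwnbsp: false)  # U+FEFF, "A"
```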
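flag = Uconv::shortest
Uconv::shortest = flag
Gets or sets the shortest-form flag. Specify true or false for
flag. The default is true. When true, the u8to* functions raise
an exception if a UTF-8 string is not in shortest form.
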
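
To make "shortest form" concrete: the two bytes C0 AF carry the same payload bits as the single byte 2F ("/"), which is why a strict decoder rejects them. The sketch below uses only core Ruby; naive_decode_2byte is a hypothetical helper that deliberately skips the shortest-form check.

```ruby
# Decode a 2-byte UTF-8 sequence without any shortest-form check
# (hypothetical helper, for illustration only).
def naive_decode_2byte(bytes)
  b0, b1 = bytes.bytes
  ((b0 & 0x1F) << 6) | (b1 & 0x3F)
end

overlong = "\xC0\xAF".b                 # non-shortest encoding of "/"
code     = naive_decode_2byte(overlong) # 0x2F, the code of "/"

# Ruby's own UTF-8 validation rejects the non-shortest form.
strict_ok = overlong.dup.force_encoding("UTF-8").valid_encoding?  # false
```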
char = Uconv::replace_invalid
Uconv::replace_invalid(char)
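Gets or sets the character that replaces an invalid byte
sequence found while converting a UTF-8, UTF-16, or UCS-4
string. When nil is set, an attempt to convert an invalid byte
sequence raises an exception. When a value other than nil is
set, the invalid byte sequence is replaced by the character
with the specified code point. The default is nil.
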
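
The two modes (raise on invalid input, or substitute a given code point) can be imitated with core Ruby. The helper convert_or_replace is hypothetical, and it raises ArgumentError as a placeholder for whatever exception uconv raises.

```ruby
# Hypothetical helper imitating the documented behavior:
#   replace_invalid == nil   -> raise on invalid bytes
#   replace_invalid == code  -> substitute the character with that code point
def convert_or_replace(bytes, replace_invalid = nil)
  utf8 = bytes.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?
  raise ArgumentError, "invalid byte sequence" if replace_invalid.nil?
  utf8.scrub([replace_invalid].pack("U"))
end

broken   = "abc\xFFdef".b
replaced = convert_or_replace(broken, 0x3F)  # "abc?def"
```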
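
- Notes

Earlier versions used the Unicode Inc. conversion table with
some modifications for compatibility with Windows Unicode
fonts. As of version 0.4 a separate CP932 conversion table is
provided, so the EUC-JP conversion table is the Unicode Inc.
table used as is.

Earlier versions converted both WAVE DASH [U+301C] and FULL
WIDTH TILDE [U+FF5E] to the wave dash character (EUC-JP: A1C1)
when converting to EUC-JP. As of version 0.4, FULL WIDTH TILDE
is an undefined character. Conversely, the EUC-JP wave dash
used to be converted to U+FF5E when converting to UCS-2 or
UTF-8; it is now converted to U+301C.

With the CP932 conversion table, WAVE DASH is an undefined
character and FULL WIDTH TILDE is converted to the wave dash
character (CP932: 8160).

When USE_WIN32API is specified, the result of a UCS -> CP932
conversion may differ from the result obtained with the
conversion table.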
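
The same split between the two tables can be observed with Ruby's built-in transcoder, which maps EUC-JP A1C1 to U+301C and CP932 (Windows-31J) 8160 to U+FF5E. This sketch does not use uconv; hex is a hypothetical helper that prints bytes as uppercase hexadecimal.

```ruby
# Hypothetical helper: render a string's bytes as uppercase hex.
def hex(str)
  str.bytes.map { |b| format("%02X", b) }.join
end

wave_dash       = "\u301C"  # WAVE DASH
fullwidth_tilde = "\uFF5E"  # FULLWIDTH TILDE

euc_hex  = hex(wave_dash.encode("EUC-JP"))             # "A1C1"
sjis_hex = hex(fullwidth_tilde.encode("Windows-31J"))  # "8160"

# The reverse direction from EUC-JP yields WAVE DASH.
back = "\xA1\xC1".b.force_encoding("EUC-JP").encode("UTF-8")
```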
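
- Copyright

The copyright of this extension module is held by Yoshida
Masato. This extension module may be used under the license of
Ruby itself.
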
- Author

Yoshida Masato <yoshidam@yoshidam.net>

- History

Jan 3, 2010 version 0.5.3 Ruby 1.9.1
Aug 23, 2004 version 0.5.2 pre-conversion hook for Win32
Aug 19, 2004 version 0.5.1 u2s, s2u, shift_jis-2004
Aug 16, 2004 version 0.5.0 pre-conversion hook, euc-jis-2004, eucjp-open
Jul 18, 2004 version 0.4.13 range check
Mar 12, 2003 version 0.4.12 for ruby 1.8.0
Oct 3, 2002 version 0.4.11 added --enable-compat-win32api
    (makes the CP932 table compatible with the Win32 API)
Sep 4, 2002 version 0.4.10 fix
Feb 10, 2002 version 0.4.9 added replace_invalid
Dec 10, 2001 version 0.4.8 taint state support
Nov 23, 2001 version 0.4.7 check for non-shortest form UTF-8
    changed the exception to Uconv::Error
Mar 4, 2001 version 0.4.6 s2u_conv2
    added USE_WIN32API
Jan 30, 2001 version 0.4.5 u2s_conv2
    changed the UCS/CP932 conversion table
Apr 18, 2000 version 0.4.4 fixed SJIS to UCS conversion bug
Mar 11, 2000 version 0.4.3 removed non-constant initializers
Nov 23, 1999 version 0.4.2 added eliminate_zwnbsp flag
    replaced the ustring library with the latest version
    slight speedup
Nov 19, 1999 version 0.4.1 bug fix in addUString
Nov 5, 1999 version 0.4.0 CP932 support
Mar 29, 1999 version 0.3.1 switched to xmalloc because the GC was not taken into account
Feb 22, 1999 version 0.3.0 UCS-4 and UTF-16 support
Jan 13, 1999 version 0.2.2 ݡ
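Aug 15, 1998 version 0.2.1 added English README file
Jul 24, 1998 version 0.2 changed to call a handler when a
    character that cannot be converted is found
    changed the installation method
Jul 8, 1998 version 0.1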
|