Multilingualization of w3m
2003/03/08
H. Sakamoto
Introduction
I have tried the multilingualization of w3m (w3m-m17n).
The patch for w3m-0.4.1 is available on the following site.
http://www2u.biglobe.ne.jp/~hsaka/w3m/index.html#m17n
patch/w3m-0.4.1-m17n-20030308.tar.gz
patch/README.m17n
This is a development version. It has not been tested enough, because
Japanese is the only language I can read. Please use it, test it, and report bugs.
w3m-m17n currently has the following features.
Supported encoding schemes (character set)
* Japanese
EUC-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
(EUC-JISX0213) (JIS X 0213)
ISO-2022-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212, etc.
ISO-2022-JP-2 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
GB 2312, KS X 1001, ISO 8859-1, ISO 8859-7, etc.
ISO-2022-JP-3 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213, etc.
Shift_JIS(CP932) - US_ASCII, JIS X 0208, JIS X 0201, CP932 extension
Shift_JISX0213 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213
* Chinese (simplified)
EUC-CN(GB2312) - US_ASCII, GB 2312
ISO-2022-CN - US_ASCII, GB 2312, CNS-11643-1,..7, etc.
GBK(CP936) - US_ASCII, GB 2312, GBK
GB18030 - US_ASCII, GB 2312, GBK, GB18030, Unicode
HZ-GB-2312 - US_ASCII, GB 2312
* Chinese (Taiwan, traditional)
EUC-TW - US_ASCII, CNS 11643-1,..16
ISO-2022-CN - US_ASCII, CNS-11643-1,..7, GB 2312, etc.
Big5 - Big5
HKSCS - Big5, HKSCS
* Korean
EUC-KR - US_ASCII, KS X 1001 Wansung
ISO-2022-KR - US_ASCII, KS X 1001 Wansung, etc.
Johab - US_ASCII, KS X 1001 Johab
UHC(CP949) - US_ASCII, KS X 1001 Wansung, UHC
* Vietnamese
TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
* Thai
TIS-620 (ISO-8859-11), CP874
* Other
US_ASCII, ISO-8859-1 - 10, 13 - 15,
KOI8-R, KOI8-U, NeXT, CP437, CP737, CP775, CP850, CP852, CP855, CP856,
CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
* Unicode (UCS-4)
UTF-8, UTF-7
NOTE:
* The left half of JIS X 0201 and GB 1988 (Chinese ASCII) are
treated as US_ASCII because they are used in the tags of HTML documents.
Other variants of US_ASCII are handled without change.
* JIS C 6226(old JIS) is treated as JIS X 0208.
* The sequence '~\n' of HZ is not supported.
Display
There are three methods for multilingual display.
(1) kterm + ISO-2022-JP/CN/KR
* kterm can handle JIS X 0213 and CNS 11643 if the following patch
is applied.
http://www.st.rim.or.jp/~hanataka/kterm-6.2.0.ext02.patch.gz
* Specify the fontList for kterm with -fl option or in ~/.Xdefaults.
-fl "*--16-*-jisx0213.2000-*,\
*--16-*-jisx0212.1990-0,\
*--16-*-ksc5601.1987-0,\
*--16-*-gb2312.1980-0,\
*--16-*-cns11643.1992-*,\
*--16-*-iso8859-*"
JIS X 0213 fonts are available at
http://www.mars.sphere.ne.jp/imamura/jisx0213.html
* Set "display_charset" to ISO-2022-JP (or ISO-2022-JP-2, KR, CN),
and "strict_iso2022" to OFF on the option panel. (see below)
(2) xterm + UTF-8
* Use xterm (xterm-140 or later) of XFree86.
http://www.clark.net/pub/dickey/xterm/xterm.html
* Unicode fonts are available at
http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
http://openlab.ring.gr.jp/efont/index.html.en
* Use xterm with -u8 option.
Specify the fonts as follows, for example:
-fn "*-medium-*--13-*-iso10646-1" \
-fb "*-bold-*--13-*-iso10646-1" \
-fw "*-medium-*-ja-13-*-iso10646-1"
* Set the "display_charset" to UTF-8.
It is also better to set "pre_conv" to ON.
(3) mlterm + ISO-2022-JP/KR/CN
* Homepage
http://mlterm.sourceforge.net/
* Set encoding of mlterm to ISO-2022-JP/KR/CN or UTF-8.
* Set the "display_charset" to ISO-2022-JP/KR/CN or UTF-8.
Command line options
-I <document charset>
-O <display/output charset>
j(p): ISO-2022-JP
j(p)2: ISO-2022-JP-2
j(p)3: ISO-2022-JP-3
cn: ISO-2022-CN
kr: ISO-2022-KR
e(j): EUC-JP
ec,g(b): EUC-CN(GB2312)
et: EUC-TW
ek: EUC-KR
s(jis): Shift_JIS
sjisx0213: Shift_JISX0213
gbk: GBK
gb18030: GB18030
h(z): HZ-GB-2312
b(ig5): Big5
hk(scs): HKSCS
jo(hab): Johab
uhc: UHC
l?: ISO-8859-?
t(is): TIS-620(ISO-8859-11)
tc(vn): TCVN-5712 VN-1
v(iscii): VISCII 1.1
vp(s): VPS
ko(i8r): KOI8-R
koi8u: KOI8-U
n(ext): NeXT
cp???: CP???
w12??: CP12??
u(tf8): UTF-8
u(tf)7: UTF-7
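As a rough sketch of how these options combine, the following Python helper builds a w3m command line that reads a document in one charset and dumps it in another. The helper name w3m_dump_command and the file name page.html are hypothetical; only the -I, -O, and -dump options come from this document.

```python
import shutil
import subprocess


def w3m_dump_command(path, document_charset, output_charset):
    """Build an argument list for a w3m-m17n dump (illustrative helper)."""
    return ["w3m", "-I", document_charset, "-O", output_charset, "-dump", path]


if __name__ == "__main__":
    # Read an EUC-JP document ("e") and write it out as UTF-8 ("u").
    cmd = w3m_dump_command("page.html", "e", "u")
    print(" ".join(cmd))
    if shutil.which("w3m"):  # run only when w3m is actually installed
        subprocess.run(cmd, check=False)
```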
Option panel
display_charset
Display charset.
document_charset
Default document charset.
auto_detect
Detect the charset automatically when loading. (Default: ON)
system_charset
System charset. It is used for configuration files and file names.
follow_locale
System charset follows locale($LANG). (Default: ON)
ext_halfdump
Output with display charset when -halfdump.
search_conv
Adjust search string for document charset. (Default: ON)
use_wide
Use multi column characters. (Default: ON)
use_combining
Use combining characters. (Default: ON)
use_language_tag
Use Unicode language tags. (Default: ON)
ucs_conv
Charset conversion using Unicode map. (Default: ON)
pre_conv
Charset conversion when loading. (Default: OFF)
fix_width
Fix character width when converting. (Default: ON)
If it is OFF, the rendered layout may break.
use_gb12345_map
Use GB 12345 Unicode map instead of GB 2312's. (Default: OFF)
If it is ON, GB2312 can be converted to Big5, EUC-TW, or EUC-JP.
use_jisx0201
Use JIS X 0201 Roman for ISO-2022-JP. (Default: OFF)
use_jisc6226
Use JIS C 6226:1978 for ISO-2022-JP. (Default: OFF)
use_jisx0201k
Use JIS X 0201 Katakana. (Default: OFF)
use_jisx0212
Use JIS X 0212:1990 (Supplemental Kanji). (Default: OFF)
use_jisx0213
Use JIS X 0213:2000 (2000JIS). (Default: OFF)
strict_iso2022
Strict ISO-2022-JP/KR/CN. (Default: ON)
If it is OFF, all ISO 2022 based character sets can be displayed
with ISO-2022-JP/KR/CN.
east_asian_width
Use double width for some Unicode characters. (Default: OFF)
If it is ON, treat East Asian Ambiguous characters as double width.
gb18030_as_ucs
Treat 4-byte characters of GB18030 as Unicode. (Default: OFF)
simple_preserve_space
Simple space preservation.
If it is ON, spaces are kept in Japanese and some other languages.
alt_entity
Use alternate expression with ASCII for entities. (Default: ON)
If it is OFF, entities are treated as ISO 8859-1.
graphic_char
Use DEC special graphics for border of table and menu.
If it is OFF, ruled-line characters are used with CJK charsets or UTF-8.
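To illustrate what "use_wide", "use_combining", and "east_asian_width" are about, here is a minimal Python sketch that counts terminal columns using the Unicode East Asian Width property. It is not w3m's implementation; the function display_width and its ambiguous_wide parameter are assumptions made for this example.

```python
import unicodedata


def display_width(text: str, ambiguous_wide: bool = False) -> int:
    """Count terminal columns for text (illustrative, not w3m's algorithm)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            width += 2  # Wide and Fullwidth characters take two columns
        elif eaw == "A" and ambiguous_wide:
            width += 2  # Ambiguous characters, when treated as double width
        else:
            width += 1
    return width


print(display_width("漢字"))
print(display_width("α"), display_width("α", ambiguous_wide=True))
```

The Greek letter in the last line is an East Asian Ambiguous character, so its width depends on the flag, which is the situation "east_asian_width" addresses.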
Code conversion
The following special code conversions are supported.
* EUC-JP <-> ISO-2022-JP <-> Shift_JIS
* EUC-CN <-> ISO-2022-CN <-> HZ-GB-2312
* EUC-TW <-> ISO-2022-CN
* EUC-KR <-> ISO-2022-KR <-> Johab (only Symbol and Hanja)
Other conversions are based on Unicode.
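A conversion such as EUC-JP to ISO-2022-JP needs no Unicode table, because both encodings carry the same JIS X 0208 code points. The Python sketch below shows the idea for US_ASCII and JIS X 0208 only; it is an illustration under that restriction, not w3m's converter, and the function name eucjp_to_iso2022jp is hypothetical.

```python
ESC_JISX0208 = b"\x1b$B"  # designate JIS X 0208 to G0
ESC_ASCII = b"\x1b(B"     # designate US_ASCII to G0


def eucjp_to_iso2022jp(data: bytes) -> bytes:
    """Convert EUC-JP (ASCII + JIS X 0208 only) to ISO-2022-JP directly."""
    out = bytearray()
    in_kanji = False
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            if in_kanji:  # switch back to ASCII before an ASCII byte
                out += ESC_ASCII
                in_kanji = False
            out.append(b)
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            if not in_kanji:
                out += ESC_JISX0208
                in_kanji = True
            # Same code point, with the high bit cleared on both bytes.
            out.append(b & 0x7F)
            out.append(data[i + 1] & 0x7F)
            i += 2
        else:
            raise ValueError("byte 0x%02X is outside this sketch's scope" % b)
    if in_kanji:
        out += ESC_ASCII  # always end in ASCII
    return bytes(out)


text = "w3m 日本語"
print(eucjp_to_iso2022jp(text.encode("euc_jp")))
```

JIS X 0201 Katakana (SS2) and JIS X 0212 (SS3) are rejected here to keep the example short.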
Change document charset
Press '=' (show document information), and select the document charset.
If you specify the following keymaps,
keymap C CHARSET
keymap M-c DEFAULT_CHARSET
you can press `C' to change the current document charset,
and `M-c' to change the default document charset.
Line Editing
The input coding system follows the display coding system.
NOTE:
* HZ cannot be used as the input coding system.
* Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
SI(\017) and SO(\016) are already assigned to other command keys
(SO is assigned to `next-history'). If you want to use SI and SO,
press C-@(^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
7bit ISO-2022 are recognized. When you press C-@ again, the default
bindings are restored.
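The conflict with SI and SO can be seen in the byte streams themselves. This small Python sketch uses the standard codecs, not w3m, to check whether an encoding emits the SO (\016) and SI (\017) control bytes; the helper name shift_bytes_used is made up for this example.

```python
SO = b"\x0e"  # Shift Out, octal \016, the same byte as C-n
SI = b"\x0f"  # Shift In, octal \017, the same byte as C-o


def shift_bytes_used(text: str, charset: str):
    """Return (SO present, SI present) for text encoded in charset."""
    encoded = text.encode(charset)
    return SO in encoded, SI in encoded


# ISO-2022-KR switches to KS X 1001 with SO and back with SI,
# so both bytes reach the line editor as if they were key presses.
print(shift_bytes_used("한글", "iso2022_kr"))
# ISO-2022-JP uses only escape sequences, so there is no such conflict.
print(shift_bytes_used("日本語", "iso2022_jp"))
```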
Regular expression
Multilingual regular expression is supported.
-----------------------------------
Change log
2003/03/08 w3m-0.4.1-m17n-20030308
* Based on w3m-0.4.1
2003/02/24 w3m-0.4-m17n-20030224
* Based on w3m-0.4
2003/02/11 w3m-0.4rc1-m17n-20030211
* Based on w3m-0.4rc1
2003/02/07 w3m-0.3.2.2-m17n-20030207
* Based on w3m-0.3.2.2+cvs-1.742
2003/02/01 w3m-0.3.2.2-m17n-20030201
* Based on w3m-0.3.2.2+cvs-1.734
2003/01/31 w3m-0.3.2.2-m17n-20030131
* Based on w3m-0.3.2.2+cvs-1.732
2003/01/23 w3m-0.3.2.2-m17n-20030123
* Based on w3m-0.3.2.2+cvs-1.705
2003/01/22 w3m-0.3.2.2-m17n-20030122
* Based on w3m-0.3.2.2+cvs-1.699
2003/01/01 w3m-0.3.2.2-m17n-20030101
* Based on w3m-0.3.2.2+cvs-1.655
2002/12/22 w3m-0.3.2.2-m17n-20021222
* Based on w3m-0.3.2.2+cvs-1.640
2002/12/19 w3m-0.3.2.2-m17n-20021219
* Based on w3m-0.3.2.2+cvs-1.635
2002/12/07 w3m-0.3.2.2-m17n-20021207
* Based on w3m-0.3.2.2+cvs-1.599
* Fixed a problem on systems where int != long
2002/11/27 w3m-0.3.2.1-m17n-20021127
* Based on w3m-0.3.2.1+cvs-1.562
2002/11/20 w3m-0.3.2-m17n-20021120
* Based on w3m-0.3.2+cvs-1.538
2002/11/18
* Added UTF-7 to auto detection of charset.
2002/11/16 w3m-0.3.2-m17n-20021116
* Based on w3m-0.3.2+cvs-1.526
2002/11/13 w3m-0.3.2-m17n-20021113
* Based on w3m-0.3.2+cvs-1.506
2002/11/12 w3m-0.3.2-m17n-20021112
* Based on w3m-0.3.2+cvs-1.498
2002/11/09 w3m-0.3.2-m17n-20021109
* Based on w3m-0.3.2+cvs-1.490
2002/11/07 w3m-0.3.2-m17n-20021107
* Based on w3m-0.3.2
* Applied [w3m-dev 03371]
2002/10/22 w3m-0.3.1-m17n-20021022
* Based on w3m-0.3.1+cvs-1.444
2002/07/17 w3m-0.3.1-m17n-20020717
* Based on w3m-0.3.1
2002/05/29 w3m-0.3-m17n-20020529
* Based on w3m-0.3+cvs-1.379.
2002/03/16 w3m-0.3-m17n-20020316
* Based on w3m-0.3+cvs-1.353.
2002/03/11 w3m-0.3-m17n-20020311
* Based on w3m-0.3+cvs-1.342.
* Some bug fixes.
2002/02/16 w3m-0.2.5-m17n-20020216
* Based on w3m-0.2.5+cvs-1.319.
* Added an option "use_wide"
2002/02/05 w3m-0.2.5-m17n-20020205
* Based on w3m-0.2.5+cvs-1.302.
2002/02/02 w3m-0.2.5-m17n-20020202
* Based on w3m-0.2.5+cvs-1.291.
2002/01/31 w3m-0.2.4-m17n-20020131
* Based on w3m-0.2.4+cvs-1.278.
2002/01/29 w3m-0.2.4-m17n-20020129
* Based on w3m-0.2.4+cvs-1.268.
* Some bug fixes.
2002/01/28 w3m-0.2.4-m17n-20020128
* Based on w3m-0.2.4+cvs-1.265.
2002/01/08 w3m-0.2.4-m17n-20020108
* Based on w3m-0.2.4.
2002/01/07
* Replaced some wc_conv,wc_Str_conv with wc_conv_strict,wc_Str_conv_strict.
2001/12/31
* Added the conversion between HKSCS and Unicode.
* Changed the conversion table between Big5 and Unicode.
* Deleted the special conversion between Big5 and CNS11643.
* Fixed HKSCS.
2001/12/30 w3m-0.2.3.2-m17n-20011230
* Based on w3m-0.2.3.2+cvs-1.196.
2001/12/22 w3m-0.2.3.2-m17n-20011222
* Based on w3m-0.2.3.2.
* [w3m-dev-en 00660] can't compile if INET6 is defined
* [w3m-dev-en 00663] double meanings for WC_N_???
2001/12/21 w3m-0.2.3.1-m17n-20011221
* Based on w3m-0.2.3.1.
* Support of HKSCS, KOI8-U, UTF-7.
The conversion table between HKSCS and Unicode is not yet available.
* Add the conversion between ISO 8859-16 and Unicode.
* Add option 'ext_halfdump'.
2001/04/14 w3m-(0.2.1)-m17n-0.20
* Support of UTF-7.
* [w3m-dev 01913] ([w3m-dev-en 00452])
2001/04/12 w3m-(0.2.1)-m17n-0.19
* TILDE of JISX0212, JISX0213 -> FULLWIDTH TILDE of Unicode.
* MICRO SIGN of Unicode -> GREEK SMALL MU of JISX0208.
* [w3m-dev 01892], [w3m-dev 01894], [w3m-dev 01898], [w3m-dev 01902]
2001/03/31
* Changed the implementation of <_SYMBOL> again.
* With the -dump option, "pre_conv" is false by default.
2001/03/29
* Support combining characters of TCVN 5712.
* [w3m-dev 01873], [w3m-dev-en 00411].
2001/03/28
* Setting -suffix="" is now accepted by configure. (thanks to naddy!)
* Bugfix: when #define USE_SSL and #undef USE_SSL_VERIFY, rc.c
doesn't compile. (thanks to naddy!)
* [w3m-dev 01859].
* Bugfix: 0xA0 is error in Shift-JIS.
* Changed the implementation of <_SYMBOL> ([w3m-dev 01852]).
2001/03/24 w3m-(0.2.1)-m17n-0.18
* Based on w3m-0.2.1.
* [w3m-dev 01703], [w3m-dev 01814], [w3m-dev 01823]
* Separated ISO-2022-JP-3 from ISO-2022-JP.
* Improved auto detection.
2001/03/23
* Based on w3m-0.2.0.
2001/03/21
* Added functions (CHARSET and DEFAULT_CHARSET).
* Improved document charset detection of frame HTML.
2001/03/20
* Convert FULLWIDTH variants, except ASCII, to normal characters.
2001/03/18 w3m-(0.1.11-pre-hsaka24)-m17n-0.17
* Based on "[w3m-dev 01779] w3m-0.1.11-pre-hsaka24".
* Prefer JIS X 0213 to JIS X 0212.
2001/03/14 w3m-(0.1.11-pre-kokb23)-m17n-0.16
* Add the conversion between JIS X 0213 and Unicode Extension B.
* Bugfix: conversion between JIS X 0213 and Unicode.
* Bugfix: treat UHC as Hangul.
* Ignore "search_conv" if "pre_conv" is ON.
2001/03/09 w3m-(0.1.11-pre-kokb23)-m17n-0.15
* Improvement of wc_wchar_t (mainly for Unicode).
* Some bugfixes for Unicode.
* Ignore "use_gb12345_map" option when output with GBK or GB18030.
* With the -dump option, "pre_conv" is always true.
* With the -dump or -halfdump option, some processing is skipped.
* Get system charset from the environment variable LC_CTYPE -> LANG -> LC_ALL.
* Bugfixes: [w3m-dev 01724], [w3m-dev 01726], [w3m-dev 01752],
[w3m-dev 01753], [w3m-dev 01754]
2001/03/06 w3m-(0.1.11-pre-kokb23)-m17n-0.14
* Support of Language tag (UTR#7).
* Bugfix: conversion between GB18030, Johab and Unicode.
2001/03/04 w3m-(0.1.11-pre-kokb23)-m17n-0.13
* Support of GBK(CP936), GB18030, UHC(CP949) !
* Unicode mapping table of GB2312 and GB12345 became compatible with
CP936, GB18030. (Code point: 0xA1A4, 0xA1AA)
* Allow 0xFFFE and 0xFFFF in Unicode (due to compatibility with GB18030).
* Bugfix: code point of NBSP in Unicode.
2001/03/03 w3m-(0.1.11-pre-kokb23)-m17n-0.12
* I wrote English README.m17n.
-------------------------------------------
Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
http://www2u.biglobe.ne.jp/~hsaka/