<style>
.reverse {color: white; background-color: black;}
.unispecial {background-color: cyan;}
code {background-color: #CCC0C0;}
</style>
<head>
<meta name="description" content="Powerful text editor with extensive Unicode and CJK support.">
<meta name="keywords" content="editor, text mode editor, programmers editor, programming editor, Unicode editor, UTF-8 editor, Unicode text editor, UTF-8 text editor, Unicode text mode editor, UTF-8 text mode editor, text mode Unicode editor, text mode UTF-8 editor, text mode HTML editor, CJK editor">
<meta name="robots" content="index">
<title>mined 2000 Unicode Howto</title>
<script language="JavaScript">
top.select ("unicode");
function chapter (ch) {
if (ch.id) {
/* highlight chapter in navigation */
if (top.select2) {
top.select2 ("unicode", ch.id);
}
}
}
</script>
</head>
<br>
<center>
<h3>Mined Unicode Howto</h3>
<h4>Environment setup and Usage of mined for Unicode text</h4>
</center>
<br>
<ul>
<li> <img align=absmiddle src=handr.gif>
See the <a href=features.html#unicode>mined features</a> page for
an overview of mined features for Unicode editing.
<ul><li>See the next <a href=#handling>section below</a>
for an overview of how to use these features.
</ul>
<p>
<li> For general information on Unicode and its support on computers,
see also Markus Kuhn's
<a target=_blank href=http://www.cl.cam.ac.uk/~mgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux</a>.
<ul><li>See the final <a href=#setup>section below</a> for some setup hints
for a Unicode-enabled environment, even on legacy computers.
</ul>
</ul>
<a name=handling></a>
<span id=handling onMouseOver="chapter (this);">
<br>
<dl>
<dd>
<h4>Handling Unicode text with mined</h4>
<dl>
<p>
<dt><i>Screen handling</i>
<dd>
Mined usually auto-detects a UTF-8 terminal, along with the
specific features it offers (such as double-width and
combining characters, Arabic ligature joining, and different width
data sets).
<p>
<dt><i>Character encoding</i>
<dd>
By default, mined detects automatically if the text in an edited
file is UTF-8 encoded (Unicode character set) or not (either
8 bit encoded or CJK encoded); it also detects and maintains UTF-16.
<br>
Mined handles illegal UTF-8 sequences transparently, so
if you accidentally open an 8 bit or CJK encoded file in UTF-8
mode, or a file with mixed parts, you can edit the text without
problems and will not lose any information. Non-UTF-8 codes
are indicated by display background highlighting.
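<p>
The lossless treatment of invalid bytes described above can be illustrated
outside of mined. The following is a minimal Python sketch, not mined's
implementation: it relies on Python's <code>surrogateescape</code> error
handler, and the names <code>load_text</code>, <code>save_text</code> and
<code>is_invalid_byte_marker</code> are hypothetical names chosen for this
example.

```python
def load_text(raw: bytes) -> str:
    """Decode as UTF-8; each invalid byte becomes a lone surrogate U+DC80..U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def save_text(text: str) -> bytes:
    """Encode back to bytes; the surrogate markers turn into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def is_invalid_byte_marker(ch: str) -> bool:
    """True if ch stands for a byte that was not valid UTF-8 (a candidate for highlighting)."""
    return 0xDC80 <= ord(ch) <= 0xDCFF


# A file with mixed parts: valid UTF-8 followed by a Latin-1 encoded word.
mixed = "caf\u00e9 ".encode("utf-8") + "caf\u00e9".encode("latin-1")
text = load_text(mixed)
highlighted = [is_invalid_byte_marker(ch) for ch in text]
round_trip_ok = save_text(text) == mixed
```

In this sketch only the final Latin-1 byte is flagged, and writing the text
back reproduces the input byte for byte, which is the property the paragraph
above describes.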
<br>
While editing, you can switch the character encoding assumed
for text interpretation with the encoding menu
(left-click to toggle current and previous encoding,
right-click to open menu).
<p>
<dt><i>Unicode display on non-Unicode terminal</i>
<dd>
Characters that cannot be displayed in the encoding of the terminal
are shown as a suitable replacement, marked by a
coloured background.
Replacements are chosen so as to suggest the text character as well
as possible, with special indications for combining characters
<code class=unispecial>'</code>,
quotation marks <code class=unispecial>"</code>,
dashes <code class=unispecial>-</code>,
the Euro symbol <code class=unispecial>E</code> etc., and
using a base character according to <i>Unicode decomposition</i>
for accented and other precomposed characters.
<br>
Please consult the manual page, section
<a href=mined.html#utf8display>Unicode display</a> for details.
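<p>
As a rough illustration of choosing a base character by Unicode
decomposition, here is a minimal Python sketch. It is an assumption-laden
example, not mined's algorithm: the function name
<code>replacement_for</code>, the use of compatibility decomposition (NFKD)
and the final <code>?</code> marker are choices made for this example, and
the special indications for quotation marks, dashes or the Euro symbol
mentioned above are not covered.

```python
import unicodedata


def replacement_for(ch: str, terminal_encoding: str = "ascii") -> str:
    """Return ch if the terminal encoding can represent it, else a base-character stand-in."""
    try:
        ch.encode(terminal_encoding)
        return ch
    except UnicodeEncodeError:
        pass
    # Decomposition splits precomposed characters into base character + marks.
    for part in unicodedata.normalize("NFKD", ch):
        if unicodedata.combining(part):
            continue  # skip combining marks, keep looking for a base character
        try:
            part.encode(terminal_encoding)
            return part
        except UnicodeEncodeError:
            break  # the base character itself cannot be displayed either
    return "?"  # hypothetical last-resort marker, not taken from mined
```

For example, with a Latin-1 terminal this sketch keeps
<code>é</code> unchanged but maps <code>ś</code> to <code>s</code>.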
<p>
<dt><i>Combining characters</i>
<dd>
Mined supports display and editing of combined characters
consisting of a base character and one or more combining
characters, in one of two modes:
<ul>
<li>Combined display mode: combined characters are displayed as
they should appear. Navigation within the combined character
is possible (Control-cursor-left/right); the character information
display (HOP ESC u, or from the "?" Info menu) shows which part (base
or combining character) of the combined character you are
positioned on; and Mark/Copy/Paste and Control-Del act on the
respective position.
<li>Separated display mode: base character and combining characters
are separated for explicit handling.
</ul>
These modes can be selected and are indicated in the
Combining display flag: <code></code>: combined mode,
<code class=unispecial>`</code>: separated mode.
<br>
See the manual page, section <a href=mined.html#combined>
Combining characters</a> for details.
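<p>
The notion of a combined character (a base character followed by one or
more combining characters) can be sketched in Python as follows. This is
only an illustration under simplifying assumptions, not mined's code: the
function name <code>combined_characters</code> is hypothetical, and a
character is treated as combining only if its canonical combining class is
non-zero, which misses some combining marks.

```python
import unicodedata
from typing import List


def combined_characters(text: str) -> List[str]:
    """Split text into units of one base character plus its combining characters."""
    units: List[str] = []
    for ch in text:
        if units and unicodedata.combining(ch) != 0:
            units[-1] += ch  # attach the combining character to the preceding base
        else:
            units.append(ch)  # a base character starts a new unit
    return units


# "e" + COMBINING ACUTE, "a" + COMBINING DIAERESIS + COMBINING DOT BELOW, "x"
sample = "e\u0301a\u0308\u0323x"
units = combined_characters(sample)
```

Each element of <code>units</code> corresponds to one screen position in
combined display mode, while iterating over <code>sample</code> character by
character corresponds to separated display mode.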
<p>
<dt><i>Bidirectional display</i>
<dd>
Mined auto-detects if it is running in a terminal supporting
Arabic (by checking LAM/ALEF ligature joining) and other
right-to-left scripts (e.g. mlterm), or it can be told so with
the command line parameter <code>+UU</code>.
<br>
The mined runtime support library contains a script
<code>mterm</code> to invoke the <code>mlterm</code> terminal
emulator with suitable parameters to set up bidi mode and a
suitable font.
<p>
<dt><i>CJK and 8 bit character set support on Unicode terminal</i>
<dd>
Mined support for major CJK encodings is also best used in a
UTF-8 terminal (unless you need specific CJK input features of
dedicated terminals); this setup is also well suited for editing
text encoded in various 8 bit character sets.
<br>
See the <a href=features.html#cjk>mined features</a> page for an
overview of CJK support features.
<br>
See the manual page, sections
<a href=mined.html#charencoding>Character encoding support</a> and
<a href=mined.html#cjk>CJK support</a> for details.
</dl>
</dl>
</span>
<a name=setup></a>
<span id=setup onMouseOver="chapter (this);">
<br>
<dl>
<dd>
<h4>Unicode environment setup</h4>
<dl>
<p>
<dt><i><img align=absmiddle src=handr.gif>Quick and easy</i>
<dd>
Use the command <a href=uterm.html><code>uterm</code></a> to
invoke a UTF-8 enabled terminal with automatic selection of a
suitable font for best coverage of Unicode characters.
<ul type=circle>
<li>The <code>uterm</code> script comes with the mined
package; it is included in the mined runtime support library
and may be installed in the path with the mined application.
<li>Note: The <code>uterm</code> script assumes that a UTF-8 enabled
version of xterm or rxvt-unicode is already installed on your
system, as well as fonts suitable for your needs.
If this is not the case on your system, follow the advice below.
</ul>
<p>
<dt><i>Install suitable terminal</i>
<dd>
Mined is a text mode editor. Its UTF-8 display and input support
is available with terminal emulators supporting UTF-8 and running
in UTF-8 mode, like
<a target=_blank href=http://invisible-island.net/xterm/xterm.html>
xterm</a> (version &gt;= 145),
<a target=_blank href=http://sourceforge.net/projects/rxvt-unicode>
rxvt-unicode</a>,
<a target=_blank href=http://sourceforge.net/projects/mlterm>
mlterm</a>, KDE konsole, gnome-terminal, Linux console,
cygwin console, MinTTY, PuTTY.
<ul>
<li>If you don't have a recent version of xterm on your
system, compile it yourself;
invoke <code>configure --enable-wide-chars</code> or use the script
<code>configure-xterm</code> from the mined runtime support
library. Then invoke <code>make</code>. You may want to compact
the resulting executable with <code>strip xterm</code>; then
install it into your path, e.g. in <code>$HOME/bin</code>.
<br><img align=absmiddle src=handr.gif>Note: xterm, like
mined, can be used to enable UTF-8 and Unicode support on
legacy systems, even if they do not offer any "locale" support,
and without needing root privilege.
</ul>
<p>
<dt><i>Install suitable fonts</i>
<dd>
Install Unicode fonts for your X server.
<ul>
<li>To check if your X installation already provides Unicode fonts,
you may invoke the command <nobr><code>xlsfonts | grep iso10646</code></nobr>.
If this doesn't list anything, or if you cannot find a suitable font
setup, do one of the following:
<dl>
<dd>
<li><i>Automatic installation:</i>
<dd>The Mined runtime support library contains a script
<code>installfonts</code> that downloads these fonts and installs them
with your X server. Finally, it gives some hints on how to add them to
your permanent font configuration.
<li><i>Manual installation:</i>
<dd>
<ul>
<li>Retrieve some of the following fonts:
<ol>
<li> <a href=http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz>
UCS fonts for X</a>
with their
<a href=http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz>
CJK supplement</a>
from Markus Kuhn's page
<a target=_blank href=http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html>
Unicode fonts and tools for X11</a>
<li> <a href=http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-75dpi100dpi.tar.gz>
Adobe and B&amp;H bitmap fonts</a>
from the same site which contain fixed width Courier and
Lucida Typewriter fonts
<li> <a href=http://www.inp.nsk.su/~bolkhov/files/fonts/univga/uni-vga.tgz>
Unicode VGA font</a>
from
<a target=_blank href=http://www.inp.nsk.su/~bolkhov/files/fonts/univga/>
Dmitry Bolkhovityanov</a>'s site
<li> <a href=http://bibliofile.mc.duke.edu/gww/fonts/Monospace/MonospaceRoman.bdf.tar.gz>
Monospace Roman BDF fonts</a>
and their Oblique / Bold / Bold Oblique supplements from
<a target=_blank href=http://bibliofile.mc.duke.edu/gww/fonts/Unicode.html>
George Williams' Unicode fonts page</a>
</ol>
<li>The nicest looking font in the UCS fonts archive mentioned above
is the 10x20 size font; it is suitable for higher screen resolutions.
Unfortunately, the CJK double-width fonts are not distributed in
the corresponding 20x20 size, but only in the 18x18 size. The
corresponding single-width font in 9x18 size, however, looks quite
spindly and, to my taste, rather awkward.
<br>
For this reason, I am providing a script to generate 20x20 CJK fonts
automatically from the 18x18 UCS fonts distributed for X servers.
The script is called <code>bdf18to20</code> and can be found in the mined runtime
support library. Go into the directory where you unpacked the fonts
and invoke the script.
<li>Install the fonts with your X server: unpack them into a directory
(e.g. <code>$HOME/xfonts</code>), change into that directory, and run the
<code>mkfontdir</code> command. Then make sure that the fonts are
loaded into your X server, using the command
<code>xset +fp $HOME/xfonts</code>; a suitable place to run this
automatically would be your <code>$HOME/.xinitrc</code> X
initialisation file, if you have one.
<dl>
<dd><i>Note:</i> If you are working in a network, make sure the xset
command is invoked such that the X server has access to the given
directory on the machine it is running on.
<dd>Some X servers (e.g. Exceed on Windows) do not accept BDF fonts;
use the "Compile Fonts" function of the configuration menu to install
the fonts.
</dl>
</ul>
</dl>
</ul>
<p>
<dt><i>Start terminal in UTF-8 mode</i>
<dd>
Invoke a terminal window in UTF-8 mode and configure it to use
fonts sufficient to display the text you want to edit.
<ul>
<li>Invoke xterm with suitable resource configuration or command line
parameters.
<ul>
<li>I recommend invoking xterm with the script
<a href=uterm.html><code>uterm</code></a> from the mined runtime
support library.
<li>Alternatively, invoke <code>xterm -u8</code> or
<code>xterm -en UTF-8</code> to enforce UTF-8 mode, depending on system
configuration; the option <code>+lc</code> may also be needed.
</ul>
<br><img align=absmiddle src=handr.gif>Mined detects UTF-8 terminal
mode automatically (exception: cygwin 1.7 UTF-8 console after rlogin
or telnet).
So it will work even if your locale environment is not configured properly.
<dl>
<dd><i>Note:</i> xterm is quite touchy about configuring suitable
matching fonts for single-width and double-width glyphs. If you are
unlucky, CJK character display will result in garbage on the screen.
My recommendation is to generate the 20x20 UCS fonts with my
<code>bdf18to20</code> script as mentioned above and configure xterm
to use 10x20 – it will then automatically select one of the 20x20
fonts for double-width characters; if you have a preference among
them, use the -fw command line option or the wideFont X resource (in
your <code>$HOME/.Xdefaults</code> file).
See the pattern file <code>Xdefaults.mined</code> in the mined runtime
support library for suggestions of suitable entries.
(Double-width font matching works much better with rxvt which even seems
to scale double-width fonts in an acceptable way if needed.)
</dl>
<li>If you prefer rxvt, use rxvt-unicode and make sure to select
UTF-8 by setting a locale in your environment that is installed
on your system, for example <code>LC_ALL=en_US.UTF-8 urxvt</code> on cygwin.
<dl>
<dd><i>Note:</i> rxvt is quite touchy about configuring a known locale
setting; it does not have a strict UTF-8 option that would reliably
work on all systems.
</dl>
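<p>
To see which locale setting a terminal such as rxvt-unicode would pick up
from the environment, the usual POSIX precedence (<code>LC_ALL</code>, then
<code>LC_CTYPE</code>, then <code>LANG</code>) can be checked with a small
script. The following Python sketch is an illustration only: the function
name <code>utf8_locale_configured</code> is hypothetical, and the check
inspects the variable values without verifying that the named locale is
actually installed on the system.

```python
import os
from typing import Mapping, Optional


def utf8_locale_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if the effective character-type locale setting names a UTF-8 codeset."""
    if env is None:
        env = os.environ
    # POSIX precedence: the first variable that is set and non-empty wins.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like language_TERRITORY.codeset@modifier
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.lower().replace("-", "") == "utf8"
    return False
```

Calling <code>utf8_locale_configured()</code> without arguments examines the
current environment; passing a dictionary allows trying out settings before
exporting them.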
<li>
<i>Note:</i> For hints on how to configure the environment explicitly so
that rxvt, konsole and other applications work with UTF-8, see the
mined manual page (about LC_CTYPE and other environment variables).
Accurate locale setting is not needed by xterm and mined.
<br>For other terminals (e.g. mlterm), see their manual for how to
configure UTF-8 mode.
<li>Alternatively, you can start mined directly together with its own
terminal window. For this purpose, the mined runtime support library
contains the script <code>umined</code>.
<img align=absmiddle src=handr.gif> This script also quickly enables you
to use the most recent version of Unicode width data (specifying wide
and combining characters) as built into xterm, in contrast to
system-provided locale data, which may refer to an older version of Unicode.
<li>On a Windows system, you can also use the script
<code>wmined</code> or <code>wmined.bat</code> which will invoke mined
in a MinTTY terminal window (not needing an X server). If MinTTY is not
installed, <code>wmined</code> will try rxvt instead (the old version of
rxvt which can also run stand-alone without X server but is not
Unicode-enabled; the new rxvt-unicode however (called urxvt on cygwin)
cannot run stand-alone).
The terminal is configured to use UTF-8 (for MinTTY; ignored by rxvt)
and to apply Windows look-and-feel colour settings (by inspecting the
Windows registry; with rxvt, wmined also tries to match your font size
preferences).
</ul>
</dl>
</dl>
</span>
<hr>
<dt>Mined <a target=_top href=./>homepage</a> and download.
<dt><a href=mailto:mined@towo.net>Thomas Wolff</a>