$Id: README,v 1.7 1998/02/14 22:43:02 tiggr Exp $
UC is a utility program that extracts information from the UnicodeData file
and various ISO to Unicode mapping files. The files created with UC are
used by the TOM String classes to perform encoding conversions and to test
character predicates.
For more information on TOM, see http://www.gerbil.org/tom/.
(tiggr@gerbil.org).
INVOCATION
uc options action
OPTIONS
-o file
Write output to <file>. When not specified, output is written to
stdout.
-m name
Specify the 8-bit to Unicode mapping file to be used. The file must
be in format A, version 0.1, of the ISO to Unicode mapping tables:
Three tab-separated columns
1: ISO 8859-1 code (in hex as 0xXX)
2: Unicode (in hex as 0xXXXX)
3: Unicode name (follows a comment sign, '#')
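A minimal sketch of reading such a mapping file, assuming that blank
lines and lines starting with '#' may be skipped (an assumption, not
stated above). The function name parse_mapping and the sample lines are
hypothetical and only serve as illustration.

```python
def parse_mapping(lines):
    """Return {8-bit code: Unicode value} from format A mapping lines."""
    table = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and whole-line comments carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        code = int(fields[0], 16)           # column 1, e.g. 0xE9
        unicode_value = int(fields[1], 16)  # column 2, e.g. 0x00E9
        table[code] = unicode_value         # column 3 (the name) is ignored
    return table


# Illustrative sample lines, not taken from a real mapping file.
sample = [
    "# illustrative sample",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE",
]
mapping = parse_mapping(sample)
```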
-u name
Specify the name of the UnicodeData file to be used.
ACTIONS
digit
letter
numeric
punctuation
space
isupper
islower
Create a bitset in which the bit for character {a} is set when {a}
satisfies the predicate. Assuming the bitset has been read into the
ByteArray {set}, the following expression is non-zero for exactly
those characters:
set[a / 8] & (1 << (a % 8))
The UnicodeData file must be specified. If no mapping file is
specified, the bitset contains 8192 bytes, one bit for each of the
65536 16-bit Unicode characters. With a mapping file, the bitset
contains 32 bytes, one bit for each of the 256 characters within the
encoding of the mapping.
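The bit test can be sketched in Python as follows. This assumes the
bitset file has been read into a bytes object; make_bitset is a
hypothetical helper that only builds illustrative data in the same
layout and is not part of UC.

```python
def make_bitset(code_points, size_bytes):
    """Build a bitset in the layout described above (illustration only)."""
    bits = bytearray(size_bytes)
    for a in code_points:
        bits[a // 8] |= 1 << (a % 8)
    return bytes(bits)


def satisfies(bits, a):
    """True when the bit for character a is set in the bitset."""
    return (bits[a // 8] & (1 << (a % 8))) != 0


# A 32-byte bitset, as produced with a mapping file, marking the ASCII
# digits. With no mapping file the same test applies to 8192 bytes.
digits = make_bitset(range(0x30, 0x3A), 32)
```

With this data, satisfies(digits, ord("7")) is true and
satisfies(digits, ord("a")) is false.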
map
This action reads an ISO 8859 to Unicode mapping file and outputs a
512-byte array describing its 256 characters, two bytes per character,
MSB first.
Only a mapping file needs to be specified.
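A sketch of decoding such an array, under the assumption that entry i
holds the 16-bit Unicode value of 8-bit code i (the text above only
gives the size and byte order). The name decode_map and the identity
sample are hypothetical.

```python
import struct


def decode_map(data):
    """Unpack 512 bytes into 256 16-bit values, most significant byte first."""
    if len(data) != 512:
        raise ValueError("expected a 512-byte array")
    return struct.unpack(">256H", data)


# Illustrative input: an identity table in which code i maps to value i.
sample = struct.pack(">256H", *range(256))
table = decode_map(sample)
```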
lower
upper
title
Without a mapping file, output a line BASE OTHER NUM for each range
of NUM characters starting at BASE which, under the conversion, map
to the NUM characters starting at OTHER.
With a mapping file, output a 256-byte array containing the
conversion of each character in the encoding. Characters unaffected
by the conversion map to themselves.
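Applying the range output can be sketched as below. The triples are
assumed to be parsed into integers already, since the number format of
the output lines is not stated here. The name convert and the sample
range are hypothetical.

```python
def convert(ranges, ch):
    """Convert ch using (base, other, num) triples; unlisted chars are kept."""
    for base, other, num in ranges:
        if base <= ch < base + num:
            return other + (ch - base)
    return ch


# Illustrative range: the 26 characters from 'a' map to those from 'A'.
to_upper = [(0x61, 0x41, 26)]
converted = convert(to_upper, ord("q"))
```

For the 256-byte array produced with a mapping file, the lookup is a
plain index: the converted character is array[c].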