File: README

$Id: README,v 1.7 1998/02/14 22:43:02 tiggr Exp $

UC is a utility program that extracts information from the UnicodeData
file and from various ISO to Unicode mapping files.  The files created
with UC are used by the TOM String classes to perform encoding conversions
and to test character predicates.

For more information on TOM, see http://www.gerbil.org/tom/ or mail
tiggr@gerbil.org.

INVOCATION

	uc options action

  OPTIONS

    -o file

      Write output to <file>.  When not specified, output is written to
      stdout.

    -m name

      Specify the 8-bit to Unicode mapping file to be used.  The format of
      this file is that of version 0.1, format A of the ISO to Unicode
      mapping tables:

	Three tab-separated columns

	  1: ISO 8859-1 code (in hex as 0xXX)
	  2: Unicode (in hex as 0xXXXX)
	  3: Unicode name (follows a comment sign, '#')
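
      The three-column layout above can be read with a few lines of
      Python.  This is only a sketch: the function name parse_mapping and
      the sample lines are hypothetical, and it assumes that whole-line
      comments start with '#' and that a line with an empty second column
      marks an undefined code.

```python
def parse_mapping(lines):
    """Parse format-A mapping lines into {8-bit code: Unicode code point}."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and whole-line '#' comments carry no data.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        columns = line.split("\t")
        # Assumption: a missing or empty second column means "undefined".
        if len(columns) < 2 or not columns[1].strip():
            continue
        code = int(columns[0], 16)           # column 1: 0xXX
        unicode_value = int(columns[1], 16)  # column 2: 0xXXXX
        mapping[code] = unicode_value        # column 3 (the name) is ignored
    return mapping


# Hypothetical sample lines in the three-column format.
sample = [
    "# sample mapping lines\n",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n",
    "0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE\n",
]
table = parse_mapping(sample)
```

      The third column is ignored here because only the two codes are
      needed to build a conversion table.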

    -u name

      Specify the name of the UnicodeData file to be used.

  ACTIONS

    digit
    letter
    numeric
    punctuation
    space
    isupper
    islower

      Create a bitset with one bit per character.  Assuming the bitset has
      been read into the ByteArray {set}, the following expression is
      non-zero if and only if the character {a} satisfies the predicate.

		set[a / 8] & (1 << a % 8)

      The UnicodeData file must be specified.  If no mapping file is
      specified, the bitset contains 8192 bytes (65536 bits) with
      predicates for Unicode characters.  With a mapping file, the bitset
      contains 32 bytes (256 bits) with predicates for the characters
      within the encoding of the mapping.
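
      The bit test above translates directly to other languages.  The
      Python sketch below is illustrative only: satisfies and make_bitset
      are hypothetical names, and make_bitset merely stands in for a
      bitset that would normally be read from a file written by uc.

```python
def satisfies(bitset, a):
    """Return True if the bit for character code `a` is set in `bitset`."""
    # Same test as set[a / 8] & (1 << a % 8): byte a/8, bit a%8.
    return (bitset[a // 8] & (1 << (a % 8))) != 0


def make_bitset(codes, size_in_bytes):
    """Build a bitset by hand (hypothetical stand-in for uc output)."""
    bits = bytearray(size_in_bytes)
    for a in codes:
        bits[a // 8] |= 1 << (a % 8)
    return bytes(bits)


# 32 bytes cover the 256 codes of an 8-bit encoding; 8192 bytes would
# cover 65536 Unicode characters.
digits = make_bitset(range(ord("0"), ord("9") + 1), 32)
is_digit = satisfies(digits, ord("5"))
```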

    map

      Read an ISO 8859 to Unicode mapping file and output a 512-byte
      array describing 256 characters, two bytes per character, MSB first.

      Only a mapping file needs to be specified.
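
      A reader for such an array could look as follows.  This sketch
      assumes that "MSB first" means each entry is a 16-bit value stored
      high byte first, and that entry N holds the Unicode value of 8-bit
      code N.  The name decode_map and the sample data are hypothetical.

```python
import struct


def decode_map(data):
    """Decode a 512-byte array into a list of 256 Unicode code points."""
    if len(data) != 512:
        raise ValueError("expected 512 bytes, got %d" % len(data))
    # '>256H': 256 unsigned 16-bit integers, most significant byte first.
    return list(struct.unpack(">256H", data))


# Hypothetical sample: an identity map in which code N maps to U+00NN.
sample = b"".join(struct.pack(">H", code) for code in range(256))
code_points = decode_map(sample)
```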

    lower
    upper
    title

      Without a mapping file, output a line BASE OTHER NUM for each range
      of NUM characters starting at BASE that the conversion maps to the
      NUM characters starting at OTHER.
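
      Applying such ranges to a character is a simple lookup.  The sketch
      below takes the triples as integers that have already been parsed,
      since the numeric notation of the output is not described here.  It
      also assumes that characters outside every range are unchanged.
      The name convert and the sample range are illustrative.

```python
def convert(code_point, ranges):
    """Apply (BASE, OTHER, NUM) ranges to a single code point."""
    for base, other, num in ranges:
        if base <= code_point < base + num:
            # Same offset from OTHER as the input has from BASE.
            return other + (code_point - base)
    # Assumption: characters not covered by any range are unaffected.
    return code_point


# Illustrative 'lower' range: the 26 characters from 'A' map to 'a'...'z'.
lower_ranges = [(0x41, 0x61, 26)]
lowered = convert(ord("T"), lower_ranges)
```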

      With a mapping file, output a 256-byte array containing the
      conversion for each character in the encoding.  Characters
      unaffected by the conversion map to themselves.
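
      A 256-byte table of this shape can be applied with one lookup per
      byte.  In the sketch below the table is built by hand as a
      hypothetical stand-in for uc output: it is an identity table with
      'upper' entries filled in for the ASCII letters only.  The name
      apply_table is also an assumption.

```python
def apply_table(data, table):
    """Convert 8-bit text by looking each byte up in a 256-byte table."""
    if len(table) != 256:
        raise ValueError("conversion table must hold 256 bytes")
    return data.translate(bytes(table))


# Start from the identity, so unaffected characters convert to themselves.
table = bytearray(range(256))
for code in range(ord("a"), ord("z") + 1):
    table[code] = code - 32  # hypothetical 'upper' entries: a-z to A-Z

result = apply_table(b"tom 1.1", table)
```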