File: README

package info (click to toggle)
libuninameslist 20190701-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 4,860 kB
  • sloc: ansic: 95,072; python: 118; makefile: 107; sh: 4
file content (43 lines) | stat: -rw-r--r-- 1,858 bytes parent folder | download | duplicates (6)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
 A Library of Unicode annotation data
======================================
The Unicode consortium provides a file containing annotations on many unicode
characters. This library contains a compiled version of this file so that
programs can access these data easily.

The library contains a very large (sparse) array with one entry for each
unicode code point (U+0000 - U+10FFFF). Each entry contains two strings,
a name and an annotation. Either or both may be NULL. The library also
contains a (much smaller) list of all the Unicode blocks.

    struct unicode_block {
        int start, end;
        const char *name;
    };

    struct unicode_nameannot {
        const char *name, *annot;
    };

    extern const struct unicode_block UnicodeBlock[185];

    #define UNICODE_NAME_MAX	83
    #define UNICODE_ANNOT_MAX	462
    extern const struct unicode_nameannot * const *const UnicodeNameAnnot[];

    /* Index by: UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff] */

    /* At the beginning of lines (after a tab) within the annotation string, a */
    /*  * should be replaced by a bullet U+2022 */
    /*  x should be replaced by a right arrow U+2192 */
    /*  : should be replaced by an equivalent U+224D */
    /*  # should be replaced by an approximate U+2245 */
    /*  = should remain itself */

This package consists of one header file and one library file. The header
is <uninameslist.h>. To find the name of a given unicode character uni use
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].name
while the annotation string is:
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].annot
The both strings are in US ASCII, but the annotation string is intended to be
modified slightly by the having any '*' characters which immediately follow
a tab at the start of a line converted to a bullet character. Etc.