File: README

package info (click to toggle)

libuninameslist 20160701-2

links: PTS, VCS
area: main
in suites: stretch
size: 4,252 kB
ctags: 365
sloc: ansic: 84,390; sh: 2,914; makefile: 40

file content (43 lines) | stat: -rw-r--r-- 1,858 bytes

parent folder | download | duplicates (9)

 A Library of Unicode annotation data
======================================
The Unicode consortium provides a file containing annotations on many unicode
characters. This library contains a compiled version of this file so that
programs can access these data easily.

The library contains a very large (sparse) array with one entry for each
unicode code point (U+0000 - U+10FFFF). Each entry contains two strings,
a name and an annotation. Either or both may be NULL. The library also
contains a (much smaller) list of all the Unicode blocks.

    struct unicode_block {
        int start, end;
        const char *name;
    };

    struct unicode_nameannot {
        const char *name, *annot;
    };

    extern const struct unicode_block UnicodeBlock[185];

    #define UNICODE_NAME_MAX	83
    #define UNICODE_ANNOT_MAX	462
    extern const struct unicode_nameannot * const *const UnicodeNameAnnot[];

    /* Index by: UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff] */

    /* At the beginning of lines (after a tab) within the annotation string, a */
    /*  * should be replaced by a bullet U+2022 */
    /*  x should be replaced by a right arrow U+2192 */
    /*  : should be replaced by an equivalent U+224D */
    /*  # should be replaced by an approximate U+2245 */
    /*  = should remain itself */

This package consists of one header file and one library file. The header
is <uninameslist.h>. To find the name of a given unicode character uni use
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].name
while the annotation string is:
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].annot
The both strings are in US ASCII, but the annotation string is intended to be
modified slightly by the having any '*' characters which immediately follow
a tab at the start of a line converted to a bullet character. Etc.