1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
|
A Library of Unicode annotation data
======================================
The Unicode consortium provides a file containing annotations on many unicode
characters. This library contains a compiled version of this file so that
programs can access these data easily.
The library contains a very large (sparse) array with one entry for each
unicode code point (U+0000 - U+10FFFF). Each entry contains two strings,
a name and an annotation. Either or both may be NULL. The library also
contains a (much smaller) list of all the Unicode blocks.
struct unicode_block {
int start, end;
const char *name;
};
struct unicode_nameannot {
const char *name, *annot;
};
extern const struct unicode_block UnicodeBlock[185];
#define UNICODE_NAME_MAX 83
#define UNICODE_ANNOT_MAX 462
extern const struct unicode_nameannot * const *const UnicodeNameAnnot[];
/* Index by: UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff] */
/* At the beginning of lines (after a tab) within the annotation string, a */
/* * should be replaced by a bullet U+2022 */
/* x should be replaced by a right arrow U+2192 */
/* : should be replaced by an equivalent U+224D */
/* # should be replaced by an approximate U+2245 */
/* = should remain itself */
This package consists of one header file and one library file. The header
is <uninameslist.h>. To find the name of a given unicode character uni use
UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].name
while the annotation string is:
UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].annot
The both strings are in US ASCII, but the annotation string is intended to be
modified slightly by the having any '*' characters which immediately follow
a tab at the start of a line converted to a bullet character. Etc.
|