File: README

uniutils 2.27-2
This directory contains programs for working with Unicode.

They have been compiled and executed successfully under GNU/Linux, SunOS, and FreeBSD.
They should compile and run on any POSIX-compliant system.

If you have GNU autoconf/automake, the sequence:

./configure
make
make install-strip

should do the trick.

Details may be found in INSTALL. 

unihist is compiled by default to handle the entire Unicode character set.
If you have no use for characters outside the Basic Multilingual Plane
(BMP, plane 0), you may define BMPONLY to reduce memory usage and save
a little time.
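
The memory saving comes from tables indexed by code point. The sketch below is hypothetical and is not taken from unihist: it assumes BMPONLY is a preprocessor macro (for example passed as -DBMPONLY) that sizes a simple array of counters, and the names MAXCHAR, counts, and the hist_* functions are invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical sketch, not unihist's source: a histogram indexed directly
   by code point. Compiling with -DBMPONLY limits it to plane 0. */
#ifdef BMPONLY
#define MAXCHAR 0xFFFFUL      /* last BMP code point */
#else
#define MAXCHAR 0x10FFFFUL    /* last Unicode code point */
#endif

static unsigned long *counts;

/* Allocate a zeroed counter for every tracked code point; 0 on failure. */
int hist_init(void)
{
    counts = calloc(MAXCHAR + 1, sizeof *counts);
    return counts != NULL;
}

/* Count one code point; returns 0 if it lies outside the tracked range. */
int hist_tally(unsigned long cp)
{
    if (cp > MAXCHAR)
        return 0;
    counts[cp]++;
    return 1;
}

/* How many times cp has been counted (0 if it is not tracked). */
unsigned long hist_count(unsigned long cp)
{
    return cp > MAXCHAR ? 0 : counts[cp];
}

void hist_free(void)
{
    free(counts);
    counts = NULL;
}
```

Built with -DBMPONLY, hist_tally(0x10024) returns 0 and the character is simply not counted. On a system with 8-byte unsigned long, the full table in this sketch takes 0x110000 × 8 = 8,912,896 bytes (8.5 MiB), against 524,288 bytes (512 KiB) with BMPONLY.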

Uniname may be updated by obtaining a new version of the file UnicodeData.txt
from the Unicode Consortium (http://www.unicode.org) and executing
the command:

awk -f genunames.awk < UnicodeData.txt > unames.c
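
UnicodeData.txt has one record per line, with fields separated by semicolons: field 0 is the code point in hex and field 1 is the character name. The C sketch below is not a translation of genunames.awk, whose output format is not described here; it only shows how such a record can be parsed. parse_unicodedata_line is a hypothetical name, and falling back to field 10 (the Unicode 1.0 name) for <control> entries is an assumption, chosen because it matches names such as LINE FEED (LF) in the uniname report further down.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: extract the code point and name from one line of
   UnicodeData.txt. Field 0 is the code point in hex, field 1 the name,
   field 10 the Unicode 1.0 name (used here for "<control>" entries).
   Returns 1 on success, 0 if the line is malformed. */
int parse_unicodedata_line(const char *line, unsigned long *cp,
                           char *name, size_t namesz)
{
    const char *field[15];
    int n = 0;
    const char *p = line;

    if (namesz == 0)
        return 0;

    /* Record where each ';'-separated field starts. */
    field[n++] = p;
    while (n < 15 && (p = strchr(p, ';')) != NULL)
        field[n++] = ++p;
    if (n < 11)
        return 0;

    char *end;
    *cp = strtoul(field[0], &end, 16);
    if (end == field[0] || *end != ';')
        return 0;

    /* Use the regular name, or the 1.0 name for control characters. */
    const char *src = field[1];
    if (strncmp(src, "<control>;", 10) == 0 && *field[10] != ';')
        src = field[10];

    size_t len = strcspn(src, ";\r\n");
    if (len >= namesz)
        len = namesz - 1;
    memcpy(name, src, len);
    name[len] = '\0';
    return 1;
}
```

Note that large blocks such as the CJK Unified Ideographs and Hangul Syllables appear in UnicodeData.txt only as pairs of <..., First> and <..., Last> records rather than one record per character, which is consistent with the "No character name is available" lines in the report below.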

If Unicode ranges are added, you will also need to update the file unirange.c.
Then recompile.
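
One plausible shape for such a table is a sorted array of (first, last, name) entries searched by binary search. The struct urange type, the ranges array, and range_name below are hypothetical, only a handful of blocks are listed, and the actual layout of unirange.c may differ.

```c
#include <stddef.h>

/* Hypothetical range table; entries must be sorted by first code point
   and must not overlap. Names follow the labels used in the reports. */
struct urange {
    unsigned long first;
    unsigned long last;
    const char *name;
};

static const struct urange ranges[] = {
    {0x0000,  0x007F,  "Basic Latin"},
    {0x0370,  0x03FF,  "Greek/Coptic"},
    {0x0400,  0x04FF,  "Cyrillic"},
    {0x0530,  0x058F,  "Armenian"},
    {0x0590,  0x05FF,  "Hebrew"},
    {0x4E00,  0x9FFF,  "CJK Unified Ideographs"},
    {0xE000,  0xF8FF,  "Private Use Area"},
    {0x10000, 0x1007F, "Linear B Syllabary"},
};

/* Binary-search the table; returns the range name, or NULL if cp falls
   outside every listed range. */
const char *range_name(unsigned long cp)
{
    size_t lo = 0, hi = sizeof ranges / sizeof ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < ranges[mid].first)
            hi = mid;
        else if (cp > ranges[mid].last)
            lo = mid + 1;
        else
            return ranges[mid].name;
    }
    return NULL;
}
```

With this layout, adding a range means inserting one entry in sorted position; the lookup code itself does not change.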

The directory TestData contains a number of test files.
Files with the suffix .u are binary files. Files with the suffix .ann
are ASCII files giving the contents of the binary files in hex
with comments explaining the status of each piece of test data.
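
To produce or check the hex part of an .ann file, each byte of the corresponding .u file is printed as two hex digits. hex_bytes is a hypothetical helper; the exact layout of the .ann files (spacing, line breaks, comment syntax) is not specified here, so the sketch covers only the byte rendering.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: render bytes as space-separated uppercase hex
   pairs. Writes at most outsz bytes including the terminating NUL and
   returns how many input bytes were rendered. */
size_t hex_bytes(const unsigned char *buf, size_t len, char *out, size_t outsz)
{
    size_t i, pos = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        size_t need = i ? 4 : 3;   /* " XX" or "XX", plus the NUL */
        if (pos + need > outsz)
            break;
        pos += (size_t)snprintf(out + pos, outsz - pos,
                                i ? " %02X" : "%02X", (unsigned)buf[i]);
    }
    return i;
}
```

For the bytes D5 B0 0A (ARMENIAN SMALL LETTER HO followed by a line feed) it produces the string "D5 B0 0A".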

demo.u is a UTF-8 file containing text in a number of writing systems.
The Private Use Area characters are Hungarian Runes; you can see what
they look like by displaying the file in the Yudit Unicode editor
(http://www.yudit.org).
demo.jpg is a screenshot of Yudit displaying this file.

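Here is the unidesc report on this file. Each line gives the first and last
character offsets of a stretch of text and the Unicode range it belongs to: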

       0	       7	Armenian
       8	      14	Ethiopic
      15	      20	Unified Canadian Aboriginal Syllabics
      21	      25	Hangul Syllables
      26	      32	Hiragana
      33	      42	Devanagari
      43	      47	Cherokee
      48	      55	Tamil
      56	      65	Gurmukhi
      66	      71	Linear B Syllabary
      72	      87	Private Use Area
      88	      88	Greek/Coptic
      89	      89	Basic Latin
      90	      92	Latin Extended-A
      93	     109	CJK Unified Ideographs
     110	     116	Hebrew
     117	     126	Georgian
     127	     133	Malayalam
     134	     136	Cyrillic
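
Comparing this report with the uniname listing below shows how the stretches are formed: space, tab, and line feed never start a new stretch but are charged to the range already in effect (Armenian starts at character 0 although characters 0 and 1 are spaces). The sketch below reproduces that grouping under this reading; it is not unidesc's source, and struct stretch, describe, and the toy demo_lookup are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct stretch {
    size_t first, last;   /* character offsets, inclusive */
    const char *range;
};

/* Maps a code point to a range name; must never return NULL. */
typedef const char *(*range_lookup)(unsigned long cp);

/* Toy lookup covering three blocks; a real one would use the full table. */
const char *demo_lookup(unsigned long cp)
{
    if (cp <= 0x7F)                   return "Basic Latin";
    if (cp >= 0x530 && cp <= 0x58F)   return "Armenian";
    if (cp >= 0x1200 && cp <= 0x137F) return "Ethiopic";
    return "(unassigned)";
}

static int is_blank(unsigned long cp)
{
    return cp == 0x20 || cp == 0x09 || cp == 0x0A;
}

/* Groups cps[0..n-1] into stretches, storing at most maxout of them.
   Returns the total number of stretches found. */
size_t describe(const unsigned long *cps, size_t n, range_lookup lookup,
                struct stretch *out, size_t maxout)
{
    const char *cur = NULL;
    size_t start = 0, count = 0;

    for (size_t i = 0; i < n; i++) {
        if (is_blank(cps[i]))
            continue;                    /* blanks never change the range */
        const char *r = lookup(cps[i]);
        if (cur == NULL) {
            cur = r;                     /* leading blanks join the first stretch */
        } else if (strcmp(r, cur) != 0) {
            if (count < maxout)
                out[count] = (struct stretch){start, i - 1, cur};
            count++;
            cur = r;
            start = i;
        }
    }
    if (cur != NULL) {                   /* close the final stretch */
        if (count < maxout)
            out[count] = (struct stretch){start, n - 1, cur};
        count++;
    }
    return count;
}
```

Fed the first 15 characters of demo.u (two spaces, four Armenian letters, LF, tab, four Ethiopic syllables, LF, two tabs), this yields the stretches 0 to 7 Armenian and 8 to 14 Ethiopic, matching the first two lines of the report.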

Here is the uniname report on this file:

character  byte       UTF-32   encoded as     glyph   range                                   name
        0          0  000020   20                    Basic Latin                             SPACE
        1          1  000020   20                    Basic Latin                             SPACE
        2          2  000570   D5 B0          հ      Armenian                                ARMENIAN SMALL LETTER HO
        3          4  000561   D5 A1          ա      Armenian                                ARMENIAN SMALL LETTER AYB
        4          6  000582   D6 82          ւ      Armenian                                ARMENIAN SMALL LETTER YIWN
        5          8  000580   D6 80          ր      Armenian                                ARMENIAN SMALL LETTER REH
        6         10  00000A   0A                    Basic Latin                             LINE FEED (LF)
        7         11  000009   09                    Basic Latin                             CHARACTER TABULATION
        8         12  001275   E1 89 B5       ት      Ethiopic                                ETHIOPIC SYLLABLE TE
        9         15  00130D   E1 8C 8D       ግ      Ethiopic                                ETHIOPIC SYLLABLE GE
       10         18  00122D   E1 88 AD       ር      Ethiopic                                ETHIOPIC SYLLABLE RE
       11         21  00129B   E1 8A 9B       ኛ      Ethiopic                                ETHIOPIC SYLLABLE NYAA
       12         24  00000A   0A                    Basic Latin                             LINE FEED (LF)
       13         25  000009   09                    Basic Latin                             CHARACTER TABULATION
       14         26  000009   09                    Basic Latin                             CHARACTER TABULATION
       15         27  001455   E1 91 95       ᑕ      Unified Canadian Aboriginal Syllabics   CANADIAN SYLLABICS TA
       16         30  0015F8   E1 97 B8       ᗸ      Unified Canadian Aboriginal Syllabics   CANADIAN SYLLABICS CARRIER KHEE
       17         33  0014A1   E1 92 A1       ᒡ      Unified Canadian Aboriginal Syllabics   CANADIAN SYLLABICS C
       18         36  00000A   0A                    Basic Latin                             LINE FEED (LF)
       19         37  000020   20                    Basic Latin                             SPACE
       20         38  000020   20                    Basic Latin                             SPACE
       21         39  00D55C   ED 95 9C       한      Hangul Syllables                        No character name is available
       22         42  00AD6D   EA B5 AD       국      Hangul Syllables                        No character name is available
       23         45  00B9D0   EB A7 90       말      Hangul Syllables                        No character name is available
       24         48  00000A   0A                    Basic Latin                             LINE FEED (LF)
       25         49  000009   09                    Basic Latin                             CHARACTER TABULATION
       26         50  00306B   E3 81 AB       に      Hiragana                                HIRAGANA LETTER NI
       27         53  00307B   E3 81 BB       ほ      Hiragana                                HIRAGANA LETTER HO
       28         56  003093   E3 82 93       ん      Hiragana                                HIRAGANA LETTER N
       29         59  003054   E3 81 94       ご      Hiragana                                HIRAGANA LETTER GO
       30         62  00000A   0A                    Basic Latin                             LINE FEED (LF)
       31         63  000009   09                    Basic Latin                             CHARACTER TABULATION
       32         64  000009   09                    Basic Latin                             CHARACTER TABULATION
       33         65  000926   E0 A4 A6       द      Devanagari                              DEVANAGARI LETTER DA
       34         68  000947   E0 A5 87       े      Devanagari                              DEVANAGARI VOWEL SIGN E
       35         71  000935   E0 A4 B5       व      Devanagari                              DEVANAGARI LETTER VA
       36         74  000928   E0 A4 A8       न      Devanagari                              DEVANAGARI LETTER NA
       37         77  00093E   E0 A4 BE       ा      Devanagari                              DEVANAGARI VOWEL SIGN AA
       38         80  000917   E0 A4 97       ग      Devanagari                              DEVANAGARI LETTER GA
       39         83  000930   E0 A4 B0       र      Devanagari                              DEVANAGARI LETTER RA
       40         86  000940   E0 A5 80       ी      Devanagari                              DEVANAGARI VOWEL SIGN II
       41         89  00000A   0A                    Basic Latin                             LINE FEED (LF)
       42         90  000020   20                    Basic Latin                             SPACE
       43         91  0013E3   E1 8F A3       Ꮳ      Cherokee                                CHEROKEE LETTER TSA
       44         94  0013B3   E1 8E B3       Ꮃ      Cherokee                                CHEROKEE LETTER LA
       45         97  0013A9   E1 8E A9       Ꭹ      Cherokee                                CHEROKEE LETTER GI
       46        100  00000A   0A                    Basic Latin                             LINE FEED (LF)
       47        101  000009   09                    Basic Latin                             CHARACTER TABULATION
       48        102  000BA4   E0 AE A4       த      Tamil                                   TAMIL LETTER TA
       49        105  000BAE   E0 AE AE       ம      Tamil                                   TAMIL LETTER MA
       50        108  000BBF   E0 AE BF       ி      Tamil                                   TAMIL VOWEL SIGN I
       51        111  000BB2   E0 AE B2       ல      Tamil                                   TAMIL LETTER LA
       52        114  000BCD   E0 AF 8D       ்      Tamil                                   TAMIL SIGN VIRAMA
       53        117  00000A   0A                    Basic Latin                             LINE FEED (LF)
       54        118  000009   09                    Basic Latin                             CHARACTER TABULATION
       55        119  000009   09                    Basic Latin                             CHARACTER TABULATION
       56        120  000A17   E0 A8 97       ਗ      Gurmukhi                                GURMUKHI LETTER GA
       57        123  000A41   E0 A9 81       ੁ      Gurmukhi                                GURMUKHI VOWEL SIGN U
       58        126  000A30   E0 A8 B0       ਰ      Gurmukhi                                GURMUKHI LETTER RA
       59        129  000A4D   E0 A9 8D       ੍      Gurmukhi                                GURMUKHI SIGN VIRAMA
       60        132  000A2E   E0 A8 AE       ਮ      Gurmukhi                                GURMUKHI LETTER MA
       61        135  000A41   E0 A9 81       ੁ      Gurmukhi                                GURMUKHI VOWEL SIGN U
       62        138  000A16   E0 A8 96       ਖ      Gurmukhi                                GURMUKHI LETTER KHA
       63        141  000A3F   E0 A8 BF       ਿ      Gurmukhi                                GURMUKHI VOWEL SIGN I
       64        144  00000A   0A                    Basic Latin                             LINE FEED (LF)
       65        145  000020   20                    Basic Latin                             SPACE
       66        146  010024   F0 90 80 A4    𐀤      Linear B Syllabary                      LINEAR B SYLLABLE B078 QE
       67        150  010035   F0 90 80 B5    𐀵      Linear B Syllabary                      LINEAR B SYLLABLE B005 TO
       68        154  01002B   F0 90 80 AB    𐀫      Linear B Syllabary                      LINEAR B SYLLABLE B002 RO
       69        158  01003A   F0 90 80 BA    𐀺      Linear B Syllabary                      LINEAR B SYLLABLE B042 WO
       70        162  00000A   0A                    Basic Latin                             LINE FEED (LF)
       71        163  000009   09                    Basic Latin                             CHARACTER TABULATION
       72        164  00F8E4   EF A3 A4             Private Use Area                        No character name is available
       73        167  00F8D7   EF A3 97             Private Use Area                        No character name is available
       74        170  00F8DC   EF A3 9C             Private Use Area                        No character name is available
       75        173  00F8DD   EF A3 9D             Private Use Area                        No character name is available
       76        176  00F8DB   EF A3 9B             Private Use Area                        No character name is available
       77        179  00000A   0A                    Basic Latin                             LINE FEED (LF)
       78        180  000009   09                    Basic Latin                             CHARACTER TABULATION
       79        181  000009   09                    Basic Latin                             CHARACTER TABULATION
       80        182  00EE14   EE B8 94             Private Use Area                        No character name is available
       81        185  00EE00   EE B8 80             Private Use Area                        No character name is available
       82        188  00EE0B   EE B8 8B             Private Use Area                        No character name is available
       83        191  00EE00   EE B8 80             Private Use Area                        No character name is available
       84        194  00EE1C   EE B8 9C             Private Use Area                        No character name is available
       85        197  00000A   0A                    Basic Latin                             LINE FEED (LF)
       86        198  000020   20                    Basic Latin                             SPACE
       87        199  000020   20                    Basic Latin                             SPACE
       88        200  0003B2   CE B2          β      Greek/Coptic                            GREEK SMALL LETTER BETA
       89        202  000061   61             a      Basic Latin                             LATIN SMALL LETTER A
       90        203  00014B   C5 8B          ŋ      Latin Extended-A                        LATIN SMALL LETTER ENG
       91        205  00000A   0A                    Basic Latin                             LINE FEED (LF)
       92        206  000009   09                    Basic Latin                             CHARACTER TABULATION
       93        207  004E09   E4 B8 89       三      CJK Unified Ideographs                  No character name is available
       94        210  004E32   E4 B8 B2       串      CJK Unified Ideographs                  No character name is available
       95        213  000020   20                    Basic Latin                             SPACE
       96        214  000020   20                    Basic Latin                             SPACE
       97        215  000020   20                    Basic Latin                             SPACE
       98        216  000020   20                    Basic Latin                             SPACE
       99        217  000020   20                    Basic Latin                             SPACE
      100        218  000020   20                    Basic Latin                             SPACE
      101        219  000009   09                    Basic Latin                             CHARACTER TABULATION
      102        220  000009   09                    Basic Latin                             CHARACTER TABULATION
      103        221  000009   09                    Basic Latin                             CHARACTER TABULATION
      104        222  000009   09                    Basic Latin                             CHARACTER TABULATION
      105        223  000009   09                    Basic Latin                             CHARACTER TABULATION
      106        224  000009   09                    Basic Latin                             CHARACTER TABULATION
      107        225  000009   09                    Basic Latin                             CHARACTER TABULATION
      108        226  000009   09                    Basic Latin                             CHARACTER TABULATION
      109        227  000009   09                    Basic Latin                             CHARACTER TABULATION
      110        228  0005E9   D7 A9          ש      Hebrew                                  HEBREW LETTER SHIN
      111        230  0005DC   D7 9C          ל      Hebrew                                  HEBREW LETTER LAMED
      112        232  0005D5   D7 95          ו      Hebrew                                  HEBREW LETTER VAV
      113        234  0005DE   D7 9E          מ      Hebrew                                  HEBREW LETTER MEM
      114        236  00000A   0A                    Basic Latin                             LINE FEED (LF)
      115        237  000020   20                    Basic Latin                             SPACE
      116        238  000020   20                    Basic Latin                             SPACE
      117        239  0010D7   E1 83 97       თ      Georgian                                GEORGIAN LETTER TAN
      118        242  0010D1   E1 83 91       ბ      Georgian                                GEORGIAN LETTER BAN
      119        245  0010D8   E1 83 98       ი      Georgian                                GEORGIAN LETTER IN
      120        248  0010DA   E1 83 9A       ლ      Georgian                                GEORGIAN LETTER LAS
      121        251  0010D8   E1 83 98       ი      Georgian                                GEORGIAN LETTER IN
      122        254  0010E1   E1 83 A1       ს      Georgian                                GEORGIAN LETTER SAN
      123        257  0010D8   E1 83 98       ი      Georgian                                GEORGIAN LETTER IN
      124        260  000020   20                    Basic Latin                             SPACE
      125        261  00000A   0A                    Basic Latin                             LINE FEED (LF)
      126        262  000009   09                    Basic Latin                             CHARACTER TABULATION
      127        263  000D15   E0 B4 95       ക      Malayalam                               MALAYALAM LETTER KA
      128        266  000D46   E0 B5 86       െ      Malayalam                               MALAYALAM VOWEL SIGN E
      129        269  000D30   E0 B4 B0       ര      Malayalam                               MALAYALAM LETTER RA
      130        272  000D32   E0 B4 B2       ല      Malayalam                               MALAYALAM LETTER LA
      131        275  00000A   0A                    Basic Latin                             LINE FEED (LF)
      132        276  000009   09                    Basic Latin                             CHARACTER TABULATION
      133        277  000009   09                    Basic Latin                             CHARACTER TABULATION
      134        278  00043C   D0 BC          м      Cyrillic                                CYRILLIC SMALL LETTER EM
      135        280  000438   D0 B8          и      Cyrillic                                CYRILLIC SMALL LETTER I
      136        282  000440   D1 80          р      Cyrillic                                CYRILLIC SMALL LETTER ER
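
The first four columns of this report follow from decoding UTF-8 one sequence at a time: the character index goes up by one per sequence and the byte offset by the length of the sequence. The sketch below is a hypothetical illustration, not uniname's code; utf8_decode and utf8_walk are invented names, and rejecting overlong forms, surrogates, and values above U+10FFFF is a design choice of this sketch (uniname's own handling of bad input, exercised by the TestData files, may differ).

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at s (avail bytes available). Stores the code
   point in *cp and returns the sequence length (1-4), or 0 if the sequence
   is malformed, truncated, overlong, a surrogate, or above U+10FFFF. */
int utf8_decode(const unsigned char *s, size_t avail, unsigned long *cp)
{
    static const unsigned long min[5] = {0, 0, 0x80, 0x800, 0x10000};
    int len;
    unsigned long c;

    if (avail == 0)
        return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if ((size_t)len > avail)
        return 0;
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* continuation bytes are 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Record the byte offset and code point of each character, stopping at the
   first malformed sequence. Returns the number of characters decoded. */
size_t utf8_walk(const unsigned char *s, size_t n,
                 size_t *offset, unsigned long *cps, size_t max)
{
    size_t pos = 0, count = 0;

    while (pos < n && count < max) {
        int len = utf8_decode(s + pos, n - pos, &cps[count]);
        if (len == 0)
            break;
        offset[count++] = pos;
        pos += (size_t)len;
    }
    return count;
}
```

For the bytes 20 D5 B0 F0 90 80 A4, utf8_walk reports U+0020 at byte 0, U+0570 at byte 1, and U+10024 at byte 3, the same byte-offset and UTF-32 pattern as the SPACE, ARMENIAN SMALL LETTER HO, and LINEAR B SYLLABLE B078 QE rows above.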