.\" Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ]
string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line Unicode database query tool.
.SH OPTIONS
.TP
.B \-h
.B \-\-help
Show help and exit.
.TP
.B \-x
.B \-\-hexadecimal
Assume
.I string
to be a hexadecimal number
.TP
.B \-d
.B \-\-decimal
Assume
.I string
to be a decimal number
.TP
.B \-o
.B \-\-octal
Assume
.I string
to be an octal number
.TP
.B \-b
.B \-\-binary
Assume
.I string
to be a binary number
.TP
.B \-r
.B \-\-regexp
Assume
.I string
to be a Python regular expression
.TP
.B \-s
.B \-\-string
Assume
.I string
to be a sequence of characters
.TP
.B \-a
.B \-\-auto
Try to guess type of
.I string
from one of the above (default)
.TP
.BI \-f FILE
.BI \-\-input_file= FILE
Read characters from FILE and display information about each of them.
Use \- to read from standard input.
.TP
.BI \-m MAXCOUNT
.BI \-\-max= MAXCOUNT
Maximum number of codepoints to display (default: 20); use 0 for unlimited.
.TP
.BI \-i IOCHARSET
.BI \-\-io= IOCHARSET
I/O character set. For maximal pleasure, run \fBunicode\fP on a UTF-8
capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP
tries to guess this value from your locale, so with a properly set up
locale you should not need to specify it.
.TP
.BI \-\-fcp= CHARSET
.BI \-\-fromcp= CHARSET
Convert numerical arguments from this encoding, default: no conversion.
Multibyte encodings are supported. This is ignored for non-numerical
arguments.
.TP
.BI \-c ADDCHARSET
.BI \-\-charset\-add= ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
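.IP
For illustration only: assuming this lookup behaves like encoding the
character with Python's codecs (not confirmed by this page), the value
could be computed with a sketch like this one, where \fIcharset_hex\fP
is a hypothetical helper:
.nf
def charset_hex(ch, charset):
    # Hypothetical helper: the bytes of ch in an additional charset,
    # as hexadecimal, or None if that charset cannot encode ch
    try:
        return ch.encode(charset).hex()
    except UnicodeEncodeError:
        return None
.fi
.IP
For example, the sketch returns e1 for \('a in iso8859_2 and c3a1 in utf8.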
.TP
.BI \-C USE_COLOUR
.BI \-\-colour= USE_COLOUR
USE_COLOUR is one of
.BR on ,
.BR off ,
or
.BR auto .
.B \-\-colour=on
will use ANSI colour codes to colourise the output;
.B \-\-colour=off
won't use colours;
.B \-\-colour=auto
will test whether standard output is a tty and use colours only when it is.
.B \-\-color
is a synonym of
.BR \-\-colour .
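.IP
A minimal Python sketch of how the three settings could be resolved,
assuming the tty test is an \fIisatty()\fP check on standard output;
\fIuse_colour\fP is a hypothetical name, not part of \fBunicode\fP:
.nf
import sys
def use_colour(mode, stream=sys.stdout):
    # on and off are explicit; auto colours the output only on a tty
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        return stream.isatty()
    raise ValueError("USE_COLOUR must be on, off or auto")
.fi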
.TP
.B \-v
.B \-\-verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
.TP
.B \-w
.B \-\-wikipedia
Spawn a web browser pointing to the English Wikipedia entry about the character.
.TP
.B \-\-wt
.B \-\-wiktionary
Spawn a web browser pointing to the English Wiktionary entry about the character.
.TP
.B \-\-brief
Display character information in a brief format.
.TP
.BI \-\-format= fmt
Use your own format for character information display. See the README for details.
.TP
.B \-\-list
List (approximately) all known encodings.
.TP
.B \-\-download
Try to download UnicodeData.txt into ~/.unicode/
.TP
.B \-\-ascii
Display an ASCII table.
.TP
.B \-\-brexit\-ascii
.B \-\-brexit
Display an ASCII table (EU–UK Trade and Cooperation Agreement 2020 version).
.SH USAGE
\fBunicode\fP tries to guess the type of an argument. In particular,
if the argument looks like a valid hexadecimal representation of a
Unicode codepoint, it will be considered to be such. Running
\fBunicode\fP face
will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE
and will not search for 'face' in character descriptions. For the latter,
use:
.RS
\fBunicode\fP \-r face
.RE
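.PP
To illustrate the search side, here is a rough Python sketch (the
\-r option takes a Python regular expression). It assumes, without
confirmation from this page, that the expression is matched against
character names ignoring case; \fIsearch_names\fP is a hypothetical
name and its \fImax_count\fP parameter stands in for MAXCOUNT:
.RS
.nf
import re
import sys
import unicodedata
def search_names(pattern, max_count=20):
    # Assumption: match the regular expression against character
    # names, ignoring case; max_count mirrors MAXCOUNT (0 = unlimited)
    rx = re.compile(pattern, re.IGNORECASE)
    found = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            found.append(cp)
            if max_count and len(found) >= max_count:
                break
    return found
.fi
.RE
.PP
With this sketch, search_names("face", 0) also includes U+FACE, because
its name CJK COMPATIBILITY IDEOGRAPH-FACE contains FACE.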
.PP
For example, you can use any of the following to display information
about U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):
.RS
.nf
\fBunicode\fP 00E1
\fBunicode\fP U+00E1
\fBunicode\fP \('a
\fBunicode\fP 'latin small letter a with acute'
.fi
.RE
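.PP
The four forms above could be told apart by a simplified guess like the
following Python sketch. It is an assumption for illustration, not the
actual \fBunicode\fP heuristic (which also covers the other argument
types); \fIresolve\fP is a hypothetical name:
.RS
.nf
import string
import unicodedata
def resolve(arg):
    # 1. hexadecimal codepoint, optionally prefixed with U+
    digits = arg[2:] if arg.upper().startswith("U+") else arg
    if 1 <= len(digits) <= 6 and all(c in string.hexdigits for c in digits):
        cp = int(digits, 16)
        if cp <= 0x10FFFF:
            return cp
    # 2. a single literal character
    if len(arg) == 1:
        return ord(arg)
    # 3. a full character name (lookup raises KeyError if unknown)
    return ord(unicodedata.lookup(arg.upper()))
.fi
.RE
.PP
With this sketch each of the four example arguments yields 0xE1. Note that
in this simplified order a one-letter argument such as "a" is read as
hexadecimal (U+000A).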
.PP
You can specify a range of characters as an argument; \fBunicode\fP will
then show these characters in a tabular format, aligned to 256-codepoint
boundaries. Use two dots ".." to indicate the range. For example,
.RS
\fBunicode\fP 0450..0520
.RE
will display all characters from U+0400 to U+05FF (the Cyrillic, Cyrillic
Supplement, Armenian and Hebrew blocks), while
.RS
\fBunicode\fP 0400..
.RE
will display just the characters from U+0400 up to U+04FF.
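.PP
The alignment amounts to clearing the low 8 bits of the start and setting
them in the end. A minimal Python sketch, assuming hexadecimal endpoints
and treating an open range as a single 256-codepoint row, as in the second
example; \fIparse_range\fP is a hypothetical name:
.RS
.nf
def parse_range(arg):
    # Hypothetical parser for START..END or START.. (hexadecimal)
    lo, _, hi = arg.partition("..")
    start = int(lo, 16)
    end = int(hi, 16) if hi else start
    # widen both ends to whole rows of 256 codepoints
    return start & ~0xFF, end | 0xFF
.fi
.RE
.PP
Here parse_range("0450..0520") returns (0x400, 0x5FF), matching the first
example.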
.PP
Use \-\-fromcp to query codepoints from other encodings:
.RS
\fBunicode\fP \-\-fromcp cp1250 \-d 200
.RE
Multibyte encodings are supported:
.RS
\fBunicode\fP \-\-fromcp big5 \-x aff3
.RE
Multi-character strings are supported, too:
.RS
\fBunicode\fP \-\-fromcp utf\-8 \-x c599c3adc5a5
.RE
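.PP
A rough Python sketch of what \-\-fromcp is described to do with a
numerical argument: the number is turned into big-endian bytes that are
then decoded in the given charset. The helper \fIfrom_codepage\fP and
the byte-splitting details are assumptions, not taken from \fBunicode\fP:
.RS
.nf
def from_codepage(arg, charset, base):
    # Hypothetical helper: parse the number, split it into big endian
    # bytes and decode them (leading zero bytes are dropped here)
    number = int(arg, base)
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, "big").decode(charset)
.fi
.RE
.PP
With this sketch, from_codepage("200", "cp1250", 10) returns U+010C LATIN
CAPITAL LETTER C WITH CARON, and from_codepage("c599c3adc5a5", "utf8", 16)
returns the three characters U+0159, U+00ED and U+0165.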
.SH BUGS
Tabular format does not deal well with full-width, combining, control
and RTL characters.
.SH SEE ALSO
ascii(1)
.SH AUTHOR
Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>