.\"                                      Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line Unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ]
string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
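\fBunicode\fP is a command line tool for querying the Unicode character
database. It displays information about characters given by codepoint,
by the characters themselves, or by a search in character names.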

.SH OPTIONS
.TP
.B \-h
.B \-\-help

Show help and exit.

.TP
.B \-x
.B \-\-hexadecimal

Assume
.I string
to be a hexadecimal number

.TP
.B \-d
.B \-\-decimal

Assume
.I string
to be a decimal number

.TP
.B \-o
.B \-\-octal

Assume
.I string
to be an octal number

.TP
.B \-b
.B \-\-binary

Assume
.I string
to be a binary number

.TP
.B \-r
.B \-\-regexp

Assume
.I string
to be a Python regular expression

.TP
.B \-s
.B \-\-string

Assume
.I string
to be a sequence of characters

.TP
.B \-a
.B \-\-auto

Try to guess the type of
.I string
from the types above (default).

.TP
.BI \-f FILE
.BI \-\-input_file= FILE

Read characters from \fIFILE\fP and display information about each of them.
Use \- to read from standard input.

.TP
.BI \-m MAXCOUNT
.BI \-\-max= MAXCOUNT

Maximum number of codepoints to display (default: 20); use 0 for no limit.

.TP
.BI \-i IOCHARSET
.BI \-\-io= IOCHARSET

I/O character set. For best results, run \fBunicode\fP on a UTF-8
capable terminal and set
.I IOCHARSET
to UTF-8. \fBunicode\fP tries to guess this value from your locale, so
with a properly set up locale you should not need to specify it.

.TP
.BI \-\-fcp= CHARSET
.BI \-\-fromcp= CHARSET

Convert numerical arguments from this encoding (default: no conversion).
Multibyte encodings are supported. This option is ignored for
non-numerical arguments.

.TP
.BI \-c ADDCHARSET
.BI \-\-charset\-add= ADDCHARSET

Show the hexadecimal representation of displayed characters in this additional charset.

.TP
.BI \-C USE_COLOUR
.BI \-\-colour= USE_COLOUR

.I USE_COLOUR
is one of
.BR on ", " off ", or " auto .

.B \-\-colour=on
will use ANSI colour codes to colourise the output.

.B \-\-colour=off
will not use colours.

.B \-\-colour=auto
will test whether standard output is a tty and use colours only if it is.

.B \-\-color
is a synonym for
.BR \-\-colour .

.TP
.B \-v
.B \-\-verbose

Be more verbose about displayed characters, e.g. display Unihan information, if available.

.TP
.B \-w
.B \-\-wikipedia

Spawn a browser pointing to the English Wikipedia entry about the character.

.TP
.B \-\-wt
.B \-\-wiktionary

Spawn a browser pointing to the English Wiktionary entry about the character.

.TP
.B \-\-brief

Display character information in a brief format.

.TP
.BI \-\-format= fmt

Use a custom format \fIfmt\fP for displaying character information. See the README for details.

.TP
.B \-\-list

List (approximately) all known encodings.

.TP
.B \-\-download

Try to download UnicodeData.txt into ~/.unicode/

.TP
.B \-\-ascii

Display an ASCII table.

.TP
.B \-\-brexit\-ascii
.B \-\-brexit

Display ASCII table (EU–UK Trade and Cooperation Agreement 2020 version)
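.PP
Without \fB\-\-fromcp\fP, the numeric options \fB\-x\fP, \fB\-d\fP,
\fB\-o\fP and \fB\-b\fP read
.I string
as an integer in base 16, 10, 8 or 2 and take that integer as a codepoint.
The Python sketch below illustrates the idea. It is not taken from
\fBunicode\fP itself; the helper parse_codepoint and its acceptance of a
U+ prefix are assumptions made for illustration.
.nf
```python
# A sketch, not unicode's own code: parse_codepoint is a hypothetical helper
# showing how -x, -d, -o and -b could map a string to a codepoint.
BASES = {"hexadecimal": 16, "decimal": 10, "octal": 8, "binary": 2}


def parse_codepoint(string: str, kind: str) -> int:
    """Interpret string as a number in the given base and return it as a codepoint."""
    base = BASES[kind]
    # Assumption: accept an optional "U+" prefix for hexadecimal input.
    if base == 16 and string.upper().startswith("U+"):
        string = string[2:]
    value = int(string, base)
    # Reject values outside the Unicode codespace (U+0000..U+10FFFF).
    if not 0 <= value <= 0x10FFFF:
        raise ValueError(f"{string!r} is outside the Unicode codespace")
    return value


if __name__ == "__main__":
    for kind, arg in [("hexadecimal", "E1"), ("decimal", "225"),
                      ("octal", "341"), ("binary", "11100001")]:
        cp = parse_codepoint(arg, kind)
        print(f"{kind:12} {arg:>9} -> U+{cp:04X}")
```
.fi
All four example arguments denote the same codepoint, U+00E1 (225 decimal).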


.SH USAGE

\fBunicode\fP tries to guess the type of an argument. In particular,
if the argument looks like a valid hexadecimal representation of a
Unicode codepoint, it is treated as one. Using

\fBunicode\fP face

will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE
rather than search for 'face' in character descriptions; for the latter,
use:

\fBunicode\fP \-r face
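
With \fB\-r\fP, the argument is matched as a regular expression against
character names instead of being read as a codepoint. A rough Python
equivalent using the standard re and unicodedata modules is sketched
below. It is an illustration under stated assumptions, not the tool's
code: search_names is a hypothetical helper, the match is assumed to be
case-insensitive (names are upper case, yet face is expected to match),
the limit mirrors the default of \fB\-m\fP, and Python's bundled Unicode
database stands in for whatever data files \fBunicode\fP uses.
.nf
```python
import re
import sys
import unicodedata


def search_names(pattern: str, limit: int = 20) -> list[tuple[int, str]]:
    """Return up to limit (codepoint, name) pairs whose name matches pattern.

    Hypothetical helper. Assumptions: the match is case-insensitive, and
    limit mirrors the default of -m (0 means no limit).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed codepoints (e.g. unassigned ones) get "" and are skipped.
        name = unicodedata.name(chr(cp), "")
        if name and regex.search(name):
            found.append((cp, name))
            if limit and len(found) >= limit:
                break
    return found


if __name__ == "__main__":
    for cp, name in search_names("face", limit=5):
        print(f"U+{cp:04X} {name}")
```
.fi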


For example, you can use any of the following to display information
about U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):

\fBunicode\fP 00E1

\fBunicode\fP U+00E1

\fBunicode\fP \('a

\fBunicode\fP 'latin small letter a with acute'
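
The automatic guess can be pictured as a chain of checks: an optional
U+ prefix followed by hexadecimal digits is read as a codepoint, a single
character stands for itself, and anything else is looked up as a
character name. The Python sketch below resolves the four examples above
in that way. The order of the checks and the helper guess_codepoint are
assumptions; the real tool may apply different or additional rules.
.nf
```python
import re
import unicodedata

# Optional "U+" prefix followed by ASCII hexadecimal digits only.
HEX_CODEPOINT = re.compile("(?:U[+])?([0-9A-F]+)", re.IGNORECASE | re.ASCII)


def guess_codepoint(arg: str) -> int:
    """Guess what arg denotes and return its codepoint (hypothetical helper)."""
    match = HEX_CODEPOINT.fullmatch(arg)
    if match:
        # Looks like a hexadecimal codepoint, so "face" means U+FACE.
        return int(match.group(1), 16)
    if len(arg) == 1:
        # A single character denotes itself.
        return ord(arg)
    # Otherwise treat arg as a character name; raises KeyError if unknown.
    return ord(unicodedata.lookup(arg.upper()))


if __name__ == "__main__":
    for arg in ["00E1", "U+00E1", chr(0xE1), "latin small letter a with acute"]:
        print(f"{arg!r:35} -> U+{guess_codepoint(arg):04X}")
```
.fi
As in the face example, an argument made only of hexadecimal digits is
always taken as a codepoint in this sketch.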


You can specify a range of characters as an argument; \fBunicode\fP will
show these characters in a tabular format, aligned to 256-codepoint boundaries.
Use two dots ".." to indicate the range, e.g.

\fBunicode\fP 0450..0520

will display the whole Cyrillic, Cyrillic Supplement, Armenian and Hebrew blocks (characters from U+0400 to U+05FF).

\fBunicode\fP 0400..

will display just characters from U+0400 up to U+04FF
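
The alignment amounts to rounding the start of the range down to a
multiple of 0x100 and the end up to the last codepoint of its
256-codepoint row. A Python sketch of that arithmetic follows;
parse_range and aligned_range are hypothetical helpers, and, to match
the 0400.. example, an open range is assumed to cover only the row
containing its start.
.nf
```python
from typing import Optional

ROW = 0x100  # rows of 256 codepoints


def aligned_range(start: int, end: Optional[int] = None) -> range:
    """Return the codepoints shown for start..end, widened to whole rows.

    Assumption: an open range (end is None) covers only the row containing start.
    """
    if end is None:
        end = start
    first = start & ~(ROW - 1)  # round down, e.g. 0x0450 -> 0x0400
    last = end | (ROW - 1)      # round up, e.g. 0x0520 -> 0x05FF
    return range(first, last + 1)


def parse_range(arg: str) -> range:
    """Parse "0450..0520" or "0400.." (hexadecimal bounds) into an aligned range."""
    lo, _, hi = arg.partition("..")
    return aligned_range(int(lo, 16), int(hi, 16) if hi else None)


if __name__ == "__main__":
    for arg in ["0450..0520", "0400.."]:
        r = parse_range(arg)
        print(f"{arg:12} -> U+{r[0]:04X}..U+{r[-1]:04X} ({len(r)} codepoints)")
```
.fi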

Use \-\-fromcp to query codepoints from other encodings:

\fBunicode\fP \-\-fromcp cp1250 \-d 200

Multibyte encodings are supported:

\fBunicode\fP \-\-fromcp big5 \-x aff3

and so are multi-character strings:

\fBunicode\fP \-\-fromcp utf-8 \-x c599c3adc5a5
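
Conceptually, \fB\-\-fromcp\fP turns the number into bytes and decodes
them in the given encoding. The Python sketch below does this with the
standard codecs, assuming the bytes are taken most significant first
(which is consistent with the examples above); from_codepage is a
hypothetical helper, not part of \fBunicode\fP. Leading zero bytes of
the number are lost in this approach.
.nf
```python
def from_codepage(value: int, encoding: str) -> str:
    """Decode the big-endian bytes of value in encoding (hypothetical helper)."""
    # Smallest number of bytes that holds value (at least one, for 0).
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode(encoding)


if __name__ == "__main__":
    examples = [(200, "cp1250"), (0xAFF3, "big5"), (0xC599C3ADC5A5, "utf-8")]
    for value, encoding in examples:
        text = from_codepage(value, encoding)
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"{value:#x} in {encoding}: {codepoints}")
```
.fi
With these inputs, 200 in cp1250 decodes to U+010C LATIN CAPITAL LETTER C
WITH CARON, and c599c3adc5a5 in UTF-8 to U+0159 U+00ED U+0165.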

.SH BUGS
The tabular format does not deal well with full-width, combining, control
and RTL characters.

.SH SEE ALSO
.BR ascii (1)


.SH AUTHOR
Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>