.TH ascii2uni 1  "August, 2011"
.SH NAME
ascii2uni \- convert 7-bit ASCII representations to UTF-8 Unicode
.SH SYNOPSIS
.B ascii2uni [options] (<input file name>)
.SH DESCRIPTION
.I ascii2uni
converts various 7-bit ASCII representations to UTF-8.  It reads from the standard
input and writes to the standard output. The representations understood
are listed below under the command line options. If no format is specified, standard
hexadecimal format (e.g. 0x00e9) is assumed.
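.PP
As an illustration of the default conversion, here is a minimal Python sketch.
It is not the program's own code: the names HEX_ESCAPE and convert_standard_hex
are hypothetical, and the rule that an escape is 0x followed by one or more
hexadecimal digits is an assumption of this sketch.
.PP
.nf
```python
import re

# Assumption of this sketch: an escape is "0x" followed by one or more
# hexadecimal digits. The real program may delimit escapes differently.
HEX_ESCAPE = re.compile("0x([0-9A-Fa-f]+)")


def convert_standard_hex(text):
    """Replace each 0x escape in text with the character it names."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


converted = convert_standard_hex("0x00e9t0x00e9")
# The bytes a UTF-8 writer would emit for the converted text.
utf8_bytes = converted.encode("utf-8")
```
.fi
.PP
Text that is not part of an escape (the letter t here) passes through unchanged.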
.PP

.SH "COMMAND LINE OPTIONS"
.TP
.B \-a <format>
Convert from the specified format. Formats may be specified by means of the following
arbitrary single character codes, by means of names such as "SGML_decimal", and by
examples of the desired format.
.IP
.B A
Convert hexadecimal numbers with prefix U in angle-brackets (<U00E9>).
.IP
.B B
Convert \\x-escaped hexadecimal numbers (e.g. \\x00E9).
.IP
.B C
Convert \\x escaped hexadecimal numbers in braces (e.g. \\x{00E9}).
.IP
.B D
Convert decimal HTML numeric character references (e.g. &#0233;).
.IP
.B E
Convert hexadecimal with prefix U (U00E9).
.IP
.B F
Convert hexadecimal with prefix u (u00E9).
.IP
.B G
Convert hexadecimal in single quotes with prefix X (e.g. X'00E9').
.IP
.B H
Convert hexadecimal HTML numeric character references (e.g. &#x00E9;).
.IP
.B I
Convert hexadecimal UTF-8 with each byte's hex preceded by an =-sign (e.g. =C3=A9). This is the
Quoted-Printable format defined by RFC 2045.
.IP
.B J
Convert hexadecimal UTF-8 with each byte's hex preceded by a %-sign (e.g. %C3%A9). This is the URI escape format defined by RFC 2396.
.IP
.B K
Convert octal UTF-8 with each byte escaped by a backslash (e.g. \\303\\251).
.IP
.B L
Convert \\U followed by eight hex digits or \\u followed by four hex digits. \\UXXXXXXXX encoding a character within the BMP (U+0000-U+FFFF) is converted but a warning is issued since this violates the WWW specification.
.IP
.B M
Convert hexadecimal SGML numeric character references (e.g. \\#xE9;).
.IP
.B N
Convert decimal SGML numeric character references (e.g. \\#233;).
.IP
.B O
Convert octal escapes for the three low bytes in big-endian order (e.g. \\000\\000\\351).
.IP
.B P
Convert hexadecimal numbers with prefix U+ (e.g. U+00E9).
.IP
.B Q
Convert HTML character entities (e.g. &eacute;).
.IP
.B R
Convert raw hexadecimal numbers (e.g. 00E9). Requires the \-p flag.
.IP
.B S
Convert hexadecimal escapes for the three low bytes in big-endian order (e.g. \\x00\\x00\\xE9).
.IP
.B T
Convert decimal escapes for the three low bytes in big-endian order (e.g. \\d000\\d000\\d233).
.IP
.B U
Convert \\u-escaped hexadecimal numbers (e.g. \\u00E9).
.IP
.B V
Convert \\u-escaped decimal numbers (e.g. \\u00233).
.IP
.B X
Convert standard hexadecimal numbers (e.g. 0x00E9).
.IP
.B Y
Convert all three types of HTML escape: hexadecimal and decimal character references and character entities.
.IP
.B 0
Convert hexadecimal UTF-8 with each byte's hex enclosed within angle brackets (e.g.  <C3><A9>).
.IP
.B 1
Convert Common Lisp format hexadecimal numbers (e.g. #x00E9).
.IP
.B 2
Convert Perl format decimal numbers with prefix v (e.g. v233).
.IP
.B 3
Convert hexadecimal numbers with prefix $ (e.g. $00E9).
.IP
.B 4
Convert PostScript format hexadecimal numbers with prefix 16# (e.g. 16#00E9).
.IP
.B 5
Convert Common Lisp format hexadecimal numbers with prefix #16r (e.g. #16r00E9).
.IP
.B 6
Convert Ada format hexadecimal numbers with prefix 16# and suffix # (e.g. 16#00E9#).
.IP
.B 7
Convert Apache log format hexadecimal UTF-8 with each byte's hex preceded by a backslash-x (e.g.  \\xC3\\xA9). 
.IP
.B 8
Convert Microsoft OOXML format hexadecimal numbers with prefix _x and suffix _ (e.g. _x00E9_).
.IP
.B 9
Convert %\\u-escaped hexadecimal numbers (e.g. %\\u00E9).
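.PP
The formats above are different spellings of the same code point. The Python
sketch below shows the idea for a handful of them by pairing each format code
with a pattern and a numeric base. The names FORMATS, HEX, DEC and convert are
hypothetical, and the patterns are guesses made for illustration, not the
program's actual parsing rules.
.PP
.nf
```python
import re

HEX = "([0-9A-Fa-f]+)"
DEC = "([0-9]+)"

# Format code -> (regular expression, numeric base).
# Only a few of the documented formats are covered, and the patterns are
# assumptions of this sketch rather than the program's own rules.
FORMATS = {
    "D": ("&#" + DEC + ";", 10),    # &#0233;
    "H": ("&#x" + HEX + ";", 16),   # &#x00E9;
    "P": ("U[+]" + HEX, 16),        # U+00E9
    "X": ("0x" + HEX, 16),          # 0x00E9
    "3": ("[$]" + HEX, 16),         # $00E9
    "6": ("16#" + HEX + "#", 16),   # 16#00E9#
    "8": ("_x" + HEX + "_", 16),    # _x00E9_
}


def convert(text, fmt):
    """Replace every escape of the given format with its character."""
    pattern, base = FORMATS[fmt]
    return re.sub(pattern, lambda m: chr(int(m.group(1), base)), text)


# Each of these inputs names the code point U+00E9.
samples = [("&#0233;", "D"), ("&#x00E9;", "H"), ("U+00E9", "P"),
           ("0x00E9", "X"), ("$00E9", "3"), ("16#00E9#", "6"),
           ("_x00E9_", "8")]
results = [convert(text, fmt) for text, fmt in samples]
```
.fi
.PP
Every entry of results is the same single character, which is the point of the
list: only the surface syntax differs between formats.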
.TP
.B \-h 
Help. Print the usage message and exit.
.TP
.B \-v 
Print program version information and exit.
.TP
.B \-m
Accept deprecated HTML entities lacking final semicolon, e.g. 
"&#x00E9" in place of "&#x00E9;".
.TP
.B \-p 
Pure. Assume that the input consists entirely of escapes except for arbitrary
(but non-null) amounts of separating whitespace.
.TP
.B \-q
Be quiet. Do not chat unnecessarily.
.sp 1
.TP
.B \-Z <format>
Convert input using the supplied format. The format
specified will be used as the format string in a call
to sscanf(3) with a single argument consisting of a pointer
to an unsigned long integer. For example, to obtain the same results
as with the U format (\-a U), the format would be: \\u%04X.
.PP
If the format is Quoted-Printable, an equal-sign at the end of an input line
is treated as a soft line break in accordance with RFC 2045: both the equal-sign
and the immediately following newline are skipped. Strictly speaking, this is
not conversion of an ASCII escape to Unicode.
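.PP
The end-of-line rule for Quoted-Printable can be seen with Python's standard
quopri module, which stands in for the program here. This is a sketch of the
RFC 2045 behaviour, not a claim about how ascii2uni implements it.
.PP
.nf
```python
import quopri

# The first line ends with an equal-sign, so that sign and the newline
# after it form a soft line break and contribute nothing to the output.
encoded = b"""caf=
=C3=A9"""

# Decode the Quoted-Printable bytes, then interpret the result as UTF-8.
decoded = quopri.decodestring(encoded).decode("utf-8")
```
.fi
.PP
The two input lines decode to a single four-character word with no newline in it.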
.PP
All options that accept hexadecimal input recognize both upper- and lower-case
hexadecimal digits.
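.PP
A minimal Python sketch of pure input in the raw hexadecimal format (R), where
the input holds nothing but escapes separated by whitespace. The function name
convert_pure_raw_hex is hypothetical, and the sketch makes no claim about how
the program itself tokenizes its input.
.PP
.nf
```python
def convert_pure_raw_hex(text):
    """Decode whitespace-separated raw hexadecimal code points."""
    # str.split() with no argument splits on any run of whitespace, and
    # int(token, 16) accepts upper- and lower-case digits alike.
    return "".join(chr(int(token, 16)) for token in text.split())


upper = convert_pure_raw_hex("00E9 0074 00E9")
lower = convert_pure_raw_hex("00e9  0074 00e9")
```
.fi
.PP
Both calls give the same three-character string, whatever the case of the
digits or the amount of separating whitespace.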

.SH "EXIT STATUS"
.PP
The following values are returned on exit:

.IP "0 SUCCESS"
The input was successfully converted.

.IP "3 INFO"
The user requested information such as the version number or usage synopsis
and this has been provided.

.IP "5 BAD OPTION"
An incorrect option flag was given on the command line.

.IP "7 OUT OF MEMORY"
A request for additional memory failed.

.IP "8 BAD RECORD"
An ill-formed record was detected in the input.
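.PP
A calling script can branch on these values. The Python sketch below wraps the
program with the standard subprocess module. The names EXIT_STATUS,
describe_exit_status and run_ascii2uni are hypothetical, and the sketch assumes
that ascii2uni is installed and found on the search path.
.PP
.nf
```python
import subprocess

# Exit values as documented in this section.
EXIT_STATUS = {
    0: "SUCCESS",
    3: "INFO",
    5: "BAD OPTION",
    7: "OUT OF MEMORY",
    8: "BAD RECORD",
}


def describe_exit_status(code):
    """Return the documented name for an exit value, or UNKNOWN."""
    return EXIT_STATUS.get(code, "UNKNOWN")


def run_ascii2uni(text, fmt="X"):
    """Pipe ASCII text through ascii2uni; return (output, status name).

    Assumes the ascii2uni executable is available on the search path.
    """
    proc = subprocess.run(
        ["ascii2uni", "-a", fmt, "-q"],
        input=text.encode("ascii"),
        stdout=subprocess.PIPE,
    )
    return proc.stdout.decode("utf-8"), describe_exit_status(proc.returncode)
```
.fi
.PP
For example, a caller might treat BAD RECORD as a data problem to report and
BAD OPTION as a bug in the calling script.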

.sp 1
.SH "SEE ALSO"
uni2ascii(1)
.sp 1
.SH AUTHOR
Bill Poser <billposer@alum.mit.edu>
.SH LICENSE
GNU General Public License