.TH GOLF 2gg $VERSION $DATE Development Tools
.SH NAME
utf-text \- (UTF)
.SH PURPOSE
Convert UTF (UTF8 or UTF16) string to text.
.SH SYNTAX
.RS 4
.EX
utf-text <utf> \\
[ to <text> ] \\
[ length <length> ] \\
[ status <status> ] \\
[ error-text <error text> ]
.EE
.RE
.SH DESCRIPTION
utf-text will convert <utf> text to <text> (specified with "to" clause). If <text> is omitted, then the result of conversion is output.
<utf> is a string that may contain UTF characters (2, 3 or 4 bytes representing a single Unicode character). The conversion creates a string that can be used as a value wherever a text representation of UTF is required. utf-text is performed according to \fBRFC7159\fP (JSON) and \fBRFC3629\fP (UTF-8).
Note that hexadecimal digits used in Unicode escapes (such as \eu21d7) are always lowercase. The solidus character ("/") is not escaped, although \fBtext-utf\fP will correctly process it if the input has it escaped.
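The escaping rules above can be sketched in Python (not Golf code) to show the shape of the output. This is only an illustration under assumptions: the function name utf_to_text is hypothetical, it is not how Golf implements utf-text, and passing all other ASCII characters through unchanged is an assumption. Characters above U+FFFF become a UTF16 surrogate pair, which is why "𝄞" appears as \eud834\eudd1e in the example under EXAMPLES.
.RS 4
.EX
```python
BS = chr(92)  # the backslash character, built from its code point

# Short escapes defined by JSON (RFC 7159), keyed by code point
SHORT = {34: BS + '"', 92: BS + BS, 8: BS + "b", 9: BS + "t",
         10: BS + "n", 12: BS + "f", 13: BS + "r"}


def utf_to_text(s: str) -> str:
    """Hypothetical helper: escape a Unicode string in the JSON style."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in SHORT:
            out.append(SHORT[cp])
        elif 0x20 <= cp < 0x80:
            # ASCII, including the solidus "/", is copied unchanged (assumption)
            out.append(ch)
        elif cp <= 0xFFFF:
            # One escape with four lowercase hex digits
            # (also used here for control characters without a short escape)
            out.append(f"{BS}u{cp:04x}")
        else:
            # Above U+FFFF: split into a UTF16 surrogate pair
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            out.append(f"{BS}u{high:04x}{BS}u{low:04x}")
    return "".join(out)
```
.EE
.RE
For instance, utf_to_text("⇗/ш") would return the text \eu21d7/\eu0448, with lowercase hex digits and the solidus left as is.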
The number of bytes in <utf> to be converted can be specified with <length> in "length" clause. If <length> is not specified, it defaults to the length of string <utf>. Note that a single UTF character can be anywhere from 1 to 4 bytes long. For example "љ" is 2 bytes in length.
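Because <length> counts bytes rather than characters, it can help to check how many bytes a given character occupies. The following Python sketch (not Golf code; the helper name utf8_len is hypothetical) prints the UTF8 byte length of a few sample characters:
.RS 4
.EX
```python
def utf8_len(s: str) -> int:
    """Hypothetical helper: number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


samples = ["a", "љ", "⇗", "𝄞"]
lengths = [utf8_len(ch) for ch in samples]
print(lengths)  # [1, 2, 3, 4]
```
.EE
.RE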
The status of the conversion can be obtained in number <status>. On success, <status> is the string length of the result in <text> (or the number of bytes output if <text> is omitted). If an error occurred (meaning <utf> is an invalid UTF), <status> is -1, <text> (if specified) is an empty string, and the error text can be obtained in <error text> in "error-text" clause.
.SH EXAMPLES
Convert UTF string to text and verify the expected result:
.RS 4
.EX
// UTF string
set-string utf_str = "\[char92]"Doc\[char92]"\en\et\eb\ef\er\et⇗⇘\et▷◮𝄞ᏫⲠш\en/\[char92]"()\et"
// Convert UTF string to text
utf-text utf_str status encstatus to text_text
// This is the text expected
(( expected_result
@\[char92]"Doc\[char92]"\en\et\eb\ef\er\et\eu21d7\eu21d8\et\eu25b7\eu25ee\eud834\eudd1e\eu13eb\eu2ca0\eu0448\en/\[char92]"()\et
))
// Make sure conversion was okay; encstatus is the length of the result (text_text string)
if-true text_text equal expected_result and encstatus not-equal -1
@utf-text worked okay
end-if
.EE
.RE
.SH SEE ALSO
UTF
\fBtext-utf\fP
\fButf-text\fP
See all
\fBdocumentation\fP