File: unicode.yo

package info (click to toggle)

c%2B%2B-annotations 13.02.02-1

links: PTS, VCS
area: main
in suites: forky, sid
size: 13,576 kB
sloc: cpp: 25,297; makefile: 1,523; ansic: 165; sh: 126; perl: 90; fortran: 27

file content (28 lines) | stat: -rw-r--r-- 1,248 bytes

parent folder | download | duplicates (4)

In bf(C++) string literals can be defined as NTBSs.  Prepending an NTBS
by tt(L) (e.g., tt(L"hello")) defines a tt(wchar_t) string literal.

bf(C++) also supports 8, 16 and 32 bit i(Unicode) encoded
strings. Furthermore, two new data types are introduced: tt(char16_t) and
tt(char32_t) storing, respectively, a ti(UTF-16) and a ti(UTF-32) unicode
value.

A tt(char) type value fits in a tt(utf_8) unicode value. For character sets
exceeding 256 different values wider types (like tt(char16_t) or tt(char32_t))
should be used.

String literals for the various types of unicode encodings (and associated
variables) can be defined as follows:
        verb(    char     utf_8[] = u8"This is UTF-8 encoded.";
    char16_t utf16[] = u"This is UTF-16 encoded.";
    char32_t utf32[] = U"This is UTF-32 encoded.";)

Alternatively, unicode constants may be defined using the ti(\u) escape
sequence, followed by a hexadecimal value. Depending on the type of the
unicode variable (or constant) a tt(UTF-8, UTF-16) or tt(UTF-32) value is
used. E.g.,
        verb(    char     utf_8[] = u8"\u2018";
    char16_t utf16[] = u"\u2018";
    char32_t utf32[] = U"\u2018";)

Unicode strings can be delimited by double quotes but raw string literals
can also be used.