File: unicode.yo

In bf(C++) string literals can be defined as NTBSs.  Prefixing an NTBS
with tt(L) (e.g., tt(L"hello")) defines a tt(wchar_t) string literal.

bf(C++) also supports 8, 16 and 32 bit i(Unicode) encoded
strings. For the 16 and 32 bit encodings two data types are available:
tt(char16_t) and tt(char32_t), storing, respectively, a ti(UTF-16) and a
ti(UTF-32) code unit.

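A tt(char) type value can hold a single UTF-8 code unit (8 bits). Characters
outside of the ASCII range are encoded in UTF-8 as sequences of two to four
such code units. If each array element must be able to distinguish more than
256 different values, wider types (like tt(char16_t) or tt(char32_t)) should
be used.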

String literals for the various Unicode encodings (and associated
variables) can be defined as follows:
        verb(    char     utf_8[] = u8"This is UTF-8 encoded.";
    char16_t utf16[] = u"This is UTF-16 encoded.";
    char32_t utf32[] = U"This is UTF-32 encoded.";)
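
The effect of the encoding prefix on the size of such arrays can be
illustrated by a minimal sketch. The function template tt(units) and the
function tt(checkSizes) are hypothetical helpers, introduced only for this
illustration. As the text is plain ASCII, each character occupies one code
unit in every encoding; only the size of the code units differs:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the number of code units in a literal or array,
// including the terminating zero.
template <typename Char, std::size_t N>
constexpr std::size_t units(Char const (&)[N])
{
    return N;
}

char16_t const utf16[] = u"This is UTF-16 encoded.";
char32_t const utf32[] = U"This is UTF-32 encoded.";

void checkSizes()
{
    // 22 ASCII characters plus the terminating zero
    assert(units(u8"This is UTF-8 encoded.") == 23);

    // 23 ASCII characters plus the terminating zero
    assert(units(utf16) == 24);
    assert(units(utf32) == 24);

    // the size in bytes follows from the size of a single code unit
    assert(sizeof(utf16) == units(utf16) * sizeof(char16_t));
    assert(sizeof(utf32) == units(utf32) * sizeof(char32_t));
}
```

The tt(u8) literal is passed directly to tt(units) so that the sketch does not
depend on the element type of UTF-8 literals, which may differ between
language standards.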

Alternatively, Unicode characters may be specified using the ti(\u) escape
sequence, followed by exactly four hexadecimal digits. Depending on the type
of the string literal the character is stored using its tt(UTF-8), tt(UTF-16)
or tt(UTF-32) encoding. E.g.,
        verb(    char     utf_8[] = u8"\u2018";
    char16_t utf16[] = u"\u2018";
    char32_t utf32[] = U"\u2018";)
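
How the same escape sequence ends up as different code units can be checked
with a small sketch. The function tt(checkCodeUnits) and the arrays
tt(wide16) and tt(wide32) are names chosen for this illustration only. The
sketch also uses the tt(\U) escape sequence (followed by eight hexadecimal
digits), which is needed for code points beyond tt(FFFF):

```cpp
#include <cassert>
#include <cstddef>

char16_t const utf16[] = u"\u2018";
char32_t const utf32[] = U"\u2018";

// U+1F600 lies beyond FFFF, so \U and eight digits are required
char16_t const wide16[] = u"\U0001F600";
char32_t const wide32[] = U"\U0001F600";

void checkCodeUnits()
{
    // U+2018: three UTF-8 code units (0xE2 0x80 0x98) plus the final zero
    assert(sizeof(u8"\u2018") == 4);

    // U+2018: one UTF-16 and one UTF-32 code unit, plus the final zero
    assert(sizeof(utf16) / sizeof(utf16[0]) == 2 && utf16[0] == 0x2018);
    assert(sizeof(utf32) / sizeof(utf32[0]) == 2 && utf32[0] == 0x2018);

    // U+1F600: a surrogate pair in UTF-16, a single code unit in UTF-32
    assert(sizeof(wide16) / sizeof(wide16[0]) == 3);
    assert(wide16[0] == 0xD83D && wide16[1] == 0xDE00);
    assert(sizeof(wide32) / sizeof(wide32[0]) == 2 && wide32[0] == 0x1F600);
}
```

So even tt(char16_t) arrays may need more than one element per character,
whereas a tt(char32_t) element can always hold a complete code point.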

Unicode string literals can be delimited by double quotes, but raw string
literals can also be used, in which case tt(R) directly follows the encoding
prefix.
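
A minimal sketch of raw Unicode string literals follows. The array names and
the function tt(checkRawLiterals) are hypothetical, chosen for this
illustration only. Note that inside a raw string literal escape sequences,
including tt(\u), are not interpreted:

```cpp
#include <cassert>
#include <string>

// Encoding prefix followed by R: the text between "( and )" is used as is,
// so backslashes and double quotes need no escaping.
char16_t const raw16[]   = uR"(C:\temp "quoted")";
char16_t const plain16[] = u"C:\\temp \"quoted\"";

// Inside a raw literal \u2018 is not an escape sequence: it remains the six
// characters \, u, 2, 0, 1 and 8.
char32_t const raw32[] = UR"(\u2018)";

void checkRawLiterals()
{
    // the raw and the escaped literal define the same text
    assert(std::u16string(raw16) == std::u16string(plain16));

    // six separate characters, starting with a backslash
    assert(std::u32string(raw32).size() == 6);
    assert(raw32[0] == U'\\');
}
```

A raw literal is therefore convenient for text containing many backslashes or
quotes, but characters that must be specified by their code points still
require an ordinary (non-raw) literal.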