File: rawstring.yo

Standard series of ASCII characters (a.k.a. emi(C strings)) are delimited by
double quotes, support
 hi(escape sequence) escape sequences like tt(\n, \\) and tt(\"), and end
in a 0-byte. Such series of ASCII-characters are commonly known as
em(null-terminated byte strings) (singular: emi(NTBS), plural: em(NTBSs)).
bf(C)'s NTBS is the foundation upon which an enormous amount of code has
been built.

In some cases it is attractive to avoid escape sequences altogether (e.g., in
the context of XML). bf(C++) allows this using 
    hi(raw string literal) em(raw string literals).
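
As a minimal sketch of the XML case (the element and attribute names are made up for illustration), the following fragment defines the same text twice: once as a traditional NTBS using escape sequences, and once as a raw string literal. The function tt(same) is a hypothetical helper, only used to let the compiler verify that both literals define identical characters:

```cpp
// Hypothetical helper: compares two NTBSs at compile time (C++11 constexpr).
constexpr bool same(char const *lhs, char const *rhs)
{
    return *lhs == *rhs && (*lhs == '\0' || same(lhs + 1, rhs + 1));
}

// Traditional NTBS: each double quote and each backslash must be escaped.
constexpr char escaped[] = "<item id=\"1\" path=\"C:\\tmp\"/>";

// Raw string literal: the text is written exactly as it should appear.
constexpr char raw[] = R"(<item id="1" path="C:\tmp"/>)";

static_assert(same(escaped, raw), "both literals define the same characters");
static_assert(sizeof(escaped) == sizeof(raw), "both end in a 0-byte");
```

Both arrays are ordinary NTBSs: code receiving them cannot tell which notation was used to define them.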

Raw string literals start with an tt(R), followed by a double quote,
optionally followed by a label (a sequence of at most 16 characters, not
containing blanks, parentheses or backslashes), followed by tt(OPENPAR). The
raw string ends at the closing parenthesis tt(CLOSEPAR), followed by the label
(if specified when starting the raw string literal), which is in turn followed
by a double quote. Here are some examples:
        verb(    R"(A Raw \ "String")"
    R"delimiter(Another \ Raw "(String))delimiter")

In the first case, everything between tt("OPENPAR) and tt(CLOSEPAR") is
part of the string. Escape sequences aren't supported so the text tt(\ ")
within the first raw string literal defines three characters: a backslash, a
blank character and a double quote. The second example shows a raw string
defined between the markers tt("delimiter)tt(OPENPAR) and
tt(CLOSEPARdelimiter"). 

Raw string literals come in very handy when long, complex ASCII-character
sequences (e.g., usage-info or long HTML-sequences) are used. In the end they
are just that: long NTBSs. Those long raw string literals should be separated
from the code that uses them, which keeps that code readable.
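
A usage-info text could, for example, be defined as follows. This is only a sketch: the program name tt(demo), its options and the helper tt(countLines) are hypothetical. Line breaks inside a raw string literal are part of the string, so the text can be laid out in the source file the way it should be displayed:

```cpp
// Hypothetical usage text of an imaginary program `demo'. Neither \n nor \"
// is needed: line breaks and double quotes are used as-is.
constexpr char usage[] = R"(Usage: demo [options] file
Options:
  -h    show this "help" text
  -v    show the version
)";

// Hypothetical helper: counts the newline characters in an NTBS.
constexpr unsigned countLines(char const *text)
{
    return *text == '\0' ? 0 : (*text == '\n') + countLines(text + 1);
}

static_assert(countLines(usage) == 4, "four lines, each ending in a newline");
```

Code displaying the text only refers to its name (e.g., tt(std::cout << usage;)) and remains short, no matter how long the text itself is.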

As an illustration: the bf(bisonc++) parser generator supports an option
tt(--prompt). When specified, the code generated by bf(bisonc++) inserts
prompting code when debugging is requested. Directly inserting the raw string
literal into the function processing the prompting code results in code that
is very hard to read:
        verb(    void prompt(ostream &out)
    {
        if (d_genDebug)
            out << (d_options.prompt() ? R"(
            if (d_debug__)
            {
                s_out__ << "\n================\n"
                           "? " << dflush__;
                std::string s;
                getline(std::cin, s);
            }
    )"              : R"(
            if (d_debug__)
                s_out__ << '\n';
    )"
                    ) << '\n';        
    })

Readability is greatly enhanced by defining the raw string literals as named
NTBSs, defined in the source file's anonymous namespace (cf. chapter
ref(NAMESPACE)):
        verb(    namespace {
    
    char const noPrompt[] = 
    R"(
            if (d_debug__)
                s_out__ << '\n';
    )";
    
    char const doPrompt[] = 
    R"(
            if (d_debug__)
            {
                s_out__ << "\n================\n"
                           "? " << dflush__;
                std::string s;
                getline(std::cin, s);
            }
    )";
    
    } // anonymous namespace
    
    void prompt(ostream &out)
    {
        if (d_genDebug)
            out << (d_options.prompt() ? doPrompt : noPrompt) << '\n';        
    })