File: central-data-structures.md

package info (click to toggle)
cbmc 6.6.0-4
  • links: PTS
  • area: main
  • in suites: forky, sid, trixie
  • size: 153,852 kB
  • sloc: cpp: 386,459; ansic: 114,466; java: 28,405; python: 6,003; yacc: 4,552; makefile: 4,041; lex: 2,487; xml: 2,388; sh: 2,050; perl: 557; pascal: 184; javascript: 163; ada: 36
file content (184 lines) | stat: -rw-r--r-- 7,522 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
\ingroup module_hidden

\page central-data-structures Central Data Structures

# Central Data Structures

The following is some light technical documentation of the major data structures
used in the input transformation to Intermediate Representation (IR) inside
CBMC and the assorted CProver tools.

## GOTO models

The `goto_modelt` class is the top-level data structure that CBMC (and the other
tools) use for holding the GOTO intermediate representation. The following
diagram is a simplified view of how the data structures relate to each other -

```mermaid
erDiagram
  goto_modelt {
    symbol_tablet symbol_table
    goto_functionst goto_functions
  }
  goto_modelt ||--|| symbol_tablet : ""
  goto_modelt ||--|| goto_functionst : ""
  symbol_tablet {
    symbol_map symbols
  }
  symbol_tablet ||--o{ symbolt : ""
  symbolt {
    string name
  }
  goto_functionst {
    function_map functions
  }
  goto_functionst ||--o{ goto_functiont : ""
  goto_functiont {
    goto_programt body
    ordered_collection_of parameter_identifiers
  }
  goto_functiont ||--|| goto_programt : ""
  goto_programt {
    ordered_collection_of instructions
  }
  goto_programt ||--o{ instructiont : ""
  instructiont {
    enumeration instruction_type
    goto_program_codet code
    boolean_expression guard
    source_locationt source_location
  }
  instructiont ||--|| goto_program_instruction_typet : ""
  instructiont ||--o| goto_program_codet : ""
  instructiont ||--o| source_locationt : ""
  goto_program_codet ||--o| source_locationt : ""
```

A `goto_modelt` is effectively a pair, consisting of:

* A collection of GOTO functions.
* A symbol table containing symbol definitions which may be referred to by
  symbol expressions. Symbol expressions are found in the goto functions and the
  (sub-)expressions for the definitions of symbols.

In pseudocode, the type looks this:

```js
type goto_modelt {
  type goto_functionst = map<identifier, goto_functiont>
  type symbol_tablet = map<identifier, symbolt>
}
```

There is an abstract interface for goto models provided by the
`abstract_goto_modelt` class. This is defined and documented in the file
[`src/goto-programs/abstract_goto_model.h`](../../src/goto-programs/abstract_goto_model.h).
Ideally code which takes a goto model as input should accept any implementation
of the abstract interface rather than accepting the concrete `goto_modelt`
exclusively. This helps with compatibility with `jbmc` which uses lazy loading.
See the `lazy_goto_modelt` class which is defined and documented in
[`jbmc/src/java_bytecode/lazy_goto_model.h`](../../jbmc/src/java_bytecode/lazy_goto_model.h)
for details of lazy loading.

For further information about symbols see the `symbolt` class which is defined
and documented in [`src/util/symbol.h`](../../src/util/symbol.h) and the
`symbol_exprt` (symbol expression) class which is defined and documented in
[`src/util/std_expr.h`](../../src/util/std_expr.h).

## goto_functiont

A `goto_functiont` is also defined as a pair. It's designed to represent a function
at the goto level, and effectively it's the following data type (in pseudocode):

```js
type goto_functiont {
    goto_programt body
    list<identifiers> parameters
}
```

The `goto_programt` denoting the `body` of the function will be the subject of
a more elaborate explanation in the next section. The in-memory structure of a
goto function allows the `body` to be optional, but only functions with bodies
are included in the serialised goto binaries.

The `parameters` subcomponent is a list of identifiers that are to be looked-up
in the symbol-table for their definitions.

## goto_programt

A  goto program  is a sequence of GOTO instructions (`goto_instructiont`). For
details of the `goto_programt` class itself see the documentation in
[`goto_program.h`](../../src/goto-programs/goto_program.h). For further details
of the life cycle of goto programs see \ref goto-programs "goto programs"

An instruction (`goto_instructiont`) is a triple (an element with three subcomponents),
describing a GOTO level instruction with the following 3 component subtypes,
again in pseudocode:

```js
type goto_instructiont {
    instruction_type instr_type
    instruction      statement
    guard            boolean_expr
}
```

The pseudocode subtypes above require some more elaboration, so we will provide some extra
detail to illuminate some details at the code level:

* The `instruction_type` above corresponds to the `goto_program_instruction_typet` type
  listed in [`goto_program.h`](../../src/goto-programs/goto_program.h)
  * Some illustrative instruction types are `assign`, `function_call`, `return`, etc.
* The `instruction` is a statement represented by a `goto_instruction_codet`.
  * A `goto_instruction_codet` is an alias of `codet` (a data structure broadly representing
    a statement inside CBMC) that contains the actual code to be executed.
  * You can distinguish different statements by using the result of the
    `get_statement` member function of the `codet` class.
* The `guard` is an `exprt` (a data structure broadly representing an expression inside CBMC)
  that is expected to have type `bool`.
  * This is optional - not every instruction is expected to have a `guard` associated with it.

## source_locationt

Another important data structure that needs to be discussed at this point is
`source_locationt`.

`source_locationt` are attached into various `exprt`s (the data structure representing
various expressions, usually the result of some early processing, e.g. the result of the
frontend parsing a file).

This means that `codet`s, representing GOTO instructions *also* have a `source_locationt`
attached to them, by virtue of inheriting from `exprt`.

For the most part, `source_locationt` is something that is only being used when we print
various nodes (for property listing, counterexample/trace printing, etc).

It might be possible that a specific source location might point to a CBMC instrumentation
primitive (which might be reported as a `built-in-addition`) or there might even be no-source
location (because it might be part of harnesses generated as an example, that have no presence
in the user code).

## irept

`irep`s are the underlying data structure used to implement many of the data
structures in CBMC and the assorted tools. These include the `exprt`, `typet`,
`codet` and `source_locationt` classes. This is a tree data structure where
each node is expected to contain a string/ID and may have child nodes stored in
both a sequence of child nodes and a map of strings/IDs to child nodes. This
enables the singular `irept` data structure to be used to model graphs such as
ASTs, CFGs, etc.

The classes extending `irept` define how the higher level concepts are mapped
onto the underlying tree data structure. For this reason it is usually advised
that the member functions of the sub-classes should be used to access the data
held, rather than the member functions of the base `irept` class. This aids
potential future restructuring and associates accesses of the data with the
member functions which have the doxygen explaining what the data is.

The strings/IDs held within `irept` are of type `irep_idt`. These can be
converted to `std::string` using the `id2string` function. There is a mechanism
provided for casting expressions and types in
[`src/util/expr_cast.h`](../../src/util/expr_cast.h). In depth documentation
of the `irept` class itself can be found in
[`src/util/irep.h`](../../src/util/irep.h).