File: serialization_format.markdown

# Serialization Format

## Header
The header contains a version, followed by offsets (from the start of the
data blob) where various tables can be located.

    +---------------------------------------------------------+
    | Version                                                 |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the dependencies table   |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the dependencies table             |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the STables table        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the STables table                  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the STables data         |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the objects table        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the objects table                  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the objects data         |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the closures table       |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the closures table                 |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the contexts table       |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the contexts table                 |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the contexts data        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the repossessions table  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the repossessions table            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset (from start of data) of the parameterization     |
    | interns data                                            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Number of entries in the parameterization intern data   |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
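
As a rough illustration of the layout above, the header can be unpacked as 18 consecutive 32-bit integers. This is only a sketch: the field names are made up for this example, and the byte order and signedness (little-endian, signed) are assumptions, since this document does not state them.

```python
import struct
from collections import namedtuple

# Hypothetical field names; the order follows the header table above.
HEADER_FIELDS = [
    "version",
    "deps_offset", "deps_count",
    "stables_offset", "stables_count", "stables_data_offset",
    "objects_offset", "objects_count", "objects_data_offset",
    "closures_offset", "closures_count",
    "contexts_offset", "contexts_count", "contexts_data_offset",
    "repos_offset", "repos_count",
    "param_interns_offset", "param_interns_count",
]
Header = namedtuple("Header", HEADER_FIELDS)

# ASSUMPTION: little-endian, signed 32-bit integers.
HEADER_STRUCT = struct.Struct("<%di" % len(HEADER_FIELDS))


def parse_header(blob):
    """Unpack the fixed-size header from the start of the data blob."""
    if len(blob) < HEADER_STRUCT.size:
        raise ValueError("blob too short for header")
    return Header._make(HEADER_STRUCT.unpack_from(blob, 0))
```

All offsets in the resulting tuple are relative to the start of the data blob, so they can be used directly as positions in `blob`.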

## Dependencies Table
This table describes the Serialization Contexts (SC) that are required to
already be loaded in order to load this one. The number of entries in this
table is supplied by the header. Each entry looks as follows.

    +---------------------------------------------------------+
    | Index into the string heap of the SC unique ID          |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index into the string heap of the SC description        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
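
A minimal sketch of reading this table might look as follows. It assumes little-endian integers and a hypothetical `string_heap` that has already been decoded into a Python list of strings (how the string heap itself is stored is not covered here).

```python
import struct

# ASSUMPTION: little-endian, signed 32-bit integers.
DEP_ENTRY = struct.Struct("<ii")


def read_dependencies(blob, deps_offset, deps_count, string_heap):
    """Return a list of (unique_id, description) pairs, one per required SC."""
    deps = []
    for i in range(deps_count):
        id_idx, desc_idx = DEP_ENTRY.unpack_from(
            blob, deps_offset + i * DEP_ENTRY.size)
        deps.append((string_heap[id_idx], string_heap[desc_idx]))
    return deps
```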

## STables Table
This table describes the 6model STables that have been serialized. Each entry
contains the following items.

    +---------------------------------------------------------+
    | Index into the string heap of a string holding the name |
    | of the representation (REPR) that this STable points to |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset from the start of the STable data chunk where    |
    | the data for this STable has been serialized            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset from the start of the STable data chunk where    |
    | the REPR data for this STable has been serialized (you  |
    | can get there by reading everything from the previous   |
    | offset, but it may not be efficient if you want to get  |
    | an idea of the object size first)                       |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
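
The two offsets in each entry are relative to the STables data chunk, not to the start of the blob. The sketch below turns them into positions in the blob by adding the chunk offset from the header. The names, the little-endian byte order and the pre-decoded `string_heap` list are assumptions made for illustration.

```python
import struct
from collections import namedtuple

STableEntry = namedtuple("STableEntry", "repr_name data_pos repr_data_pos")

# ASSUMPTION: little-endian, signed 32-bit integers.
STABLE_ENTRY = struct.Struct("<iii")


def read_stable_entry(blob, stables_offset, stables_data_offset, index,
                      string_heap):
    """Read one STables table entry and resolve its offsets into the blob."""
    repr_idx, data_off, repr_off = STABLE_ENTRY.unpack_from(
        blob, stables_offset + index * STABLE_ENTRY.size)
    return STableEntry(
        string_heap[repr_idx],
        stables_data_offset + data_off,   # start of this STable's data
        stables_data_offset + repr_off,   # start of its REPR data
    )
```

Having `repr_data_pos` available up front is what lets a reader jump straight to the REPR data without first decoding everything that precedes it.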

## STables Data
Each STable is serialized simply as a sequence of primitives, in the
following order.

* HOW (object reference)
* WHAT (object reference)
* WHO (variant)
* method_cache (VM hash)
* vtable_length (native int)
* \[each of the items in vtable\] (variant)
* type_check_cache_length (native int)
* \[each of the items in type_check_cache\] (object reference)
* mode_flags (native int)
* boolification_spec (native int flag for if it exists; if true, then a native int
  for the mode and a ref for the method slot)
* container_spec (native int flag for if it exists; if true, then ref/string/int for
  the attribute and ref for the fetch method)
* invocation_spec (native int flag for if it exists; if true, then ref for class handle,
  str for attr name, int for hint and ref for invocation handler)

After this, the REPR data is serialized (which is specific to the REPR in question).

## Objects Table
This table describes the objects that have been serialized. Each entry
contains the following items.

    +---------------------------------------------------------+
    | Base-1 index of the SC that contains the STable for the |
    | object, or 0 if it is in the current SC.                |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in the SC where the STable can be located.        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset from the start of the object data chunk where    |
    | the data for this object has been serialized            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Flags. Currently, just 1 if it's a normal object and 0  |
    | if it is a type object.                                 |
    |    32-bit integer                                       |
    +---------------------------------------------------------+

The exact data stored for an object is up to its representation.
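
An objects table entry could be read along these lines. This is a sketch under the assumption of little-endian integers; the tuple and field names are illustrative only.

```python
import struct
from collections import namedtuple

ObjectEntry = namedtuple(
    "ObjectEntry", "stable_sc stable_index data_offset is_type_object")

# ASSUMPTION: little-endian, signed 32-bit integers.
OBJECT_ENTRY = struct.Struct("<iiii")


def read_object_entry(blob, objects_offset, index):
    """Read one objects table entry.

    stable_sc is 0 for the current SC, otherwise a base-1 SC index.
    """
    sc, st_idx, data_off, flags = OBJECT_ENTRY.unpack_from(
        blob, objects_offset + index * OBJECT_ENTRY.size)
    # Per the table above: 1 = normal object, 0 = type object.
    return ObjectEntry(sc, st_idx, data_off, flags == 0)
```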

## Closures Table
This table describes the closures we have taken during compilation and
that need to be re-instated during deserialization, along with
references to their relevant outer contexts.

    +---------------------------------------------------------+
    | Base-1 index of the SC that contains the static code    |
    | reference that this closure is a clone of, or 0 if it   |
    | is in the current SC.                                   |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in that SC where the static code ref is located.  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | 1-based index into the contexts table where the outer   |
    | context for this closure can be found, or zero if there |
    | is none of interest                                     |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Flag for if the closure has an associated code object.  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | If it has one, this is the 1-based SC index of the code |
    | object, or 0 if in the current SC.                      |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | If it has one, this is the index in that SC where the   |
    | code object can be located.                             |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
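
The following sketch reads one closures table entry and folds the optional parts into `None` values. The names are hypothetical and little-endian integers are assumed.

```python
import struct
from collections import namedtuple

Closure = namedtuple(
    "Closure", "static_sc static_index outer_context code_object")

# ASSUMPTION: little-endian, signed 32-bit integers.
CLOSURE_ENTRY = struct.Struct("<6i")


def read_closure_entry(blob, closures_offset, index):
    """Read one closures table entry."""
    s_sc, s_idx, outer, has_obj, o_sc, o_idx = CLOSURE_ENTRY.unpack_from(
        blob, closures_offset + index * CLOSURE_ENTRY.size)
    # The outer context index is 1-based on disk, with 0 meaning "none";
    # here it is converted to a 0-based index or None.
    outer_context = outer - 1 if outer else None
    # The last two fields are only meaningful if the flag is set.
    code_object = (o_sc, o_idx) if has_obj else None
    return Closure(s_sc, s_idx, outer_context, code_object)
```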

## Contexts Table
This table describes the contexts that exist as outer scopes for
closures taken during compilation.

    +---------------------------------------------------------+
    | Base-1 index of the SC for the code ref the context is  |
    | associated with, or 0 if it is in the current SC.       |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in that SC where the code ref is located.         |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Offset into the contexts data segment where the values  |
    | for the various lexical entries may be found            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | 1-based index into the contexts table where the next    |
    | outer context in the chain can be found, or zero if     |
    | there is none                                           |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
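
Because each entry links to its outer by a 1-based index, with zero terminating the chain, the chain of outers for a context can be followed as sketched below. Little-endian integers are assumed, and the cycle check is a defensive addition for this example rather than something the format requires.

```python
import struct

# ASSUMPTION: little-endian, signed 32-bit integers.
CONTEXT_ENTRY = struct.Struct("<4i")


def outer_chain(blob, contexts_offset, start):
    """Return the 1-based context table indexes from `start` outwards."""
    chain = []
    seen = set()
    current = start
    while current != 0:
        if current in seen:
            raise ValueError("cycle in outer chain")
        seen.add(current)
        chain.append(current)
        # Only the last field (the next outer) matters for the walk.
        _sc, _code_idx, _data_off, current = CONTEXT_ENTRY.unpack_from(
            blob, contexts_offset + (current - 1) * CONTEXT_ENTRY.size)
    return chain
```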

## Repossessions Table
This table describes the objects serialized in this SC that were
originally owned by another SC, but were taken over by this one
due to being modified while it was being compiled.

    +---------------------------------------------------------+
    | Repossessed entity type (0 = object, 1 = STable)        |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in our object list where the repossessed object   |
    | is located                                              |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Base-1 index of the SC that used to own the object (0   |
    | is not legal here)                                      |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in that SC where the original object is located.  |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
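
A repossessions entry could be read and sanity-checked as in the sketch below. The names are illustrative and little-endian integers are assumed. The check on the original SC index reflects the rule above that 0 is not legal there.

```python
import struct
from collections import namedtuple

Repossession = namedtuple("Repossession", "kind our_index orig_sc orig_index")

# ASSUMPTION: little-endian, signed 32-bit integers.
REPO_ENTRY = struct.Struct("<4i")
KINDS = {0: "object", 1: "stable"}


def read_repossession(blob, repos_offset, index):
    """Read one repossessions table entry, rejecting malformed ones."""
    kind, ours, orig_sc, orig_idx = REPO_ENTRY.unpack_from(
        blob, repos_offset + index * REPO_ENTRY.size)
    if kind not in KINDS:
        raise ValueError("unknown repossessed entity type: %d" % kind)
    if orig_sc == 0:
        raise ValueError("original SC index 0 is not legal here")
    return Repossession(KINDS[kind], ours, orig_sc, orig_idx)
```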

## Parameterization Interning Data
Parameterized types are interned in a VM instance, meaning that if we
deserialize two compilation units that have both serialized identical
parameterizations, we will allow the first unit's parameterization to
"win". The data in this section of the serialization blob identifies
types whose STables and type objects are subject to this interning.

This section exclusively contains entries meeting the following rules:

* The parametric type must be from another SC
* All the parameters must be objects from another SC

Each entry starts as follows:

    +---------------------------------------------------------+
    | Base-1 index of the SC that owns the parametric type    |
    | (this is the type that was parameterized to get the one |
    | we're dealing with the interning of). A value of 0 is   |
    | invalid, as that would imply the current SC.            |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in that SC where the parametric type's type       |
    | object can be found                                     |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in our object list where the type object we may   |
    | intern is located.                                      |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | Index in our STable list where the STable we may intern |
    | is located.                                             |
    |    32-bit integer                                       |
    +---------------------------------------------------------+
    | The number of parameters in this parameterization       |
    |    32-bit integer                                       |
    +---------------------------------------------------------+

Following this, for each parameter an object reference is written.
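
Unlike the other tables, these entries vary in length, so they have to be read sequentially. The sketch below assumes little-endian integers and assumes that each parameter is written as a bare object reference (a 32-bit SC index followed by a 32-bit index, as described under Primitives). All names are hypothetical.

```python
import struct
from collections import namedtuple

InternEntry = namedtuple(
    "InternEntry",
    "ptype_sc ptype_index type_obj_index stable_index params")

# ASSUMPTION: little-endian, signed 32-bit integers.
INTERN_HEAD = struct.Struct("<5i")
OBJ_REF = struct.Struct("<ii")  # ASSUMPTION: bare object reference


def read_intern_entries(blob, offset, count):
    """Read `count` variable-length parameterization intern entries."""
    entries = []
    pos = offset
    for _entry in range(count):
        sc, idx, obj_idx, st_idx, n_params = INTERN_HEAD.unpack_from(blob, pos)
        pos += INTERN_HEAD.size
        if sc == 0:
            raise ValueError("parametric type must be from another SC")
        params = []
        for _param in range(n_params):
            params.append(OBJ_REF.unpack_from(blob, pos))
            pos += OBJ_REF.size
        entries.append(InternEntry(sc, idx, obj_idx, st_idx, params))
    return entries
```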

## Primitives
This section describes how the various primitive types known to the
serializer are stored.

### Native Integers
These are stored as 64-bit integers.

### Native Numbers
These are stored as 64-bit floating point numbers (doubles).

### Strings
These are stored as 32-bit indexes into the strings heap.

### Variant Reference
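Most of the time, when we have a pointer to serialize, we use a variant
reference to do so. Each variant reference is tagged with one of the
following type codes, identifying the kind of value being referenced.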

    1 = NULL
    2 = Object reference
    3 = VM NULL
    4 = VM Boxed Integer
    5 = VM Boxed Number
    6 = VM Boxed String
    7 = VM Array of Variant References
    8 = VM Array of Strings
    9 = VM Array of Integers
    10 = VM Hash of Variant References with String Keys
    11 = VM Static Code Reference
    12 = VM Cloned Code Reference (a serialized closure)

### Object References
These are stored as a 32-bit SC index (base 1 into the dependencies
table, or 0 for current SC), followed by a 32-bit index into the
selected SC.
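
Resolving such a reference amounts to picking the right SC and then indexing into it. In the sketch below, SCs are modelled as plain Python lists of objects, and `dependencies` is a list of already-loaded SCs in dependencies table order. Both are simplifications for illustration, and little-endian integers are assumed.

```python
import struct

# ASSUMPTION: little-endian, signed 32-bit integers.
OBJ_REF = struct.Struct("<ii")


def resolve_object_ref(blob, pos, current_sc, dependencies):
    """Read an object reference at `pos`; return (object, next position)."""
    sc_idx, obj_idx = OBJ_REF.unpack_from(blob, pos)
    if sc_idx == 0:
        sc = current_sc                 # 0 selects the current SC
    else:
        sc = dependencies[sc_idx - 1]   # otherwise base-1 into dependencies
    return sc[obj_idx], pos + OBJ_REF.size
```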

### VM NULL
We store no extra information for these.
    
### VM Array of Variants
These are stored as a 32-bit integer element count, followed by the variants.

### VM Array of Strings
These are stored as an element count, followed by the string heap indexes.

### VM Array of Integers
These are stored as an element count, followed by the 64-bit integers.
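
As an example of the count-then-elements layout shared by the array types, an integer array could be read as sketched here. The width of the element count is not stated for this type, so a 32-bit count (as for arrays of variants) is an assumption, as is the little-endian byte order.

```python
import struct


def read_int_array(blob, pos):
    """Read an integer array at `pos`; return (values, next position)."""
    # ASSUMPTION: 32-bit little-endian element count.
    (count,) = struct.unpack_from("<i", blob, pos)
    pos += 4
    # Elements are 64-bit integers, as for native integers.
    values = list(struct.unpack_from("<%dq" % count, blob, pos))
    return values, pos + 8 * count
```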

### VM Hash of Variants with String Keys
These are stored as an element count. This is followed by that number of
string/variant pairs.

### Code References
These are always serialized as the SC containing the code reference, plus
the index where it can be located.

### VM Static Code Reference
The simplest case of code referenced from objects is when the VM has never
had to invoke the code during compilation. In this case, the thunk for the
dynamic compilation will have been tagged with a STATIC_CODE_REF property
and placed into the SC for the current compilation unit. When we encounter
such a case, we simply serialize the SC that owns the code object and the
code ref index that it is at.

Deserialization relies on the deserializer being given a fixed-up list of
code objects, pointing to the compiled code refs. These are indexed just as
the dynamic compilation stubs were, so references to them can be resolved.
This also works out in the cross-context case.

### Dynamic Compilation
When dynamic compilation is performed, the SC should be updated with the
code ref to the now-compiled code. Additionally, this needs to be tagged
as a static code reference (and will also be tagged with the SC in question).
In the simplest case, we'll never end up referencing this execution of the
code - but if closures are taken pointing to it, this will happen, and needs
some special handling.

### Closures
If we encounter a code object that is not marked as static, but already has been
given an SC, then we can simply write the reference out in just the same way as
we would for a static code object. This means something has already taken care of
the hard work. They are deserializable in a similar way.

The interesting part comes when we encounter a code ref that hasn't been tagged with
an SC yet. First, we trace back from it to the correct static code ref (probably via
the reference back to the static code ref that's held in the static lexical scope info).
We make an entry in the closures table indicating the static code ref that needs to
be cloned in order to start recreating the closure.

Next, we consider the outer. There are two things involved here. One is the context,
which represents the lexicals declared in that context. The second is the code object
that is associated with the outer. There are some options for this.

1) The outer points to a dynamic compilation boundary (tagged DYN_COMP_BOUNDARY).
   This means that it is a "fake" frame that exists because the outer scopes beyond
   here were not done compiling at the point of invocation. We thus don't want to
   actually do any serialization of this context.

2) The outer points to no interesting outer whatsoever, and represents the point that
   we should stop doing any serialization. It will be tagged UNIT_BOUNDARY.

3) The outer points to a context with lexical values that need to be serialized. In
   this case, it may be an invocation from the static code reference, or it may be an
   invocation from another closure.
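
The three cases above can be summarized as a small decision function. This is purely a sketch: modelling the tags on an outer as a Python set of strings is a hypothetical simplification, and the returned labels are invented for this example.

```python
def classify_outer(tags):
    """Decide how to treat an outer, given its tags (a set of strings)."""
    if "DYN_COMP_BOUNDARY" in tags:
        # Case 1: a "fake" frame; do not serialize this context.
        return "skip"
    if "UNIT_BOUNDARY" in tags:
        # Case 2: nothing of interest beyond here; stop serializing.
        return "stop"
    # Case 3: a real context whose lexical values need serializing.
    return "serialize"
```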