File: Format-specification.md

# Format specification

## Overview

Data types fabric (dtFabric) is a YAML-based definition language to specify
data formats and data types. It distinguishes three categories of data types:

* storage data types, such as integers, characters, structures
* semantic data types, such as constants, enumerations
* layout data types, such as format, vectors, trees

## Data type definition

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
aliases | List of strings | No | List of alternative names for the data type
description | string | No | Description of the data type
name | string | Yes | Name of the data type
type | string | Yes | Definition type <br> See section: [Data type definition types](#data-type-definition-types)
urls | List of strings | No | List of URLs that contain more information about the data type

### Data type definition types

Identifier | Description
--- | ---
boolean | Boolean
character | Character
constant | Constant
enumeration | Enumeration
floating-point | Floating-point
format | Data format metadata <br> See section: [Data format](#data-format)
integer | Integer
padding | Alignment padding, only supported as a member definition of a structure data type
stream | Stream
string | String
structure | Structure
structure-family | Structure family <br> See section: [Structure family](#structure-family)
union | Union data type
uuid | UUID (or GUID)

**TODO: consider adding the following types**

Identifier | Description
--- | ---
bit-field | Bit field (or group of bits)
fixed-point | Fixed-point data type
reference | **TODO: add description**

## Storage data types

Storage data types are data types that represent stored (or serialized) values.
In addition to the [Data type definition attributes](#data-type-definition)
storage data types also define:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
attributes | mapping | No | Data type attributes <br> See section: [Storage data type definition attributes](#storage-data-type-definition-attributes)

### Storage data type definition attributes

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
byte_order | string | No | Byte-order of the data type <br> Valid options are: "big-endian", "little-endian", "native" <br> The default is native

---
**NOTE:** middle-endian is a valid byte-ordering but currently not supported.

---
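
To illustrate what the byte_order attribute controls, the following sketch
interprets the same four bytes under each ordering. It uses Python's standard
`struct` module rather than dtFabric itself, and the sample bytes are made up
for illustration.

```python
import struct
import sys

# Four example bytes; the value is arbitrary and only serves the illustration.
data = bytes([0x12, 0x34, 0x56, 0x78])

# ">" selects big-endian, "<" little-endian and "=" the native byte order.
big_endian = struct.unpack('>I', data)[0]
little_endian = struct.unpack('<I', data)[0]
native = struct.unpack('=I', data)[0]

print(hex(big_endian))     # 0x12345678
print(hex(little_endian))  # 0x78563412

# The native result depends on the platform that runs the code.
print(native == (little_endian if sys.byteorder == 'little' else big_endian))
```

Because "native" depends on the platform, definitions of on-disk formats
typically benefit from an explicit "big-endian" or "little-endian" value.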

### Fixed-size data types

In addition to the [Storage data type definition attributes](#storage-data-type-definition-attributes)
fixed-size data types also define the following attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
size | integer or string | No | size of data type in number of units or "native" if architecture dependent <br> The default is "native"
units | string | No | units of the size of the data type <br> The default is bytes

#### Boolean

A boolean is a data type to represent true-or-false values.

```yaml
name: bool32
aliases: [BOOL]
type: boolean
description: 32-bit boolean type
attributes:
  size: 4
  units: bytes
  false_value: 0
  true_value: 1
```

Boolean data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
false_value | integer | No | Integer value that represents False <br> The default is 0
true_value | integer | No | Integer value that represents True <br> The default is not-set, which represents any value except for the false_value

Currently supported size attribute values are: 1, 2 and 4 bytes.
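
The interaction between false_value and true_value can be sketched as
follows. The helper `map_boolean` is hypothetical and not part of dtFabric;
raising an error for a value that matches neither attribute is an assumption
made for this sketch.

```python
def map_boolean(value, false_value=0, true_value=None):
  """Interprets an integer as a boolean (hypothetical helper)."""
  if value == false_value:
    return False

  # With true_value not set, any value other than false_value is True.
  if true_value is None or value == true_value:
    return True

  # Assumption: a value that matches neither attribute is rejected.
  raise ValueError(f'Unsupported boolean value: {value}')


print(map_boolean(0))                 # False
print(map_boolean(5))                 # True
print(map_boolean(1, true_value=1))   # True
```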

#### Character

A character is a data type to represent elements of textual strings.

```yaml
name: wchar16
aliases: [WCHAR]
type: character
description: 16-bit wide character type
attributes:
  size: 2
  units: bytes
```

Currently supported size attribute values are: 1, 2 and 4 bytes.

#### Fixed-point

A fixed-point is a data type to represent elements of fixed-point values.

**TODO: add example**

#### Floating-point

A floating-point is a data type to represent elements of floating-point values.

```yaml
name: float64
aliases: [double, DOUBLE]
type: floating-point
description: 64-bit double precision floating-point type
attributes:
  size: 8
  units: bytes
```

Currently supported size attribute values are: 4 and 8 bytes.

#### Integer

An integer is a data type to represent elements of integer values.

```yaml
name: int32le
aliases: [LONG, LONG32]
type: integer
description: 32-bit little-endian signed integer type
attributes:
  byte_order: little-endian
  format: signed
  size: 4
  units: bytes
```

Integer data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
format | string | No | Signed or unsigned <br> The default is signed

Currently supported size attribute values are: 1, 2, 4 and 8 bytes.
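
As a rough illustration of how the byte_order and format attributes change the
resulting value, the sketch below decodes the same bytes in several ways with
Python's built-in `int.from_bytes`. The helper `read_integer` and the
`BYTE_ORDERS` table are hypothetical names, not dtFabric functions.

```python
import sys

# Maps the byte_order attribute values onto what int.from_bytes expects.
BYTE_ORDERS = {
    'big-endian': 'big',
    'little-endian': 'little',
    'native': sys.byteorder}


def read_integer(data, byte_order='native', integer_format='signed'):
  """Decodes an integer from bytes (hypothetical helper)."""
  return int.from_bytes(
      data, byteorder=BYTE_ORDERS[byte_order],
      signed=(integer_format == 'signed'))


data = bytes([0xfe, 0xff, 0xff, 0xff])

print(read_integer(data, 'little-endian', 'signed'))    # -2
print(read_integer(data, 'little-endian', 'unsigned'))  # 4294967294
print(read_integer(data, 'big-endian', 'signed'))       # -16777217
```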

#### UUID (or GUID)

A UUID (or GUID) is a data type to represent a universally (or globally)
unique identifier.

```yaml
name: known_folder_identifier
type: uuid
description: Known folder identifier.
attributes:
  byte_order: little-endian
```

Currently supported size attribute values are: 16 bytes.
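
Byte order matters for UUIDs because the first three fields are stored as
integers. The sketch below uses Python's standard `uuid` module to show the
difference. The 16 sample bytes are made up, and it is an assumption that a
little-endian UUID definition corresponds to the `bytes_le` interpretation.

```python
import uuid

# Sample data: the bytes 0x00 through 0x0f.
data = bytes(range(16))

# bytes_le treats the first three fields as little-endian integers.
little_endian = uuid.UUID(bytes_le=data)

# bytes treats all fields as big-endian.
big_endian = uuid.UUID(bytes=data)

print(little_endian)  # 03020100-0504-0706-0809-0a0b0c0d0e0f
print(big_endian)     # 00010203-0405-0607-0809-0a0b0c0d0e0f
```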

### Variable-sized data types

#### Sequence

A sequence is a data type to represent a sequence of individual elements such
as an array of integers.

```yaml
name: page_numbers
type: sequence
description: Array of 32-bit page numbers.
element_data_type: int32
number_of_elements: 32
```

Sequence data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
element_data_type | string | Yes | Data type of sequence element
elements_data_size | integer or string | See note | Integer value or expression to determine the data size of the elements in the sequence
elements_terminator | integer | See note | Element value that indicates the end of the sequence
number_of_elements | integer or string | See note | Integer value or expression to determine the number of elements in the sequence

---
**NOTE:** At least one of the elements attributes: "elements_data_size",
"elements_terminator" or "number_of_elements" must be set. As of version
20200621 "elements_terminator" can be set in combination with
"elements_data_size" or "number_of_elements".

---
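
The three elements attributes are different ways to bound the same read. The
sketch below illustrates this for little-endian 32-bit integers using Python's
`struct` module. The helper `read_int32_sequence` is hypothetical, and
excluding the terminator from the result is an assumption of this sketch, not
documented dtFabric behavior.

```python
import struct


def read_int32_sequence(
    data, number_of_elements=None, elements_data_size=None,
    elements_terminator=None):
  """Reads little-endian int32 elements (hypothetical helper)."""
  if number_of_elements is not None:
    limit = number_of_elements * 4
  elif elements_data_size is not None:
    limit = elements_data_size
  elif elements_terminator is not None:
    limit = len(data)
  else:
    raise ValueError('At least one elements attribute must be set')

  limit = min(limit, len(data))

  elements = []
  for offset in range(0, limit - limit % 4, 4):
    (value,) = struct.unpack_from('<i', data, offset)
    # Assumption: the terminator ends the sequence and is not returned.
    if elements_terminator is not None and value == elements_terminator:
      break
    elements.append(value)

  return elements


data = struct.pack('<5i', 10, 20, 30, 0, 40)

print(read_int32_sequence(data, number_of_elements=2))   # [10, 20]
print(read_int32_sequence(data, elements_data_size=12))  # [10, 20, 30]
print(read_int32_sequence(data, elements_terminator=0))  # [10, 20, 30]
```

Passing elements_terminator together with one of the other two attributes
stops at whichever boundary is reached first.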

**TODO: describe expressions and the map context**

#### Stream

A stream is a data type to represent a continuous sequence of elements such as
a byte stream.

```yaml
name: data
type: stream
element_data_type: byte
number_of_elements: data_size
```

Stream data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
element_data_type | string | Yes | Data type of stream element
elements_data_size | integer or string | See note | Integer value or expression to determine the data size of the elements in the stream
elements_terminator | integer | See note | Element value that indicates the end of the stream
number_of_elements | integer or string | See note | Integer value or expression to determine the number of elements in the stream

---
**NOTE:** At least one of the elements attributes: "elements_data_size",
"elements_terminator" or "number_of_elements" must be set. As of version
20200621 "elements_terminator" can be set in combination with
"elements_data_size" or "number_of_elements".

---

**TODO: describe expressions and the map context**

#### String

A string is a data type to represent a continuous sequence of elements with a
known encoding such as a UTF-16 formatted string.

```yaml
name: utf16le_string_with_size
type: string
encoding: utf-16-le
element_data_type: wchar16
elements_data_size: string_data_size
```

```yaml
name: utf16le_string_with_terminator
type: string
encoding: utf-16-le
element_data_type: wchar16
elements_terminator: "\x00\x00"
```

String data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
encoding | string | Yes | Encoding of the string
element_data_type | string | Yes | Data type of string element
elements_data_size | integer or string | See note | Integer value or expression to determine the data size of the elements in the string
elements_terminator | integer | See note | element value that indicates the end-of-string
number_of_elements | integer or string | See note | Integer value or expression to determine the number of elements in the string

---
**NOTE:** At least one of the elements attributes: "elements_data_size",
"elements_terminator" or "number_of_elements" must be set. As of version
20200621 "elements_terminator" can be set in combination with
"elements_data_size" or "number_of_elements".

---
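
When a string is bounded by a terminator, the terminator has to be matched
per element rather than per byte. Otherwise two adjacent UTF-16 characters can
produce a false match across an element boundary. A minimal sketch, where
`read_utf16le_string` is a hypothetical helper and not a dtFabric function:

```python
def read_utf16le_string(data, terminator=b'\x00\x00'):
  """Reads a terminated UTF-16 little-endian string (hypothetical helper)."""
  element_size = len(terminator)

  # Step through the data one element (2 bytes) at a time.
  for offset in range(0, len(data) - element_size + 1, element_size):
    if data[offset:offset + element_size] == terminator:
      return data[:offset].decode('utf-16-le')

  raise ValueError('Missing string terminator')


data = 'dtFabric'.encode('utf-16-le') + b'\x00\x00' + b'trailing'
print(read_utf16le_string(data))  # dtFabric

# "a" followed by U+0100 is stored as 61 00 00 01, which contains an unaligned
# pair of zero bytes that must not be mistaken for the terminator.
tricky = 'a\u0100'.encode('utf-16-le') + b'\x00\x00'
print(read_utf16le_string(tricky) == 'a\u0100')  # True
```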

**TODO: describe elements_data_size and number_of_elements expressions and the map context**

### Storage data types with members

In addition to the [Storage data type definition attributes](#storage-data-type-definition-attributes)
storage data types with members also define the following attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
members | list | Yes | List of member definitions <br> See section: [Member definition](#member-definition)

#### Member definition

A member definition supports the following attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
aliases | List of strings | No | List of alternative names for the member
condition | string | No | Condition under which the member is considered to be present
data_type | string | See note | Name of the data type definition of the member
description | string | No | Description of the member
name | string | See note | Name of the member
type | string | See note | Name of the definition type of the member <br> See section: [Data type definition types](#data-type-definition-types)
value | integer or string | See note | Supported value
values | List of integers or strings | See note | Supported values

---
**NOTE:** The name attribute: "name" must be set for storage data types with
members except for the Union type where it is optional.

---

---
**NOTE:** One of the type attributes: "data_type" or "type" must be set. The
following definition types cannot be directly defined as a member definition:
"constant", "enumeration", "format" and "structure".

---

**TODO: describe member definition not supporting attributes.**

---
**NOTE:** Both the value attributes: "value" and "values" are optional but only
one is supported at a time.

---

**TODO: describe conditions**

#### Structure

A structure is a data type to represent a composition of members of other
data types.

**TODO: add structure size hint?**

```yaml
name: point3d
aliases: [POINT]
type: structure
description: Point in 3 dimensional space.
attributes:
  byte_order: little-endian
members:
- name: x
  aliases: [XCOORD]
  data_type: int32
- name: y
  data_type: int32
- name: z
  data_type: int32
```

```yaml
name: sphere3d
type: structure
description: Sphere in 3 dimensional space.
members:
- name: number_of_triangles
  data_type: int32
- name: triangles
  type: sequence
  element_data_type: triangle3d
  number_of_elements: sphere3d.number_of_triangles
```
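
For comparison, the two structure definitions above roughly correspond to the
following hand-written parsing code based on Python's `struct` module. This is
a sketch under assumptions, not what dtFabric generates: `int32` is assumed to
be a 4-byte signed integer, both structures are read as little-endian, and
since `triangle3d` is not defined in this document the counted sequence reuses
`point3d` elements instead.

```python
import collections
import struct

Point3D = collections.namedtuple('Point3D', ['x', 'y', 'z'])

# Three consecutive little-endian signed 32-bit integers.
POINT3D_STRUCT = struct.Struct('<iii')


def read_point3d(data, offset=0):
  """Reads a point3d structure (hypothetical helper)."""
  return Point3D(*POINT3D_STRUCT.unpack_from(data, offset))


def read_counted_points(data):
  """Reads an int32 count followed by that many point3d structures."""
  (count,) = struct.unpack_from('<i', data, 0)
  return [
      read_point3d(data, 4 + index * POINT3D_STRUCT.size)
      for index in range(count)]


point_data = struct.pack('<iii', 1, -2, 3)
print(read_point3d(point_data))  # Point3D(x=1, y=-2, z=3)

counted_data = struct.pack('<i', 2) + struct.pack('<6i', 1, 2, 3, 4, 5, 6)
print(read_counted_points(counted_data))
```

The second helper mirrors how number_of_elements refers to a member that was
read earlier in the same structure.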

#### Padding

Padding is a member definition to represent (alignment) padding as a byte
stream.

```yaml
name: padding1
type: padding
alignment_size: 8
```

Padding data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
alignment_size | integer | Yes | Alignment size

Currently supported alignment_size attribute values are: 2, 4, 8 and 16 bytes.

---
**NOTE:** The padding is currently considered as required in the data stream.

---
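
The number of padding bytes follows from the current offset and the
alignment_size. A minimal sketch of that arithmetic, where `padding_size` is a
hypothetical helper and the offset is assumed to be relative to the start of
the enclosing structure:

```python
def padding_size(offset, alignment_size):
  """Determines the number of padding bytes (hypothetical helper)."""
  # Distance from offset to the next multiple of alignment_size, which is 0
  # if the offset is already aligned.
  return -offset % alignment_size


print(padding_size(5, 8))   # 3
print(padding_size(8, 8))   # 0
print(padding_size(13, 4))  # 3
```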

#### Union

**TODO: describe union**

## Semantic types

### Constant

A constant is a data type to provide meaning (semantic value) to a single
predefined value. The value of a constant is typically not stored in a byte
stream but used at compile time.

```yaml
name: maximum_number_of_back_traces
aliases: [AVRF_MAX_TRACES]
type: constant
description: Application verifier resource enumeration maximum number of back traces
urls: ['https://msdn.microsoft.com/en-us/library/bb432193(v=vs.85).aspx']
value: 13
```

Constant data type specific attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
value | integer or string | Yes | Integer or string value that the constant represents

### Enumeration

An enumeration is a data type to provide meaning (semantic value) to one or more
predefined values.

```yaml
name: handle_trace_operation_types
aliases: [eHANDLE_TRACE_OPERATIONS]
type: enumeration
description: Application verifier resource enumeration handle trace operation types
urls: ['https://msdn.microsoft.com/en-us/library/bb432251(v=vs.85).aspx']
values:
- name: OperationDbUnused
  number: 0
  description: Unused
- name: OperationDbOPEN
  number: 1
  description: Open (create) handle operation
- name: OperationDbCLOSE
  number: 2
  description: Close handle operation
- name: OperationDbBADREF
  number: 3
  description: Invalid handle operation
```

Enumeration value attributes:

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
aliases | List of strings | No | List of alternative names for the enumeration value
description | string | No | Description of the enumeration value
name | string | Yes | Name the enumeration value maps to
number | integer | Yes | Number the enumeration value maps to
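
As a point of reference, the example enumeration above maps naturally onto
Python's standard `enum.IntEnum`. The class name is made up for this sketch,
and this is not necessarily how dtFabric represents enumerations internally.

```python
import enum


class HandleTraceOperationTypes(enum.IntEnum):
  """Handle trace operation types from the example definition."""
  OperationDbUnused = 0
  OperationDbOPEN = 1
  OperationDbCLOSE = 2
  OperationDbBADREF = 3


# Map a stored number back to its name.
print(HandleTraceOperationTypes(2).name)  # OperationDbCLOSE

# Map a name to its number.
print(int(HandleTraceOperationTypes.OperationDbBADREF))  # 3
```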

**TODO: add description**

## Layout types

### Data format

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
attributes | mapping | No | Data type attributes <br> See section: [Data format attributes](#data-format-attributes)
description | string | No | Description of the format
layout | mapping | Yes | Format layout definition
metadata | mapping | No | Metadata
name | string | Yes | Name of the format
type | string | Yes | Definition type <br> See section: [Data type definition types](#data-type-definition-types)
urls | List of strings | No | List of URLs that contain more information about the format

Example:

```yaml
name: mdmp
type: format
description: Minidump file format
urls: ['https://docs.microsoft.com/en-us/windows/win32/debug/minidump-files']
metadata:
  authors: ['John Doe <john.doe@example.com>']
  year: 2022
attributes:
  byte_order: big-endian
layout:
- data_type: file_header
  offset: 0
```

#### Data format attributes

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
byte_order | string | No | Byte-order of the data type <br> Valid options are: "big-endian", "little-endian", "native" <br> The default is native

---
**NOTE:** middle-endian is a valid byte-ordering but currently not supported.

---

### Structure family

A structure family is a layout type to represent multiple generations
(versions) of the same structure.

```yaml
name: group_descriptor
type: structure-family
description: Group descriptor of Extended File System version 2, 3 and 4
base: group_descriptor_base
members:
- group_descriptor_ext2
- group_descriptor_ext4
```

The structure members defined in the base structure are exposed at runtime.

**TODO:** define behavior if a structure family member does not define a
structure member defined in the base structure.

### Structure group

A structure group is a layout type to represent a group of structures that
share a common trait.

```yaml
name: bsm_token
type: structure-group
description: BSM token group
base: bsm_token_base
identifier: token_type
members:
- bsm_token_arg32
- bsm_token_arg64
```

The structure group members are required to define the identifier structure
member with its values specific to the group member.

Attribute name | Attribute type | Required | Description
--- | --- | --- | ---
base | string | Yes | Base data type. Note that this must be a structure data type.
default | string | No | Default data type used as fallback if no corresponding member data type is defined. Note that this must be a structure data type.
identifier | string | Yes | Name of the member in the base (structure) data type that identifies a (group) member.
members | list | Yes | List of (group) member data types. Note that these must be structure data types.
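
Conceptually, a structure group selects a member structure based on the value
of the identifier member, with the default as fallback. The sketch below
illustrates that dispatch in plain Python. The token type values 1 and 2, the
single-byte identifier at offset 0 and the helper `select_group_member` are
all assumptions for illustration and are not taken from the BSM format or
from dtFabric.

```python
# Hypothetical identifier values mapped to the names of group members.
GROUP_MEMBERS = {
    1: 'bsm_token_arg32',
    2: 'bsm_token_arg64'}


def select_group_member(data, default=None):
  """Selects the group member for the data (hypothetical helper)."""
  # Assumption: the identifier is a single byte at the start of the data.
  token_type = data[0]

  member = GROUP_MEMBERS.get(token_type, default)
  if member is None:
    raise ValueError(f'Unsupported token type: {token_type}')
  return member


print(select_group_member(b'\x01\x00\x00\x00\x2a'))             # bsm_token_arg32
print(select_group_member(b'\x09', default='bsm_token_base'))   # bsm_token_base
```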