# Binary Format (BEVE)

Glaze provides a binary format for sending and receiving the same kinds of messages as JSON, with significantly better performance and smaller message sizes.

The binary specification is known as [BEVE](https://github.com/beve-org/beve).

**Write BEVE**

```c++
my_struct s{};
std::vector<std::byte> buffer{};
auto ec = glz::write_beve(s, buffer);
if (!ec) {
   // Success: ec.count contains bytes written
}
```

**Read BEVE**

```c++
my_struct s{};
auto ec = glz::read_beve(s, buffer);
if (!ec) {
   // Success
}
```

> [!NOTE]
>
> Reading binary is safe for invalid input and does not require null-terminated buffers.

## Calculate BEVE Size

The `glz::beve_size` function calculates the exact number of bytes needed to serialize a value to BEVE format, without actually performing the serialization. This is useful when you need to pre-allocate a buffer of the exact size, such as when writing to shared memory for inter-process communication.

**Basic Usage**

```c++
my_struct s{};
size_t size = glz::beve_size(s);
// size contains the exact number of bytes needed to serialize s
```

**Shared Memory Example**

```c++
#include "glaze/beve.hpp"

#include <sys/mman.h> // mmap (POSIX)

#include <cstdint>
#include <span>
#include <vector>

struct SensorData {
   uint64_t timestamp;
   double temperature;
   std::vector<double> readings;
};

// Calculate size before allocation
SensorData data{12345, 98.6, {1.0, 2.0, 3.0}};
size_t size = glz::beve_size(data);

// Allocate shared memory with exact size
void* shm = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (shm == MAP_FAILED) {
   // Handle allocation failure before touching the mapping
}

// Serialize directly to pre-allocated buffer
std::span<char> buffer{static_cast<char*>(shm), size};
auto ec = glz::write_beve(data, buffer);
if (ec) {
   // Handle serialization error
}
```

**Untagged Size Calculation**

For untagged serialization (structs written as arrays without keys), use `glz::beve_size_untagged`:

```c++
my_struct s{};
size_t tagged_size = glz::beve_size(s);           // with keys
size_t untagged_size = glz::beve_size_untagged(s); // without keys (smaller)
```

**Compressed Integer Size Helper**

BEVE uses variable-length encoding for integers in size/count fields. The `glz::compressed_int_size` helper calculates how many bytes a given integer will occupy:

```c++
glz::compressed_int_size(0);           // 1 byte (values < 64)
glz::compressed_int_size(63);          // 1 byte
glz::compressed_int_size(64);          // 2 bytes (values < 16384)
glz::compressed_int_size(16383);       // 2 bytes
glz::compressed_int_size(16384);       // 4 bytes (values < 1073741824)
glz::compressed_int_size(1073741824);  // 8 bytes

// Compile-time version
static_assert(glz::compressed_int_size<100>() == 2);
```
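
The thresholds above follow from reserving two bits of the encoded integer as a size marker. The sketch below reproduces the size rule and a matching encoder using only the standard library. The `sketch_` names are hypothetical and not part of Glaze, and the bit layout (marker in the low two bits, value in the remaining bits, little-endian byte order) is an assumption based on the BEVE specification rather than something this page states.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for glz::compressed_int_size (illustration only).
constexpr std::size_t sketch_compressed_int_size(std::uint64_t n) {
   if (n < 64) return 1;         // fits in 6 value bits
   if (n < 16384) return 2;      // fits in 14 value bits
   if (n < 1073741824) return 4; // fits in 30 value bits
   return 8;                     // fits in 62 value bits
}

// Assumed layout: low 2 bits of the first byte = log2(byte count),
// remaining bits = value, bytes stored little-endian.
inline std::vector<std::uint8_t> sketch_encode_compressed_int(std::uint64_t n) {
   assert(n < (std::uint64_t{1} << 62)); // larger values do not fit in 8 bytes
   const std::size_t bytes = sketch_compressed_int_size(n);
   const std::uint64_t marker = bytes == 1 ? 0 : bytes == 2 ? 1 : bytes == 4 ? 2 : 3;
   const std::uint64_t raw = (n << 2) | marker;
   std::vector<std::uint8_t> out(bytes);
   for (std::size_t i = 0; i < bytes; ++i) {
      out[i] = static_cast<std::uint8_t>(raw >> (8 * i)); // least significant byte first
   }
   return out;
}
```

Under these assumptions a count of 5 is written as the single byte `0x14`, while a count of 64 needs two bytes because six value bits are no longer enough.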

**Supported Types**

`glz::beve_size` supports all types that can be serialized to BEVE:

- Primitives: `bool`, integers, floating-point, `char`, enums
- Strings: `std::string`, `std::string_view`, `const char*`
- Containers: `vector`, `array`, `map`, `set`, `list`, `deque`, `span`
- Nullable: `optional`, `shared_ptr`, `unique_ptr`, raw pointers
- Compound: `variant`, `tuple`, `pair`, `complex`
- Glaze types: structs with `glz::meta`, `glz::obj`, `glz::raw_json`
- Nested structures of arbitrary depth

## Peek BEVE Header

The `glz::beve_peek_header` function examines a BEVE buffer's header to extract type and element count information **without performing full deserialization**. It is the counterpart to `glz::beve_size`: `beve_size` calculates the size before writing, while `beve_peek_header` inspects the size or count after data has been received.

**Use Cases**

- **Pre-allocation**: Know how many elements to `reserve()` before deserializing a vector
- **Buffer validation**: Check structure and bounds before committing to full parse
- **Routing decisions**: Determine the type of incoming data before processing
- **Memory budgeting**: Verify element counts against limits before allocation

**Basic Usage**

```c++
#include "glaze/beve.hpp"

std::vector<int> data{1, 2, 3, 4, 5};
auto buffer = glz::write_beve(data).value();

// Peek at header without deserializing
auto result = glz::beve_peek_header(buffer);
if (result) {
   // result->type == glz::tag::typed_array (4)
   // result->count == 5 (number of elements)
   // result->header_size == 2 (bytes consumed by header)
}
```

**Return Type: `glz::beve_header`**

```c++
struct beve_header {
   uint8_t tag{};        // Raw tag byte
   uint8_t type{};       // Base type (see below)
   uint8_t ext_type{};   // For extensions: subtype (variant, complex, etc.)
   size_t count{};       // Element count, string length, variant index, etc.
   size_t header_size{}; // Bytes consumed by tag + count encoding
};
```
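
To make `count` and `header_size` more concrete, here is a rough sketch of how a compressed count could be decoded by hand. It is not Glaze's implementation: `sketch_count` and `sketch_read_compressed_int` are hypothetical names, and the layout (size marker in the low two bits of the first byte, value in the remaining bits, little-endian) is an assumption taken from the BEVE specification. For a header that consists of one tag byte followed by a count, `header_size` would then be `1 + bytes`.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Result of decoding one compressed integer (hypothetical helper type).
struct sketch_count {
   std::uint64_t value{}; // decoded count
   std::size_t bytes{};   // bytes occupied by the encoding: 1, 2, 4 or 8
};

// Decodes a compressed integer from the start of `data`.
// Returns std::nullopt when the buffer is too short to hold the encoding.
inline std::optional<sketch_count> sketch_read_compressed_int(const std::uint8_t* data,
                                                              std::size_t size) {
   assert(data != nullptr || size == 0);
   if (size == 0) return std::nullopt;
   // Assumed: low 2 bits of the first byte select 1, 2, 4 or 8 bytes.
   const std::size_t bytes = std::size_t{1} << (data[0] & 0b11);
   if (size < bytes) return std::nullopt;
   std::uint64_t raw = 0;
   for (std::size_t i = 0; i < bytes; ++i) {
      raw |= std::uint64_t{data[i]} << (8 * i); // little-endian assembly
   }
   return sketch_count{raw >> 2, bytes}; // drop the 2 marker bits
}
```

With this layout, the byte `0x14` decodes to a count of 5 stored in one byte, which matches the `count == 5`, `header_size == 2` result shown for the five-element vector.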

**Base Types (`type` field)**

| Value | Type | `count` meaning |
|-------|------|-----------------|
| 0 | null/boolean | 0 for null, 1 for boolean |
| 1 | number | 1 |
| 2 | string | String length in bytes |
| 3 | object | Number of key-value pairs |
| 4 | typed_array | Number of elements |
| 5 | generic_array | Number of elements |
| 6 | extensions | See extension subtypes below |

**Extension Subtypes (`ext_type` field when `type == 6`)**

| ext_type | Name | `count` meaning | `header_size` |
|----------|------|-----------------|---------------|
| `glz::extension::delimiter` (0) | Delimiter | 0 | 1 |
| `glz::extension::variant` (1) | Variant | Variant index | 1 + compressed_int size |
| `glz::extension::complex` (3) | Complex number | 2 (real + imag) | 2 |
| `glz::extension::complex` (3) | Complex array | Element count | 2 + compressed_int size |

For complex types, distinguish single complex vs array by checking if `count == 2` and `header_size == 2` (single) or `header_size > 2` (array).
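
The rule above can be written as a small helper. The sketch below uses a local `sketch_header` struct that mirrors the documented fields so that it compiles without Glaze. All `sketch_` names are hypothetical, and the tag layout (base type in the low three bits, subtype in the upper five) is an assumption consistent with the delimiter being type 6, subtype 0, encoded as `0x06`.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirror of the documented fields so the sketch compiles without Glaze.
struct sketch_header {
   std::uint8_t type{};
   std::uint8_t ext_type{};
   std::size_t count{};
   std::size_t header_size{};
};

// Assumed tag layout: low 3 bits = base type, upper 5 bits = subtype.
inline std::uint8_t sketch_make_tag(unsigned type, unsigned subtype) {
   assert(type < 8 && subtype < 32);
   return static_cast<std::uint8_t>(type | (subtype << 3));
}
inline unsigned sketch_base_type(std::uint8_t tag) { return tag & 0b111u; }
inline unsigned sketch_subtype(std::uint8_t tag) { return tag >> 3; }

enum class complex_kind { not_complex, single, array };

// Applies the documented rule: type 6 with ext_type 3 is complex;
// count == 2 with header_size == 2 is a single value, otherwise an array.
inline complex_kind sketch_classify_complex(const sketch_header& h) {
   if (h.type != 6 || h.ext_type != 3) return complex_kind::not_complex;
   if (h.count == 2 && h.header_size == 2) return complex_kind::single;
   return complex_kind::array;
}
```

In real code the same check would be applied to the `glz::beve_header` returned by `glz::beve_peek_header`; the mirror struct exists only to keep the example self-contained.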

**Pre-allocation Example**

```c++
// Receive BEVE data from network/file
std::string buffer = receive_beve_data();

// Peek to get element count
auto header = glz::beve_peek_header(buffer);
if (!header) {
   handle_error(header.error());
   return;
}

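// Pre-allocate based on peeked count (only meaningful when the payload is an array)
std::vector<double> values;
if (header->type == glz::tag::typed_array) {
   values.reserve(header->count);
}

// Now deserialize - vector won't need to reallocate
auto ec = glz::read_beve(values, buffer);
if (ec) {
   handle_error(ec);
}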
```

**Validation Example**

```c++
auto header = glz::beve_peek_header(buffer);
if (!header) {
   return unexpected(header.error());
}

// Reject oversized arrays
constexpr size_t max_elements = 10000;
if (header->type == glz::tag::typed_array && header->count > max_elements) {
   return unexpected(error_code::invalid_length);
}

// Proceed with deserialization
return glz::read_beve<std::vector<int>>(buffer);
```
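
For memory budgeting, a limit expressed in bytes is sometimes more convenient than an element limit. The helper below is a minimal sketch of such a check. `sketch_fits_budget` is a hypothetical name, not a Glaze function, and it assumes the caller knows the in-memory size of each element. It divides instead of multiplying so that an attacker-supplied `count` cannot overflow the comparison.

```c++
#include <cassert>
#include <cstddef>

// Returns true when `count` elements of `element_size` bytes each
// fit within `budget_bytes`. Uses division to avoid overflow in count * size.
inline bool sketch_fits_budget(std::size_t count, std::size_t element_size,
                               std::size_t budget_bytes) {
   assert(element_size > 0);
   return count <= budget_bytes / element_size;
}
```

A typical call would pass `header->count` and `sizeof(int)` before calling `reserve()` or `glz::read_beve`.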

**Variant Index Example**

For variants, `count` contains the variant index, allowing you to determine which type is stored before deserializing:

```c++
using MyVariant = std::variant<int, std::string, double>;
std::string buffer = receive_data();

auto header = glz::beve_peek_header(buffer);
if (header && header->type == glz::tag::extensions
           && header->ext_type == glz::extension::variant) {
   switch (header->count) {
      case 0: std::cout << "Contains int\n"; break;
      case 1: std::cout << "Contains string\n"; break;
      case 2: std::cout << "Contains double\n"; break;
   }
}

MyVariant value;
glz::read_beve(value, buffer);
```

**Raw Pointer Overload**

For C-style buffers:

```c++
const void* data = /* ... */;
size_t size = /* ... */;

auto result = glz::beve_peek_header(data, size);
```

**Peek Header at Offset**

Use `glz::beve_peek_header_at` to peek at headers at arbitrary byte offsets without slicing or copying the buffer:

```c++
std::string buffer = /* BEVE data */;
size_t offset = /* position to inspect */;

auto header = glz::beve_peek_header_at(buffer, offset);
if (header) {
   std::cout << "Type: " << static_cast<int>(header->type)
             << ", Count: " << header->count
             << ", Header size: " << header->header_size << "\n";
}
```

This is useful for:

- **Buffers with custom prefixes**: Skip past application headers to the BEVE payload
- **Memory-mapped files**: Seek to specific positions without copying data
- **Resuming partial reads**: Continue parsing from where you left off
- **Concatenated/delimited streams**: Inspect each object before deserializing
- **Embedded BEVE**: Parse BEVE data within larger binary structures
- **Validation**: Check element counts at specific offsets against limits

```c++
// Example: Buffer with 8-byte custom header followed by BEVE data
std::string buffer = /* custom_header (8 bytes) + BEVE payload */;

auto header = glz::beve_peek_header_at(buffer, 8);  // Skip custom header
if (header) {
   // header->type, header->count, header->header_size
}
```

```c++
// Example: Multiple values in one buffer
int32_t val1 = 42;
std::string val2 = "hello";

auto buffer1 = glz::write_beve(val1).value();
auto buffer2 = glz::write_beve(val2).value();
std::string combined = buffer1 + buffer2;

// Peek at first value (offset 0)
auto header1 = glz::beve_peek_header_at(combined, 0);
// header1->type == glz::tag::number

// Peek at second value
auto header2 = glz::beve_peek_header_at(combined, buffer1.size());
// header2->type == glz::tag::string
// header2->count == 5 ("hello" has 5 characters)
```

Raw pointer overload with offset:

```c++
auto result = glz::beve_peek_header_at(data, size, offset);
```

**Error Handling**

`beve_peek_header` and `beve_peek_header_at` return `glz::expected<beve_header, error_ctx>`. Possible errors:

- `error_code::unexpected_end` - Buffer too small to contain header
- `error_code::syntax_error` - Invalid tag byte

## Untagged Binary

By default, Glaze handles structs as tagged objects, meaning that keys are written and read. Structs can instead be written and read without tags by using the `structs_as_arrays` option or the functions `glz::write_beve_untagged` and `glz::read_beve_untagged`.

## BEVE to JSON Conversion

`glaze/binary/beve_to_json.hpp` provides `glz::beve_to_json`, which directly converts a buffer of BEVE data to a buffer of JSON data.

### Function Pointers

Objects that expose function pointers (both member and non-member) through `glz::meta` are skipped by the BEVE writer by default. This mirrors JSON/TOML behaviour and avoids emitting unusable callable placeholders in binary payloads.

To keep the key in the output, set the option `write_function_pointers = true`.

## Custom Map Keys

BEVE can serialize map-like containers whose key types expose a value through Glaze metadata. This allows “strong ID” wrappers to keep a user-defined type while the binary payload stores the underlying numeric representation.

```c++
struct ModuleID {
   uint64_t value{};
   auto operator<=>(const ModuleID&) const = default;
};

template <>
struct glz::meta<ModuleID> {
   static constexpr auto value = &ModuleID::value;
};

std::map<ModuleID, std::string> modules{{ModuleID{42}, "life"}, {ModuleID{9001}, "power"}};

std::string beve{};
glz::write_beve(modules, beve);
```

Glaze inspects the metadata, reuses the underlying `uint64_t`, and emits the numeric BEVE map header so the payload decodes as a regular number key. The same behaviour works for `std::unordered_map` and concatenated ranges such as `std::vector<std::pair<ModuleID, T>>`.

If you prefer to keep a custom conversion in your metadata, `glz::cast` works as well:

```c++
template <>
struct glz::meta<ModuleID> {
   static constexpr auto value = glz::cast<&ModuleID::value, uint64_t>;
};
```

## Partial Objects

It is sometimes desirable to write out only a portion of an object. To do so, pass an array of JSON pointers that select the parts of the object to write.

```c++
static constexpr auto partial = glz::json_ptrs("/i",
                                               "/d",
                                               "/sub/x",
                                               "/sub/y");
std::vector<std::byte> out;
glz::write_beve<partial>(s, out);
```

## Delimited BEVE (Multiple Objects in One Buffer)

Similar to [NDJSON](https://github.com/ndjson/ndjson-spec) for JSON, BEVE supports storing multiple objects in a single buffer using a delimiter. The BEVE specification defines a **Data Delimiter** extension (type 6, subtype 0) specifically for this purpose.

This is useful for:

- Streaming multiple messages over a connection
- Appending records to a buffer without re-encoding existing data
- Log files with multiple serialized entries
- Message queues with batched records

### Quick Reference

**Writing Functions**

| Function | Description |
|----------|-------------|
| `write_beve_delimiter(buffer)` | Writes a single delimiter byte (0x06) |
| `write_beve_append(value, buffer)` | Appends a BEVE value to existing buffer. Returns `error_ctx` with `count` field for bytes written. |
| `write_beve_append_with_delimiter(value, buffer)` | Writes delimiter + value. Returns `error_ctx` with bytes written including delimiter. |
| `write_beve_delimited(container, buffer)` | Writes all container elements with delimiters between them. Returns `error_ctx`. |

**Reading Functions**

| Function | Description |
|----------|-------------|
| `read_beve_delimited(container, buffer)` | Reads all delimiter-separated values into a container |
| `read_beve_at(value, buffer, offset)` | Reads a single value at offset. Returns bytes consumed. Skips leading delimiter if present. |

### Writing Delimited BEVE

#### Append a Single Value

Use `write_beve_append` to add a value to an existing buffer without clearing it:

```c++
std::string buffer{};

// Write first object
auto result1 = glz::write_beve_append(my_struct{1, "first"}, buffer);
// result1.count contains bytes written

// Append delimiter and second object
auto result2 = glz::write_beve_append_with_delimiter(my_struct{2, "second"}, buffer);

// Append delimiter and third object
auto result3 = glz::write_beve_append_with_delimiter(my_struct{3, "third"}, buffer);
```

`write_beve_append` returns a `glz::error_ctx` whose `count` field holds the number of bytes written.

#### Write a Delimiter

You can manually write just the delimiter byte:

```c++
std::string buffer{};
glz::write_beve_append(obj1, buffer);
glz::write_beve_delimiter(buffer);  // Writes single 0x06 byte
glz::write_beve_append(obj2, buffer);
```

#### Write a Container with Delimiters

To write all elements of a container with delimiters between them:

```c++
std::vector<my_struct> objects = {
   {1, "first"},
   {2, "second"},
   {3, "third"}
};

std::string buffer{};
auto ec = glz::write_beve_delimited(objects, buffer);

// Or get the buffer directly:
auto result = glz::write_beve_delimited(objects);
if (result) {
   std::string buffer = std::move(*result);
}
```

### Reading Delimited BEVE

#### Read All Values into a Container

Use `read_beve_delimited` to read all delimiter-separated values:

```c++
std::string buffer = /* delimited BEVE data */;

std::vector<my_struct> objects{};
auto ec = glz::read_beve_delimited(objects, buffer);

// Or get the container directly:
auto result = glz::read_beve_delimited<std::vector<my_struct>>(buffer);
if (result) {
   for (const auto& obj : *result) {
      // process each object
   }
}
```

#### Read at a Specific Offset

For manual control over reading, use `read_beve_at` which returns the number of bytes consumed:

```c++
std::string buffer = /* delimited BEVE data */;
size_t offset = 0;

while (offset < buffer.size()) {
   my_struct obj{};
   auto result = glz::read_beve_at(obj, buffer, offset);
   if (!result) {
      break;  // Error or end of data
   }
   offset += *result;  // Advance by bytes consumed

   // Process obj...
}
```

> [!NOTE]
> `read_beve_at` automatically skips a delimiter byte if one is present at the given offset. The returned byte count **includes** the skipped delimiter, so `offset += *result` correctly advances to the next value.
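
The offset bookkeeping described in the note can be modelled without Glaze. The sketch below only illustrates the arithmetic. `sketch_skip_delimiter` and `sketch_bytes_consumed` are hypothetical helpers, and `value_bytes` stands in for whatever length the real parser reports for the value itself.

```c++
#include <cassert>
#include <cstddef>
#include <string_view>

// The BEVE delimiter byte (extensions type 6, subtype 0).
constexpr char sketch_delimiter = 0x06;

// Returns the offset where the value starts: one past `offset` when a
// delimiter sits there, otherwise `offset` itself.
inline std::size_t sketch_skip_delimiter(std::string_view buffer, std::size_t offset) {
   assert(offset <= buffer.size());
   if (offset < buffer.size() && buffer[offset] == sketch_delimiter) {
      return offset + 1;
   }
   return offset;
}

// Byte count to report for one read: skipped delimiter (0 or 1) plus the value.
inline std::size_t sketch_bytes_consumed(std::string_view buffer, std::size_t offset,
                                         std::size_t value_bytes) {
   return (sketch_skip_delimiter(buffer, offset) - offset) + value_bytes;
}
```

Because the reported count already includes the delimiter, a loop that does `offset += consumed` lands on the next delimiter (or the end of the buffer) without any extra adjustment.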

#### Bytes Consumed Tracking

The standard `read_beve` function tracks bytes consumed via `ec.count`:

```c++
my_struct obj{};
auto ec = glz::read_beve(obj, buffer);
if (!ec) {
   size_t bytes_consumed = ec.count;  // Number of bytes read on success
}
```

On error, `count` indicates where the parse error occurred. This field is available for all read operations (`read_beve`, `read_json`, `read_cbor`, `read_msgpack`).

### Example: Streaming Workflow

```c++
struct Message {
   int id{};
   std::string content{};
};

// Producer: append messages to a buffer
std::string buffer{};
for (int i = 0; i < 100; ++i) {
   Message msg{i, "message " + std::to_string(i)};
   if (i == 0) {
      glz::write_beve_append(msg, buffer);
   } else {
      glz::write_beve_append_with_delimiter(msg, buffer);
   }
}

// Consumer: read all messages
std::vector<Message> messages{};
auto ec = glz::read_beve_delimited(messages, buffer);
// messages now contains all 100 Message objects
```

### Delimiter Format

The BEVE delimiter is a single byte: `0x06` (extensions type 6 with subtype 0). When converting delimited BEVE to JSON via `glz::beve_to_json`, each delimiter is converted to a newline character (`\n`), producing NDJSON-compatible output.

## Lazy BEVE Parsing

For scenarios where you need to extract a few fields from large BEVE documents without full deserialization, Glaze provides `glz::lazy_beve`. This offers on-demand parsing with zero upfront processing.

```c++
std::vector<std::byte> buffer;
glz::write_beve(large_struct, buffer);

auto result = glz::lazy_beve(buffer);
if (result) {
   // Access fields lazily - only parses what you access
   auto name = (*result)["user"]["name"].get<std::string_view>();
   auto age = (*result)["user"]["age"].get<int64_t>();

   // Check container size without parsing elements
   size_t count = (*result)["items"].size();
}
```

See [Lazy BEVE](./lazy-beve.md) for full documentation.