# Lazy JSON Parsing

Glaze provides a lazy JSON parser (`glz::lazy_json`) that parses **on demand**, with no upfront pass over the document. This approach is ideal when you need to extract a few fields from a large JSON document.

## When to Use Lazy JSON

| Use Case | Recommended Approach |
|----------|---------------------|
| Extract 1-3 fields from large JSON | `glz::lazy_json` |
| Access fields near the beginning | `glz::lazy_json` or `partial_read` |
| Full deserialization into structs | `glz::read_json` |
| Iterate all elements (single pass) | `glz::lazy_json` |
| Multiple random accesses to array | `glz::lazy_json` with `.index()` |
| Unknown/dynamic JSON structure with persistent memory | `glz::generic` |

## Basic Usage

```cpp
#include "glaze/json.hpp"

#include <cstdint>
#include <iostream>
#include <string>
#include <string_view>

std::string json = R"({"name":"John","age":30,"active":true,"balance":12345.67})";
auto result = glz::lazy_json(json);
if (result) {
    auto& doc = *result;

    // Access fields lazily - only parses what you access
    auto name = doc["name"].get<std::string_view>();
    auto age = doc["age"].get<int64_t>();
    auto active = doc["active"].get<bool>();
    auto balance = doc["balance"].get<double>();

    if (name && age && active && balance) {
        std::cout << *name << " is " << *age << " years old\n";
    }
}
```

## Why Lazy?

`glz::lazy_json` does **zero upfront work**:

- `lazy_json()` just stores a pointer and validates the first byte - O(1)
- Field access scans only the bytes needed to find that field
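
To make the second point concrete, here is a minimal sketch of on-demand field lookup. It is a toy model, not Glaze's implementation: `skip_value` and `find_field` are hypothetical names, and the code assumes well-formed, compact JSON (no whitespace) whose root is an object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Toy model only (hypothetical helpers, not part of Glaze).
// Returns the index one past the end of the JSON value starting at `i`.
inline std::size_t skip_value(std::string_view s, std::size_t i) {
    assert(i < s.size());
    if (s[i] == '"') {                      // string: stop at the unescaped closing quote
        for (++i; s[i] != '"'; ++i)
            if (s[i] == '\\') ++i;          // skip the escaped character
        return i + 1;
    }
    if (s[i] == '{' || s[i] == '[') {       // container: track nesting depth
        int depth = 0;
        for (;; ++i) {
            if (s[i] == '"') { i = skip_value(s, i) - 1; continue; }
            if (s[i] == '{' || s[i] == '[') ++depth;
            if (s[i] == '}' || s[i] == ']') { if (--depth == 0) return i + 1; }
        }
    }
    // number, true, false, null: runs until a delimiter
    while (i < s.size() && s[i] != ',' && s[i] != '}' && s[i] != ']') ++i;
    return i;
}

// Scans the root object left to right and returns the raw bytes of the value
// stored under `key`, or std::nullopt if the key is absent.
inline std::optional<std::string_view> find_field(std::string_view json, std::string_view key) {
    std::size_t i = 1;                      // skip the opening '{'
    while (i < json.size() && json[i] == '"') {
        const std::size_t key_end = skip_value(json, i);
        const std::string_view k = json.substr(i + 1, key_end - i - 2);
        const std::size_t val_begin = key_end + 1;   // skip ':'
        const std::size_t val_end = skip_value(json, val_begin);
        if (k == key) return json.substr(val_begin, val_end - val_begin);
        i = val_end;
        if (i < json.size() && json[i] == ',') ++i;
    }
    return std::nullopt;
}
```

Calling `find_field(R"({"id":7,"payload":[1,2,3]})", "id")` returns the raw bytes `7`, and the loop never reaches `payload`. The cost of a lookup therefore depends on where the field sits, not on the size of the document.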

## UTF-8 Validation

To maximize performance, `lazy_json` does not validate UTF-8 encoding during initial parsing or field scanning. Validation only occurs when you extract string values:

- **`get<std::string>()`**: Processes escape sequences (`\n`, `\uXXXX`, etc.) and validates UTF-8 encoding
- **`get<std::string_view>()`**: Returns a raw view into the JSON buffer with no validation or processing

If you need validated UTF-8 strings and unescaping, use `get<std::string>()`.  Otherwise, `get<std::string_view>()` is faster.
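
The difference between the two forms can be illustrated with a small standalone sketch. `unescape_basic` is a hypothetical helper, not a Glaze function; it decodes only the two-character escapes and leaves out `\uXXXX` handling and UTF-8 validation, both of which a real implementation would need.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy model only: decodes \" \\ \/ \b \f \n \r \t.
// \uXXXX escapes and UTF-8 validation are deliberately left out.
inline std::string unescape_basic(std::string_view raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '\\') { out += raw[i]; continue; }
        assert(i + 1 < raw.size());        // a lone trailing backslash is malformed
        switch (raw[++i]) {
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            case 'r': out += '\r'; break;
            case 'b': out += '\b'; break;
            case 'f': out += '\f'; break;
            default:  out += raw[i]; break;   // \" \\ \/ map to themselves
        }
    }
    return out;
}
```

For the raw bytes `line1\nline2` (12 characters, including a literal backslash and `n`), `unescape_basic` produces an 11-character string with a real newline at index 5. A raw view skips this copy entirely, which is why it is cheaper but leaves escapes untouched.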

> `glz::lazy_json` ensures that any C++ value it produces was parsed from valid JSON (except for `std::string_view`), but it does not validate the entire document, because full validation is often not a requirement for lazy parsing. For high-performance full validation, parse into C++ structs, or use `glz::validate_json` for a validation-only pass.

## Nested Object Access

Access deeply nested fields efficiently:

```cpp
std::string json = R"({
   "user": {
      "profile": {
         "name": "Alice",
         "email": "alice@example.com"
      },
      "settings": {
         "theme": "dark"
      }
   }
})";

auto result = glz::lazy_json(json);
if (result) {
    auto& doc = *result;

    // Chain field access - each level is lazy
    auto email = doc["user"]["profile"]["email"].get<std::string_view>();
    if (email) {
        std::cout << "Email: " << *email << "\n";
    }
}
```

## Array Access

Access array elements by index:

```cpp
std::string json = R"({
   "items": [
      {"id": 1, "value": 100},
      {"id": 2, "value": 200},
      {"id": 3, "value": 300}
   ]
})";

auto result = glz::lazy_json(json);
if (result) {
    auto& doc = *result;

    // Access specific array element
    auto first_value = doc["items"][0]["value"].get<int64_t>();
    auto third_id = doc["items"][2]["id"].get<int64_t>();

    if (first_value && third_id) {
        std::cout << "First value: " << *first_value << "\n";
        std::cout << "Third id: " << *third_id << "\n";
    }
}
```

## Iteration

Iterate over arrays and objects efficiently:

```cpp
std::string json = R"({"items": [{"id": 1}, {"id": 2}, {"id": 3}]})";
auto result = glz::lazy_json(json);

if (result) {
    auto& doc = *result;

    // Iterate array elements
    int64_t sum = 0;
    for (auto item : doc["items"]) {
        auto id = item["id"].get<int64_t>();
        if (id) sum += *id;
    }
    std::cout << "Sum of ids: " << sum << "\n";
}
```

For objects, you can access both keys and values:

```cpp
std::string json = R"({"a": 1, "b": 2, "c": 3})";
auto result = glz::lazy_json(json);

if (result) {
    for (auto item : result->root()) {
        std::cout << item.key() << ": ";
        auto val = item.get<int64_t>();
        if (val) std::cout << *val;
        std::cout << "\n";
    }
}
```

## Indexed Views for O(1) Access

For scenarios requiring multiple random accesses or repeated iteration, you can build an index for O(1) element access:

```cpp
std::string json = R"({"users": [{"id": 0}, {"id": 1}, ..., {"id": 999}]})";
auto result = glz::lazy_json(json);

if (result) {
    // Build index once - O(n) scan
    auto users = (*result)["users"].index();

    // Now enjoy O(1) operations:
    size_t count = users.size();        // O(1) - no scanning
    auto user500 = users[500];          // O(1) - direct access
    auto user999 = users[999];          // O(1) - no matter the position

    // O(1) iteration advancement
    for (auto& user : users) {
        auto id = user["id"].get<int64_t>();  // Nested access still lazy
    }
}
```
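
Conceptually, an index of this kind can be pictured as a list of element start offsets collected in one pass. The sketch below is a toy model under that assumption, not Glaze's implementation; `build_index` is a hypothetical name, and the input is assumed to be well-formed, compact JSON (no whitespace) whose root is an array.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Toy model only (hypothetical helper, not part of Glaze).
// Records the byte offset at which each top-level array element starts.
inline std::vector<std::size_t> build_index(std::string_view arr) {
    assert(!arr.empty() && arr.front() == '[');
    std::vector<std::size_t> starts;
    int depth = 0;
    bool in_string = false;
    bool expect_element = false;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        const char c = arr[i];
        if (in_string) {
            if (c == '\\') ++i;                 // skip the escaped character
            else if (c == '"') in_string = false;
            continue;
        }
        if (expect_element && c != ']') starts.push_back(i);
        expect_element = false;
        if (c == '"') in_string = true;
        else if (c == '[' || c == '{') { if (++depth == 1) expect_element = true; }
        else if (c == ']' || c == '}') --depth;
        else if (c == ',' && depth == 1) expect_element = true;
    }
    return starts;
}
```

For the input `[10,20,30]` the result is `{1, 4, 7}`. After this single O(n) pass, the element count is `starts.size()` and the start of element `i` is `starts[i]`, both without rescanning.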

### When to Use `.index()`

| Scenario | Without Index | With Index | Recommendation |
|----------|---------------|------------|----------------|
| Single random access | O(k) | O(n) build + O(1) | Don't index |
| 5+ random accesses | O(5k) | O(n) build + O(5) | **Use index** |
| Multiple iterations | O(n) each | O(n) build + O(n) each | **Use index** |
| Need size before iterating | O(n) | O(1) after build | **Use index** |
| Single sequential iteration | O(n) | O(n) build + O(n) | Don't index |
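
The trade-off in this table can be worked through with a simple unit-cost model. The sketch assumes every element costs one step to skip and that each non-indexed access scans from the start of the array; both functions are hypothetical illustrations, not measurements of Glaze.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Steps taken when every access scans from the start of an n-element array.
inline std::size_t cost_without_index(std::size_t n, const std::vector<std::size_t>& positions) {
    std::size_t total = 0;
    for (std::size_t p : positions) {
        assert(p < n);
        total += p + 1;            // reaching element p takes p + 1 steps
    }
    return total;
}

// One full pass to build the index, then one step per access.
inline std::size_t cost_with_index(std::size_t n, const std::vector<std::size_t>& positions) {
    return n + positions.size();
}
```

With `n = 1000`, a single access at position 999 costs 1000 steps without an index and 1001 with one, so indexing does not pay off. Five accesses at positions 100, 500, 900, 250, and 750 cost 2505 steps without an index and 1005 with one.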

### Indexed View API

```cpp
auto indexed = doc["items"].index();

// O(1) size query
size_t count = indexed.size();

// O(1) empty check
if (!indexed.empty()) { /* ... */ }

// O(1) random access by position
auto third = indexed[2];

// For indexed objects: O(n) key lookup (linear search)
auto value = indexed["key"];

// Check if object contains key
if (indexed.contains("key")) { /* ... */ }

// Full random-access iterator support
auto it = indexed.begin();
it += 50;                    // Jump forward 50 elements
auto elem = it[10];          // Access 10 elements ahead
auto dist = indexed.end() - it;  // Distance to end
```

### Nested Access Remains Lazy

Elements returned from an indexed view are still `lazy_json_view` objects. Nested field access remains lazy:

```cpp
auto users = doc["users"].index();

// O(1) to get to user 500
auto user = users[500];

// Nested access is still lazy - scans only "email" field
auto email = user["profile"]["email"].get<std::string_view>();
```

### Performance Example

For 10 random accesses to a 1000-element array:

| Approach | Throughput | Notes |
|----------|------------|-------|
| `lazy_json` (no index) | 232 MB/s | Each access scans from start |
| `lazy_json` (indexed) | 993 MB/s | Index built once, O(1) accesses |

The indexed approach is **327% faster** than non-indexed for this use case (roughly 4.3x the throughput).

## Optimizing Performance: Sequential Access

The key to getting maximum performance from `lazy_json` is **accessing keys in document order**. The parser maintains a position pointer and continues scanning from where it left off.

### How Progressive Scanning Works

```cpp
std::string json = R"({"a":1,"b":2,"c":3,"d":4,"e":5})";
auto result = glz::lazy_json(json);

if (result) {
    auto& doc = *result;

    // FAST: Sequential access - O(n) total
    doc["a"].get<int64_t>();  // Scans from start, finds "a"
    doc["b"].get<int64_t>();  // Continues from after "a", finds "b"
    doc["c"].get<int64_t>();  // Continues from after "b", finds "c"
    doc["d"].get<int64_t>();  // Continues from after "c", finds "d"
    doc["e"].get<int64_t>();  // Continues from after "d", finds "e"
    // Total: scanned the object once
}
```

### Performance Comparison

| Access Pattern | Complexity | Example |
|----------------|------------|---------|
| Sequential (in document order) | O(n) total | `a`, `b`, `c`, `d`, `e` |
| Reverse order | O(n) per access | `e`, `d`, `c`, `b`, `a` |
| Random order | O(n) per access | `c`, `a`, `e`, `b`, `d` |

### Why Order Matters

Consider a JSON object with 1000 keys. Accessing 5 keys:

**Sequential access** (keys appear in order):
```
doc["key_001"]  → scan 1 key
doc["key_002"]  → scan 1 more key (continues from key_001)
doc["key_003"]  → scan 1 more key
doc["key_004"]  → scan 1 more key
doc["key_005"]  → scan 1 more key
Total: ~5 keys scanned
```

**Reverse order access**:
```
doc["key_005"]  → scan 5 keys from start
doc["key_004"]  → wrap around, scan 1004 keys
doc["key_003"]  → wrap around, scan 1003 keys
doc["key_002"]  → wrap around, scan 1002 keys
doc["key_001"]  → wrap around, scan 1001 keys
Total: ~4015 keys scanned (roughly 800x more than sequential access!)
```
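
The effect can be reproduced with a toy model of a cursor that remembers where the previous lookup stopped and wraps around once. `CursorObject` and `make_object` are hypothetical names; this is a sketch of the idea, not Glaze's implementation, and it counts each key examined exactly once, so its per-access counts are slightly lower than the illustration above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <vector>

// Toy model only: an "object" is a list of keys plus a cursor.
struct CursorObject {
    std::vector<std::string> keys;
    std::size_t pos = 0;        // index where the next lookup starts
    std::size_t scanned = 0;    // running count of keys examined

    // Looks for `key` starting at the cursor, wrapping to the front once.
    bool find(std::string_view key) {
        for (std::size_t step = 0; step < keys.size(); ++step) {
            const std::size_t i = (pos + step) % keys.size();
            ++scanned;
            if (keys[i] == key) {
                pos = (i + 1) % keys.size();   // continue after the match next time
                return true;
            }
        }
        return false;   // examined every key once without a match
    }
};

// Builds an object with keys "key_001", "key_002", ... up to n.
inline CursorObject make_object(std::size_t n) {
    CursorObject obj;
    for (std::size_t k = 1; k <= n; ++k) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "key_%03zu", k);
        obj.keys.emplace_back(buf);
    }
    return obj;
}
```

On a 1000-key object, looking up `key_001` through `key_005` in order leaves `scanned == 5`. Looking them up in reverse on a fresh object leaves `scanned == 4001`: 5 for the first lookup, then 999 for each of the remaining four because every one of them has to wrap around.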

### Practical Guidelines

1. **Know your JSON structure**: If you know the key order, access them in that order:
   ```cpp
   // JSON: {"id":1,"name":"...","email":"...","created_at":"..."}
   // Access in document order:
   auto id = doc["id"].get<int64_t>();
   auto name = doc["name"].get<std::string_view>();
   auto email = doc["email"].get<std::string_view>();
   auto created = doc["created_at"].get<std::string_view>();
   ```

2. **Use iterators for unknown order**: If you need all keys but don't know the order:
   ```cpp
   for (auto item : doc.root()) {
       auto key = item.key();
       // Process each key-value pair in document order
   }
   ```

3. **Single field access is always fast**: Accessing just one field is O(k) where k is the position of that field - no penalty.

4. **Nested access is independent**: Each nested object has its own position tracking:
   ```cpp
   // Each level scans its own object independently
   doc["user"]["profile"]["email"]  // Fast - 3 separate scans
   ```

### Wrap-Around Behavior

If you access a key that appears earlier in the document, the parser wraps around:

```cpp
doc["c"].get<int64_t>();  // Position now after "c"
doc["a"].get<int64_t>();  // Wraps: scans from "c" to end, then start to "a"
```

This still works correctly but is slower than sequential access.

### Reset Parse Position

If you need to re-scan from the beginning:

```cpp
doc.reset_parse_pos();  // Next access starts from beginning
```

## Type Checking

Check the type of a value before extracting:

```cpp
auto& doc = *result;
auto value = doc["field"];

if (value.is_object()) { /* ... */ }
if (value.is_array()) { /* ... */ }
if (value.is_string()) { /* ... */ }
if (value.is_number()) { /* ... */ }
if (value.is_boolean()) { /* ... */ }
if (value.is_null()) { /* ... */ }

// Explicit bool conversion - true if not null/error
if (value) {
    // Value exists and is not null
}
```
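
Checks like these can be cheap because the JSON grammar lets the type of a value be determined from its first non-whitespace byte. The sketch below shows that rule in isolation; `JsonKind` and `classify` are hypothetical names, and this is not a description of how Glaze implements its `is_*()` methods.

```cpp
#include <string_view>

enum class JsonKind { Object, Array, String, Number, Boolean, Null, Invalid };

// Classifies a JSON value by its first byte. Assumes leading whitespace has
// already been skipped. Toy model only: the rest of the value is not validated.
inline JsonKind classify(std::string_view value) {
    if (value.empty()) return JsonKind::Invalid;
    const char c = value.front();
    switch (c) {
        case '{': return JsonKind::Object;
        case '[': return JsonKind::Array;
        case '"': return JsonKind::String;
        case 't':
        case 'f': return JsonKind::Boolean;
        case 'n': return JsonKind::Null;
        case '-': return JsonKind::Number;
        default:  return (c >= '0' && c <= '9') ? JsonKind::Number : JsonKind::Invalid;
    }
}
```

For example, `classify("[1,2]")` yields `JsonKind::Array` and `classify("-3.5")` yields `JsonKind::Number`, each after inspecting a single byte.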

## Supported Types for `get<T>()`

| Type | Description |
|------|-------------|
| `bool` | Boolean values |
| `int32_t`, `int64_t` | Signed integers |
| `uint32_t`, `uint64_t` | Unsigned integers |
| `float`, `double` | Floating-point numbers |
| `std::string` | String with escape processing |
| `std::string_view` | Raw string view (no escape processing) |
| `std::nullptr_t` | Null values |

## Error Handling

All operations return values that can be checked for errors:

```cpp
auto result = glz::lazy_json(json);
if (!result) {
    // Parse error
    auto error = result.error();
    std::cout << "Error: " << glz::format_error(error, json) << "\n";
    return;
}

auto& doc = *result;
auto value = doc["missing_key"];

if (value.has_error()) {
    // Key not found or type error
    auto ec = value.error();
    // Handle error...
}

auto num = doc["field"].get<int64_t>();
if (!num) {
    // Extraction failed (wrong type, parse error, etc.)
    auto error = num.error();
    // Handle error...
}
```
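
To show why an extraction such as `get<int64_t>()` can fail even when the key exists, here is a standalone sketch of checked integer extraction. `parse_int64` is a hypothetical helper built on `std::from_chars`, and it uses `std::optional` in place of the error-carrying result type shown above; it is not Glaze's implementation.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Toy model only: converts the raw bytes of a JSON value to int64_t, or
// returns std::nullopt when the bytes are not a complete, in-range integer.
inline std::optional<std::int64_t> parse_int64(std::string_view raw) {
    std::int64_t out = 0;
    const char* first = raw.data();
    const char* last = raw.data() + raw.size();
    const auto [ptr, ec] = std::from_chars(first, last, out);
    // Fails on a wrong type ("\"30\""), overflow, or trailing bytes ("12.5").
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return out;
}
```

`parse_int64("30")` holds 30, while `parse_int64("\"30\"")` and `parse_int64("12.5")` are both empty. In both failure cases the caller has to check the result before dereferencing it, which is the same discipline the snippet above applies to `num`.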

## Container Methods

```cpp
auto& doc = *result;
auto arr = doc["items"];

// Check if container is empty
if (arr.empty()) { /* ... */ }

// Get number of elements (requires scanning)
size_t count = arr.size();

// Check if object contains a key
if (doc.root().contains("name")) { /* ... */ }
```

## Deserializing into Structs

Use `glz::read_json()` to deserialize a lazy view directly into a typed struct:

```cpp
struct User {
   std::string name;
   int age;
   bool active;
};

std::string json = R"({
   "user": {"name": "Alice", "age": 30, "active": true},
   "metadata": {"version": 1, "large_data": "..."}
})";

auto result = glz::lazy_json(json);
if (result) {
    // Navigate lazily to "user", then deserialize into struct
    User user{};
    auto ec = glz::read_json(user, (*result)["user"]);

    // user.name == "Alice", user.age == 30, user.active == true
}
```

This works because Glaze provides a `read_json` overload that accepts `lazy_json_view` directly. The lazy navigation skips "metadata" entirely, and deserialization is single-pass (no double scanning).

### Why Use This Pattern?

This hybrid approach gives you the best of both worlds:

1. **Lazy navigation**: Skip large sections of JSON you don't need
2. **Fast deserialization**: Use Glaze's optimized struct parsing for the parts you do need
3. **Type safety**: Get compile-time checked structs instead of runtime field access

### Deserializing Array Elements

Combine with indexed views for efficient random access deserialization:

```cpp
struct Person {
   std::string name;
   Address address;
};

std::string json = R"({"people": [{"name": "Alice", ...}, {"name": "Bob", ...}, ...]})";

auto result = glz::lazy_json(json);
if (result) {
    // Build index for O(1) random access
    auto people = (*result)["people"].index();

    // Deserialize only the 500th person
    Person person{};
    glz::read_json(person, people[500]);
}
```

### Alternative: `read_into()` Member Function

If you prefer member function syntax, use `read_into()`:

```cpp
User user{};
(*result)["user"].read_into(user);  // Equivalent to glz::read_json(user, view)
```

### Performance Note

Both `glz::read_json(value, view)` and `view.read_into(value)` are **~49% faster** than the older pattern of `glz::read_json(value, view.raw_json())`. The `raw_json()` approach requires scanning the value twice: once to find its extent, and once to parse it.

### The `raw_json()` Method

Returns a `std::string_view` of the raw JSON bytes for any lazy view. Use this when you need the JSON text itself (for logging, forwarding, or storage):

```cpp
auto result = glz::lazy_json(R"({"user": {"name": "Alice"}, "count": 5})");

// Get raw JSON for different value types
(*result).raw_json();                  // {"user": {"name": "Alice"}, "count": 5}
(*result)["user"].raw_json();          // {"name": "Alice"}
(*result)["user"]["name"].raw_json();  // "Alice"
(*result)["count"].raw_json();         // 5
```

> **Note**: For deserialization, use `glz::read_json(value, view)` instead of `glz::read_json(value, view.raw_json())` for better performance.

## Writing Lazy Views

Lazy views can be written back to JSON:

```cpp
auto& doc = *result;
auto user = doc["user"];

std::string output;
auto ec = glz::write_json(user, output);
// output contains the JSON for just the "user" field
```

## Options

Use compile-time options for non-null-terminated buffers:

```cpp
// For null-terminated strings (default, fastest)
auto result = glz::lazy_json(json);

// For non-null-terminated buffers
// (named differently so that both calls can live in the same scope)
constexpr auto opts = glz::opts{.null_terminated = false};
auto buffer_result = glz::lazy_json<opts>(buffer);
```

## Memory Layout

The lazy parser is designed for minimal memory overhead. A `lazy_json_view` is 48 bytes on 64-bit systems and 24 bytes on 32-bit systems.

## Best Practices

1. **Access keys in document order**: This is the most important optimization. Sequential access gives O(n) total complexity:
   ```cpp
   // If JSON is: {"a":1,"b":2,"c":3}
   doc["a"];  // Good: starts scanning
   doc["b"];  // Good: continues from "a"
   doc["c"];  // Good: continues from "b"
   // Total: one scan of the object
   ```

2. **Store the document reference**: To benefit from progressive scanning, use the same document object:
   ```cpp
   auto& doc = *result;  // Store reference
   doc["a"];  // Position tracked in doc
   doc["b"];  // Continues from where "a" left off
   ```

3. **Use iterators when order is unknown**: If you don't know the key order or need all keys:
   ```cpp
   for (auto item : doc.root()) {
       // Always efficient - iterates in document order
   }
   ```

4. **Use `.index()` for multiple random accesses**: If you need to access many elements by index or iterate multiple times:
   ```cpp
   auto items = doc["items"].index();  // Build index once
   auto first = items[0];              // O(1) access
   auto last = items[items.size()-1];  // O(1) access
   ```

5. **Keep JSON buffer alive**: The lazy parser stores pointers into the original buffer - it must remain valid for the lifetime of the document.

6. **Prefer `std::string_view` for strings**: When you don't need escape processing, `get<std::string_view>()` is faster than `get<std::string>()`.

7. **Access few fields for best speedup**: Lazy JSON shines when you access 1-5 fields from a large document. For full deserialization, use `glz::read_json`.

8. **Use `glz::read_json(value, view)` for struct deserialization**: Glaze provides an overload of `read_json` that accepts `lazy_json_view` directly. Use `glz::read_json(obj, view)` instead of `glz::read_json(obj, view.raw_json())` - it's ~49% faster because it avoids scanning the value twice.

## Partial Read vs Lazy JSON

Glaze offers two approaches for reading a subset of JSON data. Choose based on whether you know the fields at compile time:

### Use `partial_read` When:

- **Fields are known at compile time**: You can define a struct with just the fields you need
- **Type safety matters**: You want compile-time type checking
- **Fields appear early in the document**: Partial read short-circuits after finding all struct fields
- **Hash-based lookup**: Uses Glaze's optimized key matching

```cpp
// Define a struct with only the fields you need
struct Header {
   std::string id{};
   std::string type{};
};

std::string json = R"({"id":"abc123","type":"request","payload":{...large data...}})";
Header h{};
auto ec = glz::read<glz::opts{.partial_read = true}>(h, json);
// Parsing stops after "id" and "type" are found - "payload" is never parsed
```

### Use `lazy_json` When:

- **Fields determined at runtime**: You don't know which fields to access until execution
- **Conditional access**: You need to check one field before deciding to read others
- **Path-based access**: You want to access nested fields by path (e.g., `doc["user"]["email"]`)
- **Iteration**: You need to iterate over array/object elements

```cpp
auto result = glz::lazy_json(json);
if (result) {
    auto& doc = *result;

    // Decide at runtime which fields to access
    auto type = doc["type"].get<std::string_view>();
    if (type && *type == "user_event") {
        auto user_id = doc["user"]["id"].get<int64_t>();  // Only accessed conditionally
    }
}
```

### Performance Comparison

| Scenario | `partial_read` | `lazy_json` | Winner |
|----------|---------------|-------------|--------|
| Known fields, near start | Very fast | Fast | `partial_read` |
| Known fields, scattered | Moderate | Fast (sequential) | Depends on order |
| Conditional field access | N/A | Fast | `lazy_json` |
| Dynamic field names | N/A | Supported | `lazy_json` |
| Type-safe structs | Yes | No | `partial_read` |

See [Partial Read](./partial-read.md) for detailed documentation.

## Comparison with All Approaches

| Feature | `glz::read_json` | `partial_read` | `glz::lazy_json` | `lazy_json` + `.index()` | `glz::generic` |
|---------|------------------|----------------|------------------|--------------------------|----------------|
| Parse time | O(n) | O(n) worst | O(1) | O(1) + O(n) on index | O(n) |
| Field access | O(1) | Hash-based | O(k)* | O(1) after index | O(1) |
| Random array access | O(1) | N/A | O(k)* | O(1) after index | O(1) |
| Memory usage | Struct size | Struct size | ~48 bytes | ~48 + 8n bytes | Dynamic |
| Type safety | Compile-time | Compile-time | Runtime | Runtime | Runtime |
| Short-circuit | No | Yes | Yes | Yes | No |
| Best for | Full deser. | Known subset | Few accesses | Many accesses | Unknown structure |

*k = bytes to skip to reach field

## See Also

- [Partial Read](./partial-read.md) - Compile-time partial reading with structs
- [Generic JSON](./generic-json.md) - Dynamic JSON with `glz::generic`
- [Reading](./reading.md) - Standard JSON reading with `glz::read_json`
- [JSON Pointer Syntax](./json-pointer-syntax.md) - Alternative path-based access