# Avro 

For Avro support, you must also include the header `<rfl/avro.hpp>` and link against the [avro-c](https://avro.apache.org/docs/1.11.1/api/c/) library.
Furthermore, when compiling reflect-cpp, you need to pass `-DREFLECTCPP_AVRO=ON` to CMake.

Avro is a schemaful binary format. This sets it apart from most other formats supported by reflect-cpp, which are schemaless.
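
To make the distinction concrete, here is a minimal sketch of how the binary encoding described in the Avro specification lays out two string fields: each string is written as a zig-zag varint length followed by its raw bytes, and no field names appear in the output, which is why a reader needs the schema. This only illustrates the format; it is not how reflect-cpp or avro-c implement it, and `zigzag`, `append_long`, `append_string` and `encode_names` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Zig-zag maps signed to unsigned so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline std::uint64_t zigzag(std::int64_t n) {
    const std::uint64_t sign = n < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(n) << 1) ^ sign;
}

// Appends n as an Avro long: zig-zag first, then a base-128 varint
// with the least significant 7-bit group written first.
inline void append_long(std::vector<std::uint8_t>& out, std::int64_t n) {
    std::uint64_t z = zigzag(n);
    while (z >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((z & 0x7F) | 0x80));
        z >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(z));
}

// Appends a string: its byte length as a long, then the bytes themselves.
inline void append_string(std::vector<std::uint8_t>& out,
                          const std::string& s) {
    append_long(out, static_cast<std::int64_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Hypothetical helper: two string fields of a record, back to back.
// Note that neither "first_name" nor "last_name" is written anywhere.
inline std::vector<std::uint8_t> encode_names(const std::string& first,
                                              const std::string& last) {
    std::vector<std::uint8_t> out;
    append_string(out, first);
    append_string(out, last);
    return out;
}
```

With this sketch, `encode_names("Al", "Bo")` yields the six bytes `04 41 6c 04 42 6f`: a length prefix and the characters for each field, and nothing else.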

## Reading and writing

Suppose you have a struct like this:

```cpp
struct Person {
    std::string first_name;
    std::string last_name;
    rfl::Timestamp<"%Y-%m-%d"> birthday;
    std::vector<Person> children;
};
```

A `Person` struct can be serialized to a byte vector like this:

```cpp
const auto person = Person{...};
const std::vector<char> bytes = rfl::avro::write(person);
```

You can parse bytes like this:

```cpp
const rfl::Result<Person> result = rfl::avro::read<Person>(bytes);
```

## The schema

Avro is a schemaful format, so a schema has to be declared before you
serialize or deserialize. In the two function calls above, this happens
implicitly.

But if you are repeatedly serializing or deserializing the same struct,
it is more efficient to generate the schema explicitly:

```cpp
const auto schema = rfl::avro::to_schema<Person>();

const auto person = Person{...};
const std::vector<char> bytes = rfl::avro::write(person, schema);

const rfl::Result<Person> result = rfl::avro::read<Person>(bytes, schema);
```

Avro schemas are created using a JSON-based schema language. You can
retrieve the JSON representation like this:

```cpp
// Both calls are equivalent.
schema.json_str();
schema.str();
```

In this case, the resulting JSON schema representation looks like this:

```json
{"type":"record","name":"Person","fields":[{"name":"first_name","type":{"type":"string"}},{"name":"last_name","type":{"type":"string"}},{"name":"birthday","type":{"type":"string"}},{"name":"children","type":{"type":"array","items":{"type":"Person"},"default":[]}}]}
```

## Loading and saving

You can also load from and save to disk using a very similar syntax:

```cpp
const rfl::Result<Person> result = rfl::avro::load<Person>("/path/to/file.avro");

const auto person = Person{...};
rfl::avro::save("/path/to/file.avro", person);
```

## Reading from and writing into streams

You can also read from any `std::istream` and write into any `std::ostream`:

```cpp
const rfl::Result<Person> result = rfl::avro::read<Person>(my_istream);

const auto person = Person{...};
rfl::avro::write(person, my_ostream);
```
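
Because Avro is a binary format, a stream has to carry its bytes through unmodified: NUL bytes, newlines and spaces are all payload. As a minimal standard-library sketch of what that means, the hypothetical helper `read_all` below copies a stream into a `std::vector<char>` using unformatted input, so nothing is skipped or translated. It is an illustration, not part of reflect-cpp.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies every remaining byte of the stream.
// std::istreambuf_iterator reads unformatted, so whitespace and NUL
// bytes are preserved (unlike operator>>, which skips whitespace).
inline std::vector<char> read_all(std::istream& in) {
    return std::vector<char>(std::istreambuf_iterator<char>{in},
                             std::istreambuf_iterator<char>{});
}
```

The resulting vector has the same type as the `bytes` passed to `rfl::avro::read<Person>(bytes)` earlier. When the stream is a file, opening it with `std::ios::binary` avoids newline translation on platforms that perform it.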

Note that `std::cout` is also a `std::ostream`, so writing to it works as well:

```cpp
rfl::avro::write(person, std::cout) << std::endl;
```

(Since Avro is a binary format, the output will not be human-readable, but it might still be useful for debugging.)
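
If raw binary output on the terminal is too hard to read, one option is to print the bytes as hexadecimal instead. The following is a minimal sketch using only the standard library; `to_hex` is a hypothetical helper, not part of reflect-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: renders bytes as space-separated lowercase hex,
// e.g. {0x04, 'A'} becomes "04 41".
inline std::string to_hex(const std::vector<char>& bytes) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 3);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Convert to unsigned first so that bytes >= 0x80 index correctly.
        const auto b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out.push_back(' ');
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

Since the vector-returning `rfl::avro::write(person)` shown earlier produces a `std::vector<char>`, its result could be passed to such a helper directly, for example `std::cout << to_hex(bytes) << std::endl;`.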

## Custom constructors

One of the great things about C++ is that it gives you control over
when and how your code is compiled.

For large and complex systems of structs, it is often a good idea to split up
your code into smaller compilation units. You can do so using custom constructors.

For the Avro format, a custom constructor must be a static function on your struct or class called
`from_avro` that takes a `rfl::avro::Reader::InputVarType` as input and returns
the class or the class wrapped in `rfl::Result`.

In your header file you can write something like this:

```cpp
struct Person {
    rfl::Rename<"firstName", std::string> first_name;
    rfl::Rename<"lastName", std::string> last_name;
    rfl::Timestamp<"%Y-%m-%d"> birthday;

    using InputVarType = typename rfl::avro::Reader::InputVarType;
    static rfl::Result<Person> from_avro(const InputVarType& _obj);
};
```

And in your source file, you implement `from_avro` as follows:

```cpp
rfl::Result<Person> Person::from_avro(const InputVarType& _obj) {
    const auto from_nt = [](auto&& _nt) {
        return rfl::from_named_tuple<Person>(std::move(_nt));
    };
    return rfl::avro::read<rfl::named_tuple_t<Person>>(_obj)
        .transform(from_nt);
}
```

This forces the compiler to compile the Avro parsing code only when that source file is compiled.