# csv
For CSV support, include the header `<rfl/csv.hpp>` and link against the [Apache Arrow](https://arrow.apache.org/) library.
Furthermore, when compiling reflect-cpp, you need to pass `-DREFLECTCPP_CSV=ON` to cmake.
CSV is a tabular text format. Like other tabular formats in reflect-cpp, CSV is designed for collections of flat records and has limitations for nested or variant types.
## Reading and writing
Suppose you have a struct like this:
```cpp
struct Person {
    std::string first_name;
    std::string last_name = "Simpson";
    rfl::Timestamp<"%Y-%m-%d"> birthday;
    unsigned int age;
    rfl::Email email;
};
```
Important: CSV is a tabular format that requires collections of records. You cannot serialize an individual struct; you must use a container such as `std::vector<Person>` or `std::deque<Person>`.
Write a collection to a string (CSV bytes) like this:
```cpp
const auto people = std::vector<Person>{
    Person{.first_name = "Bart", .birthday = "1987-04-19", .age = 10, .email = "bart@simpson.com"},
    Person{.first_name = "Lisa", .birthday = "1987-04-19", .age = 8, .email = "lisa@simpson.com"}
};
const std::string csv_text = rfl::csv::write(people);
```
Parse from a string or bytes view:
```cpp
const rfl::Result<std::vector<Person>> result = rfl::csv::read<std::vector<Person>>(csv_text);
```
## Settings
CSV behavior can be configured using `rfl::csv::Settings`:
```cpp
const auto settings = rfl::csv::Settings{}
    .with_delimiter(';')
    .with_quoting(true)
    .with_quote_char('"')
    .with_null_string("n/a")
    .with_double_quote(true)
    .with_escaping(false)
    .with_escape_char('\\')
    .with_newlines_in_values(false)
    .with_ignore_empty_lines(true)
    .with_batch_size(1024);
const std::string csv_text = rfl::csv::write(people, settings);
```
Key options:
- `batch_size` - Maximum number of rows processed per batch (performance tuning)
- `delimiter` - Field delimiter character
- `quoting` - Whether to use quoting when writing
- `quote_char` - Quote character used when reading
- `null_string` - String representation for null values
- `double_quote` - Whether a quote inside a value is double-quoted (reading)
- `escaping` - Whether escaping is used (reading)
- `escape_char` - Escape character (reading)
- `newlines_in_values` - Whether CR/LF are allowed inside values (reading)
- `ignore_empty_lines` - Whether empty lines are ignored (reading)
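
To make the `double_quote` option more concrete: in the common CSV convention, a quote character that appears inside a quoted value is represented by two consecutive quote characters. The following is a minimal sketch of that convention using only the standard library. `quote_field` is a hypothetical helper written for illustration; it is not part of reflect-cpp or Apache Arrow, and the library's actual behavior may differ in details.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a reflect-cpp API): wraps a single field in
// quote characters and doubles every embedded quote character.
std::string quote_field(std::string_view value, char quote_char = '"') {
    std::string out;
    out.reserve(value.size() + 2);
    out.push_back(quote_char);
    for (const char c : value) {
        if (c == quote_char) {
            out.push_back(quote_char);  // an embedded quote is written twice
        }
        out.push_back(c);
    }
    out.push_back(quote_char);
    return out;
}
```

With this convention, a value containing the delimiter, such as `a;b`, is simply wrapped in quotes, while each quote inside a value is doubled so that a reader can tell it apart from the closing quote.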
## Loading and saving
You can load from and save to disk:
```cpp
const rfl::Result<std::vector<Person>> result = rfl::csv::load<std::vector<Person>>("/path/to/file.csv");
const auto people = std::vector<Person>{...};
rfl::csv::save("/path/to/file.csv", people);
```
With custom settings:
```cpp
const auto settings = rfl::csv::Settings{}.with_delimiter(';');
rfl::csv::save("/path/to/file.csv", people, settings);
```
## Reading from and writing into streams
You can read from any `std::istream` and write to any `std::ostream`:
```cpp
const rfl::Result<std::vector<Person>> result = rfl::csv::read<std::vector<Person>>(my_istream);
const auto people = std::vector<Person>{...};
rfl::csv::write(people, my_ostream);
```
With custom settings:
```cpp
const auto settings = rfl::csv::Settings{}.with_delimiter(';');
rfl::csv::write(people, my_ostream, settings);
```
## Field name transformations
Like other formats, CSV supports field name transformations via processors, e.g. `SnakeCaseToCamelCase`:
```cpp
const auto people = std::vector<Person>{...};
const auto result = rfl::csv::read<std::vector<Person>, rfl::SnakeCaseToCamelCase>(csv_text);
```
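
To illustrate the kind of renaming a processor such as `SnakeCaseToCamelCase` performs on column names, here is a standalone sketch of the conversion. `snake_to_camel` is a hypothetical function written for this example, not the library's implementation; how reflect-cpp treats edge cases such as leading underscores or digits is not covered here.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical illustration (not a reflect-cpp API): drops each underscore
// and upper-cases the character that follows it.
std::string snake_to_camel(std::string_view name) {
    std::string out;
    out.reserve(name.size());
    bool upper_next = false;
    for (const char c : name) {
        if (c == '_') {
            upper_next = true;
            continue;
        }
        if (upper_next) {
            out.push_back(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
            upper_next = false;
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Under this mapping, a struct field named `first_name` corresponds to a CSV column named `firstName`, while a name without underscores, such as `birthday`, is unchanged.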
## Enums and validation
CSV supports enums and validated types. Enums are written/read as strings:
```cpp
enum class FirstName { Bart, Lisa, Maggie, Homer };
struct Person {
    rfl::Rename<"firstName", FirstName> first_name;
    rfl::Rename<"lastName", std::string> last_name;
    rfl::Timestamp<"%Y-%m-%d"> birthday;
    rfl::Validator<unsigned int, rfl::Minimum<0>, rfl::Maximum<130>> age;
    rfl::Email email;
};
```
## Limitations of tabular formats
CSV, like other tabular formats, has limitations compared to hierarchical formats such as JSON or XML:
### Collections requirement
You must serialize collections, not individual objects:
```cpp
std::vector<Person> people = {...}; // ✅ Correct
Person person = {...}; // ❌ Wrong - must be in a container
```
### No nested objects
Each field must be a primitive type, enum, or a simple validated type. Nested objects are not automatically flattened:
```cpp
// This would NOT work as expected - nested objects are not automatically flattened
struct Address {
    std::string street;
    std::string city;
};
struct Person {
    std::string first_name;
    std::string last_name;
    Address address; // ❌ Will cause compilation errors for CSV
};
```
### Using rfl::Flatten for nested objects
If you need to include nested objects, use `rfl::Flatten` to explicitly flatten them:
```cpp
struct Address {
    std::string street;
    std::string city;
};
struct Person {
    std::string first_name;
    std::string last_name;
    rfl::Flatten<Address> address; // ✅ This will flatten the Address fields
};
// The resulting CSV will have columns: first_name, last_name, street, city
```
### No variant types
Variant types like `std::variant`, `rfl::Variant`, or `rfl::TaggedUnion` cannot be serialized to CSV as separate columns:
```cpp
// ❌ This will NOT work
struct Person {
    std::string first_name;
    std::variant<std::string, int> status; // Variant - not supported
    rfl::Variant<std::string, int> type; // rfl::Variant - not supported
    rfl::TaggedUnion<"type", std::string, int> category; // TaggedUnion - not supported
};
```
### No arrays (except bytestrings)
The CSV format in reflect-cpp does not support arrays (lists) of values in a single column. The only array-like field supported is binary data represented as a bytestring:
```cpp
// ❌ This will NOT work
struct Person {
    std::string first_name;
    std::vector<std::string> hobbies; // Array of strings - not supported
    std::vector<int> scores; // Array of integers - not supported
    std::vector<Address> addresses; // Array of objects - not supported
};
// ✅ This works
struct Blob {
    std::vector<char> binary_data; // Binary data supported as bytestring
};
```
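
If a list of strings has to fit into one CSV column anyway, one possible workaround is to pack it into a single `std::string` field yourself before serializing, and unpack it after reading. The sketch below is not a reflect-cpp feature: `join_values` and `split_values` are hypothetical helpers, and the `|` separator is an assumption. It only works if no value contains the separator, and it cannot distinguish an empty list from a list holding one empty string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a reflect-cpp API): packs a list of strings into
// one string so it can be stored in a plain std::string field.
std::string join_values(const std::vector<std::string>& values, char sep = '|') {
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) {
            out.push_back(sep);
        }
        out += values[i];
    }
    return out;
}

// Hypothetical helper: reverses join_values. An empty input yields an
// empty list.
std::vector<std::string> split_values(const std::string& joined, char sep = '|') {
    std::vector<std::string> out;
    if (joined.empty()) {
        return out;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = joined.find(sep, start);
        if (pos == std::string::npos) {
            out.push_back(joined.substr(start));
            break;
        }
        out.push_back(joined.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}
```

A struct would then declare a field such as `std::string hobbies;` and fill it with `join_values(...)`. Choose a separator that differs from the CSV delimiter and cannot occur in the data.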
### Use cases
CSV is ideal for:
- Data exchange and interoperability
- Simple, flat data structures with consistent types
- Human-readable datasets
CSV is less suitable for:
- Complex nested data structures
- Data with arrays or variant types
- Strict schemas with evolving types
- Very large datasets where binary columnar formats are preferred