1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
|
# Using libzsv
## Compilable / runnable examples
This directory contains the following examples:
| file | description |
| -------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [pull.c](pull.c) | Same as simple.c, but uses pull parsing via `zsv_pull_next_row()` |
| [simple.c](simple.c) | parse a CSV file and for each row, output the row number, the total number of cells and the number of blank cells |
| [print_my_column.c](print_my_column.c) | parse a CSV file, look for a specified column of data, and for each row of data, output only that column |
| [parse_by_chunk.c](parse_by_chunk.c) | read a CSV file in chunks, parse each chunk, and output number of rows. This example uses `zsv_parse_bytes()` (whereas the other two examples use `zsv_parse_more()`) |
## Building
To build the examples in this directory, cd to here (`cd examples/lib`) and then
run `make build`
For further make options, run `make`
## API overview
### Example
A typical usage consists of:
- defining row and/or cell handler functions
- connecting an input stream, and
- running the parser
For example, to simply count lines:
```c
void count_row(void *ctx) {
unsigned *count = ctx;
(*count)++;
}
zsv_parser p = zsv_new(NULL);
if (!p)
fprintf(stderr, "Out of memory!\n");
else {
size_t count = 0;
zsv_set_input(stdin);
zsv_set_row_handler(p, count_row);
zsv_set_context(p, &count);
zsv_finish(p);
zsv_delete(p);
printf("Parsed %zu rows\n", count);
}
```
### Getting data into the parser
Data can be passed into libzsv through any of the following:
1. passing a FILE pointer
2. passing a custom read function (such as `fread` and context pointer (such as
`FILE *`))
3. passing a byte array
In the first two approaches, which are marginally faster due to less memory
copying, data is read in chunks by the library. In the third approach, each
chunk is passed manually by your code to the library.
### Getting parsed data from the parser to your code
You can either pull data from the parser, or push it to your code.
#### Pull parsing
With pull parsing, your code "pulls" each row using `zsv_next_row()`, which
returns `zsv_status_row` until no more rows are left to parse
```c
zsv_parser parser = zsv_new(...);
while (zsv_next_row(parser) == zsv_status_row) { // for each row
// do something
size_t cell_count = zsv_cell_count(parser);
for(size_t i = 0; i < cell_count; i++) {
struct zsv_cell c = zsv_get_cell(parser, i);
fprintf(stderr, "Cell: %.*s\n", c.len, c.str);
// ...
}
}
```
An full example is at [pull.c](pull.c).
#### Cell iteration
For either pull or push parsing, the current row and each of its cells can be
fetched using `zsv_cell_count()` and `zsv_get_cell()`:
```c
size_t cell_count = zsv_cell_count(parser);
for (size_t i = 0; i < cell_count; i++) { // for each cell in this row
struct zsv_cell c = zsv_get_cell(parser, i);
// now c.len contains the cell length in bytes
// if c.len > 0, c.str contains the cell contents
printf("Got cell: %.*s\n", c.len, c.len ? c.str : "");
}
```
#### Push parsing
Data is "pushed" to handler callback, for each cell and/or row that is parsed.
Each series of "pushes" is invoked via `zsv_parse_more()`, which reads the next
chunk of bytes from your source and parses all of its contents.
Push parsing in conjunction with a file or custom reader performs the fastest of
all of the approaches, but the difference is too small to be measurable for any
but the most trivial of tasks.
##### Row handling
With push parsing, you specify a cell and/or row handler. Usually, only a row
handler will suffice. If your row handler code needs access to row contents
(which is usually the case), then you just need to ensure its context argument
points to a structure that contains your zsv parser, so that the row contents
can be retrieved.
For example, we could process each cell as follows:
```c
static void my_row_handler(void *ctx) {
struct my_data *data = ctx;
// print every other cell
size_t cell_count = zsv_cell_count(data->parser);
for (size_t i = 0, j = zsv_cell_count(data->parser); i < j; i++) {
// use zsv_get_cell() and do something
}
}
...
struct my_data d = {0};
d.parser = zsv_new(NULL);
zsv_set_row_handler(d.parser, my_row_handler);
zsv_set_context(p, &d);
```
### Other options
Many other options are supported through the use of `struct zsv_opts`, only some
of which are described in this introduction. For further detail, please visit
[zsv/common.h](../../include/zsv/common.h).
#### Delimiter
The `delimiter` option can specify any single-character delimiter other than
newline, form feed or quote. The default value is a comma.
#### Blank leading rows
By default, libzsv skips any leading blank rows, before it makes any callbacks.
This behavior can be disabled by setting the `keep_empty_header_rows` option
flag.
#### Maximum number of rows
By default, libzsv assumes a maximum number of columns per table of 1024. This
can be changed by setting the `max_columns` option.
#### Header row span
If your header spans more than one row, you can instruct libzsv to merge header
cells with a single space before it makes its initial row handler call by
setting `header_span` to a number greater than 1. For example, if your input is
as follows:
```csv
Street,Phone,City,Zip
Address,Number,,Code
888 Madison,212-555-1212,New York,10001
```
and you set `header_span` to 2, then in your first row handler call, the cell
values will be `Street Address`, `Phone Number`, `City` and `Zip Code`, after
which the row handler will be called for each single row that is parsed.
|