load_cache is a minilib for caching parsed data of files in-memory.
The abstraction is:
+-context-----------------------------------------------------------------+
|                                                                         |
| +---file #1-----------+ +---file #2-----------+ +---file #3-----------+ |
| |                     | |                     | |                     | |
| | metadata            | | metadata            | | metadata            | |
| |                     | |                     | |                     | |
| | +---low_payload---+ | | +---low_payload---+ | | +---low_payload---+ | |
| | |                 | | | |                 | | | |                 | | |
| | +-----------------+ | | +-----------------+ | | +-----------------+ | |
| |                     | |                     | |                     | |
| | +-high_payload #1-+ | | +-high_payload #1-+ | | +-high_payload #1-+ | |
| | |                 | | | |                 | | | |                 | | |
| | +-----------------+ | | +-----------------+ | | +-----------------+ | |
| |                     | |                     | |                     | |
| | +-high_payload #2-+ | | +-high_payload #2-+ | | +-high_payload #2-+ | |
| | |                 | | | |                 | | | |                 | | |
| | +-----------------+ | | +-----------------+ | | +-----------------+ | |
| |                     | |                     | |                     | |
| | +-high_payload #3-+ | | +-high_payload #3-+ | | +-high_payload #3-+ | |
| | |                 | | | |                 | | | |                 | | |
| | +-----------------+ | | +-----------------+ | | +-----------------+ | |
| +---------------------+ +---------------------+ +---------------------+ |
+-------------------------------------------------------------------------+
Files are assumed to be encoded using 2 levels: a generic low level format
(e.g. xml, json, lihata) and an application specific high level format.
A file always has a single low level format, but may have multiple high level
parses (e.g. if different high level parsers read different parts of the file).

Each context may have multiple files; each file has some metadata (path to
the file, last modification time seen on disk, parsers to use). A file is
always loaded with a combination of file name, low level and high level parser.

When a file is loaded for the first time, the low level parser is called
and the result is saved in low_payload. Then the requested high level
parser is run on low_payload to produce the high_payload, which is
returned to the caller.

When the application requests the same filename/low/high combination and
the file has not changed on disk, the parsed high level payload is served
from memory, without any parsing.

When the application requests a known filename/low with a new high, only
the high level parsing is done and the result is cached and returned.

If the file changes on disk, upon the next load request the low level payload
and all high level payloads are purged from the cache and the normal
first-time-load procedure is followed.
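
The load rules above can be modelled in a few lines of C. This is only a
sketch of the decision logic, not the load_cache API: sim_file_t, sim_load()
and the counters are hypothetical names made up for the illustration, the
disk_mtime argument stands in for checking the file on disk, and the low
level parser is left out of the lookup key because a file has a single low
level format.

```c
#include <assert.h>
#include <string.h>

#define SIM_MAX_HIGH 4

/* Hypothetical model of one cached file; NOT the real ldch_file_t */
typedef struct {
	const char *path;                    /* metadata: file name */
	long mtime_seen;                     /* metadata: last mtime seen on disk */
	int low_valid;                       /* non-zero if low_payload is cached */
	const char *high_name[SIM_MAX_HIGH]; /* names of the cached high level parses */
	int high_used;
	int low_parses, high_parses;         /* counters, for demonstration only */
} sim_file_t;

/* Simulate one load request for high level parser "high" */
static void sim_load(sim_file_t *f, const char *high, long disk_mtime)
{
	int n;

	assert((f != NULL) && (high != NULL));

	/* file changed on disk: purge the low and all high level payloads */
	if (f->low_valid && (disk_mtime != f->mtime_seen)) {
		f->low_valid = 0;
		f->high_used = 0;
	}

	/* first-time load (or reload after a purge): run the low level parser */
	if (!f->low_valid) {
		f->low_parses++;
		f->low_valid = 1;
		f->mtime_seen = disk_mtime;
	}

	/* same file and same high level parser: serve from memory, no parsing */
	for(n = 0; n < f->high_used; n++)
		if (strcmp(f->high_name[n], high) == 0)
			return;

	/* known file, new high level parser: only the high level parse runs */
	assert(f->high_used < SIM_MAX_HIGH);
	f->high_parses++;
	f->high_name[f->high_used++] = high;
}
```

In this model, requesting the same high level parser twice with an unchanged
modification time increments neither counter on the second call, while a
changed modification time triggers one low level and one high level parse.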
The above terms translate to the API as follows:
  context:      ldch_ctx_t
  file:         ldch_file_t
  high_payload: ldch_data_t

Parsers are registered in contexts as ldch_low_parser_t and
ldch_high_parser_t. Each is specified by its name and a few callback
functions the host application needs to implement.
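
As an illustration of "a name plus a few callbacks", the sketch below keeps
a small table of parsers in a context and looks them up by name. All names
here (reg_parser_t, reg_ctx_t, reg_add(), reg_find()) and the callback
signature are assumptions made for the example; the real ldch_low_parser_t
and ldch_high_parser_t structs and their registration calls may look
different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REG_MAX 8

/* Hypothetical parser descriptor: a name and one callback */
typedef struct {
	const char *name;
	int (*parse)(const char *path); /* signature is an assumption */
} reg_parser_t;

/* Hypothetical context holding the registered parsers */
typedef struct {
	reg_parser_t *parsers[REG_MAX];
	int used;
} reg_ctx_t;

/* Register a parser in the context; returns 0 on success, -1 if full */
static int reg_add(reg_ctx_t *ctx, reg_parser_t *p)
{
	assert((p != NULL) && (p->name != NULL));
	if (ctx->used >= REG_MAX)
		return -1;
	ctx->parsers[ctx->used++] = p;
	return 0;
}

/* Look up a registered parser by name; returns NULL if not found */
static reg_parser_t *reg_find(const reg_ctx_t *ctx, const char *name)
{
	int n;
	for(n = 0; n < ctx->used; n++)
		if (strcmp(ctx->parsers[n]->name, name) == 0)
			return ctx->parsers[n];
	return NULL;
}

/* Placeholder callback a host application would implement */
static int reg_dummy_parse(const char *path)
{
	return (path != NULL) ? 0 : -1;
}
```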
When a new low level parse is needed, the low level parser's ->parse_alloc()
is called first. This should call ldch_file_alloc() with a payload size
specific to the format. The payload is typically the parser context struct
of the file format and is stored at the end of the ldch_file_t struct.
Then ->parse() is called.

The high level parser's ->parse() is called next, with the low level data
already in memory for the ldch_file_t *. It should follow the same pattern,
allocating an ldch_data_t struct with extra payload_size room for storing
the parsed data.
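
The "payload stored at the end of the struct" pattern can be sketched in
plain C with a single allocation that holds both the generic header and the
format specific bytes. The names below (tail_file_t, tail_file_alloc(),
tail_parser_ctx_t and so on) are hypothetical stand-ins, not the layout or
signature of ldch_file_t or ldch_file_alloc(), and the alignment union is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Forces alignment suitable for common scalar types (sketch assumption) */
typedef union { long l; double d; void *p; } tail_align_t;

/* Hypothetical cache entry: generic header followed by the payload */
typedef struct {
	const char *path;
	size_t payload_size;
	tail_align_t payload[]; /* format specific bytes, same allocation */
} tail_file_t;

/* Hypothetical parser context a low level format could keep per file */
typedef struct {
	int lines;
	int errors;
} tail_parser_ctx_t;

/* One zero-initialized allocation for header plus payload */
static tail_file_t *tail_file_alloc(const char *path, size_t payload_size)
{
	tail_file_t *f = calloc(1, offsetof(tail_file_t, payload) + payload_size);
	if (f == NULL)
		return NULL;
	f->path = path;
	f->payload_size = payload_size;
	return f;
}

/* What a parse_alloc step could do: choose the payload size for the format */
static tail_file_t *tail_parse_alloc(const char *path)
{
	return tail_file_alloc(path, sizeof(tail_parser_ctx_t));
}

/* What the following parse step could do: fill in the trailing payload */
static int tail_parse(tail_file_t *f, const char *text)
{
	tail_parser_ctx_t *pc = (tail_parser_ctx_t *)f->payload;

	assert(f->payload_size >= sizeof(tail_parser_ctx_t));
	for(; *text != '\0'; text++)
		if (*text == '\n')
			pc->lines++;
	return 0;
}
```

Because header and payload share one allocation, a single free() releases
both, and the generic code never needs to know the payload's type.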
Isn't keeping the low level parse in memory all the time expensive? On the
one hand yes, it looks like a waste if there is only one high level parser
for a file. On the other hand, files normally store a lot of strings.
Since the low level parse data is guaranteed to be available whenever the
high level data is, the high level parser does not need to make copies of
those strings; it can reference them directly in the low level payload.
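
A minimal sketch of that string sharing, with made-up types: the low level
payload owns the text, and the high level payload only stores a pointer into
it. ref_low_t, ref_high_t and both parse functions are hypothetical and only
illustrate the ownership idea; they are not part of the load_cache API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical low level payload: owns the text, split into key and value */
typedef struct {
	char text[64];     /* storage for all strings of the file */
	const char *key;   /* points into text */
	const char *value; /* points into text */
} ref_low_t;

/* Hypothetical high level payload: borrows strings from the low level */
typedef struct {
	const char *title; /* points into ref_low_t.text, not a copy */
} ref_high_t;

/* Low level parse: store "key=value" and split it in place */
static int ref_low_parse(ref_low_t *low, const char *src)
{
	char *sep;

	assert((low != NULL) && (src != NULL));
	if (strlen(src) >= sizeof(low->text))
		return -1;
	strcpy(low->text, src);
	sep = strchr(low->text, '=');
	if (sep == NULL)
		return -1;
	*sep = '\0';
	low->key = low->text;
	low->value = sep + 1;
	return 0;
}

/* High level parse: interpret the key, reference the value without copying */
static int ref_high_parse(ref_high_t *high, const ref_low_t *low)
{
	if (strcmp(low->key, "title") != 0)
		return -1;
	high->title = low->value;
	return 0;
}
```

The borrowed pointer is only safe because the low level payload is never
freed while a high level payload derived from it is still alive, which is
the guarantee described above.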