This file contains notes for developers of the GOCR API. It lists things that
you MUST do, things that you MUST NOT do, what to take care of, etc.
Syntax
------
Function names should follow this template:
[_]gocr_[type][action]
Internal functions should be prefixed with _.
type is what data type the function handles, such as image.
action is what the function does, such as load.
With the exception of the first word, every word in [type][action] should be
capitalized (for example, imageLoad).
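As an illustration of the template, here is a minimal sketch. The demoImage
type and both function names are hypothetical and chosen only to show the
naming pattern; they are not taken from the GOCR sources.

```c
#include <stddef.h>

/* Hypothetical image type, for illustration only. */
typedef struct {
    int width;
    int height;
} demoImage;

/* Internal function: leading "_", type "image", action "Check". */
static int _gocr_imageCheck(const demoImage *img)
{
    return img != NULL && img->width > 0 && img->height > 0;
}

/* Public function: no leading "_", type "image", action "Area". */
int gocr_imageArea(const demoImage *img)
{
    if (!_gocr_imageCheck(img))
        return -1;              /* invalid image */
    return img->width * img->height;
}
```

The type ("image") is the first word and stays lowercase; the action ("Check",
"Area") is capitalized.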
Debugging information
---------------------
Use the _gocr_debug macro as frequently as possible. This helps debugging
enormously; many things that would otherwise need a debugger and a couple of
minutes to find can be spotted immediately by setting VERBOSE to 3. The levels
are:
0: outputs almost nothing. The only exceptions are VERY big errors (such as
the lack of a library or a program).
1: outputs error messages.
2: outputs warning and error messages.
3: outputs everything.
These are the guidelines for where and when to put a _gocr_debug:
* at the beginning of each function, at level 3, outputting "func(args)".
* every error should have a level 1 explanation.
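The two guidelines above could look like this in practice. This is only a
sketch: DEMO_DEBUG, demo_verbose and demo_divide are hypothetical stand-ins,
and the real _gocr_debug macro may take different arguments.

```c
#include <stdio.h>

/* Hypothetical stand-in for the VERBOSE setting (0 to 3). */
static int demo_verbose = 3;

/* Hypothetical stand-in for _gocr_debug: execute stmt only when the
 * current verbosity is at least the given level. */
#define DEMO_DEBUG(level, stmt) \
    do { if (demo_verbose >= (level)) { stmt; } } while (0)

int demo_divide(int a, int b, int *result)
{
    /* Level 3: trace the call as "func(args)" at function entry. */
    DEMO_DEBUG(3, fprintf(stderr, "demo_divide(%d, %d, %p)\n",
                          a, b, (void *)result));

    if (b == 0) {
        /* Level 1: every error gets an explanation. */
        DEMO_DEBUG(1, fprintf(stderr, "demo_divide: division by zero\n"));
        return -1;
    }
    *result = a / b;
    return 0;
}
```

With demo_verbose set to 1, only the error message would be printed; with 0,
neither message is.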
Hash tables
-----------
Although the current code supports (by #defines) a chained implementation, it's
not interesting in GOCR, for the following reason: character attributes can be
encoded as Unicode values if we use the least significant byte as the hash key
(see Character attributes, below).
A chained implementation allows different data to share a key, which wouldn't
work here. Since the hash tables used in GOCR are not expected to be anywhere
near full, giving up chaining is OK. (Note: in the future both kinds of tables
may be implemented.)
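To show why a non-chained table gives every item its own key, here is a minimal
sketch using open addressing with linear probing. This is an assumption about
the approach, not the GOCR implementation: the demoTable type, the 256-slot
size (one slot per value of a single byte) and all function names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SLOTS 256          /* keys fit in one byte: 0..255 */

typedef struct {
    const char *slot[DEMO_SLOTS];   /* NULL marks an empty slot */
} demoTable;

void demo_tableInit(demoTable *t)
{
    int i;
    for (i = 0; i < DEMO_SLOTS; i++)
        t->slot[i] = NULL;
}

/* Toy hash: sum of the bytes, reduced to the table size. */
static unsigned demo_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h % DEMO_SLOTS;
}

/* Store data and return its key (the slot index), or -1 if the table is
 * full. On a collision the next free slot is used, so two different items
 * never share a key. Inserting the same string twice returns the same key. */
int demo_tableInsert(demoTable *t, const char *data)
{
    unsigned start = demo_hash(data);
    unsigned i;

    for (i = 0; i < DEMO_SLOTS; i++) {
        unsigned k = (start + i) % DEMO_SLOTS;
        if (t->slot[k] == NULL) {
            t->slot[k] = data;
            return (int)k;
        }
        if (strcmp(t->slot[k], data) == 0)
            return (int)k;
    }
    return -1;
}

/* Look an item up by its key; returns NULL for an empty or invalid key. */
const char *demo_tableGet(const demoTable *t, int key)
{
    if (key < 0 || key >= DEMO_SLOTS)
        return NULL;
    return t->slot[key];
}
```

In this sketch the strings "ab" and "ba" hash to the same slot, yet they receive
different keys. Probing gets slower as the table fills, which is acceptable
only because the tables are expected to stay mostly empty.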
Linked lists
------------
There are some restrictions in our implementation of linked lists. Take a look
at list.h to learn about them.
GOCR image structure
--------------------
Instead of declaring each pixel as an unsigned char, I decided to implement it
as a structure with bit fields. They are probably faster than accessing bits
with bitwise operators, and the syntax is MUCH cleaner. Care must be taken,
however, to avoid wasting memory. The current structure uses only one byte
(at least on x86 with gcc 2.95.2), so take care if you decide to add "just
another bit", since you may be doubling the memory use. Also, take care of
padding manually, keeping the structure size (in bits) a multiple of 8. The
structure can be read as an unsigned char (through a pointer only: unsigned
char *c = (unsigned char *)&pixel;), so we can print it safely (but NOT
portably). Last, but far from least, the code assumes in some places
(especially in print.c) that the gocrPixel structure has the same size as an
unsigned char, that is, 1 byte. If the structure grows (which should be avoided
at any cost, since the memory requirements would become very high), the code
must be reviewed. Also note that the data order is [y][x] (row first, then
column), which is common for image data.
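The constraints above can be sketched as follows. The field names, the
demoPixel and demoImage types and the helper functions are hypothetical, not
the real gocrPixel definition. Bit fields of type unsigned char are
implementation-defined in C; gcc accepts them and packs this structure into a
single byte.

```c
#include <stdlib.h>

/* Hypothetical one-byte pixel: 4 used bits plus 4 bits of manual padding. */
typedef struct {
    unsigned char value   : 1;
    unsigned char mark1   : 1;
    unsigned char mark2   : 1;
    unsigned char isblock : 1;
    unsigned char pad     : 4;  /* keeps the total at exactly 8 bits */
} demoPixel;

/* Compile-time check: fails to compile if the structure is not 1 byte. */
typedef char demoPixelIsOneByte[(sizeof(demoPixel) == 1) ? 1 : -1];

typedef struct {
    int width;
    int height;
    demoPixel **data;           /* indexed as data[y][x] */
} demoImage;

/* Allocate a zeroed image; returns NULL on failure. */
demoImage *demo_imageNew(int width, int height)
{
    demoImage *img;
    int y;

    if (width <= 0 || height <= 0)
        return NULL;
    img = malloc(sizeof *img);
    if (img == NULL)
        return NULL;
    img->width = width;
    img->height = height;
    img->data = malloc((size_t)height * sizeof *img->data);
    if (img->data == NULL) {
        free(img);
        return NULL;
    }
    for (y = 0; y < height; y++) {
        img->data[y] = calloc((size_t)width, sizeof **img->data);
        if (img->data[y] == NULL) {
            while (y-- > 0)     /* release the rows allocated so far */
                free(img->data[y]);
            free(img->data);
            free(img);
            return NULL;
        }
    }
    return img;
}

void demo_imageFree(demoImage *img)
{
    int y;

    if (img == NULL)
        return;
    for (y = 0; y < img->height; y++)
        free(img->data[y]);
    free(img->data);
    free(img);
}

/* Read the raw byte behind a pixel (safe, but the bit order is NOT
 * portable across compilers). */
unsigned char demo_pixelByte(const demoPixel *p)
{
    return *(const unsigned char *)p;
}
```

A pixel at column x of row y is then accessed as img->data[y][x].value. The
compile-time check makes an accidental "just another bit" fail at build time
instead of silently doubling the memory use.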
Character attributes
--------------------
Character attributes can be saved using the printf string formatter plus the
new high-value data code. This is only partially implemented.