File: README

package info (click to toggle)
dwarfutils 20210528-1
links: PTS, VCS
area: main
in suites: bookworm
size: 11,924 kB
sloc: ansic: 110,023; sh: 5,663; cpp: 4,809; makefile: 654; python: 639; awk: 11
file content (227 lines) | stat: -rw-r--r-- 7,971 bytes
David Anderson.  
Written January  2014
Updated December 2020

See RUNNING TESTS below.

In 2013 I realized I could simplify some code in libdwarf
(another project) by removing some complicated and messy
special memory allocation code.  Substituting by use of
malloc/free and a table of used values.  The used-values needed
to a preserve a special feature of the original allocations
related to a special handle: any allocations would be freed
by closing the special handle.

So I decided to implement various searches (mostly tree
searches) using the tsearch interface definitions.
I wrote these using a dwarf_ prefix to distinguish
these from any libc implementation and to make the
source fit in easily with libdwarf conventions.

I added a tsearch() implementation using hashing too.
The GNU-only hsearch() function  does not let the hash grow
(the tsearch() hash here grows automatically as needed)
and has no hdelete() or hwalk() capability, so hsearch() and
friends did not seem like a good interface set even though
the GNU-designed hsearch_r() interface set is nicely designed.

The attached C code is a small set of implementations of
C search code.  Based on algorithms in Knuth Volume 3
(2nd Ed) and Sedgewick 4th Edition.

All the code here is implemented by me. I have never read the
source to any other implementations of the tsearch algorighms
or interfaces so this is a clean-room implementation.

The hash version weakens the correspondence to tree based
tsearch because, well, it's not a tree.  So twalk() behaves
differently than a tree-based version and an initialization.

Non-GNU libc usually has no tdestroy().   The set of functions
here provides a tdestroy() for all the tree/hash function
sets here.


==================
WHY TSEARCH?

The interfaces are of a well-known design.  While
the interfaces are based on an obsolete style of writing
interfaces, they are a known set.  So the routines provided
here are a direct substitution.

The tdestroy() function is available in GNU libc, but is
not part of the standard tsearch() set, nor is tdestroy()
defined in the Single Unix Specification.

The hash version of tdelete() cannot always return a non-null
value even if there are records left.  Not being a tree at
all, it would cost runtime to provide a non-null return in
some cases even when there are records left from tdelete().
The return value from tdelete() is problematic in all versions,
since it purports to return the parent of the deleted node,
but what if the deleted node was the root?

==================
LICENSING:

I wanted this to be usable in any context, hence the use of
the BSD open-source license.

==================
INTERFACE ODDITIES:

It's not really easy to understand whether any given
call is returning
	success, action succeeded
        tsearch success - added record
        tsearch success - record already in tree
        fail due to internal error or improper argument.
        requested action impossible 
            (tfind() fails to find a record, or tdelete() fails
            to find the record to delete)


Speaking of C here, so, there is no try/catch available.

It might have been nice if twalk() let the user-provided
action code indicate to twalk() that the user wanted it to stop
walking the tree.

I won't be changing the interface though.
I am staying with the standard interfaces.

==================
DOCUMENTATION ODDITIES:

The GNU/Linux man pages on tsearch() and friends are nearly
useless as you won't understand how to use the functions from
reading that man page.

The prototype for tfind() in the man page is slightly wrong
while the headers in <search.h> are correct.

==================
IMPLEMENTATION ODDITIES:

The code uses names like dwarf_tsearch() instead of just
tsearch() so one can have both this tsearch() and GNU or
a standard tsearch code available simultaneously with no
interference at runtime.   The code here operates like GNU or
UNIX tsearch() but is internally incompatible.  We'll usually
refer to tsearch() not dwarf_tsearch() (etc) but we mean either
and both unless otherwise specified here.

The use of tsearch() and friends is a little tricky.
That is just the way it is.  The best reference is the code
in tsearch_tester.c.

See the trivial set of #defines in tsearch_tester.c that
result in a standard-based naming of the functions.

The return value from tdelete() is particularly problematic
because it purports to return a pointer to another record
than the one deleted, yet no such record necessarily exists.
And in the case of

The hash version requires use of the new function
dwarf_initialize_search_hash() to provide a hashing
function. And the hash version does not set the root pointer
to NULL on tdelete() of the last record, since that would
result in losing the hashing function pointer.  Use tdestroy()
to free up any remaining space.

dwarf_tdump() is an invention and unique to this code.  It is
for debugging. It prints a representation of the data in the
tree or hash to stdout.

The #include .h file set used here makes it much easier
for me to move these files to another project as necessary.
You will probably want to revise the #include set a little.


==================
OTHER DIRECTORIES:

testdata contains a set of sample allocations, where
lines starting with 'a' mean add and 'd' means delete.
These are used for testing the library code.

scripts contains some python scripts for converting
results produced by printf added to libdwarf alloca()
code.  These were used massage the print output
from running dwarfdump on real object files
into the testcase/* format.
It is unlikely you will find them much use
except as starting points.

==================
RUNNING TESTS

The test scheme should be revised to do each test 'right'
and do something about those pointer values printed.
But for now this is how it works.

Getting started:
   rm -rf testcases
   cp -r ../regressiontests/tsearchtests
   mv tsearchtests testcases
   cd testcases
   gunzip action-F-dwgenb-dwarfgen.log.gz     
   gunzip action-f-v-v-irix64-libc.so.log.gz   
   gunzip action-i-dwarf4-dd2g4.5dwarf-4.log.gz
   gunzip action-l-irix64-libc.so.log.gz
   cd ..
All the files in testcases except two are
just data for the tests.
   testcases/testfail.base and testcases/testpass.base
are baseline outputs and if what the test cases or
dwarf_dump changes those are what need to be updated.

All the tests are done by the script RUNTEST.
   sh RUNTEST [-save]
The -save option tells the script to save (rather
than delete) the temp files it writes to.

First tests run and are saved in 'testfail'
Then that content (which includes passes and fails
and all that is normal)
  diff testfail testcases/testfail.base
should show no differences.

The second part involves the file testpass
which takes a little longer. The message from the
script suggests you inspect the difference with
a graphical compare tool (I use fldiff) because
most lines that appear involve printing pointers
and of course those do not match between runs.
The actual diff is on files with hex numbers
transformed to 0xyyy so an overall PASS
is normal..

Overall on an ordinary 3GHz 64Bit
machine a run takes around seven minutes.

==================
REFERENCES:

Donald E Knuth, The Art of Computer Programming Volume 3,
Second Edition.

(Quite difficult to interpret some parts of Knuth's algorithms,
but then I always found Knuth hard to understand. Still,
Knuth is the essential source for algorithms.)

The Open Group, The Single UNIX Specification, Version 3(2001).
The Single Unix Specification (or whatever it is called now)
is a reference source everyone should have.

Robert Sedgewick and Kevin Wayne, Algorithms (4th Ed).
This is the crucial reference on red-black trees.  There is a
crucial fix in the errata of the 3rd printing.  In addition,
in dwarf_tsearchred.c in two places I had to fix things,
look for 'Mistake' in the source.

http://www.prevanders.net/tsearch.html