1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227
|
David Anderson.
Written January 2014
Updated December 2020
See RUNNING TESTS below.
In 2013 I realized I could simplify some code in libdwarf
(another project) by removing some complicated and messy
special memory allocation code. Substituting by use of
malloc/free and a table of used values. The used-values needed
to a preserve a special feature of the original allocations
related to a special handle: any allocations would be freed
by closing the special handle.
So I decided to implement various searches (mostly tree
searches) using the tsearch interface definitions.
I wrote these using a dwarf_ prefix to distinguish
these from any libc implementation and to make the
source fit in easily with libdwarf conventions.
I added a tsearch() implementation using hashing too.
The GNU-only hsearch() function does not let the hash grow
(the tsearch() hash here grows automatically as needed)
and has no hdelete() or hwalk() capability, so hsearch() and
friends did not seem like a good interface set even though
the GNU-designed hsearch_r() interface set is nicely designed.
The attached C code is a small set of implementations of
C search code. Based on algorithms in Knuth Volume 3
(2nd Ed) and Sedgewick 4th Edition.
All the code here is implemented by me. I have never read the
source to any other implementations of the tsearch algorighms
or interfaces so this is a clean-room implementation.
The hash version weakens the correspondence to tree based
tsearch because, well, it's not a tree. So twalk() behaves
differently than a tree-based version and an initialization.
Non-GNU libc usually has no tdestroy(). The set of functions
here provides a tdestroy() for all the tree/hash function
sets here.
==================
WHY TSEARCH?
The interfaces are of a well-known design. While
the interfaces are based on an obsolete style of writing
interfaces, they are a known set. So the routines provided
here are a direct substitution.
The tdestroy() function is available in GNU libc, but is
not part of the standard tsearch() set, nor is tdestroy()
defined in the Single Unix Specification.
The hash version of tdelete() cannot always return a non-null
value even if there are records left. Not being a tree at
all, it would cost runtime to provide a non-null return in
some cases even when there are records left from tdelete().
The return value from tdelete() is problematic in all versions,
since it purports to return the parent of the deleted node,
but what if the deleted node was the root?
==================
LICENSING:
I wanted this to be usable in any context, hence the use of
the BSD open-source license.
==================
INTERFACE ODDITIES:
It's not really easy to understand whether any given
call is returning
success, action succeeded
tsearch success - added record
tsearch success - record already in tree
fail due to internal error or improper argument.
requested action impossible
(tfind() fails to find a record, or tdelete() fails
to find the record to delete)
Speaking of C here, so, there is no try/catch available.
It might have been nice if twalk() let the user-provided
action code indicate to twalk() that the user wanted it to stop
walking the tree.
I won't be changing the interface though.
I am staying with the standard interfaces.
==================
DOCUMENTATION ODDITIES:
The GNU/Linux man pages on tsearch() and friends are nearly
useless as you won't understand how to use the functions from
reading that man page.
The prototype for tfind() in the man page is slightly wrong
while the headers in <search.h> are correct.
==================
IMPLEMENTATION ODDITIES:
The code uses names like dwarf_tsearch() instead of just
tsearch() so one can have both this tsearch() and GNU or
a standard tsearch code available simultaneously with no
interference at runtime. The code here operates like GNU or
UNIX tsearch() but is internally incompatible. We'll usually
refer to tsearch() not dwarf_tsearch() (etc) but we mean either
and both unless otherwise specified here.
The use of tsearch() and friends is a little tricky.
That is just the way it is. The best reference is the code
in tsearch_tester.c.
See the trivial set of #defines in tsearch_tester.c that
result in a standard-based naming of the functions.
The return value from tdelete() is particularly problematic
because it purports to return a pointer to another record
than the one deleted, yet no such record necessarily exists.
And in the case of
The hash version requires use of the new function
dwarf_initialize_search_hash() to provide a hashing
function. And the hash version does not set the root pointer
to NULL on tdelete() of the last record, since that would
result in losing the hashing function pointer. Use tdestroy()
to free up any remaining space.
dwarf_tdump() is an invention and unique to this code. It is
for debugging. It prints a representation of the data in the
tree or hash to stdout.
The #include .h file set used here makes it much easier
for me to move these files to another project as necessary.
You will probably want to revise the #include set a little.
==================
OTHER DIRECTORIES:
testdata contains a set of sample allocations, where
lines starting with 'a' mean add and 'd' means delete.
These are used for testing the library code.
scripts contains some python scripts for converting
results produced by printf added to libdwarf alloca()
code. These were used massage the print output
from running dwarfdump on real object files
into the testcase/* format.
It is unlikely you will find them much use
except as starting points.
==================
RUNNING TESTS
The test scheme should be revised to do each test 'right'
and do something about those pointer values printed.
But for now this is how it works.
Getting started:
rm -rf testcases
cp -r ../regressiontests/tsearchtests
mv tsearchtests testcases
cd testcases
gunzip action-F-dwgenb-dwarfgen.log.gz
gunzip action-f-v-v-irix64-libc.so.log.gz
gunzip action-i-dwarf4-dd2g4.5dwarf-4.log.gz
gunzip action-l-irix64-libc.so.log.gz
cd ..
All the files in testcases except two are
just data for the tests.
testcases/testfail.base and testcases/testpass.base
are baseline outputs and if what the test cases or
dwarf_dump changes those are what need to be updated.
All the tests are done by the script RUNTEST.
sh RUNTEST [-save]
The -save option tells the script to save (rather
than delete) the temp files it writes to.
First tests run and are saved in 'testfail'
Then that content (which includes passes and fails
and all that is normal)
diff testfail testcases/testfail.base
should show no differences.
The second part involves the file testpass
which takes a little longer. The message from the
script suggests you inspect the difference with
a graphical compare tool (I use fldiff) because
most lines that appear involve printing pointers
and of course those do not match between runs.
The actual diff is on files with hex numbers
transformed to 0xyyy so an overall PASS
is normal..
Overall on an ordinary 3GHz 64Bit
machine a run takes around seven minutes.
==================
REFERENCES:
Donald E Knuth, The Art of Computer Programming Volume 3,
Second Edition.
(Quite difficult to interpret some parts of Knuth's algorithms,
but then I always found Knuth hard to understand. Still,
Knuth is the essential source for algorithms.)
The Open Group, The Single UNIX Specification, Version 3(2001).
The Single Unix Specification (or whatever it is called now)
is a reference source everyone should have.
Robert Sedgewick and Kevin Wayne, Algorithms (4th Ed).
This is the crucial reference on red-black trees. There is a
crucial fix in the errata of the 3rd printing. In addition,
in dwarf_tsearchred.c in two places I had to fix things,
look for 'Mistake' in the source.
http://www.prevanders.net/tsearch.html
|