1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Tables</title>
</head>
<body bgcolor="#FFFFFF">
<table border="0" width="100%">
<tr>
<td><font size="6"><strong>Tables</strong></font></td>
<td align="right"><a href="list.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="dispenser.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
<hr size="1">
<p align="center"><!--webbot bot="ImageMap" startspan
rectangle=" (131,235) (233, 267) flatshort/ds_hash_table.html"
rectangle=" (124,160) (242, 191) flatshort/ds_sparse_table.html"
rectangle=" (258,85) (350, 115) flatshort/ds_resizable.html"
rectangle=" (141,84) (230, 116) flatshort/ds_bilinear.html"
rectangle=" (22,85) (107, 116) flatshort/ds_table.html"
rectangle=" (18,9) (110, 40) flatshort/ds_container.html"
src="image/table.gif" border="0" width="368" height="279" --><MAP NAME="FrontPageMap0"><AREA SHAPE="RECT" COORDS="131, 235, 233, 267" HREF="flatshort/ds_hash_table.html"><AREA SHAPE="RECT" COORDS="124, 160, 242, 191" HREF="flatshort/ds_sparse_table.html"><AREA SHAPE="RECT" COORDS="258, 85, 350, 115" HREF="flatshort/ds_resizable.html"><AREA SHAPE="RECT" COORDS="141, 84, 230, 116" HREF="flatshort/ds_bilinear.html"><AREA SHAPE="RECT" COORDS="22, 85, 107, 116" HREF="flatshort/ds_table.html"><AREA SHAPE="RECT" COORDS="18, 9, 110, 40" HREF="flatshort/ds_container.html"></MAP><img src="image/table.gif" border="0" width="368" height="279" usemap="#FrontPageMap0"><!--webbot
bot="ImageMap" i-checksum="35398" endspan --></p>
<h2>Class <em><tt>DS_TABLE</tt></em></h2>
<p>Tables are data structures whose items are accessible by keys.
Therefore the class <a href="flatshort/ds_table.html"><font
color="#008080"><em><tt>DS_TABLE</tt></em></font></a> has two
generic parameters. As for all other container classes, the first
parameter <font color="#008080"><em><tt>G </tt></em></font>represents
the type of the items, whereas the second parameter <font
color="#008080"><em><tt>K</tt></em></font> is the type of the
keys which are associated with the items. The features provided
by class <font color="#008080"><em><tt>DS_TABLE </tt></em></font>are
very similar to some of those of class <a
href="container.html#DS_INDEXABLE"><font color="#008080"><em><tt>DS_INDEXABLE</tt></em></font></a>.
The difference comes from the fact that in <font color="#008080"><em><tt>DS_INDEXABLE
</tt></em></font>keys are contiguous integers whereas in <font
color="#008080"><em><tt>DS_TABLE </tt></em></font>keys can be of
any type, and hence not necessarily contiguous. The main features
introduced in class <font color="#008080"><em><tt>DS_TABLE </tt></em></font>are
<a href="flatshort/ds_table.html#item"><font color="#008080"><em><tt>item</tt></em></font></a><font
color="#008080"><em><tt> </tt></em></font>to access items by key,
<a href="flatshort/ds_table.html#put"><font color="#008080"><em><tt>put</tt></em></font></a>
to insert new items and <a href="flatshort/ds_table.html#replace"><font
color="#008080"><em><tt>replace</tt></em></font></a> to associate
an existing key with another item, <a
href="flatshort/ds_table.html#remove"><font color="#008080"><em><tt>remove</tt></em></font></a>
to remove an item and its associated key from the table, <a
href="flatshort/ds_table.html#valid_key"><font color="#008080"><em><tt>valid_key</tt></em></font></a>
to check whether a key can be used in the table and <a
href="flatshort/ds_table.html#has"><font color="#008080"><em><tt>has</tt></em></font></a>
to check whether an item has already been inserted in the table
with a given key.</p>
<h2>Class <em><tt>DS_HASH_TABLE</tt></em></h2>
<p>One possible implementation of tables is hash tables. A hash
table is typically made up of an array where items are accessed
by integer index. Therefore the keys used in the hash tables
should provide a means to yield such integer value through a
hashing mechanism. This is exactly what feature <font
color="#008080"><em><tt>hash_code</tt></em></font> from <font
color="#008080"><em><tt>HASHABLE</tt></em></font> is for, and
therefore the second generic parameter of <a
href="flatshort/ds_hash_table.html"><font color="#008080"><em><tt>DS_HASH_TABLE</tt></em></font></a>
is constrained by <font color="#008080"><em><tt>HASHABLE</tt></em></font>.
Thanks to this implementation, features of hash tables are
usually more efficient than linked implementations since access
time in an array is bounded by a constant regardless of the
number of items in the container. However the hash code
associated with the keys is not necessarily unique, and therefore
collisions may happen and hence slow down the process. No
optimization effort has been made when writing <font
color="#008080"><em><tt>DS_HASH_TABLE</tt></em></font> in order
to take care of collisions. In particular it doesn't take
advantage of the well known algorithms using prime numbers when
collisions occur. The implementation used here is very simple and
was deemed satisfactory enough for its usage in the rest if the <em>Gobo
Eiffel </em>libraries. In particular, some benchmarks have proven
that <font color="#008080"><em><tt>DS_HASH_TABLE </tt></em></font>behaved
better than class <font color="#008080"><em><tt>HASH_TABLE</tt></em></font>
from <a href="see_also.html#EiffelBase"><em>EiffelBase</em></a>
which uses a collision-resolution algorithm based on prime
numbers. It is not clear to me whether this difference of
performance comes from the algorithm itself or from other
differences between the implementation of both hash tables, but
it is unlikely that the implementation of <font color="#008080"><em><tt>DS_HASH_TABLE
</tt></em></font>will use more sophisticated algorithms in the
future based on these observations.</p>
<p>Another interesting remark about efficiency is that the
performance of hash tables depends on the number of collisions
that may occur in the table. Therefore the implementation for the
<font color="#008080"><em><tt>hash_code</tt></em></font> of the
keys is very important since returning often the same value for
different keys will trigger too many collisions and yield
performance degradations. If the hash table appears to be slow
and the type of the keys is one of the basic Eiffel types such as
<font color="#008080"><em><tt>STRING</tt></em></font>, it is
recommended to try with another Eiffel compiler whose
implementation of <font color="#008080"><em><tt>hash_code </tt></em></font>may
be better in order to see if it makes a difference. Finally, if
the default implementation of <font color="#008080"><em><tt>hash_code
</tt></em></font>is not optimal for a given set of keys, one can
inherit from <font color="#008080"><em><tt>DS_HASH_TABLE </tt></em></font>and
redefine its feature <font color="#008080"><em><tt>hash_code </tt></em></font>to
provide a better implementation. For example, consider a school
which keeps track of project assignments in a table indexed by
students. The obvious solution is to declare:</p>
<blockquote>
<pre><em>assignments</em>: <em>DS_HASH_TABLE</em> [<em>PROJECT</em>, <em>STUDENT</em>]</pre>
</blockquote>
<p>However the implementation of <font color="#008080"><em><tt>hash_code
</tt></em></font>in class <font color="#008080"><em><tt>STUDENT</tt></em></font>
inherited from <font color="#008080"><em><tt>PERSON</tt></em></font>
just returns the age of the person. This implementation of <font
color="#008080"><em><tt>hash_code</tt></em></font> in class <font
color="#008080"><em><tt>PERSON</tt></em></font> is perfectly
valid in most cases, but it is clear that when dealing with
students in the same classroom it is likely that they will all
have more or less the same age and hence the same hash code. In
this particular case it is better to provide a better
implementation for <font color="#008080"><em><tt>hash_code </tt></em></font>in
<font color="#008080"><em><tt>DS_HASH_TABLE </tt></em></font>to
avoid the numerous collisions:</p>
<blockquote>
<pre><font color="#000080"><em><strong>class</strong></em></font><em> ASSIGNMENTS
</em><font color="#000080"><em><strong>inherit</strong></em></font><em>
DS_HASH_TABLE </em>[<em>PROJECT</em>,<em> STUDENT</em>]<em>
</em><font color="#000080"><em><strong>redefine</strong></em></font><em>
hash_code
</em><font color="#000080"><em><strong>end
creation</strong></em></font><em>
make
</em><font color="#000080"><em><strong>feature</strong></em></font><em> </em>{<em>NONE</em>} <font
color="#008000">-- Implementation</font><em>
hash_code </em>(<em>s</em>:<em> STUDENT</em>):<em> INTEGER </em><font
color="#000080"><em><strong>is</strong></em></font><em>
</em><font color="#008000">-- Hash code of student</font> <em>s
</em><font color="#000080"><em><strong>do</strong></em></font><em>
</em><font color="#008080"><em>Result</em></font><em> </em>:=<em> s</em>.<em>name</em>.<em>hash_code
</em><font color="#000080"><em><strong>end
end</strong></em></font><em> </em><font color="#008000">-- class ASSIGNMENTS</font></pre>
</blockquote>
<p>The keys are directly stored in the hash table without being
copied. Therefore it is important that the hash code associated
with each key doesn't change while the key is stored in the hash
table. Otherwise the hashing mechanism would be broken and it
would be impossible to access the item associated with this key.
Likewise, keys are compared in the hash table using the feature <font
color="#008080"><em><tt>is_equal</tt></em></font> from class <font
color="#008080"><em><tt>GENERAL</tt></em></font>. Therefore if
the critera used to implement the function <font color="#008080"><em><tt>is_equal
</tt></em></font>are changed while the key is stored in the hash
table, this key might not be recognized properly anymore within
this hash table. Needless to say that if two keys are considered
equal, they should have the same hash code. The solution when the
hash code or equality criteria of a key are likely to vary while
the key is stored in the hash table is clone that key when
inserting an item associated with it.</p>
<p>The class <font color="#008080"><em><tt>DS_HASH_TABLE </tt></em></font>provides
traversal facilities inherited from <a
href="traversal.html#DS_BILINEAR"><font color="#008080"><em><tt>DS_BILINEAR</tt></em></font></a>.
Although all items will be visited once and only once during a
traversal, they will be traversed in an unpredictable order and
subsequent traversals may traverse the items in different orders.
This is because a hash table is not an ordered containter as can
a list be. Items are not inserted before or after other items in
the hash table but based on a hashing mechanism and
collision-resolution algorithm. Therefore, altering the hash
table by adding or removing items or by resizing the table may
change the order of the items and hence will invalidate current
traversals. For this reason, most of these routines will move <font
color="#008080"><em><tt>off </tt></em></font>all existing cursors
currently traversing the hash table (see the header comments of
these routines for details).</p>
<p>As you may now realize after reading the first paragraph
above, the performance of hash tables is one of their <em>raison
d'tre</em>. This had of course to be taken into account when
implementing the routines inherited from <font color="#008080"><em><tt>DS_TABLE</tt></em></font>.
Because of the precondition of <a
href="flatshort/ds_hash_table.html#item"><font color="#008080"><em><tt>item</tt></em></font></a>
which states that in order to be able to query an item associated
with a given key, that item has to exist in the first place, one
has to write:</p>
<blockquote>
<pre><font color="#000080"><em><strong>if</strong></em></font><em> table</em>.<em>has </em>(<em>k</em>)<em> </em><font
color="#000080"><em><strong>then</strong></em></font><em>
v </em>:= <em>table</em>.<em>item </em>(<em>k</em>)<em>
</em><font color="#000080"><em><strong>else</strong></em></font><em>
...
</em><font color="#000080"><em><strong>end</strong></em></font></pre>
</blockquote>
<p>However, both <a href="flatshort/ds_hash_table.html#has"><font
color="#008080"><em><tt>has</tt></em></font></a><font
color="#008080"><em><tt> </tt></em></font>and <a
href="flatshort/ds_hash_table.html#item"><font color="#008080"><em><tt>item</tt></em></font></a><font
color="#008080"><em><tt> </tt></em></font>will have to compute
the hash code of <font color="#008080"><em><tt>k</tt></em></font>
and deal with possible collisions in the hash table. In other
words we do twice the same thing. The solution adopted in the
current implementation of <font color="#008080"><em><tt>DS_HASH_TABLE
</tt></em></font>is to keep track of the result of last hashing
operation in the hash table in a cache. Therefore, in the code
above, most of the work of accessing the item at key <font
color="#008080"><em><tt>k</tt></em></font> will be done only once
in the routine <font color="#008080"><em><tt>has</tt></em></font>,
and <font color="#008080"><em><tt>item </tt></em></font>will
realize that the key given as argument is the same and hence
avoid calling the hashing mechanism again.</p>
<p>Another solution to avoid that would have been to get rid of
the precondition in <font color="#008080"><em><tt>item </tt></em></font>and
return <font color="#008080"><em><tt>Void</tt></em></font> when
there is no item associated with key <font color="#008080"><em><tt>k</tt></em></font>.
However this solution goes against the principle of <em>Design by
Contract</em> since getting <font color="#008080"><em><tt>Void</tt></em></font>
could either mean that there is no item for key <font
color="#008080"><em><tt>k</tt></em></font> or that there is
actually one which happens to be <font color="#008080"><em><tt>Void</tt></em></font>.
Therefore this solution has not been adopted in <font
color="#008080"><em><tt>DS_HASH_TABLE </tt></em></font>but a
better designed alternative also based on the <em>try-and-see</em>
principle is available:</p>
<blockquote>
<pre><em>table</em>.<em>search </em>(<em>k</em>)
<font color="#000080"><em><strong>if</strong></em></font><em> table</em>.<em>found </em><font
color="#000080"><em><strong>then</strong></em></font><em>
v </em>:= <em>table</em>.<em>found_item
</em><font color="#000080"><em><strong>else</strong></em></font><em>
...
</em><font color="#000080"><em><strong>end</strong></em></font></pre>
</blockquote>
<p>Both code excerpts should have the same execution time
performance thanks to <font color="#008080"><em><tt>DS_HASH_TABLE</tt></em></font>'s
internal optimizations. Using one or the other is just a question
of taste. I personally prefer the first pattern.</p>
<hr size="1">
<table border="0" width="100%">
<tr>
<td><address>
<font size="2"><b>Copyright 1999</b></font><font
size="1"><b>, </b></font><font size="2"><strong>Eric
Bezault</strong></font><strong> </strong><font
size="2"><br>
<strong>mailto:</strong></font><a
href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
size="2"><br>
<strong>http:</strong></font><a
href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
size="2"><br>
<strong>Last Updated:</strong> 25 September 1999</font><br>
<!--webbot bot="PurpleText"
preview="
$Date: 1999/10/02 11:59:41 $
$Revision: 1.6 $"
-->
</address>
</td>
<td align="right" valign="top"><a
href="http://www.gobosoft.com"><img
src="../image/home.gif" alt="Home" border="0" width="40"
height="40"></a><a href="index.html"><img
src="../image/toc.gif" alt="Toc" border="0" width="40"
height="40"></a><a href="list.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="dispenser.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
</body>
</html>
|