% This file was created automatically from hash2.msk.
% DO NOT EDIT!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%A  hash2.msk                  GAP documentation               Gene Cooperman
%A							         Scott Murray
%A	        					     Alexander Hulpke
%%
%A  @(#)$Id: hash2.msk,v 1.8 2002/04/15 10:02:30 sal Exp $
%%
%Y  (C) 2000 School Math and Comp. Sci., University of St.  Andrews, Scotland
%Y  Copyright (C) 2002 The GAP Group
%%
\PreliminaryChapter{Dictionaries and General Hash Tables}

People and computers spend a large amount of time searching.
Dictionaries are an abstract data structure that facilitates searching for
certain objects. An important way of implementing dictionaries is via hash
tables.

*The functions and operations described in this chapter have been added
very recently and are still undergoing development. It is conceivable that
names or variants of the functionality might change in future versions. If
you plan to use these functions in your own code, please contact us.*

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Dictionaries}

\>IsDictionary( <obj> ) C

A dictionary is a growable collection of objects that permits adding
objects (with associated values) and checking whether an object is
already known.

\>IsLookupDictionary( <obj> ) C

A *lookup dictionary* is a dictionary that permits not only checking
whether an object is contained, but also retrieving associated values,
using the operation `LookupDictionary'.



\>KnowsDictionary( <dict>, <key> ) O

checks whether <key> is known to the dictionary <dict>, and returns
`true' or `false' accordingly. <key> *must* be an object of the kind for
which the dictionary was specified; otherwise the results are
unpredictable.

\>LookupDictionary( <dict>, <key> ) O

looks up <key> in the lookup dictionary <dict> and returns the
associated value. If <key> is not known to the dictionary, `fail' is
returned.


Dictionaries can be implemented in several ways: as lists, as
sorted lists, as hash tables, or via binary lists. A user, however, will
normally just call `NewDictionary' and obtain a ``suitable'' dictionary
for the kind of objects she wants to store. It is nevertheless possible to
create hash tables (see~"General hash table definitions and operations")
and dictionaries using binary lists (see~"DictionaryByPosition") explicitly.


\>NewDictionary( <obj>, <look>[, <objcoll>] ) F

creates a new dictionary for objects such as <obj>. If <objcoll> is
given, the dictionary will be for objects only from this collection;
knowing this can improve the performance. If <objcoll> is given, <obj>
may be replaced by `false', i.e. no sample object is needed.

The function tries to choose a kind of dictionary for which the basic
dictionary functions are fast.
If <look> is `true', the dictionary will be a lookup dictionary,
otherwise it is an ordinary dictionary.
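
As a rough illustration of how these operations fit together, the following
sketch creates a lookup dictionary for permutations and queries it. It
assumes the operation `AddDictionary( <dict>, <key>, <val> )' for adding
entries, which is not described in this section; the comments give the
results one would expect from the descriptions above.

\begintt
# Sketch only: (1,2,3) serves as the sample object <obj>.
dict := NewDictionary( (1,2,3), true );;
# AddDictionary is assumed here; it is not documented in this section.
AddDictionary( dict, (1,2,3), "a 3-cycle" );
KnowsDictionary( dict, (1,2,3) );    # expected: true
LookupDictionary( dict, (1,2,3) );   # expected: "a 3-cycle"
LookupDictionary( dict, (1,2) );     # expected: fail, the key was never added
\endtt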


The use of two objects, <obj> and <objcoll>, to parametrize the objects a
dictionary is able to store might look confusing. However, there are
situations where either of them might be needed:

The first situation is that of objects for which no formal ``collection
object'' has been defined. A typical example here might be subspaces of
a vector space. {\GAP} does not formally define a ``Grassmannian'' or
anything else to represent the multitude of all subspaces. So it is only
possible to give the dictionary a ``sample object''.

The other situation is that of an object which might represent quite
varied domains. The permutation $(1,10^6)$ might be the nontrivial
element of a cyclic group of order 2, or it might be a representative of
$S_{10^6}$. In the first case the best approach might be just to
have two entries for the two possible objects; in the second case a
much more elaborate approach might be needed.

An algorithm that creates a dictionary will usually know a priori from which
domain all the objects will come; giving this domain permits the use of a
more efficient dictionary.

This is particularly true for vectors. From a single vector one cannot
decide whether a calculation will take place over the smallest field
containing all its entries or over a larger field.


As there are situations where the approach via binary lists is explicitly
desired, such dictionaries can be created deliberately.

\>DictionaryByPosition( <list>, <lookup> ) F

creates a new (lookup) dictionary which uses `PositionCanonical' in
<list> for indexing. The dictionary will have an entry `<dict>!.blist'
which is a bit list corresponding to <list> indicating the known
values. If <lookup> is `true', the dictionary will be a lookup dictionary,
otherwise it is an ordinary dictionary.
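
A minimal sketch of such a dictionary, assuming a short list of primes as
<list> and the operation `AddDictionary' for adding entries (assumed here;
it is not described in this section):

\begintt
primes := [ 2, 3, 5, 7, 11 ];;
dict := DictionaryByPosition( primes, true );;
# AddDictionary is an assumption of this sketch.
AddDictionary( dict, 5, "five" );    # 5 sits at position 3 of the list
KnowsDictionary( dict, 5 );
LookupDictionary( dict, 5 );
dict!.blist;                         # bit list marking the known positions
\endtt

Only elements of <list> can serve as keys, since a key is located by its
position in <list>.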


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{General Hash Tables}

This section describes hash tables for general objects.
We hash by keys and also store a value.  Keys
cannot be removed from the table, but the corresponding value can be
changed.  Fast access to the last hash index allows you to efficiently store
more than one array of values; this facility should be used with care.

This code works for any kind of object, provided there is a `DenseIntKey'
or `SparseIntKey' method to convert the key into a positive integer.
These methods should ideally be implemented efficiently in the core.

Note that, for efficiency, it is currently impossible to create a 
hash table with non-positive integers.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{General hash table definitions and operations}

\>IsHash( <obj> ) C

The category of hash tables for arbitrary objects (provided an `IntKey'
function is defined).


\>PrintHashWithNames( <hash>, <keyName>, <valueName> ) O

Print a hash table with the given names for the keys and values.


\>GetHashEntry( <hash>, <key> ) O

If <key> is in <hash>, return the corresponding value.  Otherwise
return `fail'.  Note that it is therefore not a good idea to use `fail' as
a value, since it could not be distinguished from a missing key.


\>AddHashEntry( <hash>, <key>, <value> ) O

Add the key and value to the hash table.


\>RandomHashKey( <hash> ) O

Return a random key from the hash table (`Random' returns a random value).


\>HashKeyEnumerator( <hash> ) O

Enumerates the keys of the hash table (`Enumerator' enumerates values).



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Hash keys}

The crucial step of hashing is to transform key objects into integers such
that equal objects produce the same integer.

\>TableHasIntKeyFun( <hash> ) P

If this filter is set, the hash table has an `IntKey' function in its
component `<hash>!.intKeyFun'.



The actual function used depends heavily on the type of objects. However,
{\GAP} already provides key functions for some commonly encountered objects.

\>DenseIntKey( <objcoll>, <obj> ) O

returns a function that can be used as hash key function for objects
such as <obj> in the collection <objcoll>. <objcoll> typically will be a
large domain.  If the domain is not available, it can be given as
`false' in which case the hash key function will be determined only
based on <obj>. (For a further discussion of these two arguments
see~`NewDictionary', section~"NewDictionary").

The function returned by `DenseIntKey' is guaranteed to give different
values for different objects.
If no suitable hash key function has been predefined, `fail' is returned.

\>SparseIntKey( <objcoll>, <obj> ) O

returns a function that can be used as hash key function for objects
such as <obj> in the collection <objcoll>. In contrast to `DenseIntKey',
the function returned may return the same key value for different
objects.
If no suitable hash key function has been predefined, `fail' is returned.
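
The following sketch shows how such a key function might be obtained and
applied. It assumes that a predefined key function exists for permutations
and that `false' may be passed for <objcoll> here as well; the result is
compared with `fail' since the description above allows for that outcome.

\begintt
# No domain is supplied, so `false' is passed for <objcoll> (an assumption).
keyfun := SparseIntKey( false, (1,2,3) );;
if keyfun <> fail then
  # Equal objects must yield the same integer.
  Print( keyfun( (1,2,3) ) = keyfun( (1,2,3) ), "\n" );
else
  Print( "no predefined key function for this kind of object\n" );
fi;
\endtt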


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Dense hash tables}

Dense hash tables are used for hashing dense sets without collisions,
in particular integers.
Keys are stored as an unordered list and values as an
array with holes.  The position of a value is given by the attribute
`IntKeyFun' or the function returned by `DenseIntKey',
and so this key function must be one-to-one.

\>DenseHashTable( ) F

Construct an empty dense hash table.  This is the only correct way to
construct such a table.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Sparse hash tables}

Sparse hash tables are used for hashing sparse sets.
Keys are stored as an array with `fail'
denoting an empty position; values are stored as an array with holes.
The index of a key is obtained by applying `HashFunct' to the `IntKeyFun'
(respectively the result of calling `SparseIntKey') of the key.
`DefaultHashLength' is the default starting hash table length; the table
is doubled when it becomes half full.

\>SparseHashTable( [<intkeyfun>] ) F

Construct an empty sparse hash table.  This is the only correct way to
construct such a table.
If the argument <intkeyfun> is given, this function will be used to
obtain numbers for the keys passed to it.
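
A minimal usage sketch, assuming positive integer keys and the identity
function as <intkeyfun> (both are choices made only for this illustration):

\begintt
# The identity maps each positive integer key to itself.
hash := SparseHashTable( x -> x );;
AddHashEntry( hash, 17, "seventeen" );
GetHashEntry( hash, 17 );    # the stored value
GetHashEntry( hash, 42 );    # fail, since 42 was never added
\endtt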


\>GetHashEntryIndex( <hash>, <key> ) F

If <key> is in <hash>, return its index in the hash array.


\>DoubleHashArraySize( <hash> ) F

Double the size of the hash array and rehash all the entries.
This will also happen automatically when the hash array is half full.



In sparse hash tables, the integer obtained from the hash key is then
transformed to an index position. This transformation is done using the hash
function `HashFunct':

\>HashFunct( <key>, <i>, <size> ) F

This will be a good double hashing function for any reasonable `IntKey'
function (see Cormen, Leiserson and Rivest, Introduction to Algorithms,
1st edition, p. 235).
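
For orientation, the textbook form of double hashing can be sketched as
follows. The function `DoubleHashSketch' is hypothetical and is not the
library's `HashFunct'; it assumes that <size> is a power of two (at least 2)
and that <key> is a non-negative integer, and it returns indices in the
range from 1 to <size>.

\begintt
# Hypothetical illustration of double hashing, not the actual HashFunct.
DoubleHashSketch := function( key, i, size )
  local h1, h2;
  h1 := key mod size;                     # first probe position
  h2 := 1 + 2 * ( key mod ( size / 2 ) ); # odd step, coprime to size
  return 1 + ( h1 + i * h2 ) mod size;    # i-th probe, shifted to 1..size
end;;
# Probes for key 10 in a table of size 8, for i = 0, 1, 2:
List( [ 0 .. 2 ], i -> DoubleHashSketch( 10, i, 8 ) );   # [ 3, 8, 5 ]
\endtt

Because the step is odd and the size is a power of two, the probe sequence
for a key visits every position of the table before repeating.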



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Fast access to last hash index}

These functions allow you to use the index of the last hash access or
modification. Note that this index is global across all hash tables.  If you
want to have two hash tables with identical layouts, the following works:
\begintt
GetHashEntry( hashTable1, object ); GetHashEntryAtLastIndex( hashTable2 );
\endtt
These functions should be used with extreme care, as they bypass most
of the built-in error checking for hash tables.

\>GetHashEntryAtLastIndex( <hash> ) O

Returns the value of the last hash entry accessed.


\>SetHashEntryAtLastIndex( <hash>, <newValue> ) O

Resets the value of the last hash entry accessed.


\>SetHashEntry( <hash>, <key>, <value> ) O

Resets the value corresponding to <key>.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%E