Store about 1000 records with keys in range 0 to 16,383.
Impractical to keep a hash table with 16,384 slots.
We must devise a hash function to map the key range to a
smaller table.
Collisions (1)
Given: hash function h with keys k1 and k2.
β is a slot in the hash table.
If h(k1)=β=h(k2), then
k1 and k2 have a collision at β
under h.
Search for the record with key K:
Compute the table location h(K).
Starting with slot h(K), locate the record
containing key K using (if necessary) a collision
resolution policy.
Collisions (2)
Collisions are inevitable in most applications.
Example: Among 23 people, some pair is likely to share a
birthday.
Hash Functions (1)
A hash function MUST return a value within the hash table range.
To be practical, a hash function SHOULD evenly distribute the
records stored among the hash table slots.
Ideally, the hash function should distribute records with equal
probability to all hash table slots. In practice, success
depends on distribution of actual records stored.
Hash Functions (2)
If we know nothing about the incoming key distribution, evenly
distribute the key range over the hash table slots while avoiding
obvious opportunities for clustering.
If we have knowledge of the incoming distribution, use a
distribution-dependent hash function.
We will demonstrate the mod hash function. To make the compuation easy (because you can probably do mod by 10 in your head easily) we will store records in an array of size 10.
We will demonstrate the Binning hash function. To make the compuation easy (because you can probably divide by 10 in your head easily) we will store records in an array of size 10.
int sascii(String x, int M) {
char ch[];
ch = x.toCharArray();
int xlength = x.length();
int i, sum;
for (sum=0, i=0; i < x.length(); i++)
sum += ch[i];
return sum % M;
}
String Folding
// Use folding on a string, summed 4 bytes at a timeintsfold(Strings,intM){longsum=0,mul=1;for(inti=0;i<s.length();i++){mul=(i%4==0)?1:mul*256;sum+=s.charAt(i)*mul;}return(int)(Math.abs(sum)%M);}
Demonstration of bucket hash for an array of size 10 storing 5 buckets, each two slots in size. The alternating gray and white cells indicate the buckets.
Demonstration of alternative bucket hash for an array of size 10 storing 5 buckets, each two slots in size. The alternating gray and white cells indicate the buckets.
Closed hashing stores all records directly in the hash table.
Each record i has a home position h(ki).
If another record occupies the home position for i, then
another slot must be found to store i.
The new slot is found by a collision resolution policy.
Search must follow the same policy to find records not in their
home slots.
Collision Resolution
During insertion, the goal of collision resolution is to find a
free slot in the table.
Probe sequence: The series of slots visited during insert/search
by following a collision resolution policy.
Let β0=h(K).
Let (β0,β1,...) be the series of slots making
up the probe sequence.
Insertion
// Insert e into hash table HT
void hashInsert(const Key& k, const Elem& e) {
int home; // Home position for e
int pos = home = h(k); // Init probe sequence
for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) {
pos = (home + p(k, i)) % M; // probe
if (k == HT[pos].key()) {
println("Duplicates not allowed");
return;
}
}
HT[pos] = e;
}
Search
// Search for the record with Key K
bool hashSearch(const Key& K, Elem& e) const {
int home; // Home position for K
int pos = home = h(K); // Initial position is the home slot
for (int i = 1;
(K != (HT[pos]).key()) && (EMPTYKEY != (HT[pos]).key());
i++)
pos = (home + p(K, i)) % M; // Next on probe sequence
if (K == (HT[pos]).key()) { // Found it
e = HT[pos];
return true;
}
else return false; // K not in hash table
}
Probe Function
Look carefully at the probe function p():
pos = (home + p(k, i)) % M; // probe
Each time p() is called, it generates a value to be added to the
home position to generate the new slot to be examined.
p() is a function both of the element’s key value, and of
the number of steps taken along the probe sequence.
Not all probe functions use both parameters.
Linear Probing (1)
Use the following probe function:
p(K, i) = i;
Linear probing simply goes to the next slot in the table.
Past bottom, wrap around to the top.
To avoid infinite loop, one slot in the table must always be empty.
The simplest collsion resolution method is called linear probing. We simply move to the right in the table from the home slot, wrapping around to the beginning if necessary.
Consider the situation where we left off in the last slide show. If at this point we wanted to insert the value 3348, we would have to probe all the way to slot 2.
Under quadratic probing, two keys with different home positions will have diverging probe sequences. Consider a value that hashes to slot 5. Its probe sequence is 5, then 5 + 1 = 6, then 5 + 4 = 9, then (5 + 9) % 10 = 4, and so on.
Let's see what happens when we use a hash table of size M = 11 (a prime number), our primary hash function is a simple mod on the table size (as usual), and our secondary hash function is h2(k) = 1 + (k % (M-1)).
Now we try the alternate second hash function. Use a hash table of size M = 16 (a power of 2), our primary hash function is a simple mod on the table size (as usual), and our secondary hash function is to do linear probing by steps of size h2(k) = (((k/M) % (M/2)) * 2) + 1.
Let's see an example of the deletion process in action. As usual, our example will use a hash table of size 10, the simple mod hash function, and collision resolution using simple linear probing.