Loading [MathJax]/jax/output/HTML-CSS/jax.js

8.Hashing (1)§

Hashing: The process of mapping a key value to a position in a table.

A hash function maps key values to positions. It is denoted by h.

A hash table is an array that holds the records. It is denoted by HT.

HT has M slots, indexed form 0 to M1.

Hashing (2)

For any value K in the key range and some hash function h, h(K)=i, 0<=i<M, such that key(HT[i]) =K.

Hashing is appropriate only for sets (no duplicates).

Good for both in-memory and disk-based applications.

Answers the question “What record, if any, has key value K?”

Simple Examples

1 / 10 Settings
<<<>>>

We will demonstrate the simplest hash function, storing records in an array of size 10.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Collisions (1)

Collisions (2)

Hash Functions (1)

Hash Functions (2)

Simple Mod Function

int h(int x) {
  return x % 16;
}
1 / 10 Settings
<<<>>>

We will demonstrate the mod hash function. To make the compuation easy (because you can probably do mod by 10 in your head easily) we will store records in an array of size 10.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Binning

1 / 8 Settings
<<<>>>

We will demonstrate the Binning hash function. To make the compuation easy (because you can probably divide by 10 in your head easily) we will store records in an array of size 10.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Mod vs. Binning

Binning vs. Mod Function

Mid-Square Method

Mid-square method example

Strings Function: Character Adding

int sascii(String x, int M) {
  char ch[];
  ch = x.toCharArray();
  int xlength = x.length();

  int i, sum;
  for (sum=0, i=0; i < x.length(); i++)
    sum += ch[i];
  return sum % M;
}

String Folding

// Use folding on a string, summed 4 bytes at a time
int sfold(String s, int M) {
  long sum = 0, mul = 1;
  for (int i = 0; i < s.length(); i++) {
    mul = (i % 4 == 0) ? 1 : mul * 256;
    sum += s.charAt(i) * mul;
  }
  return (int)(Math.abs(sum) % M);
}

Open Hashing

Created with Raphaël 2.1.2
  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Created with Raphaël 2.1.2
100
930
313
Created with Raphaël 2.1.2
977
207
157
979

Bucket Hashing (1)

1 / 27 Settings
<<<>>>

Demonstration of bucket hash for an array of size 10 storing 5 buckets, each two slots in size. The alternating gray and white cells indicate the buckets.

Created with Raphaël 2.1.2
  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
B0
B1
B2
B3
B4
Proficient Saving... Error Saving
Server Error
Resubmit

Bucket Hashing (2)

1 / 21 Settings
<<<>>>

Demonstration of alternative bucket hash for an array of size 10 storing 5 buckets, each two slots in size. The alternating gray and white cells indicate the buckets.

Created with Raphaël 2.1.2
  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
B0
B1
B2
B3
B4
Proficient Saving... Error Saving
Server Error
Resubmit

Closed Hashing

Collision Resolution

Insertion

// Insert e into hash table HT
void hashInsert(const Key& k, const Elem& e) {
  int home;                     // Home position for e
  int pos = home = h(k);        // Init probe sequence
  for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) {
    pos = (home + p(k, i)) % M; // probe
    if (k == HT[pos].key()) {
      println("Duplicates not allowed");
      return;
    }
  }
  HT[pos] = e;
}

Probe Function

Linear Probing (1)

Linear Probing (2)

1 / 11 Settings
<<<>>>

The simplest collsion resolution method is called linear probing. We simply move to the right in the table from the home slot, wrapping around to the beginning if necessary.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Problem with Linear Probing

1 / 7 Settings
<<<>>>

Consider the situation where we left off in the last slide show. If at this point we wanted to insert the value 3348, we would have to probe all the way to slot 2.

  1. 90500
  2. 72001
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 98777
  9. 20378
  10. 10599
Proficient Saving... Error Saving
Server Error
Resubmit

Linear Probing by Steps (1)

1 / 10 Settings
<<<>>>

When doing collision resolution with linear probing by steps of size 2 on a hash table of size 10, a record that hashes to slot 4...

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Linear Probing by Steps (2)

1 / 11 Settings
<<<>>>

When doing collision resolution with linear probing by steps of size 3 on a hash table of size 10, a record that hashes to slot 4...

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Pseudo-Random Probing (1)

1 / 14 Settings
<<<>>>

Let's see an example of collision resolution using pseudorandom probing on a hash table of size 10 using the simple mod hash function.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Pseudo-Random Probing (2)

1 / 11 Settings
<<<>>>

First recall what happens with linear probing by steps of 2. Say that one record hashes to slot 4, and another hashes to slot 6.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 1044
  6. 5
  7. 9366
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Quadratic Probing

1 / 3 Settings
<<<>>>

Under quadratic probing, two keys with different home positions will have diverging probe sequences. Consider a value that hashes to slot 5. Its probe sequence is 5, then 5 + 1 = 6, then 5 + 4 = 9, then (5 + 9) % 10 = 4, and so on.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

1 / 3 Settings
<<<>>>

Unfortunately, quadratic probing has the disadvantage that typically not all hash table slots will be on the probe sequence.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Double Hashing (1)

1 / 9 Settings
<<<>>>

Let's see what happens when we use a hash table of size M = 11 (a prime number), our primary hash function is a simple mod on the table size (as usual), and our secondary hash function is h2(k) = 1 + (k % (M-1)).

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
  11. 10
h2(k) = 1 + (k % (M-1))
Proficient Saving... Error Saving
Server Error
Resubmit

Double Hashing (2)

1 / 7 Settings
<<<>>>

Now we try the alternate second hash function. Use a hash table of size M = 16 (a power of 2), our primary hash function is a simple mod on the table size (as usual), and our secondary hash function is to do linear probing by steps of size h2(k) = (((k/M) % (M/2)) * 2) + 1.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
  11. 10
  12. 11
  13. 12
  14. 13
  15. 14
  16. 15
h2(k) = (((k/M) % (M/2)) * 2) + 1
Proficient Saving... Error Saving
Server Error
Resubmit

Analysis of Closed Hashing

The load factor is α=N/M where N is the number of records currently in the table.

Hashing analysis plot

Deletion

Tombstones (1)

1 / 16 Settings
<<<>>>

Let's see an example of the deletion process in action. As usual, our example will use a hash table of size 10, the simple mod hash function, and collision resolution using simple linear probing.

  1. 0
  2. 1
  3. 2
  4. 3
  5. 4
  6. 5
  7. 6
  8. 7
  9. 8
  10. 9
Proficient Saving... Error Saving
Server Error
Resubmit

Tombstones (2)