Implementation of a perfect hashing data structure. We say a hash function is perfect for S if all lookups involve O(1) work. In section 2, background about universal hashing is provided. Sections 3 and 4 describe two methods for constructing perfect hash functions for a given set S. It's required to design, analyze and implement a perfect hash table as described in sections 3 and 4.
A probability distribution H over hash functions from U to {1, ..., M} is universal if for all x=y in U, we have
Pr[h(x) = h(y)] ≤ 1/M
If H is universal, then for any set S ⊂ U, for any x ∈ U (that we might want to insert or lookup), for a random h taken from H, the expected number of collisions between x and other elements in S is at most N/M.
Let’s say keys are u-bits long. Say the table size M is power of 2, so an index is b-bits long with M = 2b. What we’ll do is pick h to be a random b-by-u 0/1 matrix, and define h(x)=hx, where we do addition mod 2. For instance:
We can show that for x = y, Pr[h(x) = h(y)] = 1/M = 1/2b
Say we are willing to have a table whose size is quadratic in the size N of our dictionary S. Then, here is an easy method. Let H be universal and M = N2. Pick a random h from H and try it out, hashing everything in S. So, we just try it, and if we got any collisions, we just try a new h. On average, we will only need to do this twice.
The main idea for this method is to use universal hash functions in a 2-level scheme. The method is as follows. We will first hash into a table of size N using universal hashing. This will produce some collisions. However, we will then rehash each bin using Method 1, squaring the size of the bin to get zero collisions. So, the way to think of this scheme is that we have a first-level hash function h and first-level table A, and then N second-level hash functions h1, ..., hN and N second-level tables A1, ..., AN. To look up an element x, we first compute i = h(x) and then find the element in Ai[hi(x)].
As an application based on the perfect hashing implementation, it's required to implement a simple English dictionary supporting the following functionalities:
- Initialize (constructor): Takes the name of the type of the backend perfect hashing as an input and creates a new empty dictionary based on it.
- Insert: Takes a single string key and tries to insert it.
- Delete: Takes a single string key and tries to delete it.
- Search: Takes a single string key, searches for it, and returns true if it exists and false otherwise.
- Batch insert: Takes a path to a text file containing multiple words each in a separate line. And tries to insert all that words into the dictionary.
- Batch delete: Takes a path to a text file containing multiple words each in a separate line. And tries to delete all that words from the dictionary.
Implement a command line interface that will enable us to deal with the dictionary and apply all its implemented operations. This interface must take the type of hash table as an initial input then create a dictionary based on it and allow the user to apply subsequent operations on it from the following list:
- Insert a string and prints a confirmation message or an error one if the the string already exists in the dictionary.
- Delete a string and prints a confirmation message or an error one if the the string doesn’t exist in the dictionary.
size | Hash-O(N) | Hash-O(N2) |
---|---|---|
10 | 1 | 0 |
100 | 41 | 0 |
1,000 | 468 | 0 |
10,000 | 3292 | 0 |
20,000 | 6388 | 0 |
size | Hash-O(N) | Hash-O(N2) |
---|---|---|
10 | 14 | 128 |
100 | 185 | 16,384 |
1,000 | 1,965 | 1,048,576 |
10,000 | 16,288 | 134,217,728 |
20,000 | 33,349 | 536,870,912 |
30,000 | 57,782 | 1,073,741,824 |