home
table of contents
Comp Sci
October 2004
email

Dynamic hash tables

Objective:

The objective is to maintain a hash table that is dynamically, incrementally resizable such that that the cost of inserts, deletes, and changes in table size all be O(1). That is, the table size grows (and contracts) proportionately to the number of elements in the table. Using O(N) extra storage locations, where N is the maxiumum number of elements that will be present, is acceptable.

Hash value reduction

A key part of the technique is the determination of a reduced hash value. Suppose we have a raw hash value, H, typically a 32 or 64 bit integer, and that we have a table with M slots. We need to convert the raw value, H, into a reduced value, h, in the range {0,...,M-1}. One way to do this is to mask off all but the last upper(log2(M)) bits of H to get a candidate value for h. If the result is still too large (.ge. M) then we take off one more bit by subtracting 2^lower(log2(M)). For example if M is 4 we use the last two bits of H; if M is 5 we use the last three bits of H and subtract 4 if h is 5 or greater.

The point of using this scheme is that when M is increased (or decreased) by one we only have to to "rehash" elements having one specific reduced hash value, namely those having the specific value M-2^lower(log2(M)). Thus, in our example, when M is increased from 4 to 5, we need only re-examine those elements that had a reduced hash value of 0.

Changing the number of slots>

Suppose that our table is contained in an array that is sufficiently large so that we do not have to be concerned with resizing the array when the number of slots is increased. Then we can use the following technique to increase the number of slots from M to M+1, i.e., from slots {0,...,M-1} to <0,...,M}. We determine k=upper(log2(M+1)). The mask will consist of the k least significant bits being ones and the rest of the bits being zeroes. The slot that will be rehashed is M-2^(k-1). For each entry in the slot we mask its full hash with the new mask to get its new slot number. This will either be its existing slot number or else it will be the in the new slot being added. In the former case leave it where it is, and in the latter case move it to the newly added slot.

For example, suppose that M=4 and that we want to increase the number of slots to 5. Then we have

        
        k = upper(log2(5)) = 3
The new mask will be 00...00111. The new slot being added is slot number 4. We examine the contents of the slot 0 and mask each entries full with with the new mask. The masked value will either be 000 or 100. If it is 000 we leave the entry in slot 0; if it is 100 we move it to slot 4.

The double table technique

Now suppose that the containing array is not necessarily large enough to hold the table as it expands. Here is a technique for increasing the allocated table size that does not require copying the entire current table from the old array to the new array.

What we do is to keep two tables at all times, one of length 2^(k-1) and one of length 2^k. For example, when M=5, k=3 and the table lengths are 4 and 8. When we increase the number of slots we move the contents of the rehashed slot form the small table to the large table. Thus when M=5 the two tables are:

        
        small: -123
        large: 0---4---
where - indicates an empty table entry and the integers represent slots with the specified reduced hash value. When M is increased to 6 the two tables become:
        
        small: --23
        large: 01--45--
When a table becomes completely empty we can discard (free) it. If T is the total number of allocated table locations then M will vary between T/3 and 2T/3 with an average T/2, i.e., the combined tables will be half full on average.

Handling collisions

The simplest way (and perhaps the only practical way) to handle collsions is to use buckets, i.e., each slot holds a linked list of table entries, all with the same reduced hash value. The task of resolving collisions when buckets are not used is left as an exercise for the reader.

The average number of entries per slot is an implementation choice. However one per slot is a quite natural choice; the effective table size increases (and decreases) directly with changes in the number of entries.


This page was last updated October 8, 2004.

home
table of contents
Comp Sci
October 2004
email