Logo Data Structures and Algorithms with Object-Oriented Design Patterns in C++
next up previous contents index

Keys and Hash Functions

We are designing a container which will be used to hold some number of items of a given set K. In this context, we call the elements of the set K keys . The general approach is to store the keys in an array. The position of a key in the array is given by a function tex2html_wrap_inline59195, called a hash function , which determines the position of a given key directly from that key.

In the general case, we expect the size of the set of keys, |K|, to be relatively large or even unbounded. E.g., if the keys are 32-bit integers, then tex2html_wrap_inline62174. Similarly, if the keys are arbitrary character strings of arbitrary length, then |K| is unbounded.

On the other hand, we also expect that the actual number of items stored in the container to be significantly less than |K|. I.e., if n is the number of items actually stored in the container, then tex2html_wrap_inline62182. Therefore, it seems prudent to use an array of size M, where M is as least as great as the maximum number of items to be stored in the container.

Consequently, what we need is a function tex2html_wrap_inline62188. This function maps the set of values to be stored in the container to subscripts in an array of length M. This function is called a hash function .

In general, since tex2html_wrap_inline62192, the mapping defined by hash function will be a many-to-one mapping . I.e., there will exist many pairs of distinct keys x and y, such that tex2html_wrap_inline62198, for which h(x)=h(y). This situation is called a collision. Several approaches for dealing with collisions are explored in the following sections.

What are the characteristics of a good hash function?




next up previous contents index

Bruno Copyright © 1997 by Bruno R. Preiss, P.Eng. All rights reserved.