We are designing a container which will be used to hold some number of items of a given set K. In this context, we call the elements of the set K keys . The general approach is to store the keys in an array. The position of a key in the array is given by a function , called a hash function , which determines the position of a given key directly from that key.
In the general case, we expect the size of the set of keys, |K|, to be relatively large or even unbounded. For example, if the keys are 32-bit integers, then . Similarly, if the keys are arbitrary character strings of arbitrary length, then |K| is unbounded.
On the other hand, we also expect that the actual number of items stored in the container to be significantly less than |K|. That is, if n is the number of items actually stored in the container, then . Therefore, it seems prudent to use an array of size M, where M is as least as great as the maximum number of items to be stored in the container.
Consequently, what we need is a function . This function maps the set of values to be stored in the container to subscripts in an array of length M. This function is called a hash function .
In general, since , the mapping defined by hash function will be a many-to-one mapping . That is, there will exist many pairs of distinct keys x and y, such that , for which h(x)=h(y). This situation is called a collision. Several approaches for dealing with collisions are explored in the following sections.
What are the characteristics of a good hash function?