We are designing a container which will be used
to hold some number of items of a given set K.
In this context,
we call the elements of the set K keys .
The general approach is to store the keys in an array.
The position of a key in the array is given by a function ,
called a hash function ,
which determines the position of a given key directly from that key.
In the general case,
we expect the size of the set of keys, |K|,
to be relatively large or even unbounded.
E.g., if the keys are 32-bit integers, then .
Similarly, if the keys are arbitrary character strings
of arbitrary length, then |K| is unbounded.
On the other hand, we also expect that the actual number of items
stored in the container to be significantly less than |K|.
I.e., if n is the number of items actually
stored in the container, then .
Therefore, it seems prudent to use an array of size M,
where M is as least as great as the maximum number of items to be stored
in the container.
Consequently, what we need is a function .
This function maps the set of values to be stored in the container
to subscripts in an array of length M.
This function is called a hash function .
In general, since ,
the mapping defined by hash function will be a
many-to-one mapping .
I.e., there will exist many pairs of distinct keys x and y,
such that
,
for which h(x)=h(y).
This situation is called a collision.
Several approaches for dealing with collisions are explored
in the following sections.
What are the characteristics of a good hash function?