Cover Data Structures and Algorithms with Object-Oriented Design Patterns in Java
next up previous contents index

Average Case Analysis

 

The previous section has shown that in the worst case, the running time to insert an object into a separately chained hash table is O(1), and the time to find or delete an object is O(n). But these bounds are no better than the same operations on plain lists! Why have we gone to all the trouble inventing hash tables?

The answer lies not in the worst-case performance, but in the average expected performance. Suppose we have a hash table of size M. Let there be exactly n items in the hash table. We call the quantity tex2html_wrap_inline61748 the load factor   . The load factor is simply the ratio of the number of items in the hash table to the array length.

Program gif gives the implementation for the getLoadFactor method of the AbstractHashTable class. This method computes tex2html_wrap_inline61748 by calling the getCount method to determine n and the getLength method to determine M.

   program11262
Program: AbstractHashTable class getLoadFactor method.

Consider a chained hash table. Let tex2html_wrap_inline61758 be the number of items in the tex2html_wrap_inline57340 linked list, for tex2html_wrap_inline61762. The average length of a linked list is

eqnarray11272

The average length of a linked list is exactly the load factor!

If we are given the load factor tex2html_wrap_inline61750, we can determine the average running times for the various operations. The average running time of insert is the same as its worst case time, O(1)--this result does not depend on tex2html_wrap_inline61750. On the other hand, the average running time for withdraw does depend on tex2html_wrap_inline61750. It is tex2html_wrap_inline61772 since the time required to delete an item from a linked list of length tex2html_wrap_inline61750 is tex2html_wrap_inline61776.

To determine the average running time for the find operation, we need to make an assumption about whether the item that is being sought is in the table. If the item is not found in the table, the search is said to be unsuccessful. The average running time for an unsuccessful search is

displaymath61734

On the other hand, if the search target is in the table, the search is said to be successful. The average number of comparisons needed to find an arbitrary item in a linked list of length tex2html_wrap_inline61750 is

displaymath61735

Thus, the average running time for a successful search is

displaymath61736

So, while any one search operation can be as bad as O(n), if we do a large number of random searches, we expect that the average running time will be tex2html_wrap_inline61776. In fact, if we have a sufficiently good hash function and a reasonable set of objects in the container, we can expect that those objects are distributed throughout the table. Therefore, any one search operation will not be very much worse than the worst case.

Finally, if we know how many objects will be inserted into the hash table a priori, then we can choose a table size M which is larger than the maximum number of items expected. By doing this, we can ensure that tex2html_wrap_inline61786. That is, a linked list contains no more than one item on average. In this case, the average time for withdraw is tex2html_wrap_inline61718 and for find it is tex2html_wrap_inline61790.


next up previous contents index

Bruno Copyright © 1998 by Bruno R. Preiss, P.Eng. All rights reserved.