Data Structures and Algorithms with Object-Oriented Design Patterns in Java

Average Case Analysis

The average case analysis of open addressing is easy if we ignore the primary clustering phenomenon. Given a scatter table of size M that contains n items, we assume that each of the $\binom{M}{n}$ combinations of n occupied and (M-n) empty scatter table entries is equally likely. This is the so-called uniform hashing model.

In this model we assume that the entries will either be occupied or empty, i.e., the deleted state is not used. Suppose a search for an empty cell requires exactly i probes. Then the first i-1 positions probed must have been occupied and the i-th position probed was empty. Consider the i cells which were probed. The remaining n-i+1 items must lie among the M-i cells that were not probed, so the number of combinations in which i-1 of the probed cells are occupied and one is empty is $\binom{M-i}{n-i+1}$. Therefore, the probability that exactly i probes are required is

$$P(i) = \frac{\binom{M-i}{n-i+1}}{\binom{M}{n}}$$
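As a small check, consider a hypothetical table with M = 4 and n = 2 (values chosen only for illustration), taking the probability of exactly i probes to be $\binom{M-i}{n-i+1}/\binom{M}{n}$. There are $\binom{4}{2} = 6$ equally likely occupancy patterns, and at most n+1 = 3 probes can be needed:

$$P(1) = \frac{\binom{3}{2}}{\binom{4}{2}} = \frac{3}{6}, \qquad P(2) = \frac{\binom{2}{1}}{\binom{4}{2}} = \frac{2}{6}, \qquad P(3) = \frac{\binom{1}{0}}{\binom{4}{2}} = \frac{1}{6}$$

The three probabilities sum to 1, as they should, and the expected number of probes is $1\cdot\frac{3}{6} + 2\cdot\frac{2}{6} + 3\cdot\frac{1}{6} = \frac{5}{3}$.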

The average number of probes required to find an empty cell in a table which has n occupied cells is U(n) where

$$U(n) = \sum_{i=1}^{n+1} i\,P(i)$$

Substituting the probability that exactly i probes are required into the sum for U(n) and simplifying the result gives

$$U(n) = \frac{M+1}{M-n+1} \approx \frac{1}{1-\lambda}, \qquad \text{where } \lambda = n/M$$
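The simplification can be checked numerically. The following is a minimal sketch, assuming the probability of exactly i probes is $\binom{M-i}{n-i+1}/\binom{M}{n}$ and the closed form is (M+1)/(M-n+1). The class and method names are hypothetical and do not come from the text.

```java
import java.math.BigInteger;

// Hypothetical helper class; names are illustrative, not from the text.
final class UniformHashingProbes {
    // Binomial coefficient C(n, k), computed exactly with BigInteger.
    static BigInteger binomial(int n, int k) {
        if (k < 0 || k > n) return BigInteger.ZERO;
        BigInteger result = BigInteger.ONE;
        for (int j = 1; j <= k; j++) {
            // After step j the running value is C(n-k+j, j), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + j))
                           .divide(BigInteger.valueOf(j));
        }
        return result;
    }

    // P(i): probability that exactly i probes are needed to find an empty cell.
    // Intended for small tables, where the binomials still fit in a double.
    static double probability(int M, int n, int i) {
        return binomial(M - i, n - i + 1).doubleValue() / binomial(M, n).doubleValue();
    }

    // U(n) as the explicit sum over i = 1 .. n+1 of i * P(i); assumes n < M.
    static double unsuccessfulBySum(int M, int n) {
        double sum = 0.0;
        for (int i = 1; i <= n + 1; i++) {
            sum += i * probability(M, n, i);
        }
        return sum;
    }

    // U(n) in closed form.
    static double unsuccessfulClosedForm(int M, int n) {
        return (M + 1.0) / (M - n + 1.0);
    }

    public static void main(String[] args) {
        int M = 100, n = 75; // illustrative sizes
        System.out.println("sum:         " + unsuccessfulBySum(M, n));
        System.out.println("closed form: " + unsuccessfulClosedForm(M, n));
    }
}
```

For a table that is not full, the two methods should agree up to floating-point rounding.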

This result is actually quite intuitive. The load factor, $\lambda = n/M$, is the fraction of occupied entries. Therefore, a fraction $1-\lambda$ of the entries are empty so we would expect to have to probe $1/(1-\lambda)$ entries before finding an empty one! For example, if the load factor is 0.75, a quarter of the entries are empty. Therefore, we expect to have to probe four entries before finding an empty one.
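The intuition can also be tested with a small Monte Carlo experiment. The sketch below assumes the uniform hashing model directly: each trial draws a uniformly random occupancy pattern and counts probes until an empty cell is found. The class name, parameters, and seed are hypothetical choices made for illustration.

```java
import java.util.Random;

// Hypothetical simulation of the uniform hashing model; not code from the text.
final class UniformHashingSimulation {
    // Estimates the mean number of probes needed to find an empty cell
    // in a table of size M holding n items (requires n < M).
    static double averageProbes(int M, int n, int trials, long seed) {
        if (n >= M) throw new IllegalArgumentException("table must have an empty cell");
        Random random = new Random(seed);
        boolean[] occupied = new boolean[M];
        long totalProbes = 0;
        for (int t = 0; t < trials; t++) {
            // Mark the first n cells occupied, then shuffle (Fisher-Yates) so that
            // every combination of n occupied cells is equally likely.
            for (int k = 0; k < M; k++) occupied[k] = k < n;
            for (int k = M - 1; k > 0; k--) {
                int j = random.nextInt(k + 1);
                boolean tmp = occupied[k];
                occupied[k] = occupied[j];
                occupied[j] = tmp;
            }
            // Since the pattern is uniformly random, probing cells 0, 1, 2, ...
            // is statistically equivalent to any sequence of distinct cells.
            int probes = 1;
            while (occupied[probes - 1]) probes++;
            totalProbes += probes;
        }
        return (double) totalProbes / trials;
    }

    public static void main(String[] args) {
        int M = 100, n = 75; // load factor 0.75
        System.out.println("simulated: " + averageProbes(M, n, 200_000, 42L));
        System.out.println("predicted: " + (M + 1.0) / (M - n + 1.0));
    }
}
```

With a load factor of 0.75 the estimate should land close to four probes, slightly below it because the exact value (M+1)/(M-n+1) is a little less than $1/(1-\lambda)$ for a finite table.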

To calculate the average number of probes for a successful search we make the observation that when an item is initially inserted, we need to find an empty cell in which to place it. For example, numbering the items from zero so that i cells are already occupied when the i-th item arrives, the number of probes to find the empty position into which the i-th item is to be placed is U(i). And this is exactly the number of probes it takes to find the i-th item again! Therefore, the average number of probes required for a successful search in a table which has n occupied cells is S(n) where

$$S(n) = \frac{1}{n}\sum_{i=0}^{n-1} U(i)$$

Substituting the closed form for U(i) into this average and simplifying gives

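$$\begin{aligned}
S(n) &= \frac{1}{n}\sum_{i=0}^{n-1}\frac{M+1}{M-i+1} \\
     &= \frac{M+1}{n}\left(H_{M+1} - H_{M-n+1}\right) \\
     &\approx \frac{1}{\lambda}\ln\frac{1}{1-\lambda}
\end{aligned}$$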

where $H_k$ is the k-th harmonic number (see the section on harmonic numbers). Again, there is an easy intuitive derivation for this result. We can use a simple integral to calculate the mean number of probes for a successful search using the approximation $U(n) \approx 1/(1-\lambda)$ as follows

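$$\begin{aligned}
S(n) &\approx \frac{1}{\lambda}\int_0^{\lambda}\frac{1}{1-x}\,dx \\
     &= \frac{1}{\lambda}\ln\frac{1}{1-\lambda}
\end{aligned}$$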
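The three routes to S(n) can be compared numerically. This is a sketch under the assumption that U(i) = (M+1)/(M-i+1) and that S(n) is the average of U(0) through U(n-1). The class and method names are hypothetical.

```java
// Hypothetical helper class; names are illustrative, not from the text.
final class SuccessfulSearchProbes {
    // U(i): expected probes to find an empty cell when i cells are occupied.
    static double unsuccessful(int M, int i) {
        return (M + 1.0) / (M - i + 1.0);
    }

    // S(n) computed directly as the average of U(0), ..., U(n-1).
    static double byAveraging(int M, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += unsuccessful(M, i);
        return sum / n;
    }

    // k-th harmonic number H_k = 1 + 1/2 + ... + 1/k.
    static double harmonic(int k) {
        double h = 0.0;
        for (int j = 1; j <= k; j++) h += 1.0 / j;
        return h;
    }

    // S(n) expressed with harmonic numbers.
    static double byHarmonicNumbers(int M, int n) {
        return (M + 1.0) / n * (harmonic(M + 1) - harmonic(M - n + 1));
    }

    // The integral approximation (1/lambda) * ln(1/(1 - lambda)).
    static double logApproximation(double lambda) {
        return Math.log(1.0 / (1.0 - lambda)) / lambda;
    }

    public static void main(String[] args) {
        int M = 1000, n = 750; // illustrative sizes, load factor 0.75
        System.out.println("average of U(i):  " + byAveraging(M, n));
        System.out.println("harmonic numbers: " + byHarmonicNumbers(M, n));
        System.out.println("log approximation: " + logApproximation((double) n / M));
    }
}
```

The first two values should match up to rounding, while the logarithmic approximation should be close but not identical, since it replaces the discrete sum by an integral.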

Empirical evidence has shown that the formulas derived for the uniform hashing model characterize the performance of scatter tables using open addressing with quadratic probing and double hashing quite well. However, they do not capture the effect of primary clustering, which occurs when linear probing is used. Knuth has shown that when primary clustering is taken into account, the number of probes required to locate an empty cell is

$$U(n) \approx \frac{1}{2}\left(1 + \left(\frac{1}{1-\lambda}\right)^{2}\right)$$

and the number of probes required for a successful search is

$$S(n) \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$$
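The effect of primary clustering on successful searches can be observed in a simple experiment. The sketch below assumes a bare table of flags, with uniformly random home positions standing in for hash values. It measures the average number of probes per insertion under linear probing, which equals the average cost of finding each item again. All names, sizes, and the seed are hypothetical.

```java
import java.util.Random;

// Hypothetical linear-probing experiment; not code from the text.
final class LinearProbingSimulation {
    // Inserts n items into a table of size M using linear probing and returns
    // the average number of probes per item (requires n <= M).
    static double averageSuccessfulProbes(int M, int n, long seed) {
        if (n > M) throw new IllegalArgumentException("more items than cells");
        Random random = new Random(seed);
        boolean[] occupied = new boolean[M];
        long totalProbes = 0;
        for (int item = 0; item < n; item++) {
            int position = random.nextInt(M); // stand-in for a hash value
            int probes = 1;
            while (occupied[position]) {
                position = (position + 1) % M; // step to the next cell, wrapping around
                probes++;
            }
            occupied[position] = true;
            totalProbes += probes;
        }
        return (double) totalProbes / n;
    }

    public static void main(String[] args) {
        int M = 65536, n = 49152; // load factor 0.75
        double lambda = (double) n / M;
        System.out.println("simulated:       " + averageSuccessfulProbes(M, n, 1L));
        System.out.println("linear probing:  " + 0.5 * (1.0 + 1.0 / (1.0 - lambda)));
        System.out.println("uniform hashing: " + Math.log(1.0 / (1.0 - lambda)) / lambda);
    }
}
```

At a load factor of 0.75 the linear-probing estimate for a successful search, $\frac{1}{2}(1 + 1/(1-\lambda))$, is 2.5 probes, and the simulated average should fall near that value rather than near the smaller uniform hashing prediction.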

The graph in the figure below compares the predictions of the uniform hashing model (the expressions for U(n) and S(n) derived above) with the formulas derived by Knuth for linear probing. The results are qualitatively similar: the formulas agree closely for small load factors, but they diverge as the load factor increases.

Figure: Number of probes vs. load factor for uniform hashing and linear probing.
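The comparison can be tabulated for a few load factors. This sketch assumes the approximations $1/(1-\lambda)$ and $(1/\lambda)\ln(1/(1-\lambda))$ for uniform hashing, and $\frac{1}{2}(1 + (1/(1-\lambda))^2)$ and $\frac{1}{2}(1 + 1/(1-\lambda))$ for linear probing. The class and method names are hypothetical.

```java
// Hypothetical helper class tabulating the four approximations; not from the text.
final class ProbeModels {
    // Uniform hashing: probes to find an empty cell.
    static double uniformUnsuccessful(double lambda) {
        return 1.0 / (1.0 - lambda);
    }

    // Uniform hashing: probes for a successful search (lambda > 0).
    static double uniformSuccessful(double lambda) {
        return Math.log(1.0 / (1.0 - lambda)) / lambda;
    }

    // Linear probing: probes to find an empty cell.
    static double linearUnsuccessful(double lambda) {
        double r = 1.0 / (1.0 - lambda);
        return 0.5 * (1.0 + r * r);
    }

    // Linear probing: probes for a successful search.
    static double linearSuccessful(double lambda) {
        return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
    }

    public static void main(String[] args) {
        System.out.println("lambda  U(uniform)  U(linear)  S(uniform)  S(linear)");
        for (double lambda : new double[] {0.10, 0.25, 0.50, 0.75, 0.90}) {
            System.out.printf("%6.2f  %10.3f  %9.3f  %10.3f  %9.3f%n",
                    lambda,
                    uniformUnsuccessful(lambda), linearUnsuccessful(lambda),
                    uniformSuccessful(lambda), linearSuccessful(lambda));
        }
    }
}
```

At a load factor of 0.75, for instance, the two models predict 4 and 8.5 probes to find an empty cell, whereas at 0.1 the predictions differ by less than one hundredth of a probe.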



Copyright © 1998 by Bruno R. Preiss, P.Eng. All rights reserved.