Logo Data Structures and Algorithms with Object-Oriented Design Patterns in C++
next up previous contents index

Average Case Analysis

The average case analysis of open addressing is easy if we ignore the primary clustering phenomenon. Given a scatter table of size M that contains n items, we assume that each of the tex2html_wrap_inline63244 combinations of n occupied and (m-n) empty scatter table entries is equally likely. This is the so-called uniform hashing model .

In this model we assume that the entries will either be occupied or empty, i.e., the deleted state is not used. Suppose a search for an empty cell requires exactly i probes. Then the first i-1 positions probed must have been occupied and the tex2html_wrap_inline58387 position probed was empty. Consider the i cells which were probed. The number of combinations in which i-1 of the probed cells are occupied and one is empty is tex2html_wrap_inline63260. Therefore, the probability that exactly i probes are required is

  equation14150

The average number of probes required to find an empty cell in a table which has n occupied cells is U(n) where

  equation14157

Using Equation gif into Equation gif and simplifying the result gives

   eqnarray14164

This result is actually quite intuitive. The load factor, tex2html_wrap_inline62866, is the fraction of occupied entries. Therefore, tex2html_wrap_inline63272 entries are empty so we would expect to have to probe tex2html_wrap_inline63274 entries before finding an empty one! E.g., if the load factor is 0.75, a quarter of the entries are empty. Therefore, we expect to have to probe four entries before finding an empty one.

To calculate the average number of probes for a successful search we make the observation that when an item is initially inserted, we need to find an empty cell in which to place it. E.g., the number of probes to find the empty position into which the tex2html_wrap_inline58387 item is to be placed is U(i). And this is exactly the number of probes it takes to find the tex2html_wrap_inline58387 item again! Therefore, the average number of probes required for a successful search in a table which has n occupied cells is S(n) where

  equation14177

Substituting Equation gif in Equation gif and simplifying gives

  eqnarray14186

where tex2html_wrap_inline63286 is the tex2html_wrap_inline61700 harmonic number  (see Section gif). Again, there is an easy intuitive derivation for this result. We can use a simple integral to calculate the mean number of probes for a successful search using the approximation tex2html_wrap_inline63290 as follows

eqnarray14206

Empirical evidence has shown that the formulas derived for the uniform hashing model characterize the performance of scatter tables using open addressing with quadratic probing and double hashing quite well. However, they do not capture the effect of primary clustering which occurs when linear probing is used. Knuth has shown that when primary clustering is taking into account, the number of probes required to locate an empty cell is

  equation14223

and the number of probes required for a successful search is

  equation14230

The graph in Figure gif compares the predictions of the uniform hashing model (Equations gif and gif) with the formulas derived by Knuth (Equations gif and gif). Clearly, while the results are qualitatively similar, the formulas are in agreement for small load factors and they diverge as the load factor increases.

   figure14242
Figure: Number of Probes vs. Load Factor for Uniform Hashing and Linear Probing


next up previous contents index

Bruno Copyright © 1997 by Bruno R. Preiss, P.Eng. All rights reserved.