Average Case Analysis

Data Structures and Algorithms with Object-Oriented Design Patterns in C++

Average Case Analysis

The average case analysis of open addressing is easy if we ignore the primary clustering phenomenon. Given a scatter table of size M that contains n items, we assume that each of the combinations of n occupied and (m-n) empty scatter table entries is equally likely. This is the so-called uniform hashing model .

In this model we assume that the entries will either be occupied or empty, i.e., the deleted state is not used. Suppose a search for an empty cell requires exactly i probes. Then the first i-1 positions probed must have been occupied and the position probed was empty. Consider the i cells which were probed. The number of combinations in which i-1 of the probed cells are occupied and one is empty is . Therefore, the probability that exactly i probes are required is

The average number of probes required to find an empty cell in a table which has n occupied cells is U(n) where

Using Equation into Equation and simplifying the result gives

eqnarray14164

This result is actually quite intuitive. The load factor, , is the fraction of occupied entries. Therefore, entries are empty so we would expect to have to probe entries before finding an empty one! E.g., if the load factor is 0.75, a quarter of the entries are empty. Therefore, we expect to have to probe four entries before finding an empty one.

To calculate the average number of probes for a successful search we make the observation that when an item is initially inserted, we need to find an empty cell in which to place it. E.g., the number of probes to find the empty position into which the item is to be placed is U(i). And this is exactly the number of probes it takes to find the item again! Therefore, the average number of probes required for a successful search in a table which has n occupied cells is S(n) where

Substituting Equation in Equation and simplifying gives

eqnarray14186

where is the harmonic number (see Section ). Again, there is an easy intuitive derivation for this result. We can use a simple integral to calculate the mean number of probes for a successful search using the approximation as follows

eqnarray14206

Empirical evidence has shown that the formulas derived for the uniform hashing model characterize the performance of scatter tables using open addressing with quadratic probing and double hashing quite well. However, they do not capture the effect of primary clustering which occurs when linear probing is used. Knuth has shown that when primary clustering is taking into account, the number of probes required to locate an empty cell is

and the number of probes required for a successful search is

The graph in Figure compares the predictions of the uniform hashing model (Equations and ) with the formulas derived by Knuth (Equations and ). Clearly, while the results are qualitatively similar, the formulas are in agreement for small load factors and they diverge as the load factor increases.

Figure: Number of Probes vs. Load Factor for Uniform Hashing and Linear Probing