Logo Data Structures and Algorithms with Object-Oriented Design Patterns in C++
next up previous contents index

AVL Search Trees

 

The problem with binary search trees is that while the average running times for search, insertion, and withdrawal operations are all tex2html_wrap_inline59891, any one operation is still O(n) in the worst case. This is so because we cannot say anything in general about the shape of the tree.

For example, consider the two binary search trees shown Figure gif. Both trees contain the same set of keys. The tree tex2html_wrap_inline63424 is obtained by starting with an empty tree and inserting the keys in the following order

displaymath64856

The tree tex2html_wrap_inline63426 is obtained by starting with an empty tree and inserting the keys in this order

displaymath64857

Clearly, tex2html_wrap_inline63426 is a better tree search tree than tex2html_wrap_inline63424. In fact, since tex2html_wrap_inline63426 is a perfect binary tree , its height is tex2html_wrap_inline64882. Therefore, all three operations, search, insertion, and withdrawal, have the same worst case asymptotic running time, tex2html_wrap_inline59891.

The reason that tex2html_wrap_inline63426 is better than tex2html_wrap_inline63424 is that it is the more balanced tree. If we could ensure that the search trees we construct are balanced then the worst-case running time of search, insertion, and withdrawal, could be made logarithmic rather than linear. But under what conditions is a tree balanced?

If we say that a binary tree is balanced if the left and right subtrees of every node have the same height, then the only trees which are balanced are the perfect binary trees. A perfect binary tree of height h has exactly tex2html_wrap_inline63812 internal nodes. Therefore, it is only possible to create perfect trees with n nodes for tex2html_wrap_inline64896. Clearly, this is an unsuitable balance condition because it is not possible to create a balanced tree for every n.

What are the characteristics of a good balance condition ?

  1. A good balance condition ensures that the height of a tree with n nodes is tex2html_wrap_inline59891.
  2. A good balance condition can be maintained efficiently. I.e., the additional work necessary to balance the tree when an item is inserted or deleted is O(1).
Adelson-Velskii and Landisgif were the first to propose the following balance condition and show that it has the desired characteristics.

Definition (AVL Balance Condition)  An empty binary tree is AVL balanced  . A non-empty binary tree, tex2html_wrap_inline63788, is AVL balanced if both tex2html_wrap_inline63784 and tex2html_wrap_inline63786 are AVL balanced and

displaymath64858

where tex2html_wrap_inline64912 is the height of tex2html_wrap_inline63784 and tex2html_wrap_inline64916 is the height of tex2html_wrap_inline63786.

Clearly, all perfect binary trees are AVL balanced. What is not so clear is that heights of all trees that satisfy the AVL balance condition are logarithmic in the number of internal nodes.

Theorem  The height, h, of an AVL balanced tree with n internal nodes satisfies

displaymath64859

extbfProof The lower bound follows directly from Theorem gif. It is in fact true for all binary trees regardless of whether they are AVL balanced.

To determine the upper bound, we turn the problem around and ask the question, what is the minimum number of internal nodes in an AVL balanced tree of height h?

Let tex2html_wrap_inline64926 represent an AVL balanced tree of height h which has the smallest possible number of internal nodes, say tex2html_wrap_inline64930. Clearly, tex2html_wrap_inline64926 must have at least one subtree of height h-1 and that subtree must be tex2html_wrap_inline64936. To remain AVL balanced, the other subtree can have height h-1 or h-2. Since we want the smallest number of internal nodes, it must be tex2html_wrap_inline64942. Therefore, the number of internal nodes in tex2html_wrap_inline64926 is tex2html_wrap_inline64946, where tex2html_wrap_inline64948.

Clearly, tex2html_wrap_inline63638 contains a single internal node, so tex2html_wrap_inline64952. Similarly, tex2html_wrap_inline63404 contains exactly two nodes, so tex2html_wrap_inline64956. Thus, tex2html_wrap_inline64930 is given by the recurrence

  equation19987

The remarkable thing about Equation gif is its similarity with the definition of Fibonacci numbers  (Equation gif). In fact, it can easily be shown by induction that

displaymath64860

for all tex2html_wrap_inline63700, where tex2html_wrap_inline64962 is the tex2html_wrap_inline61700 Fibonacci number.

Base Cases

eqnarray20001

Inductive Hypothesis Assume that tex2html_wrap_inline64966 for tex2html_wrap_inline63710. Then

eqnarray20005

Therefore, by induction on k, tex2html_wrap_inline64966, for all tex2html_wrap_inline63700.

According to Theorem gif, the Fibonacci numbers are given by

displaymath62416

where tex2html_wrap_inline60483 and tex2html_wrap_inline60485. Furthermore, since tex2html_wrap_inline64980, tex2html_wrap_inline64982.

Therefore,

eqnarray20025

This completes the proof of the upper bound.

So, we have shown that the AVL balance condition satisfies the first criterion of a good balance condition--the height of an AVL balanced tree with n internal nodes is tex2html_wrap_inline64986. What remains to be shown is that the balance condition can be efficiently maintained. To see that it can, we need to look at an implementation.




next up previous contents index

Bruno Copyright © 1997 by Bruno R. Preiss, P.Eng. All rights reserved.