Joining BSTs - GNU libavl 2.0.2

Occasionally we may want to take a pair of BSTs and merge or “join” their contents, forming a single BST that contains all the items in the two original BSTs. It's easy to do this with a series of calls to bst_insert(), but we can optimize the process if we write a function exclusively for the purpose. We'll write such a function in this section.

There are two restrictions on the trees to be joined. First, the BSTs' contents must be disjoint. That is, no item in one may match any item in the other. Second, the BSTs must have compatible comparison functions. Typically, they are the same. Speaking more precisely, if f() and g() are the comparison functions, p and q are nodes in either BST, and r and s are the BSTs' user-provided extra comparison parameters, then the expressions f(p, q, r), f(p, q, s), g(p, q, r), and g(p, q, s) must all have the same value for all possible choices of p and q.

Suppose we're trying to join the trees shown below:

Our first inclination is to try a “divide and conquer” approach by reducing the problem of joining a and b to the subproblems of joining a's left subtree with b's left subtree and joining a's right subtree with b's right subtree. Let us postulate for the moment that we are able to solve these subproblems and that the solutions that we come up with are the following:

To convert this partial solution into a full solution we must combine these two subtrees into a single tree and at the same time reintroduce the nodes a and b into the combined tree. It is easy enough to do this by making a (or b) the root of the combined tree with these two subtrees as its children, then inserting b (or a) into the combined tree. Unfortunately, in neither case will this actually work out properly for our example. The diagram below illustrates one possibility, the result of combining the two subtrees as the child of node 4, then inserting node 7 into the final tree. As you can see, nodes 4 and 5 are out of order:¹

Now let's step back and analyze why this attempt failed. It was essentially because, when we recombined the subtrees, a node in the combined tree's left subtree had a value larger than the root. If we trace it back to the original trees to be joined, we see that this was because node 5 in the left subtree of b is greater than a. (If we had chosen 7 as the root of the combined tree we would have found instead node 6 in the right subtree of b to be the culprit.)

On the other hand, if every node in the left subtree of a had a value less than b's value, and every node in the right subtree of a had a value greater than b's value, there would be no problem. Hey, wait a second... we can force that condition. If we perform a root insertion (see Root Insertion in a BST) of b into subtree a, then we end up with one pair of subtrees whose node values are all less than 7 (the new and former left subtrees of node 7) and one pair of subtrees whose node values are all greater than 7 (the new and former right subtrees of node 7). Conceptually it looks like this, although in reality we would need to remove node 7 from the tree on the right as we inserted it into the tree on the left:

We can then combine the two subtrees with values less than 7 with each other, and similarly for the ones with values greater than 7, using the same algorithm recursively, and safely set the resulting subtrees as the left and right subtrees of node 7, respectively. The final product is a correctly joined binary tree:

Of course, since we've defined a join recursively in terms of itself, there must be some maximum depth to the recursion, some simple case that can be defined without further recursion. This is easy: the join of an empty tree with another tree is the second tree.

Implementation

It's easy to implement this algorithm recursively. The only nonobvious part of the code below is the treatment of node b. We want to insert node b, but not b's children, into the subtree rooted at a. However, we still need to keep track of b's children. So we temporarily save b's children as b0 and b1 and set its child pointers to NULL before the root insertion.

This code makes use of root_insert() from <Robust root insertion of existing node in arbitrary subtree 625>.

96. <BST join function, recursive version 96> =
/* Joins a and b, which are subtrees of tree, 
   and returns the resulting tree. */
static struct bst_node *
join (struct bst_table *tree, struct bst_node *a, struct bst_node *b) 
{
  if (b == NULL)
    return a;
  else if (a == NULL)
    return b;
  else 
    {
      struct bst_node *b0 = b->bst_link[0];
      struct bst_node *b1 = b->bst_link[1];
      b->bst_link[0] = b->bst_link[1] = NULL;
      root_insert (tree, &a, b);
      a->bst_link[0] = join (tree, b0, a->bst_link[0]);
      a->bst_link[1] = join (tree, b1, a->bst_link[1]);
      return a;
    }
}

/* Joins a and b, which must be disjoint and have compatible 
   comparison functions.
   b is destroyed in the process. */
void 
bst_join (struct bst_table *a, struct bst_table *b) 
{
  a->bst_root = join (a, a->bst_root, b->bst_root);
  a->bst_count += b->bst_count;
  free (b);
}

See also: [Sedgewick 1998], program 12.16.

Exercises:

1. Rewrite bst_join() to avoid use of recursion. [answer]

Footnotes

[1] The ** notation in the diagram emphasizes that this is a counterexample.