Optimal Binary Search Trees

Posted on March 19, 2015

We are given a seqence \(K = \langle k_1, \cdots, k_n \rangle\) of \(n\) distinct keys in sorted order and we wish to build a binary search tree out of these keys. For each key, \(k_i\), we have a probability \(p_i\) that the search will be for \(k_i\). So we have the following inequality (we assume that there are searches for members other than that in the given sequence).

\[ \sum p_i \le 1\]

The expected cost of a tree \(T\) can thus be defined as

\[ E[\text{search cost in } T] = \sum(\text{depth}_T(k_i)+1) + \sum (\text{depth}_T(k_i)+1).\]

For a given set of probabilities, we wish to construct a binary search tree whose expected search time is the lowest. We call such a tree as optimal binary search tree.

Recursive relation and Dynamic algorithm

We assume that there are no dummy keys and proceed to form an algorithm. The optimal sub-structure for this problem is sub-trees. Let us use \(C[i,j]\) to denote the search time of binary tree consisting of elements \(\{i, i+1, \cdots, j\}\). Then one can easily see that

\[ C[i,j] = \max_{i\le r \le j}\left\{ \sum_{k=i}^{j}{p_k} + C[i,r-1]+ C[r+1, j] \right\}\]

Thus, the algorithm is clear, create a two-dimensional array fill the array in a bottom up fashion. It should be noted that the first summation in the recursive relation is constant for \(r\) under the given constraint.

Complexity

The time complexity of the above algorithm is \(\Theta (n^3)\).
Trivia: There is an advanced algorithm that has complexity \(\Theta (n^2)\).

Note:

There is no known greedy algorithm for this problem. In particular, the following ones can be easily observed:
The optimal binary search tree need not be a tree with the smallest height.
The optimal binary search tree need not be a tree with such that the node of highest probability occurs at the root.
The optimal binary search tree need not be the tree such that the node with lest probability is the deepest node.
The brute force algorithm takes exponential time (the algorithm that calculate the expected search time of all such binary search trees).