eighty characters per line

May 14, 2008

Make the Common Case Fast

Filed under: Algorithms — Tags: , — Chad Waters @ 5:24 pm

In making a design trade-off, favor the frequent case over the infrequent case.

1. Introduction

For those who do not read Okasaki’s blog, he wrote an article [1] yesterday on the reasons for using balanced binary search trees. He clearly outlines that, although binary trees tend to balance themselves to some extent, one employs balanced binary search trees as a safety net, something he likens to an “insurance policy.”

Students are taught that plain binary search trees are acceptable: when data arrives in a moderately random order, the tree should end up balanced enough. By paying a small overhead at all times, however, the tree can be guaranteed to stay balanced regardless of input order. This is the fundamental principle behind balanced binary search trees such as the red-black tree, the scapegoat tree, &c.

Note: If input arrives in a non-random order (already presorted), perhaps a binary search tree isn’t the best choice anyway. Binary search over a sorted array and lookup in a balanced binary tree are both O(log n), but binary search forgoes the tree structure overhead entirely. Why use a complicated structure where a naive one will work perfectly?
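As a minimal sketch of the note above, here is a lookup over presorted data using binary search from Python's standard bisect module. The helper name contains and the sample keys are made up for illustration.

```python
from bisect import bisect_left


def contains(sorted_items, target):
    """Return True if target occurs in sorted_items (an ascending list)."""
    # bisect_left returns the leftmost index at which target could be
    # inserted while keeping the list sorted; this takes O(log n) comparisons.
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


keys = [2, 3, 5, 7, 11, 13]   # already sorted, so no tree is needed
print(contains(keys, 7))      # True
print(contains(keys, 8))      # False
```

There are no nodes or child pointers here, only a flat list, which is the overhead the note refers to.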

2. Going Against Intuition

If a mantra could be ascribed to software development, it would be “make the common case fast.” It is easy to show that a small improvement to code that accounts for a large share of running time usually beats a large improvement to code that accounts for a small share. The effect is stronger still for code that is used often. [2]
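This trade-off is usually formalized as Amdahl's law. The sketch below applies it to two scenarios; the fractions and speedups are hypothetical numbers picked only to illustrate the point.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the running time
    is made `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical numbers: a modest 1.5x gain on code that takes 90% of the
# running time, versus a 10x gain on code that takes only 10% of it.
common = overall_speedup(0.90, 1.5)
rare = overall_speedup(0.10, 10.0)

print(f"common case: {common:.2f}x")  # common case: 1.43x
print(f"rare case:   {rare:.2f}x")    # rare case:   1.10x
```

Under these assumed numbers, the modest gain on the common case wins comfortably.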

So then why use balanced binary search trees?

Answer: To make life more difficult. Binary trees become heavily unbalanced when data is received in sorted (or nearly sorted) order, with each new node becoming a child of the deepest existing node. This creates a degenerate tree, effectively a linked list, with comparatively hellish lookup times, specifically O(n).
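To see the effect concretely, here is a rough sketch of a plain, unbalanced binary search tree fed the same keys in sorted and in shuffled order. The names Node, insert, build and height, and the choice of 1000 keys, are illustrative.

```python
import random


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        if node.left is not None:
            stack.append((node.left, depth + 1))
        if node.right is not None:
            stack.append((node.right, depth + 1))
    return best


n = 1000
sorted_keys = list(range(n))
shuffled_keys = sorted_keys[:]
random.Random(0).shuffle(shuffled_keys)  # fixed seed for repeatability

print(height(build(sorted_keys)))    # 1000: each node hangs off the last
print(height(build(shuffled_keys)))  # much smaller than 1000
```

With sorted input the height equals the number of keys, so a lookup walks the whole chain. With shuffled input the height stays a small multiple of log n.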

Balanced binary trees never have this problem, but they require extra work during insertion to keep the tree balanced: rotating nodes along the insertion path or, in the case of a scapegoat tree, rebuilding part of the tree. Are these actions necessary? They certainly seem useful, but let us not forget our mantra: “make the common case fast.”
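For a sense of what that extra work looks like, here is a sketch of a single left rotation, the primitive that rotation-based trees such as the red-black tree apply while rebalancing. It is not a full balanced-tree implementation, and the names Node, rotate_left and inorder are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(x):
    """Promote x.right above x and return the new subtree root.

    Only two pointers change, so one rotation costs O(1).
    """
    y = x.right
    x.right = y.left   # y's left subtree becomes x's right subtree
    y.left = x         # x becomes y's left child
    return y


def inorder(node):
    """Return the keys in sorted order, to check the BST property."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# A right-leaning chain 1 -> 2 -> 3, as sorted insertion would produce.
chain = Node(1, right=Node(2, right=Node(3)))
balanced = rotate_left(chain)

print(balanced.key, balanced.left.key, balanced.right.key)  # 2 1 3
print(inorder(balanced))                                    # [1, 2, 3]
```

The rotation shortens the chain by one level while leaving the key order intact. The real bookkeeping in a balanced tree is deciding when and where to rotate.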

3. Conclusion?

Why struggle with balancing a tree on every insertion when the common case dictates that the tree will have some semblance of balance? Plain binary trees are clearly the wrong data structure for presorted data, and therefore should only be used with unsorted data, which produces reasonably balanced trees and O(log n) lookups.

Furthermore, even when plain binary trees are used for sorted data, the difference is between O(n) and O(log n) lookups, with O(n) being the rare case. Certainly, there is no argument that O(log n) is much preferred over O(n) as data sets grow. The issue is the overhead paid in the common case when unbalanced trees are a rarity in practice.

Use balanced binary search trees, by all means. They are useful, and a great structure to study. Just use common sense when it comes time to weigh the actual benefits of each approach.

4. Resources

  1. Okasaki, Chris. May 2008. On Balanced Trees and Car Insurance. http://okasaki.blogspot.com/2008/05/on-balanced-trees-and-car-insurance.html.
  2. Prabhu, Gurpur. October 2004. Make the Common Case Fast. http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/common_case.html.
