eighty characters per line

March 6, 2009

SdkTk v1.0

Filed under: Algorithms, Computer Science, Development, Implementation, Logic and Reasoning — Tags: , , , — Chad Waters @ 2:46 pm

Sudoku Toolkit

1. Functional Requirements

  • The user should be able to solve a Sudoku board at a specified level.
  • The user should be able to generate a Sudoku board of a specified difficulty.
  • The user should be able to convert a Sudoku board to a printable format.

2. Implementation Details

2.1 Sudoku Board

First, we begin with a simple representation of a Sudoku grid. This is a two-dimensional array of numbers, of a given size N x N, with numbers from 1 to N filling the grid.  This stores the values in the solved portion of the grid. There is also a three-dimensional array of boolean flags, of size N x N x N, pertaining to decisions about various locations on the board, called the markup board. A ‘true’ in cell (I, J, K) in the markup board means that a value of K can potentially appear in cell (I, J) of the grid. This is the foundation for the entire Sudoku program. As values in the markup board need to be concurrently maintained with the values in the grid, methods have been created to place a value in the grid, that takes markup board entries into consideration.

2.2 Sudoku Solver

The Sudoku Solver extends the Sudoku Board class, adding solving methods.  There are roughly twelve to fifteen uniquely different methods for a human to solve a Sudoku grid. Of those, the ones with the most uniqueness were chosen to be implemented by the program. The solve method takes a difficulty level, and it begins at the bottom. It attempts the most basic of strategies, and continues using those strategies until they are no longer effective. Then, the program attempts more difficult strategies. Whenever a strategy is effective, we start back at the bottom of the difficulty chain and try the most basic strategies again. When we finally reach the difficulty level that was specified to the solve routine, or, if we have completed the puzzle, we stop.

2.3 Sudoku Generator

The Sudoku Generator simply generates a filled grid (more on this in a moment), then uses a dynamic programming algorithm to remove candidates until a threshold is reached. This is the concept of generating a “unique” grid, wherein there is only one legal solution. Since the Solver routines can only solve a board if it has a unique solution (no guesswork is ever employed), the ability to solve a grid proves the uniqueness of the board. If a cell is cleared, and no solution can be reached, we replace it and try another cell. The threshold specified is the number of “givens” the generated board should have. It is conjectured that the minimum number of givens required for a Sudoku board to be unique is 17, but no formal proof exists to this effect. Nonetheless, any value less than roughly 25 for this threshold may take considerable time to generate.

A filled grid is generated by using a legal filled board (which will the refered to as the canonical solved grid), then performing elementary row, column, and element operations. The three elementary operations are:

  • Any row can be switched with any other row in the same tower.
  • Any column can be switched with any other column in the same tower.
  • Any value can be switched with any other value, as long as all values are exchanged.

Where a tower is defined to be a group of three (in the case of 9×9 Sudoku) rows or columns in the same row- or column-group, respectively. If only these elementary operations are performed, the legality of the solved grid is not compromised. Then, removing the elements in order to create an unsolved grid is described above.

3. Source Code

The source is available on coderprofile.com: SudokuToolkit v1.0

May 14, 2008

Make the Common Case Fast

Filed under: Algorithms — Tags: , — Chad Waters @ 5:24 pm

In making a design trade-off, favor the frequent case over the infrequent case.

1. Introduction

For those who do not read Okasaki’s blog, he wrote an article [1] yesterday regarding the reasons for using balanced binary search trees. He clearly outlnes in this article that, despite the very nature of binary trees to self-balance to some extent, one employs balanced binary search trees as a safety net, something he correlates to an “insurance policy.”

Students are taught that binary search trees are acceptable. When data is provided in a moderately random order, the tree should end up balanced enough. By paying a negligible overhead at all times, however, the tree can be guaranteed to be balanced, despite the means of input. This is the fundamental principle behind the balanced binary search tree, red-black tree, scapegoat tree, &c.

Note: If input is being received in a non-random order (already presorted), perhaps binary search tree isn’t the best choice anyways. Binary search and binary tree lookup are both O(log n) algorithms, whereas the binary search forgoes the tree structure overhead entirely. Why use a complicated structure where a naive one will work perfectly?

2. Going Against Intuition

If a mantra could be ascribed to software development, it would be “make the common case fast.” It is trivial to prove that making a small improvement to a large portion of code is better than a large improvement to a small portion of code. This is even more true when speaking about code that is used often. [2]

So then why use balanced binary search trees?

Answer: To make life more difficult. Binary trees only become heavily unbalanced when data is received in a sorted order, with each new node becoming a child of the deepest node existing. This creates an incomplete binary tree with comparatively hellish lookup times, specifically O(n).

Balanced binary trees never have this problem, but also require extensive action during insertion to make the tree balanced, by rotating and resorting large portions of the tree. Are these actions necessary? They certainly seem useful, but lest we forget our mantra: “make the common case fast.”

3. Conclusion?

Why struggle upon insertion with balancing a tree when the common case dictates that the tree will have some semblance of balance? It is clear that binary trees are the wrong data structure for when data has been presorted, and therefore, should only be used with unsorted data, creating reasonably balanced trees and O(log n) lookups.

Furthermore, even in using binary trees for sorted data, it is the difference between O(n) and O(log n) lookups, with O(n) being the rare case. Certainly, there is no argument that O(log n) is much preferred over O(n), with increasingly large data sets. The issue is the overhead in the common case when unbalanced trees are a rarity in application.

Use balanced binary search trees, by all means. They are useful, and a great structure to study. Just use common sense when it comes time to discern the actual benefits between the two.

4. Resources

  1. Okasaki, Chris. May 2008. On Balanced Trees and Car Insurance. http://okasaki.blogspot.com/2008/05/on-balanced-trees-and-car-insurance.html.
  2. Prabhu, Gurpur. October 2004. Make the Common Case Fast. http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/common_case.html.

Blog at WordPress.com.