eighty characters per line

March 6, 2009

SdkTk v1.0

Filed under: Algorithms, Computer Science, Development, Implementation, Logic and Reasoning — Tags: , , , — Chad Waters @ 2:46 pm

Sudoku Toolkit

1. Functional Requirements

  • The user should be able to solve a Sudoku board at a specified level.
  • The user should be able to generate a Sudoku board of a specified difficulty.
  • The user should be able to convert a Sudoku board to a printable format.

2. Implementation Details

2.1 Sudoku Board

First, we begin with a simple representation of a Sudoku grid. This is a two-dimensional array of numbers, of a given size N x N, with numbers from 1 to N filling the grid.  This stores the values in the solved portion of the grid. There is also a three-dimensional array of boolean flags, of size N x N x N, pertaining to decisions about various locations on the board, called the markup board. A ‘true’ in cell (I, J, K) in the markup board means that a value of K can potentially appear in cell (I, J) of the grid. This is the foundation for the entire Sudoku program. As values in the markup board need to be concurrently maintained with the values in the grid, methods have been created to place a value in the grid, that takes markup board entries into consideration.

2.2 Sudoku Solver

The Sudoku Solver extends the Sudoku Board class, adding solving methods.  There are roughly twelve to fifteen uniquely different methods for a human to solve a Sudoku grid. Of those, the ones with the most uniqueness were chosen to be implemented by the program. The solve method takes a difficulty level, and it begins at the bottom. It attempts the most basic of strategies, and continues using those strategies until they are no longer effective. Then, the program attempts more difficult strategies. Whenever a strategy is effective, we start back at the bottom of the difficulty chain and try the most basic strategies again. When we finally reach the difficulty level that was specified to the solve routine, or, if we have completed the puzzle, we stop.

2.3 Sudoku Generator

The Sudoku Generator simply generates a filled grid (more on this in a moment), then uses a dynamic programming algorithm to remove candidates until a threshold is reached. This is the concept of generating a “unique” grid, wherein there is only one legal solution. Since the Solver routines can only solve a board if it has a unique solution (no guesswork is ever employed), the ability to solve a grid proves the uniqueness of the board. If a cell is cleared, and no solution can be reached, we replace it and try another cell. The threshold specified is the number of “givens” the generated board should have. It is conjectured that the minimum number of givens required for a Sudoku board to be unique is 17, but no formal proof exists to this effect. Nonetheless, any value less than roughly 25 for this threshold may take considerable time to generate.

A filled grid is generated by using a legal filled board (which will the refered to as the canonical solved grid), then performing elementary row, column, and element operations. The three elementary operations are:

  • Any row can be switched with any other row in the same tower.
  • Any column can be switched with any other column in the same tower.
  • Any value can be switched with any other value, as long as all values are exchanged.

Where a tower is defined to be a group of three (in the case of 9×9 Sudoku) rows or columns in the same row- or column-group, respectively. If only these elementary operations are performed, the legality of the solved grid is not compromised. Then, removing the elements in order to create an unsolved grid is described above.

3. Source Code

The source is available on coderprofile.com: SudokuToolkit v1.0

June 30, 2008

Functional Programming in C++, Pt. 1

Filed under: Implementation, Language — Tags: , , — Chad Waters @ 10:34 am

Today’s Topic: map()

1. Apologies

I would just like to start by apologizing for the code that I have written. It is a naive approach to beginning work on a few of the basic constructs of functional programming in C and C++. It is not pretty, and it is not necessarily perfect. It does, however, work. I created some data structures to more closely emulate the syntax I desired, and I am in no way vouching for my code as being the best possible solution.

2. Implementation

Instead of dumping code here, I will discuss just the relevant function, to make the overall lack of elegance more palatable. I will also show how one would invoke the map function.

To explain a few things before we jump into code, in an effort to beautify the hacked-together code that was sure to follow, I created a few constants and structs. Firstly, I defined a constant called LIST_SIZE that I passed around to define the size of any array used in this code.

I also created a struct called SIZED_ARRAY that takes a type and an unsigned int, and creates an array of the specified type that can hold the specified number of elements. This array is templated in such a way that the size of the array is carried with the array itself.

Simple, right? Invokation of the map function, thanks to GCC inferring template parameters from arguments, is even simpler. So we have a function called multiply_two that takes an int and returns double that value. We could simply map that function to a list of values and now we have a map function, much like one would find in Haskell, or similar languages.

3. Source Code

The entire source code is available here: map.cpp

May 20, 2008

The Servitude of Developers

Filed under: Development — Tags: , — Chad Waters @ 4:14 pm

1. Introduction

Recently, the Funpidgin fork of the Pidgin instant messaging client project began based on disputes related to input field resizing issues. Apprently, two groups saw two sides of the same coin and felt that this would be worth forking the project over. Funpidgin, working according to a “we work for you” idiom, felt that the user should have the choice of behavior for the client, which the Pidgin project did not agree with.

No statement is being made about the stubbornness of the Pidgin loyalists. They went through painstaking trouble to make sure that the behavior of the client is intuitive and simple. Blogs, such as obso1337.org, did get a little up-in-arms about this attitude, claiming that users are not designers, and that they don’t think that the uninitiated masses should have a say in the development of a large-scale project.

This attitude [that users are designers] takes participatory design to all-new (and very dangerous) level. You go from user-centered design: keeping users in mind while designing a product, to user-directed design: catering to every users’ whim without consideration of the consequences (at least, users who know how to use mailing lists and bug trackers, who are not representative of a broad user audience for an instant messenger client). [Author's emphasis] [1]

2. Leadership

The underlying principle behind this message is fair: users that make up the minority or the unintended demographic might not be the best candidates for directing the design and development a project. That is understandable, since the project should have direction of its own. That does not mean, however, that they should have no say.

It is the idea that users are going to inherently work in their own best interest, forgoing the betterment of the overall project, that is irksome. Allowing users with different interests to have say in development can only make the outcome more diverse; nothing dictates that the “too many chefs” metaphor will produce a proverbial Rube Goldberg machine.

3. Rules, or a Reasonable Facsimile

Being an immature field, “computer science”, by and large, does not have the easiest time prescribing or adhering to a set of rules by which to operate. Organizations such as the ACM and IEEE do have codes of ethics by which they attempt to regulate their actions.

Taking the ACM, for instance, we find the following:

3.4 Ensure that users and those who will be affected by a system have their needs clearly articulated during the assessment and design of requirements; later the system must be validated to meet requirements.

Current system users, potential users and other persons whose lives may be affected by a system must have their needs assessed and incorporated in the statement of requirements. System validation should ensure compliance with those requirements. [2]

Note that no mention is made about the majority or minority, the target audience or the outliers. Simply put, all current and potential users should have their needs met. This means that, in essence, all users are developers, because it is the job of developers to listen to and satisfy the needs of those who will be using a given system.

This is the servitude into which developers enter. Needs of users outweigh the vision of the leader, outweigh that which is easy, outweigh petty bickering about text input fields, and outweigh the ridiculous notion that users are incompetent. Designers and developers must understand that they are the minority in their own project, and that the users are the ones who will crucify those who ignore their concerns.

4. Resources

  1. Paul, Celeste L. May 2008. Four Words for Funpidgin. http://weblog.obso1337.org/2008/four-words-for-funpidgin.
  2. ACM. Code of Ethics. http://www.acm.org/about/code-of-ethics.

May 14, 2008

Make the Common Case Fast

Filed under: Algorithms — Tags: , — Chad Waters @ 5:24 pm

In making a design trade-off, favor the frequent case over the infrequent case.

1. Introduction

For those who do not read Okasaki’s blog, he wrote an article [1] yesterday regarding the reasons for using balanced binary search trees. He clearly outlnes in this article that, despite the very nature of binary trees to self-balance to some extent, one employs balanced binary search trees as a safety net, something he correlates to an “insurance policy.”

Students are taught that binary search trees are acceptable. When data is provided in a moderately random order, the tree should end up balanced enough. By paying a negligible overhead at all times, however, the tree can be guaranteed to be balanced, despite the means of input. This is the fundamental principle behind the balanced binary search tree, red-black tree, scapegoat tree, &c.

Note: If input is being received in a non-random order (already presorted), perhaps binary search tree isn’t the best choice anyways. Binary search and binary tree lookup are both O(log n) algorithms, whereas the binary search forgoes the tree structure overhead entirely. Why use a complicated structure where a naive one will work perfectly?

2. Going Against Intuition

If a mantra could be ascribed to software development, it would be “make the common case fast.” It is trivial to prove that making a small improvement to a large portion of code is better than a large improvement to a small portion of code. This is even more true when speaking about code that is used often. [2]

So then why use balanced binary search trees?

Answer: To make life more difficult. Binary trees only become heavily unbalanced when data is received in a sorted order, with each new node becoming a child of the deepest node existing. This creates an incomplete binary tree with comparatively hellish lookup times, specifically O(n).

Balanced binary trees never have this problem, but also require extensive action during insertion to make the tree balanced, by rotating and resorting large portions of the tree. Are these actions necessary? They certainly seem useful, but lest we forget our mantra: “make the common case fast.”

3. Conclusion?

Why struggle upon insertion with balancing a tree when the common case dictates that the tree will have some semblance of balance? It is clear that binary trees are the wrong data structure for when data has been presorted, and therefore, should only be used with unsorted data, creating reasonably balanced trees and O(log n) lookups.

Furthermore, even in using binary trees for sorted data, it is the difference between O(n) and O(log n) lookups, with O(n) being the rare case. Certainly, there is no argument that O(log n) is much preferred over O(n), with increasingly large data sets. The issue is the overhead in the common case when unbalanced trees are a rarity in application.

Use balanced binary search trees, by all means. They are useful, and a great structure to study. Just use common sense when it comes time to discern the actual benefits between the two.

4. Resources

  1. Okasaki, Chris. May 2008. On Balanced Trees and Car Insurance. http://okasaki.blogspot.com/2008/05/on-balanced-trees-and-car-insurance.html.
  2. Prabhu, Gurpur. October 2004. Make the Common Case Fast. http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/common_case.html.

May 4, 2008

The Macro Preprocessor

Filed under: Pedagogy — Tags: , , — Chad Waters @ 7:42 pm

1. Introduction

To help better explain programming paradigms to students, schools often teach introductory programming courses centering around a particular language, most commonly either the C family or Java. Java, and its derivatives, which, for all intents and purposes, includes C#, J#, and other similar languages. This allows students to learn by example, being asked to prove competency regarding a particular concept by demonstrating proficiency in implementing the concept.

No comment is being made on this practice of teaching by implementation. This is a common practice that is proven to work, and there is no harm in doing a double service to students, introducing them to programming and to a particular language in one fell-swoop. The point that is to be made here is in regards to the slapdash, and often incomplete, means of teaching the C family.

2. The Macro Preprocessor

2.1. Parameter Resolution

Undoubtedly the most dangerous, and certainly the most confusing, of all of the C remnants is the macro preprocessor. It has all of the functionality of your garden variety copy-and-paste, without any of that messy common sense! Consider the following canonical example:

#define MAX(a, b) ((a) > (b) ? (a) : (b))

A call to the MAX macro will return the larger of the two values. To understand why this is a potentially problematic macro, let us consider how the macro preprocessor works. References to the MAX macro are directly translated into the associated expression. Parameters passed to the macro are directly replaced within the definition. Why is this a problem? Consider:

int z = MAX(++m, ++n);

Assume that m has been initialized to a value of 10, and n is 20. So, then, since they are both pre-incremented, m is 11 and n is 21, and the expected value of z after computation is 21. This is not the case, and the reason for that can be seen when the macro is expanded fully.

int z = ((++m) > (++n) ? (++m) : (++n));

The actual result will be 22, with n now holding 22 and m being 11. This is because the expansion of the macro never resolved the two instances of the same parameter, meaning that the pre-increment happened both times. Unless this was an intended side-effect, this is dangerous and, likely, a problematic case for debugging.

2.2. Macros as Functions

It is most simple to consider macros as functions. As in the case above, disregarding the issue with parameter resolution, the macro acted as a function, returning the larger of the two values. This view of macros can be dangerous, and, again, confusing. Consider the following macro:

#define SWAP(a, b) a ^= b; b ^= a; a ^= b;

This is what is known as the XOR swap. This definition gives the appearance of a function, at first glance.
After the completion of the following code segment, x and y will have swapped values, naturally. Remember that the macro definition operates as a rudimentary copy and paste.

SWAP(x, y); // becomes
// x ^= y; y ^= x; x ^= y;

This does exactly what is expected. Consider, however, the case of the conditional swap in the next code segment. Notice that the macro definition is a free-standing set of three operations. The macro preprocessor does the following:

if (x != y) SWAP(x, y); // becomes
// if (x != y) x ^= y; y ^= x; x ^= y;

Notice that the conditional statement will only apply to the first of the three operations. The second and third operations will proceed regardless of the condition. This will smash the values in x and y. This can be avoided, however, by enclosing the original definition in braces to offset the code as a local segment. This can fix many macro problems, unless the macro needs to return something, as in the case of the MAX macro.

3. Conclusion

As of GCC 3.4.5, most of the inane issues, such as a comment on the same line as a macro definition, were resolved by the compiler, but many still live on. Stroustrup explains that C++ programmers tend to regard the use of macros suspiciously, as a “lesser evil,” while C programmers find the use of macros as natural and elegant [1]. What is important is the use of safe, type-neutral code, relying on inline functions where efficiency is a concern.

4. References

  1. Stroustrup, Bjarne. 2002. C and C++: Siblings. From The C/C++ Users Journal. http://www.research.att.com/~bs/siblings_short.pdf.

April 13, 2008

Java as a First Language

Filed under: Pedagogy — Tags: , — Chad Waters @ 5:49 am

1. The State of Computer Science

These days, numbers in computer science programs at most institutions are suffering. Enrollment is down 49% since 2001/2002. [1] This can be possibly attributed to the brainwashing that secondary school systems usually put their students through, impressing upon them how computer science is a “dime-a-dozen” field, and that the market will be flooded. Furthermore, rumors that the College Board was planning to stop administering the Computer Science Advanced Placement test were greatly exaggerated, thanks, in part, to a fallacious article in the Washington Post [2]. With numbers dwindling as they are, computer science programs are attempting to find new and better ways to not only improve their enrollment rate, but also to improve their retention rate. So what approach are programs taking to entice new students? Make programming fun.

2. Java to the Rescue

There are no arguments that Java is a useful language to know. TIOBE, which evaluates programming languages based on search engine results for both web pages and message boards, ranks Java the number one programming language in terms of popularity, possessing 20.529% of the market share as of April 2008 [3]. It presents an approachable syntax with a good deal of flexibility and the potential for rapid prototyping. Dismissing criticism for Java’s virtual machine is even simple: GCJ, the GNU compiler for Java. Compiling to native code? Sure, perhaps it lacks full AWT support, but work is being done, and it is quickly improving. So then what could there possibly be to complain about?

3. Java as a Starting Language

Java is a very common starting language for students. It makes programming fun for those who are unfamiliar with traditional syntax. It hides lower-level functionality to allow the programmer to think about the bigger picture. This being said, there is much lower level functionality that is important to learn. Dewar and Schonberg, with AdaCore Inc., put it best:

What we observed at New York University is that the Java programming courses did not prepare our students for the first course in systems, much less for more advanced ones. Students found it hard to write programs that did not have a graphic interface, had no feeling for the relationship between the source program and what the hardware would actually do, and (most damaging) did not understand the semantics of pointers at all, which made the use of C in systems programming very challenging. [emphasis added] [4]

So, what is the solution? Do we continue to teach complicated languages, to the chagrin of entering students, and lower our retention rates significantly? Or do we continue to produce students who possess a skill set analogous to placing a square peg in a square hole, fitting a basic solution to a basic problem, but being unable to actually write code?

4. Finding the Solution

To prevent universities from continuing to produce a generation of replaceable developers, pains need to be taken to teach theory and paradigms, not languages with robust packages that coddle programmers. Languages, from C++ to Common Lisp to Ada, should be learned, to produce students better able to handle projects ranging in size from personal programming to large-scale code delegation. Predictions? No longer will scope and adherence to specification be the biggest evil to a programmer. Instead, evil will be known as working with colleagues who have no understanding of language fundamentals.

The storm’s a-comin’!

5. References

  1. Vegso, Jay. March 2008. Enrollments and Degree Production at US CS Departments Drop Further in 2006/2007. http://www.cra.org/wp/index.php?p=139.
  2. ACM. April 2008. AP Computer Science is NOT Going Away. http://usacm.acm.org/usacm/weblog/index.php?p=593.
  3. TIOBE. April 2008. TIOBE Programming Community Index. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.
  4. Dewar, Robert B.K., and Schonberg, Edmond. January 2008. Computer Science Education: Where Are the Software Engineers of Tomorrow? http://www.stsc.hill.af.mil/CrossTalk/2008/01/0801DewarSchonberg.html.

Blog at WordPress.com.