eighty characters per line

May 20, 2008

The Servitude of Developers

Filed under: Development — Tags: , — Chad Waters @ 4:14 pm

1. Introduction

Recently, the Pidgin instant messaging client was forked as Funpidgin over a dispute about how the input field resizes. Apparently, two groups saw two sides of the same coin and felt that the disagreement was worth forking the project over. Funpidgin, working according to a “we work for you” idiom, felt that the user should be able to choose the behavior of the client, a position the Pidgin project did not share.

No statement is being made about the stubbornness of the Pidgin loyalists. They went through painstaking trouble to make sure that the behavior of the client is intuitive and simple. Blogs, such as obso1337.org, did get a little up in arms about this attitude, claiming that users are not designers and that the uninitiated masses should not have a say in the development of a large-scale project.

This attitude [that users are designers] takes participatory design to all-new (and very dangerous) level. You go from user-centered design: keeping users in mind while designing a product, to user-directed design: catering to every users’ whim without consideration of the consequences (at least, users who know how to use mailing lists and bug trackers, who are not representative of a broad user audience for an instant messenger client). [Author's emphasis] [1]

2. Leadership

The underlying principle behind this message is fair: users who make up the minority or the unintended demographic might not be the best candidates for directing the design and development of a project. That is understandable, since the project should have direction of its own. That does not mean, however, that they should have no say.

It is the idea that users are going to inherently work in their own best interest, forgoing the betterment of the overall project, that is irksome. Allowing users with different interests to have a say in development can only make the outcome more diverse; nothing dictates that the “too many chefs” metaphor will produce a proverbial Rube Goldberg machine.

3. Rules, or a Reasonable Facsimile

Being an immature field, “computer science”, by and large, does not have the easiest time prescribing or adhering to a set of rules by which to operate. Organizations such as the ACM and IEEE do have codes of ethics by which they attempt to regulate their actions.

Taking the ACM, for instance, we find the following:

3.4 Ensure that users and those who will be affected by a system have their needs clearly articulated during the assessment and design of requirements; later the system must be validated to meet requirements.

Current system users, potential users and other persons whose lives may be affected by a system must have their needs assessed and incorporated in the statement of requirements. System validation should ensure compliance with those requirements. [2]

Note that no mention is made about the majority or minority, the target audience or the outliers. Simply put, all current and potential users should have their needs met. This means that, in essence, all users are developers, because it is the job of developers to listen to and satisfy the needs of those who will be using a given system.

This is the servitude into which developers enter. Needs of users outweigh the vision of the leader, outweigh that which is easy, outweigh petty bickering about text input fields, and outweigh the ridiculous notion that users are incompetent. Designers and developers must understand that they are the minority in their own project, and that the users are the ones who will crucify those who ignore their concerns.

4. Resources

  1. Paul, Celeste L. May 2008. Four Words for Funpidgin. http://weblog.obso1337.org/2008/four-words-for-funpidgin.
  2. ACM. Code of Ethics. http://www.acm.org/about/code-of-ethics.

May 14, 2008

Make the Common Case Fast

Filed under: Algorithms — Tags: , — Chad Waters @ 5:24 pm

In making a design trade-off, favor the frequent case over the infrequent case.

1. Introduction

For those who do not read Okasaki’s blog, he wrote an article [1] yesterday regarding the reasons for using balanced binary search trees. He clearly outlines in this article that, although binary search trees tend to balance themselves to some extent, one employs balanced binary search trees as a safety net, something he likens to an “insurance policy.”

Students are taught that binary search trees are acceptable. When data is provided in a moderately random order, the tree should end up balanced enough. By paying a negligible overhead at all times, however, the tree can be guaranteed to be balanced, regardless of the order of the input. This is the fundamental principle behind balanced binary search trees such as the red-black tree, the scapegoat tree, &c.

Note: If input is being received in a non-random order (already presorted), perhaps a binary search tree isn’t the best choice anyway. Binary search over a sorted array and lookup in a balanced binary tree are both O(log n), but binary search forgoes the tree structure overhead entirely. Why use a complicated structure where a naive one will work perfectly?
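To make the note concrete, here is a minimal sketch of binary search over a presorted array of ints. The function name find_sorted and the choice of int keys are assumptions made for illustration; they do not come from Okasaki's article.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search over a sorted int array: O(log n) comparisons and no
 * per-node pointers. Returns the index of key, or -1 if it is absent.
 * The name find_sorted is hypothetical. */
long find_sorted(const int *sorted, size_t count, int key)
{
    size_t lo = 0, hi = count;              /* search the half-open range [lo, hi) */

    assert(sorted != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;    /* avoids overflow of lo + hi */
        if (sorted[mid] < key)
            lo = mid + 1;                   /* key can only lie right of mid */
        else if (sorted[mid] > key)
            hi = mid;                       /* key can only lie left of mid */
        else
            return (long)mid;
    }
    return -1;
}
```

The whole structure is one contiguous array, so there are no child pointers to store or chase. The trade-off is that inserting into the middle of a sorted array is O(n), which is the case a tree is meant to handle.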

2. Going Against Intuition

If a mantra could be ascribed to software development, it would be “make the common case fast.” It is easy to show that a small improvement to a large portion of a program often yields more than a large improvement to a small portion. This is even more true when the large portion is the code that runs most often. [2]
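One way to put numbers on the mantra is Amdahl's law. Treating it as the argument intended here is an assumption, as is the function name overall_speedup. A minimal sketch:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction of the run time
 * (between 0 and 1) is made `factor` times faster.
 * The name overall_speedup is hypothetical. */
double overall_speedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    /* The untouched part still takes (1 - fraction) of the original time;
     * the improved part shrinks to fraction / factor. */
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

Under these assumptions, overall_speedup(0.9, 1.5) is about 1.43, while overall_speedup(0.1, 10.0) is only about 1.10. A modest gain in the common case beats a tenfold gain in the rare one.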

So then why use balanced binary search trees?

Answer: To make life more difficult. Binary search trees become heavily unbalanced when data is received in sorted (or reverse-sorted) order, with each new node becoming a child of the deepest existing node. This creates a degenerate tree, effectively a linked list, with comparatively hellish lookup times, specifically O(n).
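The degenerate case is easy to reproduce. The following sketch of a plain, unbalanced binary search tree uses hypothetical names (struct node, bst_insert, bst_height, bst_free) and int keys, none of which come from the article.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Plain (unbalanced) BST insertion; duplicates are ignored.
 * Returns the root, which changes only when the tree was empty. */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;

    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    assert(*link != NULL);              /* sketch only: no recovery on failure */
    (*link)->key = key;
    (*link)->left = (*link)->right = NULL;
    return root;
}

/* Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
int bst_height(const struct node *root)
{
    if (root == NULL)
        return 0;
    int l = bst_height(root->left);
    int r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}

/* Releases every node in the tree. */
void bst_free(struct node *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}
```

Inserting the keys 1 through 15 in ascending order gives a height of 15, one node per level. Inserting the same keys in the order 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15 gives a height of 4. That second order was chosen by hand to produce a perfectly balanced tree; random input would typically land somewhere in between, but much closer to the balanced end.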

Balanced binary trees never have this problem, but they also require extra work during insertion to keep the tree balanced: rotations and recoloring in a red-black tree, or the occasional rebuilding of a whole subtree in a scapegoat tree. Are these actions necessary? They certainly seem useful, but lest we forget our mantra: “make the common case fast.”

3. Conclusion?

Why struggle upon insertion with balancing a tree when the common case dictates that the tree will have some semblance of balance? It is clear that plain binary search trees are the wrong data structure for presorted data, and should therefore only be used with unsorted data, which produces reasonably balanced trees and O(log n) lookups.

Furthermore, even when a plain binary search tree is fed sorted data, the cost is the difference between O(n) and O(log n) lookups, with O(n) being the rare case. Certainly, there is no argument that O(log n) is much preferred over O(n) as data sets grow. The issue is the overhead paid in the common case when unbalanced trees are a rarity in practice.

Use balanced binary search trees, by all means. They are useful, and a great structure to study. Just use common sense when it comes time to weigh the actual benefits of each.

4. Resources

  1. Okasaki, Chris. May 2008. On Balanced Trees and Car Insurance. http://okasaki.blogspot.com/2008/05/on-balanced-trees-and-car-insurance.html.
  2. Prabhu, Gurpur. October 2004. Make the Common Case Fast. http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/common_case.html.

May 4, 2008

The Macro Preprocessor

Filed under: Pedagogy — Tags: , , — Chad Waters @ 7:42 pm

1. Introduction

To help better explain programming paradigms to students, schools often teach introductory programming courses centering around a particular language, most commonly either the C family or Java and its derivatives, which, for all intents and purposes, include C#, J#, and other similar languages. This allows students to learn by example, being asked to prove competency regarding a particular concept by demonstrating proficiency in implementing the concept.

No comment is being made on this practice of teaching by implementation. This is a common practice that is proven to work, and there is no harm in doing a double service to students, introducing them to programming and to a particular language in one fell swoop. The point to be made here concerns the slapdash, and often incomplete, means of teaching the C family.

2. The Macro Preprocessor

2.1. Parameter Resolution

Undoubtedly the most dangerous, and certainly the most confusing, of all of the C remnants is the macro preprocessor. It has all of the functionality of your garden-variety copy-and-paste, without any of that messy common sense! Consider the following canonical example:

#define MAX(a, b) ((a) > (b) ? (a) : (b))

A call to the MAX macro will return the larger of the two values. To understand why this is a potentially problematic macro, let us consider how the macro preprocessor works. References to the MAX macro are directly translated into the associated expression. Parameters passed to the macro are directly replaced within the definition. Why is this a problem? Consider:

int z = MAX(++m, ++n);

Assume that m has been initialized to a value of 10, and n is 20. So, then, since they are both pre-incremented, m is 11 and n is 21, and the expected value of z after computation is 21. This is not the case, and the reason for that can be seen when the macro is expanded fully.

int z = ((++m) > (++n) ? (++m) : (++n));

The actual result will be 22, with n now holding 22 and m being 11. This is because the preprocessor pastes the argument text into every place the parameter appears, so ++n is evaluated twice: once in the comparison and once more when it is selected as the result. (++m is evaluated only once, because its branch of the conditional is never taken.) Unless this was an intended side effect, this is dangerous and, likely, a problematic case for debugging.
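The double evaluation can be checked directly, alongside a function-based alternative. In the sketch below, max_int and demonstrate_double_evaluation are hypothetical names, and the inline function is limited to int; it is one possible replacement, not the only one.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Function alternative: each argument is evaluated exactly once,
 * before the call, whatever expression the caller writes. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Runs the article's example with both versions. */
void demonstrate_double_evaluation(void)
{
    int m = 10, n = 20;
    int z = MAX(++m, ++n);      /* ((++m) > (++n) ? (++m) : (++n)) */
    assert(z == 22 && m == 11 && n == 22);  /* n was incremented twice */

    m = 10;
    n = 20;
    z = max_int(++m, ++n);      /* arguments evaluated once each */
    assert(z == 21 && m == 11 && n == 21);
}
```

Calling demonstrate_double_evaluation() from main passes both assertions: the macro leaves z at 22, while the function gives the expected 21.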

2.2. Macros as Functions

It is simplest to think of macros as functions. As in the case above, disregarding the issue with parameter resolution, the macro acted as a function, returning the larger of the two values. This view of macros can be dangerous, and, again, confusing. Consider the following macro:

#define SWAP(a, b) a ^= b; b ^= a; a ^= b;

This is what is known as the XOR swap. This definition gives the appearance of a function, at first glance. After the completion of the following code segment, x and y will have swapped values, naturally. Remember that the macro definition operates as a rudimentary copy and paste.

SWAP(x, y); // becomes
// x ^= y; y ^= x; x ^= y;

This does exactly what is expected. Consider, however, the case of the conditional swap in the next code segment. Notice that the macro definition is a free-standing set of three operations. The macro preprocessor does the following:

if (x != y) SWAP(x, y); // becomes
// if (x != y) x ^= y; y ^= x; x ^= y;

Notice that the conditional statement will only apply to the first of the three operations. The second and third operations will proceed regardless of the condition. When x and y are equal, the second operation (y ^= x) zeroes y while x is left unchanged, so the values are smashed instead of left alone. This can be avoided, however, by enclosing the original definition in braces so that it expands to a single compound statement. This can fix many macro problems, unless the macro needs to return something, as in the case of the MAX macro.
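A common refinement of the braces fix is to wrap the body in do { ... } while (0), which also keeps the macro usable in front of an else. The sketch below compares it with the original definition; the names SWAP_UNSAFE, SWAP_SAFE, and demonstrate_conditional_swap are hypothetical.

```c
#include <assert.h>

/* The original definition: three separate statements. */
#define SWAP_UNSAFE(a, b) a ^= b; b ^= a; a ^= b;

/* do { ... } while (0) makes the expansion a single statement that still
 * expects the caller's trailing semicolon. */
#define SWAP_SAFE(a, b) do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Runs the conditional swap with both definitions. */
void demonstrate_conditional_swap(void)
{
    int x = 7, y = 7;
    if (x != y) SWAP_UNSAFE(x, y);  /* only x ^= y is guarded by the if */
    assert(x == 7 && y == 0);       /* y ^= x ran anyway and zeroed y */

    x = 7;
    y = 7;
    if (x != y) SWAP_SAFE(x, y);    /* the whole swap is skipped */
    assert(x == 7 && y == 7);

    x = 3;
    y = 9;
    if (x != y) SWAP_SAFE(x, y); else y = 0;    /* still valid before else */
    assert(x == 9 && y == 3);
}
```

Note that the XOR swap itself zeroes the value when both arguments name the same object, so SWAP_SAFE(x, x) is still unsafe. The wrapper fixes the statement grouping, not the algorithm.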

3. Conclusion

As of GCC 3.4.5, most of the inane issues, such as a comment on the same line as a macro definition, are resolved by the compiler, but many still live on. Stroustrup explains that C++ programmers tend to regard the use of macros with suspicion, as a “lesser evil,” while C programmers find the use of macros natural and elegant [1]. What is important is the use of safe, type-neutral code, relying on inline functions where efficiency is a concern.

4. References

  1. Stroustrup, Bjarne. 2002. C and C++: Siblings. From The C/C++ Users Journal. http://www.research.att.com/~bs/siblings_short.pdf.
