Archive for the ‘CodeDesign’ Category

Consider a Domain Specific Language

No Comments »

There I was, in this meeting. A domain specialist had build a prototype in Excel/VBA. The commercial recode was an pseudo Excel engine, coded in PHP, with the prototype expressions shoehorned into it. I’ll pause while you recover. They had replaced the domain specialist with a programmer where they should have replaced Excel with a DSL. And here is why.

 A DSL drastically lowers the implicit domain knowledge in your code. 

Let me examplify; implement “If you have been married for more than half the fiscal year…” given the implicit knowledge: Dutch law mandates that you can D.I.V.O.R.C.E only after being marriage for at least a year.

In a generic language (given some suitable library) one could write: if (is_married_on($year, 6, 30)) reasoning that who has been married the middle day of the year, must have been been married for at least the first or the last half.

Apart from the hard to spot bugs in this code, it is hard to understand what constitutes being married more than half a year. Is it 183 days and 184 in a leap year? Being married the first three and last three months of the year?

Look at three examples in an imaginary predicate logic like DSL:

{all "married" | day $Y 30 6} ->
{sum #day "married" | year $Y} >= 183 ->
{max #day "married" | year $Y} > {sum #day | year $Y}/2 ->

How easy it is for the domain expert to express things much more explicit. A generic language would need a huge number of domain specific library call’s with extreme explicit naming and still would not get close to offering a flexible and explicit alternative. Besides, it is nice to use some explicit language now and than.

Catch All

1 Comment »

I was using my google-fu on longjmp when I came across a page from Francesco Nidito describing how to implement a Java like try/catch exception handling in C using the CPP. Very entertaining. However, I did see the dangling ETRY; as a challenge. Can I implement a more serious try/catch without it? On top of that, can I create a syntax more elegant than the syntax of Java, Ruby and Python? I could not resist. So here is the BSD-licensed source for my version of try/catch in CPP [download .tgz]. There is one small gem, I’d like to share with those that lack the time or courage to read the source. The code below would be valid after including the try/catch header file.

try             { /* Do something. */ }
catch(DIV_ZERO) { /* Handle this. */ }
catch(*)        { printf("Can't handle %d\n", excepno); }
ensure          { if (0 == d) d = 1; }

Note the catch(*). It catches all the (hitherto) uncaught exceptions. The preprocessor expands both catch(*) and catch(22) to a correct C integer expression. Since * can mean a pointer dereference and & can mean both pointer-of and bitwise-and, the expression (arg & allones) expands to either (* & allones) or to (22 & allones). Both are a valid C expression, the first evaluating to -1 the later to 22. To distinguish between a star and a number argument, (#arg[0] == '*') is used to check the first character of the argument. I happen to think the catch all catch(*) syntax is already plenty elegant but I am very open to any improvements, so don’t hold back.

La Coder Qui Rit

No Comments »

LVQRHumor comes in many guises. Coding humor is no exception. Next to jokes in comments like:

  // Message for the dyslexic:
  // There is a Dog!

humor can be be code specific like this LOC from a message server:

  for (-8  ;-(  8)  ;o)

or, alternatively:


Some go overboard (as coders often do) and change the language all together:

likepython latinperl
Like, Python Lingua::Romana::Perligata

To me, it becomes interesting when I have to wonder: “Is it a joke, or not?” Look at the code below and tell me, given that this is part of the MINIX POSIX compliance test set, is this juste pour rire?


Outfox The Fox

No Comments »

Sometimes I encounter code like this:

  u >>= 2; // Fastest way to divide by 4.

I usually burst out singing: Those who try to tangle with my daring do, end up at the angle that herring do. (Danny Kaye) [Outfox the Fox] Never try to outfox the compiler. Take a look at what gcc generates to divide an integer by 13 in x64 and PowerPC assembler (approximately):

x64Idiv PowerIdiv

It is not uncommon for a divide instruction to take more than four times as long as a multiplication. Shift and register loading typically are up to fifty times faster. Above (speed) optimization is a simplified (i.e., unsigned) example. It does not take into account that floating point divides can be executed in parallel with integer code, it does not take into account the instructions in the CPU’s pipeline, it does not take into account surrounding code, it does not take into account MMX/SSE instructions, etc. For the curious, there is a excellent blog post on the why and how of multiply/shift-back. Remember, the rule is: you will never outfox the GNU-fox or any other optimizing fox. But if you have to outfox the fox look for inspiration first.

It’s Easy, It’s Obvious

No Comments »

No comment. Well, ok, as a coder, I know I have to comment.
So here: // No Comment Necessary.

The Counts

1 Comment »

countingMy attention was recently focused (thanks John) on an article by Flickr’s Kellan, describing how Flickr generates system wide unique IDs for user data like photos and comments. In short they have a dedicated ID generator that issues unique, bigint(20) numbers. They implemented this by keeping an almost empty table in MySQL and have the server do a non-standard REPLACE INTO and return the SELECT LAST_INSERT_ID(); of the table. The article claims that they needed a dedicated ID generator because they could not come up with a way to make independent servers generate a system wide unique ID and to prevent the introduction of a single-point-of-failure they came up with a way to make multiple independent servers generate a system wide unique ID. It is “a great example of the Flickr engineering dumbest possible thing that will work design principle.” (Kellan Elliott-McCrea) [Ticket Servers: Distributed Unique Primary Keys on the Cheap] But there is no such thing as dumbest-possible-thing. Give me a candidate dumbest-possible-thing and I’ll add a client-server layer and presto you have something even dumber. Like solving a multi-machine problem by building a multi-machine server.
However I am in favor of system design that goes for the “simplest possible thing that will work (but not simpler than that).” And others might rightfully argue that it’s impossible to prove a candidate simplest thing to actually be the simplest. Oh, I am such a hypocrite, and I did do nothing in my life that comes even remotely close to an installation like Flickr, they rock! But it’s amusing nonetheless.

Kinda Like Assert


In 1983 my math teacher wrote on the blackboard: P{Q}R. Ever since, my code is loaded with assertions. Assertions should be used to document logically impossible situations and discover programming errors […] This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely. (Wikipedia) [Assertion] We all have seen, but never produced, anti-code like:

  assert(buf = malloc(128));

However, I have produced code like:

  assert(!"Sorry, PNG routines not implemented yet.");

Fix-me-later-code like this might get shipped (with or without NDEBUG)… and some user might try a PNG file… after six hours of typing… without saving… We have all been there. You balance a todo-list of 20 items in your head, and it won’t survive deep thought on some unlikely potential problem. Put in an quick assert() and you can continue to work on your todo-list of, now, 19. The problem is not in using the assert(), but in forgetting to replace it with real code, eventually. So here is how I make them stick out:

#define NOT_HANDLED_YET(b) \
  (void)(!(b) || exit_error("Error '"#b"' not handled yet!"))
  (void)exit_error("Sorry, "s" not implemented yet!")

The examples above would read:

  buf = malloc(128);


And remember, no commits on trunk unless grep -c NOT_ *.[ch] returns 2.

Windowcitis Can Be Contagious

servet8In 2004 I had to make a set of napkins and tablecloths of a unique design. The theme was fallen leaves of the fern-leaved beech (Fagus sylvatica ‘Asplenifolia’). I decided to use PostScript (Ghostscript actually). It has instructions like infill to test collisions, it shrugs at using an aspect-ratio of 28×36 threads per cm and it is very intuitive. I could preview (in the after-shrink aspect-ratio) on screen and paper. I used a very naive algorithm. 1) Generate a unique leaf 2) pick a random position and direction for it and 3) shift it along until it did not collide with anything. It worked wonders. I could generate a napkin of 81×81 cm with 555 leaves in a few minutes. However doing the 2000 leaves of the tablecloth took much longer. About 32 hours. Since I had to send the CD to the weavery in two days, I started it on my wife’s computer and let it run. About 31 hours later my, than, 10 year old son came to me for some help with a computer game… You guessed it, he had rebooted the computer because it was acting “funny.” (It was running FreeBSD.) I could not blame him, rebooting is normal behavior for a Windows user. With the deadline four hours away, I had no choice but optimizing the code for three hours so I could generated the table cloth in minutes.
There are two lessons I learned. First, you can be hit by someone else’s windowcitis, and second sometimes optimizing can lower the risk of failure in unexpected ways. Speaking of lowering failure risk, nowadays all my family members run OS X.

Bijective Programming

No Comments »

In 1987 a friend of mine asked for some help with his (Turbo) Pascal home assignment. He had to write a program that would convert Roman numerals to Arabic. A classical exercise. He had written some 200 LOC. He noted that graceful handling wrong input necessitated a lot of if statements and cleanup code. It made the code bloated and hard to read. He still was unsure if his code would catch all possible wrong input—it did not. I explained to him that the bijection (i.e., converting from Arabic to Roman) was straight forward. So we wrote 20 LOC that would, for any Arabic integer input, generate the correct Roman string. Than we refactored his code until about 30 LOC were left. It would generate the correct Arabic integer for any well formed Roman string, and some integer value for any other input. Than we wrote some 10 LOC that would take a string convert it to a number and back to a string and compare it to the original input. If they were the same, it would proclaim the answer, otherwise it would print something like this:
I'm sorry Dave, 'IM' isn't Roman, did you mean 999 (CMXCIX)?
His solution would baffle his classmates, yet he got one of the lowest notes, because the (math) teacher was totally confused about the Arabic to Roman part—plus we did not know his name was Dave. The teacher failed to see that the bijective programming, as usual, resulted in very clean and robust code, with output that bordered on AI.

The Art of Redundancy

            [oo ]    [oo ]    [oo ]    [oo ]
            <   >    <   >    <   >    <   >

            [ oo]             [ oo]    [ oo]
             > <               > <      > <

            [oo ]    [oo ]    [oo ]
            <   >    <   >    <   >



In 1981 I coded a Space Invaders clone for the Apple ][ Europlus. I knew I should not have used Microsoft’s super slow Applesoft BASIC, but I just did. It was an animated ASCII-art program with a main-loop moving 12 invaders, the defender, the up-bullet and the five down-bullets. The latter two with collision detection and scoring. However it was not fast enough. I then tried, what I now would describe as redundancy. I replaced the code that printed the 12 invaders one at a time with one that would print 4 at a time (i.e., a whole row). I used ON (II%*2+FF%) GOSUB to go to any of the (2^4)*2 “subroutines” that would print the correct row. That made the game totally playable. It taught me that redundancy is not always a bad thing.