Archive for the ‘Export’ Category

Consider a Domain Specific Language

No Comments »

There I was, in this meeting. A domain specialist had build a prototype in Excel/VBA. The commercial recode was an pseudo Excel engine, coded in PHP, with the prototype expressions shoehorned into it. I’ll pause while you recover. They had replaced the domain specialist with a programmer where they should have replaced Excel with a DSL. And here is why.

 A DSL drastically lowers the implicit domain knowledge in your code. 

Let me examplify; implement “If you have been married for more than half the fiscal year…” given the implicit knowledge: Dutch law mandates that you can D.I.V.O.R.C.E only after being marriage for at least a year.

In a generic language (given some suitable library) one could write: if (is_married_on($year, 6, 30)) reasoning that who has been married the middle day of the year, must have been been married for at least the first or the last half.

Apart from the hard to spot bugs in this code, it is hard to understand what constitutes being married more than half a year. Is it 183 days and 184 in a leap year? Being married the first three and last three months of the year?

Look at three examples in an imaginary predicate logic like DSL:

{all "married" | day $Y 30 6} ->
{sum #day "married" | year $Y} >= 183 ->
{max #day "married" | year $Y} > {sum #day | year $Y}/2 ->

How easy it is for the domain expert to express things much more explicit. A generic language would need a huge number of domain specific library call’s with extreme explicit naming and still would not get close to offering a flexible and explicit alternative. Besides, it is nice to use some explicit language now and than.

Post Assignment

No Comments »

A while back, I designed a domain specific language. It combines the good of APL, C, Pascal and SQL. For example, the token ‘=‘ is not used, ‘:=‘ is the assign-token and ‘==‘ is the equal-token. Like C, it uses only ASCII characters and, like APL, it has a minimalistic, mathematical notation. There is no muli- or list-assign, like in Python and Perl, eventhough code like this: ‘(a,b,c) := (b,c,3);‘ just looks so cool and intuitive. But, it did not fit the syntax, it would make it hard to parse (both for humans and for the LR parser) and most of all it went against the minimalisticity. Than I came up with ‘=:‘, the post-assignment operator. It took me but five minutes to implement. It is a regular assignment but returns the old lvalue. For example ‘a =: 3;‘ would assign ‘3‘ to ‘a‘ and return what ever value was in ‘a‘ before. Post-assignment; analogous to the post-increment from C. IMHO, It makes the above shift even more elegant to read: ‘a =: b =: c =: 3;‘. It was love at first sight! However, the smartest person I know (and first programmer in this language) did not see the need for this opperator and neither did anyone else. So I swallowed my pride and removed the ten odd LOCs. I cried myself to sleep that night.

Clock Arithmetic Still Hard


Clock or modular arithmetic is hard. Take a look at the two code snippets below. Both seem reasonable integer comparison functions modeled after strcmp(). However, the left one contains an overflow bug.

BuggyIntCmp ClassicIntCmp

I have thirty odd years of programming experience, yet I had to be pointed out the bug. The left snippet looks obviously wrong to me, now. It’s like claiming that we go back in time if the short hand of a clock moves more than 6 hours. Ridiculous. (And, no, intermediate casting to long won’t help.) Just try a few extreme cases using INT_MIN, for example. Clock arithmetic is hard because if the modulus is sufficiently large we tend to ignore it. SimplifiedIntCmp So, if you have to go against the first directive, if you don’t trust the -O4 option, or if you just have to show off how brilliant you are, use something like the C code on the right. It won’t bring you much, but at least it does not contain an overflow bug. Though it might contain an other one, or show a incompatibility in your compiler. But that is left as an exercise to the reader.

Control Is The Ultimate Illusion

No Comments »

peoplewareTom DeMarco has turned 180o, finally! As one of the initiators and most prolific proponents, if not principle responsibles of the software engineering myth, he finally sees his erring ways. After selling a fuckton of books on software-engineering like “Peopleware,” “Deadline,” “Controlling Software Projects,” and “Waltzing with Bears.” he now, finally, takes his distance. A bit late after 40 years, but I guess, the money was too good. He, at long last, acknowledges the folly of his most famous quote: “You can’t control what you can’t measure.” Never came so much stress to so many coders from so few words. Pity he does not apologize profoundly, but still it is good that he basically tells the world he has been wrong and misinterpreted. Finally he acknowledges, what all real-life-coders knew all along, the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value. (Tom DeMarco) [Software Engineering: An Idea Whose Time Has Come and Gone?] It is ending the era of the software engineering myth. It has hit the community hard, just look at a few reactions: CH, iTwire, Stackoverflow, Software Architecture, Never In Doubt.

Kinda Like Assert


In 1983 my math teacher wrote on the blackboard: P{Q}R. Ever since, my code is loaded with assertions. Assertions should be used to document logically impossible situations and discover programming errors […] This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely. (Wikipedia) [Assertion] We all have seen, but never produced, anti-code like:

  assert(buf = malloc(128));

However, I have produced code like:

  assert(!"Sorry, PNG routines not implemented yet.");

Fix-me-later-code like this might get shipped (with or without NDEBUG)… and some user might try a PNG file… after six hours of typing… without saving… We have all been there. You balance a todo-list of 20 items in your head, and it won’t survive deep thought on some unlikely potential problem. Put in an quick assert() and you can continue to work on your todo-list of, now, 19. The problem is not in using the assert(), but in forgetting to replace it with real code, eventually. So here is how I make them stick out:

#define NOT_HANDLED_YET(b) \
  (void)(!(b) || exit_error("Error '"#b"' not handled yet!"))
  (void)exit_error("Sorry, "s" not implemented yet!")

The examples above would read:

  buf = malloc(128);


And remember, no commits on trunk unless grep -c NOT_ *.[ch] returns 2.

Pointer To Pointer


Some hold that pointers in a language like C are dangerous and hard to master. Both maybe true, but so are scalpels. Since you are reading this, you know that pointer practice makes pointer perfect. The same goes for pointers to pointers. I remember vividly when I first saw some elegant code that used a pointer to a pointer (I was working on Amoeba). It was a coding changing event. Before I would write like:
Picture 2
After I would write:
Picture 3
I’ve even used that last bit of code in job interviews. If some self proclaimed hardcore C coder could not explain those lines of code I would know they were no experts. Now that this trick is out of the bag, I will have to use an other one, next time we meet.

Skip List

No Comments »

skiplistThe skip list is a relative unknown data structure. If you know it already you can stop reading. If not, keep reading because “the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.” (William Pugh) [Skip lists: a probabilistic alternative to balanced trees]. Speaking from experience, I have to say this is true. Coding up skip lists is easy. I’ve written a reentrant version that uses read/write locking, a version for a hash table to replace chaining, a version that allows Θ(log(n)) indexing, and an unrolled version. It is not perfect for all problems, but it is adaptable and has some other nice properties. However, I like it best for its simplicity. To find the 7 in the above skip list, we start at the top left, and go right until we hit a value >7 (or NIL). We go down one and continue to the right, and so on. So we visit the nodes 1, 4, 6, and 7. Search usually is Θ(log(n)). Since the reentrant version is somewhat less trivial to write, click this link for a GPLed implementation.

NULL’s a Bitch

No Comments »

Anyone that wrote any non trivial SQL query with an outer join, will know: NULL‘s a bitch. By the time you learn to write:

SELECT * FROM T  -- contains bitches.

your hair has turned gray, if you did not pull it all, that is. When the ω—the symbol for ‘unknown’—was introduced by Codd, I guess he basically tried to support ternary logic. Later, he must have realized that ω, in non boolean columns, can stand for many different problem specific meta-values. In version 2 of The Relational Model for Database Management (Chapter 8), he tried to remedy the problem by splitting NULL (ω in SQL) into A-mark (applicable-value mark) and I-mark (inapplicable-value mark). But first, this sinned against Bruce MacLennan’s zero, one, or infinity principle and second, almost nobody adopted it. Hence different SQL implementations handle non trilogic usage of NULL differently. For example the expression 'A' || NULL || 'C' has different values in different SQL implementations. The same goes for aggregates, COUNT(*), NULL / 0, and indexing. Also a whole host of NULL related functions were invented like COALESCE(). Since any usage of NULL can be mimicked by adding a boolean column and some careful code, I am of the opinion that the introduction of NULL was a huge mistake, and the current standard and implementations aggravate the problem by implementing a lot of undefined or implicit behavior. Other solutions that would have been better are (user-definable) meta- or default-values. Anything but NULL.

Recursive With Clause

No Comments »

with x( s, ind ) as
( select sud, instr( sud, ' ' )
  from ( select 
  '53  7    6  195    98    6 8   6   34  8 3  17   2   6 6    28    419  5    8  79'
  sud from dual )
  union all
  select substr( s, 1, ind - 1 ) || z || substr( s, ind + 1 )
       , instr( s, ' ', ind + 1 )
  from x
     , ( select to_char( rownum ) z
         from dual
         connect by rownum <= 9
       ) z
  where ind > 0
  and not exists ( select null
                   from ( select rownum lp
                          from dual
                          connect by rownum <= 9
                   where z = substr( s, trunc( ( ind - 1 ) / 9 ) * 9 + lp, 1 )
                   or    z = substr( s, mod( ind - 1, 9 ) - 8 + lp * 9, 1 )
                   or    z = substr( s, mod( trunc( ( ind - 1 ) / 3 ), 3 ) * 3
                                      + trunc( ( ind - 1 ) / 27 ) * 27 + lp
                                      + trunc( ( lp - 1 ) / 3 ) * 6
                                   , 1 )
select s
from x
where ind = 0

(Anton Scheffer) [Solving a Sudoku using Recursive Subquery Factoring]

Impressive code from our amici over at Amis. But also an impressive example of how far one can go outside the original domain of a language. I mean recursive queries? On the other hand this algorithm does not look too good in Scala or in Perl either. The mathematics of Sudoku are studied in depth. Sudoku has been shown to be NP-complete and of the many ways of solving Sudoku, dancing links and constraint programming seem to be very popular. (Typically, it takes milli seconds for a computer to solve a Sudoku.)