Archive for the ‘Export’ Category
January 12th, 2011
There I was, in this meeting. A domain specialist had build a prototype in Excel/VBA. The commercial recode was an pseudo Excel engine, coded in PHP, with the prototype expressions shoehorned into it. I’ll pause while you recover. They had replaced the domain specialist with a programmer where they should have replaced Excel with a DSL. And here is why.
A DSL drastically lowers the implicit domain knowledge in your code.
Let me examplify; implement “If you have been married for more than half the fiscal year…” given the implicit knowledge: Dutch law mandates that you can D.I.V.O.R.C.E only after being marriage for at least a year.
In a generic language (given some suitable library) one could write: if (is_married_on($year, 6, 30))
… reasoning that who has been married the middle day of the year, must have been been married for at least the first or the last half.
Apart from the hard to spot bugs in this code, it is hard to understand what constitutes being married more than half a year. Is it 183 days and 184 in a leap year? Being married the first three and last three months of the year?
Look at three examples in an imaginary predicate logic like DSL:
{all "married" | day $Y 30 6} ->
{sum #day "married" | year $Y} >= 183 ->
{max #day "married" | year $Y} > {sum #day | year $Y}/2 ->
How easy it is for the domain expert to express things much more explicit. A generic language would need a huge number of domain specific library call’s with extreme explicit naming and still would not get close to offering a flexible and explicit alternative. Besides, it is nice to use some explicit language now and than.
May 9th, 2010
A while back, I designed a domain specific language. It combines the good of APL, C, Pascal and SQL. For example, the token ‘=
‘ is not used, ‘:=
‘ is the assign-token and ‘==
‘ is the equal-token. Like C, it uses only ASCII characters and, like APL, it has a minimalistic, mathematical notation. There is no muli- or list-assign, like in Python and Perl, eventhough code like this: ‘(a,b,c) := (b,c,3);
‘ just looks so cool and intuitive. But, it did not fit the syntax, it would make it hard to parse (both for humans and for the LR parser) and most of all it went against the minimalisticity. Than I came up with ‘=:
‘, the post-assignment operator. It took me but five minutes to implement. It is a regular assignment but returns the old lvalue. For example ‘a =: 3;
‘ would assign ‘3
‘ to ‘a
‘ and return what ever value was in ‘a
‘ before. Post-assignment; analogous to the post-increment from C. IMHO, It makes the above shift even more elegant to read: ‘a =: b =: c =: 3;
‘. It was love at first sight! However, the smartest person I know (and first programmer in this language) did not see the need for this opperator and neither did anyone else. So I swallowed my pride and removed the ten odd LOCs. I cried myself to sleep that night.
March 16th, 2010
Clock or modular arithmetic is hard. Take a look at the two code snippets below. Both seem reasonable integer comparison functions modeled after strcmp()
. However, the left one contains an overflow bug.
|
|
I have thirty odd years of programming experience, yet I had to be pointed out the bug. The left snippet looks obviously wrong to me, now. It’s like claiming that we go back in time if the short hand of a clock moves more than 6 hours. Ridiculous. (And, no, intermediate casting to long
won’t help.) Just try a few extreme cases using INT_MIN
, for example. Clock arithmetic is hard because if the modulus is sufficiently large we tend to ignore it. So, if you have to go against the first directive, if you don’t trust the -O4
option, or if you just have to show off how brilliant you are, use something like the C code on the right. It won’t bring you much, but at least it does not contain an overflow bug. Though it might contain an other one, or show a incompatibility in your compiler. But that is left as an exercise to the reader.
February 1st, 2010
Tom DeMarco has turned 180o, finally! As one of the initiators and most prolific proponents, if not principle responsibles of the software engineering myth, he finally sees his erring ways. After selling a fuckton of books on software-engineering like “Peopleware,” “Deadline,” “Controlling Software Projects,” and “Waltzing with Bears.” he now, finally, takes his distance. A bit late after 40 years, but I guess, the money was too good. He, at long last, acknowledges the folly of his most famous quote: “You can’t control what you can’t measure.” Never came so much stress to so many coders from so few words. Pity he does not apologize profoundly, but still it is good that he basically tells the world he has been wrong and misinterpreted. Finally he acknowledges, what all real-life-coders knew all along, “the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value.“ (Tom DeMarco) [Software Engineering: An Idea Whose Time Has Come and Gone?] It is ending the era of the software engineering myth. It has hit the community hard, just look at a few reactions: CH, iTwire, Stackoverflow, Software Architecture, Never In Doubt.
January 31st, 2010
In 1983 my math teacher wrote on the blackboard: P{Q}R
. Ever since, my code is loaded with assertions. “Assertions should be used to document logically impossible situations and discover programming errors […] This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely“. (Wikipedia) [Assertion] We all have seen, but never produced, anti-code like:
assert(buf = malloc(128));
However, I have produced code like:
assert(!"Sorry, PNG routines not implemented yet.");
Fix-me-later-code like this might get shipped (with or without NDEBUG
)… and some user might try a PNG file… after six hours of typing… without saving… We have all been there. You balance a todo-list of 20 items in your head, and it won’t survive deep thought on some unlikely potential problem. Put in an quick assert()
and you can continue to work on your todo-list of, now, 19. The problem is not in using the assert()
, but in forgetting to replace it with real code, eventually. So here is how I make them stick out:
#define NOT_HANDLED_YET(b) \
(void)(!(b) || exit_error("Error '"#b"' not handled yet!"))
#define NOT_IMPLEMENTED_YET(s) \
(void)exit_error("Sorry, "s" not implemented yet!")
The examples above would read:
buf = malloc(128);
NOT_HANDLED_YET(NULL == buf);
NOT_IMPLEMENTED_YET("PNG routines");
And remember, no commits on trunk unless grep -c NOT_ *.[ch]
returns 2
.
January 29th, 2010
Some hold that pointers in a language like C are dangerous and hard to master. Both maybe true, but so are scalpels. Since you are reading this, you know that pointer practice makes pointer perfect. The same goes for pointers to pointers. I remember vividly when I first saw some elegant code that used a pointer to a pointer (I was working on Amoeba). It was a coding changing event. Before I would write like:
After I would write:
I’ve even used that last bit of code in job interviews. If some self proclaimed hardcore C coder
could not explain those lines of code I would know they were no experts. Now that this trick is out of the bag, I will have to use an other one, next time we meet.
December 8th, 2009
The skip list is a relative unknown data structure. If you know it already you can stop reading. If not, keep reading because “the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.” (William Pugh) [Skip lists: a probabilistic alternative to balanced trees]. Speaking from experience, I have to say this is true. Coding up skip lists is easy. I’ve written a reentrant version that uses read/write locking, a version for a hash table to replace chaining, a version that allows Θ(log(n)) indexing, and an unrolled version. It is not perfect for all problems, but it is adaptable and has some other nice properties. However, I like it best for its simplicity. To find the 7
in the above skip list, we start at the top left, and go right until we hit a value >7
(or NIL
). We go down one and continue to the right, and so on. So we visit the nodes 1
, 4
, 6
, and 7
. Search usually is Θ(log(n)). Since the reentrant version is somewhat less trivial to write, click this link for a GPLed implementation.
December 3rd, 2009
Anyone that wrote any non trivial SQL query with an outer join, will know: NULL
‘s a bitch. By the time you learn to write:
SELECT * FROM T -- contains bitches.
WHERE IFNULL(A = B, FALSE) OR (A IS NULL AND B IS NULL);
your hair has turned gray, if you did not pull it all, that is. When the ω—the symbol for ‘unknown’—was introduced by Codd, I guess he basically tried to support ternary logic. Later, he must have realized that ω, in non boolean columns, can stand for many different problem specific meta-values. In version 2 of The Relational Model for Database Management (Chapter 8), he tried to remedy the problem by splitting NULL
(ω in SQL) into A-mark (applicable-value mark) and I-mark (inapplicable-value mark). But first, this sinned against Bruce MacLennan’s zero, one, or infinity principle and second, almost nobody adopted it. Hence different SQL implementations handle non trilogic usage of NULL
differently. For example the expression 'A' || NULL || 'C'
has different values in different SQL implementations. The same goes for aggregates, COUNT(*)
, NULL / 0
, and indexing. Also a whole host of NULL
related functions were invented like COALESCE()
. Since any usage of NULL
can be mimicked by adding a boolean column and some careful code, I am of the opinion that the introduction of NULL
was a huge mistake, and the current standard and implementations aggravate the problem by implementing a lot of undefined or implicit behavior. Other solutions that would have been better are (user-definable) meta- or default-values. Anything but NULL
.
November 3rd, 2009
with x( s, ind ) as
( select sud, instr( sud, ' ' )
from ( select
'53 7 6 195 98 6 8 6 34 8 3 17 2 6 6 28 419 5 8 79'
sud from dual )
union all
select substr( s, 1, ind - 1 ) || z || substr( s, ind + 1 )
, instr( s, ' ', ind + 1 )
from x
, ( select to_char( rownum ) z
from dual
connect by rownum <= 9
) z
where ind > 0
and not exists ( select null
from ( select rownum lp
from dual
connect by rownum <= 9
)
where z = substr( s, trunc( ( ind - 1 ) / 9 ) * 9 + lp, 1 )
or z = substr( s, mod( ind - 1, 9 ) - 8 + lp * 9, 1 )
or z = substr( s, mod( trunc( ( ind - 1 ) / 3 ), 3 ) * 3
+ trunc( ( ind - 1 ) / 27 ) * 27 + lp
+ trunc( ( lp - 1 ) / 3 ) * 6
, 1 )
)
)
select s
from x
where ind = 0
/
(Anton Scheffer) [Solving a Sudoku using Recursive Subquery Factoring]
Impressive code from our amici over at Amis. But also an impressive example of how far one can go outside the original domain of a language. I mean recursive queries? On the other hand this algorithm does not look too good in Scala or in Perl either. The mathematics of Sudoku are studied in depth. Sudoku has been shown to be NP-complete and of the many ways of solving Sudoku, dancing links and constraint programming seem to be very popular. (Typically, it takes milli seconds for a computer to solve a Sudoku.)