Don’t repeat yourself

In the last few weeks of 2008 I spent a bit of time converting Jonathan Shewchuk’s adaptive floating-point predicates (included in his Triangle library) into C#.  I don’t want to bash on this too much: the algorithms in question, and the technical proofs behind them, are wonderful things.  The implementation is rock solid, providing a combination of robustness and performance that I haven’t seen beaten by anything – and it’s all freely available for non-commercial use.  There are none of the dreaded “let’s assume that none of the input points have the same x coordinate” assumptions that so many computational geometry algorithms make.  So I really have nothing to complain about!🙂

But the implementation does provide a great illustration (via getting it wrong!) of a classic software engineering principle, “Do not repeat yourself”.  The adaptive InCircle test is about 600 lines long (that’s 18 screens of code on my laptop!!), so I’d rather not list it all – but this fragment is pretty representative:

5484 if (adxtail != 0.0) {

5485 axtbclen = scale_expansion_zeroelim(4, bc, adxtail, axtbc);

5486 temp16alen = scale_expansion_zeroelim(axtbclen, axtbc, 2.0 * adx, temp16a);

5487

5488 axtcclen = scale_expansion_zeroelim(4, cc, adxtail, axtcc);

5489 temp16blen = scale_expansion_zeroelim(axtcclen, axtcc, bdy, temp16b);

5490

5491 axtbblen = scale_expansion_zeroelim(4, bb, adxtail, axtbb);

5492 temp16clen = scale_expansion_zeroelim(axtbblen, axtbb, –cdy, temp16c);

5493

5494 temp32alen = fast_expansion_sum_zeroelim(temp16alen, temp16a, temp16blen, temp16b, temp32a);

5495 temp48len = fast_expansion_sum_zeroelim(temp16clen, temp16c, temp32alen, temp32a, temp48);

5496 finlength = fast_expansion_sum_zeroelim(finlength, finnow, temp48len, temp48, finother);

5497 finswap = finnow; finnow = finother; finother = finswap;

5498 }

Now look how the function continues:

5499 if (adytail != 0.0) {

5500 aytbclen = scale_expansion_zeroelim(4, bc, adytail, aytbc);

5501 temp16alen = scale_expansion_zeroelim(aytbclen, aytbc, 2.0 * ady, temp16a);

5502

5503 aytbblen = scale_expansion_zeroelim(4, bb, adytail, aytbb);

5504 temp16blen = scale_expansion_zeroelim(aytbblen, aytbb, cdx, temp16b);

5505

5506 aytcclen = scale_expansion_zeroelim(4, cc, adytail, aytcc);

5507 temp16clen = scale_expansion_zeroelim(aytcclen, aytcc, –bdx, temp16c);

5508

5509 temp32alen = fast_expansion_sum_zeroelim(temp16alen, temp16a, temp16blen, temp16b, temp32a);

5510 temp48len = fast_expansion_sum_zeroelim(temp16clen, temp16c, temp32alen, temp32a, temp48);

5511 finlength = fast_expansion_sum_zeroelim(finlength, finnow, temp48len, temp48, finother);

5512 finswap = finnow; finnow = finother; finother = finswap;

5513 }

Look familiar?  They look as though they’re probably the same code, although it’s hard to be fully certain at a glance (they are in fact the same, if you make a few variable substitutions).  There are a further four blocks of code just the same.  Then there’s a 100-line block that gets repeated in a similar fashion, three times over.  This repetition was a killer for me, because I was making some small changes to all the code as I went (e.g. using a 2d-vector structure instead of separate floats for x and y coordinates).  These changes were tiresome and error-prone, so I wanted to perform them on as little code as possible.  It’s a perfect illustration of why repeating yourself is bad (as an aside, imagine you found a bug in one of these code blocks and had to duplicate it into 5 subtly-different-but-almost-the-same blocks).

Pulling the commonality out of these code blocks isn’t as simple as you’d like – assuming you want to avoid making a mistake :)  So I ended up writing a small script to do it automatically – given two blocks of code that I suspected to follow the same pattern, it tokenized them each to produce a list of strings, and then computed a mapping (string -> string) which converted one token sequence into the other (if no such mapping could be found, the blocks weren’t repeated code).  Using this mapping, it was simple to extract the block into a function and pass the correct variables as parameters.  I’m sure there are some nice refactoring tools out there to do this for you automatically, but I didn’t know one off the top of my head and the script only took 10 minutes to write, so it definitely saved me a lot of time compared to updating all the code by hand.

Eliminating duplicated code goes a long way to making your software better.  It becomes easier to understand, easier to change, easier to fix.   Of course duplicated code is usually much more subtle and hard to spot.  But this case was interesting for me as it’s the first time I’ve gone so far as to write a script to eliminate duplication!  In the end, that 600-line function came down to 4 functions, under 200 lines in total.  All but one of those functions fits on a single screen and can be taken in at a glance.  That’s the kind of simplifying refactoring that brings a big smile to my face🙂

This entry was posted in Software development. Bookmark the permalink.

6 Responses to Don’t repeat yourself

  1. I’d like to cite the ‘Clone detective’ at this point🙂

    http://www.codeplex.com/CloneDetectiveVS

    I’m thinking if it’s a good or a bad thing that they haven’t released anything after version 1.0.0.0, though.

  2. Jonathon says:

    I still insist that the 1700 line function we had in our 3rd year group project had the most redundant code in it😛

    Clone detective looks cool, but the “clearing of the settings” bug is a little worrying :S people just need to be more careful🙂

  3. Whaledawg says:

    Actually a good text editor(like gVim) has diff built into it to solve problems like this. Or if you’re lucky enough to be on unix you can use diff itself😉

    Also, why were you converting it by hand? A tokenizer/parser like flex/bison should have been up to the task.

  4. lukehalliwell says:

    Well, read more carefully: diff doesn’t solve this particular problem at all – it doesn’t do token substitution and keep that consistent across multiple lines. I’m not sure whether what you’re talking about in gVim is just diff, which again wouldn’t solve the problem. It would obviously be cool to have a refactoring tool of this kind in an editor!

    Hand-writing a tokeniser for this case literally took 5 minutes, far less than it would have taken me to remind myself exactly how to use lex again (note: bison is a parser generator, and that’s not what I’m doing here). Yes, lex may be quicker to do the job, but only if you’ve used it recently enough to use it without thinking! I didn’t need to write a proper tokeniser because the problem in question is so simple, and the code I was writing was totally throwaway. Obviously a real tokeniser, built to last, should be done with a proper tool.

  5. Dan says:

    Do you happen to have a completed copy of the C# version of this library? Is that something you might be willing to share? I would be eternally grateful🙂

Comments are closed.