Phew! That’s it for another year of the Realtime Worlds Student Programming Contest, bigger and better than last year by a mile. From 92 initial contestants, we came down to 32 finalists, lots of pizza, a 5-hour coding marathon, 1 major error in the problems, a live-blogging journalist (Phil Harris from square-go), and 1 blue screen of death. As a result, I have a bumper blog post, packed with problem analysis, the inside story of my bumbling organisational efforts and the geniuses that saved the day, plus video footage of possibly the funniest car-driving AI ever created in 2 hours!
So firstly, the story of the day: we began with 3 hours dedicated to 3 algorithmic programming questions; traditional programming contest stuff. At the end of this time, the scoreboard looked like this:
|Team||Fishing||Word grid||BOOM! Headshot!||Total|
|The Psychodelic Circus (Ctrl+Shift+Home Delete) and a Haggis||100||100||0||200|
|TEAM Super Serious Business||30||0||0||30|
Then, without a break, we let the teams loose on the interactive question. This year’s theme was a car racing game with 3D physics, all based on the excellent JigLibX physics library for .Net. Two teams were clearly head and shoulders above the others, propelling themselves up the leaderboard, and the team-with-the-unspeakably-long-name squeaked into third place, just enough to cling onto their overall lead:
|Team||Fishing||Word grid||BOOM! Headshot!||Cars||Total|
|The Psychodelic Circus (Ctrl+Shift+Home Delete) and a Haggis||100||100||0||50||250|
|TEAM Super Serious Business||30||0||0||200||230|
Silly me completely neglected to design contest rules to produce a clear 3rd place, so we were unable to give them prizes on the spot (we managed to give them all something the week after). I’m just glad we didn’t have a 1st place tie! Well done to all our contestants – they really worked flat out and still came out smiling and laughing at the end. The quality was noticeably up from last year, and I really enjoyed the event. I hope everyone else did too.
Algorithmic problem analysis
Judging the question difficulty was pretty awkward. Our original plan was for the online round to contain 2 easy and 1 medium difficulty question, and for us to tailor the difficulty of the final based on those scores. Unfortunately, the medium question turned into a monster, and we were left with a set of finalists who had (mostly) completed 2 easy questions with ease, and failed on the toughie. We wanted to up the ante for the final, but it wasn’t clear how far we could go. In the end, we didn’t do too badly, although the scoreboard suggests we did pitch it slightly too hard.
The fishing question was designed as our easy one. There’s not too much to say about this, except that we designed it with the solution in mind from the start (as we’re learning the hard way, that makes question preparation go far smoother!). The problem is the classic “activity selection problem”, which has a simple greedy solution. As for the problem statement, well, it’s a bit of an RTW-insider joke. You’d have to come work with Dave to find out more ;) One small point of note – the input sizes were carefully chosen to make a brute-force solution fail for time on the larger case – spot the team that did that!
After that came the Boggle-themed problem. We’d had this question in mind from a long time ago, making a few tweaks along the way. Essentially, the solution is a brute-force recursive search of all possible word sequences in the grid. There are a couple of wrinkles:
- The brute-force method gets too slow when N > 4. Originally, the point of the question was to handle this case, but at the last minute we cut the large case down to 20 points in the hope that it would make the question more accessible to all. Seems this wasn’t enough.
- The ‘q’ -> ‘qu’ rule can definitely cause a bit of implementation fiddliness. I put this in (a) because it’s how Boggle actually works, and (b) to make the question a bit harder (earlier on, when we were trying to increase difficulty). Looking at the scores, we may have been better off without it. As you’ll see, I certainly would have!!
So – how to deal with these wrinkles? The ‘q’ -> ‘qu’ rule can be done in a variety of ways. In my implementation, I preprocessed the dictionary to remove words with just ‘q’, and to replace ‘qu’ with ‘q’ in the remaining words. This simplifies the search, since you can then just forget about ‘qu’ and just deal with ‘q’. It also led me to a stupid bug, which I’ll get to later on.
Now, for the N = 6 case, there are probably different approaches, but the obvious one is to use a trie. It’s a neat data structure if you haven’t come across it before. In this case, you build a trie representing the whole dictionary, and then as you perform your brute-force search on the grid, you walk the trie in sync with the grid search – allowing you to avoid searching most word sequences (the trie lets you spot immediately when the letters so far can’t possibly lead to a valid word, no matter what you add to them).
Finally, we had BOOM! Headshot! – inspired by Doug. This question actually started out life in a much harder form, would you believe it, and was heavily simplified, but remained the hardest one in the contest. The basic idea is to consider each enemy on its own, and ask whether it can be shot, or not. To do this, for each civilian, you figure out a segment [s1, s2] of the x-axis in which the civilian “blocks” the shot (these segment endpoints are easy to find with the given helper functions: get the tangents from the robot to the civilian, then intersect them with the x-axis). If you sort all the segment endpoints, you can walk along the x-axis in order, keeping track of which civilians are blocking a shot, and hence looking for segments of the axis from which a shot can be taken free of any civilian blockages. If you find such a segment, you have to check it’s within [A, B] to actually be on Doug’s path.
And that was it for the algorithmic section. I’m fairly pleased with the questions we came up with – perhaps they could have been a little easier, but hopefully some fun was had attempting them :)
The interactive question illustrates an awkward truth about programming in general, and perhaps game programming more than most. In real life, you rarely get to work on problems that are perfectly, mathematically defined; life as a programmer is not like working through a programming textbook. Problems are vaguely stated, and the APIs you build on are poorly-documented and packed with undefined behaviour. The difference between our algorithmic and interactive questions illustrates this perfectly. It reminds me of the way MIT have stopped introducing students to programming with Scheme, and switched to Python-controlled robots – for similar reasons.
Our interactive question began with a hunt for some kind of cool open-source code as a base to save us preparation time, and I thought something based around 3D physics would be cool. We were also determined to do the whole question in .Net after last year’s fiasco, where we allowed any language, started each entry as a separate process, and communicated via standard I/O. It’s nice in theory to allow any language, but the subsequent problems were just horrible: on a minor note, the possibility for the contestants to be puzzled by I/O deadlock bugs, but more seriously, the total failure to load one of the entries due to some kind of runtime library incompatibility problem with the host programme. This year, I figured we’d let everyone write a C# class library, have a C# host app load them in with a bit of reflection and then just call the contestants’ code normally. Losing the language flexibility seemed worthwhile for the confidence we’d gain in the whole thing working. Now, there aren’t that many good choices for an open source physics engine in .Net, at least not ones that just work straight out of the box. So I was just delighted when I came across JigLibX. Written 100% in C#, it just works, and comes with a fabulous little demo programme, complete with cars driving over bumpy terrain, stackable boxes, ragdoll bowling-pin men, and throwable solid objects. It’s a tremendously fun little thing to play around with, and so easy to modify.
Anyway, enough of the blethering. Here’s some footage of the actual races in the final. I’d originally thought of having some kind of gradual elimination of cars, but with 30 minutes to go, got discussing the fairest scoring method with Andrew & Alex, and decided that this was not it. We went through a tonne of options, in the end settling for shoving all 7 cars in the race together, awarding Formula-1-style points based on ranking, and adding up these points across 4 races. Every car would get a chance to try every track. The only trouble was that I’d written the application to support 4 cars, so a mad last-minute rush ensued to add 3 new car models with all-new colours.
The winning AI is actually quite remarkable. It’s slow-but-steady, a tortoise to the other teams’ hares, but does a super job of steering accurately through all the checkpoints, almost without a fault. It also avoids the trap that the other slow-but-steady entry fell into: not having enough power to climb the steeper hills.
Last year, one of the biggest lessons learned was to avoid the craziness of submitting solutions on a USB stick. We knew it was never going to be good, but it was even worse than expected. We lost track of people’s scores and had to do some real detective work to work them out again; we lost source code that people wanted afterwards, so they could complete their efforts; and my laptop was infected with malware. Doh! So we were determined to use some proper contest software – but which one? It actually turned out to be hard to find anything suitable. PC2 is the most widely-known, but it wasn’t any use to us: being guests inside a university network, it was infeasible to install or run such a server locally within the network to meet the “any available port open” requirement. And writing your own is not that simple. Automatically compiling and running code is a bit of a yucky problem, when you consider having to keep each job isolated, kill processes that run amok, feed results back to the front-end, and time them precisely. My second idea was to separate the concepts of submission from grading. I figured we could use a simple file submission process (source control repository? email?), enforce some conventions within that to organise the solutions, and when the time came, download the solutions to a single machine and run a batch grading script to calculate all the scores. It would have been very home-grown, clearly wouldn’t have scaled to large contests, but I believe could have been a simple, effective solution for our contest.
Then, last year, Google Code Jam came to the rescue. The genius of Code Jam’s approach is having contestants compile and run the code themselves, and just comparing their answers to the expected solution. For one thing, it’s brilliant to allow developers to choose any language they want (and brilliant to see what they do with this freedom); but further, it makes the contest software a doddle to write, as compilation, running and timing is replaced with a simple comparison against a reference answer.
I also noticed they ran the thing on App Engine, and I’d been messing around with that a fair bit myself, so I thought I’d follow suit. It’s hard to explain just how cool App Engine is for simple web application development. For starters, it’s cool being able to put a database-backed web application up on the internet without having to care about servers at all, as the app runs on Google’s server infrastructure. Where it gets really cool, though, is if your app takes off, or gets spiky traffic. Essentially, they’ll spawn extra instances of your app when demand is high, so scaling “just comes for free” (of course, it is not exactly transparent!). And finally, the whole thing’s free of charge (at least, to start out – you can pay more to get higher quotas of CPU usage, bandwidth etc). If you haven’t tried it before, I highly recommend it. It’s brought a big smile to my face of late.
Overall, our home-rolled web application turned out to be awesome – firstly, allowing us to run an online qualification round, and secondly, making the submission process of the onsite final run like clockwork. We did have a few technical problems, though – mostly affecting the online round. To start with, as the online contest approached, I began to get reports of teams that couldn’t log in – but not everyone. It turns out that the datastore API lets you add data of type ‘User’, with the wrong case, and when the real user logs in, a query for “find a contestant registered in this contest with this username” just returns nothing. So I had to go back and re-add the problem teams with exactly the right casing. There was a further issue of the same type, but even more pernicious, resulting from the fact that more recent UK users have “@googlemail” addresses, not gmail (down to a trademark dispute). But Google let you send email to either, so some people think they have a gmail username when it’s technically googlemail, and vice versa. Again, if I added them with the wrong domain, when they logged in, it wouldn’t find them in the system.
There was a much bigger problem to come. As the start of the online contest approached, I was sitting trying to soothe two babies at once. I figured I could hit the “open contest” button without much trouble. Imagine my panic when it responded with “Internal server error”! (not even the helpful Python callstack you get when your own code goes wrong). This was a problem I’d never seen before, and it was preventing anyone from logging in to see the questions. The babies took this opportunity to begin full-scale screaming, so I couldn’t have been more stressed. A bit of time digging around in the server logs (you get a nice application administrator’s dashboard with this type of stuff), I could see that it was exceeding a CPU quota – not the daily usage quota but some kind of secret “we think your application is being attacked or has gone crazy” short-term quota. A quick glance at the code and I quickly suspected the scoreboard, which looked inefficient and had some kind of quadratic cost in this situation – O(N) to compute the scoreboard for N teams, times N teams all trying to view the page simultaneously. I commented out the scoreboard and everything righted itself instantly. That was probably the most stressful 7 minutes of debugging I’ve ever done. Guess I should have done some load testing beforehand!! :)
Everything about organising the contest ran down to the wire. It doesn’t matter how much I try to cajole everyone into having the questions ready 2 weeks in advance; it always happens last-minute. In the online contest, it was the dungeons question that went down to the wire; for the final, all 4 questions were finished with a day or two to spare, and the headshot question with 15 minutes to go!!! Getting the questions ready at the last minute causes real headaches, because you absolutely need time for a couple of people to try a question, make sure the problem statement is understandable and unambiguous, make sure it can be solved, that we hadn’t missed a trivial solution, and most important of all, that we all agree on the test case answers.
The headshots question ran latest this time round. Alex completed it on Thursday night in a slightly different form (the enemies had radius R and could be shot anywhere in their radius, not just through the centre). Unfortunately, as Andrew pointed out on Friday morning, the intended solution (basically the same as the solution for the final question) didn’t work. If you consider a point ‘x’ on Doug’s path, it may be that he has clear shots with respect to two civilians C and D when considered individually, but no clear shot when considered together. This happens when the two clear shots are at two different angles. So Alex spent part of Friday re-thinking the question (switching to shooting centre points means that a point ‘x’ defines a unique angle to shoot the enemy, so that the results for different civilians can be simply overlaid – and enemies had to become robots with CPUs in the backstory!). All of this pushed the finishing of the question late, to the point that the test data and geometric code was only added to the system 15 minutes before the contest started – a really close call!
It was the Boggle question where lateness got us in real trouble, though. Half way through the contest, one of the teams came over to point out that our provided sample data was wrong. Indeed, it was not listing ‘quo’ or ‘qua’ as valid words – because my ‘qu’ handling had replaced ‘qu’ with ‘q’ everywhere, so my code looked at ‘qo’ and ‘qa’, saw 2-letter words and rejected them. What’s especially stupid about this is that the dictionary had 2-letter words removed, so there wasn’t even any need to check the word length. The immediate problem was that we knew the ‘real’ test data in the system would now be wrong. Here’s where it got farcical. My code to compute the output was on my work PC, so I decided to run to the office to get it (missing out on most of the pizza … oh well, better than going for the run after the pizza!). I patched it up, and Andrew rustled up a solution quickly too. Bizarrely, our results differed slightly on occasional tests, out by one. In the simplest such case, I found ‘req’ where he did not. We glanced into the dictionary file and didn’t see ‘req’, so decided Andrew’s was right. The unspeakably-long-named-team’s solution agreed with his, and Alex weighed in with another agreeing solution a few minutes later. So we went with that.
There was a twist the next morning as I looked into my ‘req’ bug. I discovered that it was in the dictionary all along, and we hadn’t seen it because the dictionary wasn’t in alphabetical order … who makes a dictionary like that!! We decided my solution was right and that Andrew’s code didn’t cope with such a dictionary. Panic now set in as we realised a couple of teams had tried the question but been put off by being told they were incorrect for the easy case. What if they had got it right and been put off by being told they were incorrect? We quickly saw that this would turn the scoreboard upside down, with the winners demoted to fourth, and a couple of third-placed teams promoted to joint-second. I scrambled to check the source code they’d submitted!! Much to my relief, we had a submission that printed out ‘3’ for each test case, one that didn’t terminate, and one that didn’t compile. The leaderboard had still changed, technically (first two teams swapped), but we felt ok with leaving it as-is: the winners had solved the Boggle question in spirit.
The story ends with Andrew running up to me triumphantly after lunch: “My code is right! My code is right!”. Of course, ‘req’ is not a valid Boggle word, since there’s no u after the q. So my solution had contained 2 different bugs, the winners were correct, and the leaderboard stuck without controversy. Phew.
So that’s it for another year. Hope you had as much fun as I did :)