Darkness 2001-09-14 00:00:00 UTC
 
Twilight 2001-09-15 00:00:00 UTC
 
Daylight 2001-09-16 00:00:00 UTC
 
Finish 2001-09-21 00:00:00 UTC

Mastermind - Statistics

Contest Evolution

There were 1138 entries in this contest, with more than a hundred participants (the exact number is hard to determine since people don't always report their names consistently).

This area plot shows how the entries added up as the contest wore on. The green area represents the number of files that passed the test suite, and the red area shows those that failed. Overall 507 of 1138 entries passed the test suite, for a pass rate of 44.5%.

(Note: This contest analysis was calculated and published entirely from MATLAB)


This histogram shows when contest activity was occurring. Notice the obvious spike around midnight on the second day of the contest. This is when E. Brian Welch was submitting a flurry of files designed to exploit the PERSISTENT function loophole (he describes this approach in another part of this analysis).

The histogram at the bottom shows when E. Brian Welch was submitting his files.


This is probably the single most useful diagram for visualizing the contest: here we see the score of the entries logged against the time at which they were submitted. Since a lower score is better, the lowest part of this data set decreases steadily over time as the lead entry gets better. All leaders are shown connected by a red line. Therefore the winning entry, "fcol_a_05" by Stijn Helsen is found in the lower right corner of the plot: the lowest score at the end of the contest.

This plot shows some of the dramatic improvements in the algorithm that occurred over time.


Here we are looking at the same basic presentation as above, only now line segments are shown connecting entries that are related. You can see in a few places multiple lines that radiate from a few points, indicating places where the same entry spawned a second generation of multiple offspring.

We have highlighted with red connecting lines the descendants of Nedialko Krouchev's "t7c", which has the distinction of being in first place for the longest period of time.


The scoring for the contest was based on several factors: the correctness of the answer in terms of black pegs and white pegs, how many guesses it took to reach the final answer, and how many seconds it took to reach the final answer. From a practical point of view the contest was driven by the ability to to guess as many black pegs as possible before timing out at 180 seconds of CPU time. Soon after the contest started, the algorithms were cycling through all possible colors in the first few guesses, which means that they were assured at least all white pegs. So maximum black peg count ended up being the distinguishing characteristic of the leader. In the following plots, we see on the left score vs. CPU time.

On the right we are showing the percentage of black pegs vs. score. This plot makes obvious the correlation between black pegs and getting a better (i.e. lower) score. By the end of the contest, the correlation is almost perfect.


In order to get a feel for the trends in the contest, let's focus on the score vs. CPU time plot. Here we are looking at each of the five days of the contest. All entries for all days are shown in light blue for context, and the entries for the day in question appear as red dots. This lets you see how large-scale trends play out over time.

Here is a plot in which we pay particular attention to the breakthroughs in the algorithm as the contest progressed. These "corner points" represent significant improvements to the algorithm which were then tweaked by following entries.

Now we'll do the same plot on a score vs. CPU time plot for the sake of comparison.

Here are some pictures that illustrate how the algorithms work. We'll examine how some of the critical "corner point" breakthroughs improved on earlier approaches. Note on the visualization: the guesses are shown with colored pixels on the left, and the score is shown with black and white pixels on the right. The correct answer is at the very top on the left, and the final guess appears just below it.

Our first example entry (on the left here) simply made random guesses and kept the best for its final answer. Soon after the contest started, E. Brian Welch started the guessing by cycling through all the colors first, so that at least he could be assured of doing no worse than all white pegs. His entry "ebw_sub03" became the first big corner point improvement. Even with this improvement, though, a random guess is unlikely to ever guess the correct answer in all but the simplest problems. The next innovation has to do with an algorithm that is guaranteed to converge.


Our next example is the first from Benno Weber's "alcohol series" (we here on the MATLAB Contest Team would like to nominate Benno for best entry names). Notice that he also cycles through all the colors initially. He picks a color and tries to find the column where it belongs. On the right we have John Stroebels "submission 1". Stroebel rocketed into the lead at the end of the first day.

Here are three of the later entries, starting with Stijn Helsen's breakthrough code based on a binary tree algorithm. He's able to nail down the position for a given color in far less time. Also show here are Paul Valiant's "new ugly" and Stijn's winning entry.

Several people sent us email observing that the contest ended a little too soon. They wanted to spend weekend time attacking the problem, time that simply wasn't available during the week. For the most part, the story of the last three days of the contest was the steady improvement of a common theme of attack. But several other approaches were being developed and might have succeeded in taking the lead, given a little more time. One example is Alex Backer's Papi code which appeared in the last few hours of the contest. Look at the dramatic difference in solving technique between Stijn's winning entry on the left and Alex's contender Papi5.

For those of you who feel you didn't have enough time to get involved in this contest, get ready for the next one. And we plan on taking your advice and lengthening the next contest so it stretches over the weekend. Thanks for playing, and we'll see you next time.


We close here with a composite picture of the contest that shows the score vs. CPU time picture with algorithm visualization pictures superimposed.