Staff in prediction
Former England international player turned broadcaster Gary Lineker famously said “Football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans always win.” That proved true for the 2014 World Cup, with a late German goal securing a 1-0 win over Argentina.
Towards the end of March, we posted predictions for the final ordering of teams in the World Cup, based on Ranker’s re-ranks and voting data. During the tournament, we posted an update, including comparisons with predictions made by FiveThirtyEight and Betfair. With the dust settled in Brazil (and the fireworks in Berlin shelved), it is time to do a final evaluation.
Our prediction was a little different from many others, in that we tried to predict the entire final ordering of all 32 teams. This is different from sites like Betfair, which provided an ordering in terms of the predicted probability each team would be the overall winner. In order to assess our order against the true final result, we used a standard statistical measure called partial tau. It is basically an error measure — 0 would be a perfect prediction, and the larger the value grows the worse the prediction — based on how many “swaps” of a predicted order need to be made to arrive at the true order. The “partial” part of partial tau allows for the fact that the final result of the tournament is not a strict ordering. While the final and 3rd place play-off determined the order of the first four teams: Germany, Argentina, the Netherlands, and Brazil, other groups of teams are effectively tied from then on. All of the teams eliminated in the quarter finals can be regarded as having finished in equal fifth place. All of the teams eliminated in the first game past the group stage finished equal sixth. And all of the 32 teams eliminated in group play finished equal last.
The model we used to make our predictions involved three sources of information. The first was the ranks and re-ranks provided by users. The second was the up and down votes provided by users. The third was the bracket structure of the tournament itself. As we emphasized in our original post, the initial group stage structure of the World Cup provides strong constraints on where teams can and cannot finish in the final order. Thus, we were interested to test how our model predictions depended on each sources of information. This lead to a total of 8 separate models
- Random: Using no information, but just placing all 32 teams in a random order.
- Bracket: Using no information beyond the bracket structure, placing all the teams in an order that was a possible finish, but treating each game as a coin toss.
- Rank: Using just the ranking data.
- Vote: Using just the voting data.
- Rank+Vote: Using the ranking and voting data, but not the bracket structure.
- Bracket+Vote: Using the voting data and bracket structure, but not the ranking data.
- Bracket+Rank: Using the ranking data and bracket structure, but not the voting data.
- Rank+Vote+Bracket: Using all of the information, as per the predictions made in our March blog post.
We also considered the Betfair and FiveThirtyEight rankings, as well as the Ranker Ultimate List at the start of the tournament, as interesting (but maybe slightly unfair, given their different goals) comparisons. The partial taus for all these predictions, with those based on less information on the left, and those based on more information on the right, are shown in the graph below. Remember, lower is better.
The prediction we made using the votes, ranks, and bracket structure out-performed Betfair, FiveThirtyEight, and the Ranker Ultimate List. This is almost certainly because of the use of the bracket information. Interestingly, just using the ranking and bracket structure information, but not the votes, resulted in a slightly better prediction. It seems as if our modeling needs to improve how it benefits from using both ranking and voting data. The Rank+Vote prediction was worse than either source alone. It is also interesting to note that the Bracket information by itself is not useful — it performs almost as poorly as a random order — but it is powerful when combined with people’s opinions, as the improvement from Rank to Bracket+Rank and from Vote to Bracket+Vote show.