
Ranker Voters Predict Matt Ryan’s NFL MVP Win

The 2016 NFL season is now in the rearview mirror, ending with a Super Bowl that will be talked about for years to come. Most of the records in the Big Game were set by winning Patriots quarterback Tom Brady, leaving Falcons quarterback Matt Ryan with plenty to lament. The day before the Super Bowl, however, Ryan had something to celebrate: he was named the league's MVP.

Ranker has a popular list for Players Most Likely to Be the 2016-17 MVP. The list was published in November 2016, and over 20 players received a total of more than 30,000 thumbs-up and thumbs-down votes. We were interested in whether this list predicted Ryan’s win, and how the patterns of opinions expressed by the voters changed over time.

The figure above summarizes the raw voting data for Matt Ryan. The black cross markers show the empirical proportion of up-votes to total votes on each individual day that votes were cast. The size of the crosses corresponds to how many votes were cast on that day. There is an increase in the proportion of up-votes, beginning around January 15. The two Sundays marked on the x-axis, January 15 and 22, are the dates of the NFC Divisional Round and NFC Championship games, in which Ryan and his teammates played their best games of the year as the franchise earned its second Super Bowl appearance.

The blue line in the figure above shows the cumulative proportion of up-votes to total votes over each day voting was active. This cumulative proportion increases after January 15, but not by much, because the accumulated earlier votes continue to weigh on the overall proportion. The problem with this analysis is that it assumes voting always reflects the same opinion, so all votes are lumped together: thumbs-up or thumbs-down votes cast last November count equally with votes registering an opinion right before the MVP was announced.
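
For readers who want to see exactly what these two summaries are, here is a minimal sketch in Python, using made-up, hypothetical vote records, of how the daily proportions (the black crosses) and the cumulative proportion (the blue line) can be computed:

```python
from collections import defaultdict

# Hypothetical vote records: (date, is_upvote) pairs.
votes = [
    ("2016-11-20", True), ("2016-11-20", False), ("2016-11-21", False),
    ("2017-01-15", True), ("2017-01-15", True), ("2017-01-22", True),
]

daily_up = defaultdict(int)
daily_total = defaultdict(int)
for date, is_up in votes:
    daily_total[date] += 1
    daily_up[date] += int(is_up)

cum_up = cum_total = 0
for date in sorted(daily_total):
    daily_prop = daily_up[date] / daily_total[date]   # black crosses
    cum_up += daily_up[date]
    cum_total += daily_total[date]
    cum_prop = cum_up / cum_total                     # blue line
    print(date, round(daily_prop, 2), round(cum_prop, 2))
```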

So, we developed a new model to analyze these data, as an alternative to the cumulative measure. Our new model tries to measure current opinion, rather than cumulative opinion, by allowing for swings in opinion. For something as hotly contested as the NFL MVP, it's easy to imagine opinions changing based on a good or bad game, or even an injury. Between change points, our model assumes the crowd has a stable opinion, but each time a change point is encountered, the opinion can shift. Our algorithm for applying the current opinion model is able to identify how many changes are evident in a sequence of voting data, where those change points are, and what the stable opinion in each stage is.
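
As an illustration of the idea (a simplified sketch only, not the actual model we used), the following Python code finds change points in a sequence of daily vote counts by greedily splitting wherever a piecewise-constant up-vote rate fits the data much better than a single constant rate:

```python
import math

def seg_loglik(up, total):
    # Log-likelihood of a segment of daily (up, total) counts under a single
    # Bernoulli up-vote rate, estimated by maximum likelihood.
    u, n = sum(up), sum(total)
    if n == 0 or u == 0 or u == n:
        return 0.0  # a degenerate segment fits perfectly
    p = u / n
    return u * math.log(p) + (n - u) * math.log(1 - p)

def find_change_points(up, total, penalty=4.0):
    # Greedy binary segmentation: split at the point that most improves the fit,
    # as long as the improvement exceeds a penalty, then recurse on both halves.
    base = seg_loglik(up, total)
    best_gain, best_k = 0.0, None
    for k in range(1, len(up)):
        gain = seg_loglik(up[:k], total[:k]) + seg_loglik(up[k:], total[k:]) - base
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain < penalty:
        return []
    left = find_change_points(up[:best_k], total[:best_k], penalty)
    right = find_change_points(up[best_k:], total[best_k:], penalty)
    return left + [best_k] + [best_k + c for c in right]

# Hypothetical daily up-vote and total-vote counts: opinion drops, then recovers.
up    = [6, 5, 6, 2, 3, 2, 3, 9, 10, 11]
total = [10, 9, 10, 9, 10, 8, 9, 12, 13, 14]
print(find_change_points(up, total))  # indices of days where opinion shifts
```

The stable opinion within each stage is then simply the proportion of up-votes between consecutive change points.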

The results of applying the current opinion model to Matt Ryan's data are shown in the figure above by the red line. Two change points are inferred, around November 22, 2016 and January 22, 2017. Opinion starts just below 60%, drops to about 30%, and then rises again to a final value just above 60% in the time leading into the award's announcement.


The two panels in the figure above show the cumulative opinion (left-hand panel) and current opinion (right-hand panel) measures for eight leading candidates for the MVP award, including Ryan. These players were all heavily voted on, and include the leading candidates discussed in the media. For both opinion measures, the natural way to make a prediction is to order the players according to the opinion right before the February 4 announcement. Cumulative opinion ranks Ryan in fifth place, behind Ezekiel Elliott, Dak Prescott, Tom Brady, and Aaron Rodgers. Rookie stars Elliott and Prescott had dominant seasons for the Dallas Cowboys, but the early MVP excitement faded to a more realistic assessment that rookie winners are unlikely, let alone rookie teammates. Their prospects of winning faded further once the Cowboys lost their playoff game. Brady and Rodgers, by contrast, are well-established and high-profile perennial favorites for the MVP award.

The current opinion measure shows that Ranker voters had it right, correctly predicting Matt Ryan as the winner. It is interesting to see that Brady ranked second according to current opinion, since he was widely tipped as the only other serious contender in the days before the award was announced. Both Elliott and Prescott show plausible and interpretable downward changes in opinion over the period of voting. Rodgers shows an interesting, large but short-lived, drop in opinion immediately after Green Bay was eliminated on January 22. Generally, many of the inferred change points occur immediately following a significant game result, although there is no constraint in our analysis that requires this. In effect, the change points reveal that game-day performance is the most likely thing to sway opinion.

The overall message is that voting data on Ranker expresses valuable crowd opinions, especially when analyzed in the right way, by allowing for opinion to change. When making predictions about an upcoming event, more recent opinion will often be better. More information is available, and less time must pass before the answer is known, reducing uncertainty. Whether or not it makes Matt Ryan feel better, our analysis shows that Ranker voters are on board with him being the NFL’s Most Valuable Player.

– Michael Lee and Lucy Wu


Using Data To Determine The Best Months Of The Year

Why do people like some months more than others? For many, it is all about the holidays:

“I love the scents of winter! For me, it’s all about the feeling you get when you smell pumpkin spice, cinnamon, nutmeg, gingerbread and spruce.” – Taylor Swift

while for others, it is about avoiding the cold

“A lot of people like snow. I find it to be an unnecessary freezing of water.” – Carl Reiner

and for some more disaffected souls, it is about the specifics

“August used to be a sad month for me. As the days went on, the thought of school starting weighed heavily upon my young frame.“ – Henry Rollins

Presumably all of these preferences, and this angst, are reflected in Ranker's Best Months of the Year list. The graphic below provides a visualization of the opinions of Ranker users. Each row is a different person, and their (sometimes incomplete) ranking of the months is shown from best to worst, left to right. The months are color coded by the four seasons: spring in hues of green, summer in yellow, fall in rustic earth hues of brown, and winter in blue.

[Figure: BestMonthsOriginal]

The patchwork quilt of colors and hues makes it clear that different people have different opinions. We wanted to understand the structure of these individual differences, using cognitive data analysis.

To do this, we used a simple model of how people produce rankings—known as a Thurstonian model, going back to the 1920s in psychology—that we have previously applied successfully to Ranker data. Rather than assuming everybody’s rankings were based on a shared opinion, we allowed this version of the model to have groups or clusters of people, and for each group to have their own preferences for the months. We didn’t want to pre-determine the number of groups, and so we allowed our model to make this inference directly from the data. Our modeling approach thus involves two sorts of interacting uncertainties: about how many groups there are, and about which people belong to which group. Bayesian statistical methods are well suited to handling these sorts of uncertainties.
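
For readers who like to see these ideas in code, here is a minimal sketch (in Python, with made-up group preferences rather than our fitted values) of the generative side of a Thurstonian model with groups: each group has a vector of month "strengths", and a person in that group produces a ranking by sorting noisy draws around those strengths. The actual inference over the number of groups and the group memberships was done with Bayesian methods and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical group-level month strengths (higher = better liked).
summer_lovers  = np.array([0, 0, 1, 2, 3, 4, 5, 5, 4, 2, 1, 0], dtype=float)
holiday_lovers = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 5], dtype=float)
groups = [summer_lovers, holiday_lovers]

def thurstonian_ranking(strengths, sigma=1.0):
    # One person's ranking: draw a noisy latent utility for each month around the
    # group strengths, then order the months from best to worst.
    utilities = rng.normal(strengths, sigma)
    return [months[i] for i in np.argsort(-utilities)]

# Each simulated person has a group membership z (the z variable in the model)
# and produces a ranking from that group's strength vector.
for z in [0, 0, 1, 1]:
    print("group", z, thurstonian_ranking(groups[z])[:4], "...")
```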

For fans of Bayesian cognitive graphical models — we know you're out there — the final model we used is shown in the figure below. For non-fans of Bayesian cognitive graphical models — we KNOW you're out there — there are three important parts. The variable gamma at the top corresponds to how many groups there are, the variables z to the side indicate which group each individual belongs to, and all of this is inferred from the rankings people gave, represented by the variables at the bottom.

[Figure: GraphicalModel]

The figure below shows the first key insight from the model. It shows the probability that there are 1, 2, …, 17 groups, ranging from everybody having the same opinion about the best months, to everyone having their own unique opinion. There is uncertainty about how many groups the rankings reveal, but the most likely answer is that there are four.

[Figure: Gamma]

Assuming there are four groups, the figure below organizes the ranking data by grouping together the people most likely to belong to each group. Group 1 shows a preference for late summer and early fall, and hates cold weather. Group 2 shows a preference for the holidays. They like fall and Christmas time and despise hot weather. Group 3 loves the summertime and hates the winter. We had a look at where these people were from, and it probably comes as no surprise they're all from the north-east of the US. The last group, a bit like Henry Rollins, stands out as a consensus of one.

[Figure: BestMonths]

This analysis shows how cognitive models with individual differences can help understand opinion groupings, and deal with difficult questions like how many groups exist. One especially interesting feature of the Best Months list is that at least one of the groups is defined more by what comes at the bottom of their lists than the top. People in group 1 don’t agree very precisely on which months they like, but they all agree they don’t like winter months. This shows that it is not just the top few items on a Ranker list that carry useful information: what comes at the bottom can be just as informative. Both what you love and hate matters.

“When I was young, I loved summer and hated winter. When I got older I loved winter and hated summer. Now that I’m even older, and wiser, I hate both summer and winter.” – Jarod Kintz


– Crystal Velasquez and Michael Lee


Tracking Votes to Measure Changing Opinions

A key part of any Ranker list is the votes associated with each item, counting how often users have given that item the "thumbs up" or "thumbs down". These votes measure people's opinions about politics, movies, celebrities, music, sports, and all of the other issues Ranker lists cover.

A natural question is how the opinions that votes measure relate to external assessments. As an example, we considered The Most Dangerous Cities in America list. Forbes magazine lists the top 10 as Detroit, St. Louis, Oakland, Memphis, Birmingham, Atlanta, Baltimore, Stockton, Cleveland, and Buffalo.

The graph below shows the proportion of up-votes, evolving over time up to the end of last year, for all of the cities voted on by Ranker users. Eight of the cities on Forbes' list are included, and are highlighted. They are all in the top half of the list of worst cities, and Detroit is correctly placed as the clear overall worst city. Only Stockton and Buffalo, at positions 8 and 10 on the Forbes list, are missing. There is considerable agreement between the expert opinion from Forbes' analysis and the voting patterns of Ranker users.

[Figure: MostDangerousCitiesAmerica]

Because Ranker votes are recorded as they happen, they can potentially also track changes in people’s opinions. To test this possibility, we turned to a pop-culture topic that has generated a lot of votes. The Walking Dead is the most watched drama series telecast in basic cable history, with 17.3 million viewers tuning in to watch the season 5 premiere. With such a large fan base of zombie lovers and characters regularly dying left and right, there is a lot of interest in The Walking Dead Season 5 Death Pool list.

The figure below shows the pattern of change in the proportion of up-votes for the characters in this list, and highlights three people. For the first four seasons, Gareth had been one of the main antagonists and survivors on the show. His future as a survivor became unclear in an October 13th episode where Rick vowed to kill Gareth with a machete and Gareth, undeterred, simply laughed at the threat. Two episodes later, on October 26th, Rick fulfilled his promise and killed Gareth using the machete. While Gareth apparently did not take the threat seriously, the increase in up-votes for Gareth during this time makes it clear many viewers did.

[Figure: WalkingDeadDeathPool]

A second highlighted character, Gabriel, is a priest introduced in the latest season, in the October 19th episode. Upon his arrival, Rick expressed his distrust of the priest and threatened that, if Gabriel's sins end up hurting Rick's family, it will be Gabriel who has to face the consequences. Since Rick is a man of many sins himself, the threat seems real. Ranker voters agree, as shown by the jump in up-votes around mid-October, coinciding with Gabriel's arrival on the show.

The votes also sometimes tell us who has a good chance of surviving. Carol Peletier had been a mainstay in the season, but was kidnapped in the October 19th episode and did not appear in the following episode. She briefly appeared again in the subsequent episode, only to be rendered unconscious. Despite the ambiguity surrounding her survival, her proportion of up-votes decreased significantly, perhaps driven by another character's mention of her, which provided a sort of "spoiler" hinting at her survival.

While these two examples are just suggestive, the enormous number of votes made by Ranker users, and the variety of topics they cover, makes the possibility of measuring opinions, and detecting and understanding change in opinions, an intriguing one. If there were a list of "Research uses for Ranker data", we would give this item a clear thumbs up.

– Emily Liu and Michael Lee

Ranker World Cup Predictions Outperform Betfair & FiveThirtyEight

Former England international player turned broadcaster Gary Lineker famously said “Football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans always win.” That proved true for the 2014 World Cup, with a late German goal securing a 1-0 win over Argentina.

Towards the end of March, we posted predictions for the final ordering of teams in the World Cup, based on Ranker’s re-ranks and voting data. During the tournament, we posted an update, including comparisons with predictions made by FiveThirtyEight and Betfair. With the dust settled in Brazil (and the fireworks in Berlin shelved), it is time to do a final evaluation.

Our prediction was a little different from many others, in that we tried to predict the entire final ordering of all 32 teams. This is different from sites like Betfair, which provided an ordering in terms of the predicted probability each team would be the overall winner. In order to assess our order against the true final result, we used a standard statistical measure called partial tau. It is basically an error measure — 0 would be a perfect prediction, and the larger the value grows the worse the prediction — based on how many "swaps" of a predicted order need to be made to arrive at the true order. The "partial" part of partial tau allows for the fact that the final result of the tournament is not a strict ordering. While the final and the 3rd-place play-off determined the order of the first four teams (Germany, Argentina, the Netherlands, and Brazil), the remaining teams are effectively tied in groups. All of the teams eliminated in the quarter finals can be regarded as having finished in equal fifth place. All of the teams eliminated in the first game past the group stage finished equal sixth. And all of the 16 teams eliminated in group play finished equal last.
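
For the curious, here is a minimal sketch of the sort of calculation involved (our reading of partial tau as described above, offered as an illustration rather than the exact implementation): count the pairs of teams the prediction orders differently from the true result, skipping pairs the tournament leaves tied.

```python
from itertools import combinations

def partial_tau(predicted, true_tiers):
    # Count predicted-order pairs that disagree with the true result,
    # ignoring pairs that are tied (same tier) in the true result.
    pred_pos = {team: i for i, team in enumerate(predicted)}
    tier = {team: t for t, teams in enumerate(true_tiers) for team in teams}
    swaps = 0
    for a, b in combinations(predicted, 2):
        if tier[a] == tier[b]:
            continue  # tied in the true result: this pair carries no information
        if (tier[a] < tier[b]) != (pred_pos[a] < pred_pos[b]):
            swaps += 1
    return swaps

# Toy example with four teams and a tie for third.
print(partial_tau(["GER", "ARG", "NED", "BRA"],
                  [["GER"], ["ARG"], ["NED", "BRA"]]))  # 0: prediction consistent
```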

The model we used to make our predictions involved three sources of information. The first was the ranks and re-ranks provided by users. The second was the up and down votes provided by users. The third was the bracket structure of the tournament itself. As we emphasized in our original post, the initial group stage structure of the World Cup provides strong constraints on where teams can and cannot finish in the final order. Thus, we were interested to test how our model predictions depended on each source of information. This led to a total of 8 separate models:

  • Random: Using no information, but just placing all 32 teams in a random order.
  • Bracket: Using no information beyond the bracket structure, placing all the teams in an order that was a possible finish, but treating each game as a coin toss.
  • Rank: Using just the ranking data.
  • Vote: Using just the voting data.
  • Rank+Vote: Using the ranking and voting data, but not the bracket structure.
  • Bracket+Vote: Using the voting data and bracket structure, but not the ranking data.
  • Bracket+Rank: Using the ranking data and bracket structure, but not the voting data.
  • Rank+Vote+Bracket: Using all of the information, as per the predictions made in our March blog post.
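
To make the Bracket baseline concrete, here is a simplified sketch (the real draw rules about which group winners meet which runners-up, and the 3rd-place play-off, are ignored) of how a possible finishing order can be simulated by treating every game as a coin toss:

```python
import random

def coin_toss_bracket(groups):
    # Two random teams advance from each group, then each knockout round is a coin flip.
    # Returns tiers of teams, from the champion down to the group-stage exits.
    qualifiers = [team for group in groups for team in random.sample(group, 2)]
    group_stage_exits = [t for g in groups for t in g if t not in qualifiers]
    tiers = [group_stage_exits]
    alive = qualifiers
    while len(alive) > 1:
        random.shuffle(alive)
        tiers.append(alive[1::2])   # every second team loses its coin-flip game
        alive = alive[0::2]
    tiers.append(alive)             # the last team standing wins the tournament
    return tiers[::-1]

# Hypothetical team names: 8 groups of 4 teams each.
groups = [[f"Group{i}-Team{j}" for j in range(1, 5)] for i in range(1, 9)]
print(coin_toss_bracket(groups)[0])  # the simulated champion
```

Listing the resulting tiers from champion down to the group-stage exits (breaking ties arbitrarily) gives one possible final order, which can then be scored with partial tau just like the other predictions.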

We also considered the Betfair and FiveThirtyEight rankings, as well as the Ranker Ultimate List at the start of the tournament, as interesting (but maybe slightly unfair, given their different goals) comparisons. The partial taus for all these predictions, with those based on less information on the left, and those based on more information on the right, are shown in the graph below. Remember, lower is better.

The prediction we made using the votes, ranks, and bracket structure out-performed Betfair, FiveThirtyEight, and the Ranker Ultimate List. This is almost certainly because of the use of the bracket information. Interestingly, just using the ranking and bracket structure information, but not the votes, resulted in a slightly better prediction. It seems as if our modeling needs to improve how it benefits from using both ranking and voting data. The Rank+Vote prediction was worse than either source alone. It is also interesting to note that the Bracket information by itself is not useful — it performs almost as poorly as a random order — but it is powerful when combined with people’s opinions, as the improvement from Rank to Bracket+Rank and from Vote to Bracket+Vote show.


Predicting the Movie Box Office

The North American market for films totaled about US$11 billion in 2013, with over 1.3 billion admissions. The film industry is a big business that not even Ishtar, Jaws: The Revenge, or the 1989 Australian film "Houseboat Horror" manages to derail. (Check out Houseboat Horror next time you're low on self-esteem, and need to be reminded there are many people in the world much less talented than you.)

Given the importance of the film industry, we were interested in using Ranker data to make predictions about box office grosses for different movies. The Ranker list dealing with the Most Anticipated 2013 Films gave us some opinions — both in the form of re-ranked lists, and up and down votes — on which to base predictions. We used the same cognitive modeling approach previously applied to make Football (Soccer) World Cup predictions, trying to combine the wisdom of the Ranker crowd.

Our basic results are shown in the figure below. The movies people had ranked are listed from the heavily anticipated Iron Man 3, Star Trek: Into Darkness, and Thor: The Dark World down to less anticipated films like Simon Killing, The Conjuring, and Alan Partridge: Alpha Papa. The voting information is shown in the middle panel, with the light bar showing the number of up-votes and the dark bar showing the number of down-votes for each movie. The ranking information is shown in the right panel, with the size of the circles showing how often each movie was placed in each ranking position by a user.

This analysis gives us an overall crowd rank order of the movies, but that is still a step away from making direct predictions about the number of dollars a movie will gross. To bridge this gap, we consulted historical data. The Box Office Mojo site provides movie gross totals for the top 100 movies each year for about the last 20 years. There is a fairly clear relationship between the ranking of a movie in a year and the money it grosses. As the figure below shows, the few highest grossing movies return a lot more than the rest, following a "U-shaped" pattern that is often found in real-world statistics. If a movie is the 5th top grossing in a given year, for example, it grosses between about 100 and 300 million dollars. If it is the 50th highest grossing, it makes between about 10 and 80 million.

We used this historical relationship between ranking and dollars to map our predictions about ranking to predictions about dollars. The resulting predictions about the 2013 movies are shown below. These predictions are naturally uncertain, and so cover a range of possible values, for two reasons. We do not know exactly where the crowd believed each movie would finish in the ranking, and we only know a range of possible historical grossed dollars for each rank. Our predictions acknowledge both of those sources of uncertainty, and the blue bars in the figure below show the region in which we predicted it was 95% likely the final outcome would lie. To assess our predictions, we looked up the answers (again at Box Office Mojo), and overlaid them as red crosses.
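
To illustrate how the two sources of uncertainty can be combined, here is a minimal sketch with made-up placeholder numbers (not the real Box Office Mojo data, and not our actual model output): sample a rank from the crowd's rank uncertainty, sample a historical gross for that rank, and report central quantiles of the resulting draws.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder historical grosses (in $M) for each yearly rank, one entry per year.
# In the real analysis these came from Box Office Mojo's yearly top-100 tables.
historical = {rank: rng.uniform(500 / rank, 1500 / rank, size=20)
              for rank in range(1, 101)}

def gross_interval(rank_samples, level=0.95):
    # Propagate both uncertainties: for each sampled rank, sample one historical
    # gross at that rank, then take the central 95% interval of the draws.
    draws = [rng.choice(historical[int(r)]) for r in rank_samples]
    lo, hi = np.quantile(draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    return round(lo), round(hi)

# Hypothetical posterior samples of one movie's yearly rank from the crowd model.
rank_samples = rng.integers(3, 10, size=1000)
print(gross_interval(rank_samples))  # predicted gross range in millions of dollars
```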

Many of our predictions are good, for both high grossing (Iron Man 3, Star Trek) and more modest grossing (Percy Jackson, Hansel and Gretel) movies. Forecasting social behavior, though, is very difficult, and we missed a few high grossing movies (Gravity) and over-estimated some relative flops (47 Ronin, Kick Ass 2). One interesting finding came from contrasting an analysis based on ranking and voting data with similar analyses based on just ranking or just voting. Combining both sorts of data led to more accurate predictions than using either alone.

We’re repeating this analysis for 2014, waiting for user re-ranks and votes for the Most Anticipated Films of 2014. The X-men and Hunger Games franchises are currently favored, but we’d love to incorporate your opinion. Just don’t up-vote Houseboat Horror.


World Cup 2014 Predictions

An octopus called Paul was one of the media stars of the 2010 soccer world cup. Paul correctly predicted 11 out of 13 matches, including the final in which Spain defeated the Netherlands. The 2014 world cup is in Brazil and, in an attempt to avoid eating mussels painted with national flags, we made predictions by analyzing data from Ranker’s “Who Will Win The 2014 World Cup?” list.

Ranker lists provide two sources of information, and we used both to make our predictions. One source is the original ranking, and the re-ranks provided by other users. For the world cup list, some users were very thorough, ranking all (or nearly all) of the 32 teams who qualified for the world cup. Other users were more selective, listing just the teams they thought would finish in the top places. An interesting question for data analysis is how much weight should be given to different rankings, depending on how complete they are.

The second source of information on Ranker is the thumbs-up and thumbs-down votes other users make in response to the master list of rankings. Often Ranker lists have many more votes than they have re-ranks, and so the voting data are potentially very valuable. So, another interesting question for data analysis is how the voting information should be combined with the ranking information.
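
One simple ingredient for this (a sketch of the general idea, not the specific model we used) is to turn each team's raw vote counts into a smoothed score, so that teams with only a handful of votes are not ranked purely on a noisy proportion:

```python
def vote_score(up_votes, down_votes, prior_votes=10, prior_mean=0.5):
    # Smoothed up-vote proportion: acts like a Beta-prior posterior mean, pulling
    # teams with few votes toward a neutral 0.5 rather than trusting a noisy ratio.
    return (up_votes + prior_votes * prior_mean) / (up_votes + down_votes + prior_votes)

print(vote_score(450, 150))  # heavily voted team: close to its raw proportion of 0.75
print(vote_score(3, 0))      # lightly voted team: pulled well back toward 0.5
```

A score like this can then be combined with the strengths inferred from the ranking data; how best to do that combination is exactly the open question raised above.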

A special feature of making world cup predictions is that there is very useful information provided by the structure of the competition itself. The 32 teams have been drawn in 8 brackets with 4 teams each. Within a bracket, every team plays every other team once in initial group play. The top two teams from each bracket then advance to a series of elimination games. This system places strong constraints on possible outcomes, which a good prediction should follow. For example, although Group B contains Spain, the Netherlands, and Chile — all strong teams, currently ranked in the top 16 in the world according to FIFA rankings — only two can progress from group play and finish in the top 16 for the world cup.

We developed a model that accounts for all three of these sources of information. It uses the ranking and re-ranking data, the voting data, and the constraints coming from the brackets, to make an overall prediction. The results of this analysis are shown in the figure. The left panel shows the thumbs-up (to the right, lighter) and thumbs-down (to the left, darker) votes for each team. The middle panel summarizes the ranking data, with the area of the circles corresponding to how often each team was ranked in each position. The right hand panel shows the inferred “strength” of each team on which we based our predicted order.

Our overall prediction has host-nation Brazil winning. But the distribution of strengths shown in the model inferences panel suggests it is possible Germany, Argentina, or Spain could win. There is little to separate the remainder of the top 16, with any country from the Netherlands to Algeria capable of doing well in the finals. The impact of the drawn brackets on our predictions is clear, with a raft of strong countries — England, the USA, Uruguay, and Chile — predicted to miss the finals, because they have been drawn in difficult brackets.

– Michael Lee


Combining Preferences for Pizza Toppings to Predict Sales

The world's most expensive pizza, auctioned for $4,200 as a charity gift in 2007, was topped with edible gold, lobster marinated in cognac, champagne-soaked caviar, smoked salmon, and medallions of venison. While most of us prefer (or can only afford to prefer) more humble ingredients, our preferences are similarly diverse. Ranker has a Tastiest Pizza Toppings list that asks people to express their preferences. At the time of writing there are 29 re-ranks of this list, and a total of 64 different ingredients mentioned. Edible gold, by the way, is not one of them.

Equipped with this data about popular pizza toppings, we were interested in finding out if pizzerias were actually selling the toppings that people say they want. We also wanted to see if we could predict sales for individual ingredients by looking at one list that combined all of the responses about pizza topping preferences. This "Ultimate List" contains all of the toppings that were listed in individual lists (known as re-ranks) and is ordered in a way that reflects how many times each ingredient was mentioned and where it ranked on individual lists. Many of the re-ranks only list a few ingredients, so it is fitting to combine lists and rely on the "wisdom of the crowd" to get a more complete ranking of many possible ingredients.

As a real-world test of how people's preferences correspond to sales, we used Strombolini's New York Pizzeria's list of their top 10 selling ingredients. Pepperoni, cheese, sausage, and mushrooms topped the list, followed by pineapple, bacon, ham, shrimp, onion, and green peppers. All of these ingredients, save for shrimp, are included in the Ranker lists, so we considered the 9 overlapping ingredients and measured how close each user's preference list was to the pizzeria's sales list.

To compare lists, we used a standard statistical measure known as Kendall’s tau, which counts how many times we would need to swap one item for another (known as a pair-wise swap) before two lists are identical. A Kendall’s tau of zero means the two lists are exactly the same. The larger the Kendall’s tau value becomes, the further one list is from another.
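
Kendall's tau distance is easy to compute directly. Here is a minimal sketch, using the nine overlapping ingredients; the second list is the crowd-combined order reported below:

```python
from itertools import combinations

def kendall_tau_distance(list_a, list_b):
    # Count pair-wise swaps: the number of item pairs whose relative order differs
    # between the two lists. Both lists must contain exactly the same items.
    pos_a = {item: i for i, item in enumerate(list_a)}
    pos_b = {item: i for i, item in enumerate(list_b)}
    return sum((pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
               for x, y in combinations(list_a, 2))

sales = ["pepperoni", "cheese", "sausage", "mushrooms", "pineapple",
         "bacon", "ham", "onion", "green peppers"]          # Strombolini's, minus shrimp
crowd = ["cheese", "pepperoni", "bacon", "mushrooms", "sausage",
         "onion", "pineapple", "ham", "green peppers"]      # combined crowd list (below)
print(kendall_tau_distance(sales, crowd))  # 7 pair-wise swaps
```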

The figure shows, using little stick people, the Kendall's tau distances between users' lists and Strombolini's sales list. The green dot corresponds to a perfect tau of zero, and the red dot is the highest possible tau (if two lists are the exact opposite of each other). The dotted line is provided as a reference to show how likely each Kendall's tau value is by chance (that is, how often different Kendall's tau values occur for random lists of the ingredients). It is clear that there are large differences in how close individual users' lists came to the sales-based list. It is also clear that many users produced rankings that were quite different from the sales-based list.

Using our wisdom-of-the-crowd model, described below, the combined list came out to be: cheese, pepperoni, bacon, mushrooms, sausage, onion, pineapple, ham, and green peppers. This is a Kendall's tau of 7 pair-wise swaps from the Strombolini list, as shown in the figure by the blue dot representing the crowd. This means the combined list is closer to the sales list than all but one of the individual users' lists.

Our “wisdom of the crowd” analysis, combining all the users’ lists, used the same approach we previously applied to predicting celebrity deaths using Ranker data. It is a “Top-N” variant of the psychological approach developed in our work modeling decision-making and individual differences for ranking lists, and has the nice property of naturally incorporating individual differences.

This analysis is a beginning example of a couple of interesting ideas. One is that it is possible to extract relatively complete information from a set of incomplete opinions provided by many people. The other is that this combined knowledge can be compared to, and possibly be predictive of, real-world ground truths, like whether more pizzas have bacon or green peppers on them. It may never explain, however, why someone would waste champagne-soaked caviar as a pizza topping.


Recent Celebrity Deaths as Predicted by the Wisdom of Ranker Crowds

At the end of each year, there are usually media stories that compile lists of famous people who have passed away. These lists usually cause us to pause and reflect. Lists like Celebrity Death Pool 2013 on Ranker, however, give us an opportunity to make (macabre) predictions about recent celebrity deaths.

We were interested in whether “wisdom of the crowd” methods could be applied to aggregate the individual predictions. The wisdom of the crowd is about making more complete and more accurate predictions, and both completeness and accuracy seem relevant here. Being complete means building an aggregate list that identifies as many celebrity deaths as possible. Being accurate means, in a list where only some predictions are borne out, placing those who do die near the top of the list.

Our Ranker data involved the lists provided by a total of 27 users up until early in 2013. (Some of them were submitted after at least one celebrity, Patti Page, had passed away, but we thought they still provided useful predictions about other celebrities.) Some users predicted as many as 25 deaths, while others made just a single prediction. The median number of predictions was eight, and, in total, 99 celebrities were included in at least one list. At the time of posting, six of the 99 celebrities have passed away.

One way to measure how well a user made predictions is to work down their list, keeping track of every time they correctly predicted a recent celebrity death. This approach to scoring is shown for all 27 users in the graph below. Each blue circle corresponds to a user, and represents their final tally. The location of the circle on the x-axis corresponds to the total length of their list, and the location on the y-axis corresponds to the total number of correct predictions they made. The blue lines leading up to the circles track the progress for each user, working down their ranked lists. We can see that the best any user did was predict two out of the current six deaths, and most users currently have zero or one correct predictions in their list.
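
The scoring itself is straightforward; here is a minimal sketch (with hypothetical names, not the real lists) of how each user's curve in the graph is built:

```python
def hits_curve(predicted_list, actual_deaths):
    # Work down a user's ranked list, tracking the running number of correct predictions.
    correct, curve = 0, []
    for name in predicted_list:
        correct += name in actual_deaths
        curve.append(correct)
    return curve

# Hypothetical example: a four-name list, two of which turn out to be correct.
user_list = ["Celebrity A", "Celebrity B", "Celebrity C", "Celebrity D"]
deaths = {"Celebrity B", "Celebrity D"}
print(hits_curve(user_list, deaths))  # [0, 1, 1, 2]
```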

To try and find some wisdom in this crowd of users, we applied an approach to combining rank data developed as part of our general research into human decision-making, memory, and individual differences. The approach is based on classic models in psychology that go all the way back to the work of Thurstone in 1931, but has some modern tweaks. Our approach allows for individual differences, and naturally identifies expert users, upweighting their opinions in determining the aggregated crowd list. A paper describing the nuts and bolts of our modeling approach can be found here (but note we used a modified version for this problem, because users only provide their “Top-N” responses, and they get to choose N, which is the length of their list).

The net result of our modeling is a list of all 99 celebrities, in an order that combines the rankings provided by everybody. The top 5 in our aggregated list, for the morbidly curious, are Hugo Chavez (already a correct prediction), Fidel Castro, Zsa Zsa Gabor, Abe Vigoda, and Kirk Douglas. We can assess the wisdom of the crowd in the same way we did individuals, by working down the list, and keeping track of correct predictions. This assessment is shown by the green line in the graph below. Because the list includes all 99 celebrities, it will always find the six who have already recently passed away, and the names of those celebrities are shown at the top, in the place they occur in the aggregated list.

[Figure: Recent Celebrity Deaths and Predictions]

The interesting part of assessing the wisdom of the crowd is how early in the list it makes correct predictions about recent celebrity deaths. Thus, the more quickly the green line goes up as it moves to the right, the better the predictions of the crowd. From the graph, we can see that the crowd is currently performing quite well, and is certainly above the "chance" line, represented by the dotted diagonal. (This line corresponds to the average performance of a randomly-ordered list.)

We can also see that the crowd is performing as well as, or better than, all but one of the individual users. Their blue circles are shown again along with crowd performance. Circles that lie above and to the left of the green line indicate users outperforming the crowd, and there is only one of these. Interestingly, predicting celebrity deaths by using age, and starting with the oldest celebrity first, does not perform well. This seemingly sensible heuristic is assessed by the red line, but is outperformed by the crowd and many users.

Of course, it is only May, so the predictions made by users on Ranker still have time to be borne out. Our wisdom of the crowd predictions are locked in, and we will continue to update the assessment graphs.

– Michael Lee