A Ranker Opinion Graph of the Domains of the World of Comedy

One unique aspect of Ranker data is that people rank a wide variety of lists, allowing us to look at connections beyond the scope of any individual topic.  We compiled data from all of the lists on Ranker containing the word “funny” to get a bigger picture of the interconnected world of comedy.  Using Gephi layout algorithms, we created an Opinion Graph that categorizes comedy domains and identifies points of intersection between them.


In the following graphs, colors indicate the comedic categories that emerged from a cluster analysis; connecting lines indicate correlations between nodes, with thicker lines marking stronger relationships.  Circles (nodes) that are closest together are most similar.  The classification algorithm produced seven comedy domains:


American TV Shows and Characters: 26% of comedy, central nodes = It’s Always Sunny in Philadelphia, ALF, The Daily Show, Chappelle’s Show, and Friends.

Contemporary Comedians on American Television: 25% of nodes, includes Dave Chappelle, Eddie Izzard, Ricky Gervais, Billy Connolly, and Bill Hicks.


Classic Comedians: 15% of comedy, central nodes = John Cleese, Eric Idle, Michael Palin, Charlie Chaplin, and George Carlin.

Classic TV Shows and Characters: 14% of comedy, central nodes = The Muppet Show, Monty Python’s Flying Circus, In Living Color, WKRP in Cincinnati, and The Carol Burnett Show.

British Comedians: 9% of comedy, central nodes = Rowan Atkinson, Jennifer Saunders, Stephen Fry, Hugh Laurie, and Dawn French.

Animated TV Shows and Characters: 9% of comedy, central nodes = South Park, Family Guy, Futurama, The Simpsons, and Moe Szyslak.

Classic Comedy Movies: 1.5% of comedy, central nodes = National Lampoon’s Christmas Vacation, Ghostbusters, Airplane!, Vacation, and Caddyshack.


Clusters that are the most similar (most overlap/closest together):

  • Classic TV Shows and Contemporary TV Shows
  • British Comedians and Classic TV shows
  • British Comedians and Contemporary Comedians on American Television
  • Animated TV Shows and Contemporary TV Shows

Clusters that are the most distinct (least overlap/furthest apart):

  • Classic Comedy Movies do not overlap with any other comedy domains
  • Animated TV Shows and British Comedians
  • Contemporary Comedians on American Television and Classic TV Shows
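The clustering behind these groupings can be illustrated in miniature. The sketch below is not the Gephi pipeline used for the actual Opinion Graph; it builds co-occurrence counts from a few invented “funny” lists and groups items with a crude connected-components pass (all list names and items are placeholders):

```python
from collections import defaultdict
from itertools import combinations

# Invented "funny" lists: each maps a list name to the items it contains.
lists = {
    "funniest_shows": ["Friends", "ALF", "The Daily Show"],
    "funniest_comedians": ["Dave Chappelle", "Ricky Gervais"],
    "funny_tv": ["Friends", "The Daily Show", "South Park"],
    "funny_cartoons": ["South Park", "Futurama"],
}

# Count how often each pair of items appears on the same list.
cooccur = defaultdict(int)
for items in lists.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

# Group items that co-occur at least once via union-find: a crude
# stand-in for the modularity-based clustering Gephi performs.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for (a, b), count in cooccur.items():
    if count >= 1:
        union(a, b)

# Collect the resulting clusters, keyed by each item's root.
clusters = defaultdict(set)
for items in lists.values():
    for item in items:
        clusters[find(item)].add(item)
```

On this toy data the TV shows all chain together through shared lists while the two comedians form their own cluster; a real modularity algorithm would split dense groups more finely, but the mechanism of items grouping through shared list membership is the same.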


Take a look at our follow-up post on the individuals who connect the comedic universe.

– Kate Johnson


Rankings are the Future of Mobile Search

Did you know that Ranker is one of the top 100 web destinations for mobile per Quantcast, ahead of household names like The Onion and People magazine?  We are ranked #520 in the non-mobile world.  Why do we do better with mobile users than with people using desktop computers?  I’ve made this argument for a while, but I’m hardly an authority, so I was heartened to see Google making a similar argument.

This embrace of mobile computing impacts search behavior in a number of important ways.

First, it makes the process of refining search queries much more tiresome. …While refining queries is never a great user experience, on a mobile device (and particularly on a mobile phone) it is especially onerous.  This has provided the search engines with a compelling incentive to ensure that the right search results are delivered to users on the first go, freeing them of laborious refinements.

Second, the process of navigating to web pages (is) a royal pain on a hand-held mobile device.

This situation provides a compelling incentive for the search engines to circumvent additional web page visits altogether, and instead present answers to queries – especially straightforward informational queries – directly in the search results.  While many in the search marketing field have suggested that the search engines have increasingly introduced direct answers in the search results to rob publishers of clicks, there’s more than a trivial case to be made that this is in the best interest of mobile users.  Is it really a good thing to compel an iPhone user to browse to a web page – which may or may not be optimized for mobile – and wait for it to load in order to learn the height of the Eiffel Tower?

As a result, if you ask your mobile phone for the height of a famous building (Taipei 101 in the below case), it doesn’t direct you to a web page.  Instead it answers the question itself.

That’s great for a question that has a single answer, but an increasing number of searches are not for objective facts with a single answer, but rather for subjective opinions where a ranked list is the best result.  Consider the below chart showing the increase in searches for the term “best”.  A similar pattern can be found for most any adjective.

So if consumers are increasingly doing searches on mobile phones, requiring a concise list of potential answers to questions with more than one answer, they naturally are going to end up at sites that have ranked lists…like Ranker. As such, a lot of Ranker’s future growth is likely to parallel the growth of mobile and the growth of searches for opinion-based questions.

– Ravi Iyer


Recent Celebrity Deaths as Predicted by the Wisdom of Ranker Crowds

At the end of each year, there are usually media stories that compile lists of famous people who have passed away. These lists usually cause us to pause and reflect. Lists like Celebrity Death Pool 2013 on Ranker, however, give us an opportunity to make (macabre) predictions about recent celebrity deaths.

We were interested in whether “wisdom of the crowd” methods could be applied to aggregate the individual predictions. The wisdom of the crowd is about making more complete and more accurate predictions, and both completeness and accuracy seem relevant here. Being complete means building an aggregate list that identifies as many celebrity deaths as possible. Being accurate means, in a list where only some predictions are borne out, placing those who do die near the top of the list.

Our Ranker data involved the lists provided by a total of 27 users up until early in 2013. (Some of them were completed after at least one celebrity, Patti Page, had passed away, but we thought they still provided useful predictions about the other celebrities.) Some users predicted as many as 25 deaths, while others made a single prediction. The median number of predictions was eight, and, in total, 99 celebrities were included in at least one list. At the time of posting, six of the 99 celebrities have passed away.

One way to measure how well a user made predictions is to work down their list, keeping track of every time they correctly predicted a recent celebrity death. This approach to scoring is shown for all 27 users in the graph below. Each blue circle corresponds to a user, and represents their final tally. The location of the circle on the x-axis corresponds to the total length of their list, and the location on the y-axis corresponds to the total number of correct predictions they made. The blue lines leading up to the circles track the progress for each user, working down their ranked lists. We can see that the best any user did was predict two of the current six deaths, and most users currently have zero or one correct predictions in their list.
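The scoring described above (working down a ranked list and tallying correct predictions at each rank) can be sketched as follows; the user lists and the set of deceased names here are invented placeholders, not the actual Ranker data:

```python
# Placeholder ranked death-pool lists and a placeholder set of names
# that have "come true" -- not real people or real Ranker data.
deceased = {"A", "C", "F"}
user_lists = {
    "user1": ["A", "B", "C", "D"],
    "user2": ["E", "F"],
    "user3": ["B", "D", "E"],
}

def cumulative_hits(ranked, hits):
    """Work down a ranked list, tallying correct predictions at each rank."""
    curve, total = [], 0
    for name in ranked:
        total += name in hits
        curve.append(total)
    return curve

# One curve per user: the y-values of the blue lines in the graph.
curves = {user: cumulative_hits(ranked, deceased)
          for user, ranked in user_lists.items()}
```

Each curve ends at the user’s final tally (the blue circle), with the list length giving the x-position.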

To try to find some wisdom in this crowd of users, we applied an approach to combining rank data developed as part of our general research into human decision-making, memory, and individual differences. The approach is based on classic models in psychology that go all the way back to the work of Thurstone in 1931, but has some modern tweaks. Our approach allows for individual differences, and naturally identifies expert users, upweighting their opinions in determining the aggregated crowd list. A paper describing the nuts and bolts of our modeling approach can be found here (but note we used a modified version for this problem, because users only provide their “Top-N” responses, and they get to choose N, which is the length of their list).

The net result of our modeling is a list of all 99 celebrities, in an order that combines the rankings provided by everybody. The top 5 in our aggregated list, for the morbidly curious, are Hugo Chavez (already a correct prediction), Fidel Castro, Zsa Zsa Gabor, Abe Vigoda, and Kirk Douglas. We can assess the wisdom of the crowd in the same way we did individuals, by working down the list, and keeping track of correct predictions. This assessment is shown by the green line in the graph below. Because the list includes all 99 celebrities, it will always find the six who have already recently passed away, and the names of those celebrities are shown at the top, in the place they occur in the aggregated list.
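Our actual model is Thurstonian, with individual differences and expert upweighting, but the basic idea of combining variable-length top-N lists into one crowd ranking can be illustrated with a much simpler Borda-style aggregator (all names and lists below are placeholders):

```python
from collections import defaultdict

# Placeholder top-N prediction lists of varying length.
rankings = [
    ["Chavez", "Castro", "Gabor"],
    ["Castro", "Chavez"],
    ["Gabor", "Douglas", "Chavez", "Castro"],
]

# Borda-style scores: rank 1 in a list of length n earns n points,
# rank 2 earns n-1, and so on. This is far cruder than the Thurstonian
# model in the post (no individual differences, no expert weighting),
# but it shows how variable-length lists can be pooled.
scores = defaultdict(float)
for ranked in rankings:
    n = len(ranked)
    for pos, name in enumerate(ranked):
        scores[name] += n - pos

# The aggregated crowd list, best-scoring name first.
crowd_list = sorted(scores, key=scores.get, reverse=True)
```

A name ranked highly by several users accumulates points quickly, so it rises to the top of the crowd list even though no single user’s ordering dictates the result.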

Recent Celebrity Deaths and Predictions

The interesting part of assessing the wisdom of the crowd is how early in the list it makes correct predictions about recent celebrity deaths. Thus, the more quickly the green line rises as it moves to the right, the better the predictions of the crowd. From the graph, we can see that the crowd is currently performing quite well, and is certainly above the “chance” line, represented by the dotted diagonal. (This line corresponds to the average performance of a randomly-ordered list.)

We can also see that the crowd is performing as well as, or better than, all but one of the individual users. Their blue circles are shown again along with crowd performance. Circles that lie above and to the left of the green line indicate users outperforming the crowd, and there is only one of these. Interestingly, predicting celebrity deaths by using age, and starting with the oldest celebrity first, does not perform well. This seemingly sensible heuristic is assessed by the red line, but is outperformed by the crowd and many users.

Of course, it is only May, so the predictions made by users on Ranker still have time to be borne out. Our wisdom of the crowd predictions are locked in, and we will continue to update the assessment graphs.

– Michael Lee

Predicting Box Office Success a Year in Advance from Ranker Data

A number of data scientists have attempted to predict movie box office success from various datasets.  For example, researchers at HP labs were able to use tweets around the release date plus the number of theaters that a movie was released in to predict 97.3% of movie box office revenue in the first weekend.  The Hollywood Stock Exchange, which lets participants bet on box office revenues and infers a prediction, predicts 96.5% of box office revenue in the opening weekend.  Wikipedia activity predicts 77% of box office revenue according to a collaboration of European researchers.  Ranker runs lists of anticipated movies each year, often for more than a year in advance, and so the question I wanted to analyze in our data was how predictive Ranker data is of box office success.

However, since the above researchers have already shown that online activity at the time of the opening weekend predicts box office success during that weekend, I wanted to build upon that work and see if Ranker data could predict box office receipts well in advance of opening weekend.  Below is a simple scatterplot of results, showing that Ranker data from the previous year predicts 82% of variance in movie box office revenue for movies released in the next year.

Predicting Box Office Success from Ranker Data

The above graph uses votes cast in 2011 to predict revenues from our Most Anticipated 2012 Films list.  While our data is not as predictive as Twitter data collected leading up to opening weekend, the remarkable thing about this result is that most votes (8,200 votes from 1,146 voters) were cast 7-13 months before the actual release date.  I look forward to doing the same analysis on our Most Anticipated 2013 Films list at the end of this year.
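For readers curious how a variance-explained figure like this is computed, here is a minimal ordinary-least-squares sketch. The vote counts and revenues below are invented for illustration; the real analysis used Ranker's actual 2011 votes and 2012 box office figures:

```python
# Invented vote totals (predictor) and opening revenues in $ millions
# (outcome) -- not Ranker's actual data.
votes = [120, 340, 560, 80, 900, 410]
revenue = [35.0, 80.0, 150.0, 20.0, 230.0, 95.0]

n = len(votes)
mean_x = sum(votes) / n
mean_y = sum(revenue) / n

# Ordinary least-squares fit for a single predictor.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(votes, revenue))
         / sum((x - mean_x) ** 2 for x in votes))
intercept = mean_y - slope * mean_x

# R^2: the share of revenue variance explained by the vote counts,
# i.e. the "82% of variance" style of figure quoted in the post.
ss_res = sum((y - (slope * x + intercept)) ** 2
             for x, y in zip(votes, revenue))
ss_tot = sum((y - mean_y) ** 2 for y in revenue)
r_squared = 1 - ss_res / ss_tot
```

With real data the fit would be noisier; the point is only that a single early-signal predictor can be scored by how much outcome variance it explains.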

– Ravi Iyer


Crowdsourcing Objective Answers to Subjective Questions – Nerd Nite Los Angeles

A lot of the questions on Ranker are subjective, but that doesn’t mean that we cannot use data to bring some objectivity to this analysis.  In the same way that Yelp crowdsources answers to subjective questions about restaurants and TripAdvisor crowdsources answers to subjective questions about hotels, Ranker crowdsources answers to a broader assortment of relatively subjective questions such as the Tastiest Pizza Toppings, the Best Cruise Destination, and the Worst Way to Die.

A few weeks ago, as part of “Nerd Nite,” I gave an informal talk at a Los Angeles bar on the Wisdom of Crowds approach that Ranker takes to crowdsource such answers.  The gist of it is that one can crowdsource objective answers to subjective questions by asking diverse groups of people questions in diverse ways.  Greater diversity, when aggregated effectively, minimizes the error inherent in answering any subjective question.  For example, we know intuitively that relying on only the young, or only the elderly, or only people in cities, or only people in rural areas gives us biased answers to subjective questions.  But when all of these diverse groups agree on a subjective question, there is reason to believe that there is an objective truth that they are responding to.  Below is the video of that talk.
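A toy example makes the diversity point concrete: two groups with opposite biases each misjudge an item's true quality, but their pooled average can cancel the biases out. All numbers below are invented:

```python
# Invented ratings of a single item from two groups with opposite
# biases -- a cartoon of the "only the young" vs "only the elderly"
# example, not real Ranker data.
true_quality = 7.0
young_ratings = [8.5, 9.0, 8.0, 9.5]    # biased high
elderly_ratings = [5.0, 5.5, 4.5, 6.0]  # biased low

def mean(xs):
    return sum(xs) / len(xs)

# Each group alone misses the truth; the pooled average is closer.
young_error = abs(mean(young_ratings) - true_quality)
elderly_error = abs(mean(elderly_ratings) - true_quality)
combined_error = abs(mean(young_ratings + elderly_ratings) - true_quality)
```

Here the biases are symmetric, so they cancel exactly; in practice cancellation is partial, which is why aggregating over many diverse groups (and question formats) keeps shrinking the error.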

If you want to see a more formal version of this talk, I’ll be speaking at greater length on Ranker’s methodologies at the Big Data Innovation Summit in San Francisco this Friday.

– Ravi Iyer