In a previous post, we talked a bit about how Ranker collects users into like-minded “clusters” that allow for statistical analysis. This method is how we were able to look at “Game of Thrones” fans and figure out other shows, characters, games, and movies they might like.
Now, let’s dig a bit deeper into how this analysis works, and what sort of things we can learn from it. Essentially, breaking down the users who vote on our lists into clusters of people with similar taste lets us predict how fans of one thing will feel about some other thing.
We use the advertising term “Lift %” to represent this idea, but it basically boils down to an odds ratio. We’re measuring the projected increase in someone’s interest in one thing, based on their preference for something else. That means we aren’t limited to comparing fans of one show to another, or fans of one movie to another. Sure, we can tell what TV shows you’ll probably like if you like “Game of Thrones,” but we can also tell what people you’ll respond to positively, what websites you prefer, or who your favorite athlete is.
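To make the idea concrete, here is a minimal sketch of “Lift %” as a ratio of probabilities. The function name and the numbers are ours for illustration; this is not Ranker’s actual pipeline.

```python
# Minimal sketch of "Lift %" as a ratio of probabilities.
# Illustrative only -- names and numbers are invented, not Ranker's.

def lift_pct(p_b_given_a: float, p_b: float) -> float:
    """Projected interest in B among fans of A, expressed as a
    percentage of the baseline rate of liking B. 100% = no effect."""
    return p_b_given_a / p_b * 100.0

# If 50% of "Game of Thrones" fans like some other show that only
# 10% of all voters like, the lift is 500%:
print(lift_pct(0.50, 0.10))  # -> 500.0
```

A lift near 100% means fandom of A tells you nothing about B; the further above 100%, the stronger the association.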
For another example, let’s look at the 1998 comedy-drama “Rushmore.” Along with “Bottle Rocket,” this was really the film that made Wes Anderson a household name, and also contains one of Bill Murray’s most beloved and iconic performances.
“Rushmore” appears on a number of Ranker lists (it’s rated as one of the Best High School Films of All Time AND one of the Best Serious Films Starring Comedians). So we’ve managed to create a “cluster” of users who have voted “Rushmore” up on these lists, and who also seem to share some strong opinions about other topics in our system.
The first big trend we noticed among this like-minded cluster of “Rushmore” fans was that they tended to like other comedy films, too. Which you’d sort of expect. Except these fans tended to prefer classic comedies to more contemporary films. In fact, each of the following films had a greater “Lift %” among “Rushmore” fans than any film made in the 1990s, the decade “Rushmore” itself came out:
“Dr. Strangelove” (1964)
“The General” (1926)
“Modern Times” (1936)
“The Lady Eve” (1941)
“A Night at the Opera” (1935)
What’s more, all of these films had a Lift % of OVER 500%, which means someone who likes “Rushmore” is more than five times as likely to enjoy, say, “A Night at the Opera,” as someone who is ambivalent about “Rushmore.” That strikes us as a meaningful signal. (The numbers climb the further up the list you go: a “Rushmore” fan is roughly ten times as likely to enjoy “Dr. Strangelove” as a random person.)
From what we can tell, it works the other way, too. “Rushmore” is the most popular film overall among “Annie Hall” fans, and #4 overall among fans of Charlie Chaplin’s “City Lights.” Exactly WHY Wes Anderson’s coming-of-age dramedy scores so well among lovers of old movies is up for debate, but based on the numbers, the correlation itself really is not.
We’re continuing to develop and fine-tune our reports, of course. And it’s worth remembering that we get the BEST results on popular stuff that gets voted on all the time. It’s not too hard to tell what kind of music Jay-Z fans will like (though we’ll save that for another blog post), but we won’t do nearly as well for Captain Beefheart fans. Yet.
As part of our effort to promote Ranker’s unique dataset, I recently attended the Data 2.0 conference in San Francisco. “Data 2.0” is a relatively vague term, and as Ranker’s resident Data Scientist, I have a particular perspective on what constitutes the future of data. My PhD is in psychology, not computer science, so for me, data has always been a means rather than an end. One thing that became readily apparent in the first few talks I saw was that much of the conference’s emphasis was on handling ever-bigger datasets, with little consideration of what one could actually do with all that data. It goes without saying that larger samples allow for more statistical power than smaller ones, but as someone who has collected some of the larger samples of psychological data out there (via YourMorals.org and BeyondThePurchase.org), I have often found that what holds back my predictive power is not the volume of data, but rather the diversity of variables in my dataset. What I often need is not bigger data; it’s better data.
The same premise has informed much of our data decision-making at Ranker, where we emphasize the quality of our semantic, linked data over sheer quantity. Again, both quality and quantity are important, but my impression throughout the conference was that quantity was over-emphasized. I didn’t hear anyone talking about semantic data, which is one of the primary “Data 2.0” concepts that relates more to quality than quantity.
I tested this idea out on a few people at the conference, framed as “better data beats better algorithms,” and generally got positive feedback on the phrase. I was heartened when the moderator of a panel entitled “Data Science and Predicting the Future,” which included Alex Gray, Anthony Goldbloom, and Josh Wills, specifically asked which was more important: data, people, or algorithms. It wasn’t quite the question I had in mind, but it served as a jumping-off point for a great discussion. Josh Wills, who previously worked as a data scientist at Google, said the following (paraphrased, as I didn’t take exact notes):
“Google and Facebook both have really smart people. They use essentially the same algorithms. The reason why Google can target ads better than Facebook is purely a function of better data. There is more intent in the data related to the Google user, who is actively searching for something, and so there is more predictive power. If I had a choice between asking my team to work on better algorithms or joining the data we have with other data, I’d want my team joining my data with other data, as that is what will lead to the most value.”
Again, that is paraphrased. Some of the panelists disagreed a bit. Alex Gray works on algorithms, and so emphasized the importance of algorithms. To be fair, I work with relatively precise data, so I have the same kind of bias in emphasizing the importance of quality data. Daniel Tunkelang, Principal Data Scientist at LinkedIn, backed Josh up, saying that better data was indeed more important than bigger data, a point his colleague Monica Rogati had made recently at a conference. I was excited to hear that others had been having similar thoughts about the need for better, not bigger, data.
I ended up asking a question myself about the Netflix Prize, where the algorithms and collective intelligence applied to the problem (reducing prediction error) were maximized, yet the goal was a relatively modest 10% gain, won by a truly complex algorithm that Netflix itself found too costly to use relative to the gains. Surely better data (e.g., user opinions about different genres, or about more dimensions of each movie) would have led to much more than a 10% gain. There seemed to be general agreement, though Anthony Goldbloom rightly pointed out that you need the right people to help figure out how to get better data.
In the end, we all have our perspectives, based perhaps on what we work on, but I do think that the “better data” perspective is often lost in the rush toward larger datasets with more complex algorithms. For more on this perspective, here and here are two blog posts I found interesting on the subject. Daniel Tunkelang blogged about the same panel here.
– Ravi Iyer
Last week, we published an infographic with lots of “taste data” about “Game of Thrones” fans. Basically, we used all the data we’re collecting about people’s preferences on Ranker to make some educated guesses about what else people who like “Game of Thrones” might like. Why? Mostly because we can, but also because we figured people might find it interesting.
After we showed the infographic to the world, a lot of people wrote to us asking how we actually arrived at these conclusions. (And yes, some of them just wanted to be sure we weren’t just making the whole thing up.)
It all starts with votes. Thousands of people have voted on Ranker lists on which “Game of Thrones” appears. If they’re on a list that’s “positive” (for example, “Best Premium Cable Shows”) and they vote “Game of Thrones” up, we know they like the show. If we notice they also vote for “Game of Thrones” on other lists (“Most Loving Caresses of Dragon Eggs in TV History,” for example), we know they REALLY like the show.
Then we look at all the other Ranker lists where that person has voted, and get a sense for what else they like, and what else they hate.
But we don’t stop there. The next step is to arrange people into clusters based on their specific preferences. If 80% of the people who vote on Ranker lists like “The Simpsons,” and 80% of “Game of Thrones” fans like “The Simpsons,” that’s not very meaningful at all. But if only 20% of people who vote like “The Simpsons,” and 80% of “Game of Thrones” fans like “The Simpsons,” then we’ve learned something statistically significant about these people.
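Working through that paragraph’s numbers, here is a toy calculation of lift from raw vote counts. The counts are invented to match the 20%/80% example; this is a sketch of the reasoning, not Ranker’s actual code.

```python
# Toy lift calculation from raw vote counts, using the 20%/80%
# "Simpsons" example. All counts are invented for illustration.

def lift_pct(fans_liking, fans_total, all_liking, all_total):
    """Lift % = (share of the fan cluster that likes the item)
    divided by (share of all voters that like it), times 100."""
    return (fans_liking / fans_total) / (all_liking / all_total) * 100.0

# 80 of 100 "Game of Thrones" fans like "The Simpsons", versus
# 200 of 1,000 voters overall (20%): the lift is 400%.
print(lift_pct(80, 100, 200, 1000))  # -> 400.0

# If 800 of 1,000 voters overall also liked it (80%), the lift
# would be a meaningless 100%: fandom tells us nothing new.
print(lift_pct(80, 100, 800, 1000))  # -> 100.0
```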
But what about fans of “Simpsons” parodies of “Game of Thrones,” you might ask… if you were purposefully trying to confuse me.
These “clusters” of people with aligned tastes teach us basically everything we need to know to make educated guesses about what a given Ranker user will like. In our next post, we’ll explore exactly how we use these “taste clusters” to draw conclusions.
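To make the clustering idea concrete, here is a toy sketch that groups voters by overlap in what they upvote, using Jaccard similarity and a greedy assignment. Everything here, from the similarity measure to the threshold, is an assumption for illustration; Ranker’s real method isn’t described in this post.

```python
# Toy "taste cluster" sketch: group users whose sets of upvoted items
# overlap enough. Purely illustrative; not Ranker's actual method.

def jaccard(a: set, b: set) -> float:
    """Similarity = shared upvotes / all distinct upvotes between two users."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_users(upvotes: dict, threshold: float = 0.5) -> list:
    """Greedily add each user to the first cluster whose seed member
    they resemble closely enough; otherwise start a new cluster."""
    clusters = []  # each cluster is a list of user ids; the first is the seed
    for user, items in upvotes.items():
        for members in clusters:
            if jaccard(items, upvotes[members[0]]) >= threshold:
                members.append(user)
                break
        else:
            clusters.append([user])
    return clusters

votes = {
    "alice": {"Game of Thrones", "Dr. Strangelove", "Rushmore"},
    "bob":   {"Game of Thrones", "Rushmore", "Modern Times"},
    "carol": {"The Simpsons", "Mad Men"},
}
print(cluster_users(votes))  # alice and bob land together; carol is alone
```

A production system would use something more robust (and would weight downvotes too), but the intuition is the same: users who upvote the same things get pooled, and the pool’s other preferences become predictions.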
At Ranker HQ, we’re constantly monitoring the topics that get ranked a lot. It’s pretty easy to tell when a certain book or movie or musical artist is getting popular or hitting critical mass just based on how frequently the name is mentioned on lists. This is especially true of TV, where the start of a new season for a popular show means an eruption of lists mentioning that show. (Don’t believe me? Check out all the “Mad Men” lists streaming in!)
We weren’t necessarily surprised that HBO viewers were losing their heads for “Game of Thrones.” (See what I did there?) It’s back for Season 2, and obviously Rankers are going to have fun making tons of lists about the sword-and-sorcery-and-skin fantasy series based on George R. R. Martin’s novels. Instead, we were intrigued because the data reveals Game of Thrones fans are just as… idiosyncratic as the show they love. (Yes, idiosyncratic is a nice way of putting it. But hey, we’re not here to INSULT our users.)
And we say this not just because they watch a show in which incest happens about as often as other series take commercial breaks. It’s also because they overwhelmingly love villainous characters and anti-heroes, and they prefer a lot of lesser-known shows that never found an audience.
Read on for more insight into the weird, even twisted world of “Game of Thrones” fans (or Throne-heads, as we’ve dubbed them).
Like the graphic? Feel free to repost it anywhere you like. Spread the word throughout the Seven Kingdoms!