by   Ranker
Staff
in Data Science, Market Research, Pop Culture, prediction

Predicting Box Office Success a Year in Advance from Ranker Data

A number of data scientists have attempted to predict movie box office success from various datasets. For example, researchers at HP Labs were able to use tweets around the release date plus the number of theaters that a movie was released in to predict 97.3% of movie box office revenue in the first weekend. The Hollywood Stock Exchange, which lets participants bet on box office revenues and infers a prediction from those bets, predicts 96.5% of box office revenue in the opening weekend. Wikipedia activity predicts 77% of box office revenue according to a collaboration of European researchers. Ranker runs lists of anticipated movies each year, often more than a year in advance, so the question I wanted to answer with our data was how well Ranker data predicts box office success.

However, since the above researchers have already shown that online activity at the time of the opening weekend predicts box office success during that weekend, I wanted to build upon that work and see if Ranker data could predict box office receipts well in advance of opening weekend.  Below is a simple scatterplot of results, showing that Ranker data from the previous year predicts 82% of variance in movie box office revenue for movies released in the next year.

Predicting Box Office Success from Ranker Data

The above graph uses votes cast in 2011 on our Most Anticipated 2012 Films list to predict the revenues of the movies on that list. While our data is not as predictive as Twitter data collected leading up to opening weekend, the remarkable thing about this result is that most votes (8,200 votes from 1,146 voters) were cast 7-13 months before the actual release date. I look forward to doing the same analysis on our Most Anticipated 2013 Films list at the end of this year.
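For readers who want to try this kind of analysis on their own data, here is a minimal sketch of how a "percent of variance predicted" figure can be computed as the squared Pearson correlation between two columns. The function name `r_squared` and the numbers in the example are made up for illustration; they are not Ranker's data or Ranker's actual tooling.

```python
from math import sqrt


def r_squared(xs, ys):
    """Return the squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one column is constant")
    r = cov / sqrt(var_x * var_y)
    return r * r


# Hypothetical example: net votes per movie vs. box office revenue ($ millions).
net_votes = [120, 45, 300, 80, 10]
revenue = [210.0, 60.0, 410.0, 150.0, 25.0]
print(round(r_squared(net_votes, revenue), 3))
```

A value near 1 means the votes line up almost perfectly with revenue; a value near 0 means they tell you very little.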

– Ravi Iyer

by   Ranker
Staff
in Data Science

Crowdsourcing Objective Answers to Subjective Questions – Nerd Nite Los Angeles

A lot of the questions on Ranker are subjective, but that doesn’t mean that we cannot use data to bring some objectivity to this analysis.  In the same way that Yelp crowdsources answers to subjective questions about restaurants and TripAdvisor crowdsources answers to subjective questions about hotels, Ranker crowdsources answers to a broader assortment of relatively subjective questions such as the Tastiest Pizza Toppings, the Best Cruise Destination, and the Worst Way to Die.

A few weeks ago, I did an informal talk on the Wisdom of Crowds approach that Ranker takes to crowdsource such answers at a Los Angeles bar as part of “Nerd Nite”.  The gist of it is that one can crowdsource objective answers to subjective questions by asking diverse groups of people questions in diverse ways.  Greater diversity, when aggregated effectively, enables the error inherent in answering any subjective question to be minimized.  For example, we know intuitively that relying on only the young or only the elderly or only people in cities or only people who live in rural areas gives us biased answers to subjective questions.  But when all of these diverse groups agree on a subjective question, there is reason to believe that there is an objective truth that they are responding to.  Below is the video of that talk.
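As a rough illustration of the idea (not the method from the talk), the sketch below averages ratings within each group first and then across groups, so that one large group cannot dominate the answer. It also reports how far apart the groups are, as a crude agreement check. The group names, the ratings, and the function name `aggregate_by_group` are all hypothetical.

```python
from statistics import mean


def aggregate_by_group(ratings_by_group):
    """Average each group's ratings, then average the group means.

    Returns (overall_score, spread, group_means), where spread is the gap
    between the highest and lowest group mean (small spread = groups agree).
    """
    group_means = {g: mean(r) for g, r in ratings_by_group.items() if r}
    if not group_means:
        raise ValueError("no ratings to aggregate")
    means = list(group_means.values())
    return mean(means), max(means) - min(means), group_means


# Hypothetical 1-10 ratings of one pizza topping from four kinds of voters.
ratings = {
    "young": [9, 8, 9, 10],
    "elderly": [4, 5],
    "urban": [7, 8, 6],
    "rural": [6, 6],
}
overall, spread, by_group = aggregate_by_group(ratings)
print(overall, spread, by_group)
```

Weighting every group equally is just one design choice; a real system might weight groups by size or reliability instead.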

If you want to see a more formal version of this talk, I’ll be speaking at greater length on Ranker’s methodologies at the Big Data Innovation Summit in San Francisco this Friday.

– Ravi Iyer

by   Ranker
Staff
in New Features

New Features on Ranker

As usual, we are hard at work here in the Internet factory trying to make our site better and better. Pretty soon, Ranker.com is going to be able to walk your dog and drop your kids off at soccer practice. Those jet-pack days are not here yet, but we do have some other fun, new stuff for you to gaze upon. And maybe also use.

Do you sometimes see a list and think, “I want to rank that, but who has the TIME?” Not to fear, good people. We’ve actually reduced the amount of time it takes to register your opinion in list-form. With science!

Say you go to a list, any list. You start voting, all casual-like… and suddenly this tab pops out of the right side of your browser. Every time you vote something up, the number on the counter goes up, too! If you click on the green button that says “Your Votes,” a starter-list will come sliding in from the right side. You can literally re-order items, delete items, add items, write copy, and even add images and/or videos RIGHT THERE. And then publish your re-rank. RIGHT THERE. I mean, think of the time you just saved. Now you finally have time to cut your toenails. You’re welcome.

This isn’t technically new… but it’s something that looks new, so we’re counting it. The trigger buttons that switch how you view lists have moved! Not that exciting, I guess.

But in case you wondered why your ability to change ‘blog view’ to ‘info view’ or ‘info view’ to ‘slideshow’ wasn’t where it was supposed to be, just move your eyeballs to the left side of the screen and look right at the top of the list. There they are! Everything is still as it should be, only different.

We’ve gone and moved into the fine year of 2010 by finally making a functioning mobile version of our site. You can browse our lists more easily now, vote more easily and now you can actually re-rank a list… on your PHONE!

Unfortunately, we don’t yet support creating your own, brand-new lists on mobile, and there are some other user features of the site that we still won’t support on mobile… BUT it should be a lot easier to find, read, and vote on all your favorite lists.

Have you ever just wanted to copy and paste a list you see on Ranker? Maybe to a blog post, email, or just to Facebook? Well, now you can!

If you go into the ‘more options’ dropdown at the top of every list and select “Paste to Clipboard,” a popup with the list as it appears in text format will appear, allowing you to copy it and paste it wherever you want, free of formatting. If you want to paste it to Facebook, make sure to select the checkbox that says, “Check here to copy for Facebook.”

Enjoy. Check back soon for more fun new features!

by   Ranker
Staff
in interest graph, Opinion Graph

A Battle of Taste Graphs: Baltimore Ravens Fans vs. San Francisco 49ers Fans

Super Bowl Sunday is a day when two cities and two fan groups are competing for bragging rights, even as the Baltimore Ravens and San Francisco 49ers themselves do the playing.  You might be interested in understanding these teams’ fans better through an exploration of their fans’ taste graphs, from a recent post on our data blog, which examines correlations between votes on lists like the Top NFL Teams of 2012 and non-sports lists like our list of delicious vegetables (yum!).

For one, there is absolutely zero consensus where music is concerned. 49ers fans listen to an eclectic mixture of genres: up-and-coming rappers like Kendrick Lamar sit right next to INXS and 90s Brit-poppers Pulp. Yet where the Ravens are concerned, classic rock is still king: Hendrix, CCR, and Neil Young are an undisputed top three. The 49ers also have the Ravens utterly beat in terms of culinary taste. Monterey Jack and Cosmos are fairly clear favorites among their fans, while Baltimore’s fans stick to staples: coffee, bell peppers, and ham are the only food items that correlated enough to even be tracked.

 A Snapshot from Ranker’s Data Mining Tool

TV tastes also varied between the two teams: Ravens fans stuck to almost exclusively comedic fare (Pinky and The Brain, Rugrats, Mythbusters and Louie correlated strongly), while 49ers fans stuck to more structured, dramatic shows, such as The Walking Dead and Dexter.

Read the full post here over on our data blog.

– Ravi Iyer

by   Ranker
Staff
in Data Science, Pop Culture

On Touchdowns and Tastes: This Sunday’s Conflict Of Fan-Interests

 

helmet images courtesy of http://nfl-franchises.findthedata.org

 

The greatest moment of fear in my childhood came on the eve of my first ever family trip to Manhattan. It wasn’t the flight or the crowds or the crime rate that had seven-year-old me scared. I was terrified because I had been brought up to believe that any and all Yankees fans were villainous scum, lowest of the low, the nadir of human development. Visiting the city and actually interacting with people from New York had an effect on me akin to realizing that there wasn’t a Santa Claus: I was faced with the reality that not all Yankees fans are evil. It just wasn’t mathematically feasible. You can’t run a city of 8 million people without having some people who don’t suck. This, of course, is a key part of the unspoken acknowledgement all (nonviolent & sane) sports fans have; that sports fandom is a mostly regional thing, and that there’s no point in thinking those who back another team are truly inferior, or even all that different from you.

However, if you told that to anyone from Baltimore or San Francisco right now, they’d likely try to argue for the ideological superiority of their respective squad. With the Super Bowl literally on the horizon, this is not a time where people deal in shades of gray. But are there any real, quantifiable differences between the fans of the Ravens and the 49ers? Anything else on the line in this contest?

Weirdly enough, yes. The Ranker correlation data for supporters of the Ravens and the 49ers is strikingly dissimilar. You’d think that there would be some commonalities between the likes and dislikes of the two fan bases, even just those that stem from the demographic features of “football fans”. But no, the pop culture tastes of the two fan bases have a remarkably minuscule amount of overlap. Let us examine some of the correlations based on user behavior at Ranker.com.

For one, there is absolutely zero consensus where music is concerned. 49ers fans listen to an eclectic mixture of genres: up-and-coming rappers like Kendrick Lamar sit right next to INXS and 90s Brit-poppers Pulp. Yet where the Ravens are concerned, classic rock is still king: Hendrix, CCR, and Neil Young are an undisputed top three. The 49ers also have the Ravens utterly beat in terms of culinary taste. Monterey Jack and Cosmos are fairly clear favorites among their fans, while Baltimore’s fans stick to staples: coffee, bell peppers, and ham are the only food items that correlated enough to even be tracked.

 A Snapshot from Ranker’s Data Mining Tool

TV tastes also varied between the two teams: Ravens fans stuck to almost exclusively comedic fare (Pinky and The Brain, Rugrats, Mythbusters and Louie correlated strongly), while 49ers fans stuck to more structured, dramatic shows, such as The Walking Dead and Dexter.

Some of these differences can be explained away geographically (In-N-Out Burger, a prominent correlated item for the 49ers, isn’t going to appeal to anyone on the East Coast since they just don’t have it), but when the data is stacked up, there is a very noticeable dissimilarity in interests between the two fan bases. One could, of course, use this data to try to advocate for the superiority of one team over the other (I won’t even get into the far more extensive video game tastes of 49ers fans). However, the far more intriguing question at hand lies in what we all really watch the Super Bowl for: the ads.

If, as the data suggests, there is such a difference between the interests of the average 49ers fan and the average Ravens fan, how will the ads attempt to bridge this gap? Since I couldn’t give a damn about the score (neither team is the Pats, who cares), I’ll instead be keeping track of which team’s fans are catered to by the adverts. On Sunday, one team will win on the field, and another during the commercials.

– Eamon Levesque

by   Ranker
Staff
in Data Science, interest graph, Opinion Graph

The Opinion Graph predicts more than the Interest Graph

At Ranker, we keep track of talk about the “interest graph” as we have our own parallel graph of relationships between objects in our system, that we call an “opinion graph”.  I was recently sent this video concerning the power of the interest graph to drive personalization.

The points made in the video are very good, about how the interest graph is more predictive than the social graph, as far as personalization goes.  I love my friends, but the kinds of things they read and the kinds of things I read are very different and while there is often overlap, there is also a lot of diversity.  For example, trying to personalize my movie recommendations based on my wife’s tastes would not be a satisfying experience.  Collaborative filtering using people who have common interests with me is a step in the right direction and the interest graph is certainly an important part of that.

However, you can predict more about a person with an opinion graph versus an interest graph. The difference is that while many companies can infer from web behavior what people are interested in, perhaps by looking at the kinds of articles and websites they consume, a graph of opinions actually knows what people think about the things they are reading about.  Anyone who works with data knows that the more specific a data point is, the more you can predict, as the amount of “error” in your measurement is reduced.  Reduced measurement error is far more important for prediction than sample size, which is a point that gets lost in the drive toward bigger and bigger data sets.  Nate Silver often makes this point in talks and in his book.

For example, if you know someone reads articles about Slumdog Millionaire, then you can serve them content about Slumdog Millionaire. That would be a typical use case for interest graph data. Using collaborative filtering, you can find out what other Slumdog Millionaire fans like and serve them appropriate content. With opinion graph data, of the type we collect at Ranker, you might be able to differentiate between a person who thinks that Slumdog Millionaire is simply a great movie versus someone who thinks the soundtrack was one of the best ever. If you liked the movie, we would predict that you would also like Fight Club. But if you liked the soundtrack, you might instead be interested in other music by A.R. Rahman.
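To make the distinction concrete, here is a toy sketch of the two kinds of data. An interest edge only records that a user engaged with an item, while an opinion edge also records the context (which list) and the vote. The record types, the `related` lookup table, and the `recommend` function are hypothetical names invented for this illustration, not Ranker's actual schema.

```python
from collections import namedtuple

# Interest graph: we only know that the user engaged with the item.
Interest = namedtuple("Interest", ["user", "item"])
# Opinion graph: we also know the context (which list) and the vote.
Opinion = namedtuple("Opinion", ["user", "item", "context", "vote"])


def recommend(opinions, related):
    """Return recommendations keyed on what the user liked AND in which context."""
    recs = []
    for op in opinions:
        if op.vote > 0:  # only follow up on positive opinions
            recs.extend(related.get((op.item, op.context), []))
    return recs


# Hypothetical lookup table mirroring the example in the text.
related = {
    ("Slumdog Millionaire", "best movies"): ["Fight Club"],
    ("Slumdog Millionaire", "best soundtracks"): ["other music by A.R. Rahman"],
}

movie_fan = [Opinion("user_1", "Slumdog Millionaire", "best movies", +1)]
music_fan = [Opinion("user_2", "Slumdog Millionaire", "best soundtracks", +1)]
print(recommend(movie_fan, related))
print(recommend(music_fan, related))
```

With only `Interest` records, both users would look identical; the extra `context` and `vote` fields are what let the two recommendations differ.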

Simply put, the opinion graph can predict more about people than the interest graph can.

– Ravi Iyer

by   Ranker
Staff
in New Features

Latest Features on Ranker

There are a lot of neat little things we’ve been working on around here in the lab. Things that make it easier and more fun to make the lists you want to make. Take a peek:

Send A Note

We were sitting around the other day in the conference room and someone said, “Hey, wouldn’t it be cool if users could talk to each other? Like email, sorta?”

So we decided that you guys should totally get in on this whole “electronic” form of communication. Now, if you read a list you like, or are intrigued by the genius behind “5 Ways To Make Homemade Spam”, you can go to the list-maker’s profile page, send them a note, and let them know that there are actually 6 ways! And, because we believe in the goodness of the human spirit, we are sure you guys won’t use this new power for evil.


PS. You need to be logged in to see this new feature!

 

Adding Items

Remember that time you made your favorite movie list? But you couldn’t remember ALL your favorite movies, because you’re not a damned robot, right? And then you were looking at someone else’s favorite movie list – or maybe perusing the Best Movies of All Time list – and you saw Piranha II: The Spawning listed there. That is totally one of your favorite movies, but you forgot until just now! Well, we have a way for you to add it to your own list with a single click. If you click that blue ‘+’ button, you will get a dropdown with any relevant lists of yours that Piranha II might be good to add to. Select your favorite movie list from the dropdown and POW, that James Cameron classic is now on your own list, too!


PS. You need to be logged in to see this new feature!

 

SlideShow View

You already know that you have two choices for how your list displays on Ranker. You can write lots of lovely words for the internet to read with big pictures… or you can just create easily digestible stacked lists with small images. Now we give you a third option… Slideshow! Build your list like normal in Edit and put in nice pretty images that will look good big (this view supports any commentary you might want to add, too). Choose the ‘slideshow view’ option from your ‘list options’ popup, and when you publish, your list will display one beautiful item at a time!


 

Filtering Lists

We have so many lists on Ranker. So. Many. And sometimes it’s overwhelming, we know. God, we know. But we’ve been tagging lists (and so have you) for the last few years and we finally went ahead and made use of those tags. Now, when you go into any of the big category tabs on Ranker (film, tv, people, etc.) you will see a little array of blue buttons on the top of the right sidebar. You can use these little buttons to sort and filter the content of that category in a million different ways! Each new filter button will narrow down your results until you find the exact lists you are looking for. Go try it!


 

Stylish Copy

One of the things we’ve never really had around here is the ability to dress up the things you guys are writing on your blog view lists. Bolding, italics, stuff like that. Well, fret no more! We now support a simple text styling interface in Edit.

When you are building your lists, and you want to write stuff… just click on the text field for your item. There is a whole little string of new tools there that allows you to make your text a lot fancier! And easy! Always easy!


by   Ranker
Staff
in Data Science

Mitt Romney Should Have Advertised on the X-Files

With the election recently behind us, many political analysts are conducting analyses of the campaigns, examining what worked and what didn’t.  One specific area where the Obama team is getting praise is in their unprecedented use of data to drive campaign decisions, and even more specifically, how they used data to micro-target fans who watched specific TV shows.  From this New York Times article concerning the Obama Team’s TV analytics:

“Culling never-before-used data about viewing habits, and combining it with more personal information about the voters the campaign was trying to reach and persuade than was ever before available, the system allowed Mr. Obama’s team to direct advertising with a previously unheard-of level of efficiency, strategists from both sides agree….

[They] created a new set of ratings based on the political leanings of categories of people the Obama campaign was interested in reaching, allowing the campaign to buy its advertising on political terms as opposed to traditional television industry terms…..

[They focused] on niche networks and programs that did not necessarily deliver large audiences but, as Mr. Grisolano put it, did provide the right ones.”

 

The Obama team focused more on undecided/apolitical voters in an effort to get them to the polls.  Given that some Mitt Romney supporters have blamed a lack of turnout of supporters for the results of the election, perhaps Romney would have been smart to have created a ranked list of TV shows, based on how much fans of the shows supported Romney, and then placed positive/motivating ads on those shows in an effort to increase turnout of his base.  Where would Romney get such data?  From Ranker!

Mitt Romney is on many votable Ranker lists (e.g. Most Influential People of 2012) and based on people who voted on those lists and also lists such as our Best Recent TV Shows list, we can examine which TV shows are positively or negatively associated with Mitt Romney.  Below are the top positive results from one of our internal tools.
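The general idea behind this kind of lookup can be sketched as follows: for each other item, take the users who voted on both it and the target, correlate their up/down votes, and sort. This is only an illustration under assumed data shapes (a dict of user to item to +1/-1 vote); the function names, the `min_overlap` cutoff, and the sample votes are hypothetical and do not describe Ranker's internal tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined (a constant column)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlated_items(votes, target, min_overlap=3):
    """Rank other items by how their votes correlate with votes on `target`.

    votes: dict mapping user -> dict mapping item -> +1 (up) or -1 (down).
    Items with fewer than `min_overlap` shared voters are skipped.
    """
    items = {item for user_votes in votes.values() for item in user_votes}
    items.discard(target)
    results = []
    for item in items:
        pairs = [(uv[target], uv[item]) for uv in votes.values()
                 if target in uv and item in uv]
        if len(pairs) < min_overlap:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None:
            results.append((item, r))
    # Highest correlation first; ties broken alphabetically for stable output.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Made-up votes: "Candidate" stands in for the politician, shows are placeholders.
votes = {
    "u1": {"Candidate": 1, "Show A": 1, "Show B": -1},
    "u2": {"Candidate": 1, "Show A": 1, "Show B": -1},
    "u3": {"Candidate": -1, "Show A": -1, "Show B": 1},
    "u4": {"Candidate": -1, "Show A": -1, "Show B": 1, "Show C": 1},
}
for item, r in correlated_items(votes, "Candidate"):
    print(item, round(r, 2))
```

The `min_overlap` cutoff matters in practice: correlations computed from only a handful of shared voters are noisy and can easily top the list by chance.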

As you can see, the X-Files appears to be the highest correlated show, by a fair margin.  I don’t watch the X-Files, so I wasn’t sure why this correlation exists, but I did a bit of research, and found this article exploring how the X-Files supported a number of conservative themes, such as the persistence of evil, objective truth, and distrust of government (also see here).  The article points out that in one episode, right wing militiamen are depicted as being heroic, which never would happen in a more liberal leaning plot.  Perhaps if you are a conservative politician seeking to motivate your base, you should consider running ads on reruns of the X-Files, or if you run a television station that shows X-Files reruns, consider contacting your local conservative politicians leveraging this data.

You may notice that this list contains more classic/rerun shows (e.g. Leave it to Beaver) than current shows.  This appears to be part of a general trend where conservatives on Ranker tend to positively vote for classic TV, a subject we’ll cover in a future blog post.  The possibility of advertising on reruns is part of what we would like to highlight in this post, as ads are likely relatively cheap and audiences can be more easily targeted, a tactic which the Obama campaign has been praised for.  At Ranker, we’re hopeful that more advertisers will seek value in the long-tail and mid-tail and will seek to mimic the tactics of the Obama campaign, as our data is uniquely suited for such psychographic targeting.

– Ravi Iyer

by   Ranker
Staff
in Data Science

How Crowdsourcing can uncover Niche/Trending shows

At Ranker, people give us their opinions in various different ways. Some people vote.  Other people make long lists.  Still others make really short lists.  Some people tell us their absolute favorite things, while others list everything they’ve ever experienced.  One of the advantages of this diversity is that it allows us to examine patterns within these divergent types of opinions.  For example, some things are really popular, meaning that everyone lists them (e.g. Michael Jordan is on everyone’s best basketball players list).  Most popular things are also things that people generally list high on their lists and also get lots of positive votes (e.g. Michael Jordan).  However, there are some things that don’t get listed very often, but when they do get listed, people are passionate about them, meaning that they get listed high on people’s lists.  We highlight these items in our system using the niche symbol.
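One way to picture this "rarely listed, but listed high" pattern is the small sketch below. It counts how often each item appears across ranked lists, computes each item's average position, and flags items that appear less often than the median item yet sit well above the typical position. The thresholds and the function name `find_niche_items` are assumptions made for illustration; this is not Ranker's actual niche algorithm.

```python
from statistics import mean, median, pstdev


def find_niche_items(lists, z_threshold=1.0):
    """Flag items that are listed rarely but ranked near the top when listed.

    lists: list of ranked lists, each ordered best-first.
    z_threshold: how many standard deviations better than the average item's
    mean position an item must be (an assumed cutoff).
    """
    positions = {}
    for ranked in lists:
        for pos, item in enumerate(ranked, start=1):
            positions.setdefault(item, []).append(pos)
    counts = {item: len(p) for item, p in positions.items()}
    avg_pos = {item: mean(p) for item, p in positions.items()}
    median_count = median(list(counts.values()))
    overall = mean(list(avg_pos.values()))
    spread = pstdev(list(avg_pos.values()))
    if spread == 0:
        return []
    niche = []
    for item in positions:
        # Positive z means the item sits closer to the top than the typical item.
        z = (overall - avg_pos[item]) / spread
        if counts[item] < median_count and z >= z_threshold:
            niche.append(item)
    return sorted(niche)


# Made-up lists: "N" appears only once, but at the very top of that list.
lists = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D"],
    ["N", "A", "B", "C"],
    ["A", "B", "D", "C"],
]
print(find_niche_items(lists))
```

With real data there would be many more lists of varying lengths, so positions would probably need normalizing (for example, by list length) before being compared.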

I’ve recently been examining our “niche” tag, which signifies when something is not particularly popular, but people are passionate about it. There are many reasons why things can be niche. Some things appeal specifically to younger (e.g. Rugrats) or older crowds (e.g. The Rockford Files). Other things have natural audiences (e.g. baseball fans who appreciate defense and think Ozzie Smith is one of the greatest players of all time). The most interesting case is when something that I can’t identify starts showing up on the niche list (see the list at the time of this writing here).

This is especially helpful for someone like me, who doesn’t always know what is ‘hot’ and naturally looks to data to find new quality entertainment. A while back, the show Community was consistently scoring highest on our niche algorithm. Few people listed it as one of the best recent TV shows, but those who listed it tended to think very highly of it. I was intrigued enough to watch the pilot on Hulu and have since become hooked. Community has since graduated from our niche algorithm as it became popular. Sometimes passion amongst a small group is how a trend starts.

Just as Margaret Mead believed that a small group of citizens could change the world, Malcolm Gladwell has shown how a small group of trendsetters can signal changes in pop culture. Not everything on our niche list will become the next big thing, but it’s certainly a good place to search for candidates.

Among the things that people seem to be passionate about now, that aren’t so popular, are several good candidates for up-and-coming movies, bands, or TV shows. Papillon is currently hot, scoring over 2 standard deviations higher in terms of list position on our best movie list, despite being less popular than most movies. Another Earth and 13 Assassins seem like potentially interesting and under-the-radar films from 2011. Real Time with Bill Maher’s niche status may be due to its appeal to a particular ideological group, but Warehouse 13 appealed to just my niche, as it had passionate fans on both the best recent TV shows list and the best Sci-Fi TV shows list (it has since graduated from the list due to increased popularity). Warehouse 13’s highest correlated show is one of my favorites, Battlestar Galactica, so I’m definitely going to check it out.

I tend to be a late adopter of pop culture, but thanks to the niche tag, maybe I can be a little hipper going forward. Take a look at our niche items as of October 20, 2012; any comments on other things to consider checking out would be appreciated. Or perhaps take a look in a few months’ time and consider whether our niche tag successfully captured coming trends in a few cases.

– Ravi Iyer

by   Ranker
Staff
in Opinion Graph

The Best Possible Answers To Opinion-Based Questions

Ranker, as an open-ended platform for ranking people/places/things, is a lot of different (awesome) things to different people. But the overarching goal for Ranker has always been to provide the best possible answer to opinion-based questions like “What are the best _____?”

Popular sports and entertainment vote lists often grow into being a great answer within 12-72 hours as they get lots of traffic quickly, but the majority of Ranker lists take 1 – 3 months to build to full credibility as visitors on Ranker and from search engines find them and shape them with votes and re-ranks.

I thought it would be fun to showcase some Ultimate Lists and Vote Lists in other categories that haven’t gone viral, but through the participation of lots of Rankers over a few months have indeed become “the best possible answer” to this question.

Food

You all clearly love to weigh in on the start of the day, and the 5 o’clock hour:

Best Breakfast Cereals

The Best Cocktails

But you also have strong opinions on hydration during the day:

Best Sodas (and for the more calorie-conscious among you The Best Diet Sodas)

And even specific Gatorade flavors (thanks for the list Lucas)

Snacking, whether it be on a particular type of cheese, candy bar, or even as granular as a specific Jelly Belly flavor (thanks for the list Samantha but what’s with all the chocolate pudding haters?)

Dining out, specifically at Italian chain restaurants

A list I am not authorized to vote on, pregnancy cravings

And hundreds more, including perhaps a new category entirely – food nostalgia (I do miss those Crispy M&Ms myself)

Fashion/Beauty

Not categories that I personally check up on much, so I was psyched to see quite a few solid rankings here, some of them high-end but mostly stuff you can find at the mall:

Best women’s shoe brands

Best denim brands

Top handbag designers

Fashion Blogs

Sulfate-free shampoos

And even a men’s facial moisturizers list (have only tried 3 or 4 myself, but agree with their relative positions on the list)

Travel

Rankers, I know from a number of you that as we’ve been adding datasets of “rank-able objects” over the last year, one of the most-requested ones that we don’t yet have is hotels/resorts. Trust me, it’s still on the list. But in the meantime, it’s been heartening to see how many of you have participated in these great resources for travel destinations and attractions, like these:

Best US cities for vacations

Honeymoon destinations

Coolest cities in America

Theme parks for roller coaster addicts

And my personal faves, “bucket lists” of the world’s most beautiful natural wonders and historical landmarks.

Great stuff – these lists and 1000s more like them are true testimonials to the “wisdom of crowds”. Thanks, crowds!