By Clark Benson (CEO, Ranker)
It’s unlikely that anyone would pour freezing water over their head for it (although if that happens, someone please let me know), but the marketing world is experiencing its own Peak Oil Crisis.
We don’t have enough data. Yes, you read that correctly. Well, at least not enough good data.
Click over to any marketing blog and you’ll read the same story: the world is awash in golden insights, companies are able to “know” their customers in real time, and businesses can form more accurate predictions about their own market … and so on.
Here’s what you won’t read: it’s really, really hard. And it’s getting harder, for the simple reason that we are all positively drenched in potentially bad data. Noisy, incomplete, out-of-context, approximate, downright misleading data. “Big Data” is (mostly) bad data, because it tends to draw explicit conclusions from implicit, noisy sources like social media or web visits. And as you probably know, bad data is even worse than no data at all.
Traditional market research methods, like surveys, are getting less reliable due to dropping response rates, especially among young, tech-savvy consumers. To counteract this trend, marketing research firms have hired hundreds of PhDs to refine the math in their models and try to build a better picture of the zeitgeist, leveraging social media and implicit opinions shared online. This has proven to be a dangerous proposition, as modeling and research firms have fallen prey to statistics’ number one cautionary rule: garbage in, garbage out.
No amount of mathematical genius can fix bad data. A simple statistical model on well-measured data will trump an elaborate algorithm on poorly measured data every single time. Sophisticated statistical models might help in political polling, where people are far more predictable based on party and demographics, but they won’t do much for traditional marketing research, where people’s tastes and positions are less entrenched and evolve more rapidly.
Parsing the exact sentiment behind a “like,” a follow or a natural-language tweet is extremely difficult: analysts have little control over the sample population they are covering, and no context about why the action occurred or what behavior or opinion triggered it. Because there is no negative signal to use as a control, there is no way to disentangle “good” from merely “popular.” Natural language processing algorithms can’t sort out sarcasm, which reigns supreme on social media (as anyone who follows or “likes” Kanye West just to see what crazy thing he will say next can attest), and even the best algorithms can’t reliably categorize the sentiment of more than 50% of Twitter’s volume of posts.

Others have pointed out the difficulty of developing more than a razor-thin understanding of consumer mindsets and preferences from social media data. What does a Facebook “like” mean, exactly? If you “like” Coca-Cola on Facebook, does it mean you like the product or the company? Does it necessarily mean you don’t like Pepsi? And what is a “like” worth? Nobody knows for sure.
This is where we come in. We at Ranker have developed a very good answer to this problem: the “opinion graph,” a more precise version of the “interest graph” that advertisers currently use.
Ranker is a popular website (Quantcast Top 200, 18 million unique visitors and 300 million pageviews per month) that crowdsources answers to questions, using the popular list format. Visitors to Ranker can view, rank and vote on items on around 400,000 lists. Unlike more ambiguous data points based on likes or tweets, Ranker solicits precise and explicit opinions from users about questions like the most annoying celebrities, the best guilty pleasure movies, the most memorable ad slogans, the top dream colleges, or the best men’s watch brands.
It’s very simple: instead of the vaguely positive act of “liking” a popular actor on Facebook, Ranker visitors cast 8 million votes every month and thus directly express whether they think someone is “hot,” “cool,” one of the “best actors of all-time,” or just one of the “best action stars.” Not only that, Ranker users also vote on other lists of items seemingly unrelated to their initial interest: the best cars, imported beers, greatest TV hosts, etc. Correlations between these items start to paint the three-dimensional picture expressed in our opinion graph.
Ranker has been building the world’s largest opinion graph since 2008. It currently has 50,000 nodes (topics) and 20 million edges (statistically significant connections between two topics). Thanks to our massive sample and our rich database of correlations, we can tell you that people who like Modern Family are 5x more likely to dine at Chipotle than non-fans, or that people who like the Nissan 370Z also like oddball comedies such as Napoleon Dynamite and The Big Lebowski, and TV shows such as Dexter and Weeds.
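A claim like “5x more likely” can be read as a lift score over co-voting data. Here is a minimal sketch of that computation, using a hypothetical, tiny vote table — the user names and numbers are illustrative assumptions, not Ranker’s actual data or methodology:

```python
# Hypothetical up-votes per user, keyed by topic (a node in an opinion graph).
# All names and figures below are made up for illustration.
votes = {
    "alice": {"Modern Family", "Chipotle"},
    "bob":   {"Modern Family", "Chipotle"},
    "carol": {"Modern Family"},
    "dave":  {"Chipotle"},
    "erin":  set(),
    "frank": set(),
}

def lift(topic_a, topic_b, votes):
    """How much more likely are fans of topic_a to also like topic_b,
    compared with non-fans of topic_a?"""
    fans     = [u for u, liked in votes.items() if topic_a in liked]
    non_fans = [u for u, liked in votes.items() if topic_a not in liked]
    p_fans     = sum(topic_b in votes[u] for u in fans) / len(fans)
    p_non_fans = sum(topic_b in votes[u] for u in non_fans) / len(non_fans)
    return p_fans / p_non_fans if p_non_fans else float("inf")

# Fans of Modern Family: 2 of 3 like Chipotle; non-fans: 1 of 3.
print(lift("Modern Family", "Chipotle", votes))  # 2.0
```

An edge in the opinion graph would then connect two topics whenever a score like this is statistically significant at the sample size involved.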
Our exclusive Ranker FanScope about the show Mad Men lays out this capability in more detail below:
How good is our data? Pretty good. Like “we predicted the outcome of the World Cup better than Nate Silver’s FiveThirtyEight and Betfair” good.
Our opinion data is also much more precise than Facebook’s. We not only know that someone who likes Coke is very likely to rank Jaws as one of his/her top movies of all time, but we’re able to differentiate between those who like to drink Coke, and those who admire Coca-Cola as a company:
We’re also able to differentiate between people who prefer Pepsi to Coke overall, and those who like to drink Coke, but only at the movie theater:
- 47% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Sodas of All Time
- 65% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Movie Snacks
That’s the kind of specific relationship you can’t get using Facebook data or Twitter messages.
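The context-dependence above falls out of simple per-list vote shares. A sketch, using hypothetical tallies chosen to match the percentages quoted (the raw counts are invented for illustration):

```python
# Hypothetical for/against vote tallies cast by Pepsi fans on two lists.
# The raw counts are made up; only the resulting shares echo the article.
tallies = {
    "Best Sodas of All Time": {"for": 47, "against": 53},
    "Best Movie Snacks":      {"for": 65, "against": 35},
}

def for_share(tally):
    """Fraction of votes cast in favor, within one list (one context)."""
    return tally["for"] / (tally["for"] + tally["against"])

for list_name, tally in tallies.items():
    print(f"{list_name}: {for_share(tally):.0%} of Pepsi fans vote for Coke")
```

Because each vote is attached to a specific list, the same pair of brands can show opposite sentiment in different contexts — exactly what an undifferentiated “like” cannot reveal.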
By collecting millions of discrete opinions each month on thousands of diverse topics, Ranker is the only company able to combine Internet-level scale (hundreds of thousands of people surveyed on millions of opinions each month) with market-research-level precision (e.g. adjective-specific opinions about specific objects in a specific context).
We can poll questions that would be too specific (e.g. most memorable slogans) or not lucrative enough (most annoying celebrities) for other pollsters. And we use the same types of mathematical models to address the sampling challenges that all pollsters (Internet-based or not) currently face, working with some of the world’s leading academics who study crowdsourcing, such as our Chief Data Scientist Ravi Iyer and UC Irvine Cognitive Sciences professor Michael Lee.
While our data insights may not inspire the masses to drench themselves in ice water, if you’re a marketer or an advertiser we predict it’s likely you will want to pay close attention.