by Ranker Staff in Opinion Graph

Characteristics of people who are not annoyed by Bill O’Reilly

On today’s The O’Reilly Factor (video below), Bill O’Reilly lamented the fact that he was only #10 on Ranker’s Most Annoying TV Hosts list and decided that he would make it his New Year’s Resolution to become the #1 most annoying person on our list. While I may not share O’Reilly’s politics, I like him as a person, even as he does annoy me from time to time, and would like to help him reach his goals. I enjoy working with the Ranker dataset as it lets me answer very specific questions, like whether people who think the show 24 is overrated are also convinced that George W. Bush was a terrible person—or, in this case, I can study the people who specifically disagree that O’Reilly is annoying, in the hopes that O’Reilly can find these people and work to annoy them more.

Who does O’Reilly need to work harder to annoy? From our opinion graph of 20+ million edges (so named because we can connect not only vague “likes” or “interests,” but specifically whether someone thinks something is best, worst, hot, annoying, overrated, etc.), we have hundreds of specific opinions that characterize people who don’t find O’Reilly annoying. Here are a few chosen findings about these people:

People who are NOT annoyed by O’Reilly tend to…
– find liberals like Jon Stewart, Rachel Maddow, and Bill Maher annoying.
– believe that John Wayne and Humphrey Bogart are among the Best Actors in Film History.
– enjoy movies like The Sound of Music and Toy Story.
– watch America’s Got Talent, Cops, Dirty Jobs, Deadliest Catch, Home Improvement, and Extreme Makeover: Home Edition.
– listen to Lynyrd Skynyrd, Boston, and Elvis.
– enjoy comedians like Bob Hope, Jeff Foxworthy, Joan Rivers, and Billy Crystal.
– be attracted to Carrie Underwood, Jessica Simpson, Brooklyn Decker, and Sarah Palin.
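Associations like these fall out of simple co-vote statistics. As a rough sketch of the kind of computation involved (the data and opinion labels below are invented for illustration; Ranker’s actual pipeline is not public):

```python
def opinion_lift(votes, a, b):
    """How much more likely holders of opinion a are to hold opinion b
    than the average user: P(b | a) / P(b)."""
    users = list(votes)
    holds_a = [u for u in users if a in votes[u]]
    p_b = sum(b in votes[u] for u in users) / len(users)
    p_b_given_a = sum(b in votes[u] for u in holds_a) / len(holds_a)
    return p_b_given_a / p_b

# Invented toy data: user -> set of held opinions.
votes = {
    "u1": {"oreilly_not_annoying", "stewart_annoying"},
    "u2": {"oreilly_not_annoying", "stewart_annoying"},
    "u3": {"stewart_annoying"},
    "u4": set(),
}
# Both users not annoyed by O'Reilly find Stewart annoying (P = 1.0),
# versus 3 of 4 users overall (P = 0.75), a lift of about 1.33.
```

A lift reliably above 1, over enough users, is the kind of edge the opinion graph stores.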

Thanks to big data, these audiences are all readily targetable online—and if O’Reilly really wants to annoy these people, he might want to study our biggest pet peeves list for ideas (e.g. chewing with his mouth open might work on TV). We hope this list will help O’Reilly with his ambitions for 2015, and please do reach out to us if you need more market research on how to annoy people more.

– Ravi Iyer

by Ranker Staff in Data, Opinion Graph

The Opinion Graph Connections between 24, George W. Bush, Jack Bauer, and Rachel Maddow

As someone whose roots are in political psychology, I’m always interested in seeing how the Ranker dataset reveals the ways our values are reflected in our entertainment choices.  We’ve seen many instances where politicians have cited 24 in the case for or against torture, but are politics reflected in the public’s attitudes toward 24?  Using data from users who have voted on multiple Ranker lists, including our lists polling for the Worst Person in History, the Greatest TV Characters of All Time, the Most Overrated TV Shows, and the Biggest Hollywood Douchebags, the clear answer is yes.

People who think George W. Bush is one of the worst people in history also tend to think that 24 is one of the most overrated TV shows of all time.

People who think Bush is a terrible person also think 24 is overrated.

…and people who think Jack Bauer is one of the best TV Characters of All-Time also think that Rachel Maddow is one of Hollywood’s Biggest Douchebags.

People who think Jack Bauer is a great TV character also think Rachel Maddow is a douchebag.

– Ravi Iyer

P.S. …and these are just a few of the relationships between 24 and politicians in our opinion graph, all of which tell the same basic story.

by Ranker Staff in Opinion Graph

The Clear Split Between AMD and Intel CPU Fans

Recently, Tom’s Hardware used the Ranker widget to poll for their Reader’s Choice awards.  Among the topics they polled was the best CPUs, and while I knew that there would likely be a preference for AMD or Intel, the two largest manufacturers, I didn’t realize that the split would be so stark.  I’m a relative novice compared to most of the people who voted in this poll, so perhaps this would not surprise them, but voting for an AMD CPU made one, on average, 80% less likely to vote for an Intel CPU, and vice versa.  Below is a taxonomy of votes based on a hierarchical cluster analysis of this list’s votes, with items that are voted on similarly placed closer together, so you can visualize the split for yourself.

Tom’s Hardware CPUs Taxonomy
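The clustering behind a taxonomy like this starts from a notion of distance between items based on how similarly they are voted on. Here is a minimal sketch of such a distance with invented vote data (the actual analysis used a hierarchical cluster analysis over far more votes):

```python
# Invented toy data: item -> one vote per user (+1 up, -1 down, 0 no vote).
votes = {
    "AMD FX-8350":    [ 1,  1, -1,  1, -1],
    "AMD FX-9590":    [ 1,  1, -1,  1,  0],
    "Intel i5-4670K": [-1, -1,  1, -1,  1],
    "Intel i7-4770K": [-1,  0,  1, -1,  1],
}

def dist(a, b):
    """Fraction of users (who voted on both items) whose votes disagree.
    Items voted on similarly get a small distance and cluster together."""
    pairs = [(x, y) for x, y in zip(votes[a], votes[b]) if x and y]
    return sum(x != y for x, y in pairs) / len(pairs)
```

In this toy data the two AMD chips sit at distance 0 from each other and distance 1 from the Intel chips, which is exactly the stark split the dendrogram visualizes; a hierarchical clustering routine (e.g. SciPy’s `linkage`) would then merge the closest items first.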


– Ravi Iyer

by Ranker Staff in Data Science, Pop Culture, prediction

Ranker Predicts Spurs to beat Cavaliers for 2015 NBA Championship

The NBA season starts tonight and, building on the proven success of our World Cup and movie box office predictions as well as the preliminary success of our NFL predictions, Ranker is happy to announce our 2015 NBA Championship predictions, based upon aggregated data from basketball fans who have weighed in on our NBA and basketball lists.

Ranker’s 2015 NBA Championship Predictions as Compared to ESPN and FiveThirtyEight

For comparison’s sake, I included the current ESPN power rankings as well as the FiveThirtyEight teams with the highest percentage chance of winning the championship.  As with any sporting event, chance will play a large role in the outcome, but the premise of producing our predictions regularly is to validate our belief that the aggregated opinions of many will generally outperform expert opinions (ESPN) or models based on non-opinion data (e.g. player performance data plays a large role in FiveThirtyEight’s predictions).  Our ultimate goal is to prove the utility of crowdsourced data: NBA predictions are a crowded space where many people attempt to answer the same question, but Ranker produces the world’s only significant data model for equally important questions, such as determining the world’s best DJs, everyone’s biggest turn-ons, or the best cheeses for a grilled cheese sandwich.

– Ravi Iyer

by Ranker Staff in Opinion Graph, Rankings

Characteristics of People who are less Afraid of Ebola

Ebola is everywhere in the news these days, even as Ebola trails other causes of death by wide margins.  Clearly the risks are great, so some amount of fear is certainly justified, but many have taken it to levels that do not make sense scientifically, making back-of-the-envelope projections for its spread based on anecdotal evidence and/or positing that it’s only a matter of time before the virus evolves into an airborne disease, as diseases regularly mutate to enable more killing in movies.  Regardless of whether Ebola warrants fear or outright panic, the consensus is that it is scary, as also evidenced by its clear #1 ranking on Ranker’s Scariest Diseases of All Time list.  Yet, among those who are fearful, I couldn’t help but wonder: what are the characteristics of people who tend to be less afraid than others?  Using the metadata associated with users who voted on and reranked this list, in combination with their other activity on the site, here are a few things I found.

– Ebola fear appears to be slightly less prevalent in the Northeast, as compared to other regions of the US, and slightly more prevalent in the South.

– Older people tend to be slightly less afraid of Ebola, often expressing more fear of Alzheimer’s.

– International visitors to this list are half as likely to vote for Ebola, as compared to Americans.

– People who are afraid of Ebola are 4.4x as likely to be afraid of Dengue Fever.

– People who are afraid of Strokes, Parkinson’s Disease, Muscular Dystrophy, Influenza, and/or Depression are about half as likely to believe that Ebola is one of the world’s scariest diseases.

Bear in mind that these results are based on degree of fear, and ALL of these groups are afraid of Ebola; the fear in some groups is simply less pronounced, and only the last 3 results are statistically significant based on classical statistical methods.  There are plausible explanations for all of the above, ranging from the fact that conservative areas of the country are likely more responsive to potential threats, to the fact that losing one’s mind over time to Alzheimer’s really may be much scarier for older people versus a quick death, to the fact that people who are afraid of foreign diseases prevalent in tropical areas likely fear other foreign diseases prevalent in tropical areas.
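Multipliers like the “4.4x” above are relative risks computed from a 2x2 table of who fears what. A sketch with invented counts (the actual counts behind these results are not published here):

```python
def relative_risk(both, a_only, b_only, neither):
    """Relative risk of fearing disease B given fear of disease A,
    from a 2x2 contingency table of user counts."""
    p_b_given_a = both / (both + a_only)            # B-rate among A-fearers
    p_b_given_not_a = b_only / (b_only + neither)   # B-rate among everyone else
    return p_b_given_a / p_b_given_not_a

# Invented counts: 44 of 100 Ebola-fearers also fear Dengue,
# versus 10 of 100 users who are not afraid of Ebola.
print(relative_risk(44, 56, 10, 90))
```

A classical significance check, such as a chi-square test on the same table, would then decide which of these ratios are worth reporting.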

To me the most interesting fact is that people who are afraid of more common everyday diseases, including Influenza, which kills thousands every year, appear to be less afraid of Ebola than others.  Human beings are wired to be more afraid of the new and spectacular, as much psychological research has shown.  That fear kept many of our ancestors alive, so I wouldn’t dismiss it as wrong.  But it is interesting to observe that perhaps some of us are less wired in this way than others.

– Ravi Iyer

by Ranker Staff in Opinion Graph, Rankings

Ranky Goes to Washington?

Something pretty cool happened last week here at Ranker, and it had nothing to do with the season premiere of “The Big Bang Theory”, which we’re also really excited about. Cincinnati’s number one digital paper used our widget to create a votable list of ideas mentioned in Cincinnati Mayor John Cranley’s first State of the City address. As of right now, 1,958 voters have cast 5,586 votes on the list of proposals from Mayor Cranley (not surprisingly, “fixing streets” ranks higher than the “German-style beer garden” that’s apparently also an option).

Now, our widget is used by thousands of websites to either take one of our votable lists or create their own and embed it on their site, but this was the very first time Ranker was used to directly poll people on public policy initiatives.

Here’s why we’re loving this idea: we feel confident that Ranker lists are the most fun and reliable way to poll people at scale about a list of items within a specific context. That’s what we’ve been obsessing over for the past 6 years. But we also think this could lead to a whole new way for people to weigh in, in fairly large numbers, on complex public policy issues on an ongoing basis, from municipal budgets to foreign policy. That’s because Ranker is very good at getting a large number of people to cast their opinions about complex issues in ways that can’t be achieved at this scale through regular polling methods (nobody’s going to call you at dinner time to ask you to rank 10 or 20 municipal budget items … and what is “dinner time” these days, anyway?).  It may not be a representative sample, but it may be the only sample that matters, given that the average citizen of Cincinnati will have no idea about the details of the Mayor’s speech and will likely offer whatever opinion moves a phone survey along about a topic they know little about.

Of course, the democratic process is the best way to get the best sample (there’s little bias when it’s the whole friggin’ voting population!) to weigh in on public policy as a whole. But elections are very expensive and infrequent, and the policy debates they focus on are the broadest possible relative to their geographical units, meaning that micro-issues like these will often get lost in the same tired partisan debates.

Meanwhile, society, technology, and the economy no longer operate on cycles consistent with election cycles: the rate and breadth of societal change is such that the public policy environment specific to an election quickly becomes obsolete, and new issues quickly need sorting out as they emerge, something our increasingly polarized legislative processes have a hard time doing.

Online polls are an imperfect, but necessary, way to evaluate public policy choices on an ongoing basis. Yes, they are susceptible to bias, but good statistical models can overcome a lot of such bias and in a world where the response rates for telephone polls continue to drop, there simply isn’t an alternative.  All polling is becoming a function of statistical modeling applied to imperfect datasets.  Offline polls are also expensive, and that cost is climbing as rapidly as response rates are dropping. A poll with a sample size of 800 can cost anywhere between $25,000 and $50,000 depending on the type of sample and the response rate.  Social media is, well, very approximate. As we’ve covered elsewhere in this blog, social media sentiment is noisy, biased, and overall very difficult to measure accurately.

In comes Ranker. The cost of that Cincinnati.com Ranker widget? $0. Its sample size? Nearly 2,000 people, or anywhere from 2 to 4x the average sample size of current political polls. Ranker is also the best way to get people to quickly and efficiently express a meaningful opinion about a complex set of issues, and we have collected thousands of precise opinions about conceptually complex topics like the scariest diseases and the most important life goals by making the act of providing opinions entertaining within a context that makes simple actions meaningful.

Politics is the art of the possible, and we shouldn’t let the impossibility of perfect survey precision preclude the possibility of using technology to improve civic engagement at scale.  If you are an organization seeking to poll public opinion about a particular set of issues that may work well in a list format, we’d invite you to contact us.

– Ravi Iyer

by Ranker Staff in prediction

Ranker Predicts Jacksonville Jaguars to have NFL’s worst record in 2014

Today is the start of the NFL season, and building on our success in using crowdsourcing to predict the World Cup, we’d like to release our predictions for the upcoming NFL season.  Using data from our “Which NFL Team Will Have the Worst Record in 2014?” list, which was largely voted on by the community at WalterFootball.com (using a Ranker widget), we would predict the following order of finish, from worst to first.  Unfortunately for fans in Florida, the wisdom of crowds predicts that the Jacksonville Jaguars will finish last this year.

As a point of comparison, I’ll also include predictions from WalterFootball’s Walter Cherepinsky, ESPN (based on power rankings), and Betfair (based on betting odds for winning the Super Bowl).  Since we are attempting to predict the teams with the worst records in 2014, the worst teams are listed first and the best teams are listed last.

Ranker NFL Worst Team Predictions 2014

The value proposition of Ranker is that we believe the combined judgments of many individuals are smarter than even the most informed individual experts.  Our predictions were based on over 27,000 votes from 2,900+ fans, taking into account both positive and negative sentiment by combining the raw magnitude of positive votes with the ratio of positive to negative votes.  As research on the wisdom of crowds predicts, the crowdsourced judgments from Ranker should outperform those from the experts.  Of course, there is a lot of luck and randomness throughout the NFL season, so our results, good or bad, should be taken with a grain of salt.  What is perhaps more interesting is the proposition that crowdsourced data can approximate the results of a betting market like Betfair, for the real value of Ranker data is in predicting things where there is no betting market (e.g. what content should Netflix pursue?).
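The exact formula isn’t given in the post, but one simple way to combine the raw magnitude of positive votes with the positive-to-negative ratio is multiplicatively. A sketch with invented vote counts (this combination is an assumption, not Ranker’s actual model):

```python
def badness_score(up, down):
    """Illustrative combination (an assumption, not Ranker's published model):
    positive-vote magnitude weighted by the share of positive votes."""
    return up * (up / (up + down))

# Invented vote counts on a "worst record in 2014" list.
votes = {"Jaguars": (900, 100), "Raiders": (700, 300), "Seahawks": (100, 900)}
worst_first = sorted(votes, key=lambda t: badness_score(*votes[t]), reverse=True)
# A team with many, mostly-unopposed "worst" votes ranks first.
```

Weighting by the positive ratio keeps a team with heavy but contested voting from outranking a team the crowd agrees on.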

Stay tuned until the end of the season for results.

– Ravi Iyer

by Ranker Staff in Data

Why Ranker Data is Better than Facebook’s and Twitter’s

By Clark Benson (CEO, Ranker)

It’s unlikely you’ll be pouring freezing water over your head for it, but the marketing world is experiencing its own Peak Oil crisis.

Yes, you read correctly: we don’t have enough data. At least not enough good data.

Pull up any marketing RSS feed and you’ll read the same story: the world is awash in golden insights, companies are able to “know” their customers in real time and predict more and better about their own markets … blablabla.

Here’s what you won’t read: it’s really, really hard. And it’s getting harder, for the simple reason that we are all positively drenched in … overwhelmingly bad data. Noisy, incomplete, out-of-context, approximate, downright misleading data. “Big Data” = (Mostly) Bad Data, as it tends to infer explicit opinions from implicit and noisy sources like social media or web visits.

Traditional market research methods are getting less reliable due to dropping response rates, especially among young, tech-savvy consumers. To counteract this trend, marketing research firms have hired hundreds of PhDs to refine the math in their models and try to build a better picture of the zeitgeist, leveraging social media and implicit web behavior. This has proven to be a dangerous proposition, as modeling and research firms have fallen prey to statistics’ number one rule: garbage in, garbage out.

No amount of genius mathematical skills can fix Bad Data, and simple statistical models on well measured data will trump extensive algorithms on badly measured data every single time. Sophisticated statistical models might help in political polling, where people are far more predictable based on party and demographics, but they won’t do anything to help traditional marketing research, where people’s tastes and positions are less entrenched and evolve more rapidly.

Parsing the exact sentiment behind a “like”, a follow, or a natural language tweet is extremely difficult, as analysts often lack control over the sample population they are covering, as well as any context about why the action occurred, and what behavior or opinion triggered it. Since there is no negative sentiment to use as a control, there is no ability to unconfound good with popular. Natural language processing algorithms can’t sort out sarcasm, which reigns supreme on social media, and even the best algorithms can’t reliably categorize the sentiment of more than 50% of Twitter’s volume of posts. Others have pointed out the issues with developing a more than razor-thin understanding of consumer mindsets and preferences based on social media data. What does a Facebook “Like” mean, exactly? If you “like” Coca-Cola on Facebook, does it mean that you like the product or the company? And does it necessarily mean you don’t like Pepsi? And what is a “like” worth? Nobody knows.

This is where we come in. We at Ranker have developed a very good answer to this issue: the “opinion graph”, which is a more precise version of the “interest graph” that advertisers are currently using.

Ranker is a popular website (top 200, 18 million unique visitors and 300 million pageviews per month) that crowdsources answers to questions using the popular list format.  Visitors to Ranker can view, rank, and vote on items across around 400,000 lists. Unlike more ambiguous data points based on Facebook likes or Twitter tweets, Ranker solicits precise and explicit opinions from users about questions like the most annoying celebrities, the best guilty pleasure movies, the most memorable ad slogans, the top dream colleges, or the best men’s watch brands.

It’s very simple: instead of the vaguely positive act of “liking” a popular actor on Facebook, Ranker visitors cast 8 million votes every month and thus directly express whether they think someone is “hot”, “cool”, one of the “best actors of all-time”, or just one of the “best action stars”. Not only that, they also vote on other lists of items seemingly unrelated to their initial interest: best cars, best beers, most annoying TV shows, etc.

As a result, Ranker has been building the world’s largest opinion graph since 2008, with 50,000 nodes (topics) and 20 million edges (statistically significant connections between 2 items). Thanks to our massive sample and our rich database of correlations, we can tell you that people who like “Modern Family” are 5x more likely to dine at Chipotle than non-fans, or that people who like the Nissan 370Z also like oddball comedies such as “Napoleon Dynamite” and “The Big Lebowski”, and TV shows such as “Dexter” and “Weeds”.

Our exclusive Ranker “FanScope” about the show “Mad Men” lays out this capability in more detail below:

Mad Men Data

How good is it? Pretty good. Like “we predicted the outcome of the World Cup better than Nate Silver’s FiveThirtyEight and Betfair” good.

Our opinion data is also much more precise than Facebook’s, since we not only know that someone who likes Coke is very likely to rank “Jaws” as one of his/her top movies of all time, but we’re able to differentiate between those who like to drink Coke, and those who like Coca-Cola as a company:

Jaws chart

We’re also able to differentiate between people who always like Pepsi better than Coke overall, and those who like to drink Coke but just at the movie theater:

  • 47% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Sodas of All Time
  • 65% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Movie Snacks

That’s the kind of specific relationship you can’t get using Facebook data or Twitter messages.

By collecting millions of discrete opinions each month on thousands of diverse topics, Ranker is the only company able to combine internet-level scale (hundreds of thousands surveyed on millions of opinions each month) with market research-level precision (e.g. adjective specific opinions about specific objects in a specific context).

We can poll questions that are too specific (e.g. most memorable slogans) or not lucrative enough (most annoying celebrities) for other pollsters. And we use the same types of mathematical models to address the sampling challenges that all pollsters (internet-based or not) currently face, working with some of the world’s leading academics who study crowdsourcing, such as our Chief Data Scientist Ravi Iyer and UC Irvine Cognitive Sciences professor Michael Lee.

Our data suggests you won’t be dropping gallons of iced water on your face over it. But if you’re a marketer or an advertiser, we predict it’s likely you will want to pay close attention.

by Ranker Staff in prediction

Ranker World Cup Predictions Outperform Betfair & FiveThirtyEight

Former England international player turned broadcaster Gary Lineker famously said “Football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans always win.” That proved true for the 2014 World Cup, with a late German goal securing a 1-0 win over Argentina.

Towards the end of March, we posted predictions for the final ordering of teams in the World Cup, based on Ranker’s re-ranks and voting data. During the tournament, we posted an update, including comparisons with predictions made by FiveThirtyEight and Betfair. With the dust settled in Brazil (and the fireworks in Berlin shelved), it is time to do a final evaluation.

Our prediction was a little different from many others, in that we tried to predict the entire final ordering of all 32 teams. This is different from sites like Betfair, which provided an ordering in terms of the predicted probability each team would be the overall winner. In order to assess our order against the true final result, we used a standard statistical measure called partial tau. It is basically an error measure — 0 would be a perfect prediction, and the larger the value grows the worse the prediction — based on how many “swaps” of a predicted order need to be made to arrive at the true order. The “partial” part of partial tau allows for the fact that the final result of the tournament is not a strict ordering. While the final and 3rd-place play-off determined the order of the first four teams (Germany, Argentina, the Netherlands, and Brazil), other groups of teams are effectively tied from then on.  All of the teams eliminated in the quarter-finals can be regarded as having finished in equal fifth place. All of the teams eliminated in the round of 16 finished equal sixth. And all 16 of the teams eliminated in group play finished equal last.
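A simplified version of the partial tau idea can be written as a count of predicted pairs that the true result contradicts, skipping pairs the truth leaves tied (illustrative only; the published measure may normalize differently):

```python
def partial_tau(predicted, true_rank):
    """Count discordant pairs between a predicted order (best first) and
    true finishing ranks that may contain ties. 0 = perfect prediction;
    pairs tied in the true result are skipped (the 'partial' part)."""
    errors = 0
    for i in range(len(predicted)):
        for j in range(i + 1, len(predicted)):
            a, b = predicted[i], predicted[j]
            if true_rank[a] > true_rank[b]:  # a predicted ahead, but finished behind b
                errors += 1
    return errors

# Toy example: the two quarter-final losers tie at rank 5.
true_rank = {"Germany": 1, "Argentina": 2, "Netherlands": 3, "Brazil": 4,
             "France": 5, "Belgium": 5}
predicted = ["Brazil", "Germany", "Argentina", "Netherlands", "France", "Belgium"]
# Predicting Brazil first when it finished fourth costs 3 swaps;
# the France/Belgium pair is tied in the truth and is ignored.
```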

The model we used to make our predictions involved three sources of information. The first was the ranks and re-ranks provided by users. The second was the up and down votes provided by users. The third was the bracket structure of the tournament itself. As we emphasized in our original post, the initial group stage structure of the World Cup provides strong constraints on where teams can and cannot finish in the final order. Thus, we were interested to test how our model predictions depended on each source of information. This led to a total of 8 separate models:

  • Random: Using no information, but just placing all 32 teams in a random order.
  • Bracket: Using no information beyond the bracket structure, placing all the teams in an order that was a possible finish, but treating each game as a coin toss.
  • Rank: Using just the ranking data.
  • Vote: Using just the voting data.
  • Rank+Vote: Using the ranking and voting data, but not the bracket structure.
  • Bracket+Vote: Using the voting data and bracket structure, but not the ranking data.
  • Bracket+Rank: Using the ranking data and bracket structure, but not the voting data.
  • Rank+Vote+Bracket: Using all of the information, as per the predictions made in our March blog post.

We also considered the Betfair and FiveThirtyEight rankings, as well as the Ranker Ultimate List at the start of the tournament, as interesting (but maybe slightly unfair, given their different goals) comparisons. The partial taus for all these predictions, with those based on less information on the left, and those based on more information on the right, are shown in the graph below. Remember, lower is better.

The prediction we made using the votes, ranks, and bracket structure out-performed Betfair, FiveThirtyEight, and the Ranker Ultimate List. This is almost certainly because of the use of the bracket information. Interestingly, just using the ranking and bracket structure information, but not the votes, resulted in a slightly better prediction. It seems as if our modeling needs to improve how it benefits from using both ranking and voting data. The Rank+Vote prediction was worse than either source alone. It is also interesting to note that the Bracket information by itself is not useful — it performs almost as poorly as a random order — but it is powerful when combined with people’s opinions, as the improvement from Rank to Bracket+Rank and from Vote to Bracket+Vote show.

by Ranker Staff in Data Science, Pop Culture, prediction

Comparing World Cup Prediction Algorithms – Ranker vs. FiveThirtyEight

Like most Americans, I pay attention to soccer/football once every four years.  But I think about prediction almost daily, so this year’s World Cup will be especially interesting to me, as I have a dog in this fight.  Specifically, UC-Irvine Professor Michael Lee put together a prediction model based on the combined wisdom of Ranker users who voted on our Who Will Win the 2014 World Cup list, plus the structure of the tournament itself.  The methodology runs in contrast to the FiveThirtyEight model, which uses entirely different data (national team results plus the results of players who will be playing for the national team in league play) to make predictions.  As such, the battle lines are clearly drawn.  Will the wisdom of crowds outperform algorithmic analyses based on match results?  Put another way, this is a test of whether human beings notice things that aren’t picked up in the box scores and statistics that form the core of FiveThirtyEight’s predictions or sabermetrics.

So who will I be rooting for?  Both methodologies agree that Brazil, Germany, Argentina, and Spain are the teams to beat.  But the crowds believe that those four teams are relatively evenly matched while the FiveThirtyEight statistical model puts Brazil as having a 45% chance to win.  After those first four, the models diverge quite a bit with the crowd picking the Netherlands, Italy, and Portugal amongst the next few (both models agree on Colombia), while the FiveThirtyEight model picks Chile, France, and Uruguay.  Accordingly, I’ll be rooting for the Netherlands, Italy, and Portugal and against Chile, France, and Uruguay.

In truth, the best model would combine the signal from both methodologies, similar to how the Netflix Prize was won or how baseball teams combine scouting and sabermetric opinions.  I’m pretty sure that Nate Silver would agree that his model would be improved by adding our data (or similar data from betting markets, which likewise think that FiveThirtyEight is underrating Italy and Portugal) and vice versa.  Still, even as I know that chance will play a big part in the outcome, I’m hoping Ranker data wins in this year’s World Cup.
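A back-of-the-envelope version of that combination is just a weighted blend of the two models’ championship probabilities. The weights and numbers below are invented for illustration; a real ensemble would fit the weights to past accuracy:

```python
def blend(p_crowd, p_model, w=0.5):
    """Linear blend of two forecasts' win probabilities, renormalized
    to sum to 1. The 50/50 weight is an arbitrary assumption."""
    raw = {t: w * p_crowd[t] + (1 - w) * p_model[t] for t in p_crowd}
    total = sum(raw.values())
    return {t: p / total for t, p in raw.items()}

# Invented probabilities echoing the post: the crowd sees the top four as
# roughly even, while the statistical model heavily favors Brazil.
crowd = {"Brazil": 0.25, "Germany": 0.25, "Argentina": 0.25, "Spain": 0.25}
model = {"Brazil": 0.45, "Germany": 0.20, "Argentina": 0.20, "Spain": 0.15}
combined = blend(crowd, model)
```

The blend pulls Brazil down from the model’s 45% toward the crowd’s 25%, which is the kind of moderation an ensemble of the two signals would produce.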

– Ravi Iyer

Ranker’s Pre-Tournament Predictions:

FiveThirtyEight’s Pre-Tournament Predictions:
