by    in Data Science, Pop Culture, prediction

Ranker Predicts Spurs to beat Cavaliers for 2015 NBA Championship

The NBA Season starts tonight and building on the proven success of our World Cup and movie box office predictions, as well as the preliminary success of our NFL predictions, Ranker is happy to announce our 2015 NBA Championship Predictions, based upon the aggregated data from basketball fans who have weighed in on our NBA and basketball lists.

Ranker’s 2015 NBA Championship Predictions as Compared to ESPN and FiveThirtyEight

For comparison’s sake, I included the current ESPN power rankings as well as the teams that FiveThirtyEight gives the highest chance of winning the championship.  As with any sporting event, chance will play a large role in the outcome, but the premise of producing our predictions regularly is to validate our belief that the aggregated opinions of many will generally outperform expert opinions (ESPN) or models based on non-opinion data (e.g. player performance data plays a large role in FiveThirtyEight’s predictions).  Our ultimate goal is to prove the utility of crowdsourced data.  While NBA predictions are a crowded space where many people attempt to answer the same question, Ranker produces the world’s only significant data model for equally important questions, such as determining the world’s best DJs, everyone’s biggest turn-ons, or the best cheeses for a grilled cheese sandwich.

– Ravi Iyer

by    in Opinion Graph, Rankings

Characteristics of People who are less Afraid of Ebola

Ebola is everywhere in the news these days, even as Ebola trails other causes of death by wide margins.  Clearly the risks are great, so some amount of fear is certainly justified, but many have taken it to levels that do not make sense scientifically, making back-of-the-envelope projections for its spread based on anecdotal evidence and/or positing that it’s only a matter of time before the virus evolves into an airborne disease, as diseases regularly mutate to enable more killing in movies.  Regardless of whether Ebola warrants fear or outright panic, the consensus is that it is scary, as also evidenced by its clear #1 ranking on Ranker’s Scariest Diseases of All Time list.  Yet, among those who are fearful, I couldn’t help but wonder, what are the characteristics of people who tend to be less afraid than others?  Using the metadata associated with users who voted and reranked this list, in combination with their other activity on the site, here are a few things I found.

– Ebola fear appears to be slightly less prevalent in the Northeast, as compared to other regions of the US, and slightly more prevalent in the South.

– Older people tend to be slightly less afraid of Ebola, often expressing more fear of Alzheimer’s.

– International visitors to this list are half as likely to vote for Ebola, as compared to Americans.

– People who are afraid of Ebola are 4.4x as likely to be afraid of Dengue Fever.

– People who are afraid of Strokes, Parkinson’s Disease, Muscular Dystrophy, Influenza, and/or Depression are about half as likely to believe that Ebola is one of the world’s scariest diseases.

Bear in mind that these results are based on degree of fear and ALL people are afraid of Ebola.  The fear in some groups is simply less pronounced and only the last 3 results are statistically significant based on classical statistical methods.  There are plausible explanations for all of the above, ranging from the fact that conservative areas of the country are likely more responsive to potential threats, to the fact that losing one’s mind over time to Alzheimer’s really may be much scarier for older people versus a quick death, to the fact that people who are afraid of foreign diseases prevalent in tropical areas likely fear other foreign diseases prevalent in tropical areas.
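As a rough illustration of how a figure like "4.4x as likely" can be computed, here is a minimal sketch.  The counts are made up for the example and are not Ranker data, and the function names are hypothetical.  It also computes a Pearson chi-square statistic for a 2x2 table, which is one common classical test; the analysis above does not say which test was actually used.

```python
def relative_likelihood(a, b, c, d):
    """Ratio of P(fears B | fears A) to P(fears B | does not fear A).

    a: fears A and fears B      b: fears A but not B
    c: fears B but not A        d: fears neither
    """
    p_given_a = a / (a + b)
    p_given_not_a = c / (c + d)
    return p_given_a / p_given_not_a


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts, NOT Ranker data.
a, b, c, d = 44, 56, 10, 90
ratio = relative_likelihood(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
# With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05.
significant = chi2 > 3.84
print(f"{ratio:.1f}x as likely, chi-square = {chi2:.1f}, significant: {significant}")
```

With small groups the same ratio can easily fail this test, which is one reason only some of the patterns above reach significance.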

To me the most interesting fact is that people who are afraid of more common everyday diseases, including Influenza, which kills thousands every year, appear to be less afraid of Ebola than others.  Human beings are wired to be more afraid of the new and spectacular, as much psychological research has shown.  That fear kept many of our ancestors alive, so I wouldn’t dismiss it as wrong.  But it is interesting to observe that perhaps some of us are less wired in this way than others.

– Ravi Iyer

by    in Opinion Graph, Rankings

Ranky Goes to Washington?

Something pretty cool happened last week here at Ranker, and it had nothing to do with the season premiere of the “Big Bang Theory”, which we’re also really excited about. Cincinnati’s number one digital paper used our widget to create a votable list of ideas mentioned in Cincinnati Mayor John Cranley’s first State of the City. As of right now, 1,958 voters cast 5,586 votes on the list of proposals from Mayor Cranley (not surprisingly, “fixing streets” ranks higher than the “German-style beer garden” that’s apparently also an option).

Now, our widget is used by thousands of websites to either take one of our votable lists or create their own and embed it on their site, but this was the very first time Ranker was used to directly poll people on public policy initiatives.

Here’s why we’re loving this idea: we feel confident that Ranker lists are the most fun and reliable way to poll people at scale about a list of items within a specific context. That’s what we’ve been obsessing about for the past 6 years. But we also think this could lead to a whole new way for people to weigh in in fairly  large numbers on complex public policy issues on an ongoing basis, from municipal budgets to foreign policy. That’s because Ranker is very good at getting a large number of people to cast their opinion about complex issues in ways that can’t be achieved at this scale through regular polling methods (nobody’s going to call you at dinner time to ask you to rank 10 or 20 municipal budget items … and what is “dinner time” these days, anyway?).  It may not be a representative sample, but it may be the only sample that matters, given that the average citizen of Cincinnati will have no idea about the details within the Mayor’s speech and likely will give any opinion simply to move a phone survey conversation along about a topic they know little about.

Of course, the democratic process is the best way to get the best sample (there’s little bias when it’s the whole friggin voting population!) to weigh in on public policy as a whole. But elections are very expensive, infrequent, and the focus of their policy debates is the broadest possible relative to their geographical units, meaning that micro-issues like these will often get lost in the same tired partisan debates.

Meanwhile, society, technology, and the economy no longer operate on cycles consistent with election cycles: the rate and breadth of societal change is such that the public policy environment specific to an election quickly becomes obsolete, and new issues quickly need sorting out as they emerge, something our increasingly polarized legislative processes have a hard time doing.

Online polls are an imperfect, but necessary, way to evaluate public policy choices on an ongoing basis. Yes, they are susceptible to bias, but good statistical models can overcome a lot of such bias and in a world where the response rates for telephone polls continue to drop, there simply isn’t an alternative.  All polling is becoming a function of statistical modeling applied to imperfect datasets.  Offline polls are also expensive, and that cost is climbing as rapidly as response rates are dropping. A poll with a sample size of 800 can cost anywhere between $25,000 and $50,000 depending on the type of sample and the response rate.  Social media is, well, very approximate. As we’ve covered elsewhere in this blog, social media sentiment is noisy, biased, and overall very difficult to measure accurately.

In comes Ranker. The cost of that Cincinnati.com Ranker widget? $0. Its sample size? Nearly 2,000 people, or roughly 2 to 4 times the average sample size of current political polls. Ranker is also the best way to get people to quickly and efficiently express a meaningful opinion about a complex set of issues, and we have collected thousands of precise opinions about conceptually complex topics like the scariest diseases and the most important life goals by making the act of providing opinions entertaining, within a context that makes simple actions meaningful.

Politics is the art of the possible, and we shouldn’t let the impossibility of perfect survey precision preclude the possibility of using technology to improve civic engagement at scale.  If you are an organization seeking to poll public opinion about a particular set of issues that may work well in a list format, we’d invite you to contact us.

– Ravi Iyer

by    in prediction

Ranker Predicts Jacksonville Jaguars to have NFL’s worst record in 2014

Today is the start of the NFL season and building on our success in using crowdsourcing to predict the World Cup, we’d like to release our predictions for the upcoming NFL season.  Using data from our “Which NFL Team Will Have the Worst Record in 2014?” list, which was largely voted on by the community at WalterFootball.com (using a Ranker widget), we would predict the following order of finish, from worst to first.  Unfortunately for fans in Florida, the wisdom of crowds predicts that the Jacksonville Jaguars will finish last this year.

As a point of comparison, I’ll also include predictions from WalterFootball’s Walter Cherepinsky, ESPN (based on power rankings), and Betfair (based on betting odds for winning the Super Bowl).  Since we are attempting to predict the teams with the worst records in 2014, the worst teams are listed first and the best teams are listed last.

Ranker NFL Worst Team Predictions 2014

The value proposition of Ranker is that we believe that the combined judgments of many individuals are smarter than even the most informed individual experts.  Our predictions were based on over 27,000 votes from 2,900+ fans, taking into account both positive and negative sentiment by combining the raw magnitude of positive votes with the ratio of positive to negative votes.  As research on the wisdom of crowds predicts, the crowdsourced judgments from Ranker should outperform those from the experts.  Of course, there is a lot of luck and randomness that occurs throughout the NFL season, so our results, good or bad, should be taken with a grain of salt.  What is perhaps more interesting is the proposition that crowdsourced data can approximate the results of a betting market like Betfair, for the real value of Ranker data is in predicting things where there is no betting market (e.g. what content should Netflix pursue?).
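To make the idea of combining vote magnitude with vote ratio concrete, here is a minimal sketch.  The exact formula Ranker uses is not given here, so the one below (upvotes multiplied by the share of votes that are positive) is purely an assumption, as are the function names, team labels, and vote counts.

```python
def crowd_score(up_votes, down_votes):
    """Combine vote volume with the share of votes that are positive.

    Assumed formula (not Ranker's): up_votes * up_votes / (up_votes + down_votes).
    """
    total = up_votes + down_votes
    if total == 0:
        return 0.0
    return up_votes * (up_votes / total)


def rank_items(votes):
    """Return item names sorted from highest to lowest crowd score."""
    return sorted(votes, key=lambda name: crowd_score(*votes[name]), reverse=True)


# Hypothetical (up, down) vote counts for placeholder teams.
votes = {
    "Team A": (900, 100),
    "Team B": (1200, 1200),
    "Team C": (300, 20),
}
ranking = rank_items(votes)
print(ranking)
```

The point of a score like this is that a heavily voted but divisive item (Team B) does not automatically outrank an item with fewer but more consistently positive votes (Team A).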

Stay tuned til the end of the season for results.

– Ravi Iyer

by    in Data Science, Pop Culture, prediction

Comparing World Cup Prediction Algorithms – Ranker vs. FiveThirtyEight

Like most Americans, I pay attention to soccer/football once every four years.  But I think about prediction almost daily and so this year’s World Cup will be especially interesting to me as I have a dog in this fight.  Specifically, UC-Irvine Professor Michael Lee put together a prediction model based on the combined wisdom of Ranker users who voted on our Who will win the 2014 World Cup list, plus the structure of the tournament itself.  The methodology runs in contrast to the FiveThirtyEight model, which uses entirely different data (national team results plus the results of players who will be playing for the national team in league play) to make predictions.  As such, the battle lines are clearly drawn.  Will the Wisdom of Crowds outperform algorithmic analyses based on match results?  Or a better way of putting it might be that this is a test of whether human beings notice things that aren’t picked up in the box scores and statistics that form the core of FiveThirtyEight’s predictions or sabermetrics.

So who will I be rooting for?  Both methodologies agree that Brazil, Germany, Argentina, and Spain are the teams to beat.  But the crowds believe that those four teams are relatively evenly matched while the FiveThirtyEight statistical model puts Brazil as having a 45% chance to win.  After those first four, the models diverge quite a bit with the crowd picking the Netherlands, Italy, and Portugal amongst the next few (both models agree on Colombia), while the FiveThirtyEight model picks Chile, France, and Uruguay.  Accordingly, I’ll be rooting for the Netherlands, Italy, and Portugal and against Chile, France, and Uruguay.

In truth, the best model would combine the signal from both methodologies, similar to how the Netflix prize was won or how baseball teams combine scout and sabermetric opinions.  I’m pretty sure that Nate Silver would agree that his model would be improved by adding our data (or similar data from betting markets like Betfair, which similarly thought that FiveThirtyEight was underrating Italy and Portugal) and vice versa.  Still, even as I know that chance will play a big part in the outcome, I’m hoping Ranker data wins in this year’s World Cup.
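A minimal sketch of what combining two forecasts could look like is a weighted average of their win probabilities.  Everything here is an assumption for illustration: the `blend` function is hypothetical, the field is reduced to four teams, and the probabilities are placeholders (only the 45% for Brazil comes from the FiveThirtyEight model discussed above).

```python
def blend(crowd, model, crowd_weight=0.5):
    """Weighted average of two probability forecasts, renormalized to sum to 1."""
    teams = set(crowd) | set(model)
    mixed = {
        team: crowd_weight * crowd.get(team, 0.0)
        + (1.0 - crowd_weight) * model.get(team, 0.0)
        for team in teams
    }
    total = sum(mixed.values())
    return {team: p / total for team, p in mixed.items()}


# Placeholder win probabilities for a four-team field.
crowd = {"Brazil": 0.25, "Germany": 0.25, "Argentina": 0.25, "Spain": 0.25}
model = {"Brazil": 0.45, "Germany": 0.20, "Argentina": 0.20, "Spain": 0.15}
combined = blend(crowd, model)
for team, p in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {p:.3f}")
```

In practice the weight would be tuned on past tournaments rather than fixed at 0.5.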

– Ravi Iyer

Ranker’s Pre-Tournament Predictions:

FiveThirtyEight’s Pre-Tournament Predictions:

by    in Game of Thrones

Gender and the Moral Psychology of Game of Thrones

Most of my published academic work is in the field of moral psychology, where we study the moral reasoning behind judgments of right and wrong.  As I have previously argued, such study does not belong solely in the realm of university psychology labs, but also should be extended to the realm of “big data”, where online behavior is examined for convergence with what we see in the lab.  Ranker collects millions of user opinions each month on all sorts of topics, and one of them, where users rank the most uncomfortable moments in Game of Thrones, is actually very similar to psychology studies where we ask participants to rate the rightness or wrongness of various situations.

Amongst the situations to be voted on are:

  • Graphic Violence (Khaleesi Eats a Horse Heart, Execution of Eddard Stark)
  • Incest (Lannister Family Values, Theon Makes a Pass at Sister)
  • Sexual Violence (Daenerys And Viserys, Jaime Rapes His Sister)
  • Homosexuality (Loras and Renly Shave and Scheme)

Men and women were equally likely to vote on items on this list (each gender averaged six votes per user), but women were twice as likely as men to be affected by sexual violence toward women, including Viserys’ lewd treatment of his sister Daenerys or The Red Wedding, which included the stabbing of a pregnant woman.  In contrast, men were made most uncomfortable by hints of homosexuality (Loras and Renly shaving each other’s chests), being seven times more likely to find this scene uncomfortable.  These patterns are convergent with research on mirror neurons, which indicate that people are most likely to be made uncomfortable by situations that threaten their self-identity, as well as accounts of women being driven to stop watching the show, due to the prevalence of depictions of violence against women.

Other patterns on this list also converged with previous research.  Americans, who may be less sensitive to violence due to its prevalence in American culture, were less affected by scenes such as the execution of Eddard Stark and Khaleesi eating a horse heart.  Southerners, who are more likely to be sensitive to purity concerns, were more affected by Petyr Baelish and Lord Varys’ discussion of perversity.

None of these findings are carefully controlled trials, so these patterns could have many explanations.  However, all research methods have flaws, and I would argue that it is the convergence of real world behavior with academic research that leads to true understanding.  Given Ranker’s new emphasis on Game of Thrones related content (like our Ranker of Thrones Facebook page if you’re a fan), more analyses of the repeated moral ambiguity in Game of Thrones are forthcoming and I would welcome new hypotheses to test.  What would you expect men/women to agree or disagree on?  Older vs. Younger fans?  West coasters vs. East Coasters?

– Ravi Iyer

Can Colbert bring young Breaking Bad Fans to The Late Show?

I have to admit that I thought it was a joke at first when I heard the news that Stephen Colbert is leaving The Colbert Report and is going to host the Late Show, currently hosted by David Letterman.  The fact that he won’t be “in character” in the new show makes it more intriguing, even as it brings tremendous change to my entertainment universe.  However, while it will take some getting used to, looking at Ranker data on the two shows reveals how the change really does make sense for CBS.

Despite the ire of those who disagree with The Colbert Report’s politics, CBS is definitely addressing a need to compete better for younger viewers, who are less likely to watch TV on the major networks.  Ranker users tend to be in the 18-35 year old age bracket and The Colbert Report ranks higher than the Late Show on most every list that they both are on including the Funniest TV shows of 2012 (19 vs. 28), Best TV Shows of All-Time (186 vs. 197), and Best TV Shows of Recent Memory (37 vs. 166).  Further, people who tend to like The Colbert Report also seem to like many of the most popular shows around like Breaking Bad, Mad Men, Game of Thrones, and 30 Rock.  In contrast, correlates of the Late Show include older shows like The Sopranos and 60 Minutes.  There is some overlap as fans of both shows like The West Wing and The Daily Show, indicating that Colbert may be able to appeal to current fans as well as new audiences.

Colbert Can Expand Late Show's Audience to New Groups, yet Retain Many Current Fans.

I’ll be sad to see “Stephen Colbert” the character go.  But it looks like my loss is CBS’ gain.

– Ravi Iyer

Lists are the Best way to get Opinion Graph Data: Comparing Ranker to State & Squerb

I was recently forwarded an article about Squerb, which shares an opinion we have long agreed with.  Specifically…

“Most sites rely on simple heuristics like thumbs-up, ‘like’ or 1-5 stars,” stated Squerb founder and CEO Chris Biscoe. He added that while those tools offer a quick overview of opinion, they don’t offer much in the way of meaningful data.

It reminds me a bit of State, another company building an opinion graph that connects more specific opinions to specific objects in the world.  They too are built upon the idea that existing sources of big data opinions, e.g. mining tweets and facebook likes, have inherent limitations.  From this Wired UK article:

Doesn’t Twitter already provide a pretty good ‘opinion network’? Alex thinks not. “The opinions out there in the world today represent a very thin slice. Most people are not motivated to express their opinion and the opinions out there for the most part are very chaotic and siloed. 98 percent of people never get heard,” he told Wired.co.uk.

I think more and more people who try to parse Facebook and Twitter data for deeper Netflix AltGenre-like opinions will realize the limitations of such data, and attempt to collect better opinion data.  In the end, I think collecting better opinion data will inevitably involve the list format that Ranker specializes in.  Lists have a few important advantages over the methods that Squerb and State are using, which include slick interfaces for tagging semantic objects with adjectives.  The advantages of lists include:

  • Lists are popular and easily digestible.  There is a reason why every article on Cracked is a list.  Lists appeal to the masses, which is precisely the audience that Alex Asseily is trying to reach on State.  To collect mass opinions, one needs a site that appeals to the masses, which is why Ranker has focused on growth as a consumer destination site that currently collects millions of opinions.
  • Lists provide the context of other items.  It’s one thing to think that Army of Darkness is a good movie.  But how does it compare to other Zombie Movies?  Without context, it’s hard to compare people’s opinions as we all have different thresholds for different adjectives.  The presence of other items lets people consider alternatives they may not have considered in a vacuum and allows better interpretation of non-response.
  • Lists provide limits to what is being considered.  For example, consider the question of whether Tom Cruise is a good actor.  Is he one of the Best Actors of All-time?  One of the Best Action Stars?  One of the Best Actors Working Today?  Ranker data shows that people’s answers usually depend on the context (e.g. Tom Cruise gets a lot of downvotes as one of the best actors of all-time, but is indeed considered one of the best action stars.)
  • Lists are useful, especially in a mobile friendly world.

In short, collecting opinions using lists produces both more data and better data.  I welcome companies that seek to collect semantic opinion data as the opportunity is large and there are network effects such that each of our datasets is more valuable when other datasets with different biases are available for mashups.  As others realize the importance of opinion graphs, we likely will see more companies in this space and my guess is that many of these companies will evolve along the path that Ranker has taken, toward the list format.

– Ravi Iyer

by    in About Ranker, Opinion Graph, Pop Culture, Rankings

Ranker’s Rankings API Now in Beta

Increasingly, people are looking for specific answers to questions as opposed to webpages that happen to match the text they type into a search engine.  For example, if you search for the capital of France or the birthdate of Leonardo Da Vinci, you get a specific answer.  However, the questions that people ask are increasingly about opinions, not facts, as people are understandably more interested in what the best movie of 2013 was, as opposed to who the producer for Star Trek: Into Darkness was.

Enter Ranker’s Rankings API, which is now in beta, as we’d love the input of potential users of our API to help improve it.  Our API returns aggregated opinions about specific movies, people, tv shows, places, etc.  As an input, we can take a Wikipedia, Freebase, or Ranker ID.  For example, below is a request for information about Tom Cruise, using his Ranker ID from his Ranker page (contact us if you want to use other IDs to access).
http://api.ranker.com/rankings/?ids=2257588&type=RANKER

In the response to this request, you’ll get a set of Rankings for the requested object, including a set of list names (e.g. "listName":"The Greatest 80s Teen Stars"), list urls (e.g. "listUrl":"http://www.ranker.com/crowdranked-list/45-greatest-80_s-teen-stars" – note that the domain, www.ranker.com, is implied), item names (e.g. "itemName":"Tom Cruise"), the position of the item on this list (e.g. "position":21), the number of items on the list (e.g. "numItemsOnList":70), the number of people who have voted on this list (e.g. "numVoters":1149), the number of positive votes for this item (e.g. "numUpVotes":245) vs. the number of negative votes (e.g. "numDownVotes":169), and the Ranker list id (e.g. "listId":584305).  Note that results are cached so they may not match the current page exactly.

Here is a snippet of the response for Tom Cruise.

[
  {
    "itemName": "Tom Cruise",
    "listId": 346881,
    "listName": "The Greatest Film Actors & Actresses of All Time",
    "listUrl": "http://www.ranker.com/crowdranked-list/the-greatest-film-actors-and-actresses-of-all-time",
    "numDownVotes": 306,
    "numItemsOnList": 524,
    "numUpVotes": 285,
    "numVoters": 5305,
    "position": 85
  },
  {
    "itemName": "Tom Cruise",
    "listId": 542455,
    "listName": "The Hottest Male Celebrities",
    "listUrl": "http://www.ranker.com/crowdranked-list/hottest-male-celebrities",
    "numDownVotes": 175,
    "numItemsOnList": 171,
    "numUpVotes": 86,
    "numVoters": 1937,
    "position": 63
  },
  {
    "itemName": "Tom Cruise",
    "listId": 679173,
    "listName": "The Best Actors in Film History",
    "listUrl": "http://www.ranker.com/crowdranked-list/best-actors",
    "numDownVotes": 151,
    "numItemsOnList": 272,
    "numUpVotes": 124,
    "numVoters": 1507,
    "position": 102
  }

  …CLIPPED….
]

What can you do with this API?  Consider this page about Tom Cruise from Google’s Knowledge Graph.  It tells you his children, his spouse(s), and his movies.  But our API will tell you that he is one of the hottest male celebrities, an annoying A-List actor, an action star, a short actor, and an 80s teen star.  His name comes up in discussions of great actors, but he tends to get more downvotes than upvotes on such lists, and even shows up on lists of “overrated” actors.
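As a starting point, here is a minimal Python sketch of building the request shown earlier and comparing upvotes to downvotes per list.  The endpoint and the `ids` and `type` parameters are taken from the example URL; the function names are hypothetical, and `fetch_rankings` assumes the beta endpoint is reachable and returns a JSON array, which may not hold as the API changes.  The two sample records reuse values from the response snippet, so the example runs without a network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.ranker.com/rankings/"  # endpoint taken from the example request


def build_rankings_url(ranker_id):
    """Build the request URL for a single Ranker ID."""
    return BASE_URL + "?" + urlencode({"ids": ranker_id, "type": "RANKER"})


def fetch_rankings(ranker_id, timeout=10):
    """Network call; assumes the endpoint is live and returns a JSON array."""
    with urlopen(build_rankings_url(ranker_id), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def net_votes(rankings):
    """Map each list name to upvotes minus downvotes for the item."""
    return {r["listName"]: r["numUpVotes"] - r["numDownVotes"] for r in rankings}


# Two records with values copied from the response snippet (other fields omitted).
sample = [
    {
        "listName": "The Greatest Film Actors & Actresses of All Time",
        "numUpVotes": 285,
        "numDownVotes": 306,
    },
    {
        "listName": "The Hottest Male Celebrities",
        "numUpVotes": 86,
        "numDownVotes": 175,
    },
]

url = build_rankings_url(2257588)
scores = net_votes(sample)
more_down_than_up = [name for name, net in scores.items() if net < 0]
print(url)
print(more_down_than_up)
```

To run it against live data, replace `sample` with the result of `fetch_rankings(2257588)`.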

We can provide this information, not just about actors, but also about politicians, books, places, movies, tv shows, bands, athletes, colleges, brands, food, beer, and more.  We will tend to have more information about entertainment related categories, for now, but as the domains of our lists grow, so too will the breadth of opinion related information available from our API.

Our API is free and no registration is required, though we would request that you provide links and attributions to the Ranker lists that provide this data.  We likely will add some free registration at some point.  There are currently no formal rate limits, though there are obviously practical limits so please contact us if you plan to use the API heavily as we may need to make changes to accommodate such usage.  Please do let me know (ravi a t ranker) your experiences with our API and any suggestions for improvements as we are definitely looking to improve upon our beta offering.

– Ravi Iyer

How Netflix’s AltGenre Movie Grammar Illustrates the Future of Search Personalization

I recently got sent this Atlantic article on how Netflix reverse engineered Hollywood by a few contacts, and it happens to mirror my long term vision for how Ranker’s data fits into the future of search personalization.  Netflix’s goal, to put “the right title in front of the right person at the right time,” is very similar to what Apple, Bing, Google, and Facebook are attempting to do with regards to personalized contextual search.  Rather than you having to type in “best kitchen gadgets for mothers”, applications like Google Now and Cue (bought by Apple) hope to eventually be able to surface this information to you in real time, knowing not only when your mother’s birthday is, but also that you tend to buy kitchen gadgets for her, and knowing what the best rated kitchen gadgets that aren’t too complex and are in your price range happen to be.  If the application was good enough, a lot of us would trust it to simply charge our credit card and send the right gift.  But obviously we are a long way from that reality.

Netflix’s altgenre movie grammar (e.g. Irreverent Werewolf Movies Of The 1960s) gives us a glimpse of the level of specificity that would be required to get us there.  Consider what you need to know to buy the right gift for your mom.  You aren’t just looking for a kitchen gadget, but one with specific attributes.  In altgenre terminology, you might be looking for “best simple, beautifully designed kitchen gadgets of 2014 that cost between $25 and $100” or “best kitchen gadgets for vegetarian technophobes”.  Google knows that simple text matching is not going to get it the level of precision necessary to provide such answers, which is why semantic search, where the precise meaning of pages is mapped, has become a strategic priority.

However, the universe of altgenre equivalents in the non-movie world is nearly endless (e.g. Netflix has thousands of ways just to classify movies), which is where Ranker comes in, as one of the world’s largest sources for collecting explicit cross-domain altgenre-like opinions.  Semantic data from sources like Wikipedia, DBpedia, and Freebase can help you put together factual altgenres like “of the 60s” or “that starred Brad Pitt”, but you need opinion ratings to put together subtler data like “guilty pleasures” or “toughest movie badasses”.  Netflix’s success is proof of the power of this level of specificity in personalizing movies, and it is worth considering how they produced this knowledge: not by running machine learning algorithms on their endless stream of user behavior data, but rather by soliciting explicit ratings along these dimensions by paying “people to watch films and tag them with all kinds of metadata” using a “36-page training document that teaches them how to rate movies on their suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.”  Some people may think that with enough data, TripAdvisor should be able to tell you which cities are “cool”, but big data is not always better data.  Most data scientists will tell you the importance of defining the features in any recommendation task (see this article for technical detail on this), rather than assuming that a large amount of data will reveal all of the right dimensions.  The wrong level of abstraction can make prediction akin to trying to predict who will win the Super Bowl by knowing the precise position and status of every cell in every player on every NFL team.  Netflix’s system allows them to make predictions at the right level of abstraction.

The future of search needs a Netflix grammar that goes beyond movies.  It needs to be able to understand not only which movies are dark versus gritty, but also which cities are better babymoon destinations versus party cities and which rock singers are great vocalists versus great frontmen.  Ranker lists actually have a similar grammar to Netflix movies, except that we apply this grammar beyond the movie domain.  In a subsequent post, I’ll go into more detail about this, but suffice it to say for now that I’m hopeful that our data will eventually play a similar role in the personalization of non-movie content that Netflix’s microtagging plays in film recommendations.

– Ravi Iyer

 
