Lists Are the Best Way to Get Opinion Graph Data: Comparing Ranker to State & Squerb

I was recently forwarded an article about Squerb, which expresses an opinion we have long held.  Specifically:

“Most sites rely on simple heuristics like thumbs-up, ‘like’ or 1-5 stars,” stated Squerb founder and CEO Chris Biscoe. He added that while those tools offer a quick overview of opinion, they don’t offer much in the way of meaningful data.

It reminds me a bit of State, another company building an opinion graph that connects specific opinions to specific objects in the world.  They too are built on the idea that existing sources of big-data opinions, e.g. mining tweets and Facebook likes, have inherent limitations.  From this Wired UK article:

Doesn’t Twitter already provide a pretty good ‘opinion network’? Alex thinks not. “The opinions out there in the world today represent a very thin slice. Most people are not motivated to express their opinion and the opinions out there for the most part are very chaotic and siloed. 98 percent of people never get heard,” he told Wired.co.uk.

I think more and more people who try to parse Facebook and Twitter data for deeper, Netflix altgenre-like opinions will realize the limitations of such data and attempt to collect better opinion data.  In the end, I think collecting better opinion data will inevitably involve the list format that Ranker specializes in.  Lists have a few important advantages over the methods that Squerb and State are using, which center on slick interfaces for tagging semantic objects with adjectives.  The advantages of lists include:

  • Lists are popular and easily digestible.  There is a reason why every article on Cracked is a list.  Lists appeal to the masses, which is precisely the audience that Alex Asseily is trying to reach on State.  To collect mass opinions, one needs a site that appeals to the masses, which is why Ranker has focused on growth as a consumer destination site that currently collects millions of opinions.
  • Lists provide the context of other items.  It’s one thing to think that Army of Darkness is a good movie.  But how does it compare to other zombie movies?  Without context, it’s hard to compare people’s opinions, as we all have different thresholds for different adjectives.  The presence of other items lets people consider alternatives they may not have thought of in a vacuum, and it allows better interpretation of non-response.
  • Lists provide limits to what is being considered.  For example, consider the question of whether Tom Cruise is a good actor.  Is he one of the best actors of all time?  One of the best action stars?  One of the best actors working today?  Ranker data shows that people’s answers usually depend on the context (e.g. Tom Cruise gets a lot of downvotes as one of the best actors of all time, but is indeed considered one of the best action stars).
  • Lists are useful, especially in a mobile-friendly world, where short, scannable formats fit small screens.

In short, collecting opinions using lists produces both more data and better data.  I welcome companies that seek to collect semantic opinion data: the opportunity is large, and there are network effects such that each of our datasets becomes more valuable when other datasets with different biases are available for mashups.  As others realize the importance of opinion graphs, we will likely see more companies in this space, and my guess is that many of them will evolve along the path Ranker has taken, toward the list format.

– Ravi Iyer

How Netflix’s AltGenre Movie Grammar Illustrates the Future of Search Personalization

A few contacts recently sent me this Atlantic article on how Netflix reverse-engineered Hollywood, and it happens to mirror my long-term vision for how Ranker’s data fits into the future of search personalization.  Netflix’s goal, to put “the right title in front of the right person at the right time,” is very similar to what Apple, Bing, Google, and Facebook are attempting with personalized contextual search.  Rather than you having to type in “best kitchen gadgets for mothers”, applications like Google Now and Cue (bought by Apple) hope to eventually surface this information to you in real time, knowing not only when your mother’s birthday is, but also that you tend to buy kitchen gadgets for her, and knowing which of the best-rated kitchen gadgets aren’t too complex and are in your price range.  If the application were good enough, a lot of us would trust it to simply charge our credit card and send the right gift.  But obviously we are a long way from that reality.

Netflix’s altgenre movie grammar (e.g. Irreverent Werewolf Movies Of The 1960s) gives us a glimpse of the level of specificity that would be required to get us there.  Consider what you need to know to buy the right gift for your mom.  You aren’t just looking for a kitchen gadget, but one with specific attributes.  In altgenre terminology, you might be looking for “best simple, beautifully designed kitchen gadgets of 2014 that cost between $25 and $100” or “best kitchen gadgets for vegetarian technophobes”.  Google knows that simple text matching is not going to get it the level of precision necessary to provide such answers, which is why semantic search, where the precise meaning of pages is mapped, has become a strategic priority.
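One way to picture an altgenre is as a bundle of factual filters plus subjective tags, combined into a single, very specific query.  A minimal sketch in Python (all names and fields are hypothetical, not any actual Netflix or Google schema):

```python
from dataclasses import dataclass, field

@dataclass
class AltGenreQuery:
    """A hypothetical altgenre-style query: factual filters plus
    subjective attributes, combined into one specific request."""
    category: str
    factual_filters: dict = field(default_factory=dict)   # e.g. year, price
    subjective_tags: list = field(default_factory=list)   # e.g. "simple"

    def describe(self) -> str:
        tags = ", ".join(self.subjective_tags)
        facts = ", ".join(f"{k}={v}" for k, v in self.factual_filters.items())
        return f"best {tags} {self.category} ({facts})"

q = AltGenreQuery(
    category="kitchen gadgets",
    factual_filters={"year": 2014, "price": "$25-$100"},
    subjective_tags=["simple", "beautifully designed"],
)
print(q.describe())
# best simple, beautifully designed kitchen gadgets (year=2014, price=$25-$100)
```

The point of the sketch is that the factual facets and the subjective facets come from different kinds of data, which is exactly the division the next section draws.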

However, the universe of altgenre equivalents in the non-movie world is nearly endless (e.g. Netflix has thousands of ways just to classify movies), which is where Ranker comes in, as one of the world’s largest sources for collecting explicit cross-domain, altgenre-like opinions.  Semantic data from sources like Wikipedia, DBpedia, and Freebase can help you put together factual altgenres like “of the 60s” or “that starred Brad Pitt”, but you need opinion ratings to put together subtler data like “guilty pleasures” or “toughest movie badasses”.  Netflix’s success is proof of the power of this level of specificity in personalizing movies, and consider how they produced this knowledge: not by running machine learning algorithms on their endless stream of user behavior data, but rather by soliciting explicit ratings along these dimensions, paying “people to watch films and tag them with all kinds of metadata” using a “36-page training document that teaches them how to rate movies on their suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.”  Some people may think that with enough data, TripAdvisor should be able to tell you which cities are “cool”, but big data is not always better data.  Most data scientists will tell you the importance of defining the features in any recommendation task (see this article for technical detail), rather than assuming that a large amount of data will reveal all of the right dimensions.  The wrong level of abstraction can make prediction akin to trying to predict who will win the Super Bowl from the precise position and status of every cell in every player on every NFL team.  Netflix’s system lets them make predictions at the right level of abstraction.
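The division of labor described above, factual facets from semantic sources and subjective facets from opinion ratings, can be sketched as a two-stage filter-then-rank step.  The data below is invented for illustration; the field names are not Ranker’s or Freebase’s actual schema:

```python
# Toy illustration: factual facets narrow the candidate set,
# then opinion ratings order what is left.  All data is made up.
movies = [
    {"title": "Movie A", "decade": "1960s", "guilty_pleasure": 0.9},
    {"title": "Movie B", "decade": "1960s", "guilty_pleasure": 0.2},
    {"title": "Movie C", "decade": "1980s", "guilty_pleasure": 0.8},
]

def altgenre(items, decade, tag):
    # Stage 1: factual filter, the kind of facet Wikipedia/DBpedia can supply
    candidates = [m for m in items if m["decade"] == decade]
    # Stage 2: subjective ranking, the kind of facet only opinion data supplies
    return sorted(candidates, key=lambda m: m[tag], reverse=True)

top = altgenre(movies, "1960s", "guilty_pleasure")
# top: Movie A, then Movie B; Movie C is excluded by the factual filter
```

Neither stage works alone: without the opinion scores you can only filter, and without the factual facets you can only produce one undifferentiated ranking.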

The future of search needs a Netflix grammar that goes beyond movies.  It needs to be able to understand not only which movies are dark versus gritty, but also which cities are better babymoon destinations versus party cities, and which rock singers are great vocalists versus great frontmen.  Ranker lists actually have a similar grammar to Netflix movies, except that we apply this grammar beyond the movie domain.  In a subsequent post, I’ll go into more detail about this, but suffice it to say for now that I’m hopeful that our data will eventually play a similar role in the personalization of non-movie content to the one Netflix’s microtagging plays in film recommendations.

– Ravi Iyer

 

Why Topsy/Twitter Data May Never Predict What Matters to the Rest of Us

Recently Apple paid a reported $200 million for Topsy, and some speculate that the reason for this purchase is to improve recommendations for products consumed on Apple devices, leveraging the data that Topsy has from Twitter.  This makes perfect sense to me, but the utility of Twitter data in predicting what people want is easy to overstate, largely because people often confuse bigger data with better data.  There are at least two reasons why there is a fairly hard ceiling on how much Twitter data will ever allow one to predict about what regular people want.

1.  Sampling – Twitter has a ton of data, though only around 10% of people use it daily.  Sample size isn’t the issue here, as there is plenty of data; rather, the people who use Twitter are a very specific set of people.  Even if you correct for demographics, the psychographic of people who want to share their opinions publicly and regularly (far more people have heard of Twitter than actually use it) is too distinct to generalize to the average person, in the same way that surveys of landline users cannot be used to predict what psychographically distinct cellphone users think.

2. Domain Comprehensiveness – The opinions that people share on Twitter are biased by the medium, such that they do not represent the full spectrum of things people care about.  There are tons of opinions on entertainment, pop culture, and links that people want to promote, since these are easy to share quickly, but very little on people’s important life goals, the qualities we admire most in a person, or anything where opinions are likely to be more nuanced.  Even where we do have opinions in those domains, they are likely to be skewed by the 140-character limit.

Twitter (and by extension, companies that use its data, like Topsy and DataSift) has a treasure trove of information, but people working on next-generation recommendations and semantic search should realize that, given the above limitations, it is a small part of the overall puzzle.  The volume of information gives you a very precise measure of a very specific group of people’s opinions about very specific things, leaving out the vast majority of people’s opinions about the vast majority of things.  When you add in the bias introduced by analyzing 140-character natural language, a great deal of the variance in recommendations will likely have to be explained by other sources.

At Ranker, we have similar sampling issues, in that we collect much of our data at Ranker.com, but we are actively broadening our reach through our widget program, which now collects data on thousands of partner sites.  Our ranked-list methodology certainly has biases too, which we attempt to mitigate by combining voting and ranking data.  The key is not the volume of data, but rather the diversity of data, which helps mitigate the bias inherent in any particular sampling/data-collection method.
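As a sketch of what combining two differently biased signals might look like, here is a toy blend of a vote-based score and a rank-based score.  The weighting and normalization are illustrative only, not Ranker’s actual formula:

```python
def blended_score(upvotes, downvotes, avg_rank_position, list_size, w=0.5):
    """Blend a vote-based signal with a rank-based signal.
    Both are normalized to [0, 1]; w weights the vote signal.
    Illustrative only, not Ranker's actual method."""
    total = upvotes + downvotes
    vote_score = upvotes / total if total else 0.5
    # Rank 1 of N is best; convert the average position to a [0, 1] score.
    rank_score = 1 - (avg_rank_position - 1) / (list_size - 1)
    return w * vote_score + (1 - w) * rank_score

# An item upvoted 80% of the time that averages 3rd place on 11-item lists:
score = blended_score(upvotes=80, downvotes=20, avg_rank_position=3, list_size=11)
```

The value of the blend is not the particular formula but that each signal partially offsets the other’s collection bias, which is the diversity point above.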

Similarly, people using Twitter data would do well to consider issues of data diversity and not be blinded by large numbers of users and data points.  Certainly Twitter is bound to be a part of understanding consumer opinions, but the size of the dataset alone will not guarantee that it will be a central part.  Given these issues, either Twitter will start to diversify the ways that it collects consumer sentiment data or the best semantic search algorithms will eventually use Twitter data as but one narrowly targeted input of many.

– Ravi Iyer