Posted in About Ranker, Opinion Graph, Pop Culture, Rankings

Ranker’s Rankings API Now in Beta

Increasingly, people are looking for specific answers to questions as opposed to webpages that happen to match the text they type into a search engine.  For example, if you search for the capital of France or the birthdate of Leonardo Da Vinci, you get a specific answer.  However, the questions that people ask are increasingly about opinions, not facts, as people are understandably more interested in what the best movie of 2013 was, as opposed to who the producer for Star Trek: Into Darkness was.

Enter Ranker’s Rankings API, which is now in beta, as we’d love the input of potential users of our API to help improve it.  Our API returns aggregated opinions about specific movies, people, TV shows, places, etc.  As input, we can take a Wikipedia, Freebase, or Ranker ID.  For example, below is a request for information about Tom Cruise, using his Ranker ID from his Ranker page (contact us if you want to use other IDs for access).
http://api.ranker.com/rankings/?ids=2257588&type=RANKER

In the response to this request, you’ll get a set of rankings for the requested object, including a set of list names (e.g. "listName":"The Greatest 80s Teen Stars"), list URLs (e.g. "listUrl":"http://www.ranker.com/crowdranked-list/45-greatest-80_s-teen-stars" – note that the domain, www.ranker.com, is implied), item names (e.g. "itemName":"Tom Cruise"), the position of the item on the list (e.g. "position":21), the number of items on the list (e.g. "numItemsOnList":70), the number of people who have voted on the list (e.g. "numVoters":1149), the number of positive votes for the item (e.g. "numUpVotes":245) vs. the number of negative votes (e.g. "numDownVotes":169), and the Ranker list ID (e.g. "listId":584305).  Note that results are cached, so they may not match the current page exactly.

Here is a snippet of the response for Tom Cruise.

[ { "itemName" : "Tom Cruise",
    "listId" : 346881,
    "listName" : "The Greatest Film Actors & Actresses of All Time",
    "listUrl" : "http://www.ranker.com/crowdranked-list/the-greatest-film-actors-and-actresses-of-all-time",
    "numDownVotes" : 306,
    "numItemsOnList" : 524,
    "numUpVotes" : 285,
    "numVoters" : 5305,
    "position" : 85
  },
  { "itemName" : "Tom Cruise",
    "listId" : 542455,
    "listName" : "The Hottest Male Celebrities",
    "listUrl" : "http://www.ranker.com/crowdranked-list/hottest-male-celebrities",
    "numDownVotes" : 175,
    "numItemsOnList" : 171,
    "numUpVotes" : 86,
    "numVoters" : 1937,
    "position" : 63
  },
  { "itemName" : "Tom Cruise",
    "listId" : 679173,
    "listName" : "The Best Actors in Film History",
    "listUrl" : "http://www.ranker.com/crowdranked-list/best-actors",
    "numDownVotes" : 151,
    "numItemsOnList" : 272,
    "numUpVotes" : 124,
    "numVoters" : 1507,
    "position" : 102
  }

…CLIPPED….
]
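To make the response format concrete, here is a minimal Python sketch of how a client might consume it: parse the JSON array and compute an approval ratio (upvotes over total votes) for each list an item appears on. The field names come from the sample response above; the inline sample and the `approval_ratios` helper are illustrative, not part of the API.

```python
import json

# A response in the shape documented above (normally fetched from
# http://api.ranker.com/rankings/?ids=2257588&type=RANKER).
response_text = '''
[{"itemName": "Tom Cruise",
  "listId": 346881,
  "listName": "The Greatest Film Actors & Actresses of All Time",
  "numDownVotes": 306, "numItemsOnList": 524,
  "numUpVotes": 285, "numVoters": 5305, "position": 85}]
'''

def approval_ratios(rankings):
    """Return (listName, position, approval) tuples, best-received lists first."""
    results = []
    for r in rankings:
        total = r["numUpVotes"] + r["numDownVotes"]
        approval = r["numUpVotes"] / total if total else None
        results.append((r["listName"], r["position"], approval))
    return sorted(results, key=lambda t: t[2] or 0, reverse=True)

rankings = json.loads(response_text)
for name, pos, approval in approval_ratios(rankings):
    print(f"#{pos} on {name}: {approval:.0%} approval")
```

In a live client you would fetch the same JSON from the endpoint shown earlier (e.g. with urllib) rather than embedding it as a string.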

What can you do with this API?  Consider this page about Tom Cruise from Google’s Knowledge Graph.  It tells you his children, his spouse(s), and his movies.  But our API will tell you that he is one of the hottest male celebrities, an annoying A-List actor, an action star, a short actor, and an 80s teen star.  His name comes up in discussions of great actors, but he tends to get more downvotes than upvotes on such lists, and even shows up on lists of “overrated” actors.

We can provide this information not just about actors, but also about politicians, books, places, movies, TV shows, bands, athletes, colleges, brands, food, beer, and more.  We will tend to have more information about entertainment-related categories for now, but as the domains of our lists grow, so too will the breadth of opinion-related information available from our API.

Our API is free and no registration is required, though we would ask that you provide links and attribution to the Ranker lists that provide this data.  We will likely add some free registration at some point.  There are currently no formal rate limits, though there are obviously practical limits, so please contact us if you plan to use the API heavily, as we may need to make changes to accommodate such usage.  Please do let me know (ravi a t ranker) about your experiences with our API and any suggestions for improvement, as we are definitely looking to improve upon our beta offering.

– Ravi Iyer

Posted in Rankings

Rankings are the Future of Mobile Search

Did you know that Ranker is one of the top 100 web destinations for mobile per Quantcast, ahead of household names like The Onion and People magazine?  We are ranked #520 in the non-mobile world.  Why do we do better with mobile users than with people using a desktop computer?  I’ve made this argument for a while, but I’m hardly an authority, so I was heartened to see Google making a similar argument.

This embrace of mobile computing impacts search behavior in a number of important ways.

First, it makes the process of refining search queries much more tiresome. …While refining queries is never a great user experience, on a mobile device (and particularly on a mobile phone) it is especially onerous.  This has provided the search engines with a compelling incentive to ensure that the right search results are delivered to users on the first go, freeing them of laborious refinements.

Second, the process of navigating to web pages (is) a royal pain on a hand-held mobile device.

This situation provides a compelling incentive for the search engines to circumvent additional web page visits altogether, and instead present answers to queries – especially straightforward informational queries – directly in the search results.  While many in the search marketing field have suggested that the search engines have increasingly introduced direct answers in the search results to rob publishers of clicks, there’s more than a trivial case to be made that this is in the best interest of mobile users.  Is it really a good thing to compel an iPhone user to browse to a web page – which may or may not be optimized for mobile – and wait for it to load in order to learn the height of the Eiffel Tower?

As a result, if you ask your mobile phone for the height of a famous building (Taipei 101 in the below case), it doesn’t direct you to a web page.  Instead it answers the question itself.

That’s great for a question that has a single answer, but an increasing number of searches are not for objective facts with a single answer, but rather for subjective opinions where a ranked list is the best result.  Consider the below chart showing the increase in searches for the term “best”.  A similar pattern can be found for most any adjective.

So if consumers are increasingly doing searches on mobile phones, requiring a concise list of potential answers to questions with more than one answer, they naturally are going to end up at sites which have ranked lists…like Ranker.  As such, a lot of Ranker’s future growth is likely to parallel the growth of mobile and the growth of searches for opinion-based questions.

– Ravi Iyer

Ranker Uses Big Data to Rank the World’s 25 Best Film Schools

NYU, USC, UCLA, Yale, Juilliard, Columbia, and Harvard top the rankings.

Does USC or NYU have a better film school?  “Big data” can provide an answer to this question by linking data about movies and the actors, directors, and producers who have worked on specific movies, to data about universities and the graduates of those universities.  As such, one can use semantic data from sources like Freebase, DBPedia, and IMDB to figure out which schools have produced the most working graduates.  However, what if you cared about the quality of the movies they worked on rather than just the quantity?  Educating a student who went on to work on The Godfather must certainly be worth more than producing a student who received a credit on Gigli.

Leveraging opinion data from Ranker’s Best Movies of All-Time list in addition to widely available semantic data, Ranker recently produced a ranked list of the world’s 25 best film schools, based on credits on movies within the top 500 movies of all time.  USC produces the most film credits by graduates overall, but when film quality is taken into account, NYU (208 credits) actually produces more credits among the top 500 movies of all time than USC (186 credits).  UCLA, Yale, Juilliard, Columbia, and Harvard take places 3 through 7 on Ranker’s list.  Several professional schools that focus on the arts also place in the top 25 (e.g. London’s Royal Academy of Dramatic Art), as do some well-located high schools (New York’s Fiorello H. LaGuardia High School & Beverly Hills High School).
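The tallying behind a ranking like this can be sketched in a few lines. The example below uses hypothetical in-memory stand-ins for the joined Freebase/IMDB/Ranker data (the real pipeline works over a triple store); the idea is simply to filter each person’s credits to films on the top-500 list and attribute the surviving credits to the schools they attended.

```python
from collections import Counter

# Hypothetical joined data, standing in for Freebase/IMDB/Ranker queries.
schools_of = {  # person -> schools attended
    "Martin Scorsese": ["New York University"],
    "Ron Howard": ["University of Southern California"],
}
credits = [  # (person, film) credit pairs
    ("Martin Scorsese", "Goodfellas"),
    ("Martin Scorsese", "Taxi Driver"),
    ("Martin Scorsese", "Obscure Short"),   # not on the top-500 list
    ("Ron Howard", "Apollo 13"),
]
top_500 = {"Goodfellas", "Taxi Driver", "Apollo 13"}  # Best Movies of All-Time

def school_credit_counts(credits, schools_of, top_films):
    """Count credits on top-ranked films, attributed to each school."""
    counts = Counter()
    for person, film in credits:
        if film in top_films:  # quality filter: only top-500 films count
            for school in schools_of.get(person, []):
                counts[school] += 1
    return counts

print(school_credit_counts(credits, schools_of, top_500).most_common())
```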

The World’s Top 25 Film Schools

  1. New York University (208 credits)
  2. University of Southern California (186 credits)
  3. University of California – Los Angeles (165 credits)
  4. Yale University (110 credits)
  5. Juilliard School (106 credits)
  6. Columbia University (100 credits)
  7. Harvard University (90 credits)
  8. Royal Academy of Dramatic Art (86 credits)
  9. Fiorello H. LaGuardia High School of Music & Art (64 credits)
  10. American Academy of Dramatic Arts (51 credits)
  11. London Academy of Music and Dramatic Art (51 credits)
  12. Stanford University (50 credits)
  13. HB Studio (49 credits)
  14. Northwestern University (47 credits)
  15. The Actors Studio (44 credits)
  16. Brown University (43 credits)
  17. University of Texas – Austin (40 credits)
  18. Central School of Speech and Drama (39 credits)
  19. Cornell University (39 credits)
  20. Guildhall School of Music and Drama (38 credits)
  21. University of California – Berkeley (38 credits)
  22. California Institute of the Arts (38 credits)
  23. University of Michigan (37 credits)
  24. Beverly Hills High School (36 credits)
  25. Boston University (35 credits)

“Clearly, there is a huge effect of geography, as prominent New York- and Los Angeles-based high schools appear to produce more graduates who work on quality films than many colleges and universities,” says Ravi Iyer, Ranker’s Principal Data Scientist, a graduate of the University of Southern California.

Ranker is able to combine factual semantic data with an opinion layer because Ranker is powered by a Virtuoso triple store with over 700 million triples of information that are processed into an entertaining list format for users on Ranker’s consumer-facing website, Ranker.com.  Each month, over 7 million unique users interact with this data – ranking, listing, and voting on various objects – effectively adding a layer of opinion data on top of the factual data from Ranker’s triple store.  The result is a continually growing opinion graph that connects factual and opinion data.  As of January 2013, Ranker’s opinion graph included over 30,000 nodes with over 5 million edges connecting these nodes.
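A toy illustration of that opinion layer (this is not Ranker’s actual schema): factual and opinion statements live side by side in the same store as subject-predicate-object triples, so a single pattern query can pull both kinds of knowledge about an entity. The predicates and values below are made up for the example.

```python
# Toy triple store mixing factual and opinion triples (hypothetical schema,
# not Ranker's actual Virtuoso data model).
triples = [
    ("Tom Cruise", "starred_in", "Top Gun"),                  # factual
    ("Top Gun", "released_in", "1986"),                       # factual
    ("Tom Cruise", "ranked_on", "Greatest 80s Teen Stars"),   # opinion
    ("Tom Cruise", "approval_on_list", "0.48"),               # opinion
]

def query(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# One query spans facts and opinions: what do we know about Tom Cruise?
for t in query(triples, s="Tom Cruise"):
    print(t)
```

A production system would express the same pattern as a SPARQL query against the triple store; the wildcard-match shape is the same.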

– Ravi Iyer

Posted in Data Science, Google Knowledge Graph

How Ranker leverages Google’s Knowledge Graph

Google recently held their I/O conference, and one of the talks was given by Freebase’s Shawn Simister, who was once Freebase’s biggest fan and has since gone on to work at Google, which acquired Freebase a few years ago.  What is Freebase?  It’s the structured semantic data that powers Google’s Knowledge Graph and Ranker, along with many other organizations featured in this talk (Ranker is mentioned around the 8:45 mark).  This talk gives organizations that may not be familiar with Freebase an overview of how they can leverage Freebase’s semantic data.

How does Ranker use the knowledge graph?  Freebase’s semantic data powers much of what we do at Ranker and the below graph illustrates how we relate to the semantic web.

How Ranker Relates to the Semantic Web

We leverage the data from the semantic web, often via Freebase, to create content in list format (e.g. The Best Beatles Songs), which our users then vote on and re-rank.  This creates an opinion data layer that is easily exportable to any other entity (e.g. The New York Times or Netflix) that is connected to the larger semantic web.  Our hope is that just as people in the presentation are beginning to create mashups of factual data, eventually people will also want to merge in opinion data, and we hope to have the best semantic opinion dataset out there when that happens.  The more people that connect their data to the semantic web, the more lists we can create, and the more potential consumers exist for our opinion data.  As such, we’d encourage you to check out Shawn’s presentation and hopefully you’ll find Freebase as useful as we do.

– Ravi Iyer


Posted in Data Science

Siri (and other mobile interfaces) will eventually need semantic opinion data

Search engines, which process text and give you a menu of potential matches, make sense when you use an interface with a keyboard, a mouse, and a relatively large screen. Consider the below search for information about Columbia.  Whether I mean Columbia University, Columbia Sportswear, or Columbia Records, I can relatively easily navigate to the official website of the place that I need.

Mobile devices require specificity as the cost of an incorrect result is magnified by the limits of the user interface.  When using something like Siri, it is important to be able to give a precise answer to a question, rather than a menu of potential answers, as it is far harder to choose using these interfaces.  As technology gets better, we will start to expect intelligent devices to be able to make the same inferences that we are able to make about what we mean when given limited information.  For example, if I say “how do I get to Columbia?” to my phone while in New York, it should direct me to Columbia University, whereas in Chicago, it should direct me to Columbia College of Chicago.  Leveraging contextual information is part of what makes Siri special, as it allows you to, for example, use pronouns.  Some have said that Siri has resurrected the semantic web, as, in order to make the above choice of “Columbia” intelligently, it needs to know that Columbia University is located in New York while Columbia College is located in Chicago.

I have made the case before that people are increasingly seeking opinion data, not just factual data, online.  It bears repeating that, as depicted in the below graph, searches for opinion words like “best” are increasing, relative to factual words like “car”, “computer”, and “software” which once were as prevalent as “best”, but now lag behind.

The implication of these two trends is clear.  As more knowledge discovery is done via mobile devices that need semantic data to deliver precise contextual answers, and more knowledge discovery is about opinions, mobile interfaces such as Siri, or Google’s answer to Siri, will increasingly require semantic opinion datasets to power them.  Using such a dataset, you could ask your mobile device to “find a foreign movie” while traveling, and it could cross-reference your preferences with those of others to find the best foreign movie that happens to be playing in your geographic area and conforms to your taste.  You could ask your mobile device to play some jazz music, and it could consider what music you might like or not like, in addition to the genre classifications of available albums.  These are the kinds of intelligent operations that human beings do every day, leveraging our knowledge both of the world’s facts and the world’s opinions, and in order to do these tasks well, any intelligent agent attempting them will require the same set of structured knowledge, in the form of semantic opinion data.  Not coincidentally, Ranker’s unique competency is the development of a comprehensive semantic opinion dataset.

– Ravi Iyer

Posted in Data Science

The Long Tail of Opinion Data

If you want to find out what the best restaurant in your area is, what the best printer under $80 is, or what the best movie of 2010 was, there are many websites out there that can help you.  Sites like Yelp, Rotten Tomatoes, and Engadget have built sustainable businesses by providing opinions in these vertical domains.  Ranker also has a best movies of all time list, and while I might argue that our list is better than Rotten Tomatoes’ list (is Man on Wire really the best movie ever?), there isn’t anything particularly novel about having a list of best movies.  At the point where Ranker is the go-to site for opinions about restaurants, electronics, and movies, it will be a very big business indeed.

We are actually competitive already for movies, but where Ranker has unique value is in the long tail of opinions.  There are lots of domains where opinions are valuable but are rarely systematically polled.  As this Motley Fool writer points out, we are one of the few places with opinions about companies with the worst customer service, and the only one that updates in real time.  Memes are arguably some of the most valuable things to know about, yet there is little data-oriented competition for our funniest memes lists.  As inherently social creatures, we obviously place tremendous value on opinions about people, yet Gallup polls about politicians aside, there is little systematic knowledge of people’s opinions about people in the news beyond our votable opinions about people lists.

Not only are there countless domains where systematic opinions are not collected, but even in the domains that exist, opinions tend to be unidimensionally focused on “best”, with little differentiation for other adjectives.  What if you want to identify the funniest, most annoying, dumbest, worst, or hottest item in a domain?  “Best” searches far outnumber “worst” searches on Google (about 50 to 1, according to Google Trends), but if you take all the adjectives (e.g. funniest, dumbest) and combine them with all the qualifiers (e.g. of 2011, that remind you of college), there is a long tail of opinions, even in the most popular domains, that goes unserved.  Where else is data systematically collected on British comedians?
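The size of this long tail is easy to see: it is roughly the cross product of adjectives, domains, and qualifiers. A small illustrative sketch (the word lists are made up for the example):

```python
from itertools import product

# Hypothetical vocabulary; the real space is far larger in every dimension.
adjectives = ["best", "worst", "funniest", "most annoying"]
domains = ["memes", "British comedians", "printers under $80"]
qualifiers = ["", "of 2011", "that remind you of college"]

# Every combination is a distinct opinion question someone might search for.
queries = [" ".join(filter(None, (a, d, q)))
           for a, d, q in product(adjectives, domains, qualifiers)]
print(len(queries))  # 4 adjectives x 3 domains x 3 qualifiers = 36 queries
```

Even this toy vocabulary yields 36 distinct questions; grow each list modestly and the space multiplies into thousands of rarely polled opinion queries.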

When you combine the opportunities available in the long tail of domains with the long tail of adjectives and qualifiers, you get a truly large set of opinions that make up the long tail of opinions on the internet.  There are myriad companies trying to mine Twitter for this data, which somewhat validates my intuition that there is opportunity here, but clever algorithms will never make up for the imperfections of mining 140-character text.  Many companies will try to compete by squeezing the last bit of signal from imperfect data, but my experience in academia and in technology has taught me that there is no substitute for collecting better data.  If my previous assertion that the knowledge graph is more than just facts is true, then there will be great demand for this long tail of opinions, just as there is great demand for the long tail of niche searches.  And Ranker is one of the few companies empirically sampling this long tail.

– Ravi Iyer

Posted in Data Science, Market Research

Better Data, Not Bigger Data – Thoughts from the Data 2.0 Conference

As part of our effort to promote Ranker’s unique dataset, I recently attended the Data 2.0 conference in San Francisco.  “Data 2.0” is a relatively vague term, and as Ranker’s resident data scientist, I have a particular perspective on what constitutes the future of data.  My PhD is in psychology, not computer science, and so for me, data has always been a means rather than an end.  One thing that became readily apparent at the first few talks I saw was that much of the emphasis of the conference was on dealing with bigger datasets, without much consideration of what one could do with this data.  It goes without saying that larger sample sizes allow for more statistical power than smaller ones, but as someone who has collected some of the larger samples of psychological data (via YourMorals.org and BeyondThePurchase.org), I have often found that what holds back the predictive power of my data is not the volume of data, but rather the diversity of variables in my dataset.  What I often need is not bigger data; it’s better data.

The same premise has informed much of our data decision-making at Ranker, where we emphasize the quality of our semantic, linked data, as opposed to the quantity.  Again, both quality and quantity are important, but my thought going through the conference was that there was an over-emphasis on quantity.  I didn’t find anyone talking about semantic data, which is one of the primary “Data 2.0” concepts that relates more to quality than quantity.

I tested this idea out with a few people at the conference, framed as “better data beats better algorithms”, and generally got positive feedback on the phrase.  I was heartened when the moderator of a panel entitled “Data Science and Predicting the Future”, which included Alex Gray, Anthony Goldbloom, and Josh Wills, specifically asked what was more important: data, people, or algorithms.  It wasn’t quite the question I had in mind, but it served as a jumping-off point for a great discussion.  Josh Wills, who previously worked as a data scientist at Google, said the following, which I’m paraphrasing, as I didn’t take exact notes:

“Google and Facebook both have really smart people.  They use essentially the same algorithms.  The reason why Google can target ads better than Facebook is purely a function of better data.  There is more intent in the data related to the Google user, who is actively searching for something, and so there is more predictive power.  If I had a choice between asking my team to work on better algorithms or joining the data we have with other data, I’d want my team joining my data with other data, as that is what will lead to the most value.”


Again, that is paraphrased.  Some of the panelists disagreed a bit.  Alex Gray works on algorithms and so emphasized the importance of algorithms.  To be fair, I work with relatively precise data, so I have the same bias in emphasizing the importance of quality data.  Daniel Tunkelang, Principal Data Scientist at LinkedIn, supported Josh in saying that better data was indeed more important than bigger data, a point his colleague Monica Rogati had made recently at a conference.  I was excited to hear that others had been having similar thoughts about the need for better, not bigger, data.

I ended up asking a question myself about the Netflix challenge, where the algorithms and collective intelligence applied to the problem (reducing prediction error) were maximized, but the goal was a relatively modest 10% gain, won by an algorithm so complex that Netflix itself found it too costly to use relative to the gains.  Surely better data (e.g. user opinions about different genres, or user opinions about more dimensions of each movie) would have led to much more than a 10% gain.  There seemed to be general agreement, though Anthony Goldbloom rightly pointed out that you need the right people to help figure out how to get better data.

In the end, we all have our perspectives, based perhaps on what we work on, but I do think that the “better data” perspective is often lost in the rush toward larger datasets with more complex algorithms.  For more on this perspective, here and here are two blog posts I found interesting on the subject.  Daniel Tunkelang blogged about the same panel here.

– Ravi Iyer

Posted in About Ranker

Introduction to Data @ Ranker.com

Ranker is continuing to grow, both in terms of the traffic that comes to our website and in terms of our coverage of the world of objects to be ranked.  As we grow, we collect more and more data and are only beginning to tap the possibilities of the data we collect.  If you’re interested in our data, this video will hopefully give you a quick introduction to data at Ranker.com.

Posted in Data Science

The Moral Psychology and Big Data Singularity – SXSW 2012

Below is a narrated PowerPoint from a presentation I gave at South by Southwest Interactive on March 11, 2012.  The point of this presentation was to explore the intersection of technology and psychology, and hopefully to convince technologists to use our data to examine intangible things like values.  While the talk focuses more on psychology, many of the ideas were inspired by the semantic datasets we work with at Ranker.  Working with semantic datasets puts one in the mindset of considering synergy among different fields with different kinds of data.