by    in prediction

Predicting the Movie Box Office

The North American market for films totaled about US$11,000 million in 2013, with over 1300 million admissions. The film industry is a big business that not even Ishtar, nor Jaws: The Revenge, nor even the 1989 Australian film “Houseboat Horror” manages to derail. (Check out Houseboat Horror next time you’re low on self-esteem, and need to be reminded there are many people in the world much less talented than you.)

Given the importance of the film industry, we were interested in using Ranker data to make predictions about box office grosses for different movies. The Ranker list dealing with the Most Anticipated 2013 Films gave us some opinions, in the form of both re-ranked lists and up and down votes, on which to base predictions. We used the same cognitive modeling approach previously applied to make Football (Soccer) World Cup predictions, trying to combine the wisdom of the Ranker crowd.

Our basic results are shown in the figure below. The movies people had ranked are listed from the heavily anticipated Iron Man 3, Star Trek: Into Darkness, and Thor: The Dark World down to less anticipated films like Simon Killer, The Conjuring, and Alan Partridge: Alpha Papa. The voting information is shown in the middle panel, with the light bar showing the number of up-votes and the dark bar showing the number of down-votes for each movie. The ranking information is shown in the right panel, with the size of the circles showing how often each movie was placed in each ranking position by a user.

This analysis gives us an overall crowd rank order of the movies, but that is still a step away from making direct predictions about the number of dollars a movie will gross. To bridge this gap, we consulted historical data. The Box Office Mojo site provides movie gross totals for the top 100 movies each year for about the last 20 years. There is a fairly clear relationship between the ranking of a movie in a year, and the money it grosses. As the figure below shows, the few highest grossing movies return a lot more than the rest, following a “U-shaped” pattern that is often found in real-world statistics. If a movie is the 5th top grossing in a given year, for example, it grosses between about 100 and 300 million dollars. If it is the 50th highest grossing, it makes between about 10 and 80 million.

We used this historical relationship between ranking and dollars to map our predictions about ranking to predictions about dollars. The resulting predictions about the 2013 movies are shown below. These predictions are naturally uncertain, and so cover a range of possible values, for two reasons. We do not know exactly where the crowd believed each movie would finish in the ranking list, and we only know a range of possible historical grossed dollars for each rank. Our predictions acknowledge both of those sources of uncertainty, and the blue bars in the figure below show the region in which we predicted the final outcome was 95% likely to lie. To assess our predictions, we looked up the answers (again at Box Office Mojo), and overlaid them as red crosses.
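To make the rank-to-dollars step concrete, here is a minimal sketch of one way to combine the two sources of uncertainty by simulation. It is not our actual cognitive model: the function name `predict_gross_interval`, the sampling scheme, and all of the numbers are assumptions made up for illustration, not Box Office Mojo data.

```python
import random

def predict_gross_interval(rank_samples, history, n_draws=10000, seed=0):
    """Return an approximate 95% interval for a movie's gross.

    rank_samples: plausible year-end ranks for the movie (our uncertainty
                  about where the crowd places it).
    history:      dict mapping rank -> list of grosses (in $ millions) that
                  movies finishing at that rank earned in past years.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        rank = rng.choice(rank_samples)          # uncertainty source 1: the rank
        draws.append(rng.choice(history[rank]))  # uncertainty source 2: dollars at that rank
    draws.sort()
    low = draws[int(0.025 * (n_draws - 1))]
    high = draws[int(0.975 * (n_draws - 1))]
    return low, high

# Hypothetical historical grosses by rank (illustrative values only).
history = {
    4: [210.0, 250.0, 290.0, 320.0],
    5: [180.0, 220.0, 260.0, 300.0],
    6: [160.0, 200.0, 240.0, 280.0],
}
# Hypothetical samples of where the crowd believes the movie will finish.
rank_samples = [4, 5, 5, 5, 6]

low, high = predict_gross_interval(rank_samples, history)
print(f"95% interval: ${low:.0f}M to ${high:.0f}M")
```

The interval widens when either the crowd disagrees about the rank or the historical grosses at a rank vary a lot, which is the behavior the blue bars are meant to capture.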

Many of our predictions are good, for both high grossing (Iron Man 3, Star Trek) and more modest grossing (Percy Jackson, Hansel and Gretel) movies. Forecasting social behavior, though, is very difficult, and we missed a few high grossing movies (Gravity) and over-estimated some relative flops (47 Ronin, Kick Ass 2). One interesting finding came from contrasting an analysis based on ranking and voting data with similar analyses based on just ranking or just voting. Combining both sorts of data led to more accurate predictions than using either alone.

We’re repeating this analysis for 2014, waiting for user re-ranks and votes for the Most Anticipated Films of 2014. The X-Men and Hunger Games franchises are currently favored, but we’d love to incorporate your opinion. Just don’t up-vote Houseboat Horror.

by    in Data, Game of Thrones

Game of Thrones: Don’t Get Too Comfortable

[Spoiler Alert: This post contains references to Seasons 1-4, Episode 3 of the show. There are no references to the books. If you’re all caught up on the show, then you are safe!]


Have you recently found yourself unreasonably happy about a certain child’s death? Excited, even, to watch the bile frothing out of his mouth and the blood streaming from the far corners of his eyes? Have you rationalized incest-rape and chalked it up to the pressures of the times? Rejoiced as a small girl murders a man in cold blood? (Something wrong with your leg, boy?)

Don’t get too comfortable.

Now that you’re fully immersed in the world of the Seven Kingdoms, your capacity for moral relativism may surprise you. You may feel like nothing on the show could totally shock or upset you anymore. Now that you’re sort of OK with incest-rape, should you just hang up your hat and quit? Can’t anything feel uncomfortable or shocking anymore?!

Don’t worry: If we’ve learned anything about this series so far, there will be plenty of horrific incidents to come. And according to our data, there is pretty much something on the show to upset every sensibility.

We were looking at our list of The Most Uncomfortable Game of Thrones Moments again (weird, we know — we like to keep the wounds fresh) and noticed an interesting pattern in our data that gives us an insight on what makes certain viewers uncomfortable. So far, over 1,000 people have voted on this list an average of 5 times. There are 18 uncomfortable moments to choose from, and they are ordered from jaw-dropping to ain’t-no-thing. (Vote if you haven’t already. It’s fun!)

As more and more people vote, some interesting correlations have emerged.

For example, Ranker users who said that they were very uncomfortable when Theon Greyjoy lost his “most prized possession” were far more likely to also feel uncomfortable when Jaime Lannister’s right hand was cut off.
A particular distaste for bodily harm, it would seem.

[In plain English: The majority of people who hated watching that first scene also hated watching the second. Most people who didn’t mind the first also didn’t mind the second.]
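For the curious, a correlation like this can be computed directly from yes/no votes. The sketch below uses the phi coefficient, a standard correlation measure for two binary variables; the post does not say which statistic was actually used, and the vote lists are made up for illustration.

```python
from math import sqrt

def phi_coefficient(a, b):
    """Correlation between two equal-length lists of 0/1 votes."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # uncomfortable with both
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # fine with both
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one scene got identical votes from everyone
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical votes from eight users (1 = "uncomfortable", 0 = "didn't mind").
theon_scene = [1, 1, 1, 0, 0, 0, 1, 0]
jaime_scene = [1, 1, 0, 0, 0, 0, 1, 0]

phi = phi_coefficient(theon_scene, jaime_scene)
print(f"phi = {phi:.2f}")
```

A value near 1 means the same people tend to flag both scenes; a value near 0 means the votes are unrelated.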

But there’s more. Two main “camps” of voters emerged in our data. We’ll call them “Camp Emotional” and “Camp Physical.”

People who voted for one thing that could be considered emotionally distressing (witnessing an incest scene between brother and sister, for example) were highly likely to also vote for other moments that can be associated with emotional distress: Lysa Tully’s disturbing breastfeeding scene and Viserys Targaryen’s willingness to whore out his own sister in exchange for power both come to mind.

Similarly, people who voted on one item in “Camp Physical” were more likely to vote on other physically revolting scenes. Viserys Targaryen getting “crowned,” Khaleesi eating a horse heart, and the execution of Eddard Stark were all positively correlated.

[Figure: Disgusting GoT Tastes graph]

The “Game of Thrones” show creators certainly have their bases covered as far as upsetting every sensibility.

Don’t mind a six-year-old suckling on the teat of his mother? Maybe your favorite character will be brutally executed. Don’t think the gory stuff is that big of a deal? Maybe a character you thought you trusted will double-cross his sister, have sex with his mother, and steal the crown for himself. This is all just speculation, of course, but we’re just saying: no one is safe. Not even you.

Can Colbert bring young Breaking Bad Fans to The Late Show?

I have to admit that I thought it was a joke at first when I heard the news that Stephen Colbert is leaving The Colbert Report and is going to host the Late Show, currently hosted by David Letterman.  The fact that he won’t be “in character” in the new show makes it more intriguing, even as it brings tremendous change to my entertainment universe.  However, while it will take some getting used to, looking at Ranker data on the two shows reveals how the change really does make sense for CBS.

Despite the ire of those who disagree with The Colbert Report’s politics, CBS is definitely addressing a need to compete better for younger viewers, who are less likely to watch TV on the major networks.  Ranker users tend to be in the 18-35 year old age bracket and The Colbert Report ranks higher than the Late Show on most every list that they both are on including the Funniest TV shows of 2012 (19 vs. 28), Best TV Shows of All-Time (186 vs. 197), and Best TV Shows of Recent Memory (37 vs. 166).  Further, people who tend to like The Colbert Report also seem to like many of the most popular shows around like Breaking Bad, Mad Men, Game of Thrones, and 30 Rock.  In contrast, correlates of the Late Show include older shows like The Sopranos and 60 Minutes.  There is some overlap as fans of both shows like The West Wing and The Daily Show, indicating that Colbert may be able to appeal to current fans as well as new audiences.

Colbert Can Expand Late Show's Audience to New Groups, yet Retain Many Current Fans.

I’ll be sad to see “Stephen Colbert” the character go.  But it looks like my loss is CBS’ gain.

– Ravi Iyer

by    in Data Science, prediction, Rankings

World Cup 2014 Predictions

An octopus called Paul was one of the media stars of the 2010 soccer world cup. Paul correctly predicted 11 out of 13 matches, including the final in which Spain defeated the Netherlands. The 2014 world cup is in Brazil and, in an attempt to avoid eating mussels painted with national flags, we made predictions by analyzing data from Ranker’s “Who Will Win The 2014 World Cup?” list.

Ranker lists provide two sources of information, and we used both to make our predictions. One source is the original ranking, and the re-ranks provided by other users. For the world cup list, some users were very thorough, ranking all (or nearly all) of the 32 teams who qualified for the world cup. Other users were more selective, listing just the teams they thought would finish in the top places. An interesting question for data analysis is how much weight should be given to different rankings, depending on how complete they are.

The second source of information on Ranker is the thumbs-up and thumbs-down votes other users make in response to the master list of rankings. Often Ranker lists have many more votes than they have re-ranks, and so the voting data potentially are very valuable. So, another interesting question for data analysis is how the voting information should be combined with the ranking information.

A special feature of making world cup predictions is that there is very useful information provided by the structure of the competition itself. The 32 teams have been drawn in 8 brackets with 4 teams each. Within a bracket, every team plays every other team once in initial group play. The top two teams from each bracket then advance to a series of elimination games. This system places strong constraints on possible outcomes, which a good prediction should follow. For example, although Group B contains Spain, the Netherlands, and Chile (all strong teams, currently ranked in the top 16 in the world according to FIFA rankings), only two can progress from group play and finish in the top 16 for the world cup.
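The bracket constraint is easy to see in a few lines of code. This is only a sketch of the constraint, not our model: it simply advances the two highest-rated teams in each group, and the strength numbers are hypothetical values chosen for illustration.

```python
def advance_from_groups(groups, strength):
    """Return the two strongest teams in each group, strongest first.

    groups:   dict mapping group name -> list of four team names.
    strength: dict mapping team name -> a number (higher is stronger).
    """
    return {
        name: sorted(teams, key=lambda team: strength[team], reverse=True)[:2]
        for name, teams in groups.items()
    }

# Group B of the 2014 draw; the strengths are made-up illustrative values.
groups = {"B": ["Spain", "Netherlands", "Chile", "Australia"]}
strength = {"Spain": 0.90, "Netherlands": 0.80, "Chile": 0.75, "Australia": 0.40}

advancing = advance_from_groups(groups, strength)
print(advancing)
```

However strong the third team in a group is overall, it cannot appear in the final 16, which is why a prediction that ignores the draw can produce impossible orderings.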

We developed a model that accounts for all three of these sources of information. It uses the ranking and re-ranking data, the voting data, and the constraints coming from the brackets, to make an overall prediction. The results of this analysis are shown in the figure. The left panel shows the thumbs-up (to the right, lighter) and thumbs-down (to the left, darker) votes for each team. The middle panel summarizes the ranking data, with the area of the circles corresponding to how often each team was ranked in each position. The right hand panel shows the inferred “strength” of each team on which we based our predicted order.

Our overall prediction has host-nation Brazil winning. But the distribution of strengths shown in the model inferences panel suggests it is possible Germany, Argentina, or Spain could win. There is little to separate the remainder of the top 16, with any country from the Netherlands to Algeria capable of doing well in the finals. The impact of the drawn brackets on our predictions is clear, with a raft of strong countries (England, the USA, Uruguay, and Chile) predicted to miss the finals, because they have been drawn in difficult brackets.

– Michael Lee

Lists are the Best way to get Opinion Graph Data: Comparing Ranker to State & Squerb

I was recently forwarded an article about Squerb, which shares an opinion we have long agreed with.  Specifically…

“Most sites rely on simple heuristics like thumbs-up, ‘like’ or 1-5 stars,” stated Squerb founder and CEO Chris Biscoe. He added that while those tools offer a quick overview of opinion, they don’t offer much in the way of meaningful data.

It reminds me a bit of State, another company building an opinion graph that connects more specific opinions to specific objects in the world.  They too are built upon the idea that existing sources of big data opinions, e.g. mining tweets and facebook likes, have inherent limitations.  From this Wired UK article:

Doesn’t Twitter already provide a pretty good ‘opinion network’? Alex thinks not. “The opinions out there in the world today represent a very thin slice. Most people are not motivated to express their opinion and the opinions out there for the most part are very chaotic and siloed. 98 percent of people never get heard,” he told

I think more and more people who try to parse Facebook and Twitter data for deeper Netflix AltGenre-like opinions will realize the limitations of such data, and attempt to collect better opinion data.  In the end, I think collecting better opinion data will inevitably involve the list format that Ranker specializes in.  Lists have a few important advantages over the methods that Squerb and State are using, which include slick interfaces for tagging semantic objects with adjectives.  The advantages of lists include:

  • Lists are popular and easily digestible.  There is a reason why every article on Cracked is a list.  Lists appeal to the masses, which is precisely the audience that Alex Asseily is trying to reach on State.  To collect mass opinions, one needs a site that appeals to the masses, which is why Ranker has focused on growth as a consumer destination site, that currently collects millions of opinions.
  • Lists provide the context of other items.  It’s one thing to think that Army of Darkness is a good movie.  But how does it compare to other Zombie Movies?  Without context, it’s hard to compare people’s opinions as we all have different thresholds for different adjectives.  The presence of other items lets people consider alternatives they may not have considered in a vacuum and allows better interpretation of non-response.
  • Lists provide limits to what is being considered.  For example, consider the question of whether Tom Cruise is a good actor.  Is he one of the Best Actors of All-time?  One of the Best Action Stars?  One of the Best Actors Working Today?  Ranker data shows that people’s answers usually depend on the context (e.g. Tom Cruise gets a lot of downvotes as one of the best actors of all-time, but is indeed considered one of the best action stars).
  • Lists are useful, especially in a mobile friendly world.

In short, collecting opinions using lists produces both more data and better data.  I welcome companies that seek to collect semantic opinion data as the opportunity is large and there are network effects such that each of our datasets is more valuable when other datasets with different biases are available for mashups.  As others realize the importance of opinion graphs, we likely will see more companies in this space and my guess is that many of these companies will evolve along the path that Ranker has taken, toward the list format.

– Ravi Iyer

by    in About Ranker, Opinion Graph, Pop Culture, Rankings

Ranker’s Rankings API Now in Beta

Increasingly, people are looking for specific answers to questions as opposed to webpages that happen to match the text they type into a search engine.  For example, if you search for the capital of France or the birthdate of Leonardo Da Vinci, you get a specific answer.  However, the questions that people ask are increasingly about opinions, not facts, as people are understandably more interested in what the best movie of 2013 was, as opposed to who the producer for Star Trek: Into Darkness was.

Enter Ranker’s Rankings API, which is currently in beta, as we’d love the input of potential users of our API to help improve it.  Our API returns aggregated opinions about specific movies, people, tv shows, places, etc.  As an input, we can take a Wikipedia, Freebase, or Ranker ID.  For example, below is a request for information about Tom Cruise, using his Ranker ID from his Ranker page (contact us if you want to use other IDs to access).

In the response to this request, you’ll get a set of Rankings for the requested object, including a set of list names (e.g. “listName”:”The Greatest 80s Teen Stars”), list urls (e.g. “listUrl”:”” – note that the domain,, is implied), item names (e.g. “itemName”:”Tom Cruise”) position of the item on this list (e.g. “position”:21), number of items on the list (e.g. “numItemsOnList”:70), the number of people who have voted on this list (e.g. “numVoters”:1149), the number of positive votes for this item (e.g. “numUpVotes”:245) vs. the number of negative votes (e.g. “numDownVotes”:169), and the Ranker list id (e.g. “listId”:584305).  Note that results are cached so they may not match the current page exactly.

Here is a snippet of the response for Tom Cruise.

[
  { "itemName" : "Tom Cruise",
    "listId" : 346881,
    "listName" : "The Greatest Film Actors & Actresses of All Time",
    "listUrl" : "",
    "numDownVotes" : 306,
    "numItemsOnList" : 524,
    "numUpVotes" : 285,
    "numVoters" : 5305,
    "position" : 85
  },
  { "itemName" : "Tom Cruise",
    "listId" : 542455,
    "listName" : "The Hottest Male Celebrities",
    "listUrl" : "",
    "numDownVotes" : 175,
    "numItemsOnList" : 171,
    "numUpVotes" : 86,
    "numVoters" : 1937,
    "position" : 63
  },
  { "itemName" : "Tom Cruise",
    "listId" : 679173,
    "listName" : "The Best Actors in Film History",
    "listUrl" : "",
    "numDownVotes" : 151,
    "numItemsOnList" : 272,
    "numUpVotes" : 124,
    "numVoters" : 1507,
    "position" : 102
  }
]


What can you do with this API?  Consider this page about Tom Cruise from Google’s Knowledge Graph.  It tells you his children, his spouse(s), and his movies.  But our API will tell you that he is one of the hottest male celebrities, an annoying A-List actor, an action star, a short actor, and an 80s teen star.  His name comes up in discussions of great actors, but he tends to get more downvotes than upvotes on such lists, and even shows up on lists of “overrated” actors.
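As a rough illustration of working with a response like the one above, the sketch below parses a trimmed copy of the sample entries and computes the share of up-votes on each list. It makes no network request; the helper name `summarize` is hypothetical, and only field names shown in the sample response are used.

```python
import json

# A trimmed copy of the sample response shown above (not a live API call).
sample_response = """
[
  {"itemName": "Tom Cruise", "listName": "The Greatest Film Actors & Actresses of All Time",
   "numUpVotes": 285, "numDownVotes": 306, "position": 85, "numItemsOnList": 524},
  {"itemName": "Tom Cruise", "listName": "The Hottest Male Celebrities",
   "numUpVotes": 86, "numDownVotes": 175, "position": 63, "numItemsOnList": 171},
  {"itemName": "Tom Cruise", "listName": "The Best Actors in Film History",
   "numUpVotes": 124, "numDownVotes": 151, "position": 102, "numItemsOnList": 272}
]
"""

def summarize(rankings):
    """Return (list name, position, list size, up-vote share) for each ranking."""
    rows = []
    for ranking in rankings:
        up = ranking["numUpVotes"]
        down = ranking["numDownVotes"]
        total = up + down
        share = up / total if total else 0.0
        rows.append((ranking["listName"], ranking["position"],
                     ranking["numItemsOnList"], share))
    return rows

rows = summarize(json.loads(sample_response))
for name, position, size, share in rows:
    print(f"{name}: #{position} of {size}, {share:.0%} up-votes")
```

On these three lists the up-vote share is below one half each time, which matches the observation that he tends to collect more downvotes than upvotes on lists of great actors.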

We can provide this information, not just about actors, but also about politicians, books, places, movies, tv shows, bands, athletes, colleges, brands, food, beer, and more.  We will tend to have more information about entertainment related categories, for now, but as the domains of our lists grow, so too will the breadth of opinion related information available from our API.

Our API is free and no registration is required, though we would request that you provide links and attributions to the Ranker lists that provide this data.  We likely will add some free registration at some point.  There are currently no formal rate limits, though there are obviously practical limits so please contact us if you plan to use the API heavily as we may need to make changes to accommodate such usage.  Please do let me know (ravi a t ranker) your experiences with our API and any suggestions for improvements as we are definitely looking to improve upon our beta offering.

– Ravi Iyer

Ranker Opinion Graph: the Best Froyo Toppings

It’s hard to resist a cold treat on a hot summer afternoon, and frozen yogurt shops with their array of flavors and toppings have a little of something for everyone. Once you’re done agonizing over whether you want New York cheesecake or wild berry froyo (and trying a sample of each at least twice), it’s time for the topping bar. But which topping should you choose? We asked people to vote for their favorite frozen yogurt toppings on Ranker from a list of 32 toppings, and they responded with over 7,500 votes.

The Top 5 Frozen Yogurt Toppings (by number of upvotes):
1. Oreo (235 votes)
2. Strawberries (225 votes)
3. Brownie bits (223 votes)
4. Hot fudge (216 votes)
5. Whipped cream (201 votes)

But let’s be honest, who can choose just ONE topping for their froyo? Using Gephi and data from Ranker’s Opinion Graph, we ran a cluster analysis on people’s favorite froyo topping votes to determine which toppings people like to eat together. In the graph, larger circles mean more likes with other toppings. Most of the versatile toppings were either a syrup (like strawberry sauce) or chocolate candy (like Reese’s Pieces).

The 10 Most Versatile Froyo Toppings:

1. Strawberry sauce
2. Snickers
3. Magic Shell
4. White Chocolate chips
5. Peanut butter chips
6. Butterscotch syrup
7. Candies Nestle Butterfinger Bar
8. Reese’s Pieces
9. M&Ms
10. Brownie bits
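For readers who want to try something similar, here is a minimal sketch of how a co-upvote graph and a "versatility" score can be built from raw votes. It assumes versatility is the total weight of a topping's co-upvote links (its weighted degree), which may not be exactly how the circles were sized in Gephi, and the three users and their votes are invented for the example.

```python
from collections import Counter
from itertools import combinations

def co_upvote_edges(upvotes_by_user):
    """Count how many users upvoted each pair of toppings together."""
    edges = Counter()
    for toppings in upvotes_by_user.values():
        for a, b in combinations(sorted(set(toppings)), 2):
            edges[(a, b)] += 1
    return edges

def versatility(edges):
    """Weighted degree: total co-upvotes a topping shares with all others."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Hypothetical upvotes from three users.
upvotes_by_user = {
    "user1": ["Oreo", "Snickers", "Strawberry sauce"],
    "user2": ["Snickers", "Strawberry sauce"],
    "user3": ["Strawberry sauce", "Gummy worms"],
}

edges = co_upvote_edges(upvotes_by_user)
degree = versatility(edges)
for topping, score in degree.most_common():
    print(topping, score)
```

The resulting edge list (pair, weight) is the kind of table that can be imported into a graph tool for layout and clustering.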


Using the modularity clustering tool in Gephi, we were then able to sort toppings into groups based on which toppings people were most likely to upvote together. We identified 4 kinds of froyo topping lovers:

1. Fruit and Nuts (Blue): This cluster is all about the fruits and nuts. These people love Strawberry sauce, sliced almonds, and Maraschino cherries.

2. Chocolate (Purple): This cluster encompasses all things chocolate. These people love Magic Shell, Brownie bits, and chocolate syrup.

3. Sugar candy (Green): This cluster is made up of pure sugar. These people love gummy worms, Rainbow sprinkles, and Skittles.

4. Salty and Cake (Red): This cluster encompasses cake bites and toppings that have a salty taste to them. These people like Snickers, Cheesecake bits, and Caramel Syrup.


Some additional thoughts:

  • Banana was a strange topping that was only linked with Snickers.
  •  People who like nuts like both fruit and items from the salty category.
  •  People who like blueberries only like other fruits.
  • People who like sugar items like gummy worms also like chocolate, but don’t particularly like fruit.


– Kate Johnson

How Netflix’s AltGenre Movie Grammar Illustrates the Future of Search Personalization

I recently got sent this Atlantic article on how Netflix reverse engineered Hollywood by a few contacts, and it happens to mirror my long term vision for how Ranker’s data fits into the future of search personalization.  Netflix’s goal, to put “the right title in front of the right person at the right time,” is very similar to what Apple, Bing, Google, and Facebook are attempting to do with regards to personalized contextual search.  Rather than you having to type in “best kitchen gadgets for mothers”, applications like Google Now and Cue (bought by Apple) hope to eventually be able to surface this information to you in real time, knowing not only when your mother’s birthday is, but also that you tend to buy kitchen gadgets for her, and knowing what the best rated kitchen gadgets that aren’t too complex and are in your price range happen to be.  If the application was good enough, a lot of us would trust it to simply charge our credit card and send the right gift.  But obviously we are a long way from that reality.

Netflix’s altgenre movie grammar (e.g. Irreverent Werewolf Movies Of The 1960s) gives us a glimpse of the level of specificity that would be required to get us there.  Consider what you need to know to buy the right gift for your mom.  You aren’t just looking for a kitchen gadget, but one with specific attributes.  In altgenre terminology, you might be looking for “best simple, beautifully designed kitchen gadgets of 2014 that cost between $25 and $100” or “best kitchen gadgets for vegetarian technophobes”.  Google knows that simple text matching is not going to get it the level of precision necessary to provide such answers, which is why semantic search, where the precise meaning of pages is mapped, has become a strategic priority.

However, the universe of altgenre equivalents in the non-movie world is nearly endless (e.g. Netflix has thousands of ways just to classify movies), which is where Ranker comes in, as one of the world’s largest sources for collecting explicit cross-domain altgenre-like opinions.  Semantic data from sources like Wikipedia, DBpedia, and Freebase can help you put together factual altgenres like “of the 60s” or “that starred Brad Pitt”, but you need opinion ratings to put together subtler data like “guilty pleasures” or “toughest movie badasses”.  Netflix’s success is proof of the power of this level of specificity in personalizing movies.  Consider how they produced this knowledge: not through running machine learning algorithms on their endless stream of user behavior data, but rather by soliciting explicit ratings along these dimensions by paying “people to watch films and tag them with all kinds of metadata” using a “36-page training document that teaches them how to rate movies on their suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.”  Some people may think that with enough data, TripAdvisor should be able to tell you which cities are “cool”, but big data is not always better data.  Most data scientists will tell you the importance of defining the features in any recommendation task (see this article for technical detail on this), rather than assuming that a large amount of data will reveal all of the right dimensions.  The wrong level of abstraction can make prediction akin to trying to predict who will win the Super Bowl by knowing the precise position and status of every cell in every player on every NFL team.  Netflix’s system allows them to make predictions at the right level of abstraction.

The future of search needs a Netflix grammar that goes beyond movies.  It needs to be able to understand not only which movies are dark versus gritty, but also which cities are better babymoon destinations versus party cities and which rock singers are great vocalists versus great frontmen.  Ranker lists actually have a similar grammar to Netflix movies, except that we apply this grammar beyond the movie domain.  In a subsequent post, I’ll go into more detail about this, but suffice it to say for now that I’m hopeful that our data will eventually play a similar role in the personalization of non-movie content that Netflix’s microtagging plays in film recommendations.

– Ravi Iyer


Why Topsy/Twitter Data may never predict what matters to the rest of us

Recently Apple paid a reported $200 million for Topsy and some speculate that the reason for this purchase is to improve recommendations for products consumed using Apple devices, leveraging the data that Topsy has from Twitter.  This makes perfect sense to me, but the utility of Twitter data in predicting what people want is easy to overstate, largely because people often confuse bigger data with better data.  There are at least 2 reasons why there is a fairly hard ceiling on how much Twitter data will ever allow one to predict about what regular people want.

1.  Sampling – Twitter has a ton of data, with daily usage of around 10%.  Sample size isn’t the issue here as there is plenty of data, but rather the people who use Twitter are a very specific set of people.  Even if you correct for demographics, the psychographic of people who want to share their opinion publicly and regularly (far more people have heard of Twitter than actually use it) is way too unique to generalize to the average person, in the same way that surveys of landline users cannot be used to predict what psychographically distinct cellphone users think.

2. Domain Comprehensiveness – The opinions that people share on Twitter are biased by the medium, such that they do not represent the spectrum of things many people care about.  There are tons of opinions on entertainment, pop culture, and links that people want to promote, since they are easy to share quickly, but very little information on people’s important life goals or the qualities we admire most in a person or anything where people’s opinions are likely to be more nuanced.  Even where we have opinions in those domains, they are likely to be skewed by the 140 character limit.

Twitter (and by extension, companies that use their data like Topsy and DataSift) has a treasure trove of information, but people working on next generation recommendations and semantic search should realize that it is a small part of the overall puzzle given the above limitations.  The volume of information gives you a very precise measure of a very specific group of people’s opinions about very specific things, leaving out the vast majority of people’s opinions about the vast majority of things.  When you add in the bias introduced by analyzing 140 character natural language, there is a great deal of variance in recommendations that likely will have to be provided by other sources.

At Ranker, we have similar sampling issues, in that we collect much of our data at, but we are actively broadening our reach through our widget program, which now collects data on thousands of partner sites.  Our ranked list methodology certainly has bias too, which we attempt to mitigate through combining voting and ranking data.  The key is not in the volume of data, but rather in the diversity of data, which helps mitigate the bias inherent in any particular sampling/data collection method.

Similarly, people using Twitter data would do well to consider issues of data diversity and not be blinded by large numbers of users and data points.  Certainly Twitter is bound to be a part of understanding consumer opinions, but the size of the dataset alone will not guarantee that it will be a central part.  Given these issues, either Twitter will start to diversify the ways that it collects consumer sentiment data or the best semantic search algorithms will eventually use Twitter data as but one narrowly targeted input of many.

– Ravi Iyer

by    in interest graph, Market Research, Pop Culture

Hierarchical Clustering of a Ranker list of Beers

This is a guest post by Markus Pudenz.

Ranker is currently exploring ways to visualize the millions of votes collected on various topics each month.  I’ve recently begun using hierarchical cluster analysis to produce taxonomies (also known as dendrograms), and applied these techniques to Ranker’s Best Beers from Around the World. A dendrogram allows one to visualize the relationships in voting patterns (scroll down to see what a dendrogram looks like). What hierarchical clustering does is break down the list into related groups based on voting patterns of the users, grouping like items with items that were voted similarly by the same users. The algorithm is agglomerative, meaning it starts with individual items and combines them iteratively until one large cluster (all of the beers in the list) remains.

Every beer in our dendrogram is related to another at some level, whether in the original cluster or further down the dendrogram. See the height axis on the left side? The lower the cluster is on the axis, the closer the relationship the beers will have. For example, the cluster containing Guinness and Guinness Original is the lowest in this dendrogram, indicating these two beers have the closest relationship based on the voting patterns. Regarding our list, voters have the option to Vote Up or Vote Down any beer they want. Let’s start at the top of the dendrogram and work our way down.
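To show what "agglomerative" means in practice, here is a small self-contained sketch. It assumes average linkage and a simple distance (the fraction of users whose votes on two beers differ); the analysis in this post may have used a different linkage or distance, and the votes below are invented for illustration.

```python
def vote_distance(u, v):
    """Fraction of users whose votes on two beers differ."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)

def agglomerate(votes):
    """Average-linkage agglomerative clustering.

    votes: dict mapping beer -> list of votes per user (+1 up, -1 down, 0 none).
    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(vote_distance(votes[a], votes[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in votes]  # start with one cluster per beer
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        height, i, j = min(
            (linkage(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical votes from five users.
votes = {
    "Guinness":          [1, 1, 0, -1, 1],
    "Guinness Original": [1, 1, 0, -1, 0],
    "Chimay Blue":       [-1, 0, 1, 1, 0],
    "Chimay Red":        [-1, 0, 1, 1, 1],
}

merges = agglomerate(votes)
for left, right, height in merges:
    print(f"{left} + {right} at height {height:.2f}")
```

Each merge height is what the dendrogram's vertical axis shows: beers that join low on the axis were voted on most similarly, and the final merge joins everything into one cluster.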

Hierarchical Clustering of Beer Preferences

Looking at the first split of the clusters, one can observe the cluster on the right contains beers that would generally be considered well-known including Guinness, Sam Adams, Heineken and Corona. In fact, the cluster on the right includes seven of the top ten beers from the list. The fact that most of our popular beers are in this right cluster indicates that there is a strong order effect with voters more likely to select beers that are more popular when ranking their favorite beers. For example, if someone selects a beer that is in the top ten, then another beer they select is also more likely to be in the top ten.

As we examine the right cluster further, the first split divides the cluster into two smaller clusters. In the left cluster, we can clearly see, unsurprisingly, that a drinker who likes Guinness is more likely to vote for another variety of Guinness. This left cluster is comprised almost entirely of Guinness varieties with the exception of Murphy’s Irish Stout. The right cluster lists a larger variety of beer makers including Sam Adams, Stella Artois and Pyramid. In addition, none of the beers in this right cluster are stouts, unlike those in the left cluster. The only brewer in this right cluster with multiple varieties is Sam Adams with Boston Lager and Octoberfest, meaning drinkers in this cluster were not as brand loyal as in the left cluster. Drinkers in this cluster were more likely to select a beer variety from a different brewer. When reviewing this cluster from the first split in the dendrogram, there is a clearly defined split between those drinkers who prefer a heavier beer (stout) and those who prefer lighter beers like lagers, pilseners, pale ales or hefeweizen.

Conversely, for beers in the left cluster, drinkers are more likely to vote for other beers that are not as popular with only three of the top ten beers in this cluster. In addition, because of the larger size, the range of beer styles and brewers for this cluster is more varied as opposed to those in the right cluster. The left cluster splits into three smaller clusters before splitting further. One cluster that is clearly distinct is the second of these clusters. This cluster is comprised almost entirely of Belgian style beers with the only exception being Pliny the Elder, an IPA. La Fin du Monde is a Belgian style tripel from Quebec with the remaining brewers from Belgium. One split within this cluster is comprised entirely of beer varieties from Chimay indicating a strong relationship; voters who select Chimay are more likely to also select a different style from Chimay when ranking their favorites.  Our remaining clusters have a little more variety. Our first cluster, the smallest of the three, has a strong representation from California with varieties from Stone, Sierra Nevada and Anchor Steam taking four out of six nodes in the cluster. Stone IPA and Stone Arrogant Bastard Ale have the strongest relationship in this cluster. Our third cluster, the largest of the three, has even more variety than the first. We see a strong relationship especially with Hoegaarden and Leffe.

I was also curious as to whether the beers in the top ten were associated with larger or smaller breweries. As the following list shows,  there is an even split between the larger conglomerates like AB InBev, Diageo, Miller Coors and independent breweries like New Belgium and Sierra Nevada.

  1. Guinness (Diageo)
  2. Newcastle (Heineken)
  3. Sam Adams Boston Lager (Boston Beer Company)
  4. Stella Artois (AB InBev)
  5. Fat Tire (New Belgium Brewing Company)
  6. Sierra Nevada Pale Ale (Sierra Nevada Brewing Company)
  7. Blue Moon (Miller Coors)
  8. Stone IPA (Stone Brewing Company)
  9. Guinness Original (Diageo)
  10. Hoegaarden Witbier (AB InBev)

Markus Pudenz