
Applying Machine Learning to the Diversity within our Worst Presidents List

Ranker visitors come from a diverse array of backgrounds, perspectives, and opinions.  That diversity, however, is often lost when we look at a list's overall rankings, because the rankings reflect a raw average of all the votes on a given item, regardless of how voters behave on other items.  It would be useful, then, to learn more about how users vote across a range of items and to recover some of the diversity inherent in how people vote on our lists.

Take, for instance, one of our most popular lists: Ranking the Worst U.S. Presidents, which has been voted on by over 60,000 people and comprises over half a million votes.

In this partisan age, it is easy to imagine that such a list would create some discord, and when we look at the average voting behavior of all voters, the list itself shows some inconsistencies.  For instance, the five worst-rated presidents alternate along party lines, which is unlikely to reflect a historically accurate account of which presidents were actually the worst.  The result is a list that represents our partisan opinions about our nation's presidents:

 

ListScreenShot

 

The list itself provides an interesting glimpse of what happens when two parties collide in voting for the worst presidents, but we are missing data that could tell us how diverse our visitors are.  So how can we reconstruct the distinct groups of voters on the list and see how each cluster of voters ranks the presidents?

To solve this, we turn to a common machine learning technique called k-means clustering.  K-means takes each user's voting data, summarizes it as a vector, and then finds other users with similar voting patterns.  The algorithm is given no information from me as the data scientist and has no idea what the data mean; it simply looks at each Ranker visitor's votes, finds people who vote similarly, and groups those voting patterns according to the data themselves.  K-means can be run with as many clusters as you like, and there are standard ways to decide how many clusters to use.  Once the clusters are drawn, I re-rank the presidents for each cluster using Ranker's ranking algorithm, and then we can see how the different clusters ranked the presidents.
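For readers who want to see the mechanics, here is a minimal sketch of this kind of clustering in Python with pandas and scikit-learn.  The file name, column names, and +1/-1 vote coding are illustrative assumptions, not Ranker's actual pipeline or re-ranking algorithm.

```python
# A minimal sketch, not Ranker's production code.  Assumes a hypothetical
# votes.csv with columns user_id, president, vote (vote = +1 for an upvote
# as "worse", -1 for a downvote).
import pandas as pd
from sklearn.cluster import KMeans

votes = pd.read_csv("votes.csv")

# Pivot to a user x president matrix; items a user never voted on become 0.
matrix = votes.pivot_table(index="user_id", columns="president",
                           values="vote", fill_value=0)

# The "elbow" heuristic: fit k-means for several values of k and watch how
# quickly the total within-cluster distance (inertia) stops improving.
for k in (2, 3, 4, 5, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(matrix)
    print(k, km.inertia_)

# Re-fit with the chosen k and label every voter with a cluster, so the
# presidents can then be re-ranked separately within each cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
matrix["cluster"] = labels
```

The two-cluster and five-cluster analyses below correspond to fixing k at 2 and at 5.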

As it happens, there are some differences in how clusters of Ranker visitors voted on the list.  In a two-cluster analysis, we find two groups of people with almost completely opposite voting behavior.

(*Note that because this is a list of the worst presidents, voters are not ranking the presidents from best to worst; the ranking reflects how much worse each president is considered to be compared to the others.)

The k-means analysis found one cluster that appears to think Republican presidents are worst:

ClusterOneB

Here is the other cluster, with opposite voting behavior:

ClusterTwoB

In this two-cluster analysis, the shape of the data is pretty clear and fits our preconceived picture of how partisan voters might behave on the list.  But there is a bias toward recent presidents, and the lists do not resemble academic lists and polls ranking the worst presidents.

To explore the data further, I ran a five-cluster analysis; in other words, I looked for five different types of voters in the data.

Here is what the five-cluster analysis returned:

FiveClusterRankings

The results show a little more diversity in how the clusters ranked the presidents.  Again, we see some clusters that are more or less voting along party lines based on recent presidents (Clusters 5 and 4).  Clusters 1 and 3 are also interesting in that the algorithm seems to be picking up visitors who are voting for people who have not been president (Hillary Clinton, Ben Carson) or, thankfully, were never president (Adolf Hitler).  Clusters 2 and 3 are the most interesting to me, however, as they show a greater resemblance to academic lists of the worst presidents (for reference, see Wikipedia's rankings of presidents), while tending toward a more historical bent on how we think of these presidents.  I think of this as a more informed partisanship.

By understanding the diverse sets of users that make up our crowdranked lists, we can improve our overall rankings and provide a more nuanced understanding of how different groups' opinions compare, beyond the demographic groups we currently expose on our Ultimate Lists.  Such analyses also help us detect outliers and agenda-pushers in the voting patterns, and allow us to rebalance our sample to produce lists that more closely resemble a national average.

– Glenn Fox


A Ranker Opinion Graph of Important Life Goals

What does it mean to be successful, and what life goals should we set to get there? Is spending time with family most important? What about your career?  We asked people on Ranker to rank their life goals in order of importance, and using a layout algorithm (ForceAtlas in Gephi) we identified goal categories and arranged the goals so that closely related goals sit nearer to each other.

The connecting lines in the graph represent significant correlations between different life goals, with thicker lines indicating stronger relationships.  The colors differentiate the groups that emerged from a cluster analysis.
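For the technically curious, here is a rough, analogous sketch in Python with pandas and networkx.  The graph shown below was actually built in Gephi with ForceAtlas; the file name, column layout, and correlation threshold in the sketch are assumptions for illustration.

```python
# A rough stand-in for the Gephi/ForceAtlas workflow, under assumed inputs:
# a hypothetical goal_ranks.csv with one row per user and one column per
# life goal, holding that user's importance ranking for the goal.
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

ranks = pd.read_csv("goal_ranks.csv")
corr = ranks.corr()  # pairwise correlations between goals across users

# Keep only the stronger positive correlations as edges.
G = nx.Graph()
for a in corr.columns:
    for b in corr.columns:
        if a < b and corr.loc[a, b] > 0.3:  # illustrative threshold
            G.add_edge(a, b, weight=corr.loc[a, b])

# A force-directed layout pulls correlated goals toward each other, much
# as ForceAtlas does in Gephi.
pos = nx.spring_layout(G, weight="weight", seed=42)
nx.draw_networkx(G, pos, node_size=300, font_size=8)
plt.show()
```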

all_black

The classification algorithm produced 5 main life goal clusters:
(1) Religion/Spirituality (e.g., Christian values, achieving Religion & Spirituality),
(2) Achievement and Material Goods (e.g., being a leader, avoiding failure, having money/wealth),
(3) Interpersonal Involvement/Moral Values (e.g., sharing life, doing the right thing, being inspiring),
(4) Personal Growth (e.g., achieving wisdom & serenity, pursuing ideals and passions, peace of mind), and
(5) Emotional/Physical Well-Being (e.g., being healthy, enjoying life, being happy).

These clusters match well with those identified in Robert Emmons's (1999) psychological research on goal pursuit and well-being. Emmons found that life goals form four primary categories: work and achievement, relationships and intimacy, religion and spirituality, and generativity (leaving a legacy/contributing to society).

However, not all goals are created equal.  While success-related goals may help us get ahead in life, they also have downsides.  People who focus on zero-sum goals such as work and achievement tend to report less happiness and life satisfaction than people who pursue other kinds of goals. Our data also show a large divide between Well-Being and Work/Achievement goals, with virtually no overlap between these two groups.

Other interesting relationships in our graph:

  • Goals related to moral values (e.g., doing the right thing) were clustered with (and therefore more closely related to) interpersonal goals than they were to religious goals.
  • Sexuality was related to goals from opposite ends of the space in unique ways. Well-being goals were related to sexual intimacy whereas Achievement goals were related to promiscuity.
  • While most goal clusters were primarily made up of goals for pursuing positive outcomes, the Achievement/Material Goods goal cluster also included the most goals related to avoiding negative consequences (e.g., avoiding failure, avoiding effort, never going to jail).
  • Our Personal Growth goal cluster differs from many of the traditional goal taxonomies in the psychological literature, and our data did not show the typical goal cluster related to Generativity. This may reflect a shift in goal striving from community growth to personal growth.

– Kate Johnson

Citation: Emmons, R. A. (1999). The psychology of ultimate concerns: Motivation and spirituality in personality. New York: Guilford Press.

 

Ranker Opinion Graph: The Best Froyo Toppings

It's hard to resist a cold treat on a hot summer afternoon, and frozen yogurt shops, with their arrays of flavors and toppings, have a little something for everyone. Once you're done agonizing over whether you want New York cheesecake or wild berry froyo (and trying a sample of each at least twice), it's time for the topping bar. But which toppings should you choose? We asked people on Ranker to vote for their favorite frozen yogurt toppings from a list of 32, and they responded with over 7,500 votes.

The Top 5 Frozen Yogurt Toppings (by number of upvotes):
1. Oreo (235 votes)
2. Strawberries (225 votes)
3. Brownie bits (223 votes)
4. Hot fudge (216 votes)
5. Whipped cream (201 votes)

But let's be honest: who can choose just ONE topping for their froyo? Using Gephi and data from Ranker's Opinion Graph, we ran a cluster analysis on people's favorite froyo topping votes to determine which toppings people like to eat together. In the graph, larger circles indicate toppings that are liked alongside more of the other toppings. Most of the versatile toppings were either a syrup (like strawberry sauce) or a chocolate candy (like Reese's Pieces).

froyo

The 10 Most Versatile Froyo Toppings:

1. Strawberry sauce
2. Snickers
3. Magic Shell
4. White chocolate chips
5. Peanut butter chips
6. Butterscotch syrup
7. Nestlé Butterfinger
8. Reese’s Pieces
9. M&Ms
10. Brownie bits

 

Using the modularity clustering tool in Gephi, we were then able to sort toppings into groups based on which toppings people were most likely to upvote together (a rough sketch of this kind of co-vote clustering appears after the lists below). We identified 4 kinds of froyo topping lovers:

1. Fruit and Nuts (Blue): This cluster is all about the fruits and nuts. These people love strawberry sauce, sliced almonds, and maraschino cherries.

2. Chocolate (Purple): This cluster encompasses all things chocolate. These people love Magic Shell, brownie bits, and chocolate syrup.

3. Sugar Candy (Green): This cluster is made up of pure sugar. These people love gummy worms, rainbow sprinkles, and Skittles.

4. Salty and Cake (Red): This cluster encompasses cake bites and toppings with a salty taste to them. These people like Snickers, cheesecake bits, and caramel syrup.

Some additional thoughts:

  • Banana was a strange topping; it was linked only with Snickers.
  • People who like nuts like both fruit and items from the salty category.
  • People who like blueberries only like other fruits.
  • People who like sugar items like gummy worms also like chocolate, but don't particularly like fruit.
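As promised above, here is a rough sketch of how this kind of co-vote clustering can be done programmatically.  The real analysis was run in Gephi, whose modularity tool is based on the Louvain method; this stand-in uses networkx's greedy modularity maximization instead, and the input file name and columns are assumptions for illustration.

```python
# A rough stand-in for the Gephi workflow described above, not the actual
# analysis.  Assumes a hypothetical topping_votes.csv with columns
# user_id and topping, one row per upvote.
import itertools
import pandas as pd
import networkx as nx
from networkx.algorithms import community

votes = pd.read_csv("topping_votes.csv")

# Weight each pair of toppings by how many users upvoted both of them.
G = nx.Graph()
for _, group in votes.groupby("user_id"):
    for a, b in itertools.combinations(sorted(group["topping"].unique()), 2):
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

# Partition the co-vote graph into communities of toppings that tend to be
# upvoted together.
clusters = community.greedy_modularity_communities(G, weight="weight")
for i, c in enumerate(clusters, 1):
    print(f"Cluster {i}: {sorted(c)}")
```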

 

– Kate Johnson


Hierarchical Clustering of a Ranker List of Beers

This is a guest post by Markus Pudenz.

Ranker is currently exploring ways to visualize the millions of votes collected on various topics each month.  I've recently begun using hierarchical cluster analysis to produce taxonomies (visualized as dendrograms), and I applied these techniques to Ranker's Best Beers from Around the World. A dendrogram lets us visualize relationships between items based on voting patterns (scroll down to see what one looks like). Hierarchical clustering breaks the list down into related groups, grouping together items that were voted on similarly by the same users. The algorithm is agglomerative, meaning it starts with individual items and combines them iteratively until one large cluster (all of the beers in the list) remains.
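To make the procedure concrete, here is a minimal sketch of agglomerative clustering with pandas and SciPy.  The input file, the +1/-1 vote coding, and the choice of Ward linkage are assumptions for illustration, not necessarily the exact settings behind the dendrogram below.

```python
# A minimal sketch of agglomerative clustering on item vote vectors, under
# assumed inputs: a hypothetical beer_votes.csv with columns user_id, beer,
# and vote (+1 for Vote Up, -1 for Vote Down).
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

votes = pd.read_csv("beer_votes.csv")

# One row per beer, one column per user; unvoted cells become 0.
items = votes.pivot_table(index="beer", columns="user_id",
                          values="vote", fill_value=0)

# Agglomerative clustering: start with single beers and merge the most
# similar groups step by step; Ward linkage keeps merged clusters compact.
Z = linkage(items.values, method="ward")

plt.figure(figsize=(12, 6))
dendrogram(Z, labels=items.index.tolist(), leaf_rotation=90)
plt.ylabel("Height (merge distance)")
plt.tight_layout()
plt.show()
```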

Every beer in our dendrogram is related to every other at some level, whether in the original cluster or further down the tree. See the height axis on the left side? The lower a cluster sits on that axis, the closer the relationship between the beers it contains. For example, the cluster containing Guinness and Guinness Original is the lowest in this dendrogram, indicating that these two beers have the closest relationship based on voting patterns. On our list, voters have the option to Vote Up or Vote Down any beer they want. Let's start at the top of the dendrogram and work our way down.

Hierarchical Clustering of Beer Preferences

Looking at the first split, the cluster on the right contains beers that would generally be considered well known, including Guinness, Sam Adams, Heineken and Corona. In fact, the right cluster includes seven of the top ten beers from the list. The concentration of popular beers in this cluster suggests a strong order effect: voters are more likely to select beers that are already popular when ranking their favorites, so if someone votes for a beer in the top ten, another beer they vote for is also more likely to be in the top ten.

As we examine the right cluster further, its first split divides it into two smaller clusters. The left of these is made up almost entirely of Guinness varieties, with the exception of Murphy's Irish Stout; unsurprisingly, a drinker who likes Guinness is more likely to vote for another variety of Guinness. The right of these lists a larger variety of brewers, including Sam Adams, Stella Artois and Pyramid, and unlike the left, none of its beers are stouts. The only brewer in this right cluster with multiple varieties is Sam Adams (Boston Lager and Octoberfest), meaning drinkers here were not as brand-loyal as in the left cluster and were more likely to select varieties from different brewers. Viewed from the first split in the dendrogram, there is a clearly defined divide between drinkers who prefer a heavier beer (stout) and those who prefer lighter beers like lagers, pilsners, pale ales or hefeweizens.

Conversely, drinkers voting on beers in the left cluster are more likely to vote for beers that are less popular; only three of the top ten beers appear in this cluster. In addition, because of its larger size, the range of beer styles and brewers in this cluster is more varied than in the right cluster. The left cluster splits into three smaller clusters before splitting further. One of these is clearly distinct: the second cluster is made up almost entirely of Belgian-style beers, the only exception being Pliny the Elder, an IPA. La Fin du Monde is a Belgian-style tripel from Quebec, and the remaining brewers are from Belgium. One split within this cluster consists entirely of beer varieties from Chimay, indicating a strong relationship: voters who select Chimay are more likely to also select a different style from Chimay when ranking their favorites.  Our remaining clusters have a little more variety. The first cluster, the smallest of the three, has a strong representation from California, with varieties from Stone, Sierra Nevada and Anchor Steam taking four of its six nodes; Stone IPA and Stone Arrogant Bastard Ale have the strongest relationship here. The third cluster, the largest of the three, has even more variety than the first, with an especially strong relationship between Hoegaarden and Leffe.

I was also curious whether the beers in the top ten were associated with larger or smaller breweries. As the following list shows, there is an even split between larger conglomerates like AB InBev, Diageo and Miller Coors and independent breweries like New Belgium and Sierra Nevada.

  1. Guinness (Diageo)
  2. Newcastle (Heineken)
  3. Sam Adams Boston Lager (Boston Beer Company)
  4. Stella Artois (AB InBev)
  5. Fat Tire (New Belgium Brewing Company)
  6. Sierra Nevada Pale Ale (Sierra Nevada Brewing Company)
  7. Blue Moon (Miller Coors)
  8. Stone IPA (Stone Brewing Company)
  9. Guinness Original (Diageo)
  10. Hoegaarden Witbier (AB InBev)

– Markus Pudenz