by   Ranker
in Data, Data Science, Popular Lists

Applying Machine Learning to the Diversity within our Worst Presidents List

Ranker visitors come from a diverse array of backgrounds, perspectives, and opinions.  That diversity, however, is often lost in the overall rankings of our lists, because the rankings reflect a raw average of all the votes on a given item, regardless of how voters behave on other items.  It would be useful, then, to figure out more about how users are voting across a range of items, and to recover some of the diversity inherent in how people vote on the lists.

Take, for instance, one of our most popular lists: Ranking the Worst U.S. Presidents, which has been voted on by over 60,000 people and comprises over half a million votes.

In this partisan age, it is easy to imagine that such a list would create some discord, and indeed, when we look at the average voting behavior of all the voters, the list itself has some inconsistencies.  For instance, the five worst-rated presidents alternate along party lines–which is unlikely to represent a historically accurate account of which presidents were actually the worst.  The result is a list that reflects our partisan opinions about our nation’s presidents:




The list itself provides an interesting glimpse of what happens when two parties collide in voting for the worst presidents, but we are missing interesting data that could inform us about how diverse our visitors are.  So how can we reconstruct the diverse groups of voters on the list, so that we can see how clusters of voters might be ranking it?

To solve this, we turn to a common machine learning technique referred to as “k-means clustering.” K-means clustering takes the voting data for each user, summarizes it, and then finds other users with similar voting patterns.  The algorithm is given no information whatsoever from me as the data scientist, and has no real idea what the data mean; it simply looks at each Ranker visitor’s votes, finds people who vote similarly, and clusters the patterns according to the data themselves.  K-means can parse the data into as many clusters as you like, and there are standard ways to determine how many clusters should be used.  Once the clusters are drawn, I re-rank the presidents for each cluster using Ranker’s algorithm, and then we can see how different clusters ranked the presidents.
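For readers who want to see the mechanics, here is a minimal sketch using synthetic data rather than Ranker's: clustering voters with k-means, then re-ranking within each cluster. The +1/-1 vote coding and the mean-vote re-ranking are simplifying assumptions for the example, not Ranker's actual algorithm.

```python
# A minimal, illustrative sketch (not Ranker's pipeline): cluster voters
# with k-means on a user-by-president vote matrix, where +1 = upvoted as
# "worst", -1 = downvoted.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: 200 voters, 10 presidents, two opposing partisan blocs that
# follow opposite voting patterns (10% of votes flipped as noise).
pattern = np.tile([1, -1], 5)
flips_a = rng.choice([1, -1], size=(100, 10), p=[0.9, 0.1])
flips_b = rng.choice([1, -1], size=(100, 10), p=[0.9, 0.1])
votes = np.vstack([pattern * flips_a, -pattern * flips_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(votes)

# Re-rank the presidents within each cluster by mean vote (a simple
# stand-in for Ranker's own ranking algorithm).
for c in range(2):
    mean_vote = votes[km.labels_ == c].mean(axis=0)
    worst_first = np.argsort(-mean_vote)
    print(f"Cluster {c}, presidents from worst to best: {worst_first}")
```

Changing `n_clusters` from 2 to 5 gives the analogue of the five-cluster analysis discussed later in this post.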

As it happens, there are some differences in how clusters of Ranker visitors voted on the list.  In a two-cluster analysis, we find two groups of people with almost completely opposite voting behavior.

(*Note that since this is a list of votes on the worst presidents, voters are not ranking the presidents from best to worst; rather, the ranking reflects how much worse each president is considered compared to the others.)

The k-means analysis found one cluster that appears to think Republican presidents are worst:


Here is the other cluster, with opposite voting behavior:


In this two-cluster analysis, the shape of the data is pretty clear, and fits our preconceived picture of how partisan politics might be voting on the list.  But there is a bias toward recent presidents, and the lists do not mimic academic lists and polls ranking the worst presidents.

To explore the data further, I used a five-cluster analysis–in other words, looking for five different types of voters in the data.

Here is what the five-cluster analysis returned:


The results show a little more diversity in how the clusters ranked the presidents.  Again, we see some clusters that are more or less voting along party lines based on recent presidents (Clusters 4 and 5).  Clusters 1 and 3 are also interesting in that the algorithm seems to be picking up visitors who are voting for people who have not been president (Hillary Clinton, Ben Carson) or who, thankfully, never were (Adolf Hitler).  Clusters 2 and 3 are most interesting to me, however, as they bear a greater resemblance to academic lists of the worst presidents (for reference, see Wikipedia’s rankings of presidents), with a more historical bent on how we think of these presidents–I think of this as a more informed partisanship.

By understanding the diverse sets of users that make up our crowdranked lists, we can improve our overall rankings and provide a more nuanced understanding of how different groups’ opinions compare, beyond the demographic groups we currently expose on our Ultimate Lists.  Such analyses help us identify outliers and agenda-pushers in the voting patterns, and allow us to rebalance our sample to make lists that more closely resemble a national average.

  • Glenn Fox



by   Ranker
in Data Science, Popular Lists, Rankings

In Good Company: Varieties of Women we would like to Drink With


They say you’re defined by the company you keep.  But how are you defined by the company you want to keep?

The list “Famous Women You’d Want to Have a Beer With” provides an interesting way to examine this idea: how people vote on this list can tell us something about the kind of person doing the voting.

We can think of people as having many traits, or dimensions.  The traits that matter most to a voter will drive that voter’s rankings.  For instance, some people may rank the list based on how funny each woman is, and so be more inclined to rate comedians higher than dramatic actresses.  Others may vote purely on attractiveness, or on singing talent, and so on.  It may be that some people rank comedians and singers in a certain way, whereas others would only spend time with models and actresses.  By examining how people rank the various celebrities along these dimensions, we can learn something about the people doing the voting.

The rankings on the site, however, are based on the sum of all the voters’ behavior on the list, so the final rankings do not tell us how particular types of people are voting.  While we could manually sort the celebrities according to their traits, i.e., put comedians with comedians and singers with singers, we would risk using our own biases to put voters into categories where they do not naturally belong.  It would be much better to let the voters’ own votes decide how the celebrities should be clustered.  To do this, we can use some fancy-math techniques from machine learning, called clustering algorithms, to let a computer examine the voting patterns and tell us which patterns are similar across voters.  In other words, we use the algorithm to find patterns in the voting data, group similar patterns together into clusters of voters, and then examine how each cluster ranked the celebrities.  How each group ranked the celebrities tells us something about the group, and about the type of people they would like to keep them company.
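A side note for the technically inclined: the number of groups to look for is itself a modeling choice. One common heuristic, though not necessarily the one used for this analysis, is to pick the cluster count that maximizes the silhouette score, sketched here on synthetic votes:

```python
# Choosing the number of voter clusters via silhouette score (a common
# heuristic; purely illustrative, on synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)

# Toy vote matrix: 150 voters x 20 celebrities, three underlying "tastes".
centers = rng.choice([1.0, -1.0], size=(3, 20))
votes = np.vstack([c + 0.3 * rng.standard_normal((50, 20)) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(votes)
    scores[k] = silhouette_score(votes, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the synthetic data has three tastes, so 3 should win
```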

As it happens, using this approach actually finds unique clusters, or groups, in the voting data, and we can then guess for ourselves how the voters from each group can be defined based on the company they wish to keep.

Here are the results:

Cluster 1:


Cluster 1 consists of women known for being funny, including established comedians like Carol Burnett and Ellen DeGeneres. What is interesting is that Emma Stone and Jennifer Lawrence are also included: though both rank highly on lists based on physical attractiveness, they also have a reputation for being funny, and the clustering algorithm shows that voters often categorize them alongside other funny women.  Among the clusters, this one has the highest proportion of female voters, which may explain why the celebrities are ranked along dimensions other than attractiveness.


Cluster 2:


Cluster 2 appears to consist of celebrities who sit more in the nerdy camp, such as Yvonne Strahovski and Morena Baccarin, both of whom play roles on shows popular with science fiction fans.  At the bottom of this list we see something of a contrarian streak as well, with downvotes handed out to some of the best-known celebrities who rank highly on the list overall.

Cluster 3:


Cluster 3 is a bit more of a puzzle.  The celebrities tend to be a bit older and come from a wide variety of backgrounds, and are less known for a single role or attribute.  This cluster may be basing its votes on a celebrity’s degree of uniqueness, in contrast with its bottom-ranked celebrities, who are among the most common and regularly listed female celebrities on Ranker.

Cluster 4:


We would also expect a list such as this to be heavily correlated with physical attractiveness, or with a celebrity’s role as a model.  Cluster 4 is perhaps the best example of this, and likely represents our youngest cluster.  Its top-ranked women are from the entertainment sector and are known for their looks, whereas its bottom-ranked entries are from politics or comedy, or are older and probably less well known to younger voters.  As we might expect, this cluster also has a high proportion of younger voters.

Here is the list of the top and bottom ten for each cluster (note that the order within these lists is not particularly important, since the celebrities’ scores are very close to one another):




In the end, the adage that we are defined by the company we keep appears to have some merit–and one that can be detected with machine learning approaches.  Though the split among the groups is not perfect, each cluster showed trends that drew its members together.  We are using these approaches to help improve the site and to provide better content to our visitors.


–Glenn R. Fox, PhD



by   Ranker
in Data, Data Science, Entertainment, Opinion Graph, Pop Culture

A Ranker World of Comedy Opinion Graph: Who Connects the Funny Universe?

In the previous post, we showed how a Gephi layout algorithm was able to capture different domains in the world of comedy across all of the Ranker lists tagged with the word “funny”.  These algorithms also give us information about the roles that individuals play within clusters. The size of a node indicates its ability to connect other nodes, so bigger nodes represent individuals who serve as gateways between different nodes and categories.  These are the nodes you would want to target to reach the broadest audience, since people who like these comedic individuals also like many others.  Sort of like having that one friend who knows everyone send out the event invite, instead of sending it to a smaller group of friends in your own social network and hoping it gets around. So who connects the comedic universe?
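That “gateway” role has a standard graph-theoretic counterpart: betweenness centrality, which measures how often a node lies on the shortest paths between other nodes. Here is a toy illustration; the graph below is invented for the example, not Ranker's actual data:

```python
# Finding "connector" nodes with betweenness centrality on a toy graph:
# two comedy cliques, with one node bridging them.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    # classic-comedy clique
    ("John Cleese", "Eric Idle"), ("Eric Idle", "Michael Palin"),
    ("John Cleese", "Michael Palin"),
    # contemporary-comedy clique
    ("Sarah Silverman", "Bill Hicks"), ("Bill Hicks", "Ricky Gervais"),
    ("Sarah Silverman", "Ricky Gervais"),
    # the bridge between the two
    ("Dave Chappelle", "John Cleese"), ("Dave Chappelle", "Ricky Gervais"),
])

centrality = nx.betweenness_centrality(G)
top = max(centrality, key=centrality.get)
print(top)  # Dave Chappelle: every cross-clique shortest path runs through him
```

Run on the full comedy graph, this is the kind of computation that crowns a superconnector.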

The short answer: Dave Chappelle (click to enlarge)


Dave Chappelle is the superconnector. He has both the largest number of direct connections and the largest number of overall connections. If you want to reach the most people, go to him. If you want to connect people between different kinds of comedy, go to him.  He is the center of the comedic universe. He’s not the only one with connections though.

Top 10 Overall Connectors

  1. Dave Chappelle 
  2. Eddie Izzard 
  3. John Cleese 
  4. Ricky Gervais
  5. Rowan Atkinson
  6. Eric Idle
  7. Billy Connolly
  8. Bill Hicks
  9. It’s Always Sunny In Philadelphia
  10. Sarah Silverman


We can also look at who the biggest connectors are between different comedy domains.

  • Contemporary TV Shows: It’s Always Sunny in Philadelphia, ALF, and The Daily Show are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Contemporary Comedians on American Television: Dave Chappelle, Eddie Izzard and Ricky Gervais are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Classic Comedians: John Cleese and Eric Idle are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Classic TV Shows: The Muppet Show and Monty Python’s Flying Circus are the strongest connectors. They provide bridges to Classic TV Comedians, Animated TV shows, and Classic Comedy Movies.
  • British Comedians: Rowan Atkinson is the strongest connector. He serves as a bridge to all of the other 6 comedy domains.
  • Animated TV Shows: South Park is the strongest connector. It serves as a bridge to Classic Comedians, Classic TV Shows, and British Comedians.
  • Classic Comedy Movies: None of the nodes in this domain were strong connectors to other domains, though National Lampoon’s Christmas Vacation was the strongest node in this network.


– Kate Johnson

A Ranker Opinion Graph of the Domains of the World of Comedy

One unique aspect of Ranker data is that people rank a wide variety of lists, allowing us to look at connections beyond the scope of any individual topic.  We compiled data from all of the lists on Ranker with the word “funny” to get a bigger picture of the interconnected world of comedy.  Using Gephi layout algorithms, we were able to create an Opinion Graph which categorizes comedy domains and identifies points of intersection between them (click to enlarge).


In the following graphs, colors indicate different comedic categories that emerged from a cluster analysis, and the connecting lines indicate correlations between different nodes with thicker lines indicating stronger relationships.  Circles (or nodes) that are closest together are most similar.  The classification algorithm produced 7 comedy domains:
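For the curious, the pipeline can be approximated in a few lines: treat each item's votes as a vector, connect items whose vote vectors correlate strongly, and let a modularity-based community algorithm carve out the domains. The sketch below uses synthetic data and networkx's greedy modularity as a stand-in for Gephi's tools; the 0.3 correlation threshold is an arbitrary choice for the example:

```python
# Building a small opinion graph from co-voting data and finding
# communities via modularity (illustrative; synthetic data).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(2)
items = [f"item_{i}" for i in range(12)]

# Synthetic votes: items 0-5 and 6-11 attract two opposite audiences.
base = rng.choice([1.0, -1.0], size=(300, 1))
votes = np.hstack([base + 0.5 * rng.standard_normal((300, 6)),
                   -base + 0.5 * rng.standard_normal((300, 6))])

# Edge = strong positive correlation between two items' vote vectors.
corr = np.corrcoef(votes, rowvar=False)
G = nx.Graph()
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        if corr[i, j] > 0.3:
            G.add_edge(items[i], items[j], weight=corr[i, j])

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])  # two six-item domains
```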


American TV Shows and Characters: 26% of comedy, central nodes = It’s Always Sunny in Philadelphia, ALF, The Daily Show, Chappelle’s Show, and Friends.

Contemporary Comedians on American Television: 25% of comedy, includes Dave Chappelle, Eddie Izzard, Ricky Gervais, Billy Connolly, and Bill Hicks.

Classic Comedians: 15% of comedy, central nodes = John Cleese, Eric Idle, Michael Palin, Charlie Chaplin, and George Carlin.

Classic TV Shows and Characters: 14% of comedy, central nodes = The Muppet Show, Monty Python’s Flying Circus, In Living Color, WKRP in Cincinnati, and The Carol Burnett Show.

British Comedians: 9% of comedy, central nodes = Rowan Atkinson, Jennifer Saunders, Stephen Fry, Hugh Laurie, and Dawn French.

Animated TV Shows and Characters: 9% of comedy, central nodes = South Park, Family Guy, Futurama, The Simpsons, and Moe Szyslak.

Classic Comedy Movies: 1.5% of comedy, central nodes = National Lampoon’s Christmas Vacation, Ghostbusters, Airplane!, Vacation, and Caddyshack.



Clusters that are the most similar (most overlap/closest together):

  • Classic TV Shows and Contemporary TV Shows
  • British Comedians and Classic TV shows
  • British Comedians and Contemporary Comedians on American Television
  • Animated TV Shows and Contemporary TV Shows

Clusters that are the most distinct (least overlap/furthest apart):

  • Classic Comedy Movies do not overlap with any other comedy domains
  • Animated TV Shows and British Comedians
  • Contemporary Comedians on American Television and Classic TV Shows


Take a look at our follow-up post on the individuals who connect the comedic universe.

– Kate Johnson


by   Ranker
in Data Science, prediction, Rankings

Cognitive Models for the Intelligent Aggregation of Lists

Ranker is constantly working to improve our crowdsourced list algorithms, in order to surface the best possible answers to the questions on our site.  As part of this effort, we work with leading academics who research the “wisdom of crowds”, and below is a poster we recently presented at the annual meeting of the Association for Psychological Science (led by Ravi Selker at the University of Amsterdam, in collaboration with Michael Lee from the University of California, Irvine).

While the math behind the aggregation model may be complex (a paper describing it in detail will hopefully be published shortly), the principle being demonstrated is relatively simple: aggregating lists using models that take into account the inferred expertise of each list-maker outperforms simple averaging, when compared against real-world ground truths (e.g., box office revenue).  Ranker’s algorithms for determining our crowdsourced rankings may be similarly complex, but they are similarly designed to produce the best answers possible.
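The flavor of the idea can be sketched in code. This is an illustration of the principle only, not the poster's cognitive model: alternate between forming a consensus ranking and re-weighting each list-maker by how well their list agrees with it.

```python
# Illustrative expertise-weighted rank aggregation (not the actual model).
import numpy as np
from scipy.stats import kendalltau

# Toy rankings of 8 items by 5 people; the true order is 0..7.
rankings = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7],   # expert
    [1, 0, 2, 3, 4, 5, 7, 6],   # near-expert
    [0, 2, 1, 3, 5, 4, 6, 7],   # near-expert
    [7, 6, 5, 4, 3, 2, 1, 0],   # contrarian
    [3, 1, 7, 0, 6, 2, 5, 4],   # mostly noise
])

weights = np.ones(len(rankings))
for _ in range(5):
    # Weighted-average rank per item, converted back to rank positions.
    avg = (weights[:, None] * rankings).sum(axis=0) / weights.sum()
    consensus = avg.argsort().argsort()
    # Inferred "expertise" = rank agreement with the consensus, floored at 0.
    weights = np.array([max(kendalltau(r, consensus)[0], 0.0)
                        for r in rankings])

print(consensus.tolist())  # the contrarian ends up with zero weight
```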




– Ravi Iyer

by   Ranker
in Data, Data Science, Opinion Graph

A Ranker Opinion Graph of Important Life Goals

What does it mean to be successful, and what life goals should we set in order to get there? Is spending time with family most important? What about your career?  We asked people to rank their life goals in order of importance on Ranker, and using a layout algorithm (ForceAtlas in Gephi), we determined goal categories and organized the goals into a layout that places closely related goals nearer to each other.

The connecting lines in the graph represent significant correlations between different life goals, with thicker lines indicating stronger relationships.  The colors differentiate the unique groups that emerged from a cluster analysis.  Click on the graph below to expand it.


The classification algorithm produced 5 main life goal clusters:
(1) Religion/Spirituality (e.g., Christian values, achieving Religion & Spirituality),
(2) Achievement and Material Goods (e.g., being a leader, avoiding failure, having money/wealth),
(3) Interpersonal Involvement/Moral Values (e.g., sharing life, doing the right thing, being inspiring),
(4) Personal Growth (e.g., achieving wisdom & serenity, pursuing ideals and passions, peace of mind), and
(5) Emotional/Physical Well-Being (e.g., being healthy, enjoying life, being happy).

These clusters match well with those identified in Robert Emmons’s (1999) psychological research on goal pursuit and well-being. Emmons found that life goals form four primary categories: work and achievement, relationships and intimacy, religion and spirituality, and generativity (leaving a legacy/contributing to society).

However, not all goals are created equal.  While success-related goals may help us get ahead in life, they also have downsides.  People who focus on zero-sum goals such as work and achievement tend to report less happiness and life satisfaction than people who pursue other types of goals. Our data also show a large divide between Well-being and Work/Achievement goals, with virtually no overlap between these two groups.

Other interesting relationships in our graph:

  • Goals related to moral values (e.g., doing the right thing) were clustered with (and therefore more closely related to) interpersonal goals than they were to religious goals.
  • Sexuality was related to goals from opposite ends of the space in unique ways. Well-being goals were related to sexual intimacy whereas Achievement goals were related to promiscuity.
  • While most goal clusters were primarily made up of goals for pursuing positive outcomes, the Achievement/Material Goods goal cluster also included the most goals related to avoiding negative consequences (e.g., avoiding failure, avoiding effort, never going to jail).
  • Our Personal Growth goal cluster is distinct from many of the traditional goal taxonomies in the psychological literature, and our data did not show the typical goal cluster related to Generativity. This may indicate a shift in goal striving from community growth to personal growth.

– Kate Johnson

Citation: Emmons, R. A. (1999). The psychology of ultimate concerns: Motivation and spirituality in personality. New York: Guilford Press.


by   Ranker
in Data, Data Science, Entertainment, Opinion Graph, Pop Culture, Trends

Changes in Opinion for House of Cards, The Walking Dead, Mad Men, & Workaholics

One of the coolest things about Ranker is that votes are recorded in real time as they happen, making it possible to track changes in people’s opinions. A list like “The Best Shows Currently on Air” generates heavy traffic due to the popularity of television shows on air and online. A television show can amass an impressive, almost cult-like following, and it is interesting to see how public opinion changes over time, why, and whether it corresponds to changes happening in the real world.

The figure below shows the pattern of change in the proportion of up-votes for the TV shows in this list, and highlights four shows: House of Cards, The Walking Dead, Mad Men, and Workaholics.


There is a steep decline in the proportion of up-votes in December of 2013 for House of Cards. Interestingly, this was during an interim period between seasons, when seemingly nothing significant relating to the show was occurring. A plausible explanation is a ceiling effect: there were few up-votes and no down-votes until that time. When a show first appears on a Ranker list, it is often voted on only by its fans. Because the show is only accessible through Netflix, its viewing audience is significantly smaller than that of cable or network television shows, which may further limit the number of people who knew enough about the show to consider downvoting it. Fascinatingly enough, in the same month, during a televised meeting with tech industry CEOs on NSA surveillance, President Obama expressed his love for the show, stating “I wish things were that ruthlessly efficient,” and adding that Rep. Frank Underwood, played by Kevin Spacey, “is getting a lot of stuff done”. Could the increase in downvotes be due to certain members of the public expressing their opinions about the President through their voting on House of Cards on Ranker?
The entire second season of House of Cards was released on February 14th on Netflix in the same binge-watching format as the first season, and garnered positive reviews. Interestingly, there is a significant decline in the proportion of up-votes for House of Cards from February 2014 to April 2014, even though early reports suggested viewership of season two was much higher than season one. The show also earned critical acclaim for season two, with thirteen nominations for the 66th Primetime Emmy Awards and three nominations at both the 72nd Golden Globe Awards and the 21st Screen Actors Guild Awards. Given the viewership and critical success, such a steep drop in votes may seem surprising. But in Ranker data it is common for shows to accumulate more downvotes over time as they become better known, since people rarely downvote things they haven’t heard of, even as a show also receives more upvotes. This is why our algorithms take into account both the volume and the proportion of upvotes vs. downvotes.
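That last point, that both the volume and the proportion of votes matter, can be illustrated with a standard scoring rule (purely hypothetical here, since Ranker's actual formula is not described): the Wilson score lower bound, which discounts items with few votes.

```python
# The Wilson score lower bound: one standard way to combine vote volume
# and proportion (illustrative; not Ranker's formula).
import math

def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the 95% confidence interval on the true upvote rate."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom

# 9/10 upvotes scores well below 900/1000: volume matters, not just proportion.
print(wilson_lower_bound(9, 1) < wilson_lower_bound(900, 100))  # True
```

Under this kind of score, an item needs both a high upvote rate and enough votes to rank highly.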
Shows that are more readily accessible may exhibit less of a ceiling effect early on, as there is a greater likelihood that people who aren’t specifically looking for a show will still watch it. Looking at Mad Men and The Walking Dead, there is a steady increase in up-vote proportion over the span votes were submitted, from June 2013 to last month, April 2015. The Walking Dead is the most-watched drama series telecast in basic cable history, making it reasonable to assume that the continual increase is due to the growing number of fans who vote for it as the “Best Show Currently on Air”. Mad Men fans show similar voting patterns.

For a show like Workaholics, which airs on Comedy Central, there is a significantly smaller viewing audience compared to national networks, and it does not have the fanbase power of House of Cards or The Walking Dead. However, it is a show with positive reviews and a steady following of loyal fans. Though it is not as popular as other shows on the air, it has proven to be a show with comedic talent that generates positive sentiment amongst its viewers and a growing proportion of up-votes.
While these examples are only suggestive, the enormous number of votes made by Ranker users, and the variety of topics they cover, make the possibility of measuring opinions, and of detecting and understanding changes in opinion, an intriguing one worth continuing to expand upon.
-Emily Liu

by   Ranker
in Opinion Graph, Ranker Comics

A Cluster Analysis of the Superpower Opinion Graph produces 5 Superhero types

If you could have one superpower, which would you choose?  Data from the Ranker list “Badass Superpowers We’d Give Anything to Have” improves on the age-old classroom icebreaker by letting people rank all of the superpowers in order of how much they would want them.  Because really, unless you’re one of the X-Men, you would probably want more than one power. So, if you could have a collection of superpowers, what kind of superhero would you be?

Using Gephi and data from Ranker’s Opinion Graph, we ran a cluster analysis on people’s votes on the superpowers list to determine what groupings of superpowers different people wanted.

This analysis grouped superpowers into 5 clusters, which we interpreted to represent unique superhero types.


The Overall Superpower Opinion Graph




The 5 Types of Superheroes


1. The Creationist God: This superhero type is characterized by creation and destruction, Old-Testament Christian God-style. Notable superpowers: the ability to create/destroy worlds, die and come back to life, have gods’ weapons (Thor’s Hammer, Zeus’ Thunderbolt), remove others’ senses, and resurrect the dead.


2. The Time Lord: This superhero type is basically the Doctor from Doctor Who. Notable superpowers: omnipotence, travel to other dimensions, open portals to anywhere, and travel beyond the omniverse.


3. The Elementalist: This superhero type has the ability to manipulate the elements and use them as weapons. Notable superpowers: manipulation of water, fire, weather, and plants, along with the ability to shapeshift and to shoot ice, lightning, and fire.


4. The Superhuman: This superhero type is humans+, with enhanced human senses and decreased human limitations. Notable superpowers: sense danger, x-ray vision, walk through walls, super speed, mind reading, flight, super strength, and enhanced flexibility.


5. The Zen Master: This superhero type sounds a bit like being permanently on mind-altering psychoactive substances, crossed with Gandhi. Notable superpowers: speech empowerment, spiritual enlightenment, and infinite appetite!


-Kate Johnson

by   Ranker
in Opinion Graph

Characteristics of people who are not annoyed by Bill O’Reilly

On today’s The O’Reilly Factor (video below), Bill O’Reilly lamented the fact that he was only #10 on Ranker’s Most Annoying TV Hosts list and decided that he would make it his New Year’s Resolution to become the #1 most annoying person on our list. While I may not share O’Reilly’s politics, I like him as a person, even as he does annoy me from time to time, and would like to help him reach his goals. I enjoy working with the Ranker dataset as it lets me answer very specific questions, like whether people who think the show 24 is overrated are also convinced that George W. Bush was a terrible person—or, in this case, I can study the people who specifically disagree that O’Reilly is annoying, in the hopes that O’Reilly can find these people and work to annoy them more.

Who does O’Reilly need to work harder to annoy? From our opinion graph of 20+ million edges (so named because we can connect not only vague “likes” or “interests,” but specifically whether someone thinks something is best, worst, hot, annoying, overrated, etc.), we have hundreds of specific opinions that characterize people who don’t find O’Reilly annoying. Here are a chosen few findings about these people:
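To give a flavor of how such characteristic opinions might be surfaced (this is a simplified sketch with invented data and item names, not Ranker's actual method), one can compare each item's agreement rate inside the subgroup against the rate among everyone else, a simple "lift" score:

```python
# Surfacing opinions that characterize a subgroup via lift (toy data).
import pandas as pd

votes = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "item": ["OReilly_annoying", "JonStewart_annoying"] * 5,
    "vote": [-1, 1, -1, 1, 1, -1, 1, 1, -1, 1],  # 1 = agree, -1 = disagree
})

# Subgroup: users who DISAGREE that O'Reilly is annoying.
sub = set(votes.loc[(votes["item"] == "OReilly_annoying") &
                    (votes["vote"] == -1), "user"])

# Compare agreement rates on every other item, inside vs. outside the subgroup.
others = votes[votes["item"] != "OReilly_annoying"]
in_sub = others["user"].isin(sub)
rate_sub = others[in_sub].groupby("item")["vote"].apply(lambda v: (v == 1).mean())
rate_rest = others[~in_sub].groupby("item")["vote"].apply(lambda v: (v == 1).mean())
lift = (rate_sub / rate_rest).sort_values(ascending=False)
print(lift)  # items with lift > 1 characterize the subgroup
```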

People who are NOT annoyed by O’Reilly tend to…
– find liberals like Jon Stewart, Rachel Maddow, and Bill Maher annoying.
– believe that John Wayne and Humphrey Bogart are among the Best Actors in Film History.
– enjoy movies like The Sound of Music and Toy Story.
– watch America’s Got Talent, Cops, Dirty Jobs, Deadliest Catch, Home Improvement, and Extreme Makeover: Home Edition.
– listen to Lynyrd Skynyrd, Boston, and Elvis.
– enjoy comedians like Bob Hope, Jeff Foxworthy, Joan Rivers, and Billy Crystal.
– be attracted to  Carrie Underwood, Jessica Simpson, Brooklyn Decker, and Sarah Palin.

Thanks to big data, these audiences are all readily targetable online—and if O’Reilly really wants to annoy these people, he might want to study our biggest pet peeves list for ideas (e.g. chewing with his mouth open might work on TV). We hope this list will help O’Reilly with his ambitions for 2015, and please do reach out to us if you need more market research on how to annoy people more.

– Ravi Iyer

by   Ranker
in Data, Opinion Graph

The Opinion Graph Connections between 24, George W. Bush, Jack Bauer, and Rachel Maddow.

As someone whose roots are in political psychology, I’m always interested in seeing how the Ranker dataset shows how our values are reflected in our entertainment choices.  We’ve seen many instances where politicians have cited 24 in the case for or against torture, but are politics reflected in attitudes toward 24 amongst the public?  Using data from users who have voted on multiple Ranker lists, including our lists polling for The Worst Person in History, the Greatest TV Characters of All-Time, the most Overrated TV shows and The Biggest Hollywood Douchebags, the clear answer is yes.

People who think George W. Bush is one of the worst people in history also tend to think that 24 is one of the most overrated TV shows of all time.

People who think Bush is a terrible person also think 24 is overrated.

…and people who think Jack Bauer is one of the best TV Characters of All-Time also think that Rachel Maddow is one of Hollywood’s Biggest Douchebags.

People who think Jack Bauer is a great TV character also think Rachel Maddow is a douchebag.

– Ravi Iyer

P.S. These are just a few of the relationships between 24 and politicians in our opinion graph, and they all tell the same basic story.
