by    in Popular Lists

Why do Ranker voters think Ellen should be president?

Yesterday, Ellen talked about being voted #1 on our list of Celebrities who should run for President.

What is it that makes a celebrity “president”-worthy?  Because Ranker polls about each person along dozens of dimensions (e.g. cool vs. hot vs. good actor vs. trustworthy vs. ?), we can see how ratings on other lists relate to being voted as someone who should run for president.  For example, below we can see that being seen as “cool” is only weakly related to being seen as presidential, with actors like Tom Hanks and Clint Eastwood scoring as relatively cool, but not relatively presidential.

CoolVsPresident

Being good at your job seems to relate moderately to being seen as presidential.  For example, below you can see how being seen as a good actor positively relates to being seen as presidential, with people like Meryl Streep, Leonardo DiCaprio, and Morgan Freeman scoring well on both fronts.

GoodActorVsPresident

Being seen as presidential also relates well to likability.  Below you can see how the men whom people want to have a beer with, like Johnny Depp, Morgan Freeman, and DiCaprio, also tend to be people they rate well as potential presidential candidates.

BeerVsPresident

Being seen as presidential seems to relate best to trust, as people like Ellen, Meryl Streep, and Morgan Freeman are rated both as trustworthy and as someone who should run for President.  Notice how the items below form a fairly straight line going up and to the right.

TrustVsPresident
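One way to put a number on "a fairly straight line going up and to the right" is a Pearson correlation between two lists' scores. The sketch below is only an illustration: the scores are invented placeholders, not Ranker data, and the function name `pearson` is our own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical 0-100 scores for five unnamed celebrities (NOT Ranker data).
trust = [90, 75, 60, 40, 20]
cool = [50, 80, 30, 70, 45]
president = [88, 70, 65, 35, 25]

print("trust vs. president:", round(pearson(trust, president), 2))
print("cool vs. president: ", round(pearson(cool, president), 2))
```

A value near 1 corresponds to the tight upward line described for trust, while a value near 0 corresponds to the scattered cloud described for "cool".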

In all, looking at the relationship between Ranker lists yields comparable results to what political scientists find drives evaluations of presidential candidates.  People want a president who is competent, likable, and trustworthy.  And clearly Ellen fits all three buckets as she ranks as one of the best comedians of all-time, someone people would want to have a beer with, and as trustworthy.  Hence, Ranker users vote her as the #1 Celebrity Who Should Run for President.

Ravi Iyer

by    in Popular Lists

Ranker Users Predict Final Four Teams Accurately Based on Limited Bias

In 2015, Ranker’s voters predicted seven teams in the NCAA tournament’s Elite Eight. With the field for the Sweet 16 now set, we can see how well our rankings can predict how far a particular team will go in this year’s tournament. This has been a historically tumultuous season of college basketball. Top-10 teams lost regularly, upsets were commonplace, and no teams were safe.

We can use Ranker’s data to see which teams are having a year that matches their historical reputation as a powerhouse, and which are not.  Ranker visitors drew a clear line around North Carolina, Michigan State, Kansas and Villanova as favorites to make it into the Final Four.  Kentucky is notable because it ranks highest in the overall best college programs poll, but is not predicted by our voters to end up in the Final Four.  Villanova is the main outlier in the other direction: it is not ranked among the top historical programs the way Kentucky and UNC are, yet it is expected to have a good tournament showing.

These rankings suggest that our voting data reflects the current season rather than a bias toward teams with longstanding reputations.

 

Here are our results from the 2015 tournament:

Screen Shot 2016-03-08 at 12.25.50 PM

 

Here are our results for this year’s tournament:

Screen Shot 2016-03-08 at 10.43.38 AM

 

 

by    in Popular Lists

Duke and Kentucky Among Teams with the Most Annoying Fans

With March Madness tipping off, we turn to Ranker’s voters to learn more about college basketball and what to expect in this year’s tournament!

Which college basketball fan base wears their pride the best way?  We all know the traditional powers in college basketball, but sometimes their gloating can be a bit much.  In two separate lists, Ranker visitors ranked which college basketball team was the best, and which had the most annoying fans.  When we combine these two lists, we can see which team is best respected for its prowess on the court and how this relates to how annoying its fans are to the rest of the world.  As it happens, powerhouse bluebloods Duke and Kentucky are ranked among the top teams both for being historically successful and for having annoying fans.  The most successful team with only moderately annoying fans is North Carolina.  The respected team with the least annoying fan base was Villanova.  Ohio State and Florida stand out for having annoying fans, but are not particularly respected as programs overall.

 

Screen Shot 2016-03-01 at 1.08.08 PM

 

by    in Popular Lists

Combining Best and Worst Lists to find Polarizing TV Shows

Ranker lists are expressions of people’s opinions, and it is possible for people to have opposite views. The same movie, television show, song, or celebrity can be loved or hated by different groups of people. (If this is not immediately obvious, think about Donald Trump for a moment). Social psychology has long been interested in differences of opinion, and has gathered all sorts of evidence that people will take more extreme views in an argument (attitude polarization), that they will focus on evidence that reinforces what they already believe (confirmation bias), and that they tend to judge new items and experiences based on their previous knowledge (apperception).

Ranker can provide evidence of polarization, since people’s ranks can express different opinions about the same items. This polarization can be especially clear when looking at “best” and “worst” lists on the same general topic. At the moment, it is easy to imagine Donald Trump at the top of both a “Best Presidential Candidates” and a “Worst Presidential Candidates” list. About the only way to explain this pattern of opinions is to identify Trump as a polarizing person. He doesn’t lead to one opinion or attitude. He polarizes people into “lovers” and “haters”.

Previously, we have developed cognitive models to analyze Ranker lists as diverse as the Soccer World Cup, movie box office takings, and how people feel about pizza toppings. None of these models, however, allowed for polarization. The assumption has always been that each item was perceived in a similar way by everybody. So, we extended our cognitive modeling approach to allow for polarizing items, perceived by some users with a “positive spin” and by others with a “negative spin”.

Not wanting to give Trump any more publicity, we decided to test the new model by looking at people’s opinions of recent TV shows. The two lists we looked at were The Best New TV Series of 2015 and The Most Disappointing New TV Shows of 2015. Together these lists involve 22 users (17 in the best list and 5 in the worst list) ranking a total of 67 shows, with 14 shows appearing on both best and worst lists. Some of the lists had as few as 3 shows, while others had as many as 27, with an average of about 9 shows per list.

Our new model assumes each TV show is represented in one of two ways. One possibility is that everybody has the same opinion, and the show is not polarizing. This means that if a TV show is good, for example, people put it high in their best list, and low in their worst list, or don’t list it on their worst list at all. On the other hand, if a TV show is bad, people put it high in their worst list and low in their best list, or don’t mention it in their best list at all. The new possibility in our model is that a show is polarizing, and so some people believe it is good while others believe it is bad. These polarizing shows need two separate representations: one for the “lovers”, and one for the “haters”.
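To make the "polarizing versus not polarizing" distinction concrete, here is a deliberately crude stand-in, not the cognitive model itself (whose details are not given here). It scores each show by its average reciprocal rank in the best lists and in the worst lists, and flags a show as polarizing when both scores are high. The function names, the threshold of 0.25, and the placeholder shows "A" to "D" are all assumptions made for illustration.

```python
def list_score(lists, show):
    """Average reciprocal rank of `show` over all lists (0 for lists that omit it)."""
    total = 0.0
    for ranking in lists:
        if show in ranking:
            total += 1.0 / (ranking.index(show) + 1)
    return total / len(lists)

def classify(best_lists, worst_lists, threshold=0.25):
    """Label each show as 'polarizing', 'good', 'bad', or 'unclear'."""
    shows = {s for ranking in best_lists + worst_lists for s in ranking}
    labels = {}
    for show in sorted(shows):
        liked = list_score(best_lists, show) >= threshold
        disliked = list_score(worst_lists, show) >= threshold
        if liked and disliked:
            labels[show] = "polarizing"   # high on both kinds of list
        elif liked:
            labels[show] = "good"
        elif disliked:
            labels[show] = "bad"
        else:
            labels[show] = "unclear"
    return labels

# Hypothetical user rankings with placeholder show names (not the real lists).
best_lists = [["A", "B", "C"], ["B", "A"], ["A", "C"]]
worst_lists = [["D", "A"], ["A", "D", "C"]]

labels = classify(best_lists, worst_lists)
print(labels)
```

Unlike a full model, this heuristic gives no single best-to-worst scale and ignores how list length affects rank, so it is only a first-pass screen for candidates.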

TVShowBlog

The model we created determined which shows were polarizing and which were not, and how each should be represented on a scale from best to worst. The results are summarized in the graph. The shows are listed from best at the top to worst at the bottom. If a show is not polarizing, it is listed once in gray. If a show is polarizing, it is listed twice: once in green for its positive form, and once in red for its negative form. The graph also summarizes the Ranker data that led to these conclusions. The green circles indicate when a show was included in the “best” list, starting from rank 1 on the left, to lower ranks moving to the right. The larger the area of the circle, the more people ranked the show in that position. The red crosses indicate when a show was included in the “worst” list, again starting from rank 1 on the left, and again with the size of the cross indicating how often it was ranked in that position.

It is clear from the figure that shows identified as polarizing (Better Call Saul, Empire, Ballers, Backstrom, and so on) generally were included in high positions on both the “best” and “worst” lists. Other shows are not polarizing: Last Man on Earth is consistently highly rated, and Schitt’s Creek seems to review itself with its name. A good question for the producers, marketers, and consumers of these TV shows is why some are polarizing. Better Call Saul, which is perhaps the most polarizing show in our results, is a nice example. It has a “lover” representation at the top of the overall list, and a “hater” representation near the bottom. One possibility is that the polarization arises because Better Call Saul was created as a spin-off prequel to Breaking Bad, and many people would argue that Breaking Bad is one of the greatest television series of all time (and we’d agree). We guess that the people who had a negative opinion of Better Call Saul were die-hard fans of Breaking Bad, and found it didn’t match their lofty expectations. On the other hand, people with positive opinions of Better Call Saul probably evaluated it largely independently of Breaking Bad, as a good new crime television series.

Whatever the causes of polarization, it seems clear that Ranker data provide useful measures, and we think our modeling approach can lead to deeper insights. Finding what is polarizing, and identifying the “lovers” and “haters” should apply not just to TV shows, but to rappers, directors, songs, and everything else where not everybody feels the same way about everything. There is lots for us to do. Or, as Donald Trump has it: “If you’re going to be thinking, you may as well think big.”

– Crystal Velazquez and Michael Lee

by    in Data, Data Science, Popular Lists

Applying Machine Learning to the Diversity within our Worst Presidents List

Ranker visitors come from a diverse array of backgrounds, perspectives and opinions.  The diversity of the visitors, however, is often lost when we look at the overall rankings of the lists, due to the fact that the rankings reflect a raw average of all the votes on a given item–regardless of how voters behave on multiple other items.  It would be useful then, to figure out more about how users are voting across a range of items, and to recreate some of the diversity inherent in how people vote on the lists.

Take, for instance, one of our most popular lists: Ranking the Worst U.S. Presidents, which has been voted on by over 60,000 people and comprises over half a million votes.

In this partisan age, it is easy to imagine that such a list would create some discord. So when we look at the average voting behavior of all the voters, the list itself has some inconsistencies.  For instance, the five worst-rated presidents alternate along party lines–which is unlikely to represent a historically accurate account of which presidents are actually the worst.  The result is a list that represents our partisan opinions about our nation’s presidents:

 

ListScreenShot

 

The list itself provides an interesting glimpse of what happens when two parties collide in voting for the worst presidents, but we are missing interesting data that can inform us about how diverse our visitors are.  So how can we reconstruct the diverse groups of voters on the list such that we can see how clusters of voters might be ranking the list?

To solve this, we turn to a common machine learning technique referred to as “k-means clustering.” K-means clustering takes the voting data for each user, summarizes it into a result, and then finds other users with similar voting patterns.  The k-means algorithm is not given any information whatsoever from me as the data scientist, and has no real idea what the data mean at all.  It is just looking at each Ranker visitor’s votes and looking for people who vote similarly, then clustering the patterns according to the data itself.  K-means can be run with as many clusters as you like, and there are ways to determine how many clusters should be used.  Once the clusters are drawn, I re-rank the presidents for each cluster using Ranker’s algorithm, and then we can see how different clusters ranked the presidents.
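For readers who want to see the mechanics, here is a minimal k-means sketch in plain Python. It is not the pipeline used for the analysis: the vote encoding (+1 for an up-vote, -1 for a down-vote, 0 for no vote), the six made-up voters, the four unnamed presidents, and the deterministic farthest-point initialisation (used instead of the more usual random starts so the example is reproducible) are all assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vote vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(rows):
    """Column-wise mean of a non-empty list of vote vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def farthest_point_init(rows, k):
    """Start from the first row, then repeatedly add the row farthest from all chosen centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(squared_distance(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(rows, k, max_iterations=100):
    """Plain k-means. Returns (labels, centroids)."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each voter joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids

# Hypothetical votes: rows are voters, columns are four unnamed presidents.
votes = [
    [1, 1, -1, -1],
    [1, 0, -1, -1],
    [1, 1, 0, -1],
    [-1, -1, 1, 1],
    [-1, 0, 1, 1],
    [-1, -1, 1, 0],
]

labels, centroids = kmeans(votes, k=2)
print(labels)
print(centroids)
```

Each centroid is the average vote vector of its cluster, so sorting a centroid's entries gives a rough per-cluster ordering of the items. The real analysis re-ranks each cluster with Ranker's own algorithm instead.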

As it happens, there are some differences in how clusters of Ranker visitors voted on the list.  In a two-cluster analysis, we find two groups of people with almost completely opposite voting behavior.

(*Note that since this is a list of votes on the worst presidents, voters are not being asked to rank the presidents from best to worst; it is more a ranking of how much worse each president is compared to the others.)

The k-means analysis found one cluster that appears to think Republican presidents are worst:

ClusterOneB

Here is the other cluster, with opposite voting behavior:

ClusterTwoB

In this two-cluster analysis, the shape of the data is pretty clear, and fits our preconceived picture of how partisan politics might be voting on the list.  But there is a bias toward recent presidents, and the lists do not mimic academic lists and polls ranking the worst presidents.

To explore the data further, I used a five cluster analysis–in other words, looking for five different types of voters in the data.

Here is what the five cluster analysis returned:

FiveClusterRankings

The results show a little more diversity in how the clusters ranked the presidents.  Again, we see some clusters that are more or less voting along party lines based on recent presidents (Clusters 5 and 4).  Clusters 1 and 3 are also interesting in that the algorithm seems to be picking up clusters of visitors who are voting for people who have not been president (Hillary Clinton, Ben Carson), and for people who thankfully were never president (Adolf Hitler).  Clusters 2 and 3 are most interesting to me, however, as they seem to show a greater resemblance to the academic lists of worst presidents (for reference, see Wikipedia’s rankings of presidents), but the clusters tend toward a more historical bent on how we think of these presidents. I think of this as a more informed partisanship.

By understanding the diverse sets of users that make up our crowdranked lists, we are able to improve our overall rankings, and also provide a more nuanced understanding of how different group opinions compare, beyond the demographic groups we currently expose on our Ultimate Lists.  Such analyses help us identify outliers and agenda pushers in the voting patterns, and allow us to rebalance our sample to make lists that more closely resemble a national average.

  • Glenn Fox

 

 

by    in Data Science, Popular Lists, Rankings

In Good Company: Varieties of Women we would like to Drink With

They say you’re defined by the company you keep.  But how are you defined by the company you want to keep?

The list “Famous Women You’d Want to Have a Beer With”  provides an interesting way to examine this idea.  In other words, how people vote on this list can define something about what kind of person is doing the voting.

We can think of people as having many traits, or dimensions.  The traits and dimensions that are most important to the voters will be given higher rankings.  For instance, some people may rank the list thinking about the trait of how funny the person is, so may be more inclined to rate comedians higher than drama actresses.  Others may vote just on attractiveness, or based on singing talent, etc…  It may be the case that some people rank comedians and singers in a certain way, whereas others would only spend time with models and actresses.  By examining how people rank the various celebrities along these dimensions, we can learn something about the people doing the voting.

The rankings on the site, however, are based on the sum of all of the voters’ behavior on the list, so the final rankings do not tell us about how certain types of people are voting on the list.  While we could manually go through the list to sort the celebrities according to their traits, i.e. put comedians with comedians, singers with singers,  we would risk using our own biases to put voters into categories where they do not naturally belong.  It would be much better to let the voters’ own behavior decide how the celebrities should be clustered.  To do this, we can use some fancy-math techniques from machine learning, called clustering algorithms, to let a computer examine the voting patterns and then tell us which patterns are similar between all the voters.   In other words, we use the algorithm to find patterns in the voting data, then put similar patterns together into groups of voters, and then examine how the different groups of voters ranked the celebrities.  How each group ranked the celebrities tells us something about the group, and about the type of people they would like to keep company with.

As it happens, using this approach actually finds unique clusters, or groups, in the voting data, and we can then guess for ourselves how the voters from each group can be defined based on the company they wish to keep.

Here are the results:

Cluster 1:

Cluster4_MakeCelebPanels

Cluster 1 includes women known to be funny, including established comedians like Carol Burnett and Ellen DeGeneres. What is interesting is that Emma Stone and Jennifer Lawrence are also included. They are highly ranked on lists based on physical attractiveness, but they also have a reputation for being funny, and the clustering algorithm is showing us that they are often categorized alongside other funny women as well.  Among the clusters, this cluster has the highest proportion of female voters, which may explain why the celebrities are ranked along dimensions other than attractiveness.

 

Cluster 2:

Cluster1_MakeCelebPanels

Cluster 2 appears to consist of celebrities that are more in the nerdy camp, with Yvonne Strahovski and Morena Baccarin, both of whom play roles on shows popular with science fiction fans.  In the bottom of this list we see something of a contrarian streak as well, with downvotes handed out to some of the best known celebrities who rank highly on the list overall.

Cluster 3:

Cluster2_MakeCelebPanels

Cluster 3 is a bit more of a puzzle.  The celebrities tend to be a bit older, and come from a wide variety of backgrounds that are less known for a single role or attribute.  This cluster could be basing their votes more on the celebrity’s degree of uniqueness, which is somewhat in contrast with the bottom ranked celebrities who represent the most common and regularly listed female celebrities on Ranker.

Cluster 4:

Cluster3_MakeCelebPanels

We would also expect a list such as this to be heavily correlated with physical attractiveness, or perhaps with the celebrity’s role as a model.  Cluster 4 is perhaps the best example of this, and likely represents our youngest cluster.  The top-ranked women are from the entertainment sector and are known for their looks, whereas the bottom-ranked people are from politics or comedy, or are older and probably less well known to the younger voters.  As we might expect, cluster 3 also has a high proportion of younger voters.

Here is the list of the top and bottom ten for each cluster (note that the order within these lists is not particularly important since the celebrity’s scores will be very close to one another):

TopCelebsPerClusterTable

 

In the end, the adage that we are defined by the company we keep appears to have some merit, and can be detected with machine learning approaches.  Though not a perfect split among the groups, there were trends in each group that drew the people of the cluster together.  We are using these approaches to help improve the site and to provide better content to our visitors.

 

–Glenn R. Fox, PhD

 

 

by    in Popular Lists

Tracking Votes to Measure Changing Opinions

A key part of any Ranker list is the set of votes associated with each item, counting how often users have given that item the “thumbs up” or “thumbs down”. These votes measure people’s opinions about politics, movies, celebrities, music, sports, and all of the other issues Ranker lists cover.

A natural question is how the opinions that votes measure relate to external assessments. As an example, we considered The Most Dangerous Cities in America list. Forbes magazine lists the top 10 as Detroit, St. Louis, Oakland, Memphis, Birmingham, Atlanta, Baltimore, Stockton, Cleveland, and Buffalo.

The graph below shows the proportion of up-votes, evolving over time up towards the end of last year, for all of the cities voted on by Ranker users. Eight of the cities on the Forbes list are included, and are highlighted. They are all in the top half of the worst cities in the list, and Detroit is correctly placed clearly as the overall worst city. Only Stockton and Buffalo, at positions 8 and 10 on the Forbes list, are missing. There is considerable agreement between the expert opinion from Forbes’ analysis and the voting patterns of Ranker users.

MostDangerousCitiesAmerica
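The quantity plotted in a graph like this, the proportion of up-votes so far for each item, can be computed from a time-ordered vote log. The sketch below assumes a hypothetical log format of `(day, item, is_upvote)` tuples and uses placeholder items; it is not Ranker's actual data format.

```python
from collections import defaultdict

def running_upvote_proportion(votes):
    """votes: iterable of (day, item, is_upvote) tuples, already sorted by day.

    Returns {item: [(day, proportion_of_upvotes_so_far), ...]} with one
    point per vote cast on that item.
    """
    ups = defaultdict(int)
    totals = defaultdict(int)
    series = defaultdict(list)
    for day, item, is_upvote in votes:
        totals[item] += 1
        if is_upvote:
            ups[item] += 1
        series[item].append((day, ups[item] / totals[item]))
    return dict(series)

# Hypothetical vote log with placeholder items (not real Ranker votes).
vote_log = [
    (1, "City A", True),
    (1, "City A", False),
    (2, "City A", True),
    (2, "City B", False),
    (3, "City B", True),
]

series = running_upvote_proportion(vote_log)
for item, points in series.items():
    print(item, points)
```

Because the proportion is cumulative, early points swing widely on a handful of votes and later points move slowly. A sliding window over recent votes would be more sensitive to sudden shifts in opinion.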

Because Ranker votes are recorded as they happen, they can potentially also track changes in people’s opinions. To test this possibility, we turned to a pop-culture topic that has generated a lot of votes. The Walking Dead is the most watched drama series telecast in basic cable history, with 17.3 million viewers tuning in to watch the season 5 premiere. With such a large fan base of zombie lovers and characters regularly dying left and right, there is a lot of interest in The Walking Dead Season 5 Death Pool list.

The figure below shows the pattern of change in the proportion of up-votes for the characters in this list, and highlights three people. For the first four seasons, Gareth had been one of the main antagonists and survivors on the show. His future as a survivor became unclear in an October 13th episode where Rick vowed to kill Gareth with a machete and Gareth, undeterred, simply laughed at the threat. Two episodes later, on October 26th, Rick fulfilled his promise and killed Gareth using the machete. While Gareth apparently did not take the threat seriously, the increase in up-votes for Gareth during this time makes it clear many viewers did.

WalkingDeadDeathPool

A second highlighted character, Gabriel, is a priest introduced in the October 19th episode of the latest season. Upon his arrival, Rick expressed his distrust of the priest and threatened that, if his own sins end up hurting his family, it will be Gabriel who has to face the consequences. Since Rick is a man of many sins, the threat seems to be real. Ranker voters agree, as shown by the jump in up-votes around mid-October, coinciding with Gabriel’s arrival on the show.

The votes also sometimes tell us who has a good chance of surviving. Carol Peletier had been a mainstay in the season, but was kidnapped in the October 19th episode and did not appear in the following episode. She briefly appeared again in the subsequent episode, only to be rendered unconscious. Despite the ambiguity surrounding her survival, her proportion of up-votes decreased significantly, perhaps driven by her mention by another character, which provided a sort of “spoiler” hinting at survival.

While these two examples are just suggestive, the enormous number of votes made by Ranker users, and the variety of topics they cover, make the possibility of measuring opinions, and of detecting and understanding changes in opinions, an intriguing one. If there were a list of “Research uses for Ranker data”, we would give this item a clear thumbs up.

  • Emily Liu & Michael Lee
by    in Popular Lists

Will 2015 be the year that better data eclipses bigger data?

Data is a tool, not an end, but understandably, some people are really into their tools. They like to describe how many petabytes (or rather zettabytes) their data takes up every second (or rather picosecond), requiring even more tools that allow them to analyze that data ever faster. It’s very very cool. But just like the engines on those Lamborghinis I see idling in Los Angeles traffic on the way to the office, I have to question how truly useful all that engineering is.

Do we really need zettabytes of data to produce the insight that I might, in my weaker moments, click on a link advertising photos of singles in my area or detailing “13+ Things You Shouldn’t Eat in a Restaurant”? [These are actual headlines served by content recommendation companies that leverage enormous datasets on web behavior.] Does Facebook really need all my likes, interests, and friends to know to serve me clickbait, or is the single biggest predictor of whether I might generate a click for an advertiser the fact that I have enjoyed clickbait in the past?  If 8% of internet users account for 85% of banner ad clicks, how effective can the plethora of data scientists who work on advertising actually be, over and above a simple cookie that identifies that 8% and removes banner ads for everyone else?

Rather than simply declaring, in rather cliched form, that “big data is dead”, I have a solution: Better Data.  If I want to know what to buy my wife for Christmas, I can analyze everything she has done on the internet for the past 10 years…or I could just ask her.  If I want to know who is going to win the World Cup, I could analyze the statistics of every player and team in every situation and create an algorithm that scores their collective talents…or I could just ask people who they think will win.  Small datasets with rich variables that incorporate lots of information intelligently (e.g. stock prices) almost always out-perform complex algorithms performed on low-level datasets.

Evidence for this is found not only in the fact that algorithms cannot reliably beat the stock market (though they can make money by beating slower, dumber algorithms), but that the world’s biggest companies like Google, Facebook, and Baidu are emphasizing “Deep Learning” artificial intelligence as primary initiatives.  Deep learning attempts to encode the patterns hiding in lots of low level data points (e.g. pixel colors) into higher-order variables that human beings find meaningful (e.g. a cat or a smiling friend), effectively creating better smaller datasets.  The excitement over deep learning is an acknowledgment that zettabytes of data yield far less meaningful information about a person than the average human can get from a 15 minute conversation.  Deep learning may someday allow Google to read our email with the same sophistication as a human, but the average toddler still far outpaces the most sophisticated deep learning algorithms. And it still needs good data to be trained on.  It will never be able to take all the videos ever uploaded onto YouTube and predict much variance in the direction of the stock market because the data is not there. If you want to predict the stock market, you need better data on companies.  If you want to predict what a person will buy or better yet, what really motivates them, you need to ask them questions about what motivates them.

How can we create better datasets?  Think less like an engineer and more like someone writing a biography.  Rather than trying ever more technological solutions to squeeze knowledge from a stone, think about what is missing in our understanding of the average person.  If, through some combination of deep learning and data aggregation, I am able to fully understand 1% or 25% or 100% of a person’s online behavior, I still will only understand that part of their world that is revealed through their online behavior.  How can we start to ask people what their most meaningful moments from college were, what annoys them most, or what makes them happiest in their quiet moments?  Dating sites probably have some of the best data around because they ask meaningful questions, even given the relatively low number of people who use those sites as compared to Gmail or Facebook, and the sharpness of the insights that they are able to produce is no accident.  The OK Cupid blog (better data) will always be more interesting than the Facebook data blog (bigger data) until Facebook is able to collect data more meaningful than the generic “like”.

2015 is an exciting time to be working on data.  Tools are more accessible than ever, such that many engineers can find a tutorial and learn to run any algorithm in a weekend.  Data is more ubiquitous and accessible than ever as well. But the world doesn’t need yet another company that takes publicly accessible data and mines it for sentiment, while throwing off stats about how big their data is.  Think like a biographer,  figure out what nobody else is asking and create meaningful data.

 

by    in Popular Lists

Happy Inanimate Objects, Dramatic Animals + Gamer Survival Kits

60+ Everyday Objects That Look Really Happy
What could be nicer than finding a smiley face at the bottom of your coffee mug? How about living in a house that looks super excited to see you every time you come home? This is a simple gallery of pics of everyday things that look like they are happy.

Essential Products For a Gamer Survival Kit
No matter what, gamers gonna game! But for a gamer to play at peak performance levels, there are certain survival essentials for quests, campaigns, and candy crushes. Behold: everything you need to survive an intense gaming sesh.

The Worst Qualities in a Person
Let’s face it: not everyone is perfect. Even the most charming people are guilty of at least a few negative personality traits. But which ones are the worst?

Being a Student Is . . .
Being a student isn’t always as easy as it sounds. Sure, there are the parties, the booze, the moving away from home and living without your parents – but there’s also a whole lot of stress, worry, studying, and exams.

The 36 Most Dramatic Animals on the Internet
Drama isn’t just a human thing. As this list of GIFs proves, the animal kingdom is full of feisty creatures who have a flair for the dramatic.

Pretentious Words You Secretly Don’t Know How to Pronounce
What are some of the most overused, mispronounced words people use to try to sound smarter?

The Internet Remembers Robin Williams
And finally, we lost a beloved comedy legend this month. Robin Williams was the face, voice, and talent of a lot of our childhoods and many people took to social media to share touching memories of him. We’ve rounded up the best tributes and would like to invite fans to vote on their favorite Robin Williams movies. R.I.P. Genie.

That’s it! Stay in touch and we hope you’re having a great month!
