Collecting and Connecting Millions of Opinions


Ranker Insights is the Most Precise Data for Entertainment, Personalities, Sports, Brands and More

Ranker is a leading digital media company that ranks opinions on (almost) everything through our vote-based user experience. Our rankings don’t just collect opinions; they contextualize them. Through context, Ranker can distinguish users who prefer an actor’s talent from those who prefer their attractiveness, for example, or fans who like a college for its academics from those who like it for its athletics, along with the millions upon millions of correlations therein.

So we created Ranker Insights: Ranker’s first-party analytics platform, which turns data from users’ votes into actionable intelligence with countless applications.

by    in Data

Big Data Shows Movie Fans Love Tom Hanks, Just Not in Sequels

It’s summertime. And when it comes to big-budget movies, that also means it’s sequel time. We’ve already seen remarkable successes like Captain America: Civil War and Finding Dory, and a few flops (at least relative to their budgets) like Teenage Mutant Ninja Turtles 2 and Independence Day: Resurgence. This got us at Ranker Insights thinking: what goes into making a successful sequel? The truth is, many factors contribute, and the box office success of the original is just one of them. From solid, open-ended plot lines and apparent depth of main characters to built-in fan bases and predictably bankable actors, big data suggests many factors come into play when creating a flourishing movie franchise. However, this much seems certain: you’re probably better off casting anyone but the man voters consider the greatest actor of all time.

Allow us to explain. Big data can tell you big things when it comes to making a great film. But if you’re planning on getting the most bang for your buck on your original idea, even the smallest minutiae might make a big difference. For instance, let’s take a look at the top 30 of the Best Movie Characters of All Time. Notice anything? Sure, you see all the memorable characters you would expect to see near the top: Forrest Gump, Indiana Jones, James Bond, and Bruce Wayne/Batman are all in the top 10. This makes sense, especially when you consider that their names are usually in the titles of the movies they star in. Look a little closer, and further analyze the films from which these characters came. Of the top 30, 22 were strong enough to star in a sequel or trilogy. Now, let’s look at the eight that didn’t return to entertain you once again. What do all these movies have in common? That’s right. They all involve the indisputably lovable Thomas Jeffrey Hanks.

Why is this, you ask? Good question. Certainly Toy Story was a smashing success, and went on to spawn not one but two great sequels. Toy Story 2 was even voted 8th on Ranker’s list of Best Movie Sequels. But for obvious reasons, that franchise just featured his voice, not his face. The only franchise in which Tom Hanks returned for a sequel and had to actually act, The Da Vinci Code, produced far less favorable results. While its sequel, Angels & Demons, still proved to be a box office success, it only took in about two-thirds of the box office its predecessor did. And as for the character Hanks portrayed, Robert Langdon, well, he is nowhere to be found on the Best Movie Characters of All Time list.

It doesn’t seem to be Tom’s choice of directors either, as the Tom Hanks/Steven Spielberg combo is a whopping 975% more likely to be liked by Tom Hanks fans, with the Tom Hanks/Ron Howard team coming in a close second at 809%. And it’s not like these fans are averse to the idea of sequels either. Voters who like Tom also like their action, adventure, and animated sequels. In fact, Tom fanatics are 549% more likely to enjoy Captain America: The Winter Soldier; 258% more likely to have high praise for Back to the Future Part II; and 396% more likely to be a fan of the previously mentioned Toy Story 2. Heck, the analytics show that voters on the Greatest Actor & Actress in Entertainment History list are open to a sequel of any kind: they’re 38% more likely to vote up the universally agreed upon clunker, Crocodile Dundee in Los Angeles. Maybe Tom Hanks-related sequels were meant not to be seen, but simply heard.
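For readers wondering what a figure like "396% more likely" could mean in practice, here is a minimal sketch of one common way to compute it. This is an assumption on our part, not a description of the Ranker Insights pipeline: it compares the share of fans who up-voted an item against the share of all voters who did. The function name `percent_more_likely` and the toy vote data are hypothetical.

```python
def percent_more_likely(votes, fan_of, target):
    """Percent by which fans of `fan_of` exceed the overall up-vote rate for `target`.

    votes: dict mapping a voter id to the set of items that voter up-voted.
    Assumption: the baseline is all voters, not just non-fans.
    """
    fans = [liked for liked in votes.values() if fan_of in liked]
    if not fans:
        raise ValueError("no voter up-voted the reference item")
    base_rate = sum(target in liked for liked in votes.values()) / len(votes)
    if base_rate == 0:
        raise ValueError("the target item was never up-voted")
    fan_rate = sum(target in liked for liked in fans) / len(fans)
    return (fan_rate / base_rate - 1.0) * 100.0


# Toy data, invented for illustration only.
toy_votes = {
    1: {"Tom Hanks", "Toy Story 2"},
    2: {"Tom Hanks", "Toy Story 2"},
    3: {"Tom Hanks"},
    4: {"Toy Story 2"},
    5: set(),
    6: set(),
    7: set(),
    8: set(),
}
print(round(percent_more_likely(toy_votes, "Tom Hanks", "Toy Story 2"), 1))
```

In this toy data, two of three fans up-voted the target versus three of eight voters overall, so fans come out roughly 78% more likely.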

Perhaps it’s just a demographic thing? Nope, that doesn’t seem to matter either. In fact, Toy Story 2 even drops in the rankings to number 9 among international voters and even further to 10 among female voters. Judging by the data mined from Actors You Would Watch Read The Phone Book, Tom Hanks fans are at least 200% more likely to listen to Robert De Niro, Harrison Ford, Johnny Depp or Liam Neeson go through the names from A to Z, and all four know a thing or two about sequels. However, with Hanks ranking sixth on that same list, we can now confidently deduce that the reason for so few sequels from the actor is probably not his acting itself.

In all likelihood, it’s probably just a content thing. Most of Hanks’s roles have a historical end, or at the very least, a definitive one. The stories he stars in just don’t lend themselves to sequels. Voters must agree, as there is nary a Hanks movie to be found on Ranker’s list of Movies That Need Sequels. Saving Private Ryan? Saved. Catch Me If You Can? Caught. Philadelphia? Finished. So don’t hold your breath waiting for Forrest Gumper or Sully 2: Nursing Home Boogaloo, regardless of how well Sully does upon its release in early September. These Hanks vehicles just don’t seem to be in demand, success be damned.

Now, Ranker Insights would never be one to tell you how to create a successful movie franchise, because frankly, that would be a thankless job. But if your job is to create a character that is memorable enough to secure a sequel, big data shows your main character should probably be a Hanks-less one. He seems to be the epitome of Mr. One-and-Done.

by    in Data

Using Data To Determine The Best Months Of The Year

Why do people like some months more than others? For many, it is all about the holidays:

“I love the scents of winter! For me, it’s all about the feeling you get when you smell pumpkin spice, cinnamon, nutmeg, gingerbread and spruce.” – Taylor Swift

while for others, it is about avoiding the cold

“A lot of people like snow. I find it to be an unnecessary freezing of water.” – Carl Reiner

and for some more disaffected souls, it is about the specifics

“August used to be a sad month for me. As the days went on, the thought of school starting weighed heavily upon my young frame.” – Henry Rollins

Presumably all of these preferences and this angst are reflected in Ranker’s Best Months of the Year list. The graphic below provides a visualization of the opinions of Ranker users. Each row is a different person, and their (sometimes incomplete) ranking of the months is shown from best to worst, left to right. The months are color coded by the four seasons: spring has hues of green, summer is yellow, fall has the rustic earth hues of brown, and winter is blue.

BestMonthsOriginal

The patchwork quilt of colors and hues makes it clear that different people have different opinions. We wanted to understand the structure of these individual differences, using cognitive data analysis.

To do this, we used a simple model of how people produce rankings—known as a Thurstonian model, going back to the 1920s in psychology—that we have previously applied successfully to Ranker data. Rather than assuming everybody’s rankings were based on a shared opinion, we allowed this version of the model to have groups or clusters of people, and for each group to have their own preferences for the months. We didn’t want to pre-determine the number of groups, and so we allowed our model to make this inference directly from the data. Our modeling approach thus involves two sorts of interacting uncertainties: about how many groups there are, and about which people belong to which group. Bayesian statistical methods are well suited to handling these sorts of uncertainties.
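To make the idea concrete, here is a minimal sketch of the generative side of a Thurstonian model with groups. It only simulates rankings and does not perform the Bayesian inference described above. The group names, latent means, noise level, and function names are all hypothetical, chosen purely for illustration.

```python
import random


def thurstonian_ranking(means, noise_sd, rng):
    """Rank items best-to-worst from one noisy draw per item."""
    # Each item's perceived quality is its latent mean plus Gaussian noise.
    perceived = {item: rng.gauss(mu, noise_sd) for item, mu in means.items()}
    return sorted(perceived, key=perceived.get, reverse=True)


def simulate_rankings(group_means, memberships, noise_sd=1.0, seed=0):
    """Draw one ranking per person, using the latent means of that person's group."""
    rng = random.Random(seed)
    return [thurstonian_ranking(group_means[g], noise_sd, rng) for g in memberships]


# Hypothetical latent preferences for two groups over four months.
group_means = {
    "summer_lovers": {"Jan": -2.0, "Apr": 0.0, "Jul": 2.0, "Oct": 0.5},
    "holiday_lovers": {"Jan": 0.5, "Apr": -0.5, "Jul": -2.0, "Oct": 2.0},
}
memberships = ["summer_lovers", "summer_lovers", "holiday_lovers"]
for person, ranking in zip(memberships, simulate_rankings(group_means, memberships)):
    print(person, ranking)
```

People in the same group share latent means but still produce different rankings because of the noise, which is why the number of groups and each person's membership have to be inferred rather than read off the data.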

For fans of Bayesian cognitive graphical models — we know you’re out there — the final model we used is shown in the figure below. For non-fans of Bayesian cognitive graphical models — we KNOW you’re out there — there are three important parts. The variable gamma at the top corresponds to how many groups there are, the variables z to the side correspond to which of these groups each individual belongs, and all of this is inferred from the rankings people gave, represented by the variables at the bottom.

GraphicalModel

The figure below shows the first key insight from the model. It shows the probability that there are 1, 2, …, 17 groups, ranging from everybody having the same opinion about the best months, to everyone having their own unique opinion. There is uncertainty about how many groups the rankings reveal, but the most likely answer is that there are four.

Gamma

Assuming there are four groups, the figure below organizes the ranking data  by grouping together the people most likely to belong to each group. Group 1 shows a preference for late summer and early fall, and hates cold weather. Group 2 shows a preference for the holidays. They like fall and Christmas time and despise hot weather. Group 3 loves the summertime and hates the winter. We had a look at where these people were from, and it probably comes as no surprise they’re all from the north-east of the US. The last group, a bit like Henry Rollins, stands out as a consensus of one.

BestMonths

This analysis shows how cognitive models with individual differences can help understand opinion groupings, and deal with difficult questions like how many groups exist. One especially interesting feature of the Best Months list is that at least one of the groups is defined more by what comes at the bottom of their lists than the top. People in group 1 don’t agree very precisely on which months they like, but they all agree they don’t like winter months. This shows that it is not just the top few items on a Ranker list that carry useful information: what comes at the bottom can be just as informative. Both what you love and what you hate matter.

“When I was young, I loved summer and hated winter. When I got older I loved winter and hated summer. Now that I’m even older, and wiser, I hate both summer and winter.” – Jarod Kintz

 

Crystal Velasquez and Michael Lee

by    in Data

According to Big Data, Millennials Don’t Care Much About America’s Pastime

Does Respect for the Past Bode Well for Baseball’s Future?
Breaking Down the Big Data of the Greatest Baseball Players of All Time List

How much does the current popularity of America’s Pastime factor into the rankings of the greatest baseball players of all time? And what factors beyond simple player statistics come into play when one makes their own list? Well, the resulting Ranker data speaks (or rather, cheers) volumes when it comes to players of past generations. While nostalgia might have some effect on the voting, is the lack of current players represented on the list a sign that voters have an unwavering respect for the legends of the past, or is our national pastime becoming just that: past its time?

Ranker asked participants upfront to rank the best baseball players only by their on-field accomplishments. Almost 7,500 participants have chimed in with nearly 115,000 votes, and it’s no surprise who was the consensus top pick. With a lifetime batting average of .342 and #1 in all-time OPS (on-base plus slugging percentage), the voters made their choice clear: Babe Ruth. Anyone who has had a casual conversation around this topic knows the Great Bambino is always one of the first names mentioned when it comes to ranking the greatest players of all time, and he’s usually a favorite across all ages.

Whether you are an astute baseball statistical historian, have been sitting in your team’s bleachers since you were a child, or are one of nearly 60 million people who play fantasy sports, you probably have at least a passing opinion about who is the best of all time. According to Ranker’s data, your top 5 has some combination of the Babe, Stan Musial, Ted Williams, Mickey Mantle, and Willie Mays or Hank Aaron, the last of whom was the latest of the group to retire, all the way back in 1976. Once you break down the demographics even a little bit further, that’s when things start to get interesting.

Gone, but not forgotten.

The most glaring finding at first glance is that there’s nary an active player on the all-time list’s starting roster. In fact, it isn’t until you get down to #44 that you’ll find someone who is still an active player: Ichiro Suzuki. For the record, Ichiro is ranked only #76 on Ranker’s Top CURRENT Baseball Players List. Does this imply that voters know and respect their history? Or could it be that the current crop of baseball players isn’t well represented because it isn’t being watched? Television ratings data suggests that a steady decline in viewership over the years might be a factor in the voting. While Major League Baseball as an entity is as strong as ever (just have a look at some of the salaries they’re handing out), people aren’t as interested in the game as they used to be.

How much does a voter’s age factor into the results? A deeper dive into the big data analytics suggests quite a bit. Baby Boomers are 184% more likely to have Mel Ott on their list than any other age group because, you know, they’ve actually seen him play. If you’re between the ages of 30 and 49, you are a whopping 305% more likely to have Sadaharu Oh of the Yomiuri Giants on your list (which suggests that internationally, fans aren’t only passionate about their soccer). If you’re a Millennial, you must enjoy a good quote: Millennials are 248% and 234% more likely to vote for the non sequitur machine Yogi Berra and the forever quirky Rickey Henderson, respectively. And while Ranker doesn’t have analytics to suggest that voters in the 30-49 age demographic were all mustache enthusiasts, they were 281% more likely to include Rollie Fingers on their list.

However, those stats focus on specific characters in the game that a certain demographic is drawn to. Where are the Mike Trouts (#1 with people under the age of 29 on the Top Current Baseball Players List), Clayton Kershaws (#2), or players who have brand recognition among fans like Troy Tulowitzki (#20)? All of them, gaudy numbers and all, failed to crack the top 100. In fact, the only other active players on the list (besides the aging Ichiro) were the also-aging Albert Pujols (#48) and Miguel Cabrera (#90). Maybe, there’s just not a large (or long) enough sample size to include current players on this list of all-time greats.

Is today’s game yesterday’s news?

Perhaps voters are just into something else. When you look at the voting demographics, young voters are the least represented participants, with the majority being aged 30 and up. But with nearly 23% of the votes, you would think at least a couple more current players would sneak in, wouldn’t you? Perhaps baseball just doesn’t resonate with this new generation. They’re gravitating toward playing lacrosse, playing on their video game consoles, or even fiddling with their smartphones. As a recent article in the Wall Street Journal even suggests, younger people are just tuning out.

So who’s got next?

The times may have changed, but according to Ranker data, the best baseball players really haven’t. From Cobb in the dead-ball era and Satchel Paige of the Negro Leagues to various international leagues and beyond, the voters know that the greatest baseball of all time was played beyond just the Major Leagues here in the States. Records were made to be broken, but which of the best baseball players of today do you think will eventually break into the all-time list? Only time (and the fickle under-30 voters) will tell. So if you should happen to ask a Millennial if they saw the game last night, just don’t expect them to inquire who won. You’ll probably just get a “who cares?”

by    in Popular Lists

Why do Ranker voters think Ellen should be president?

Yesterday, Ellen talked about being voted #1 on our list of Celebrities who should run for President.

What is it that makes a celebrity “president”-worthy?  Because Ranker polls about each person along dozens of dimensions (e.g. cool vs. hot vs. good actor vs. trustworthy vs. ?), we can see how ratings on other lists relate to being voted as someone who should run for president.  For example, below we can see that being seen as “cool” is only weakly related to being seen as presidential, with actors like Tom Hanks and Clint Eastwood scoring as relatively cool, but not relatively presidential.

CoolVsPresident

Being good at your job seems to relate moderately to being seen as presidential. For example, below you can see how being seen as a good actor positively relates to being seen as presidential, with people like Meryl Streep, Leonardo DiCaprio, and Morgan Freeman scoring well on both fronts.

GoodActorVsPresident

Being seen as presidential also relates well to likability. Below you can see how the men who people want to have a beer with, like Johnny Depp, Morgan Freeman, and DiCaprio, also tend to be people they rate well as potential presidential candidates.

BeerVsPresident

Being seen as presidential seems to relate best to trust, as people like Ellen, Meryl Streep, and Morgan Freeman are rated as both trustworthy and as someone who should run for President. Notice how the items below form a fairly straight line going up and to the right.

TrustVsPresident

In all, looking at the relationship between Ranker lists yields comparable results to what political scientists find drives evaluations of presidential candidates.  People want a president who is competent, likable, and trustworthy.  And clearly Ellen fits all three buckets as she ranks as one of the best comedians of all-time, someone people would want to have a beer with, and as trustworthy.  Hence, Ranker users vote her as the #1 Celebrity Who Should Run for President.
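The "weakly", "moderately", and "best" relationships above can be quantified with a correlation coefficient. Here is a minimal sketch, assuming each list can be reduced to one score per person (for example, the share of up-votes) and using Pearson correlation; we are not claiming this is the exact statistic behind the plots. The scores and labels below are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def list_correlation(scores_a, scores_b):
    """Correlate two lists' scores over the people who appear on both."""
    shared = sorted(set(scores_a) & set(scores_b))
    return pearson([scores_a[p] for p in shared], [scores_b[p] for p in shared])


# Made-up scores for anonymous celebrities A to E, purely for illustration.
trustworthy = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3}
presidential = {"A": 0.8, "B": 0.65, "C": 0.4, "D": 0.35, "E": 0.5}
print(round(list_correlation(trustworthy, presidential), 3))
```

A value near 1 corresponds to points forming a straight line going up and to the right, while a value near 0 corresponds to a weak relationship like the one between "cool" and "presidential".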

Ravi Iyer

by    in Popular Lists

Ranker Users Predict Final Four Teams Accurately Based on Limited Bias

In 2015, Ranker’s voters predicted seven of the teams in the NCAA tournament’s Elite Eight. With the field for the Sweet 16 now set, we can see how well our rankings can predict how far a particular team will go in this year’s tournament. This has been a historically tumultuous season of college basketball. Top-10 teams lost regularly, upsets were commonplace, and no teams were safe.

We can use Ranker’s data to see which team is having a year that matches their historical reputation as a powerhouse, and vice versa. Ranker visitors drew a clear line around North Carolina, Michigan State, Kansas and Villanova as favorites to make it into the Final Four. Kentucky is notable because it ranks highest in the overall best college programs poll, but is not predicted by our voters to end up in the Final Four. Villanova, which is not ranked among the top historical teams, is the main outlier: it lacks the historical reputation of Kentucky and UNC, yet is expected to have a good tournament showing.

The rankings suggest that our voting data reflects the current season rather than a bias toward teams with longstanding reputations.

 

Here are our results from the 2015 tournament:

Screen Shot 2016-03-08 at 12.25.50 PM

 

Here are our results for this year’s tournament:

Screen Shot 2016-03-08 at 10.43.38 AM

 

 

by    in Popular Lists

Duke and Kentucky Among Teams with the Most Annoying Fans

With March Madness tipping off, we turn to Ranker’s voters to learn more about college basketball and what to expect in this year’s tournament!

Which college basketball fan base wears its pride best? We all know the traditional powers in college basketball, but sometimes their gloating can be a bit much. In two separate lists, Ranker visitors ranked which college basketball team was the best, and which had the most annoying fans. When we combine these two lists, we can see which team is best respected for its prowess on the court and how this relates to how annoying its fans are to the rest of the world. As it happens, powerhouse bluebloods Duke and Kentucky are ranked among the top teams both for being historically successful and for having annoying fans. The most successful team with only moderately annoying fans is North Carolina. The least annoying fan base of a still-respected team was Villanova’s. Ohio State and Florida stand out for having annoying fans, but are not particularly respected as programs overall.

 

Screen Shot 2016-03-01 at 1.08.08 PM

 

by    in Popular Lists

Combining Best and Worst Lists to find Polarizing TV Shows

Ranker lists are expressions of people’s opinions, and it is possible for people to have opposite views. The same movie, television show, song, or celebrity can be loved or hated by different groups of people. (If this is not immediately obvious, think about Donald Trump for a moment). Social psychology has long been interested in differences of opinion, and has gathered all sorts of evidence that people will take more extreme views in an argument (attitude polarization), that they will focus on evidence that reinforces what they already believe (confirmation bias), and that they tend to judge new items and experiences based on their previous knowledge (apperception).

Ranker can provide evidence of polarization, since people’s ranks can express different opinions about the same items. This polarization can be especially clear when looking at “best” and “worst” lists on the same general topic. At the moment, it is easy to imagine Donald Trump at the top of both a “Best Presidential Candidates” and a “Worst Presidential Candidates” list. About the only way to explain this pattern of opinions is to identify Trump as a polarizing person. He doesn’t lead to one opinion or attitude. He polarizes people into “lovers” and “haters”.

Previously, we have developed cognitive models to analyze Ranker lists as diverse as the Soccer World Cup, movie box office takings, and how people feel about pizza toppings. None of these models, however, allowed for polarization. The assumption has always been that each item was perceived in a similar way by everybody. So, we extended our cognitive modeling approach to allow for polarizing items, perceived by some users with a “positive spin” and by others with a “negative spin”.

Not wanting to give Trump any more publicity, we decided to test the new model by looking at people’s opinions of recent TV shows. The two lists we looked at were The Best New TV Series of 2015 and The Most Disappointing New TV Shows of 2015. Together these lists involve 22 users (17 from the best list and 5 from the worst list) ranking a total of 67 shows, with 14 of the shows appearing on both best and worst lists. Some of the lists had as few as 3 shows, while others had as many as 27, with an average of about 9 shows per list.

Our new model assumes each TV show is represented in one of two ways. One possibility is that everybody has the same opinion, and the show is not polarizing. This means if a TV show is good, for example, people put it high in their best list, and low in their worst list, or don’t list it on their worst list at all. On the other hand, if a TV show is bad, people put it high in their worst list and low in their best list, or don’t mention it in their best list at all. The new possibility in our model is that a show is polarizing, and so some people believe it is good while others believe it is bad. These polarizing shows need two separate representations: one for the “lovers”, and one for the “haters”.
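As a rough intuition for what "polarizing" means in the data, here is a crude heuristic sketch. It is not our cognitive model, which infers the two representations statistically; it simply flags any show that some user placed near the top of a "best" list and some other user placed near the top of a "worst" list. The function names, the `top_n` cutoff, and the show names are hypothetical.

```python
def top_items(rankings, top_n):
    """Collect every show that some user placed in their first top_n positions."""
    return {show for ranking in rankings for show in ranking[:top_n]}


def polarizing_shows(best_rankings, worst_rankings, top_n=3):
    """Shows ranked near the top of both a 'best' and a 'worst' list."""
    return sorted(top_items(best_rankings, top_n) & top_items(worst_rankings, top_n))


# Hypothetical user rankings; each inner list is one user's ordering.
best = [["Show A", "Show B", "Show C"], ["Show B", "Show D"]]
worst = [["Show E", "Show A"], ["Show F", "Show E", "Show G", "Show B"]]
print(polarizing_shows(best, worst))  # ['Show A']
```

A simple cutoff like this ignores how many people ranked a show and how long each list was, which is part of what a full model has to account for.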

TVShowBlog

The model we created determined which shows were polarizing and which were not, and how each should be represented on a scale from best to worst. The results are summarized in the graph. The shows are listed from best at the top to worst at the bottom. If a show is not polarizing, it is listed once in gray. If a show is polarizing, it is listed twice: once in green for its positive form, and once in red for its negative form. The graph also summarizes the Ranker data that led to these conclusions. The green circles indicate when a show was included in the “best” list, starting from rank 1 on the left, to lower ranks moving to the right. The larger the area of the circle, the more people ranked the show in that position. The red crosses indicate when a show was included in the “worst” list, again starting from rank 1 on the left, and again with the size of the cross indicating how often it was ranked in that position.

It is clear from the figure that shows identified as polarizing (Better Call Saul, Empire, Ballers, Backstrom, and so on) generally were included in high positions on both the “best” and “worst” lists. Other shows are not polarizing: Last Man on Earth is consistently highly rated, and Schitt’s Creek seems to review itself with its name. A good question for the producers, marketers, and consumers of these TV shows is why some are polarizing. Better Call Saul, which is perhaps the most polarizing show in our results, is a nice example. It has a “lover” representation at the top of the overall list, and a “hater” representation near the bottom. One possibility is that the polarization arises because Better Call Saul was created as a spin-off prequel to Breaking Bad, and many people would argue that Breaking Bad is one of the greatest television series of all time (and we’d agree). We guess that the people who had a negative opinion of Better Call Saul were die-hard fans of Breaking Bad, and found it didn’t match their lofty expectations. On the other hand, people with positive opinions of Better Call Saul probably evaluated it largely independently of Breaking Bad, as a good new crime television series.

Whatever the causes of polarization, it seems clear that Ranker data provide useful measures, and we think our modeling approach can lead to deeper insights. Finding what is polarizing, and identifying the “lovers” and “haters” should apply not just to TV shows, but to rappers, directors, songs, and everything else where not everybody feels the same way about everything. There is lots for us to do. Or, as Donald Trump has it: “If you’re going to be thinking, you may as well think big.”

– Crystal Velazquez and Michael Lee

by    in Data, Data Science, Popular Lists

Applying Machine Learning to the Diversity within our Worst Presidents List

Ranker visitors come from a diverse array of backgrounds, perspectives and opinions.  The diversity of the visitors, however, is often lost when we look at the overall rankings of the lists, due to the fact that the rankings reflect a raw average of all the votes on a given item–regardless of how voters behave on multiple other items.  It would be useful then, to figure out more about how users are voting across a range of items, and to recreate some of the diversity inherent in how people vote on the lists.

Take, for instance, one of our most popular lists: Ranking the Worst U.S. Presidents, which has been voted on by over 60,000 people and comprises over half a million votes.

In this partisan age, it is easy to imagine that such a list would create some discord. So when we look at the average voting behavior of all the voters, the list itself has some inconsistencies.  For instance, the five worst-rated presidents alternate along party lines–which is unlikely to represent a historically accurate account of which presidents are actually the worst.  The result is a list that represents our partisan opinions about our nation’s presidents:

 

ListScreenShot

 

The list itself provides an interesting glimpse of what happens when two parties collide in voting for the worst presidents, but we are missing interesting data that can inform us about how diverse our visitors are.  So how can we reconstruct the diverse groups of voters on the list such that we can see how clusters of voters might be ranking the list?

To solve this, we turn to a common machine learning technique referred to as “k-means clustering.” K-means clustering takes the voting data for each user, summarizes it into a single voting profile, and then finds other users with similar voting patterns. The k-means algorithm is not given any information whatsoever from me as the data scientist, and has no real idea what the data mean at all. It is just looking at each Ranker visitor’s votes and looking for people who vote similarly, then clustering the patterns according to the data itself. K-means can be run with as many clusters as you like, and there are ways to determine how many clusters should be used. Once the clusters are drawn, I re-rank the presidents for each cluster using Ranker’s algorithm, and then we can see how different clusters ranked the presidents.
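For the curious, here is a minimal, self-contained sketch of k-means on voting data. It is an illustration under stated assumptions rather than the code used for this analysis: each voter is encoded as a list of +1 (up-vote), -1 (down-vote), or 0 (no vote) per president, the starting centroids are picked with a simple farthest-point rule so the result is deterministic, and the toy votes are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented votes from six voters on four presidents (+1 up, -1 down, 0 no vote).
toy_voters = [
    [1, 1, -1, -1],
    [1, 1, -1, 0],
    [1, 0, -1, -1],
    [-1, -1, 1, 1],
    [-1, 0, 1, 1],
    [-1, -1, 1, 0],
]
labels, centroids = k_means(toy_voters, k=2)
print(labels)
```

The algorithm never sees party labels; the two groups emerge only because voters within each group cast similar votes.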

As it happens, there are some differences in how clusters of Ranker visitors voted on the list.  In a two-cluster analysis, we find two groups of people with almost completely opposite voting behavior.

(*Note that since this is a list of voting on the worst president, the rankings are not asking voters to rank the presidents from best to worst, it is more a ranking of how much worse each president is compared to the others)

The k-means analysis found one cluster that appears to think Republican presidents are worst:

ClusterOneB

Here is the other cluster, with opposite voting behavior:

ClusterTwoB

In this two-cluster analysis, the shape of the data is pretty clear, and fits our preconceived picture of how partisan politics might be voting on the list.  But there is a bias toward recent presidents, and the lists do not mimic academic lists and polls ranking the worst presidents.

To explore the data further, I used a five cluster analysis–in other words, looking for five different types of voters in the data.

Here is what the five cluster analysis returned:

FiveClusterRankings

The results show a little more diversity in how the clusters ranked the presidents. Again, we see some clusters that are more or less voting along party lines based on recent presidents (Clusters 5 and 4). Clusters 1 and 3 are also interesting in that the algorithm seems to be picking up clusters of visitors who are voting for people who have not been president (Hillary Clinton, Ben Carson), and who thankfully were never president (Adolf Hitler). Clusters 2 and 3 are most interesting to me, however, as they show a greater resemblance to the academic lists of worst presidents (for reference, see Wikipedia’s rankings of presidents), and they tend toward a more historical bent on how we think of these presidents. I think of this as a more informed partisanship.

By understanding the diverse sets of users that make up our crowdranked lists, we are able to improve our overall rankings, and also provide a more nuanced understanding of how different group opinions compare, beyond the demographic groups we currently expose on our Ultimate Lists. Such analyses help us identify outliers and agenda pushers in the voting patterns, and allow us to rebalance our sample to make lists that more closely resemble a national average.

  • Glenn Fox

 

 

by    in Data Science, Popular Lists, Rankings

In Good Company: Varieties of Women we would like to Drink With

MainImagesvg

They say you’re defined by the company you keep.  But how are you defined by the company you want to keep?

The list “Famous Women You’d Want to Have a Beer With”  provides an interesting way to examine this idea.  In other words, how people vote on this list can define something about what kind of person is doing the voting.

We can think of people as having many traits, or dimensions.  The traits and dimensions that are most important to the voters will be given higher rankings.  For instance, some people may rank the list thinking about the trait of how funny the person is, so may be more inclined to rate comedians higher than drama actresses.  Others may vote just on attractiveness, or based on singing talent, etc…  It may be the case that some people rank comedians and singers in a certain way, whereas others would only spend time with models and actresses.  By examining how people rank the various celebrities along these dimensions, we can learn something about the people doing the voting.

The rankings on the site, however, are based on the sum of all of the voters’ behavior on the list, so the final rankings do not tell us how certain types of people are voting.  While we could manually go through the list and sort the celebrities according to their traits (i.e., put comedians with comedians and singers with singers), we would risk using our own biases to put voters into categories where they do not naturally belong.  It would be much better to let the voters’ own votes decide how the celebrities should be clustered.  To do this, we can use some fancy-math techniques from machine learning, called clustering algorithms, to let a computer examine the voting patterns and tell us which patterns are similar across voters.  In other words, we use the algorithm to find patterns in the voting data, put similar patterns together into groups of voters, and then examine how the different groups of voters ranked the celebrities.  How each group ranked the celebrities tells us something about the group, and about the type of people its members would like to keep them company.
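
The steps just described (find similar voting patterns, group the voters, then rank the celebrities within each group) can be sketched in a few lines of Python.  This is a minimal illustration using a plain k-means clustering written from scratch; the post does not say which clustering algorithm was actually used, and the voter IDs, celebrity labels, and function names below are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vote vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iterations=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assign every voter to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members (keep it if empty).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def rank_by_cluster(votes, items, k):
    """Cluster voters by their ballots, then rank items inside each cluster."""
    voters = sorted(votes)
    # Upvote = +1, downvote = -1, no vote = 0.
    vectors = [[votes[v].get(item, 0) for item in items] for v in voters]
    labels, centroids = kmeans(vectors, k)
    # Each centroid holds the cluster's mean vote per item; sort by it.
    rankings = {
        c: [name for name, _ in sorted(zip(items, centroids[c]), key=lambda p: -p[1])]
        for c in range(k)
    }
    return dict(zip(voters, labels)), rankings

# Hypothetical ballots: two voters favor comedians, two favor models.
items = ["Comedian A", "Comedian B", "Model A", "Model B"]
votes = {
    "v1": {"Comedian A": 1, "Comedian B": 1, "Model A": -1, "Model B": -1},
    "v2": {"Comedian A": 1, "Comedian B": 1, "Model A": -1},
    "v3": {"Model A": 1, "Model B": 1, "Comedian A": -1, "Comedian B": -1},
    "v4": {"Model A": 1, "Model B": 1, "Comedian B": -1},
}
assignment, rankings = rank_by_cluster(votes, items, k=2)
for cluster, ranked in rankings.items():
    print(cluster, ranked)
```

The number of clusters `k` has to be chosen up front in this sketch; with real voting data that choice, and the choice of algorithm, would need more care than this toy example suggests.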

As it happens, this approach does find distinct clusters, or groups, in the voting data, and we can then guess for ourselves how the voters from each group can be defined based on the company they wish to keep.

Here are the results:

Cluster 1:

[Image: celebrity panel for Cluster 1]

Cluster 1 includes women known to be funny, including established comedians like Carol Burnett and Ellen DeGeneres.  What is interesting is that Emma Stone and Jennifer Lawrence are also included: although they rank highly on lists based on physical attractiveness, they also have a reputation for being funny.  The clustering algorithm is showing us that they are often categorized alongside other funny women as well.  Among the clusters, this one has the highest proportion of female voters, which may explain why the celebrities are ranked along dimensions other than attractiveness.

 

Cluster 2:

[Image: celebrity panel for Cluster 2]

Cluster 2 appears to consist of celebrities who are more in the nerdy camp, such as Yvonne Strahovski and Morena Baccarin, both of whom play roles on shows popular with science fiction fans.  At the bottom of this list we see something of a contrarian streak as well, with downvotes handed out to some of the best-known celebrities who rank highly on the list overall.

Cluster 3:

[Image: celebrity panel for Cluster 3]

Cluster 3 is a bit more of a puzzle.  The celebrities tend to be a bit older, and come from a wide variety of backgrounds that are less known for a single role or attribute.  This cluster could be basing their votes more on the celebrity’s degree of uniqueness, which is somewhat in contrast with the bottom ranked celebrities who represent the most common and regularly listed female celebrities on Ranker.

Cluster 4:

[Image: celebrity panel for Cluster 4]

We would also expect a list such as this to be heavily correlated with physical attractiveness, or perhaps with the celebrity’s role as a model.  Cluster 4 is perhaps the best example of this, and likely represents our youngest cluster.  The top-ranked women are from the entertainment sector and are known for their looks, whereas the bottom-ranked people are from politics or comedy, or are older and probably less well known to the younger voters.  As we might expect, this cluster also has a high proportion of younger voters.

Here is the list of the top and bottom ten for each cluster (note that the order within these lists is not particularly important, since the celebrities’ scores will be very close to one another):

[Image: table of the top and bottom ten celebrities for each cluster]

 

In the end, the adage that we are defined by the company we keep appears to have some merit, and it can be detected with machine learning approaches.  Though the split among the groups was not perfect, there were trends in each group that drew the people of the cluster together.  We are using these approaches to help improve the site and to provide better content to our visitors.

 

–Glenn R. Fox, PhD

 

 
