
Baby Bomb – Here’s How We Knew Bridget Jones’s Baby Would Tank

Doc is going to be honest here. He was probably never going to buy a ticket for Bridget Jones’s Baby… mostly because Doc believes in restricting oneself to just an apostrophe when a possessive word ends in “s.” But also because the travails of a winsome Anglo-dumpling with a journaling fixation never held much personal appeal.

But movies that Doc doesn’t personally care for make bank all the time, and clearly there were plenty in Hollywood (or at least at Universal Pictures) who were convinced that the franchise’s devoted fanbase would turn out for another spin on the Bridget-go-round. And why not? Over the past couple of years, Sequels That No One Asked For actually have been a pretty safe bet, especially the ones targeting women over 25. My Big Fat Greek Wedding 2 wasn’t the surprise smash the original was, but it more than made its budget back, grossing a respectable $60 million in the U.S. And last year’s The Second Best Exotic Marigold Hotel took in over $80 million worldwide, with a little over a third of that total coming from the U.S. With this summer’s sleeper hit Bad Moms proving the strength of the women-over-25 market, and with a credible critical response to the new film, most experts were looking at Bridget Jones’s Baby opening at $15 million, if not higher.

[Figure: bluegraph]

Of course, the gang here at Ranker are not “most experts.” And accordingly, Doc can say that we had a pretty strong idea that Bridget Jones’s Baby was due for a troubled birth and a sickly, blighted existence on this earth. How’d we know?

Easy. We pulled up Ranker Insights, and dug into the numbers on Bridget Jones’s Diary, the first and best-regarded of Bridget’s misadventures. After all, the fanbase for Bridget Jones’s Diary seems like an obvious—really, the obvious—group for the movie to market to. And we learned all sorts of interesting things, like that it’s the 66th best rainy-day movie, and that Bridget’s stateside popularity is strongest in the southeast, then wanes as you move north and west across the country.

And then we pulled up the list of other films that Bridget Jones fans were most likely to voice their approval of. The first on the list, unsurprisingly, is Bridget Jones: The Edge of Reason, the (widely derided) first sequel to Diary. But how about those next six titles? See if you can spot any pattern…

  • Love Actually
  • Elizabeth
  • About a Boy
  • Notting Hill
  • Sense and Sensibility
  • Four Weddings and a Funeral

[Image: moviecollage-1 — collage of the six films above]

You’re a smart cookie—you see where Doc is going with this, yes? No fewer than five of those six movies feature the harried, boyish stammerings of one Hugh John Mungo Grant. (And in related news: Mungo? MUNGO? Doc swears he isn’t making this stuff up.) Yes, Love Actually additionally features Grant’s Bridget Jones co-star Colin Firth, which probably accounts for its placement at #2 on the list after Edge of Reason. But otherwise, the message is clear as day: Above all others, Bridget Jones fans love, love, love them some Hugh Grant.

[Image: mungo80]

This would be just peachy, except for the tiny, easily overlooked detail that Hugh Grant decided he wanted no part of Bridget Jones’s Baby, and isn’t in the movie. Even if you’ve just seen the film’s traditional three-shot poster, you know that the role of “handsome douche” previously filled by Grant is this time essayed by Patrick Dempsey (né McDreamy). Now, Bridget Jones fans don’t seem to have anything especially against Dempsey. On the list of TV shows most liked by Bridget Jones fans, Grey’s Anatomy ranks #9. (It’s still behind Pinky & the Brain and The Golden Girls, so go figure.) But there’s no comparison between their mild affection for Dempsey and their deep and abiding passion for Hugh Grant. Their feelings for Grant’s co-stars Renee Zellweger and Colin Firth similarly pale by comparison. After Edge of Reason, the top Zellweger film on the list is Chicago, at #19. Zellweger’s breakthrough film, Jerry Maguire, sits at #532.

Wouldn’t you think that if the Bridget Jones fanbase was really devoted to Renee Zellweger, they’d be more inclined to like Jerry Maguire than, say, The Mighty Ducks or American History X? But no. Apparently, fans of Bridget Jones would rather watch Ed Norton curb-stomp a dude than see Renee Zellweger “complete” Tom Cruise. Good stuff to bear in mind when you’re planning your next at-home double feature.

[Figure: graphwithfaces-1]

And so there was zero astonishment around Ranker HQ when Bridget Jones’s Baby didn’t even crack $9 million in its opening weekend. Doc takes no joy in being right about this stuff. He wants all the movies to do well, what with a rising tide lifting all boats and everything. But when you blow it this big, and this obviously, you deserve to get called on it.

So for future reference, trying to sustain a movie franchise after shedding its fans’ favorite character/actor is a lousy idea. And that’s the only truth Doc has for you this week, baby.


Big Data Shows Movie Fans Love Tom Hanks, Just Not in Sequels

It’s summertime. And when it comes to big-budget movies, that also means it’s sequel time. We’ve already seen remarkable successes like Captain America: Civil War and Finding Dory, and a few flops (at least relative to their budgets) like Teenage Mutant Ninja Turtles 2 and Independence Day: Resurgence. This got us at Ranker Insights thinking: what goes into making a successful sequel? The truth is, a lot of factors contribute. The box office success of the original just happens to be one of them. From solid, open-ended plot lines and apparent depth of main characters to built-in fan bases and predictably bankable actors, big data suggests many factors come into play when creating a flourishing movie franchise. However, this much seems certain: you’re probably better off casting anyone but the man voters consider the greatest actor of all time.

Allow us to explain. Big data can tell you big things when it comes to making a great film. But if you’re planning on getting the most bang for your buck out of your original idea, even the smallest details can make a big difference. For instance, let’s take a look at the top 30 of the Best Movie Characters of All Time. Notice anything? Sure, you see all the memorable characters you would expect near the top: Forrest Gump, Indiana Jones, James Bond, and Bruce Wayne/Batman are all in the top 10. This makes sense, especially when you consider their names are usually in the titles of the movies they star in. Look a little closer, though, and further analyze the films these characters came from. Of the top 30, 22 were strong enough to star in a sequel or trilogy. Now, let’s look at the eight that didn’t return to entertain you once again. What do all these movies have in common? That’s right. They all involve the indisputably lovable Thomas Jeffrey Hanks.

Why is this, you ask? Good question. Certainly Toy Story was a smashing success, and went on to spawn not one – but two – great sequels. Toy Story 2 was even voted 8th on Ranker’s list of Best Movie Sequels. But for obvious reasons, that franchise features only his voice, not his face. The only sequel in which Tom Hanks actually appeared on screen, The Da Vinci Code follow-up Angels & Demons, produced far less favorable results. While it still proved to be a box office success, it took in only about two-thirds of its predecessor’s box office. And as for the character Hanks portrayed, Robert Langdon, well, he is nowhere to be found on the Best Movie Characters of All Time list.

It doesn’t seem to be Tom’s choice of directors either, as Tom Hanks/Steven Spielberg movies are a whopping 975% more likely to be liked by Tom Hanks fans, with the Tom Hanks/Ron Howard team coming in a close second at 809%. And it’s not like these fans are averse to the idea of sequels either. Voters who like Tom like their action, adventure, and animated sequels just fine. In fact, Tom fanatics are 549% more likely to enjoy Captain America: The Winter Soldier; 258% more likely to have high praise for Back to the Future Part II; and 396% more likely to be fans of the previously mentioned Toy Story 2. Heck, the analytics show that voters on the Greatest Actor & Actress in Entertainment History list will take a sequel of any kind: they’re 38% more likely to vote up the universally agreed-upon clunker, Crocodile Dundee in Los Angeles. Maybe Tom Hanks-related sequels were meant not to be seen, but simply heard.
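For readers curious where a figure like “549% more likely” comes from: it’s essentially a lift calculation, comparing how often fans of one item vote up another item against the base rate among all voters. Here is a minimal sketch, with made-up votes standing in for Ranker’s actual (and surely more involved) pipeline:

```python
# Minimal sketch of an affinity / "X% more likely" calculation.
# `votes` maps each user to the set of items they upvoted (hypothetical data).
votes = {
    "u1": {"Tom Hanks", "Toy Story 2"},
    "u2": {"Tom Hanks", "Captain America: The Winter Soldier"},
    "u3": {"Back to the Future Part II"},
    "u4": {"Tom Hanks", "Toy Story 2"},
}

def lift_percent(votes, base_item, target_item):
    """How much more likely fans of base_item are to upvote target_item,
    relative to the overall rate, expressed as a percentage."""
    users = list(votes)
    fans = [u for u in users if base_item in votes[u]]
    overall_rate = sum(target_item in votes[u] for u in users) / len(users)
    fan_rate = sum(target_item in votes[u] for u in fans) / len(fans)
    return 100 * (fan_rate / overall_rate - 1)

# Tom Hanks fans upvote Toy Story 2 at 2/3 vs. a 2/4 base rate: ~33% more likely.
print(round(lift_percent(votes, "Tom Hanks", "Toy Story 2")))
```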

Perhaps it’s just a demographic thing? Nope, that doesn’t seem to matter either. Toy Story 2 drops only to number 9 among international voters and to number 10 among female voters. Judging by the data mined from Actors You Would Watch Read The Phone Book, analytics show that Tom Hanks fans are 200% (or more) more likely to listen to Robert De Niro, Harrison Ford, Johnny Depp, or Liam Neeson go through the names from A to Z, and all four know a thing or two about sequels. And with Hanks ranking sixth on that same list, we can safely conclude that the scarcity of Hanks sequels probably isn’t about his acting itself.

In all likelihood, it’s just a content thing. Most of Hanks’s roles have a historical end, or at the very least a definitive one. The stories he stars in just don’t lend themselves to sequels. Voters seem to agree, as there is nary a Hanks movie to be found on Ranker’s list of Movies That Need Sequels. Saving Private Ryan? Saved. Catch Me If You Can? Caught. Philadelphia? Finished. So don’t hold your breath waiting for Forrest Gumper or Sully 2: Nursing Home Boogaloo, regardless of how well Sully does upon release in early September. These Hanks vehicles just don’t seem to be in demand for sequels, success be damned.

Now, Ranker Insights would never be one to tell you how to create a successful movie franchise, because frankly, that would be a thankless job. But if your job is to create a character memorable enough to secure a sequel, big data shows your main character should probably be a Hanks-less one. He seems to be the epitome of Mr. One-and-Done.


Using Data To Determine The Best Months Of The Year

Why do people like some months more than others? For many, it is all about the holidays:

“I love the scents of winter! For me, it’s all about the feeling you get when you smell pumpkin spice, cinnamon, nutmeg, gingerbread and spruce.” – Taylor Swift

while for others, it is about avoiding the cold:

“A lot of people like snow. I find it to be an unnecessary freezing of water.” – Carl Reiner

and for some more disaffected souls, it is about the specifics:

“August used to be a sad month for me. As the days went on, the thought of school starting weighed heavily upon my young frame.” – Henry Rollins

Presumably all of these preferences, and all of this angst, are reflected in Ranker’s Best Months of the Year list. The graphic below visualizes the opinions of Ranker users. Each row is a different person, and their (sometimes incomplete) ranking of the months is shown from best to worst, left to right. The months are color-coded by the four seasons: spring in hues of green, summer in yellow, fall in rustic earth browns, and winter in blue.
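(If you’d like to draw a quilt like this yourself, a few lines of matplotlib will do it. The data below is simulated, not the real list: each row is a user, each entry is the month placed at that rank, and NaN marks incomplete rankings.)

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Simulated stand-in for the real data: rankings[u, r] = month (1-12)
# that user u put at rank r, with NaN where the ranking is incomplete.
rng = np.random.default_rng(0)
rankings = np.array([rng.permutation(np.arange(1, 13)) for _ in range(50)], dtype=float)
rankings[rng.random(rankings.shape) < 0.2] = np.nan

# Recolor each cell by the season of its month: 0=spring, 1=summer, 2=fall, 3=winter.
season_of_month = {3: 0, 4: 0, 5: 0, 6: 1, 7: 1, 8: 1, 9: 2, 10: 2, 11: 2, 12: 3, 1: 3, 2: 3}
seasons = np.full(rankings.shape, np.nan)
mask = ~np.isnan(rankings)
seasons[mask] = [season_of_month[int(m)] for m in rankings[mask]]

cmap = ListedColormap(["green", "gold", "saddlebrown", "steelblue"])
plt.imshow(seasons, cmap=cmap, aspect="auto", interpolation="nearest")
plt.xlabel("Rank (best to worst)")
plt.ylabel("Individual voter")
plt.show()
```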

[Figure: BestMonthsOriginal — each row is one user’s best-to-worst ranking of the months, color-coded by season]

The patchwork quilt of colors and hues makes it clear that different people have different opinions. We wanted to understand the structure of these individual differences using cognitive data analysis.

To do this, we used a simple model of how people produce rankings—known as a Thurstonian model, going back to the 1920s in psychology—that we have previously applied successfully to Ranker data. Rather than assuming everybody’s rankings were based on a shared opinion, we allowed this version of the model to have groups or clusters of people, and for each group to have their own preferences for the months. We didn’t want to pre-determine the number of groups, and so we allowed our model to make this inference directly from the data. Our modeling approach thus involves two sorts of interacting uncertainties: about how many groups there are, and about which people belong to which group. Bayesian statistical methods are well suited to handling these sorts of uncertainties.
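To make that concrete, here is a minimal generative sketch of a Thurstonian mixture — a simulation of the assumed data-generating process, not our actual inference code: each group has a latent mean “utility” for every month, each person draws noisy utilities around their group’s means, and the observed ranking is just the sort order of those draws.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months, n_groups, n_people = 12, 4, 100

# Latent group-level preferences: mu[g, m] is group g's mean utility for month m.
mu = rng.normal(0.0, 1.0, size=(n_groups, n_months))

z = rng.integers(n_groups, size=n_people)  # each person's (latent) group membership
sigma = 0.5                                # person-level noise around the group means

rankings = []
for i in range(n_people):
    utilities = mu[z[i]] + rng.normal(0.0, sigma, size=n_months)
    rankings.append(np.argsort(-utilities))  # months ordered best to worst
rankings = np.array(rankings)
```

Inference simply runs this process in reverse: given the observed rankings, Bayesian methods (e.g., MCMC) recover a posterior over the group preferences, each person’s membership, and, with a prior over the number of groups, how many groups there are at all.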

For fans of Bayesian cognitive graphical models — we know you’re out there — the final model we used is shown in the figure below. For non-fans of Bayesian cognitive graphical models — we KNOW you’re out there — there are three important parts. The variable gamma at the top corresponds to how many groups there are, the variables z to the side correspond to which of these groups each individual belongs, and all of this is inferred from the rankings people gave, represented by the variables at the bottom.

[Figure: GraphicalModel — the Bayesian graphical model]

The figure below shows the first key insight from the model. It shows the probability that there are 1, 2, …, 17 groups, ranging from everybody having the same opinion about the best months, to everyone having their own unique opinion. There is uncertainty about how many groups the rankings reveal, but the most likely answer is that there are four.

[Figure: Gamma — posterior probability of the number of groups]

Assuming there are four groups, the figure below organizes the ranking data  by grouping together the people most likely to belong to each group. Group 1 shows a preference for late summer and early fall, and hates cold weather. Group 2 shows a preference for the holidays. They like fall and Christmas time and despise hot weather. Group 3 loves the summertime and hates the winter. We had a look at where these people were from, and it probably comes as no surprise they’re all from the north-east of the US. The last group, a bit like Henry Rollins, stands out as a consensus of one.

[Figure: BestMonths — the ranking data organized by the four inferred groups]

This analysis shows how cognitive models with individual differences can help us understand opinion groupings, and deal with difficult questions like how many groups exist. One especially interesting feature of the Best Months list is that at least one of the groups is defined more by what comes at the bottom of its lists than the top. People in group 1 don’t agree very precisely on which months they like, but they all agree they don’t like winter months. This shows that it is not just the top few items on a Ranker list that carry useful information: what comes at the bottom can be just as informative. Both what you love and what you hate matter.

“When I was young, I loved summer and hated winter. When I got older I loved winter and hated summer. Now that I’m even older, and wiser, I hate both summer and winter.” – Jarod Kintz

 

– Crystal Velasquez and Michael Lee


According to Big Data, Millennials Don’t Care Much About America’s Pastime

Does Respect for the Past Bode Well for Baseball’s Future?
Breaking Down the Big Data of the Greatest Baseball Players of All Time List

How much does America’s Pastime’s current popularity factor into rankings of the greatest baseball players of all time? And what factors beyond simple player statistics come into play when voters make their own lists? The resulting Ranker data speaks – or rather, cheers – volumes when it comes to players of past generations. While nostalgia might have some effect on the voting, is the lack of current players on the list a sign that voters have an unwavering respect for the legends of the past, or is our national pastime becoming just that? Past its time.

Ranker asked participants upfront to rank the best baseball players only by their on-field accomplishments. Nearly 115,000 votes have come in from almost 7,500 participants, and it’s no surprise who the consensus top pick was. With a lifetime batting average of .342 and the #1 spot in all-time OPS (on-base plus slugging percentage), the voters’ choice was clear: Babe Ruth. Anyone who has had a casual conversation around this topic knows the Great Bambino is always one of the first names mentioned when ranking the greatest players of all time, and he’s usually a favorite across all ages.

Whether you are an astute baseball statistical historian, have been sitting in your team’s bleachers since you were a child, or are one of the nearly 60 million people who play fantasy sports, you probably have at least a passing opinion about who is the best of all time. According to Ranker’s data, your top 5 has some combination of the Babe, Stan Musial, Ted Williams, Mickey Mantle, and Willie Mays or Hank Aaron – the last of the group to retire, all the way back in 1976. Once you break down the demographics even a little bit further, that’s when things start to get interesting.

Gone, but not forgotten.

The most glaring result at first glance: there’s nary an active player on the all-time list’s starting roster. In fact, it isn’t until #44 that you’ll find someone still active, in Ichiro Suzuki. For the record, Ichiro ranks only #76 on Ranker’s Top CURRENT Baseball Players list. Does this imply that voters know and respect their history? Or could it be that the current crop of baseball players isn’t well represented because they aren’t being watched? Television ratings data suggests that a steady decline in viewership over the years might be a factor in the voting. Major League Baseball as a business is as strong as ever (just have a look at some of the salaries being handed out), but people aren’t as interested in watching the game as they used to be.

How much does a voter’s age factor into the results? A deeper dive into the analytics suggests quite a bit. Baby Boomers are 184% more likely than any other age group to have Mel Ott on their list because, you know, they’ve actually seen him play. If you’re between the ages of 30 and 49, you are a whopping 305% more likely to have Sadaharu Oh of the Yomiuri Giants on your list (which suggests that, internationally, fans aren’t only passionate about their soccer). If you’re a Millennial, you must enjoy a good quote: Millennials are 248% and 234% more likely to vote for the non sequitur machine Yogi Berra and the forever quirky Rickey Henderson, respectively. And while Ranker doesn’t have analytics to suggest that voters in the 30-49 demographic are all mustache enthusiasts, they were 281% more likely to include Rollie Fingers on their list.

However, those stats focus on specific characters in the game that certain demographics are drawn to. Where are the Mike Trouts (#1 with people under the age of 29 on the Top Current Baseball Players list), Clayton Kershaws (#2), or players with brand recognition among fans, like Troy Tulowitzki (#20)? All of them, gaudy numbers and all, failed to crack the top 100. In fact, the only other active players on the list (besides the aging Ichiro) were the also-aging Albert Pujols (#48) and Miguel Cabrera (#90). Maybe there’s just not a large (or long) enough sample size to include current players on this list of all-time greats.

Is today’s game yesterday’s news?

Perhaps voters are just into something else. When you look at the voting demographics, young voters are the least represented participants, with the majority aged 30 and up. But with nearly 23% of the votes, you would think at least a couple more current players would sneak in, wouldn’t you? Perhaps baseball just doesn’t resonate with this new generation. They’re gravitating toward lacrosse, their video game consoles, or even their smartphones. As a recent article in the Wall Street Journal suggests, younger people are just tuning out.

So who’s got next?

The times may have changed, but according to Ranker data, the best baseball players really haven’t. From Ty Cobb in the dead-ball era and Satchel Paige of the Negro Leagues to various international leagues and beyond, the voters know that the greatest baseball of all time was played beyond just the Major Leagues here in the States. Records were made to be broken, but which of the best baseball players of today do you think will eventually break into the all-time list? Only time (and the fickle under-30 voters) will tell. So if you happen to ask a Millennial whether they saw the game last night, just don’t expect a rundown of who won. You’ll probably just get a “who cares?”

Collecting and Connecting Millions of Opinions


Ranker Insights is the Most Precise Data for Entertainment, Personalities, Sports, Brands and More

Ranker is a leading digital media company that ranks opinions on (almost) everything through our vote-based user experience. Our rankings don’t just collect opinions; they contextualize them. Through context, Ranker can discern users who prefer an actor’s talent vs. their attractiveness, for example, or fans who like a college for its academics vs. its athletics – and the millions upon millions of correlations therein.

Thus we created Ranker Insights: Ranker’s first-party analytics platform, which turns data from users’ votes into actionable intelligence with countless applications.


Applying Machine Learning to the Diversity within our Worst Presidents List

Ranker visitors come from a diverse array of backgrounds, perspectives, and opinions.  That diversity, however, is often lost when we look at the overall rankings of the lists, because the rankings reflect a raw average of all the votes on a given item – regardless of how voters behave on other items.  It would be useful, then, to figure out more about how users are voting across a range of items, and to recreate some of the diversity inherent in how people vote on the lists.

Take, for instance, one of our most popular lists: Ranking the Worst U.S. Presidents, which has been voted on by over 60,000 people and comprises over half a million votes.

In this partisan age, it is easy to imagine that such a list would create some discord. And indeed, when we look at the average voting behavior of all the voters, the list itself has some inconsistencies.  For instance, the five worst-rated presidents alternate along party lines – which is unlikely to represent a historically accurate account of which presidents were actually the worst.  The result is a list that represents our partisan opinions about our nation’s presidents:

 

[Image: ListScreenShot — the overall Worst U.S. Presidents ranking]

 

The list itself provides an interesting glimpse of what happens when two parties collide in voting for the worst presidents, but we are missing interesting data that can inform us about how diverse our visitors are.  So how can we reconstruct the diverse groups of voters on the list such that we can see how clusters of voters might be ranking the list?

To solve this, we turn to a common machine learning technique referred to as “k-means clustering.”  K-means represents each user’s voting behavior as a numeric summary and then finds other users with similar voting patterns.  The k-means algorithm is given no information whatsoever from me as the data scientist, and has no real idea what the data mean at all.  It simply looks at each Ranker visitor’s votes, finds people who vote similarly, and clusters the patterns according to the data itself.  K-means can be run with as many clusters as you like, and there are ways to determine how many clusters should be used.  Once the clusters are drawn, I re-rank the presidents for each cluster using Ranker’s algorithm, and then we can see how the different clusters ranked the presidents.
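Here is a rough sketch of that pipeline, with hypothetical vote data and a plain mean-score re-ranking standing in for Ranker’s actual (proprietary) algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical vote matrix: rows are users, columns are presidents;
# +1 = upvoted as "worst", -1 = downvoted, 0 = no vote.
rng = np.random.default_rng(1)
X = rng.choice([-1, 0, 1], size=(1000, 44), p=[0.2, 0.6, 0.2])

# Cluster users by their voting patterns.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Re-rank the presidents within each cluster by mean vote score
# (a simple stand-in for Ranker's ranking algorithm).
for k in range(kmeans.n_clusters):
    members = X[kmeans.labels_ == k]
    worst_first = np.argsort(-members.mean(axis=0))
    print(f"Cluster {k}, presidents ranked worst-first: {worst_first[:5]}...")
```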

As it happens, there are some differences in how clusters of Ranker visitors voted on the list.  In a two-cluster analysis, we find two groups of people with almost completely opposite voting behavior.

(*Note that since this is a list voting on the worst presidents, the rankings do not ask voters to rank the presidents from best to worst; rather, they reflect how much worse each president is considered compared to the others.)

The k-means analysis found one cluster that appears to think Republican presidents are worst:

[Figure: ClusterOneB — rankings for the first cluster]

Here is the other cluster, with opposite voting behavior:

[Figure: ClusterTwoB — rankings for the second cluster]

In this two-cluster analysis, the shape of the data is pretty clear, and fits our preconceived picture of how partisan voters might behave on the list.  But there is a bias toward recent presidents, and the lists do not mimic academic lists and polls ranking the worst presidents.

To explore the data further, I used a five-cluster analysis – in other words, looking for five different types of voters in the data.

Here is what the five-cluster analysis returned:

[Figure: FiveClusterRankings — rankings per cluster in the five-cluster analysis]

The results show a little more diversity in how the clusters ranked the presidents.  Again, we see some clusters that are more or less voting along party lines based on recent presidents (Clusters 5 and 4).  Clusters 1 and 3 are also interesting, in that the algorithm seems to be picking up visitors who are voting for people who have not been president (Hillary Clinton, Ben Carson), and who thankfully were never president (Adolf Hitler).  Clusters 2 and 3 are most interesting to me, however, as they show a greater resemblance to the academic lists of worst presidents (for reference, see Wikipedia’s rankings of presidents), with a more historical bent on how we think of these presidents – I think of this as a more informed partisanship.

By understanding the diverse sets of users that make up our crowdranked lists, we can improve our overall rankings and provide a more nuanced understanding of how different groups’ opinions compare, beyond the demographic groups we currently expose on our Ultimate Lists.  Such analyses help us identify outliers and agenda pushers in the voting patterns, and allow us to rebalance our sample to make lists that more closely resemble a national average.

– Glenn Fox


In Good Company: Varieties of Women We Would Like to Drink With

[Image: MainImagesvg]

They say you’re defined by the company you keep.  But how are you defined by the company you want to keep?

The list “Famous Women You’d Want to Have a Beer With”  provides an interesting way to examine this idea.  In other words, how people vote on this list can define something about what kind of person is doing the voting.

We can think of people as having many traits, or dimensions.  The traits and dimensions that are most important to the voters will be given higher rankings.  For instance, some people may rank the list thinking about the trait of how funny the person is, so may be more inclined to rate comedians higher than drama actresses.  Others may vote just on attractiveness, or based on singing talent, etc…  It may be the case that some people rank comedians and singers in a certain way, whereas others would only spend time with models and actresses.  By examining how people rank the various celebrities along these dimensions, we can learn something about the people doing the voting.

The rankings on the site, however, are based on the sum of all of the voters’ behavior on the list, so the final rankings do not tell us how particular types of people are voting.  While we could manually go through the list to sort the celebrities according to their traits – i.e., put comedians with comedians, singers with singers – we would risk using our own biases to put voters into categories where they do not naturally belong.  It would be much better to let the voters’ own votes decide how the celebrities should be clustered.  To do this, we can use some fancy-math techniques from machine learning, called clustering algorithms, to let a computer examine the voting patterns and tell us which patterns are similar across voters.  In other words, we use an algorithm to find patterns in the voting data, group similar patterns together into groups of voters, and then examine how the different groups ranked the celebrities.  How each group ranked the celebrities tells us something about the group, and about the type of people they would like to keep them company.
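One practical wrinkle glossed over above is deciding how many groups to look for. A common heuristic – offered here as an illustration, not necessarily the method used for this list – is to try several cluster counts and keep the one with the best silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical users-by-celebrities vote matrix (+1 up, -1 down, 0 no vote).
rng = np.random.default_rng(7)
X = rng.choice([-1, 0, 1], size=(500, 60), p=[0.25, 0.5, 0.25])

best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher = tighter, better-separated clusters
    if score > best_score:
        best_k, best_score = k, score
print(f"best k = {best_k} (silhouette = {best_score:.3f})")
```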

As it happens, using this approach actually finds unique clusters, or groups, in the voting data, and we can then guess for ourselves how the voters from each group can be defined based on the company they wish to keep.

Here are the results:

Cluster 1:

[Image: Cluster4_MakeCelebPanels]

Cluster 1 is made up of females known to be funny, including established comedians like Carol Burnett and Ellen DeGeneres. What is interesting is that Emma Stone and Jennifer Lawrence are also included: while both rank highly on lists based on physical attractiveness, they also have reputations for being funny, and the clustering algorithm is showing us that they are often categorized alongside other funny females.  Among the clusters, this one has the highest proportion of female voters, which may explain why its celebrities are ranked along dimensions other than attractiveness.

 

Cluster 2:

[Image: Cluster1_MakeCelebPanels]

Cluster 2 appears to consist of celebrities that are more in the nerdy camp, with Yvonne Strahovski and Morena Baccarin, both of whom play roles on shows popular with science fiction fans.  At the bottom of this list we see something of a contrarian streak as well, with downvotes handed out to some of the best-known celebrities who rank highly on the list overall.

Cluster 3:

[Image: Cluster2_MakeCelebPanels]

Cluster 3 is a bit more of a puzzle.  The celebrities tend to be a bit older, and come from a wide variety of backgrounds that are less tied to a single role or attribute.  This cluster could be basing its votes more on a celebrity’s degree of uniqueness, which stands in contrast with the bottom-ranked celebrities, who are among the most common and regularly listed female celebrities on Ranker.

Cluster 4:

[Image: Cluster3_MakeCelebPanels]

We would also expect a list such as this to be heavily correlated with physical attractiveness, or perhaps with a celebrity’s role as a model.  Cluster 4 is perhaps the best example of this, and likely represents our youngest cluster.  The top-ranked women are from the entertainment sector and are known for their looks, whereas the bottom-ranked people are from politics or comedy, or are older and probably less well known to the younger voters.  As we might expect, this cluster also has a high proportion of younger voters.

Here is the list of the top and bottom ten for each cluster (note that the order within these lists is not particularly important, since the celebrities’ scores are very close to one another):

[Table image: TopCelebsPerClusterTable — top and bottom ten per cluster]

 

In the end, the adage that we are defined by the company we keep appears to have some merit – and can be detected with machine learning approaches.  Though the split among the groups is not perfect, there were trends in each group that drew the people of the cluster together.  We are using these approaches to help improve the site and to provide better content to our visitors.

 

– Glenn R. Fox, PhD

A Ranker World of Comedy Opinion Graph: Who Connects the Funny Universe?

In the previous post, we showed how a Gephi layout algorithm was able to capture different domains in the world of comedy across all of the Ranker lists tagged with the word “funny”.  However, these algorithms also give us information about the roles that individuals play within clusters. The size of the node indicates that node’s ability to connect other nodes, so bigger nodes indicate individuals who serve as a gateway between different nodes and categories.  These are the nodes that you would want to target if you wanted to reach the broadest audience, as people who like these comedic individuals also like many others.  Sort of like having that one friend who knows everyone send out the event invite instead of having to send it to a smaller group of friends in your own social network and hoping it gets around. So who connects the comedic universe?

The short answer: Dave Chappelle.

[Figure: Chappelle — Dave Chappelle’s position in the comedy opinion graph]

Dave Chappelle is the superconnector. He has both the largest number of direct connections and the largest number of overall connections. If you want to reach the most people, go to him. If you want to connect people between different kinds of comedy, go to him.  He is the center of the comedic universe. He’s not the only one with connections though.

Top 10 Overall Connectors

  1. Dave Chappelle 
  2. Eddie Izzard 
  3. John Cleese 
  4. Ricky Gervais
  5. Rowan Atkinson
  6. Eric Idle
  7. Billy Connolly
  8. Bill Hicks
  9. It’s Always Sunny In Philadelphia
  10. Sarah Silverman
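For the technically curious, this notion of a “connector” can be made precise with betweenness centrality – assuming that (or a close cousin) is what the node sizing reflects, since the exact measure isn’t named above. It is easy to compute with networkx on a toy version of the graph:

```python
import networkx as nx

# Toy stand-in for the comedy opinion graph: nodes are comedians/shows,
# edges are correlated-liking relationships (hypothetical, not the real data).
G = nx.Graph()
G.add_edges_from([
    ("Dave Chappelle", "Eddie Izzard"),
    ("Dave Chappelle", "John Cleese"),
    ("Dave Chappelle", "South Park"),
    ("John Cleese", "Eric Idle"),
    ("Eddie Izzard", "Ricky Gervais"),
])

# Betweenness centrality: how often a node sits on the shortest paths
# between other nodes -- a proxy for "gateway" status.
for node, score in sorted(nx.betweenness_centrality(G).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```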

 

We can also look at who the biggest connectors are between different comedy domains.

  • Contemporary TV Shows: It’s Always Sunny in Philadelphia, ALF, and The Daily Show are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Contemporary Comedians on American Television: Dave Chappelle, Eddie Izzard and Ricky Gervais are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Classic Comedians: John Cleese and Eric Idle are the strongest connectors. They provide bridges to all 6 other comedy domains.
  • Classic TV Shows: The Muppet Show and Monty Python’s Flying Circus are the strongest connectors. They provide bridges to Classic TV Comedians, Animated TV shows, and Classic Comedy Movies.
  • British Comedians: Rowan Atkinson is the strongest connector. He serves as a bridge to all of the other 6 comedy domains.
  • Animated TV Shows: South Park is the strongest connector. It serves as a bridge to Classic Comedians, Classic TV Shows, and British Comedians.
  • Classic Comedy Movies: None of the nodes in this domain were strong connectors to other domains, though National Lampoon’s Christmas Vacation was the strongest node in this network.

 

 

A Ranker Opinion Graph of the Domains of the World of Comedy

One unique aspect of Ranker data is that people rank a wide variety of lists, allowing us to look at connections beyond the scope of any individual topic.  We compiled data from all of the lists on Ranker with the word “funny” to get a bigger picture of the interconnected world of comedy.  Using Gephi layout algorithms, we were able to create an Opinion Graph which categorizes comedy domains and identifies points of intersection between them.

[Figure: all3sm — the full comedy opinion graph]

In the following graphs, colors indicate different comedic categories that emerged from a cluster analysis, and the connecting lines indicate correlations between different nodes with thicker lines indicating stronger relationships.  Circles (or nodes) that are closest together are most similar.  The classification algorithm produced 7 comedy domains:

 

  • American TV Shows and Characters: 26% of comedy, central nodes = It’s Always Sunny in Philadelphia, ALF, The Daily Show, Chappelle’s Show, and Friends.
  • Contemporary Comedians on American Television: 25% of nodes, includes Dave Chappelle, Eddie Izzard, Ricky Gervais, Billy Connolly, and Bill Hicks.
  • Classic Comedians: 15% of comedy, central nodes = John Cleese, Eric Idle, Michael Palin, Charlie Chaplin, and George Carlin.
  • Classic TV Shows and Characters: 14% of comedy, central nodes = The Muppet Show, Monty Python’s Flying Circus, In Living Color, WKRP in Cincinnati, and The Carol Burnett Show.
  • British Comedians: 9% of comedy, central nodes = Rowan Atkinson, Jennifer Saunders, Stephen Fry, Hugh Laurie, and Dawn French.
  • Animated TV Shows and Characters: 9% of comedy, central nodes = South Park, Family Guy, Futurama, The Simpsons, and Moe Szyslak.
  • Classic Comedy Movies: 1.5% of comedy, central nodes = National Lampoon’s Christmas Vacation, Ghostbusters, Airplane!, Vacation, and Caddyshack.
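For a sense of how domains like these can fall out of vote data, here is a hypothetical reconstruction in code (the analysis above used Gephi’s built-in tools, not this script): correlate items by voting pattern, link strongly correlated items, and run modularity-based community detection on the result.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical users-by-items vote matrix for "funny"-tagged lists.
rng = np.random.default_rng(3)
votes = rng.choice([-1, 0, 1], size=(400, 40), p=[0.2, 0.6, 0.2])

# Connect pairs of items whose voting patterns correlate strongly.
corr = np.corrcoef(votes.T)
G = nx.Graph()
n_items = corr.shape[0]
for i in range(n_items):
    for j in range(i + 1, n_items):
        if corr[i, j] > 0.1:  # arbitrary threshold for drawing an edge
            G.add_edge(i, j, weight=corr[i, j])

# Modularity-based communities play the role of the comedy "domains".
for d, members in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"Domain {d}: {sorted(members)[:8]}")
```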

 

 

Clusters that are the most similar (most overlap/closest together):

  • Classic TV Shows and Contemporary TV Shows
  • British Comedians and Classic TV shows
  • British Comedians and Contemporary Comedians on American Television
  • Animated TV Shows and Contemporary TV Shows

Clusters that are the most distinct (least overlap/furthest apart):

  • Classic Comedy Movies do not overlap with any other comedy domains
  • Animated TV Shows and British Comedians
  • Contemporary Comedians on American Television and Classic TV Shows

 

Take a look at our follow-up post on the individuals who connect the comedic universe.

– Kate Johnson

 


Cognitive Models for the Intelligent Aggregation of Lists

Ranker is constantly working to improve our crowdsourced list algorithms, in order to surface the best possible answers to the questions on our site.  As part of this effort, we work with leading academics who research the “wisdom of crowds,” and below is a poster we recently presented at the annual meeting of the Association for Psychological Science (research led by Ravi Selker at the University of Amsterdam, in collaboration with Michael Lee from the University of California, Irvine).

While the math behind the aggregation model may be complex (a paper describing it in detail will hopefully be published shortly), the principle being demonstrated is relatively simple: aggregating lists with models that take into account the inferred expertise of each list maker outperforms simple averaging, when compared against real-world ground truths (e.g., box office revenue).  Ranker’s algorithms for determining our crowdsourced rankings may be similarly complex, but they are likewise designed to produce the best answers possible.
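The flavor of the idea can be conveyed with a deliberately simple iterative scheme – an illustration only, not the model from the poster: score each list maker by agreement with the current consensus, then rebuild the consensus with those scores as weights.

```python
import numpy as np
from scipy.stats import kendalltau

def aggregate_with_expertise(rankings, n_iters=10):
    """Aggregate rankings (each row: item indices, best first) by iteratively
    weighting each ranker by agreement with the consensus. A toy stand-in for
    expertise-based aggregation, not the published model."""
    n_rankers, n_items = rankings.shape
    positions = np.empty_like(rankings)          # positions[r, i] = rank r gave item i
    for r in range(n_rankers):
        positions[r, rankings[r]] = np.arange(n_items)
    weights = np.ones(n_rankers) / n_rankers
    for _ in range(n_iters):
        consensus_pos = weights @ positions      # weighted mean position per item
        # "Expertise" = rank correlation with the consensus, floored at zero.
        taus = np.array([max(kendalltau(positions[r], consensus_pos)[0], 0.0)
                         for r in range(n_rankers)])
        weights = taus / taus.sum() if taus.sum() > 0 else weights
    return np.argsort(consensus_pos)             # consensus ranking, best first

# Three rankers over five items; the third is noisy and gets down-weighted.
rankings = np.array([[0, 1, 2, 3, 4],
                     [0, 2, 1, 3, 4],
                     [4, 3, 0, 1, 2]])
print(aggregate_with_expertise(rankings))
```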

 

[Poster image: cognitive_model_aggregating_lists]

 

– Ravi Iyer
