Ranker Uses Big Data to Rank the World’s 25 Best Film Schools

NYU, USC, UCLA, Yale, Juilliard, Columbia, and Harvard top the rankings.

Does USC or NYU have a better film school? “Big data” can answer this question by linking data about movies, and the actors, directors, and producers who have worked on specific movies, to data about universities and the graduates of those universities. Semantic data from sources like Freebase, DBpedia, and IMDb can reveal which schools have produced the most working graduates. But what if you cared about the quality of the movies those graduates worked on, rather than just the quantity? Educating a student who went on to work on The Godfather must certainly be worth more than producing a student who received a credit on Gigli.

Leveraging opinion data from Ranker’s Best Movies of All Time list in addition to widely available semantic data, Ranker recently produced a ranked list of the world’s 25 best film schools, based on graduates’ credits on movies within the top 500 movies of all time. USC produces the most film credits by graduates overall, but when film quality is taken into account, NYU (208 credits) actually produces more credits among the top 500 movies of all time than USC (186 credits). UCLA, Yale, Juilliard, Columbia, and Harvard take places 3 through 7 on Ranker’s list. Several professional schools that focus on the arts also place in the top 25 (e.g., London’s Royal Academy of Dramatic Art), as do some well-located high schools (New York’s Fiorello H. LaGuardia High School and Beverly Hills High School).
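For the curious, here is a minimal sketch of the counting logic, not Ranker’s actual pipeline. All of the inputs are hypothetical placeholders: credits as (person, movie) pairs pulled from a source like Freebase or IMDb, alma_mater mapping each person to a school, and top_500 as the set of titles from Ranker’s Best Movies of All Time list.

```python
# A minimal sketch of the credit-counting idea (illustrative only).
from collections import Counter

def school_credit_counts(credits, alma_mater, top_500):
    """Count credits on top-500 movies, grouped by the person's school."""
    counts = Counter()
    for person, movie in credits:
        school = alma_mater.get(person)
        # Only count credits earned on movies ranked in the top 500.
        if school and movie in top_500:
            counts[school] += 1
    return counts

# school_credit_counts(credits, alma_mater, top_500).most_common(25)
# would yield a ranking like the list below.
```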

The World’s Top 25 Film Schools

  1. New York University (208 credits)
  2. University of Southern California (186 credits)
  3. University of California – Los Angeles (165 credits)
  4. Yale University (110 credits)
  5. Juilliard School (106 credits)
  6. Columbia University (100 credits)
  7. Harvard University (90 credits)
  8. Royal Academy of Dramatic Art (86 credits)
  9. Fiorello H. LaGuardia High School of Music & Art (64 credits)
  10. American Academy of Dramatic Arts (51 credits)
  11. London Academy of Music and Dramatic Art (51 credits)
  12. Stanford University (50 credits)
  13. HB Studio (49 credits)
  14. Northwestern University (47 credits)
  15. The Actors Studio (44 credits)
  16. Brown University (43 credits)
  17. University of Texas – Austin (40 credits)
  18. Central School of Speech and Drama (39 credits)
  19. Cornell University (39 credits)
  20. Guildhall School of Music and Drama (38 credits)
  21. University of California – Berkeley (38 credits)
  22. California Institute of the Arts (38 credits)
  23. University of Michigan (37 credits)
  24. Beverly Hills High School (36 credits)
  25. Boston University (35 credits)

“Clearly, there is a huge effect of geography, as prominent New York- and Los Angeles-based high schools appear to produce more graduates who work on quality films compared to many colleges and universities,” says Ravi Iyer, Ranker’s Principal Data Scientist, a graduate of the University of Southern California.

Ranker is able to combine factual semantic data with an opinion layer because Ranker is powered by a Virtuoso triple store with over 700 million triples of information, which are processed into an entertaining list format for users on Ranker’s consumer-facing website, Ranker.com. Each month, over 7 million unique users interact with this data – ranking, listing, and voting on various objects – effectively adding a layer of opinion data on top of the factual data from Ranker’s triple store. The result is a continually growing opinion graph that connects factual and opinion data. As of January 2013, Ranker’s opinion graph included over 30,000 nodes with over 5 million edges connecting them.
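As an illustration only, a query against a Virtuoso-style SPARQL endpoint for film credits by school might look like the sketch below, using Python’s SPARQLWrapper library. The endpoint URL and the ex: predicates are invented for the example; they are not Ranker’s actual schema.

```python
# Hypothetical sketch of querying a SPARQL endpoint for credits by school.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # placeholder endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/schema#>
    SELECT ?school (COUNT(?credit) AS ?credits)
    WHERE {
      ?credit a ex:FilmCredit ;
              ex:person ?person ;
              ex:film ?film .
      ?person ex:almaMater ?school .
      ?film ex:rankerRank ?rank .
      FILTER (?rank <= 500)
    }
    GROUP BY ?school
    ORDER BY DESC(?credits)
    LIMIT 25
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()  # JSON bindings: school, credits
```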

– Ravi Iyer


A Battle of Taste Graphs: Baltimore Ravens Fans vs. San Francisco 49ers Fans

Super Bowl Sunday is a day when two cities and two fan bases compete for bragging rights, even as the Baltimore Ravens and San Francisco 49ers themselves do the playing. A recent post on our data blog can help you understand these teams’ fans better through an exploration of their taste graphs; it examines correlations between votes on lists like the Top NFL Teams of 2012 and non-sports lists like our list of delicious vegetables (yum!).

For one, there is absolutely zero consensus where music is concerned. 49ers fans listen to an eclectic mixture of genres: up-and-coming rappers like Kendrick Lamar sit right next to INXS and ’90s Brit-poppers Pulp. Yet where the Ravens are concerned, classic rock is still king: Hendrix, CCR, and Neil Young are an undisputed top three. The 49ers also have the Ravens utterly beat in terms of culinary taste. Monterey Jack and cosmos are fairly clear favorites among fans, while Baltimore’s fans stick to staples: coffee, bell peppers, and ham are the only food items that correlated strongly enough to even be tracked.

A Snapshot from Ranker’s Data Mining Tool

TV tastes also varied between the two teams: Ravens fans stuck almost exclusively to comedic fare (Pinky and the Brain, Rugrats, Mythbusters, and Louie correlated strongly), while 49ers fans stuck to more structured, dramatic shows, such as The Walking Dead and Dexter.

Read the full post over on our data blog.

– Ravi Iyer


The Opinion Graph Predicts More Than the Interest Graph

At Ranker, we keep track of talk about the “interest graph,” as we have our own parallel graph of relationships between objects in our system, which we call an “opinion graph.” I was recently sent this video concerning the power of the interest graph to drive personalization.

The video makes good points about how the interest graph is more predictive than the social graph, as far as personalization goes. I love my friends, but the kinds of things they read and the kinds of things I read are very different; while there is often overlap, there is also a lot of diversity. For example, trying to personalize my movie recommendations based on my wife’s tastes would not be a satisfying experience. Collaborative filtering using people who have common interests with me is a step in the right direction, and the interest graph is certainly an important part of that.

However, you can predict more about a person with an opinion graph than with an interest graph. The difference is that while many companies can infer from web behavior what people are interested in, perhaps by looking at the kinds of articles and websites they consume, a graph of opinions actually knows what people think about the things they are reading about. Anyone who works with data knows that the more specific a data point is, the more you can predict, as the amount of “error” in your measurement is reduced. Reduced measurement error is far more important for prediction than sample size, a point that gets lost in the drive toward bigger and bigger data sets. Nate Silver often makes this point in talks and in his book.

For example, if you know someone reads articles about Slumdog Millionaire, then you can serve them content about Slumdog Millionaire. That would be a typical use case for interest graph data. Using collaborative filtering, you can find out what other Slumdog Millionaire fans like and serve them appropriate content. With opinion graph data, of the type we collect at Ranker, you might be able to differentiate between a person who thinks that Slumdog Millionaire is simply a great movie and someone who thinks the soundtrack was one of the best ever. If you liked the movie, we would predict that you would also like Fight Club. But if you liked the soundtrack, you might instead be interested in other music by A.R. Rahman.
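As a toy illustration (not Ranker’s implementation), the difference amounts to the edges carrying a typed opinion rather than a bare interest:

```python
# Toy example: an interest edge says only that a user engages with an
# item; an opinion edge says what, specifically, the user liked.
interest_graph = {"user_1": {"Slumdog Millionaire"}}

opinion_graph = {  # hypothetical (item, aspect, vote) edges
    "user_1": [("Slumdog Millionaire", "film", +1)],
    "user_2": [("Slumdog Millionaire", "soundtrack", +1)],
}

def recommend(user):
    recs = []
    for item, aspect, vote in opinion_graph.get(user, []):
        if vote > 0 and aspect == "film":
            recs.append("Fight Club")           # liked the movie itself
        elif vote > 0 and aspect == "soundtrack":
            recs.append("A.R. Rahman albums")   # liked the score
    return recs

print(recommend("user_1"))  # ['Fight Club']
print(recommend("user_2"))  # ['A.R. Rahman albums']
```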

Simply put, the opinion graph can predict more about people than the interest graph can.

– Ravi Iyer


The Best Possible Answers To Opinion-Based Questions

Ranker, as an open-ended platform for ranking people/places/things, is a lot of different (awesome) things to different people. But the overarching goal for Ranker has always been to provide the best possible answer to opinion-based questions like “What are the best _____?”

Popular sports and entertainment vote lists often grow into a great answer within 12–72 hours, as they get lots of traffic quickly, but the majority of Ranker lists take 1–3 months to build to full credibility as visitors on Ranker and from search engines find them and shape them with votes and re-ranks.

I thought it would be fun to showcase some Ultimate Lists and Vote Lists in other categories that haven’t gone viral, but that, through the participation of lots of Rankers over a few months, have indeed become “the best possible answer” to this question.

Food

You all clearly love to weigh in on the start of the day, and the 5 o’clock hour:

Best Breakfast Cereals

The Best Cocktails

But you also have strong opinions on hydration during the day:

Best Sodas (and, for the more calorie-conscious among you, The Best Diet Sodas)

And even specific Gatorade flavors (thanks for the list, Lucas)

Snacking, whether on a particular type of cheese, a candy bar, or something as granular as a specific Jelly Belly flavor (thanks for the list, Samantha, but what’s with all the chocolate pudding haters?)

Dining out, specifically at Italian chain restaurants

A list I am not authorized to vote on, pregnancy cravings

And hundreds more, including perhaps a new category entirely – food nostalgia (I do miss those Crispy M&Ms myself)

Fashion/Beauty

Not categories that I personally check up on much, so I was psyched to see quite a few solid rankings here, some of them high-end but mostly stuff you can find at the mall:

Best women’s shoe brands

Best denim brands

Top handbag designers

Fashion Blogs

Sulfate-free shampoos

And even a men’s facial moisturizers list (have only tried 3 or 4 myself, but agree with their relative positions on the list)

Travel

Rankers, I know from a number of you that as we’ve been adding datasets of “rank-able objects” over the last year, one of the most-requested ones that we don’t yet have is hotels/resorts. Trust me, it’s still on the list. But in the meantime, it’s been heartening to see how many of you have participated in these great resources for travel destinations and attractions, like these:

Best US cities for vacations

Honeymoon destinations

Coolest cities in America

Theme parks for roller coaster addicts

And my personal faves, “bucket lists” of the world’s most beautiful natural wonders and historical landmarks.

Great stuff – these lists and thousands more like them are true testaments to the “wisdom of crowds.” Thanks, crowds!



From Ranker Labs: A Deeper Look at the Worst Movies List

Perhaps you didn’t know Ranker had a whole large laboratory full of scientists in neatly pressed white coats doing crazy, some might even say Willy Wonka-esque experiments. We try to keep that sort of thing fairly under wraps. The government’s been sort of cracking down on evil science ever since that Freeze Ray incident a few years back… you know the one I mean…

A rare glimpse behind the curtain at how Ranker lists are made. Photo by RDECOM.

Anyway, recently, our list technicians have been playing around with CrowdRanked lists. We get a lot of Ranker users giving us their opinion on these lists.

(Ranker’s CrowdRankings invite our community members to all gather together and make lists about one topic. Then everyone else can come in and vote on what they think. When it’s all been going on for a while, and a bunch of people have participated, you get a list that’s a fairly definitive guide to that topic.)

One list that has interested us in particular is this one: The Worst Movies of All Time. Almost 70 people have contributed their own lists of the worst films ever, and thousands of other members of the Ranker community have voted.

And what do we learn from this list? Everyone really, really, really hates “Gigli.” I mean, hates it. That movie is no good at all.


Ben Affleck does his impression of everyone watching more than 5 minutes of ‘Gigli.’

It comes in at #2 right now, with almost 700 votes upholding its general crapitude. The only movie topping it in votes right now is Mariah Carey’s vanity project, “Glitter,” which, to be fair, barely qualifies as “a movie.”

But our scientists – because they are seriously all about science – thought, there must be something more we can do with this data now that we’ve collected it. And wouldn’t you know, they came up with something. They call it “Factor Analysis.” I call it “The thing on my desk I’m supposed to write about after I have a few more cups of coffee.”

So What Is Factor Analysis Anyway?

Here’s how the technicians explained it to me…

We’re going to perform a statistical analysis of the votes we collected on the “Worst Movies Ever” list. (Just the votes, not the lists people made nominating movies.)  To do this, we’re going to break up the list of movies into groups based on similarities in people’s voting patterns. (That is, if a lot of people voted for both “Twilight” and “From Justin to Kelly,” we might group them together. If a lot of those same people voted against “Catwoman,” we’d put that in a separate group.)
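In code, the setup looks roughly like this – a sketch, assuming the votes have been exported as (user, movie, vote) records with +1 for an upvote and -1 for a downvote. The sample rows are illustrative; the real matrix covers the full 70-movie list and thousands of voters.

```python
# Sketch of the input to the analysis: a user-by-movie matrix of votes,
# with 0 where a user didn't vote.
import pandas as pd

votes = [("u1", "Twilight", 1), ("u1", "From Justin to Kelly", 1),
         ("u1", "Catwoman", -1), ("u2", "Gigli", 1)]  # illustrative rows
df = pd.DataFrame(votes, columns=["user", "movie", "vote"])
matrix = df.pivot_table(index="user", columns="movie",
                        values="vote", fill_value=0)

# Movies whose columns of votes rise and fall together are candidates
# for the same group.
corr = matrix.corr()
```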

Sometimes, you’ll be able to look at the grouping and the common thread between those choices will be obvious. Of course the same people hated “Lady in the Water” and “The Last Airbender.” They can’t stand M. Night Shyamalan (or, perhaps more accurately, they can’t stand what he has become). Not exactly a shocking twist there.

The Airbender gains his abilities by harnessing the power of constant downvotes.

But other times, the groupings will not be quite as obvious, and that’s where the analysis can get more intriguing. Once we collect enough data, we’ll be able to make all kinds of weird connections between movies, and maybe figure out a more Unified Theory of Bad Movies than currently exists! (Hey, a blogger can dream…)

When doing this kind of factor analysis, you must first determine the number of groups that exist in your data. We used something called Cattell’s scree test to determine the number of groups. (This is fancy-talk for saying: “We plot everything on a graph like the one below and look for the elbow – the point where the dropoff between factors is steepest.”)

The “eigenvalue” that you see along the y-axis is a measure of the importance of each factor. It helps us differentiate significant factors (the “signal”) from insignificant ones (the “noise”).
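A quick way to reproduce that scree plot, continuing the sketch above:

```python
# Eigenvalues of the movie-by-movie correlation matrix, largest first;
# the "elbow" in this plot suggests how many factors to keep.
import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.linalg.eigvalsh(corr.values)[::-1]  # sorted descending
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```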

Once we decide how many factors we have, it’s time to actually extract the factors, whereby we determine which movies load on which factors. It sounds precise and mathematical, but some amount of subjectivity still comes into play. For example, let’s say you were talking about your favorite foods. (Yes, yes, we all love “bacon,” but be serious.)

One way to group them would be on a spectrum from spicy to bland foods. But you could also choose to go from very exotic foods to more ordinary, everyday ones. Or starting with healthy foods and moving into junk food. Each view would be a legitimate way to classify food, so a decision must be made on some level about how to “rotate” the factor solution.

In our case, we chose what’s called the “varimax rotation,” which maximizes the independence of each factor and tries to prevent a ton of overlap. This allows us to break up the movies into interesting sub-groups, rather than just having one big list of “bad” films (which is where we started out).
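In scikit-learn terms (a sketch; we don’t know exactly which package our technicians used), extraction plus a varimax rotation looks like this, continuing from the vote matrix built earlier:

```python
# Factor extraction with a varimax rotation (scikit-learn >= 0.24
# supports rotation="varimax"). Assumes the full user-by-movie vote
# matrix; 5 factors matches the number suggested by the scree test.
from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=5, rotation="varimax")
fa.fit(matrix.values)

# loadings[i, j] says how strongly movie i loads on factor j.
loadings = fa.components_.T
```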

Doing that yields the chart below.

Along the top, you can see the factors that were extracted. The higher the number a film gets for a certain component, the more closely aligned it is with that component. Using these charts, we can then place movies in “Factors,” or categories, with relative ease.
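Continuing the sketch, placing each movie in the factor where it loads most strongly is just a bit of bookkeeping:

```python
# Assign each movie to its highest-loading factor (by absolute value).
import numpy as np

best_factor = np.abs(loadings).argmax(axis=1)
groups = {}
for movie, factor in zip(matrix.columns, best_factor):
    groups.setdefault(int(factor) + 1, []).append(movie)
# groups[2] would hold the Factor 2 movies, e.g. "Gigli" and "Glitter".
```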

Unfortunately, the program can only get us this far – we can see the factors, but we can’t tell why certain items apply to certain factors and not others.

So What Can Factor Analysis Tell Us About the Worst Movies?

First, our lab rats managed to split the entire Worst Movies List (containing 70 total films) into 5 different categories.

Category 1 (we called it “Factor 1”) contained the most movies overall, so whatever the common thread was, we knew that it must be something that people immediately identified with “bad movies.” Some of the titles that most closely correlated with Factor 1 were:

– “Monster a Go-Go”
– “Manos: The Hands of Fate”
– “Crossover”
– “The Final Sacrifice”
– “Zombie Nation”

Check out the full group here on our Complete List of Factor 1 Bad Movies.

We decided that “Classic B-Movie Horror” was the best way to describe this grouping. Of the group, 1965’s “Monster a Go-Go” was the most representative item, and it didn’t really overlap with any of the other groups. The film is a fairly standard horror/sci-fi matinee of the time. An astronaut crashes back to Earth having suffered radiation poisoning, and then goes on a rampage.

So when most Rankers think about what makes a movie “bad,” they tend to think of older, low budget movies that fail at being scary, and maybe have a sci-fi element as well.

Factor 2 was a bit harder to pin down. Lots more movies seemed to fall into or overlap with this category, but it was a bit tricky to pinpoint what they had in common. Representative Factor 2 movies included:

– “Glitter”
– “SuperBabies: Baby Geniuses 2”
– “From Justin to Kelly”
– “Catwoman”

and the most representative of all for Factor 2 was “Gigli.” (See all the movies relating to Factor 2 here.)

We settled on “Cheesiness” as a good common thread for these movies. (Especially if you continue on down the list: “Battlefield Earth,” “The Room,” “Batman and Robin,” “Superman IV: The Quest for Peace”…yeesh…)

Note here that “Gigli” was the film that most closely correlated to Factor 2 (what we have deemed “cheesy movies”), and “Glitter” was also considered highly cheesy. Yet “Glitter” is the overall most popular “Worst Movie” on the list, when going by straight votes. This seems to indicate that “Gigli” was hated SOLELY because it is cheesy, while “Glitter” commits numerous cinematic crimes, including cheesiness.

Factor 3 had even fewer films that closely correlated, but it was very simple to figure out what they all had in common. Consider the movies that were most representative of Factor 3:

– “The English Patient”
– “The Family Stone”
– “Far and Away”
– “Legends of the Fall”
– “The Fountain”
– “Eyes Wide Shut” (oh come on are you guys kidding it’s freaking Kubrick!)
– “What Dreams May Come”

Check out the full group here on our Complete List of Factor 3 Bad Movies.

Let’s call this the “Self-Important Pretension” group. People who hate movies that are self-consciously “artsy” and “important” REALLY hate those movies, and will pretty much always pick them over other bad movies from other genres. These folks are just outnumbered by the people who think it’s worse to be old-fashioned or cheesy than pompous. (At least, people ON RANKER.)

Factors 4 and 5 are sort of interesting. It’s definitely harder to make a clear-cut distinction between these two groups when you’re just looking at the films. We know they are distinct, because of the voting patterns that created them. But consider the actual movies:

– “Star Wars: Episode I: The Phantom Menace”
– “Transformers: Revenge of the Fallen”
– “Indiana Jones and the Kingdom of the Crystal Skull”
– “Spider-Man”
– “Godzilla” (the 1998 Matthew Broderick version)
– “Star Wars: Episode II: Attack of the Clones”
– “Pearl Harbor”

(Here’s the complete Factor 4 list.)

Biggest disappointments? That was our first thought. But then check out Factor 5:
 
– “Forrest Gump”
– “Indiana Jones and the Temple of Doom”
– “Million Dollar Baby”
– “Avatar”
– “Quantum of Solace”

(All the Factor 5 movies are listed here.)

Certainly, if you didn’t like Best Picture winners “Forrest Gump” and “Million Dollar Baby,” or the record-breaking blockbuster “Avatar,” you considered them disappointments? “Quantum of Solace” was the lukewarm follow-up to “Casino Royale,” one of the best Bond films of all time. And “Temple of Doom” is the sequel to arguably the best adventure movie ever made, “Raiders of the Lost Ark.”

So how come the movies in Factor 4 closely correlated with one another, and the movies in Factor 5 closely correlated with one another, if they’re BOTH groups of disappointing films? Maybe they disappointed different people, or they disappointed people in different ways?

One theory: Factor 4 films are entries in above-average franchises that are considered not as good as the franchise’s other films. (This doesn’t quite apply to “Pearl Harbor,” unless you consider Michael Bay movies to be a franchise. As I do.) The people who agreed on voting for these films felt that the worst thing a movie can do is disappoint fans of other, similar movies.


For example, movies starring Ben Affleck…

This would make Factor 5 the “overhyped” category. Everyone’s “supposed” to love “Million Dollar Baby” and “Avatar” and “Forrest Gump.” And the people who don’t like them feel a curmudgeonly sense of kinship around some of these titles. (One would expect “The English Patient,” then, to fall into this factor. Unfortunately for our theory, it’s most closely aligned with Factor 3, the “Self-Important Pretension” category.)

More theories as to the strange circumstances of Factor 4 and 5 are certainly welcome. We just thought it was kind of an intriguing puzzle.

There were 3 movies that seemed to coalesce into a “Factor 6,” but we didn’t have enough data, and not enough films correlated, to create a true category in any meaningful sense. So it may forever elude us what “Waterworld,” “The Postman” and “Road House” have in common. Aside from kicking ass, amiright? R-r-right?

Movies That Scored High in Multiple Factors

Some movies didn’t closely align with any single group, but nonetheless scored high on numerous different factors. For example, “Masters of the Universe,” the ill-fated live-action ’80s adaptation of the He-Man line of toys. “Masters of the Universe” was somewhat aligned with Factor 1 – the classic B-movie horror group – as well as Factor 3 – the self-important pretension group. Now that is just weird. I mean, yes, He-Man is kind of a blowhard, with all that “I Have the Power!” stuff. But I don’t really think of it as terribly similar to “The English Patient” when all is said and done.

Also, consider “Lady in the Water.” It aligns fairly closely with Factors 1, 2 AND 3, and even makes a showing in Factor 4. This is a movie upon which haters of every kind of movie can agree.

A Look at Things to Come

So, that’s how we’ve gotten started with using Factor Analysis on some of our CrowdRanked lists. Isn’t it very very very interesting, such that you’d like to tell all of your friends about what you’ve just read? If only there were some kind of digital environment where people could socially interact and share hypertextual links to information that they enjoy with their friends…

Be sure to check out the next edition of Ranker Labs, coming in a few weeks, when we’ll apply some Factor Analysis to ANOTHER one of our big CrowdRanked lists – History’s Worst People.
