by   Ranker
Staff
in Data Science, prediction, Rankings

World Cup 2014 Predictions

An octopus called Paul was one of the media stars of the 2010 soccer World Cup. Paul correctly predicted 11 out of 13 matches, including the final in which Spain defeated the Netherlands. The 2014 World Cup is in Brazil and, in an attempt to avoid eating mussels painted with national flags, we made predictions by analyzing data from Ranker’s “Who Will Win The 2014 World Cup?” list.

Ranker lists provide two sources of information, and we used both to make our predictions. One source is the original ranking and the re-ranks provided by other users. For the World Cup list, some users were very thorough, ranking all (or nearly all) of the 32 teams who qualified for the World Cup. Other users were more selective, listing just the teams they thought would finish in the top places. An interesting question for data analysis is how much weight should be given to different rankings, depending on how complete they are.

The second source of information on Ranker is the thumbs-up and thumbs-down votes other users make in response to the master list of rankings. Often Ranker lists have many more votes than they have re-ranks, and so the voting data are potentially very valuable. So, another interesting question for data analysis is how the voting information should be combined with the ranking information.

A special feature of making World Cup predictions is that there is very useful information provided by the structure of the competition itself. The 32 teams have been drawn in 8 brackets with 4 teams each. Within a bracket, every team plays every other team once in initial group play. The top two teams from each bracket then advance to a series of elimination games. This system places strong constraints on possible outcomes, which a good prediction should follow. For example, although Group B contains Spain, the Netherlands, and Chile (all strong teams, currently ranked in the top 16 in the world according to FIFA rankings), only two can progress from group play and finish in the top 16 for the World Cup.
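To make the bracket constraint concrete, here is a minimal Python sketch of a single group. The strength numbers are made up purely for illustration (they are not outputs of our model), and the helper names `group_matches` and `advancing` are hypothetical.

```python
from itertools import combinations

# Group B of the 2014 draw. The strength values are invented for
# illustration only; they are NOT the inferred strengths from our model.
group_b = {"Spain": 0.9, "Netherlands": 0.8, "Chile": 0.7, "Australia": 0.4}


def group_matches(teams):
    """Round-robin group play: every team plays every other team once."""
    return list(combinations(sorted(teams), 2))


def advancing(strengths, n=2):
    """Return the n strongest teams, the only ones that leave the group."""
    return sorted(strengths, key=strengths.get, reverse=True)[:n]


matches = group_matches(group_b)
qualifiers = advancing(group_b)

print(len(matches), "matches in the group")
print("Advancing:", qualifiers)
print("Group matches across all 8 brackets:", 8 * len(matches))
```

However strong the third team is in absolute terms, it cannot appear in the top 16, which is the kind of constraint a prediction has to respect.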

We developed a model that accounts for all three of these sources of information. It uses the ranking and re-ranking data, the voting data, and the constraints coming from the brackets, to make an overall prediction. The results of this analysis are shown in the figure. The left panel shows the thumbs-up (to the right, lighter) and thumbs-down (to the left, darker) votes for each team. The middle panel summarizes the ranking data, with the area of the circles corresponding to how often each team was ranked in each position. The right hand panel shows the inferred “strength” of each team on which we based our predicted order.

Our overall prediction has host-nation Brazil winning. But the distribution of strengths shown in the model inferences panel suggests it is possible Germany, Argentina, or Spain could win. There is little to separate the remainder of the top 16, with any country from the Netherlands to Algeria capable of doing well in the finals. The impact of the drawn brackets on our predictions is clear, with a raft of strong countries (England, the USA, Uruguay, and Chile) predicted to miss the elimination rounds because they have been drawn in difficult brackets.

– Michael Lee

by   Ranker
Staff
in Popular Lists

Crazy Things Girls Do, Presidents Gone Wild + Sexy Co-Star Couples

Happy spring! Here are the best lists you crazy kids have been upvoting this month:

Which Ex-Presidents Would You Want to Go on a Bender With?
Did you know that some of the former presidents of the U.S. were huge drinkers and recreational drug users? Huge! Read about the scandalous habits of these former POTUSes (or would that be POTI?) and then vote for the ex-pres that you’d most want to get down with.

The Craziest Things Girls Will Do to Make You Like Them
Quit playing games with my heart! Really. No, not really. We all can get a little crazy when it comes to romance. Whether they’re just flirting or looking to put a ring on it, some ladies will do some pretty insane things to get noticed.

25 Celebrities Who Lost a Ton of Weight (Before and After Photos)
These stars transformed themselves from fat to all that! From flabby to fabby. From obese to bitch, please! Okay, I think we’re done here.

 

Who Did These Eventually Famous Kids Grow Up To Be?
When kids are young, they have the potential to be anything: astronauts, politicians, police officers… or they could go the way of Darth Vader and cross over to the Dark Side. From these vintage childhood photos, can you guess who turned into whom?

The Biggest Turn Ons in a Person
Wondering how to be charming and attractive to the opposite sex? This is what men – and women – truly want.

25 Movie Couples Who Got Together in Real Life
After pretending to be an item on the silver screen, these sexy celebrities rode that wave of attraction all the way to Couplesville.

 

Top 10 Most Ironic Deaths of All Time (Vol. 2)
These incidents are far more serious than having “too many spoons when all you need is a knife.” Folks died here, people. Bonus: If you really love yourself some ironic deaths, check out Volume 1: The Top Ironic Deaths of All Time. You weirdo.

20 Celebrities Who Have Had Hair Transplants
We aren’t 100% certain how these balding men managed to halt the cruel hands of time… but we’re guessing that science had something to do with it, because most human beings don’t lose hair and then miraculously get it back.

by   Ranker
Staff
in interest graph, Opinion Graph, semantic search

Lists are the Best way to get Opinion Graph Data: Comparing Ranker to State & Squerb

I was recently forwarded an article about Squerb, which shares an opinion we have long agreed with.  Specifically…

“Most sites rely on simple heuristics like thumbs-up, ‘like’ or 1-5 stars,” stated Squerb founder and CEO Chris Biscoe. He added that while those tools offer a quick overview of opinion, they don’t offer much in the way of meaningful data.

It reminds me a bit of State, another company building an opinion graph that connects more specific opinions to specific objects in the world.  They too are built upon the idea that existing sources of big data opinions, e.g. mining tweets and facebook likes, have inherent limitations.  From this Wired UK article:

Doesn’t Twitter already provide a pretty good ‘opinion network’? Alex thinks not. “The opinions out there in the world today represent a very thin slice. Most people are not motivated to express their opinion and the opinions out there for the most part are very chaotic and siloed. 98 percent of people never get heard,” he told Wired.co.uk.

I think more and more people who try to parse Facebook and Twitter data for deeper Netflix AltGenre-like opinions will realize the limitations of such data, and attempt to collect better opinion data.  In the end, I think collecting better opinion data will inevitably involve the list format that Ranker specializes in.  Lists have a few important advantages over the methods that Squerb and State are using, which include slick interfaces for tagging semantic objects with adjectives.  The advantages of lists include:

  • Lists are popular and easily digestible.  There is a reason why every article on Cracked is a list.  Lists appeal to the masses, which is precisely the audience that Alex Asseily is trying to reach on State.  To collect mass opinions, one needs a site that appeals to the masses, which is why Ranker has focused on growth as a consumer destination site, one that currently collects millions of opinions.
  • Lists provide the context of other items.  It’s one thing to think that Army of Darkness is a good movie.  But how does it compare to other Zombie Movies?  Without context, it’s hard to compare people’s opinions as we all have different thresholds for different adjectives.  The presence of other items lets people consider alternatives they may not have considered in a vacuum and allows better interpretation of non-response.
  • Lists provide limits to what is being considered.  For example, consider the question of whether Tom Cruise is a good actor.  Is he one of the Best Actors of All-time?  One of the Best Action Stars?  One of the Best Actors Working Today?  Ranker data shows that people’s answers usually depend on the context (e.g. Tom Cruise gets a lot of downvotes as one of the best actors of all-time, but is indeed considered one of the best action stars).
  • Lists are useful, especially in a mobile friendly world.

In short, collecting opinions using lists produces both more data and better data.  I welcome companies that seek to collect semantic opinion data as the opportunity is large and there are network effects such that each of our datasets is more valuable when other datasets with different biases are available for mashups.  As others realize the importance of opinion graphs, we likely will see more companies in this space and my guess is that many of these companies will evolve along the path that Ranker has taken, toward the list format.

– Ravi Iyer

by   Ranker
Staff
in Popular Lists

Things You Should Never Do While Naked, 90s Slang, Embarrassing Selfies + More

January is almost over. Good riddance! Have you given up on your New Year’s Resolutions yet? Trust us, you’ll feel much better once you just let go. For your enjoyment, here are the most popular lists that people have been upvoting on Ranker this month. Enjoy!

Incredible 90s Slang That We (Almost) Forgot About. Almost.

If you grew up in the ’90s, odds are that in between playing Pogs or watching reruns of “Saved by the Bell”, you were telling your mom to “talk to the hand” and that her cooking was Da Bomb…Not! Looking back, slang from the ’90s involved giving people a lot of attitude and tricking them.

The Last Words of 15 Famous Serial Killers

“I did not get my Spaghetti O’s. I got spaghetti. I want the press to know this.”

The Most Egregious Celebrity Wardrobe Malfunctions of 2013

Nip slips, wedgies and rips, oh my! Last year was an epic one for crazy celebrity wardrobe malfunctions. For your enjoyment, here are the best (read: most embarrassing).

The Most Extreme Body Transformations Ever Done for a Movie Role

Matthew McConaughey lost 47 pounds for his role in Dallas Buyers Club. Jared Leto lost nearly 40. Christian Bale packed on 43 pounds and a huge beer belly for American Hustle. Those aren’t even the most extreme cases! See the shocking before and after pictures of actors who completely changed their bodies for a movie role.

15 Reasons Why You Are the Most Annoying Person on Facebook

You instantly thought of at least one person when you saw the name of this list, didn’t you? Odds are that you have at least one special person in your life that is a major Facebook offender. Take heart, it happens to the best of us.

The Most Embarrassing Celebrity Selfies

Embarrassing selfies happen when a sexy guy or gal is trying just a little too hard to look good for the camera. They are obviously sucking it in (Chris Pratt, Justin Bieber) or showing a bit too much skin that no one wants to see (Lindsay Lohan). We would feel bad…but these celebs did post these photos to their own social media.

23 Things You Should Never Do While Naked

Similar to getting drunk and singing at the top of your lungs, being naked is fun (!!) if not always appropriate. Whether an activity involves sharp, flying objects, extreme heat or compromising positions, there are some things that you should just never do naked. Ever.

That’s it! Stay in touch and we hope you’re having a great month!

by   Ranker
Staff
in Trends

Bruno Mars Is Disliked, But Mostly by Older People

Bruno Mars With His Grammy

Bruno Mars is #98 out of the 468 worst bands of all time, as voted on by over 12.5k voters. But it turns out that older people dislike him way more than younger people.

Super Bowl XLVIII is upon us! Woop, woop! Time to prepare for head-spinning sensory overload: brawny men knocking each others’ lights out, sassy cheerleaders, flashy TV ads… it’s almost too much to handle. And what about the crown jewel of the day’s entertainment: the halftime performer, Bruno Mars?

Will Peter Gene Hernandez, aka Bruno Mars, have the charisma to command a crowd of 100,000 screaming fans, not to mention the 110 million Americans who are expected to watch the game from home?

According to our data, it’s not looking good. At least not on the surface. Bruno Mars is currently ranked #98 on our list of The Worst Bands of All Time.

As of right now, 12,627 people have voted on this list, which means that there are a whole lot of haters out there. Compound that with the fact that people always complain about the Super Bowl halftime performer, and it’s looking like Bruno Mars may not get a lot of love for his performance.



However, when we slice the data up a little, it actually looks like things may not be that bad for the incredibly short crooner.

Why’s that, you ask?

1. Not to be taken lightly: Bruno Mars Has Got Some Serious Dance Moves.

Even though he was voted as one of the worst bands of all-time, people also acknowledge that he’s got some moves, which bodes well for him as a performer—especially in a situation where he is expected to wow the crowd. He was voted #29 out of 54 on this list of the best dancing singers.

Bruno’s got moves.

2. Young people like Bruno Mars way more than this list would have you believe.  

Statistically Speaking: If we isolate the votes coming from only young people—people ages 30 and under, that is, Bruno Mars would drop all the way down to #381 on the list of worst bands of all time.

Only 5 out of 12 young people who voted on this list upvoted Bruno Mars. That’s about 42% who agree that he should be considered one of the worst bands of all time.

Compare that to say, Justin Bieber who received 16 upvotes for every 19 people who voted on him, which is close to 85%.

*Or, in Plain English if You’re Starting to Get a Headache: Being #381 on a ‘worst bands’ list is way better than being #1. Young people voting him down on this list means that a lot of them do not think that he is the worst.

3. Old people hating on Bruno Mars is making him rise in the ‘worst of’ rankings. 

What do you have to say about that, Bruno?

Let’s look at how old people (over the age of 50) feel about our buddy Bruno. If we strip out all of the young and middle-aged voters, Bruno Mars would climb up to #82 on the list of worst bands of all time. Remember, getting closer to #1 on a ‘worst’ list is not a good thing.

The ratio for this demographic is much higher. 3 out of 5 old-timers who voted on this list upvoted Bruno Mars as the worst band of all time. That’s 60% for those of you who are keeping track.
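For those of you who really are keeping track, here is a quick Python sketch of the arithmetic behind the ratios quoted above. The counts are the simplified ratios from this post rather than raw vote totals, and the labels are just ours for illustration.

```python
# (upvotes, voters) pairs as quoted in this post; these are simplified
# ratios, not raw vote totals.
votes = {
    "Bruno Mars, 30 and under": (5, 12),
    "Justin Bieber": (16, 19),
    "Bruno Mars, over 50": (3, 5),
}

# Share of voters who upvoted the act onto the 'worst bands' list.
percent = {
    label: round(100 * up / total, 1)
    for label, (up, total) in votes.items()
}

for label, share in percent.items():
    print(f"{label}: {share}% upvoted")
```

Remember that on a ‘worst of’ list a higher percentage is bad news for the artist.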

While we can crunch the numbers for preferences according to age, it must be noted that we do not have a specific reason as to why people voted this way. Why do people over 50 dislike Bruno Mars? Is it his cavalier attitude, his voluminous hair, his sexy lyrics? His widely-publicized cocaine bust?

Bruno Mars’ mug shot actually isn’t that bad.

Either way–we’d bet that Bruno would rather be pleasing young’uns than winning the hearts of old folks. They are the ones, after all, who will be more likely to pay to see him perform live (they’ll also be way more likely to pirate his music, but that’s another conversation).

So, while you are eating your triple beany cheese nachos and downing Bud heavies this weekend and one of your friends starts to complain about how much he hates Bruno Mars…you can think to yourself (or gently point out) that maybe he’s just too old to understand him.

by   Ranker
Staff
in About Ranker, Opinion Graph, Pop Culture, Rankings

Ranker’s Rankings API Now in Beta

Increasingly, people are looking for specific answers to questions as opposed to webpages that happen to match the text they type into a search engine.  For example, if you search for the capital of France or the birthdate of Leonardo Da Vinci, you get a specific answer.  However, the questions that people ask are increasingly about opinions, not facts, as people are understandably more interested in what the best movie of 2013 was, as opposed to who the producer for Star Trek: Into Darkness was.

Enter Ranker’s Rankings API, which is now in beta, as we’d love the input of potential users of our API to help improve it.  Our API returns aggregated opinions about specific movies, people, tv shows, places, etc.  As an input, we can take a Wikipedia, Freebase, or Ranker ID.  The request needs to be made to http://api.ranker.com/rankings/ with “type” (e.g. FREEBASE, WIKIPEDIA, or RANKER, depending on the type of ID sent) and “id” (the specific Wikipedia, Freebase, or Ranker ID) sent in the URL request, and our API returns JSON by default. For example, below are requests for information about Tom Cruise, using each of these IDs.

http://api.ranker.com/rankings/?id=/m/07r1h&type=FREEBASE
http://api.ranker.com/rankings/?id=2257588&type=RANKER
http://api.ranker.com/rankings/?id=31460&type=WIKIPEDIA (look for wgArticleId in the source of any wikipedia page to get a wikipedia id)
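If you would rather build these requests in code, a minimal Python sketch might look like the following. The `rankings_url` and `fetch_rankings` helpers are hypothetical names of our own choosing, not part of the API, and the sketch assumes the endpoint behaves as described above (it is a beta, so it may change or be unavailable).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.ranker.com/rankings/"
ID_TYPES = {"FREEBASE", "WIKIPEDIA", "RANKER"}


def rankings_url(item_id, id_type):
    """Build a request URL from an ID and the type of that ID."""
    id_type = id_type.upper()
    if id_type not in ID_TYPES:
        raise ValueError(f"type must be one of {sorted(ID_TYPES)}")
    # safe="/" keeps Freebase IDs such as /m/07r1h readable in the URL.
    return BASE_URL + "?" + urlencode({"id": item_id, "type": id_type}, safe="/")


def fetch_rankings(item_id, id_type):
    """Request the rankings and decode the JSON response (not called here)."""
    with urlopen(rankings_url(item_id, id_type)) as response:
        return json.load(response)


# The three Tom Cruise requests shown above.
for item_id, id_type in [("/m/07r1h", "FREEBASE"), (2257588, "RANKER"), (31460, "WIKIPEDIA")]:
    print(rankings_url(item_id, id_type))
```

Only the URL construction runs here; `fetch_rankings` is left uncalled so the sketch works without network access.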

In the response to this request, you’ll get a set of Rankings for the requested object, including a set of list names (e.g. "listName":"The Greatest 80s Teen Stars"), list urls (e.g. "listUrl":"http://www.ranker.com/crowdranked-list/45-greatest-80_s-teen-stars" – note that the domain, www.ranker.com, is implied), item names (e.g. "itemName":"Tom Cruise"), position of the item on this list (e.g. "position":21), number of items on the list (e.g. "numItemsOnList":70), the number of people who have voted on this list (e.g. "numVoters":1149), the number of positive votes for this item (e.g. "numUpVotes":245) vs. the number of negative votes (e.g. "numDownVotes":169), and the Ranker list id (e.g. "listId":584305).  Note that results are cached so they may not match the current page exactly.

Here is a snippet of the response for Tom Cruise.

[ { "itemName" : "Tom Cruise",
"listId" : 346881,
"listName" : "The Greatest Film Actors & Actresses of All Time",
"listUrl" : "http://www.ranker.com/crowdranked-list/the-greatest-film-actors-and-actresses-of-all-time",
"numDownVotes" : 306,
"numItemsOnList" : 524,
"numUpVotes" : 285,
"numVoters" : 5305,
"position" : 85
},
{ "itemName" : "Tom Cruise",
"listId" : 542455,
"listName" : "The Hottest Male Celebrities",
"listUrl" : "http://www.ranker.com/crowdranked-list/hottest-male-celebrities",
"numDownVotes" : 175,
"numItemsOnList" : 171,
"numUpVotes" : 86,
"numVoters" : 1937,
"position" : 63
},
{ "itemName" : "Tom Cruise",
"listId" : 679173,
"listName" : "The Best Actors in Film History",
"listUrl" : "http://www.ranker.com/crowdranked-list/best-actors",
"numDownVotes" : 151,
"numItemsOnList" : 272,
"numUpVotes" : 124,
"numVoters" : 1507,
"position" : 102
}

...CLIPPED....
]
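As a rough sketch of what you might do with a response like this, the Python below parses a trimmed copy of the snippet above and computes the share of upvotes on each list. The `upvote_share` helper is our own illustrative name, not part of the API, and only the fields it needs are kept.

```python
import json

# A trimmed copy of the three entries shown above.
sample = """
[
  {"itemName": "Tom Cruise", "listName": "The Greatest Film Actors & Actresses of All Time",
   "numUpVotes": 285, "numDownVotes": 306, "position": 85, "numItemsOnList": 524},
  {"itemName": "Tom Cruise", "listName": "The Hottest Male Celebrities",
   "numUpVotes": 86, "numDownVotes": 175, "position": 63, "numItemsOnList": 171},
  {"itemName": "Tom Cruise", "listName": "The Best Actors in Film History",
   "numUpVotes": 124, "numDownVotes": 151, "position": 102, "numItemsOnList": 272}
]
"""


def upvote_share(entry):
    """Fraction of this item's votes that were upvotes."""
    up, down = entry["numUpVotes"], entry["numDownVotes"]
    return up / (up + down)


rankings = json.loads(sample)
shares = {entry["listName"]: upvote_share(entry) for entry in rankings}

for entry in rankings:
    print(
        f'{entry["listName"]}: #{entry["position"]} of {entry["numItemsOnList"]}, '
        f'{shares[entry["listName"]]:.0%} upvotes'
    )
```

On all three of these lists the share comes out below one half, meaning more downvotes than upvotes.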

What can you do with this API?  Consider this page about Tom Cruise from Google’s Knowledge Graph.  It tells you his children, his spouse(s), and his movies.  But our API will tell you that he is one of the hottest male celebrities, an annoying A-List actor, an action star, a short actor, and an 80s teen star.  His name comes up in discussions of great actors, but he tends to get more downvotes than upvotes on such lists, and even shows up on lists of “overrated” actors.

We can provide this information, not just about actors, but also about politicians, books, places, movies, tv shows, bands, athletes, colleges, brands, food, beer, and more.  We will tend to have more information about entertainment related categories, for now, but as the domains of our lists grow, so too will the breadth of opinion related information available from our API.

Our API is free and no registration is required, though we would request that you provide links and attributions to the Ranker lists that provide this data.  We likely will add some free registration at some point.  There are currently no formal rate limits, though there are obviously practical limits so please contact us if you plan to use the API heavily as we may need to make changes to accommodate such usage.  Please do let me know (ravi a t ranker) your experiences with our API and any suggestions for improvements as we are definitely looking to improve upon our beta offering.

– Ravi Iyer

How Netflix’s AltGenre Movie Grammar Illustrates the Future of Search Personalization

I recently got sent this Atlantic article on how Netflix reverse engineered Hollywood by a few contacts, and it happens to mirror my long term vision for how Ranker’s data fits into the future of search personalization.  Netflix’s goal, to put “the right title in front of the right person at the right time,” is very similar to what Apple, Bing, Google, and Facebook are attempting to do with regards to personalized contextual search.  Rather than you having to type in “best kitchen gadgets for mothers”, applications like Google Now and Cue (bought by Apple) hope to eventually be able to surface this information to you in real time, knowing not only when your mother’s birthday is, but also that you tend to buy kitchen gadgets for her, and knowing what the best rated kitchen gadgets that aren’t too complex and are in your price range happen to be.  If the application was good enough, a lot of us would trust it to simply charge our credit card and send the right gift.  But obviously we are a long way from that reality.

Netflix’s altgenre movie grammar (e.g. Irreverent Werewolf Movies Of The 1960s) gives us a glimpse of the level of specificity that would be required to get us there.  Consider what you need to know to buy the right gift for your mom.  You aren’t just looking for a kitchen gadget, but one with specific attributes.  In altgenre terminology, you might be looking for “best simple, beautifully designed kitchen gadgets of 2014 that cost between $25 and $100” or “best kitchen gadgets for vegetarian technophobes”.  Google knows that simple text matching is not going to get it the level of precision necessary to provide such answers, which is why semantic search, where the precise meaning of pages is mapped, has become a strategic priority.

However, the universe of altgenre equivalents in the non-movie world is nearly endless (e.g. Netflix has thousands of ways just to classify movies), which is where Ranker comes in, as one of the world’s largest sources for collecting explicit cross-domain altgenre-like opinions.  Semantic data from sources like Wikipedia, DBpedia, and Freebase can help you put together factual altgenres like “of the 60s” or “that starred Brad Pitt”, but you need opinion ratings to put together subtler data like “guilty pleasures” or “toughest movie badasses”.  Netflix’s success is proof of the power of this level of specificity in personalizing movies.  Consider how they produced this knowledge: not by running machine learning algorithms on their endless stream of user behavior data, but rather by soliciting explicit ratings along these dimensions, paying “people to watch films and tag them with all kinds of metadata” using a “36-page training document that teaches them how to rate movies on their suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.”  Some people may think that with enough data, TripAdvisor should be able to tell you which cities are “cool”, but big data is not always better data.  Most data scientists will tell you the importance of defining the features in any recommendation task (see this article for technical detail on this), rather than assuming that a large amount of data will reveal all of the right dimensions.  The wrong level of abstraction can make prediction akin to trying to predict who will win the Super Bowl by knowing the precise position and status of every cell in every player on every NFL team.  Netflix’s system allows them to make predictions at the right level of abstraction.

The future of search needs a Netflix grammar that goes beyond movies.  It needs to be able to understand not only which movies are dark versus gritty, but also which cities are better babymoon destinations versus party cities and which rock singers are great vocalists versus great frontmen.  Ranker lists actually have a similar grammar to Netflix movies, except that we apply this grammar beyond the movie domain.  In a subsequent post, I’ll go into more detail about this, but suffice it to say for now that I’m hopeful that our data will eventually play a similar role in the personalization of non-movie content that Netflix’s microtagging plays in film recommendations.

– Ravi Iyer

 

by   Ranker
Staff
in New Features

Ranker's Sexy New Look for 2014

Oh, hey there! You may have noticed that we got a whole new look for 2014. We’ve built some cool new features and streamlined the look of our editing interface. Making lists on Ranker has never been easier.

Here’s what’s new:

  • New YES/NO switches make it a breeze to customize your list.


  • Searching and finding stuff is now 10x faster.


  • Reranking is easier than ever thanks to our improved list suggestion tool.


  • Now you can create a new list or add items to other peoples’ lists on the go. Hello, mobile! Imagine creating your grocery list on your phone, allowing your roommates to vote and then buying only the most popular items for the party you’re throwing that weekend. Wow, you sure are nice… are you looking for new roommates?

Take a look around, make a new list or two, and drop us a line if you have any feedback. We’d love to hear what you think. Oh, and happy new year!

by   Ranker
Staff
in Popular Lists

Bad Santas, Awkward Christmas Photos, and Celebs Who Have Killed

Happy (non-denominational) holidays to you all! As our gift to you, we present the very best lists that Ranker users have been upvoting this season. Enjoy!

Top Crimes Committed By Guys in Santa Suits

‘Tis the season to be jolly, greedy, lecherous and absolutely, undeniably unfit to set foot in public. In honor of the holiday season, here are thirteen crimes committed by guys in Santa suits: from bank robbers to mall flashers and child molesters. Merry Christmas.

The 20 Most Awkwardly Hilarious Family Christmas Photos

The fact that an entire family can gather together to take one photo is already a miracle, but the level of awkwardness captured on film here is absolutely brilliant.

The Best Christmas Songs Written by Jewish Songwriters

Here’s proof that a truly great songwriter can write about anything…including music celebrating a completely different faith and an imaginary elderly man who brings gifts to other people’s children.

29 Celebrities Who Have Received Organ Transplants

Usually we’d agree that it’s better to give than to receive, but we’re betting that these famous organ recipients were pretty thankful for these life-saving gifts.

33 Celebrities Who Have Killed People

It turns out that some very famous people have done some very bad things. We’re just reporting the facts, so please don’t shoot the messenger. Seriously. Please?
 

The Best Cures For Hangovers

Holiday parties = hangovers. There’s really no way around it. For the most effective remedies, you’ll want to consult this comprehensive list of foods you should eat the day after.
 

The Best Cities to Party in for New Years Eve

Partaaay! Some cities are famous for their NYE celebrations, but there are epic parties going on all over the world. We suggest you go forth and explore some of these destinations.
 

Toast The New Year With The Top New Year’s Eve Movie Scenes

According to these films, the best things to do on NYE include professing your love to a long time friend, hooking up with people you shouldn’t, and grand larceny. One of these things is not like the other…

That’s it for this year! From all of us here at Ranker, we wish you lots of holiday cheer!

by   Ranker
Staff
in Data Science, Opinion Graph, prediction, semantic search

Why Topsy/Twitter Data may never predict what matters to the rest of us

Recently Apple paid a reported $200 million for Topsy and some speculate that the reason for this purchase is to improve recommendations for products consumed using Apple devices, leveraging the data that Topsy has from Twitter.  This makes perfect sense to me, but the utility of Twitter data in predicting what people want is easy to overstate, largely because people often confuse bigger data with better data.  There are at least 2 reasons why there is a fairly hard ceiling on how much Twitter data will ever allow one to predict about what regular people want.

1.  Sampling – Twitter has a ton of data, with daily usage of around 10%.  Sample size isn’t the issue here as there is plenty of data, but rather the people who use Twitter are a very specific set of people.  Even if you correct for demographics, the psychographic of people who want to share their opinion publicly and regularly (far more people have heard of Twitter than actually use it) is way too unique to generalize to the average person, in the same way that surveys of landline users cannot be used to predict what psychographically distinct cellphone users think.

2. Domain Comprehensiveness – The opinions that people share on Twitter are biased by the medium, such that they do not represent the spectrum of things many people care about.  There are tons of opinions on entertainment, pop culture, and links that people want to promote, since they are easy to share quickly, but very little information on people’s important life goals or the qualities we admire most in a person or anything where people’s opinions are likely to be more nuanced.  Even where we have opinions in those domains, they are likely to be skewed by the 140 character limit.

Twitter (and by extension, companies that use their data like Topsy and DataSift) has a treasure trove of information, but people working on next generation recommendations and semantic search should realize that it is a small part of the overall puzzle given the above limitations.  The volume of information gives you a very precise measure of a very specific group of people’s opinions about very specific things, leaving out the vast majority of people’s opinions about the vast majority of things.  When you add in the bias introduced by analyzing 140 character natural language, there is a great deal of variance in recommendations that likely will have to be provided by other sources.

At Ranker, we have similar sampling issues, in that we collect much of our data at Ranker.com, but we are actively broadening our reach through our widget program, which now collects data on thousands of partner sites.  Our ranked list methodology certainly has bias too, which we attempt to mitigate by combining voting and ranking data.  The key is not in the volume of data, but rather in the diversity of data, which helps mitigate the bias inherent in any particular sampling/data collection method.

Similarly, people using Twitter data would do well to consider issues of data diversity and not be blinded by large numbers of users and data points.  Certainly Twitter is bound to be a part of understanding consumer opinions, but the size of the dataset alone will not guarantee that it will be a central part.  Given these issues, either Twitter will start to diversify the ways that it collects consumer sentiment data or the best semantic search algorithms will eventually use Twitter data as but one narrowly targeted input of many.

– Ravi Iyer
