How we will beat Netflix
The Netflix Challenge is a $1 million prize for an improved movie recommendation algorithm that does better than the Netflix algorithm at predicting how a user will rate a movie.
Different teams, including some at Stanford, have adopted various approaches to the problem, including very sophisticated algorithms. But a few teams have taken a different approach: rather than limiting themselves to the Netflix data set, they have brought in additional useful data sets. And no surprise - when you think outside the box the results are often much better (but unfortunately you won’t win the prize by bending the rules).
The bigger point is that adding more independent data usually beats designing algorithms that are only incrementally better.
Another example of this comes from Google. Many people mistakenly believe the success of Google is predicated on its brilliant algorithms, namely PageRank. The truth is that the innovation lies in the dual recognition that hyperlinks were an important measure of popularity and that anchor text in the web index should weight the page title. Previous search engines had used only the text of the web pages themselves, so the addition of these two data sets rocketed Google’s search up the leaderboard.
Another Google example is from AdWords - the keyword auction model. Overture popularized advertisers bidding on keywords, but Google significantly improved the results by adding one more piece of data: the click-through rate (CTR) on each ad. This change made Google’s ad marketplace much more efficient than Overture’s. Again, the point is that the algorithm itself isn’t the key component, but rather the addition of new data.
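To make the AdWords point concrete, here is a minimal sketch of how one extra column of data can reorder an auction. It assumes a simplified model in which ads are ranked by bid multiplied by CTR (roughly, expected revenue per impression); the real system is more involved. The `Ad` class, the function names, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    bid: float  # maximum price per click in dollars (illustrative value)
    ctr: float  # historical click-through rate: clicks / impressions (illustrative value)


def rank_by_bid(ads):
    """Bid-only ranking: the highest bidder gets the top slot."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_by_bid_and_ctr(ads):
    """Assumed simplified model: rank by bid * CTR, i.e. expected revenue per impression."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


ADS = [
    Ad("A", bid=2.00, ctr=0.01),  # high bid, rarely clicked
    Ad("B", bid=1.00, ctr=0.05),  # lower bid, clicked five times as often
    Ad("C", bid=0.50, ctr=0.03),
]

if __name__ == "__main__":
    print([ad.name for ad in rank_by_bid(ADS)])          # ['A', 'B', 'C']
    print([ad.name for ad in rank_by_bid_and_ctr(ADS)])  # ['B', 'A', 'C']
```

The sorting logic is identical in both functions. The only thing that changes is the data fed into the sort key, and that alone is enough to move ad B, the one users actually click on, into the top slot.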
So we aren’t working on incrementally improving the algorithms around personalized search and recommendations for movies - we are working on adding more data, the right data, to fundamentally change the quality of those recommendations.
The obvious question remains: can’t Netflix do that also? They could, but they would have to radically change the information they collect from each user, on each user’s social circle, and on each movie, and to top it all off develop the right algorithms. Still sound easy?
November 12th, 2008 at 7:18 pm