Search engines
Search engines are difficult creatures. (In case you didn’t know, Steven tells me I’m “working” on the search engine for the site.) With minimal work, he’d like the algorithmic engine and database to intuit what we meant — even if it isn’t what we asked. The engine, however, lacks a critical feature that disambiguates statements. Consider the queries “batman” and “val kilmer.” What should a search engine for movies return? Should “batman” return the Val Kilmer actor entry? Should “val kilmer” return the Batman movie entry? Maybe it should just return the phrase “worst batman ever.”
What do you expect? I would expect a search for “batman” to return a list of the Batman movies. Likewise, I would expect a search for “val kilmer” to return his actor page as one of a very small set of results. These results are comparable to what you’d get by going to your movie friend and asking about Batman and Val Kilmer.
I like using the word “Batman” to test our search engine. One day, it surprised me and pulled up an actor. No, it wasn’t Val Kilmer, or even Michael Keaton. It was an actor named Batman. I was ready to scream! This was probably around 1 a.m., and I assumed my initial database dump had some sort of irregularity that I was picking up in the parsing. So I looked through the raw files. The files showed that Batman really was there. So who is this guy?
IMDb comes to my rescue: http://www.imdb.com/name/nm2533507/
Batman indeed appears in a movie. I have never heard of this movie, but Batman stars in it. What happened here? The engine lacks cultural knowledge! We know of Batman because of the movie (or, more likely, the comic), not because of the actor. So should the engine bias the results towards the movie? Such a bias raises troubling questions. As someone who works with data, I think the actor Batman is the more interesting result, but that might not agree with what people expect. Therefore, the engine will end up biased towards what most people expect. It’d be better if the search engine really was a baby that we could train just like a person; that way, it’d have the cultural knowledge to answer queries appropriately. (Of course, treating it like a baby raises ethical issues with telling a baby about certain unmentionable movies.)
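To make the bias idea concrete, here is a rough sketch in Python. It is not our actual engine: the Entry class, the popularity numbers, and the scoring formula are all made up for illustration. The idea is simply that a text-match score gets multiplied by a "how much do people care about this" factor, and the bias knob controls how hard that factor pushes.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    kind: str          # "movie" or "actor"
    popularity: float  # hypothetical score in [0, 1]; invented for this sketch


def text_match(query, entry):
    """Return 1.0 for an exact title match, 0.5 for a substring match, else 0.0."""
    q = query.lower()
    t = entry.title.lower()
    if q == t:
        return 1.0
    if q in t:
        return 0.5
    return 0.0


def search(query, entries, bias=1.0):
    """Rank matching entries; bias=0.0 ignores popularity entirely."""
    scored = []
    for entry in entries:
        match = text_match(query, entry)
        if match > 0:
            # Popularity scales the text score; a larger bias favors the famous.
            scored.append((match * (1 + bias * entry.popularity), entry))
    # Python's sort is stable, so ties keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


# A toy catalog with invented popularity values.
CATALOG = [
    Entry("Batman", "actor", 0.01),
    Entry("Batman", "movie", 0.9),
    Entry("Batman Forever", "movie", 0.6),
    Entry("Val Kilmer", "actor", 0.7),
]
```

With bias=1.0, search("batman", CATALOG) puts the Batman movie ahead of the actor named Batman. With bias=0.0, the two tie on text alone and the actor comes first, simply because he sits first in the catalog. Where the popularity numbers would come from is exactly the troubling part.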
October 16th, 2007 at 10:29 am
Oh come on, search can’t be that hard…