Tuesday, February 23, 2010

The Audrey Fino Failure

Steven Levy has a new article out on Google's search algorithm (HT Boing Boing). It has a brief discussion of the problem of parsing n-grams (e.g., how do you know what Times goes with in "New York Times" vs. "New York Times Square"?). I spent a brief time working with a person-name parsing group, and they were just branching out into business-name parsing while I was there, so I know how challenging this is (you noted how I just helped you with italics, right, hehe). Unfortunately, Levy's article is actually quite a lightweight puff piece of the "gee whiz, ain't Google swell" variety. Anyone who has spent some time in a morphology class or a computational linguistics 101 course will likely find it simplistic at best.
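
To make the "what does Times go with" problem concrete, here is a toy Python sketch. It is only an illustration of the ambiguity, not a description of how Google (or any real parser) does it; the phrase list, the scores, and the function names are all made up for the example. A greedy longest-match pass grabs "new york times" and strands "square", while scoring every possible split can prefer "new york" plus "times square".

```python
# Toy phrase lexicon with invented scores (assumptions, not real data).
PHRASES = {
    ("new", "york"): 3.0,
    ("new", "york", "times"): 4.0,
    ("times", "square"): 3.5,
}
UNIGRAM_SCORE = 0.5  # hypothetical score for any single leftover word


def greedy_segment(tokens):
    """Scan left to right, always taking the longest known phrase."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):
            chunk = tuple(tokens[i:i + n])
            # A single word is always accepted as a fallback.
            if n == 1 or chunk in PHRASES:
                out.append(" ".join(chunk))
                i += n
                break
    return out


def best_segment(tokens):
    """Dynamic programming: return the split with the highest total score."""
    best = [(0.0, [])]  # best[k] = (score, segments) for tokens[:k]
    for end in range(1, len(tokens) + 1):
        candidates = []
        for start in range(end):
            chunk = tuple(tokens[start:end])
            if len(chunk) == 1:
                score = UNIGRAM_SCORE
            elif chunk in PHRASES:
                score = PHRASES[chunk]
            else:
                continue  # unknown multi-word chunk, not a valid segment
            prev_score, prev_segs = best[start]
            candidates.append((prev_score + score, prev_segs + [" ".join(chunk)]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


if __name__ == "__main__":
    query = "new york times square".split()
    print(greedy_segment(query))
    print(best_segment(query))
```

With these invented scores the two strategies disagree on "new york times square" but agree on plain "new york times", which is the whole difficulty: the right grouping depends on what comes next.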

