Friday, December 17, 2010

ngram or n-gram?

The hottest story of the day is clearly Google's Ngram Viewer. It's all over blogs, Twitter, and even the MSM. But why did Google call it the Ngram Viewer and not the N-gram Viewer?

The hyphenated form is more common in the NLP industry and in general search results (by a 10-1 margin at that). Nunberg's LL post and Languagehat's post both prefer n-gram when speaking about the tokens themselves and only use Ngram when referencing Google's named product. Even Google's own people used n-gram in a blog post here.

You gotta wonder what kind of branding process Google went through to decide on ngram (they are notoriously conscious about that kind of thing). The popularity of this story also demonstrates how much more media-savvy Google is: Microsoft has almost exactly the same tool, but no one knows about it. See here. The difference is that Microsoft didn't link its tool to studying culture and history or give us a nifty online tool to play with, which makes it sound duller than it otherwise might have.

Also, note Microsoft uses N-gram ... frikkin Microsoft.

