Saturday, May 11, 2013

Pullum thinks there are no NLP products???

Famed linguist Geoffrey Pullum has a recent Chronicle of Higher Education post about NLP: Why Are We Still Waiting for Natural Language Processing? As a linguist, I deeply respect Geoff Pullum's reputation for fierce skepticism, but this recent post borders on the ornery old man syndrome.

First of all, Powerset didn't die when Microsoft bought them. Their technology is part of Bing search*. That's not death. Powerset technology is used by millions of people today, whereas before it was used by 3 guys in a SoMa cubicle. And to call Bing "a plain old keyword-based search engine" is a bit naïve.

Also, Pullum's claim that there are "absolutely no commercial NLP products" is flat bonkers. There are thousands of commercially viable and profitable NLP products. Just ask Clarabridge, Nuance, or BBN.

I'll grant that Pullum is somewhat correct that question answering hasn't matched the expectations it raised in the 1990s, but it's much more sophisticated than he lets on. How does Pullum not even mention Siri or the host of Android competitors? Yes, the results are hit-or-miss, but they exist.

As a [somewhat former] linguist, I don't see the fact that NLP hasn't yet managed to mirror natural language as a reason to lament. Rather, I celebrate that it exposes just how complex natural language is, and that the sheer computing power the likes of Google, Apple, and Microsoft can throw at it still ain't enough.

What I would like to see is tech companies hiring more *real* linguists. During the first NLP boom of the 90s, companies hired many linguists (my first NLP job was at an early Q and A start-up). Since the bust and the rise of statistical machine learning, tech companies have hired engineers almost exclusively (except for contract jobs annotating data). I'm seeing more and more engineers learning some linguistics and getting jobs, whereas I suspect we'd be better off the other way around.

Anyhoo, NLP is alive and well, Geoff. Geesh...

PS - I know Pullum is well aware of everything I've pointed out. He's ginning up the crowd for his series of posts about where NLP went wrong (which I'm looking forward to). But he runs the risk of leading naïve readers down a false path. There ARE people who have no clue about all the great stuff NLP has done in the last 30 years, and after reading Pullum's article, they'll think it's a fair assessment of the state of the art, when it is not.

*UPDATE (5/12/13): I may have overstated this. A little birdie tells me that "not much Powerset technology" was actually incorporated into Bing. Disappointing, but I don't think this undermines my main point that Pullum misrepresents the state of commercial Q and A tech.

2 comments:

Anonymous said...

I agree; NLP projects would do right by training real linguists to be programmers.

My first NLP job was as a plain old linguist right after finishing my Masters in linguistics. I tried to get into computational linguistics PhD programs but nobody wanted to have to guide a *regular* linguist through computational science (though a good grasp of the lambda calculus is a huge advantage for linguists wanting to learn programming languages).

So I got a job as a developer to get the programming experience I needed and now I do what I want.

Most engineers I've run into severely underestimate the effort required to build NLP tools; only an understanding of linguistics can get at the heart of the problem... as well as an appreciation of what both statistical (quantity) and "rule/logic-based" (quality) solutions offer.

Unknown said...

Some people just underestimate the use of NLP. What they miss is the fact that NLP helps build great teams, or it can even build the dream of a person. Let us not forget that NLP programs are fun too.







