Friday, May 31, 2013

Blame the linguists!

Pullum has let me down. His latest NLP lament isn’t nearly as enraging or baffling as his previous posts.

I basically agree with his points about statistical machine translation. I even agree with his overall point that contemporary NLP is mostly focused on building commercial tools, not on mimicking human language processes.

But Pullum offers no way forward. Even if you agree 100% with everything he says, re-read all four of his NLP laments (one, two, three, four) and ask yourself: What’s his solution? His plan? His proposal? His suggestion? His hint? He offers none.

I suspect one reason he offers no way forward is because he mis-analyzes the cause. He blames commercial products for distracting researchers from doing *real* NLP.

His basic complaint is that engineers haven’t built real NLP tools yet because they haven’t used real linguistics. This is like complaining that drug companies haven’t cured Alzheimer’s yet because they haven’t used real neuroscience. Uh, nope. That’s not what’s holding them back. There is a deep lack of understanding about how the brain works and that’s a hill that’s yet to be climbed. Doctors are trying to understand it, but they’re just not there yet.

He never addresses the fact that linguists have failed to provide engineers with a viable blueprint for *real* machine translation, or *real* speech recognition, or *real* Q&A. Sorry, Geoff. The main thing discouraging the development of *real* NLP is the failure of linguists, not engineers. Linguists are trying to understand language, but they’re just not there yet.

Pullum and Huddleston compiled a comprehensive grammar of the English language. Does Pullum believe that work is sufficient to construct a computational grammar of English? One that would allow for question answering of the sort he yearns for? The results would surely be peppered with at least as many howlers as Google Translate. If his own comprehensive grammar of English is insufficient for NLP, then what does he expect engineers to use to build *real* NLP?

It’s not that I don’t like the idea of rule-based NLP. I bloody love it. But Pullum acts like it doesn’t exist, when in fact, it does. Lingo Grammar is a good example. But even that project is not commercially viable.

One annoying side point worth repeating: Pullum repeatedly leads his reader towards a false conclusion: that Google is representative of NLP. Yes, Google is heavily invested in statistical machine translation, but there exist syntax-based translation tools that use tree structures, dependencies, known constructions, and yes even semantics. Pullum fails to tell his readers about this. In fact, most contemporary MT systems tend to be hybrids, combining some rule-based approaches with statistical approaches.

In Pullum's defense (sort of), I like big re-thinks (MIT tried a big AI re-think, though it's not clear what has come of it). But Pullum hasn't engaged in any big re-thinking. He makes zero proposals. Zero.

One bit of fisking I will add:
Machine translation is the unclimbed Everest of computational linguistics. It calls for syntactic and semantic analysis of the source language, mapping source-language meanings to target-language meanings, and generating acceptable output from the latter. If computational linguists could do all those things, they could hang up the “mission accomplished” banner.
How does translation work in the brain, Geoff? It’s not so clear exactly how bilinguals perform syntactic and semantic analysis of the source language, map source-language meanings to target-language meanings, and generate acceptable output. Contemporary psycholinguistics cannot state with a high degree of certainty whether or not bilinguals store words in their two languages together or separately, let alone explicate the path Geoff sketches out. Even if it is true that bilinguals translate the way Pullum suggests, it is also true that linguists cannot currently provide a viable blueprint of this process such that engineers could use it to build a *real* NLP machine translation system.

And that's what I have to say about that.

Friday, May 24, 2013

open the pod bay doors, Geoff

There’s man all over for you, blaming on his boots the faults of his feet.
― Samuel Beckett, Waiting for Godot

Geoffrey Pullum posted a third lament about the current state of NLP: Speech Recognition vs. Language Processing. Here are his first two:

One: Why Are We Still Waiting for Natural Language Processing?
Two: Keyword Search, Plus a Little Magic.

I have responded twice.

One: Pullum thinks there are no NLP products???
Two: Pullum’s NLP Lament: More Sleight of Hand Than Fact.

My Third Response
The more I read Pullum’s three laments, the more I keep asking myself, “exactly what is Pullum complaining about and who is he aiming his complaints at?”

As far as I can tell, Pullum is complaining that commercial forces have lured researchers away from creating his dream of a human-mimicking android like 2001’s Hal 9000 or Star Trek’s Data.

This is like saying we’re still “waiting for NASA” because they failed to give us moon houses and jet packs! Is Pullum similarly unmoved by the Mars Curiosity rover? C’mon Geoff, it’s got a frikkin laser on its head!

He’s aiming his complaints at people who know nothing about linguistics or NLP (an easy audience to convince with straw men and misrepresentation).
...we are still waiting for natural language processing (NLP).
Who? Who is still waiting? I’m not waiting. I’m jumping head first into the ocean of NLP tools available right now. Who’s waiting?
...some companies run systems that enable you to hold a conversation with a machine. But that doesn’t involve NLP, i.e. syntactic and semantic analysis of sentences.
This is a rhetorical sleight of hand because he is about to stack the deck and compare one petite tool to his grand Platonic ideal. Pullum continues to utilize a straw-man definition of NLP that 99% of people who use the term do NOT agree with. He is wrong to insinuate that contemporary NLP cannot perform “syntactic and semantic analysis of sentences.” Of course it can. At the very least, it exists in the form of POS taggers, chunkers, semantic role labelers, dependency parsers, etc. The fact that most VUI tools do not employ these extra processing components is mostly a function of optimization, not ontological failure. He dismisses this as merely "dialog design", but it's what gets products working for real consumers in the here and now. Pullum also unreasonably demotes phonetics as if it is not part of linguistics. There are many NLP tools related to speech recognition, which is where his third post goes. His punching bag for this argument is Automatic Speech Recognition.

By doing this, he creates a new straw man. What he actually describes is closer to what industry calls Voice User Interface (VUI). The distinction is non-trivial because VUI is a limited special case of ASR, not the whole kit and caboodle. Yes, there are VUI systems which are designed to nudge users to provide responses within a limited predictable range, but there are also far more sophisticated ASR systems (like Nuance’s Dragon). These systems can produce text transcripts of voice that can then easily be ingested into any number of syntactic and semantic NLP processing tools. Ignoring them is journalistic malfeasance. Pretending they don’t exist is bonkers.
Labeling noise bursts is the goal [of VUIs], not linguistically based understanding.
This is true, but it’s not the whole picture. It’s true that VUIs are primarily trying to categorize noise bursts, but that’s the first step in the human language comprehension system too. It’s true that humans use some top-down context for predicting the likelihood of words in a continuous speech stream, but there’s plenty of bottom-up processing that is little more than “labeling noise bursts” (one of my favorite examples of this is Voice Onset Time for classifying speech segments). In focusing on this, VUIs are simply choosing one small part of the great human language puzzle to address.
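To make the "labeling noise bursts" point concrete, here is a minimal sketch of what a VOT-based labeler might look like. The single 25 ms boundary and the function name are my own assumptions for illustration; real category boundaries shift with place of articulation, speaking rate, and language.

```python
# Toy illustration only: label an English word-initial stop as voiced or
# voiceless from its Voice Onset Time (VOT), measured in milliseconds.
# ASSUMPTION: one fixed 25 ms boundary; real boundaries vary with place of
# articulation, speaking rate, and language.
VOT_BOUNDARY_MS = 25.0


def classify_stop(vot_ms: float) -> str:
    """Return 'voiceless' for long-lag VOT and 'voiced' for short-lag VOT."""
    return "voiceless" if vot_ms >= VOT_BOUNDARY_MS else "voiced"


if __name__ == "__main__":
    for vot in (5.0, 15.0, 40.0, 70.0):
        print(f"VOT {vot:5.1f} ms -> {classify_stop(vot)}")
```

Nothing here "understands" anything, which is exactly the point: a purely bottom-up cue can do real categorization work before any grammar gets involved.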
Current ASR systems cannot reliably identify arbitrary sentences from continuous speech input. This is partly because such subtle contrasts are involved. The ear of an expert native speaker can detect subtle differences between detect wholly nitrate or holy night rate, but ASR systems will have trouble.
Pullum plays a little sleight-of-hand trick here as he switches from talking about word segmentation to sentence breaking. These are two different tasks. Yes, human beings are very good at word segmentation and yes, ASR is mediocre, but ASR is better than he suggests and humans are not infallible word segmenters. He overstates his premise (as pointed out by his very first commenter). So, when he says that “The ear of an expert native speaker can detect subtle differences between detect wholly nitrate or holy night rate, but ASR systems will have trouble” he’s only kinda right. In fact, plenty of “expert native speakers” would have trouble segmenting those two phrases if spoken in isolation, and a well-trained ASR system could very well segment those phrases successfully.
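Here is a rough sketch of why isolated segmentation is hard for anyone, human or machine. It enumerates every way to carve one unbroken sound string into words from a lexicon. The "pronunciation" spellings and the four-entry lexicon are invented for illustration; they are not real phonemic transcriptions, and the sketch ignores the fine acoustic cues Pullum is talking about.

```python
from typing import Dict, List

# ASSUMPTION: made-up rough "pronunciation" spellings stand in for a real
# phonemic transcription; this tiny lexicon is hypothetical.
LEXICON: Dict[str, List[str]] = {
    "holi": ["holy", "wholly"],
    "nait": ["night"],
    "reit": ["rate"],
    "naitreit": ["nitrate"],
}


def segmentations(sounds: str) -> List[List[str]]:
    """Return every way to split `sounds` into words found in LEXICON."""
    if not sounds:
        return [[]]  # one way to segment nothing: the empty parse
    results: List[List[str]] = []
    for end in range(1, len(sounds) + 1):
        # Try each prefix; homophones contribute one parse per spelling.
        for word in LEXICON.get(sounds[:end], []):
            for rest in segmentations(sounds[end:]):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    for parse in segmentations("holinaitreit"):
        print(" ".join(parse))
```

With this toy lexicon the same string comes back with several competing parses, including both "wholly nitrate" and "holy night rate". Picking among them takes context or a language model, which is what both listeners and ASR systems lean on.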

Having said all that, I agree with Pullum’s underlying point that human language comprehension is mysteriously complex and intertwined with a host of non-linguistic processes like logic and memory (making "linguistic based understanding" very challenging indeed). But this is not really a fair indictment of contemporary NLP. Yes, NLPers typically narrow their focus in order to build working tools that solve one small part of the great language processing puzzle, but put those tools together in a pipeline and you can create some pretty impressive functionality.

As Pullum knows, human speech comprehension involves a complex mixture of processes and is not entirely understood by linguists even today. Understanding speech comprehension is an ongoing project in linguistics, not a finished one. Once linguists have a fully specified model of speech comprehension, I’m sure the engineers at Nuance would be happy to model it computationally. But until we linguists provide that, they’re stuck kludging a solution. If linguists are going to complain about NLP’s failures, it’s *us* we should complain about.
...the extent to which speech as such is being processed and understood (i.e., grammatically recognized and analyzed into its meaningful parts) is essentially zero.
Zero? Really? ZERO!? Pullum was being disingenuous at best, obtuse at worst, when he wrote that. Again, his conclusion rests crucially on the straw-man comparison of one kind of limited ASR with his pie-in-the-sky fantasy of what NLP should be. This is unfair.

What Pullum refuses to tell his audience is that it is within the bounds of contemporary NLP to automatically segment a continuous human speech stream into words and then parse those words into many different grammatical and semantic categories like Parts of Speech, Subject-Verb relations, coreference, concrete nouns, verbs of motion, named entities, etc. All of this can be done by NLP tools today, right now, not in the future, by you if you have a few hours to download and learn the tools. For example, CMUSphinx Open Source Toolkit For Speech Recognition, Stanford NLP, and OpenNLP.

Pullum might complain that this NLP pipeline wouldn’t count because it wouldn’t accomplish its tasks the same way the human mind accomplishes those language tasks (and it would do them more slowly). But I repeat that it is linguists who have failed to specify exactly how the human brain accomplishes those tasks.

If Pullum is still waiting for NLP, it's because he's blaming his boots for the faults of his feet.

ADDENDUM: To be clear, I respect Geoffrey Pullum quite a lot as a great linguist who has contributed (and continues to contribute) tremendous value to the field of linguistics, and language research in general. Anything I've written in my three responses to his NLP posts which might suggest otherwise is most likely the product of my uncertainty about his goals in writing these posts. I admit to feeling a bit free to employ some rhetorical flourish here and there partly because Pullum himself is quick with the lexical blade. It's fun to poke back.

Wednesday, May 22, 2013

VerbCorner - crowd sourcing and verb meaning

Josh, a postdoc at Harvard, has initiated an online game called VerbCorner in order to crowd source the study of the meaning of verbs. How often do you and I, the little people, get a chance to contribute to Harvard quality linguistic research? Well, apparently quite a lot these days. Research is for the masses!

Here's Josh's explanation
Dictionaries have existed for centuries, but scientists still haven't worked out the exact meanings for most words. At VerbCorner, we are trying to work out what verbs mean. Rather than try to work out the definition of a word all at once, we have broken the problem into a series of tasks. Each task has a fanciful backstory -- which we hope you enjoy! -- but at its heart, each task is asking about a specific component of meaning that scientists suspect makes up one of the building blocks of verb meaning.

Ultimately, we hope to probe dozens of aspects of the meaning of thousands of verbs. This is a massive project, which is why we need your help! We will be sharing the results of this project freely with scientists and the public alike, and we expect it to make a valuable contribution to linguistics, psychology, and computer science.
Being a verb meaning kinda guy myself, I'm very interested to see how this all plays out (literally and figuratively). My [defunct] dissertation was on verb semantics and Talmy's force dynamics. I'm really curious to see if Josh has included any force dynamics in this game.

Now, go play!

Tuesday, May 21, 2013

David Brooks, Word Classes, and Google Ngrams

David Brooks waxes poetic about word frequencies and the good old days in today's NYT: What Our Words Tell Us.

Update: Before reading my own most excellent original post below, here are three well-respected linguists who fisk Brooks' article as well:

John McWhorter: David Brooks' Favorite New Theory of Language Is Wrong. Money Quote:
...the faddish attempt to apply the Big Data approach to social psychology via Google’s Ngram viewer tool will shed much less light on these matters than many expect. In any language, concepts are expressed by several words and phrases at any given time, all of which morph eternally with the passage of time.

Robin Lakoff: What Our Words Don’t Tell Us. Money Quote:
It is hardly respectable scholarship to jump to the conclusion that changes in word frequency necessarily indicate changes in topics under discussion (new words may replace familiar ones but have similar meanings), and even if they do, it is very dubious – ethically questionable, you might say – to jump from there to the conclusion that these changes signify deep societal changes in the direction of moral decline, unless writers are prepared to make explicit and be prepared to defend their understanding of “morality” and “decline.” Social science is still, happily, distinguishable from theology.

Mark Liberman: Ngram morality. Money Quote:
David Brooks doesn't mention this ideological and temporal inconsistency in his sources. In general, as I've noted in discussions of his earlier columns, his "unparalleled ability to shape an intellectually interesting idea into the rhetorical arc of an 800-word op-ed piece" crucially depends on skillful editing — or revision — of his raw materials into a form that fits his theme.
My Original Post
Brooks cherry-picks three recent Google Ngram analyses (by non-linguists) and provides paper-thin summaries of their findings, then concludes that America has lost its moral core. These analyses all depend crucially on the creation of word categories like “individualistic words” and “moral terms”. These are not quite synonyms*, but they require that the words in each class bear some semantic link between them. This raises the question: Are these groupings natural? Is there something psychologically real about them?

Linguists care about word classes quite a bit (computational linguists even more so). There are ways of constructing naturalistic sets of words. However, Brooks says nothing about how these studies performed their categorizations, so I thought I would post a quick review as it's important in judging the validity of the results.

Twenge et al
The first study by Twenge et al (which he doesn’t link to, but I do below) followed a scientifically reasonable path to create their word sets. They asked 53 Mechanical Turk participants to “generated words characteristic of individualism and communalism.” Then, they had a different set of 55 Mechanical Turk participants rate those words on a 7-point Likert scale. The top 20 words were then used as their search set. FYI, here are their two sets:

independent, individual, individually, unique, uniqueness, self, independence, oneself, soloist, identity, personalized, solo, solitary, personalize, loner, standout, single, personal, sole, and singularity
communal, community, commune, unity, communitarian, united, teamwork, team, collective, village, tribe, collectivization, group, collectivism, everyone, family, share, socialism, tribal, and union
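The two-stage procedure described above (one group proposes candidate words, a second group rates them, and the top-rated words become the search set) boils down to a rank-and-cut step. Here is a minimal sketch. Ranking by mean rating is my assumption about how "top" was decided, and the ratings are invented, not the paper's data.

```python
from statistics import mean
from typing import Dict, List


def top_rated(ratings: Dict[str, List[int]], k: int) -> List[str]:
    """Rank candidate words by mean rating (highest first) and keep the top k.

    ASSUMPTION: "top" means highest mean rating; ties keep input order.
    """
    ranked = sorted(ratings, key=lambda word: mean(ratings[word]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Invented 7-point Likert ratings, for illustration only.
    toy_ratings = {
        "independent": [7, 6, 7],
        "solo": [5, 4, 6],
        "banana": [1, 2, 1],
        "unique": [6, 6, 5],
    }
    print(top_rated(toy_ratings, k=2))
```

Notice what the cut hides: once the list is truncated, a word that barely made it looks exactly as "individualistic" as one every rater agreed on.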

UPDATE: For more on Twenge, commenter "unknown" helpfully suggests these Language Log posts:
Textual narcissism by Liberman
Textual narcissism, replication 2 by Liberman
It's all about who? by Liberman

Kesebir and Kesebir
Kesebir and Kesebir did 2 studies. In study one, they took ten words they found as synonyms of “virtue” in an unnamed thesaurus and searched Google’s Ngram for those words. Here are the ten: character, conscience, decency, dignity, ethics, morality, rectitude, righteousness, uprightness, and virtue.

In their second study, they constructed a set of 80 virtue words taken from websites about virtue in literature (e.g., honesty, patience, honor), then asked participants to rate each one as No = -1, Perhaps = 0, or Yes = 1. Then they took the 50 words with the highest average rating and searched Ngrams for their frequency.

Klein
Klein unapologetically gives no motivation for his word sets whatsoever. A “very casual paper” indeed.

The Problem
While I respect the attempt of the first two sets of authors to add some psychological reality to their linguistic categories, they fall for the same naïve assumption that plagued linguistics for hundreds of years: that people's conscious metalinguistic judgments are valid. For example, syntacticians discovered the folly of grammaticality judgments. I have been involved recently in a number of Mechanical Turk ratings tasks and we're finding that it is very difficult to get consistent ratings. I believe the same issue is at play here. Plus, ratings can easily be affected by context like surrounding text, yet none is given in these tasks. It's not clear what it means to rate isolated words. Word semantics by their very nature are contextual.

UPDATE: Commenter Arjan rightly brought up the great acceptability debates. One could claim that I am unfairly dismissing grammaticality judgments. And one could claim that I am not. The good folks at MIT's Tedlab have posted a few excellent resources on multiple sides of the controversy. Look under the 2010 heading on this page.

Words are not thought. These studies seem to be a variation on the “No word for X” syndrome (see here for a recent rant). Certain types of words may be used more or less frequently over some time-scale (like one century), but that doesn’t necessarily mean that we are thinking differently over that time-scale.

Unlike Brooks, I’ll link to the actual papers (all free, but the second two require email registration):

Increases in Individualistic Words and Phrases in American Books, 1960–2008. Jean M. Twenge, W. Keith Campbell and Brittany Gentile

The Cultural Salience of Moral Character and Virtue Declined in Twentieth Century America. Kesebir and Kesebir

Ngrams of the Great Transformations. Daniel B. Klein

UPDATE: *Rumor has it that WordNet has copyrighted the term "synset", so I'm being careful to avoid their cease and desist letter. Anyone know if there's truth to this rumor?

Saturday, May 18, 2013

Book Reviews

A quick (very self-serving) link fest. Here are the cognitive linguistics related book reviews I've written:

1. Adam's Tongue: How Humans Made Language, How Language Made Humans. By Derek Bickerton.

2. Louder Than Words: The New Science of How the Mind Makes Meaning. By Benjamin K. Bergen.

3. Through the Language Glass: Why the World Looks Different in Other Languages. By Guy Deutscher.

Thursday, May 16, 2013

heard tell 'bout them linguistic constructions yonder

Randomly, my mind wandered onto an older English slang phrase, "heard tell", which is culturally associated with rural and working-class speech. It means something like "I heard about X." Before I get to the interesting role of prepositions, here are some examples:

a) When you say that you knew about, do you mean that you have heard tell about other things?
b) Maria and me concluded that we had struck one o' them gamblin' places we'd heard tell about, and I tell ye we got out in a hurry!
c) I never heard tell of it until I was told by Justice Bolte about it.
d) "I niver heard tell that you had an owl in your parlor chimney," said he, sort o' suspicious-like.
e) And we had a gentleman in our county that perhaps most of you have heard tell of,
f) I asked him if he had ever heard tell of a house they called the House of Shaws.
g) "Never heard tell of him," said John William, making spectacles of his burnished bores, and looking through them into the sunlight.

Originally, I selfishly assumed "heard tell" was an American English slang construction, but upon a little Google Ngram sleuthing, I discovered it is a common English construction.

[Google Ngram graph: American English]

[Google Ngram graph: British English]


Though details differ, the general pattern is clear: a general rise in frequency throughout the late 1800s, peaking at the turn of the century, a general decline throughout the 20th century, then a leveling off around the mid 1970s. The American English usage is choppier and less predictable, but it generally follows the same pattern.

But what interests me most is the red line in all of the above graphs. That's the one that plots the frequency of the tri-gram "heard tell of". This stung me a bit because my brilliant native speaker intuition strongly preferred "heard tell about", but in this I am in the minority.

For the graph impaired: What the red line in the above graphs tells us is that when English speakers have used "heard tell", they most frequently follow it with "of" even though they have a semantically similar choice available, namely "about" (and even "that").

I don't have a semantic analysis of the difference between prepositions (though I don't doubt an interesting one could be engineered), but after I consoled my wounded linguistic pride, I then realized that the construction with the preposition "of" strongly tracks the overall pattern. The bigram search "heard tell" is a more general and inclusive one, hence its results include all of "heard tell of" results as well. If you imagine taking away everything underneath the red line, there would not be much left, less than half to be sure. This means that "heard tell of" accounts for more than half of all instances of "heard tell".
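The "take away everything underneath the red line" reasoning is just a ratio: trigram count divided by the count of the bigram that contains it. Here is a minimal sketch on a tiny made-up corpus. The sentences and the helper names are mine, and the Ngram Viewer reports per-year relative frequencies rather than raw counts, so treat this only as an illustration of the arithmetic.

```python
from collections import Counter
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-token sequence in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def trigram_share(tokens: List[str],
                  bigram: Tuple[str, str],
                  trigram: Tuple[str, str, str]) -> float:
    """Fraction of `bigram` occurrences that continue as `trigram`."""
    bigram_total = ngram_counts(tokens, 2)[bigram]
    trigram_total = ngram_counts(tokens, 3)[trigram]
    return trigram_total / bigram_total if bigram_total else 0.0


if __name__ == "__main__":
    # Invented toy corpus, for illustration only.
    text = ("i never heard tell of it . we heard tell about them . "
            "have you heard tell of him . he heard tell that she left . "
            "nobody heard tell of such a thing .")
    tokens = text.split()
    print(trigram_share(tokens, ("heard", "tell"), ("heard", "tell", "of")))
```

Any share above 0.5 means the "of" variant alone accounts for more than half of the construction's uses, which is the claim the graphs support.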

I don't know why "of" became so tightly associated with the "heard tell" construction, but it struck me as a nice example of how construction semantics are not necessarily compositional, intuitive, or even necessarily coherent. I wonder if these three choices, "heard tell of/about/that", vary by region? We will need a more nuanced corpus to tease that out.

Monday, May 13, 2013

Pullum’s NLP Lament: More Sleight of Hand Than Fact

My first reading of both of Pullum’s recent NLP posts (one and two) interpreted them to be hostile, an attack on a whole field (see my first response here). Upon closer reading, I see Pullum chooses his words carefully and it is less of an attack and more of a lament. He laments that the high-minded goals of early NLP (to create machines that process language like humans do) have not been reached, and more to the point, that commercial pressures have distracted the field from pursuing those original goals, hence they are now neglected. And he’s right about this to some extent.

But, he’s also taking the commonly used term "natural language processing" and insisting that it NOT refer to what 99% of people who use the term use it for, but rather only a very narrow interpretation consisting of something like "computer systems that mimic human language processing." This is fundamentally unfair.
In the 1980s I was convinced that computers would soon be able to simulate the basics of what (I hope) you are doing right now: processing sentences and determining their meanings.
I feel Pullum is moving the goal posts on us when he says “there is, to my knowledge, no available system for unaided machine answering of free-form questions via general syntactic and semantic analysis” [my emphasis]. Pullum’s agenda appears to be to create a straw-man NLP world where NLP techniques are only admirable if they mimic human processing. And this is unfair for two reasons.

One: Getting a machine to process language like humans is an interesting goal, but it is not necessarily a useful goal. Getting a machine to provide human-like output (regardless of how it gets there) is a more valuable enterprise.

Two: A general syntactic and semantic analysis of human language DOES. NOT. EXIST. To draw back the curtain hiding Pullum’s unfair illusion, I ask Pullum to explain exactly how HUMANS process his first example sentence:
Which UK papers are not part of the Murdoch empire?
Perhaps the most frustrating part of Pullum’s analysis so far is that he fails to point the blame where it more deservedly belongs: at linguists themselves. How dare Pullum complain that engineers at Google don’t create algorithms that follow "general syntactic and semantic analysis" when you could make the claim against linguists that they have failed to provide the world with a unified "general syntactic and semantic analysis" to begin with!

Ask Noam Chomsky, Ivan Sag, Robert van Valin, and Adele Goldberg to provide a general syntactic and semantic analysis of Pullum’s sentence and you will get four vastly different responses. Don’t blame Google for THAT! While commercial vendors may be overly-focused on practical solutions, it is at least as true that academic linguists are overly-focused on theory. Academic linguists rarely produce the sort of syntactic and semantic analyses that are useful (or even comprehensible … let alone UNIFIED!) to anyone outside of a small group of devotees of their pet theory. Pullum is well known to be a fierce critic of such linguistic theory solipsism, but that view is wholly unrepresented in this series of posts.

In his more recent post, Pullum insists again that commercial NLP is tied to keyword searching, but this remains naïve. Pullum does his readers a disservice by glossing over the now almost 70 years of research on information theory underpinning much of contemporary NLP.

Also, Pullum unfairly puts Google search at the center of the NLP world as if that alone represents the wide array of tools and techniques that exist right now. This is more propaganda than fact. He does a disservice by not reviewing the immense value of ngram techniques, dependency parsers, Wordnet, topic models, etc.

When he laments that Google search doesn’t "rely on artificial intelligence, it relies on your intelligence", Pullum also fails to relate the lessons of Cyc Corp and the Semantic Web community which have spent hundreds of millions of dollars and decades trying to develop smart artificial intelligence approaches with comparatively little success (compared to the epic scale success of Google et al). In this, Pullum stacks the deck. He laments the failure of NLP to include AI without reviewing the failure of AI to enhance NLP.

I actually agree that business goals (like those of Google) have steered NLP in certain directions away from the goal of mimicking human language, but to dismiss this enterprise as a failure is unfair. It may be that NLP does not mimic humans, but until [we] linguists provide engineers with a unified account of human language, we can hardly complain that they go looking elsewhere for inspiration.

And for the record, there does exist exactly the kind of NLP work that attempts to incorporate more human-style understanding (for example, this). But boy, it ain’t easy, so don’t hold your breath Geoff.

If Geoff has some free time in June, I recommend he attend The 1st Workshop on Metaphor in NLP 2013.

Saturday, May 11, 2013

Pullum thinks there are no NLP products???

Famed linguist Geoffrey Pullum has a recent Chronicle of Higher Education post about NLP: Why Are We Still Waiting for Natural Language Processing? As a linguist, I deeply respect Geoff Pullum's reputation for fierce skepticism, but this recent post borders on the ornery old man syndrome.

First of all, Powerset didn't die when Microsoft bought them. Their technology is part of Bing search*. That's not death. Powerset technology is used by millions of people today, whereas before it was used by 3 guys in a SoMA cubicle. And to call Bing "a plain old keyword-based search engine" is a bit naïve.

Also, Pullum's claim that there are "absolutely no commercial NLP products" is flat bonkers. There are thousands of commercially viable and profitable NLP products. Just ask Clarabridge, Nuance, or BBN.

I'll grant that Pullum is somewhat correct that question answering hasn't matched the expectations it raised in the 1990s, but it's much more sophisticated than he lets on. How does Pullum not even mention Siri or the host of Android competitors? Yes the results are hit-or-miss, but they exist.

As a [somewhat former] linguist, I don't see the fact that NLP hasn't yet managed to mirror natural language as a reason to lament. Rather, I celebrate that it exposes just how complex natural language is and the fact that the sheer computing power that the likes of Google, Apple, and Microsoft can throw at it still ain't enough.

What I would like to see is tech companies hiring more *real* linguists. During the first NLP boom of the 90s, companies hired many linguists (my first NLP job was at an early Q and A start-up). Then, after the bust and with the rise of statistical machine learning, tech companies now hire engineers almost exclusively (except for contract jobs annotating data). I'm seeing more and more engineers learning some linguistics and getting jobs, whereas I suspect we'd be better off the other way around.

Anyhoo, NLP is alive and well Geoff. Geesh...

PS - I know Pullum is well aware of everything I've pointed out. He's ginning up the crowd for his series of posts about where NLP went wrong (which I'm looking forward to). But, he runs the risk of leading naïve readers down a false path. There ARE people who have no clue about all the great stuff NLP has done in the last 30 years and after reading Pullum's article, they'll think that's a fair assessment of the state-of-the-art, when it is not.

*UPDATE (5/12/13): I may have overstated this. A little birdie tells me that "not much Powerset technology" was actually incorporated into Bing. Disappointing, but I don't think this undermines my main point that Pullum mis-represents the state of commercial Q and A tech.

NLPers: How would you characterize your linguistics background?

That was the poll question my hero Professor Emily Bender posed on Twitter March 30th. 573 tweets later, a truly epic thread had been cre...