Monday, August 12, 2013

On Ennui and Verb Classification Methodologies

Linguists and NLPers alike love word classes, especially verb classes. But linguistic categories are tricky little buggers. They drove me to a deep ennui which led me out of academia and into industry.

Nonetheless, I occasionally retrace my old steps. Recently, I stumbled across an old chapter from my failed dissertation on verb classes and wondered if this little table of mine still holds water:
Here was the motivation (this is a cut-and-paste job from a draft chapter, largely unedited; anyone already familiar with standard verb classification can easily skim):

The general goal of any verb classification scheme is to group verbs into sets based on similar properties, either semantic or syntactic. For linguists, the value of these classifications comes from trying to understand how the human language system naturally categorizes verbs within the mental lexicon (the value may be quite different for NLPers). One assumes that the human language system includes some categorical association between verbs within the mental lexicon, and one attempts to construct a class of verbs that is consistent with those associations.

Verbs can be categorized into groups based on their semantic similarity. For example, the verbs hit, punch, kick, smack, slap could all be categorized as verbs of HITTING. They could also be grouped based on constructions. For example, verbs like give and send occur in both the ditransitive and double object constructions:
Ditransitive
Chris gave the box to Willy.
Chris sent the box to Willy.
Double Object
Chris gave Willy the box.
Chris sent Willy the box.
Verb classes have long been a central part of linguistics research. However, any set of naturally occurring objects can allow different sub-groups to be created using different criteria or features. The unfortunate truth is that we don’t really know how the mental lexicon is organized (this is not to say that patterns of relations have not been found using, say, priming experiments, or language acquisition, or fMRI. They have. But the big picture of mental lexicon organization remains fuzzy, if not opaque). Therefore, all verb classifications are speculative and all verb classification methodologies are experimental. Two key challenges face the verb classification enterprise:
  1. Identify the natural characteristics of each class (e.g., defining the frame)
  2. Identify the verbs which invoke the frame (e.g., which verbs are members of the class)
But how do we overcome these two challenges? There is, as yet, no standard method for doing either. Most verb classification projects to date have employed some combination of empirical corpus data collection, automatic induction (e.g., k-means clustering), psycholinguistic judgment tasks, or old-fashioned intuition. Nonetheless, in recent years certain best practices have emerged which appear to be evolving into a de facto standard.

This emerging de facto standard includes a mixture of intuitive reasoning (about verbs, their meaning, and their relationships to each other) and corpus analysis (e.g., frequencies, collocations). Below is a table detailing methods of verb classification and some of the major researchers associated with the methods:

But how do we know if our speculations about a verb class are "correct" (in the sense that a proposed class should be consistent with a class assumed to exist in the mental lexicon)? The quick answer is that we don’t. Without a better understanding of the mental lexicon, we are left to defend our classes based on our methods only: proposed verb class A is good to the extent that it was constructed using sound methods (a somewhat circular predicament). We also have cross-validation testing methods available. If my class A contains most of the same verbs that your class B contains (using different methods of constructing the classes) this suggests that we have both identified a class that is consistent with a natural grouping. Finally, via consensus, a certain classification can emerge as the most respected, quasi-gold standard classification and further attempts to create classes can be measured by their consistency with that gold standard.
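To make the cross-validation idea concrete, here is a minimal sketch of one way to score how much two independently built classes agree. The Jaccard measure is my choice for illustration, not something the classification literature mandates, and the verb lists are hypothetical memberships invented for the example, not results from any corpus.

```python
def jaccard(class_a, class_b):
    """Verbs shared by both classes, as a fraction of all verbs in either."""
    a, b = set(class_a), set(class_b)
    if not a and not b:
        # Two empty classes carry no evidence of agreement.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical memberships for a BARRIER-like class, built by two
# different (imaginary) methods. Illustration only.
my_class = {"prevent", "ban", "protect", "stop", "block"}
your_class = {"prevent", "ban", "protect", "hinder", "block"}

overlap = jaccard(my_class, your_class)
shared = sorted(my_class & your_class)
disputed = sorted(my_class ^ your_class)

print(f"overlap = {overlap:.2f}")
print("shared verbs:", shared)
print("disputed verbs:", disputed)
```

A high score only says the two methods agree with each other. It says nothing about whether either matches the mental lexicon, which is the circularity described above. The disputed verbs are often the interesting ones to look at by hand.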

The closest thing to a gold standard for English verb classes is the Berkeley FrameNet project. FrameNet is perhaps the most comprehensive attempt to hand-create a verb classification scheme that is consistent with natural, cognitively salient verb classes. It is based on painstaking annotation of naturally occurring sentences containing target words.

But even FrameNet is ripe for criticism. It's not good at distinguishing exemplar members of a verb class from coerced members, save by arbitrary designation.

For example, I was working on a class of verbs evoking barrier events like prevent, ban, protect. What was curious in my research was how some verbs had a strong statistical correlation with the semantics of the class (like prevent and protect), yet there were others that clearly appeared in the proper semantic and syntactic environments evoking barriers, but were not, by default, verbs of barring. For example, stop. The verb stop by itself does not evoke the existence of a barrier. For example, "Chris stopped singing", or "It stopped raining." Neither of those two events involves a barrier to the singing or raining. Yet in "Chris stopped Willy from opening the door" there is now a clear barrier meaning evoked (yes yes, the from is crucial. I have a whole chapter on that. What will really blow your mind is when you realize that from CANNOT be a preposition in this case...).

The process of coercing verbs into a new verb class with new meaning was a central part of my dissertation. Damned interesting stuff. I found some really weird examples too. For example I found a sentence like "Chris joked Willie into going to the movie with us", meaning Chris used the act of joking to convince Willie to do something he otherwise would not have done.

2 comments:

Chris Brew said...

I like your comment about 'stopped'. Entirely correct to point out that judgements about the semantics of verbs are based on judgements about individual occurrences of verbs in particular contexts.

If you allow me to guess, I'd hazard the suggestion that the "mental lexicon" is far messier than we linguists often assume, with multiple, cross-cutting and partial classifications that are used for different purposes. Whether you gather evidence through priming, fMRI, corpus work or systematic appeal to intuition, you are going to get a partial view. If this is right, there is no gold standard. FrameNet is great, but I think the reason it is so widely used is simply that it is the best known piece of quality work out there.

Certainly, in order to get our work on German verbs going, Sabine Schulte im Walde had to define a verb-class structure of her own. In doing this, there is a real risk of being self-serving, by defining the categories that are biased to what one's favorite tools are likely to do well in recovering.

Two OSU students whom I advised wrote dissertations, in linguistics, on verb classifications.

Kirk Baker found that if you turn the question round, and ask "which of the several classifications of English verbs is best supported by corpus evidence?", the answer seems to be Roget's thesaurus.

http://www.ling.ohio-state.edu/~kbaker/

and Jianguo Li did a thorough exploration of different feature sets for trying to get at Levin classes.

http://www.ling.ohio-state.edu/~jianguo/papers/jianguo-diss.pdf

I fear that I put Kirk and Jianguo in a similar place to you, facing the difficult question of "what in heaven's name do these classifications amount to?", and asking them to somehow come up with interesting and testable hypotheses in an area where everyone is pursuing different goals with different methodologies. I like their work, but I don't think I understood how hard it was going to be when we started out on the enterprise.

Chris said...

Chris, excellent points. Interesting about Roget's too. At the IARPA Metaphor Program, at least one team has replaced WordNet with Roget's because they're getting better results. But I'll leave it to them to publish their results before outing them.

I'll look into Jianguo's work too.

My intuition is that there is no such thing as a verb class in a static, classic category sense of having necessary and sufficient conditions for membership. That's why I was so interested in coercion and instances like "joked X into Y". I got stuck trying to determine a coherent methodology for finding lots of coerced examples.