Tuesday, July 5, 2011

the big picture: automatic metaphor identification

The recently popularized IARPA Metaphor Program piqued my curiosity, so I've been reviewing a variety of articles on contemporary approaches to automatic metaphor identification. I've read three articles so far and one thing is somewhat disappointing: they all severely restrict the notion of metaphor to mean local metaphors within single sentences.

They all pay considerable lip service to Lakoff & Johnson's seminal 1980 work Metaphors We Live By, taking as gospel the notion that metaphor is defined as a mapping from one conceptual domain to another. But their examples are all of a limited type. Here are three representative examples from the papers I've been reading:
  • Achilles was a lion. (Babarczy et al.)
  • The sky is sad. (Tang et al.)
  • I attacked his arguments. (Baumer)
What struck me is that the methods used to identify metaphor are remarkably lexicalist. The dominant strategy is selectional preferences, whereby a list of source and target conceptual domains is created. Then, for each domain, a list of words typically associated with it is culled from corpora, intuition, or dictionaries. Finally, each word is given a set of selectional preferences which constrain what kinds of subjects or predicates it typically occurs with.

Here is my Ling 101 version of this methodology: If I understand correctly (and I may not), for Tang et al.'s example "The sky is sad", we would have a concept like THE ENVIRONMENT IS HUMAN. We would have a list of words typically associated with the environment (e.g., "sky") and a list of words typically associated with being human (for example "sad"). A computer could then recognize the following:
  1. The subject (the sky) is associated with the environment.
  2. The predicate (sad) is associated with humans.
  3. This subject (the sky) is not typical for this predicate (sad).
  4. This sentence is incoherent on first analysis.
  5. The concept THE ENVIRONMENT IS HUMAN links these non-typical phrases coherently.
  6. This sentence is only coherent using conceptual mapping, therefore it is probably metaphorical.
This is a gross oversimplification, but I think it gets the big picture about right.
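To make the six steps concrete, here is a minimal Python sketch of that Ling 101 version. It is not the method of any of the papers cited here; the word lists, the domain labels, and the function name classify are all hypothetical stand-ins I made up for illustration, and a real system would derive them from corpora.

```python
# Toy lexicon mapping words to conceptual domains.
# Hypothetical and hand-picked; real systems would cull these from corpora.
DOMAIN_OF = {
    "sky": "ENVIRONMENT",
    "cloudy": "ENVIRONMENT",
    "sad": "HUMAN",
    "child": "HUMAN",
}

# Selectional preferences: which subject domains a predicate's domain
# typically occurs with (an assumption made for this sketch).
TYPICAL_SUBJECT_DOMAINS = {
    "HUMAN": {"HUMAN"},
    "ENVIRONMENT": {"ENVIRONMENT"},
}

# Known conceptual mappings as (subject domain, predicate domain) pairs.
CONCEPTUAL_MAPPINGS = {("ENVIRONMENT", "HUMAN")}  # THE ENVIRONMENT IS HUMAN


def classify(subject, predicate):
    """Label a subject-predicate pair using the six steps above."""
    # Steps 1 and 2: look up the domain of the subject and the predicate.
    subj_domain = DOMAIN_OF.get(subject)
    pred_domain = DOMAIN_OF.get(predicate)
    if subj_domain is None or pred_domain is None:
        return "unknown"
    # Steps 3 and 4: is this subject typical for this predicate?
    if subj_domain in TYPICAL_SUBJECT_DOMAINS.get(pred_domain, set()):
        return "literal"
    # Steps 5 and 6: does a conceptual mapping make the pair coherent?
    if (subj_domain, pred_domain) in CONCEPTUAL_MAPPINGS:
        return "metaphorical"
    return "anomalous"


print(classify("sky", "sad"))    # the Tang et al. example
print(classify("child", "sad"))  # an ordinary literal pairing
```

Note that everything here happens inside one subject-predicate pair: the sketch never looks beyond the sentence it is given.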

At first blush, I'm impressed with the simplicity and elegance of this solution. However, it seems to me that much metaphorical language is not local like this (local here = within a single sentence). For example, imagine a situation in a biology class where two students, Alger and Miriam, were originally going to be partners for a lab assignment. Then they got into an argument. A third student, Annette, asks Miriam:
  • Annette: Are you still going to be lab partners with Alger?
  • Miriam: No. That ship has sailed.
In this scenario, the sentence "That ship has sailed" is entirely coherent from a selectional preferences perspective (i.e., ships really do sail). Yet it is clearly being used metaphorically (there is literally no ship). Here, the metaphor is only detectable if we link the two sentences together via co-reference. The phrase "that ship" does not refer to a real ship in the discourse. Rather, it refers to the possible event of be-lab-partners-with-Alger. Unless we can link phrases between sentences and between types (i.e., allowing an NP to co-refer with an event), we are not going to get a computer to recognize these types of metaphors (which I suspect are quite common).
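A toy Python sketch of the contrast, under loud assumptions: the word lists and both function names are hypothetical, and "has no literal antecedent in the prior discourse" is only a crude stand-in for real co-reference resolution (it would also flag any ordinary first mention of a noun).

```python
# Hypothetical selectional preferences for one verb.
TYPICAL_SUBJECTS = {"sail": {"ship", "boat", "yacht"}}


def violates_selectional_preference(subject, verb):
    """Sentence-local check: is this subject atypical for this verb?"""
    typical = TYPICAL_SUBJECTS.get(verb)
    return typical is not None and subject not in typical


def lacks_literal_antecedent(noun, prior_discourse_nouns):
    """Discourse-level cue: a demonstrative NP ("that ship") whose noun
    was never mentioned before has nothing literal to point back to."""
    return noun not in prior_discourse_nouns


# Nouns from Annette's question (hand-extracted for this sketch).
prior_nouns = {"lab", "partners", "Alger"}

local_flag = violates_selectional_preference("ship", "sail")
discourse_flag = lacks_literal_antecedent("ship", prior_nouns)

print(local_flag)      # the sentence-local check sees nothing odd
print(discourse_flag)  # only the cross-sentence cue fires
```

The local check passes the sentence as literal; only the check that looks at the preceding sentence raises a flag, and even then something would still have to link "that ship" to the lab-partner event.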

Xuri Tang, Weiguang Qu, Xiaohe Chen, & Shiwen Yu (2010). Automatic Metaphor Recognition Based on Semantic Relation Patterns. International Conference on Asian Language Processing.

Other citations:
Anna Babarczy, Ildikó Bencze M., István Fekete, & Eszter Simon. The Automatic Identification of Conceptual Metaphors in Hungarian Texts: A Corpus-Based Analysis.

Eric Baumer (2009). Computational Metaphor Identification to Foster Critical Thinking and Creativity (dissertation).


Brian said...

Yes! I've had the same problem with Lakoff's language math equations. Communication is more elaborate than single sentences, and we should appreciate the same of the metaphors therein.

indolering said...

You forgot to mention the lack of standards for metaphor identification. Some quote their ID rate compared to other automated analyses, others use a corpus with a limited set of metaphors, etc. There is no way to compare results between studies. If there were a standardized test, it would include the more complex metaphors you identify.

Chris said...

indolering, excellent point. The classic TREC conferences of the 90s did a lot to create Named Entity Extraction & Co-reference standards. Automatic metaphor recognition is going to require that.
