One of the most challenging tasks a linguist can engage in is annotating natural language text for semantics. It is simultaneously interesting, tedious and tricky, which makes it altogether maddening. We perform this task for a variety of reasons: sometimes to create training data for learning algorithms (which was a big topic of discussion at last year's NAACL HLT), sometimes to explicate the semantics of events, as the FrameNet project does. Part of my dissertation is very FrameNet-like, so I do a lot of annotating (I will save my bile-filled hateful remarks about the general crappiness of annotator apps for another post).
Generally speaking, the annotator's task is to read naturally occurring sentences, then identify and tag the semantic roles of the participants involved in the particular event represented by the sentence. It would be easy if all of English were composed of sentences like "Bobby kicked the ball"; that would be sweet. "Bobby" is an AGENT, "the ball" is a PATIENT. Done. Let's move on. But that's not how real language works, is it?
In any case, I have been annotating sentences involving the verb "exclude" recently, and I find it's a particularly challenging set. The BNC "exclude" sentence below was difficult to annotate because it does not make clear who the participants in the exclude event are:
The new Minister for Health, Dr Noel Browne, a dedicated reformer of the health services and much concerned in-particular with the eradication of tuberculosis in
Ugh!