Wednesday, November 18, 2009

Crowdsourcing Annotation

(image from Phrase Detectives)

Thanks to the LingPipe blog here, I discovered an online annotation game called Phrase Detectives, designed to encourage people to contribute to the creation of hand-annotated corpora by making a game of it. It was created by the School of Computer Science and Electronic Engineering at the University of Essex. Of course, they have a wiki, AnaWiki. I'm not crazy about the cutesy cartoon mascot (they've given it a name: Sherlink Holmes. Ugh. I guess Annie would be a bit too obvious?). I've wondered aloud about this kind of thing before, so I'm glad to see it coming to fruition.

I haven't started playing the game yet, but I'm looking forward to it. For now, here is the project description:

The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words.

However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).

Cheers.

