Wednesday, February 27, 2008

"garfield minus garfield"

I never liked the comic Garfield. But under this guy's interpretation here, I find it brilliant! I haven't laughed out loud at a comic in years. These versions swing from hilarious, to sad and poignant, then back to hilarious.

Who would have guessed that when you remove Garfield from the Garfield comic strips, the result is an even better comic about schizophrenia, bipolar disorder, and the empty desperation of modern life?

Friends, meet Jon Arbuckle. Let’s laugh and learn with him on a journey deep into the tortured mind of an isolated young everyman as he fights a losing battle against loneliness and methamphetamine addiction in a quiet American suburb.

(HT Andrew Sullivan)

Monday, February 25, 2008

Die Buch, Die Tisch, Die Stuhl

I never took grammatical gender seriously when I studied German. I just made everything feminine ‘cause, ya know, that was the easy one. The rest of my German was so bad, I figured it didn’t really matter anyway, right? (I frikkin LOVED studying Mandarin Chinese because, ya know, who needs morphology?)

Now Heidi Harley has convinced me I was right all along. She blogs about Dalila Ayoun’s research on French gender:

…native French speakers don't agree on the genders of French nouns. They really don't agree. Fifty-six native French speakers, asked to assign the gender of 93 masculine words, uniformly agreed on only 17 of them. Asked to assign the gender of 50 feminine words, they uniformly agreed on only 1 of them. Some of the words had been anecdotally identified as tricky cases, but others were plain old common nouns.
… second language speakers of French, take heart! Make your grammatical gender agreement mistakes with confidence. There's a chance that your native-speaker interlocutor will agree with your version!

Danke, Heidi! Viel Danke!

Pssst, I should note that David Zubin has done a variety of cognitive linguistic studies on German gender. Most recently, this one:

Köpcke, Klaus-Michael and David A. Zubin 2003. “Metonymic pathways to neuter-gender human nominals in German”. In Metonymy and Pragmatic Inferencing, Panther, Klaus-Uwe and Linda L. Thornburg (eds.), 149–166.

Friday, February 15, 2008

Fancy Corpus Search Tool

I've only just now discovered the entirely online corpus search utility Sketch Engine by Adam Kilgarriff, Pavel Rychlý, and Jan Pomikálek. It can replicate a lot of what I do with tgrep2 and Python scripts, but a lot faster (I mean, A LOT faster).

It has the advantages of being fast, easy to use, covering corpora from multiple languages (plus allowing you to add new corpora) and providing user friendly output.

One disadvantage is the brevity of the sketches it provides. For example, I performed a sketch of the verb "prevent" in the BNC and it returned a list of subjects and objects that occur with the verb. Sweet! This is really important stuff if you're interested in FrameNet type semantic description (see my related post here). Unfortunately, it maxed out at 100 (that's a small sample of the 10,000+ examples).
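For anyone curious what a sketch like this boils down to, here is a minimal Python illustration of the general idea: tally the subjects and objects that co-occur with a verb. To be clear, this is not Sketch Engine's API or its actual method. The function name `sketch`, the `limit` parameter, and the toy triples are all made up for illustration, and it assumes the corpus has already been parsed into (subject, verb, object) triples.

```python
from collections import Counter

# Toy (subject, verb, object) triples, invented for illustration only;
# these are NOT real BNC results.
triples = [
    ("law", "prevent", "abuse"),
    ("police", "prevent", "crime"),
    ("law", "prevent", "crime"),
    ("rain", "stop", "play"),
]

def sketch(triples, verb, limit=None):
    """Count the subjects and objects that co-occur with `verb`.

    `limit` caps how many items are returned per list (None means no cap).
    """
    subjects = Counter(s for s, v, _ in triples if v == verb)
    objects = Counter(o for _, v, o in triples if v == verb)
    return subjects.most_common(limit), objects.most_common(limit)

subjects, objects = sketch(triples, "prevent")
print(subjects)
print(objects)
```

Passing something like `limit=100` would mimic the sort of cap I ran into, while leaving it at `None` returns everything.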

Nonetheless, this utility goes a long way to providing the sort of user-friendly (yet still sophisticated) online corpus query tools that I think the average non-computationally minded linguist would benefit from greatly.

I've used Mark Davies' BNC interface a lot too and that's also an excellent, entirely online search tool. Davies provides a nice interface to a variety of corpora here.

Thursday, February 14, 2008

Tigrigna Blog and Resources

I just discovered Qeyḥ bāḥrī, a blog by a student of the Tigrinya language.

From his site,

Being from a small city in Canada (Halifax, Nova Scotia) I found it very difficult to learn the mother tongue of my parents, as there are few resources available from which I can learn. So, I decided to create a resource for myself, somewhere I could collect everything I know about the language and use it at my leisure. I thought about using my limited knowledge on HTML to create a webpage, that way I could have easy access to my work wherever I go.

And from Ethnologue

Tigrigna -- A language of Ethiopia

Population -- 3,224,875 in Ethiopia (1998 census). 2,819,755 monolinguals.
Region -- Tigray Province. Also spoken in Eritrea, Germany, Israel.

Alternate names -- Tigrinya, Tigray
Classification -- Afro-Asiatic, Semitic, South, Ethiopian, North
Language use -- National language. 146,933 second-language speakers.
Language development -- Literacy rate in first language: 1% to 10%. Literacy rate in second language: 26.5%. Ethiopic script. Radio programs. Grammar. Bible: 1956.
Comments -- Speakers are called 'Tigrai'.

Monday, February 11, 2008

The Perils of Semantic Annotation

One of the most challenging tasks a linguist can engage in is annotating natural language text for semantics. It is simultaneously interesting, tedious and tricky, which makes it altogether maddening. We perform this task for a variety of reasons: sometimes to create training data for learning algorithms (which was a big topic of discussion at last year's NAACL HLT), sometimes to explicate the semantics of events, as the FrameNet project does. Part of my dissertation is very FrameNet-like, so I do a lot of annotating (I will save my bile-filled hateful remarks about the general crappiness of annotator apps for another post).

Generally speaking, the annotator's task is to read naturally occurring sentences, then identify and tag the semantic roles of the participants involved in the particular event represented by the sentence. It would be easy if all of English was composed of sentences like "Bobby kicked the ball"; that would be sweet. "Bobby" is an AGENT, "the ball" is a PATIENT. Done. Let's move on. But that's not how real language works, is it?
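For the programmers in the audience, here is a minimal sketch of what the easy case might look like as data: each participant is tagged with a role and the character span it occupies in the sentence. The names `RoleSpan` and `annotate` are my own, invented for illustration. This is not FrameNet's format or any annotator app's API, and it assumes each phrase appears exactly once in the sentence.

```python
from dataclasses import dataclass

@dataclass
class RoleSpan:
    role: str   # semantic role label, e.g. "AGENT"
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span

def annotate(sentence, roles):
    """Find each phrase in the sentence and record its role as a character span.

    `roles` maps a phrase to its role label. Only the first occurrence of
    each phrase is used, which is a simplifying assumption.
    """
    spans = []
    for phrase, role in roles.items():
        start = sentence.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        spans.append(RoleSpan(role, start, start + len(phrase)))
    # Return spans in the order they appear in the sentence.
    return sorted(spans, key=lambda s: s.start)

sentence = "Bobby kicked the ball"
spans = annotate(sentence, {"Bobby": "AGENT", "the ball": "PATIENT"})
for s in spans:
    print(s.role, repr(sentence[s.start:s.end]))
```

Notice that a scheme this simple has no way to record a participant that is implied but never actually expressed in the sentence.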

In any case, I have been annotating sentences involving the verb "exclude" recently and I find it's a particularly challenging set. The BNC “exclude” sentence below was difficult to annotate because the exclude event is not clear about its participants:

The new Minister for Health, Dr Noel Browne, a dedicated reformer of the health services and much concerned in particular with the eradication of tuberculosis in Ireland, modified the earlier bill to exclude the compulsion elements.

At first, I thought “Dr Noel Browne” was the agent doing the excluding, but then I realized it was the bill which excluded. But which bill? I concluded that “the earlier bill” is NOT participating in the exclude event because, logically, it must be the version of the bill that came AFTER the earlier one which did the excluding. So, this requires a presupposed later bill. So, should I annotate the good Dr. as the agent, or leave this participant alone? (FrameNet's annotator app has the ability to mark an unexpressed element, and I believe this is exactly why, but I don't use their app.) Also, it’s not clear if the “to” means “in order to” as a purpose statement. Is the bill explicitly, directly excluding, or was that simply the intent of the changes? If it’s indirect, that makes Dr. Browne a better candidate for the agent of exclusion.


Friday, February 8, 2008

Words and Meaning

In discussing the recent Japanese phenomenon of cell phone novels, a reader of Andrew Sullivan’s blog tries to explain why the Japanese language is well suited to this style:

The use of Chinese characters also serves to compact sentences. Since you don't have to actually spell out entire words, as in English, but can represent them with an ideogram, you can say a lot more in a much smaller space.

I will provisionally accept that kanji and kana make typing out written Japanese on a cell phone more efficient than typing out English (in the sense of requiring fewer key strokes; I'd have to test to see if this is really true), but I reject the logical fallacy that this mechanical efficiency leads to greater meaning.

This strikes me as a variation of a phenomenon Ben Zimmer over at Language Log has written about regarding the all too often misrepresented meaning of the Chinese word for ‘crisis’, wēijī. Underlying both of these is the naïve belief that logograms are inherently more meaningful than alphabetic words. This belief, I reject.

I could be wrong about this, but my hunch is that the human language system takes all written representations of language and converts them into an internal mental representation it’s happy with. There may be differences between the way the brain accesses the meaning of kanji and the way the brain accesses the meaning of alphabetic words (in terms of recognition), but I don’t see any reason to believe that the internal semantic representation of kanji is somehow different than the representation of words. If I’m wrong and there is a difference, this would be an interesting piece of data for the Sapir-Whorf folks.

FYI: The Sapir-Whorf hypothesis (aka linguistic relativity) has re-emerged in recent years. Some of the most interesting empirical work is being done by Buffalo’s own Jürgen Bohnemeyer and his Spatial language and cognition in Mesoamerica project.

Saturday, February 2, 2008

Why Should I Learn a Foreign Language?

We’re not that far from the Universal Translator, right?

Welcome to SpeakLike, the first instant messaging service for accurate, real-time translation chat across different languages. You type text in your language, and others see it in theirs.

Skype has their version too:
Universal Chat Language Translator and Speaker for Skype

It goes without saying that the boys and girls at Carnegie Mellon have already developed their version and gotten it to market: Franklin 12-Language Speaking Global Translator.

(HT Blogos)
