David Crystal's newest book, Txtng: The Gr8 Db8, has just been released. I'm looking forward to reading it (though I'll likely wait until the paperback is available ... I'm staunchly anti-hardback). From the book's Amazon synopsis:
"Does texting spell the end of literacy? Is there a panic in the media? David Crystal looks at the evidence. He investigates how texting began and who uses it, why and what for. He shows how to interpret its mix of pictograms, logograms, abbreviations, symbols, and wordplay, and how it works in different languages. He explores the ways similar devices have been used in different eras and discovers that the texting system of conveying sounds and meaning goes back a long way, all the way in fact to the origins of writing - and he concludes that far from hindering literacy, texting may turn out to help it."
My colleagues and I were wondering whether any NLP work is being done on parsing text messages. I haven't been able to find anything. Since there is a growing market for things like machine translation of text messages, I gotta believe somebody out there is researching this. But has anything been published?
The linguistics of texting was, in fact, the topic of my very first post on this blog here.
My basic point last year was this: "I've noticed that, in the context of email and online slang/abbreviations, the character '8' is the only number or character that gets used to replace a phonological rime (a nucleus plus a coda). Most other replacements either replace whole syllables, or just consonant clusters.
For example (from Wikipedia's 'List of Internet slang phrases' [note: this page no longer exists on Wikipedia, so I linked to the Simple English page that copied it]):
2L8 -- too late
GR8 -- great
H8 -- hate
L8R -- later (sometimes abbreviated to L8ER)
M8 -- mate
sk8/sk8r -- skate/skater
W8 -- wait"
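To make the pattern concrete, here is a minimal sketch of how a text-message normalizer might expand these forms. It is only an illustration, not a description of any published system: the names `EIGHT_FORMS` and `expand_eights` are hypothetical, and the lookup table contains nothing beyond the examples listed above.

```python
import re

# Hypothetical lookup table built only from the examples listed above.
# Keys are lowercased so that matching is case-insensitive.
EIGHT_FORMS = {
    "2l8": "too late",
    "gr8": "great",
    "h8": "hate",
    "l8r": "later",
    "l8er": "later",
    "m8": "mate",
    "sk8": "skate",
    "sk8r": "skater",
    "w8": "wait",
}

# A token here is any run of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def expand_eights(text):
    """Expand known "8" abbreviations; leave every other token unchanged."""
    def replace(match):
        token = match.group(0)
        # Fall back to the original token when it is not in the table.
        return EIGHT_FORMS.get(token.lower(), token)
    return TOKEN.sub(replace, text)


print(expand_eights("gr8 party, but I was 2L8 so w8 4 me l8r"))
```

A lookup table is used rather than a blanket substitution for "8" because the same rime is spelled differently across these words ("ate" in hate, "eat" in great, "ait" in wait), so the standard spelling cannot be recovered from the sound alone.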
I hope Crystal discusses the linguistics of text formation.