Monday, September 29, 2008

I Love This Guy

The other famous Chris Phipps.

Txt Parsing

David Crystal's newest book, Txtng: The Gr8 Db8, has just been released. I'm looking forward to reading it (though I'll likely wait until the paperback is available ... I'm staunchly anti-hardback). From the book's Amazon synopsis:

"Does texting spell the end of literacy? Is there a panic in the media? David Crystal looks at the evidence. He investigates how texting began and who uses it, why and what for. He shows how to interpret its mix of pictograms, logograms, abbreviations, symbols, and wordplay, and how it works in different languages. He explores the ways similar devices have been used in different eras and discovers that the texting system of conveying sounds and meaning goes back a long way, all the way in fact to the origins of writing - and he concludes that far from hindering literacy, texting may turn out to help it."

My colleagues and I were wondering whether any NLP work is being done on parsing text messages. I haven't been able to find anything. Since there is a growing market for things like machine translation of text messages, I gotta believe somebody out there is researching this. But has anything been published?

The linguistics of texting was, in fact, the topic of my very first post on this blog here.

My basic point last year was this: "I've noticed that, in the context of email and online slang/abbreviations, the character "8" is the only number or character that gets used to replace a phonological rime (a nucleus plus a coda). Most other replacements either replace whole syllables, or just consonant clusters.

For example (from Wikipedia's "List of Internet slang phrases" [note: this page no longer exists on Wikipedia so I linked to the Simple English page that copied it])

2L8 -- too late
GR8 -- great
H8 -- Hate
L8R -- Later (sometimes abbreviated to L8ER)
M8 -- Mate
sk8/sk8r -- skate/skater
W8 -- Wait"
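To make the rime point concrete, here is a minimal sketch of a naive normalizer that treats every "8" as standing for the rime spelled "ate" (/eɪt/). The function name `rime_expand` is hypothetical, and this is not a real text-message parser. Its output is a phonetic spelling rather than standard spelling (GR8 comes out as "grate", W8 as "wate"), which is exactly the point: the 8 replaces a sound, not a string of letters.

```python
def rime_expand(token: str) -> str:
    """Hypothetical helper: spell out every '8' in a texting token as the
    rime 'ate' (/eɪt/). The result is a phonetic spelling, not the
    standard orthographic one."""
    return token.lower().replace("8", "ate")


# Tokens taken from the list above.
for tok in ["2L8", "GR8", "H8", "L8R", "M8", "sk8", "sk8r", "W8"]:
    print(tok, "->", rime_expand(tok))
```

Note that the "2" in 2L8 is left alone: it stands for a whole syllable ("too"), which is the contrast with "8" that the observation above draws.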

I hope Crystal discusses the linguistics of text formation.

Sunday, September 28, 2008

Palin's Big Finnish...

Stop beating up on Google's machine translation! The New Yorker's Hendrik Hertzberg recently complained about Sarah Palin's interview answers, saying "The whole thing reads like something rendered from the Finnish by Google Translate" (HT: Daily Dish).

Does it? Really? This is testable, folks. Let's see...

Katie Couric asked the following question: "But polls have shown that Sen. Obama has actually gotten a boost as a result of this latest crisis, with more people feeling that he can handle the situation better than John McCain."

Watch clip here:

Palin's original answer:
I'm not looking at poll numbers. What I think Americans at the end of the day are going to be able to go back and look at track records and see who's more apt to be talking about solutions and wishing for and hoping for solutions for some opportunity to change, and who's actually done it?

Google's Finnish translation:
En ole katsot kyselyn numerot. Mielestäni amerikkalaiset lopussa on päivässä tulee voida palata taaksepäin ja tarkastella tuloksia ja katsoa, kuka on enemmän omiaan puhuu ratkaisuja ja haluavat ja toivovat ratkaisuja joillekin mahdollisuus muuttua, ja kuka oikeastaan tehnyt se?

Google's Finnish to English translation:
I'm not looking at poll numbers. I think the Americans have at the end of a day will be able to go back and look at the results and see who is more likely talking about solutions, and wish and hope for solutions to some of the possibility of change, and who had done it?

You be the judge...

Frame Semantics in Oz

(image from HBO's Oz website)

Continuing my obsession with Netflixing cable TV shows, I've begun watching one of cable TV's first big successful original shows, Oz. I just finished the second of six seasons. Each episode is book-ended by a narrator and in episode 6 he makes a clever linguistic observation that is essentially a frame-semantics argument:

Narrator: "You made your bed, now lie in it." Anybody wanna tell me what the fuck that means? You're gonna go to the trouble of making up your bed, smoothing out the sheets, fluffing up the pillows, just to ruin it all by lying down. The phrase should be, "You laid in your bed, now make it." Point being, you got to be responsible for your actions. Responsible. (Season 2, Episode 6: "Strange Bedfellows").

I think he's right. The semantic meaning of this saying doesn't quite match the frame it evokes. The meaning should be something like: you made certain choices that caused a given situation, and now you must accept responsibility for the results of those choices. I have to work too hard to imagine a situation where lying in a bed is a natural consequence of the decision to make it. I have to imagine a slightly different sense of making a bed, one in which choices get made (the type of pillows and blankets, how big the bed is, etc.). But that is not the normal sense of making a bed in contemporary American English. What choices does one make when making a bed? None, right? Has this meaning changed?

However, if one chooses to lie in a bed, one chooses to cause it to be messed up. This seems closer to the meaning of the saying.

"Palin and McCain"


Following up on a comment I made over on Polyglot Conspiracy, I wanted to object to criticism of Sarah Palin's recent use of the NP "a Palin and McCain administration". She was criticized for putting her name first, as if she were framing herself as the presidential candidate rather than the VP candidate.

My objection is linguistic. The construction used to refer to a presidential ticket is typically two last names, sometimes separated by a dash or other special character and often by nothing at all (see above examples), but the two names are rarely conjoined by "and". When I looked at what Palin actually said ("a Palin and McCain administration"), I felt it was a perfectly acceptable way for a VP candidate to refer to a hypothetical administration.

I take this to be roughly an NP-noun compound where the NP is made up of two conjoined names. There are surely patterns in these kinds of NP-noun compounds that probably favor listing the more prominent name first, but patterns are not prescriptive rules. On the other hand, I'm sure there are prescriptive rules used by campaign staff and journalists that explicitly spell out which name comes first when referring to a presidential ticket.

Please note, I am not intending to make any sort of political statement with this post. This is intended to be primarily a linguistics blog.

Wednesday, September 24, 2008

Hero Acquisition Device

(screen shot from online video)

Folks, I can suspend my disbelief with the best of them, but there are times when pop culture pseudo-science references try my soul. One of those times occurred during Monday night's premiere of the superhero show Heroes.

While slicing into good, sweet, honey, sugar-candied Claire's brain, über villain Sylar marvels at the brain's complexity, then, disappointingly, repeats one of the greatest neuroscience fallacies of all time: that the brain uses only 10%-20% of its capacity.

There is zero factual basis for this ridiculous claim. "Though an alluring idea, the '10 percent myth' is so wrong it is almost laughable, says neurologist Barry Gordon at Johns Hopkins School of Medicine in Baltimore" (Scientific American). The writers couldn't spend two minutes Googling around to check up on this tidbit? Hmmm? Really? Unfortunate. Even the Neuroscience For Kids website debunks this idiocy.

In Feb 2007, blogger Peggy at her Biology in Science Fiction blog ran down a few genetics-related plot devices that make her start shaking her fist at the screen. An enjoyable read. More recently, she took on the new adrenaline plot device here.

Note: feel free to track down the homage in my italicized description of Claire (wink).

Friday, September 19, 2008

Prototypical Podcast

Eleanor Rosch seems to have come out of semi-retirement and is teaching a new course at Berkeley on Buddhist psychology, complete with podcasts here.

Reading Rosch in graduate school was a transformative experience for me. Her empirical work on cognition and prototype theory changed many of my ill-formed preconceived notions about how the mind works. She created clever and intelligent methods for studying how humans naturally categorize. Her findings were astonishing.

From the U. Pitt School of Information Science - Hall of Fame: "A basic tenet demonstrated by Rosch's experiments is that people classify an everyday object or experience less on abstract definitions than on what they regard as the best representation of the appropriate category. For example, a robin is considered a much better prototype for the concept of a bird than is a chicken or an ostrich. Her findings led Rosch to develop a hierarchy of basic, superordinate, and subordinate categories that 'provide maximum information with the least cognitive ability.'"
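The robin-versus-ostrich idea can be illustrated with a toy prototype model: score each exemplar by how much its features overlap with a category prototype, so graded typicality falls out of similarity rather than a yes/no definition. This is a minimal sketch, not Rosch's experimental method; the feature sets and the `typicality` function are hypothetical, and Jaccard overlap is just one simple choice of similarity measure.

```python
# Hypothetical feature sets for illustration only; not data from Rosch's studies.
BIRD_PROTOTYPE = {"flies", "sings", "small", "feathers", "lays_eggs", "perches"}

EXEMPLARS = {
    "robin": {"flies", "sings", "small", "feathers", "lays_eggs", "perches"},
    "chicken": {"feathers", "lays_eggs"},
    "ostrich": {"feathers", "lays_eggs", "runs", "large"},
}


def typicality(features: set, prototype: set) -> float:
    """Jaccard overlap between an exemplar's features and the prototype
    (1.0 = identical feature sets, 0.0 = nothing shared)."""
    return len(features & prototype) / len(features | prototype)


# Print exemplars from most to least typical.
ranked = sorted(EXEMPLARS, key=lambda n: typicality(EXEMPLARS[n], BIRD_PROTOTYPE), reverse=True)
for name in ranked:
    print(f"{name:8s} {typicality(EXEMPLARS[name], BIRD_PROTOTYPE):.2f}")
```

With these made-up features the robin matches the prototype exactly, while the chicken and the ostrich share only a couple of features and score much lower. Every exemplar is still a bird, but some are better birds than others.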

Ave Maria

Having little to do with linguistics other than the pure astonishment of what the human vocal folds can accomplish when properly trained, I offer this amazing tribute to the late, great Luciano Pavarotti:

Wednesday, September 17, 2008

“Listen honey, …”

(A modified screenshot from Artvoice)

After contributing a brilliant and witty comment to a conversation at a local pub last night, I was slightly accused of being a misogynist … or at least of employing a misogynistic discourse construction, namely “honey”. I have occasion to employ “honey” in a specialized discourse function and I’m going to defend my usage of this function and its value in conversation.

The word honey in conversation certainly can be demeaning when it is used to trivialize or marginalize the referent, as in “Hey, honey, get me a sandwich” (see here for relevant article).

However, the specialized application I used last night is different: I used it to indicate that my contribution was intended to be helpful and somehow more common-sensical, more honest, more folksy than my interlocutor's previous point. The use of “honey” (often co-occurring with “listen”), rather than being demeaning, was meant to convey familiarity and solidarity. I use it when my contribution is intended to wise up an interlocutor. I use it with both male and female interlocutors. And I am far from alone in employing "honey" in this way.

I assume it was borrowed from African American culture, but this specialized usage is particularly prevalent amongst gay men (just fyi, see Jeff Runner’s excellent powerpoint presentation In Search of Gay Language). I have had three gay male housemates over the years and I suspect I picked it up a bit from them; I think I’ve heard Bill Maher use it on his show as well (can’t for the life of me find an example though. I need Everyzing to improve dramatically).

Here’s a first pass attempt at listing the constitutive features of this construction:

The contribution should:
1. be formed in low register vocabulary and syntax
2. begin with “honey” or “listen honey”
3. semantically contrast with another participant’s contribution

Example 1: blog commenter
"I don't get this obsession with men's "bulges" on the gay blogs. It really makes gays look juvenile and prurient. You really debase yourself with such stories."

Listen honey, if straight guys can check out boobs, we can check out baskets. Of course, I won't be able to drag my partner away from the computer today. [my emphasis]

Example 2: blog post

Here in my hometown, the reports and anecdotes are not so good. A friend’s daughter heard Obama is a Muslim. Another friend’s mother-in-law says that if Obama wins, “the Blacks will take over.” (Listen, honey, they can’t screw it up any worse than the Whites have.) [my emphasis]

Thursday, September 4, 2008

Semantic Faces

(Rafael Nadal pics from his official page)

In an earlier post here, I boldly claimed that the semantic web movement was a fool's errand. Rather than relying on a preconceived ontology, I argued that web searching would be better facilitated by "smart search technologies that can look at new, uncategorized things and figure out what to do with them right now, on the fly."

Recently, Google's Picasa photo sharing site has added some face recognition software to help users find different pictures of the same person and then add name tags. The name tags are more reliable right now, but as face recognition software inevitably improves, I predict that they will be able to do away with tags altogether and rely wholly on the recognition of similarity in the pictures themselves. This is closer to the way the human cognitive system works. There will come a day when an algorithm can accurately match the two pictures of Rafael Nadal above, and that algorithm will be the future of search.

This cognitive model of searching is what I want to see applied to web search as well. Find matches based on on-the-fly analysis of content. No tags. No ontology (at least, not built into the page itself). Latent Semantic Analysis is one quasi-linguistic method of doing this, and it is already being applied quite profitably to the problem of matching advertisements with relevant web pages. LSA, with its somewhat crude bag-o'-words approach, has miles to go before it sleeps, but it's the right basic idea. Analyze content based on some salient metrics.
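For anyone who hasn't seen LSA up close, here is a minimal toy sketch of the basic idea: count words per document (the bag of words), factor the term-document matrix with a truncated SVD, and compare documents by cosine similarity in the reduced "latent" space. No tags or ontology are involved. The four sentences and the function names are invented for illustration, and this is not the ad-matching systems mentioned above. Real LSA setups usually also reweight the counts (e.g. tf-idf or log-entropy) before the SVD, which this sketch skips. It assumes NumPy is installed and uses its `np.linalg.svd`.

```python
import numpy as np


def term_doc_matrix(docs):
    """Bag-of-words counts: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            m[row[w], j] += 1
    return vocab, m


def lsa_doc_vectors(m, k):
    """Project documents into a k-dimensional latent space via truncated SVD.

    Returns an array of shape (n_docs, k): the rows of V_k S_k.
    """
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs = [
    "tennis player wins open final",
    "tennis star wins match",
    "bank raises interest rates",
    "central bank cuts interest rates",
]
_, m = term_doc_matrix(docs)
vecs = lsa_doc_vectors(m, k=2)
print(cosine(m[:, 0], m[:, 1]))  # raw bag-of-words similarity of the two tennis docs
print(cosine(vecs[0], vecs[1]))  # the same pair in the latent space
print(cosine(vecs[0], vecs[2]))  # tennis vs. banking in the latent space
```

The two tennis sentences share only "tennis" and "wins", so their raw cosine is about 0.45. After reduction to two latent dimensions they land on the same dimension and their cosine is essentially 1, while tennis versus banking is essentially 0. That is the "analyze content, not tags" idea in miniature, with all the crudeness of the bag-of-words starting point intact.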

(Again, I admit I am no expert on the semantic web or search technologies, so my views are naive. If I am misunderstanding something, please feel free to educate me.)

NLPers: How would you characterize your linguistics background?

That was the poll question my hero Professor Emily Bender posed on Twitter March 30th. 573 tweets later, a truly epic thread had been cre...