Wednesday, November 18, 2009

Random Linguistics

(randomly discovered blog miresua conlang)

For reasons that are not entirely clear to me, there is a remarkable prevalence of what I'll call quasi-linguistics blogs on blogger.com. Try, as I just did, clicking the "Next Blog" button at the top left of this page ten or more times. Each time it will take you to a randomly selected blog within the blogger network of blogs (No, I'm wrong here; see update below). It's pretty cool. Almost as good as StumbleUpon. But I suspect you'll find, as I did, a preponderance of language- and linguistics-related blogs. My rough estimate was that 60% of the blogs were language related. Now, this was driven up a bit by the many ESL sites, but those count, as far as I'm concerned. Unfortunately, the quality of these blogs was poor, at best (e.g., see the tiresome anti-passive voice post here).

Why are so many bloggers blogging about language issues? Maybe Geoff Nunberg is right and "the Internet turns everybody into a linguist" (see here).


UPDATE: Commenter MPJ cleared up the mystery. Blogger.com's Next Blog button is NOT random (it used to be). Blogger.com's explanation here (HT The Real Blogger Status). Money quote:

We've made the Next Blog link more useful, by taking you to a blog that you might like. The new and improved Next Blog link will now take you to a blog with similar content, in a language that you understand. If you are reading a Spanish blog about food, the Next Blog link will likely take you to another blog about food. In Spanish!

I'd be interested to know if they're using the same technology as their AdSense product to detect "similarity." How do they determine the anchor blog?

Also, I think I can still make a similar claim to my original one: of the blogs that are related to language, most are prescriptivist. Fair?

Crowdsourcing Annotation

(image from Phrase Detectives)

Thanks to the LingPipe blog here, I discovered an online annotation game called Phrase Detectives, designed to encourage people to contribute to the creation of hand-annotated corpora by making a game of it. It was created by the University of Essex's School of Computer Science and Electronic Engineering. Of course, they have a wiki, AnaWiki. I'm not crazy about the cutesy cartoon mascot (they've given it a name: Sherlink Holmes. Ugh. I guess Annie would be a bit too obvious?). I've wondered aloud about this kind of thing before, so I'm glad to see it coming to fruition.

I haven't started playing the game yet, but I'm looking forward to it. For now, here is the project description:

The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words.

However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).

Cheers.

Wednesday, November 11, 2009

The Right to Write

Last Friday, one of the world's most articulate and brave bloggers, Yoani Sánchez, was brutally beaten and kidnapped by her own government. Read her description of the events here: A gangland style kidnapping.

Read her blog Generation Y.

Thankfully, she is recovering and remains resolute as a blogger and dissident. In her own words:

"Thank you to friends and family who have looked after and supported me, the effects are fading, even the psychological ones which are the hardest. Orlando and Claudia are still in shock, but they are incredibly strong and also will overcome it. We have already begun to smile, the best medicine against abuse. The principal therapy for me remains this blog, and the thousands of topics still waiting to be touched on."

Sunday, November 8, 2009

Infrequently Asked Questions

Frequently Asked Questions is a nice example of a linguistic construction whose literal meaning doesn't hold up: as far as I can tell from the lists of questions on most of these pages, they are almost certainly NOT frequently asked at all. I've never once seen a page that lists the number of times a particular question has been asked, nor any discussion of how that frequency was counted. It simply goes without saying that "Frequently Asked Questions" are those that the creator of the page either a) perceives as important or b) wants readers to think about (some are clearly designed by marketers to push certain points of view).

Saturday, October 17, 2009

Judge This!

(screen shot from NBC's Community)

I'm not normally a spelling fanatic, mainly because I'm such a horrible speller myself. However, I'm also not a set designer for a major network sitcom, unlike the person who designed the backdrop for the recent episode of NBC's Community that featured a gigantic sign reading "JUDGES BOOTH." I'm reasonably certain that the booth in question belongs to the judges in question, which makes the preferred orthography "JUDGES' BOOTH" (the same rule holds in MLA and APA; see Purdue's excellent OWL site for MLA and for APA):

add ' to the end of plural nouns that end in -s:
--- houses' roofs

--- three friends' letters


See for yourself: Community, Season 1 : Ep. 5 "Advanced Criminal Law", 9:26 minute mark (on Hulu)

Sunday, October 4, 2009

I guess I'm a few months behind the curve on this one, but I just watched the Google Wave demo about the new social media/collaboration tool and I'm seriously impressed. They said it should go live in 2009, so maybe by Christmas? Pretty please...

In any case, when the video first started, the guy said something that caught my ear. He mentioned that traditional email is built around the snail mail model, where a message is an object that goes from a sender to a receiver. Google Wave, however, discards that model in favor of a "conversational" one, where the conversation as a whole is a single object that simply gets updated in one place rather than sent around (like a chat session).

This struck me as linguistically interesting because it is more in line with traditional conversation analysis theory, which centers around "the floor," where one can "hold the floor," or "interrupt the floor," etc. This more natural model of conversation has yielded a beautiful and elegant collaboration tool that I can't wait to get my hands on. Hopefully Google's model of conversation is more coherent than the ragtag sloppiness that has pervaded the linguistic analysis of conversation. It's a tough field, no doubt.

Also, near the end, the speaker made what I took to be a geek version of a linguistic relativity claim: he said that it was only the Google Web Toolkit (HTML 5 & Java) that allowed him to think of possibilities for Wave that he never would have thought of otherwise. I'm not sure this is really true, of course, but it's a cute thought nonetheless.

Saturday, September 26, 2009

on "High Speed"

(screen shot of hotel connection speed)

It has become painfully clear that the meaning of "high speed" with respect to internet connections is being co-opted by hotel franchises as a marketing tool and, as a result, is fast being cleansed of any valuable meaning. Case in point: I'm in Kansas this weekend for business, but it turns out that Kansas State and U. Kansas both have home football games this weekend and they're both less than an hour's drive from where I am, so the hotels around the area have all been booked solid. Hence, I was forced to take a room at a modestly priced hotel (with a poor reputation), but at least they had "high speed internet" so I could get work done, right? This is a business trip, remember.

Not so fast (literally): I'll spare you the rant about the many other issues with this hotel and point you to the screen shot above, which shows the speed of my connection (when it actually worked, that is). How does a 305 kbps download speed count as "high speed"? I hereby call upon the ISO to determine a minimum speed that shall henceforth be the standard for deciding whether a connection is "high speed" or not...pretty please?
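
To put 305 kbps in everyday terms, here is a small back-of-the-envelope calculator. It's a minimal sketch that assumes decimal units (1 kilobit = 1000 bits, 1 MB = 1,000,000 bytes) and a perfectly steady link, so it's a best case; the file sizes are hypothetical examples, not anything I actually downloaded.

```python
def download_seconds(size_bytes, speed_kbps):
    """Seconds to transfer size_bytes at speed_kbps (kilobits per second).

    Assumes decimal units and a steady link with no protocol overhead,
    so real transfers would be slower than this.
    """
    bits = size_bytes * 8
    return bits / (speed_kbps * 1000)


speed = 305  # measured download speed from the screen shot, in kbps
print(f"{speed} kbps is about {speed * 1000 / 8 / 1000:.1f} kB per second")
for label, size in [("1 MB attachment", 1_000_000), ("100 MB file", 100_000_000)]:
    print(f"{label}: {download_seconds(size, speed) / 60:.1f} minutes")
```

That works out to roughly 38 kilobytes per second, so even a hypothetical 100 MB file would take the better part of 45 minutes.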

It will surely embiggen the hearts of my more gentle readers to know that I convinced my company to waive their per diem and find me more suitable lodgings for the remainder of the trip.

NOTE: This is another good example of the commercialization of Google's search engine. Any query with "high speed internet" in it will be riddled with advertisements for the service, not discussions about it. Yet another reason Google is not a good linguistics research tool. See more discussion here.

Monday, September 14, 2009

Fuck The Bills!!!

A rare (but much deserved) non-linguistics rant:

The Bills suck ass. Deal with it. How can that moron run it back? What's he thinking? Why? What value is there to a run back? This reminds me of the game against Dallas when they led by 8 and all they had to do was position for a field goal and they would have had the greatest upset in NFL history. But no, those fucking morons throw it and it gets intercepted and they lose the fucking game.

Fuck the Bills. Fuck 'em.

Don't watch their games. Don't buy their merchandise. Let them move to Toronto. They don't deserve fans. Fuck the Bills. Fuck 'em.

The Buffalo Bills are like the worst relationship anyone has ever been in. The one you're totally, blindly in love with but who just keeps fucking you over and you let them because you're so fucking deep you're willing to be shit on just to be in the same room with them and you'll never be the one to cut the cord. Until the day they just stop being interested in toying with you and they just go away, and it's the best thing in the world for you, but you don't get that right away, you’re crushed until years later you get it. They sucked. They sucked ass and they should die, but they’re gone now and that’s good. Fuck ‘em. Fuck the Bills. Go to Toronto and stop fucking us over. We’ll fall in love with the Steelers soon anyway, because they actually win shit! That’s right, I said it. Fuck the Bills because they lose a lot and the Steelers actually win.

Sunday, August 23, 2009

An X of Y

(a pod of whales from About.com)

Ten years ago, when I was teaching English in China, I was surprised by how interested my students were in learning about phrases like "a pod of whales," "a cup of coffee," and "a pride of lions." When I mentioned a phrase like this, they would perk up immediately (difficult to do in the oppressive Guangzhou (广州) summer heat). This was a year before I began studying linguistics proper, so I had no clue what a collective noun was, nor did I know what a classifier was, nor did I know that Chinese languages like Mandarin and Cantonese have elaborate systems of nominal classifiers (this Wiki page is a good primer). I just thought it was a cute diversion to talk about at the end of an evening's class.

It turns out that collective nouns have very interesting properties which linguists love to obsess over (I regret I do not have access to a copy of The Cambridge Grammar of the English Language because I suspect Huddleston and Pullum have some fascinating points).

Now, via kottke, I discovered a blog called All Sorts dedicated to culling collective nouns from Twitter feeds. It relies on a little NLP and some crowdsourcing. It appears to be restricted to the syntactic construction "an X of Y". Since it relies so heavily on syntax, it gathers examples that are weak, at best. For example, in what way are the following collective nouns?

a conspiracy of theorists
a tantrum of 2 year olds
a pratfall of clowns

My first-pass reading of those three phrases is not as collective nouns, but rather as periphrastic genitives (e.g., "a mayor of Buffalo once said..."). The "an X of Y" syntax is, by itself, ambiguous between the periphrastic genitive and collective noun constructions (as well as simple PP attachment, as in "a webcomic of romance"). Do people prefer the use of "an X of Y" for one of these constructions? I suspect any preference would be based on the semantic features of the nouns involved (once you read the word "group", you pretty much know you've got a collective noun on your hands).
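
To see why a purely syntactic filter sweeps in all of these readings, here is a toy matcher. It's a hypothetical sketch, not All Sorts' actual pipeline (which I don't know): a regular expression that matches "a(n) X of Y" on word order alone, plus a deliberately tiny, made-up lexicon of heads (like "group") that cue a collective reading. Everything the lexicon doesn't cover comes out as "ambiguous."

```python
import re

# Hypothetical surface pattern: an article, a one-word head (X), "of",
# then everything after it (Y). It looks only at word order, not meaning.
X_OF_Y = re.compile(r"(an?)\s+(\w+)\s+of\s+(.+)", re.IGNORECASE)

# Hypothetical, deliberately tiny lexicon of heads that cue a collective reading.
COLLECTIVE_HEADS = {"group", "pod", "pride", "flock", "herd"}


def parse_x_of_y(phrase):
    """Return (X, Y) if the phrase has the shape 'a(n) X of Y', else None."""
    match = X_OF_Y.fullmatch(phrase.strip())
    if match is None:
        return None
    return match.group(2), match.group(3)


def guess_reading(phrase):
    """Label a matched phrase using only the lexical cue on the head X."""
    parsed = parse_x_of_y(phrase)
    if parsed is None:
        return "no match"
    head, _ = parsed
    return "collective noun" if head.lower() in COLLECTIVE_HEADS else "ambiguous"


for phrase in ["a pod of whales", "a group of students", "a conspiracy of theorists",
               "a mayor of Buffalo", "a webcomic of romance"]:
    print(f"{phrase:28} {parse_x_of_y(phrase)} -> {guess_reading(phrase)}")
```

The pattern happily matches the collective noun, the periphrastic genitive, and the PP attachment alike; only the semantic cue on X pulls "a pod of whales" apart from "a mayor of Buffalo."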

I wonder if anyone has done online reading tasks with subjects reading the two kinds of phrases and experimenting with different features to see what cues one reading over another. Imagine creating a set of stimuli containing sentence frames that could take either a collective noun or a periphrastic genitive and alternating each, controlling for features like animacy.

I'll take a crack at one such frame. My goal is to create sentence pairs involving minimal pairs of "a X of Y" constructions which differ only in the Y noun and where the first is a collective noun while the second is a periphrastic genitive. This relies critically on finding an X word that can be a collective noun like "group" as well as a possessive. Hmmmmmm, this ain't gonna be easy....

a. That cup of coffee that I broke has been cleaned up.
b. That cup of John's that I broke has been cleaned up.

My original hypothesis was that people will delay on (b) [meaning, their reading of the following region will be slower than (a)]. But I dunno, because some will be confused at "broke" in (a) as well. Part of this will depend on where the delay occurs.

Now, you go write up 100 of these pairs, norm them for acceptability, set up a moving window reading test in ePrime, run at least 30 subjects, then call me when you got results. I've done my part.
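
For anyone who does take me up on it, here is what the moving-window displays look like. This is a minimal sketch in plain Python, not ePrime: it only builds the sequence of screens (each word revealed in turn, every other word masked by dashes of the same length) and does not present them or record reading times. The function name and the dash masking are my own illustrative choices.

```python
def moving_window_frames(sentence, mask_char="-"):
    """Return one display frame per word for a moving-window reading task.

    In each frame only the current word is visible; every other word is
    replaced by mask characters of the same length, so the reader still sees
    word lengths and spacing. Punctuation stays attached to its word.
    """
    words = sentence.split()
    masked = [mask_char * len(word) for word in words]
    frames = []
    for i, word in enumerate(words):
        frame = masked.copy()
        frame[i] = word
        frames.append(" ".join(frame))
    return frames


pair = {
    "a": "That cup of coffee that I broke has been cleaned up.",
    "b": "That cup of John's that I broke has been cleaned up.",
}
for condition, sentence in pair.items():
    print(f"condition ({condition})")
    for frame in moving_window_frames(sentence):
        print("  " + frame)
```

In a real experiment, each keypress would advance to the next frame and its latency would be logged, so the (a) vs. (b) comparison would come from the frames covering the region after "of coffee" / "of John's."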

Saturday, August 22, 2009

Against Prescriptivism

It's all too common for prescriptivists to complain about word usage deviations, as if a word had one fixed meaning forever and ever. This is not true. A couple of good examples popped up on The Daily Dish when guest blogger Conor Clarke, a smart and well-educated journalist, used two words (arbitrary and cynical) in ways that deviate from the way I would use them (and from what I would consider traditional usage); yet his usage conforms to the way both of these words seem to be evolving in general usage in American English:

"We are all born with talents that are equally arbitrary -- strength and intelligence and social grace -- and yet we all compete for prizes under the impression that the outcomes are fair. Perhaps something called free will enters the picture at some point. And perhaps not: The ability to work hard might be doled out just as arbitrarily at a Y Chromosome or a great voice. I don't know how you'd prove it either way. Anyway, the cynical conclusion here is that there's nothing inherently just or fair about these outcomes."

On Arbitrary
For me, something is arbitrary when it is a function of decision making (note its obvious relationship to arbitrate). For example, WordNet's definition: "based on or subject to individual discretion or preference or sometimes impulse or caprice." But Clarke uses it to mean something like 'not under our direct control' when he describes genetic traits as arbitrary. I can imagine a historical shift whereby decisions that are arbitrary came to be viewed as being made on the idiosyncratic whim of the decider (rather than on some sound, objective, logical reasoning). Hence, the word came to mean 'unfair or without sound reason'. Then, quite recently I believe, the word shifted again when users found a salient connection between 'lacking sound reason' and 'out of one's direct control'. And this seems to be how American English speakers of Clarke's generation (I believe he's about 15 years younger than I am) use the word. This helps explain why it's now commonly used for situations where an outcome is indifferent to fairness.

On Cynical
For me, a person is cynical when they reduce the intentions of others down to one root cause, namely selfishness. For example, WordNet's definition: "believing the worst of human nature and motives; having a sneering disbelief in e.g. selflessness of others." But Clarke uses it to mean something like 'preferring the explanation that is most indifferent to fairness'. The conclusion he predicates as cynical has nothing to do with human motives or intentions. I believe what he's saying in the last sentence of the passage above is that there are two competing beliefs:

Belief A = competition outcomes are fair because all competitors start out equal.
Belief B = competition outcomes are indifferent to fairness because they are rooted in genetic differences (which themselves are indifferent to fairness).

Clarke then says that to prefer Belief B is to be cynical.

Final Thought
As for me, I believe word meanings are arbitrary, but then again, I'm cynical.

PS: For font geeks, that's Bradley Hand ITC.

Saturday, August 15, 2009

Them Maths Is Hard

This morning's NYT contained an article on search engines which contained a claim of such discombobulated mathematical incompetence, I just had to share:

It’s no secret that even with their recently-announced alliance, Yahoo and Microsoft will lag well behind Google in the hugely profitable search and search advertising business. How far behind? With a combined 28 percent of the American search market, Yahoo and Microsoft could double their usage and still trail Google, which accounts for 65 percent of the market.

I don't have to get all Mark Liberman on you to explain what's wrong with this claim. If Microsoft/Yahoo! doubled their 28% market share, they would have 56%, at which point they would no longer trail Google, which could then hold no more than 44% of the market.
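
For the skeptical, here is the same arithmetic spelled out as a tiny check. It's a minimal sketch assuming, as the article implies, that the percentages are shares of one and the same market, so whatever the challenger gains has to come out of someone else's share and the leader can hold at most what is left over.

```python
def max_leader_share(challenger_share, factor=2):
    """Largest share (percent) the leader could hold after the challenger
    multiplies its own share by `factor`; shares of one market sum to 100."""
    return 100 - challenger_share * factor


def still_trails(challenger_share, leader_share, factor=2):
    """True if the grown challenger would still be behind the leader."""
    new_share = challenger_share * factor
    # Best case for the leader: it keeps its share unless the market runs out.
    leader_after = min(leader_share, max_leader_share(challenger_share, factor))
    return new_share < leader_after


print("Doubled share:", 28 * 2)
print("Most Google could then hold:", max_leader_share(28))
print("Would Microsoft/Yahoo! still trail?", still_trails(28, 65))
```

Doubling 28% gives 56%, which leaves at most 44% for Google, so the "could double their usage and still trail" claim fails.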

Maybe it's finally time to stop reading the NYT...

Saturday, July 18, 2009

Immortality?

(pic from Huffington Post)

Headline: Henry Allingham: World's Oldest Man Dies At 113.

Am I wrong, or is it logically impossible for the world's oldest man to die?

Saturday, June 27, 2009

Adam's Tongue (pt 3)

(classic depiction of Saussure's arbitrariness of the sign claim)

This is the third in a series of posts detailing my notes and thoughts about the book Adam's Tongue as I prepare to lead a book discussion meeting July 6, 2009 in the DC metro area (see my first post here and second here).

Ch 2 - Thinking Like Engineers

I've spent the last 5 years working in natural language processing alongside engineers, and I agree that there is something very valuable in a linguist learning to "think like an engineer," so I was curious about this chapter from the start. But I was also wary, because the Chomskyan syntacticians also "think like engineers," and I believe they have led linguistics down a garden path of false starts and flawed theories for 40 years. So I read on cautiously.
  • DB notes that he came into linguistics via pidgins and creoles and they bear on his thinking about language evolution. But does this bias him too, like the man who has a hammer and sees everything as a nail? We shall see.
  • DB says there's no syntax when we try to speak with people who don't share our language (p 39): because we don't know enough of the language, the foreign words just pop out as we grope for them. Now, I certainly defer to DB's far greater expertise in pidgin & creole formation, but this thought experiment of his does not jibe with my own experience. Like many travelers, I've had this exact experience in places like Guangzhou, China and Prague, but I don't think the foreign words "just popped out" quite as randomly as he suggests. I'm tending to side with Slobin here.
  • He claims that protowords must not have had any internal morphological structure (41) because early language users would have had no rules defining that structure. On its face, this makes sense; nonetheless, it begs the question: which came first, the word or the morphology? Is it not plausible that some neurologically based process for seeking internal structure in sounds developed prior to the advent of words? I just don't know.
  • The boom vocalization of the Campbell's monkey occurs 30 seconds before the alarm (42). My first reaction: wow! this is stretching the limits of transitional probabilities, isn't it? Can we plausibly claim that an association between sounds 30 seconds apart is neurologically feasible?
  • DB claims these booms are not modifiers (p42) because the boom "cancels out" the alarm. I'd have to review the literature on these booms carefully, but my first reaction is: does it really cancel the alarm? If I understand the context, it simply means "not an immediate threat (but still a threat)". That's not a cancellation. It's more like epistemic modality: "there MIGHT be danger."
  • Page 44 -- The gavagai problem restated.
  • Confused: I'm confused by DB's claim on page 45 that "words combine as separate units -- they never blend. They're atoms, not mudballs." I'm not sure what he means. Blending and combining are different, in that blending suggests some elements of both previous words/calls are preserved in the new word/call. This happens all the time in contemporary linguistic change (classic example: motel blends motor + hotel, preserving bits of each word's morphology as well as blending their semantics). But I suspect DB is not referencing that. So what is he referencing?
  • He makes a nice distinction between ACSs and Language: ACSs are primarily for manipulation of behavior while language is primarily for information sharing. I have no clue if this is really true, but if yes, it's a good point (p 47).
  • He writes "language units are symbolic because they're designed to convey information." A nice follow-up point on the difference point above, but it begs the question: what is "information"? Any answer which supports DB would have to couch a definition in abstraction, right? E.g., Information is a conceptualization that is independent from direct reference.
  • DB makes a bold claim on page 52 that strikes at the heart of post-Saussurean linguistics: displacement is a more important factor to language evolution than arbitrariness. But it's worth noting that both are functions of abstraction, so perhaps this is just another version of his previous point that the jump to abstract thought is the key.
On to chapter 3 -- Singing Apes....

Adam's Tongue (pt 2)

This is the second in a series of posts detailing my notes and thoughts about the book Adam's Tongue as I prepare to lead a book discussion meeting July 6, 2009 in the DC metro area (see my first post here. UPDATE: My third post is here).

Ch 1 - The size of the problem
  • This chapter is designed to walk through what's wrong with other theories of language evolution.
  • The basic point of the chapter seems to be this: no animal communication system (ACS) allows its users to refer to things distant in time and space, therefore ACSs are not likely the precursors of language. Everyone who has taken or taught a Language Files course knows these two criteria as Hockett's two communication features unique to human language (Bickerton gets to Hockett in due course).
  • On the very first page of this chapter, I noted, "Is there a gavagai problem here?" By which I meant: how can we know what one of these ACS references really refers to? Bickerton's index lists nothing for either "Quine" or "gavagai," though he skirts this issue repeatedly for the next few chapters (and possibly the whole book). This dilemma becomes particularly critical in chapter 4, Chatting Apes, but I'll come to that later.
As a background, here's a passage from Wikipedia's Indeterminacy of translation page describing Quine's famous example:

Consider Quine's example of the word "gavagai" uttered by a native upon seeing a rabbit. The linguist could do what seems natural and translate this as "Lo, a rabbit." But other translations would be compatible with all the evidence he has: "Lo, food"; "Let's go hunting"; "There will be a storm tonight" (these natives may be superstitious); "Lo, a momentary rabbit-stage"; "Lo, an undetached rabbit-part." Some of these might become less likely – that is, become more unwieldy hypotheses – in the light of subsequent observation. Others can only be ruled out by querying the natives: An affirmative answer to "Is this the same gavagai as that earlier one?" will rule out "momentary rabbit stage," and so forth. But these questions can only be asked once the linguist has mastered much of the natives' grammar and abstract vocabulary; that in turn can only be done on the basis of hypotheses derived from simpler, observation-connected bits of language; and those sentences, on their own, admit of multiple interpretations, as we have seen.
  • No gradual move from ACS to human language (17): since evolution is gradual and slow, there would have to be a "missing link" (my term, not DB's), an ACS that made the jump from referring to the here and now to referring to things distant in time and space. No such link exists.
  • Therefore, ACSs grew out of non-communication behaviors
  • Uniqueness of language also not relevant because many species have unique features (Pinker's elephant trunk, 20).
  • Humans suddenly had something else to "talk" about other than the here and now, and THAT'S what spurred language.
  • The new thing humans had was abstract concepts (22). We can talk about dogs as a category (he makes an important distinction between categories and concepts much later in chapter 4 at the bottom of page 87).
  • This new ability to abstract is not associated with evolutionary fitness.
  • Critical Point: other species didn't develop language because they didn't need language (p 24).
  • Bickerton's 4 tests for any theory of language evolution: 1) uniqueness, 2) ecology, 3) credibility, 4) selfishness (p28). Bolles' blog Babel's Dawn discusses these criteria at length here.

Space and Thought

(two of Boroditsky's stimuli, pdf here)

Yet again, Andrew Sullivan treads into the area of linguistics and cognition research. But at least this time he's wise enough to make no comments about the studies he links to (he's typically misguided, or flat out wrong in his linguistic sensibilities, see here, here and here).

This time he reprints a quote here from an article titled How Does Our Language Shape The Way We Think? written by Stanford assistant professor Lera Boroditsky regarding how language influences thought. Of course, Sullivan reprints the least interesting piece of information in the article, a mere behavioral anecdote about how speakers of different languages use different direction terms. This fact has been well known for a long time (I first learned about it in an introductory cog sci course in 1998, and it was old news then). The more interesting fact is the following effect she observed during a test comparing Russian and English speakers' ability to discriminate shades of blue (color terminology is a classic topic within cognitive science going back to Berlin & Kay's work in the sixties, see here):

The disappearance of the advantage when performing a verbal task shows that language is normally involved in even surprisingly basic perceptual judgments — and that it is language per se that creates this difference in perception between Russian and English speakers.

After skimming Boroditsky's article, I felt it was a very good review of the field of language and thought studies as I remember it, but it didn't add much, if anything. Then again, it's clearly a layperson's article, so I looked at her Stanford page and skimmed her list of publications and, more critically, the references she cites.

My first impression was, "she doesn't cite much, does she?" I'm used to experimental psychology articles containing lists of references almost as long as the article itself, but most of her (first-author) papers have a handful of citations. But the more surprising thing was the notable absence of two names, Len Talmy and Jürgen Bohnemeyer. I'll grant that I'm a little biased because both of them were professors at my grad school, but the granting ends there. I can't imagine writing a serious research paper on how language shapes thought without references to one or both of these researchers, especially as Talmy has written an extensive, typologically rich, two-volume set on the relationship between language and thought, Toward a Cognitive Semantics, and he has a forthcoming book, The Attention System of Language (his work-in-progress handout on the same topic can be read in this pdf).

Don't get me wrong, I basically like Boroditsky's research methods and approach. I just think it's time for her to review Talmy and Bohnemeyer.