Friday, June 25, 2010

An X of Y

(a pod of whales from
[reposted from last year with update]

[UPDATE: kottke points to the same blog with added pics here.]

Ten years ago, when I was teaching English in China, I was surprised by how interested my students were in learning about phrases like "a pod of whales," "a cup of coffee," and "a pride of lions." When I mentioned a phrase like this, they would perk up immediately (difficult to do in the oppressive Guangzhou 广州 summer heat). This was a year before I began studying linguistics proper, so I had no clue what a collective noun was, nor what a classifier was, nor that Chinese languages like Mandarin and Cantonese have elaborate systems of nominal classifiers (this Wiki page is a good primer). I just thought it was a cute diversion to talk about at the end of an evening's class.

It turns out that collective nouns have very interesting properties which linguists love to obsess over (I regret I do not have access to a copy of The Cambridge Grammar of the English Language because I suspect Huddleston and Pullum have some fascinating points).

Now, via kottke, I discovered a blog called All Sorts dedicated to culling collective nouns from Twitter feeds. It relies on a little NLP and some crowdsourcing. It appears to be restricted to the syntactic construction "an X of Y". Since it relies so heavily on syntax, it gathers examples that are weak, at best. For example, in what way are the following collective nouns?

a conspiracy of theorists
a tantrum of 2 year olds
a pratfall of clowns

My first-pass reading of those three phrases is not as collective nouns, but rather as periphrastic genitives (e.g., "a mayor of Buffalo once said..."). The "a X of Y" syntax is, by itself, ambiguous between the periphrastic genitive and collective noun constructions (as well as simple PP attachment like "a webcomic of romance"). Do people prefer the use of "a X of Y" for one of these constructions? I suspect any preference would be based on the semantic features of the nouns involved (once you read the word "group", you pretty much know you've got a collective noun on your hands).
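To make the ambiguity concrete, here is a rough sketch of what a purely syntactic "a/an X of Y" matcher might look like. I have no idea how All Sorts actually does its matching, so the pattern and the function name `find_x_of_y` are my own assumptions, not its method. The point is only that a surface pattern happily returns all three constructions without telling them apart.

```python
import re

# Hypothetical surface pattern (an assumption, not All Sorts' real code):
# the word "a" or "an", one X word, "of", then one Y word.
X_OF_Y = re.compile(r"\b(an?)\s+([a-z]+)\s+of\s+([a-z']+)", re.IGNORECASE)

def find_x_of_y(text):
    """Return lowercase (X, Y) pairs for every 'a/an X of Y' match in text."""
    return [(m.group(2).lower(), m.group(3).lower())
            for m in X_OF_Y.finditer(text)]

examples = [
    "We saw a pod of whales.",           # collective noun
    "A mayor of Buffalo once said so.",  # periphrastic genitive
    "I read a webcomic of romance.",     # simple PP attachment
]

for sentence in examples:
    # Each sentence yields one (X, Y) pair; nothing in the match itself
    # says which of the three constructions it came from.
    print(sentence, find_x_of_y(sentence))
```

Sorting these apart would take something beyond the pattern, such as semantic features of the X and Y nouns.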

I wonder if anyone has done online reading tasks with subjects reading the two kinds of phrases and experimenting with different features to see what cues one reading over another. Imagine creating a set of stimuli containing sentence frames that could take either a collective noun or a periphrastic genitive and alternating each, controlling for features like animacy.

I'll take a crack at one such frame. My goal is to create sentence pairs involving minimal pairs of "a X of Y" constructions which differ only in the Y noun and where the first is a collective noun while the second is a periphrastic genitive. This relies critically on finding an X word that can be a collective noun like "group" as well as a possessive. Hmmmmmm, this ain't gonna be easy....

a. That cup of coffee that I broke has been cleaned up.
b. That cup of John's that I broke has been cleaned up.

My original hypothesis was that people will delay on (b) [meaning, their reading of the following region will be slower than for (a)]. But I dunno, because some will be confused at "broke" in (a) as well. Part of this will depend on where the delay occurs.
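For what it's worth, here is a small sketch of how the pair above could be generated from a single frame, along with the position of the word right after the Y slot (the "following region"). The names `FRAME`, `make_pair`, and `post_critical_index` are hypothetical helpers I made up for illustration.

```python
# The sentence frame from (a)/(b), with a slot for the Y noun.
FRAME = "That cup of {y} that I broke has been cleaned up."

def make_pair(frame, collective_y, genitive_y):
    """Fill the same frame twice so the two sentences differ only in Y."""
    return frame.format(y=collective_y), frame.format(y=genitive_y)

def post_critical_index(frame):
    """Word index of the region immediately following the Y slot."""
    words = frame.split()
    return words.index("{y}") + 1

a, b = make_pair(FRAME, "coffee", "John's")
region = post_critical_index(FRAME)

print(a)
print(b)
# The word at this index is the same in both sentences, so any reading-time
# difference there can't be blamed on the word itself.
print(region, a.split()[region], b.split()[region])
```

This assumes Y is a single word; multi-word Y nouns would shift the indices and need a smarter region definition.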

Now, you go write up 100 of these pairs, norm them for acceptability, set up a moving window reading test in E-Prime, run at least 30 subjects, then call me when you got results. I've done my part.
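In case the moving window idea is unfamiliar: the reader sees the sentence masked with dashes, and each key press reveals the next word while re-masking the previous one. The sketch below only illustrates that display logic in Python. It is not E-Prime code, it collects no timings, and the function name `moving_window_displays` is my own invention.

```python
def moving_window_displays(sentence):
    """Return the sequence of screens for a word-by-word moving window.

    Screen i shows word i in the clear and every other word replaced by
    dashes of the same length, so word lengths and spacing stay visible.
    """
    words = sentence.split()
    masks = ["-" * len(w) for w in words]
    displays = []
    for i, word in enumerate(words):
        shown = masks[:i] + [word] + masks[i + 1:]
        displays.append(" ".join(shown))
    return displays

for screen in moving_window_displays("That cup of coffee that I broke"):
    print(screen)
```

In a real experiment, the time between key presses on each screen is the reading time for that word, which is where any delay on (b) would show up.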


Anonymous said...

I'm just looking up collective nouns in the Cambridge Grammar for you right now...

There's a short discussion on pages 502-503 about whether collective noun phrases are singular or plural. Examples cited include "A number of spots have (not has) appeared", "Heaps of money has (not have) been spent", "A bunch of flowers was (not were) presented to the teacher", "A bunch of hooligans were (not was) seen leaving the premises".

I hope that's sufficiently fascinating because I can't find much else that's relevant.

Chris said...

Awesome, thanks for looking it up for me. I'm a little disappointed they didn't say more, but they only had 1860 pages to work with, hehe.

Chris said...

fyi, I just deleted a comment written in Cyrillic linking to a new site for each word. Google translated the text as follows:

"From the pleasures of the most pleasant ones that are most rare. Most, the best perfume in small vials. Throw yourself your case with all my heart and soul, but look especially good if this case. Reasonable chasing so it is good, but for the fact that eliminates otnepriyatnostey. All Russia is drinking Hamlet."

Just thought I'd keep y'all informed.
