Tuesday, May 21, 2013

David Brooks, Word Classes, and Google Ngrams

David Brooks waxes poetic about word frequencies and the good old days in today's NYT: What Our Words Tell Us.

Update: Before reading my own most excellent original post below, here are three well-respected linguists who fisk Brooks' article as well:

John McWhorter: David Brooks' Favorite New Theory of Language Is Wrong. Money Quote:
...the faddish attempt to apply the Big Data approach to social psychology via Google’s Ngram viewer tool will shed much less light on these matters than many expect. In any language, concepts are expressed by several words and phrases at any given time, all of which morph eternally with the passage of time.

Robin Lakoff: What Our Words Don’t Tell Us. Money Quote:
It is hardly respectable scholarship to jump to the conclusion that changes in word frequency necessarily indicate changes in topics under discussion (new words may replace familiar ones but have similar meanings), and even if they do, it is very dubious – ethically questionable, you might say – to jump from there to the conclusion that these changes signify deep societal changes in the direction of moral decline, unless writers are prepared to make explicit and be prepared to defend their understanding of “morality” and “decline.” Social science is still, happily, distinguishable from theology.

Mark Liberman: Ngram morality. Money Quote:
David Brooks doesn't mention this ideological and temporal inconsistency in his sources. In general, as I've noted in discussions of his earlier columns, his "unparalleled ability to shape an intellectually interesting idea into the rhetorical arc of an 800-word op-ed piece" crucially depends on skillful editing — or revision — of his raw materials into a form that fits his theme.

My Original Post
Brooks cherry-picks three recent Google Ngram analyses (by non-linguists) and provides paper-thin summaries of their findings, then concludes that America has lost its moral core. These analyses all depend crucially on the creation of word categories like “individualistic words” and “moral terms”. The words in each category are not quite synonyms*, but the categories do require that the words in each class bear some semantic link to one another. This raises the question: Are these groupings natural? Is there something psychologically real about them?

Linguists care about word classes quite a bit (computational linguists even more so). There are ways of constructing naturalistic sets of words. However, Brooks says nothing about how these studies performed their categorizations, so I thought I would post a quick review as it's important in judging the validity of the results.

Twenge et al
The first study, by Twenge et al (which he doesn’t link to, but I do below), followed a scientifically reasonable path to create its word sets. The authors had 53 Mechanical Turk participants generate “words characteristic of individualism and communalism.” Then they had a different set of 55 Mechanical Turk participants rate those words on a 7-point Likert scale. The top 20 words were then used as their search set. FYI, here are their two sets:

independent, individual, individually, unique, uniqueness, self, independence, oneself, soloist, identity, personalized, solo, solitary, personalize, loner, standout, single, personal, sole, and singularity
communal, community, commune, unity, communitarian, united, teamwork, team, collective, village, tribe, collectivization, group, collectivism, everyone, family, share, socialism, tribal, and union

UPDATE: For more on Twenge, commenter "unknown" helpfully suggests these Language Log posts:
Textual narcissism by Liberman
Textual narcissism, replication 2 by Liberman
It's all about who? by Liberman

Kesebir and Kesebir
Kesebir and Kesebir ran two studies. In study one, they took ten words they found as synonyms of “virtue” in an unnamed thesaurus and searched Google’s Ngram for those words. Here are the ten: character, conscience, decency, dignity, ethics, morality, rectitude, righteousness, uprightness, and virtue.

In their second study, they constructed a set of 80 virtue words taken from websites about virtue in literature (e.g., honesty, patience, honor), then asked participants to rate each one as No = -1, Perhaps = 0, or Yes = 1. Then they took the 50 words with the highest average rating and searched Ngrams for their frequency.
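
To make the selection step concrete, here is a minimal Python sketch of the rate-then-rank procedure described above: average each word's ratings, then keep the k highest. It is an illustration under stated assumptions, not the authors' actual code. The function name `top_rated_words`, the toy ratings, and the alphabetical tie-break are all invented for the example.

```python
from statistics import mean


def top_rated_words(ratings, k):
    """Return the k words with the highest mean rating.

    `ratings` maps each candidate word to the list of scores raters gave it.
    Ties are broken alphabetically (an assumption made here so the result
    is deterministic; the papers do not say how ties were handled).
    """
    averaged = {word: mean(scores) for word, scores in ratings.items()}
    ranked = sorted(averaged, key=lambda word: (-averaged[word], word))
    return ranked[:k]


# Made-up ratings on the No = -1 / Perhaps = 0 / Yes = 1 scale.
toy_ratings = {
    "honesty": [1, 1, 1],
    "patience": [1, 0, 1],
    "honor": [1, 1, 0],
    "thrift": [0, -1, 0],
}

print(top_rated_words(toy_ratings, 2))
```

Note that "patience" and "honor" tie on the mean in this toy data, so which one makes the cut depends entirely on the tie-break rule. With only a handful of raters, small choices like that can change the final word set.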

Klein unapologetically gives no motivation for his word sets whatsoever. A “very casual paper” indeed.

The Problem
While I respect the attempt of the first two sets of authors to add some psychological reality to their linguistic categories, they fall for the same naïve assumption that plagued linguistics for hundreds of years: that people's conscious metalinguistic judgments are valid. For example, syntacticians discovered the folly of grammaticality judgments. I have been involved recently in a number of Mechanical Turk ratings tasks and we're finding that it is very difficult to get consistent ratings. I believe the same issue is at play here. Plus, ratings can easily be affected by context like surrounding text, yet none is given in these tasks. It's not clear what it means to rate isolated words. Word semantics are, by their very nature, contextual.
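
One rough way to put a number on "consistent ratings" is to ask how often two raters gave the same score to the same word. The Python sketch below is only an illustration: the function name `pairwise_agreement` and the example scores are hypothetical, and raw agreement is a swapped-in, simple measure that does not correct for chance agreement the way statistics like Cohen's kappa or Krippendorff's alpha do.

```python
from itertools import combinations


def pairwise_agreement(scores):
    """Fraction of rater pairs that gave an item exactly the same score."""
    pairs = list(combinations(scores, 2))
    if not pairs:
        raise ValueError("need at least two ratings to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical scores from three raters for three different words.
print(pairwise_agreement([1, 1, 1]))   # every pair agrees
print(pairwise_agreement([1, 1, 0]))   # only one of three pairs agrees
print(pairwise_agreement([1, 0, -1]))  # no pair agrees
```

A word whose ratings look like the last case has a mean of 0, the same as a word every rater scored "Perhaps", which is one reason an average rating alone can hide disagreement.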

UPDATE: Commenter Arjan rightly brought up the great acceptability debates. One could claim that I am unfairly dismissing grammaticality judgments. And one could claim that I am not. The good folks at MIT's Tedlab have posted a few excellent resources on multiple sides of the controversy. Look under the 2010 heading on this page.

Words are not thought. These studies seem to be a variation on the “No word for X” syndrome (see here for a recent rant). Certain types of words may be used more or less frequently over some time-scale (like one century), but that doesn’t necessarily mean that we are thinking differently over that time-scale.

Unlike Brooks, I’ll link to the actual papers (all free, but the second two require email registration):

Increases in Individualistic Words and Phrases in American Books, 1960–2008. Jean M. Twenge, W. Keith Campbell and Brittany Gentile

The Cultural Salience of Moral Character and Virtue Declined in Twentieth Century America. Kesebir and Kesebir

Ngrams of the Great Transformations. Daniel B. Klein

UPDATE: *Rumor has it that WordNet has copyrighted the term "synset", so I'm being careful to avoid their cease and desist letter. Anyone know if there's truth to this rumor?


Arjan said...

Good points. Re:
For example, syntacticians discovered the folly of grammaticality judgments

I thought Sprouse and Almeida have shown how (surprisingly) reliable and replicable traditional methods of acceptability judgments are?

Unknown said...

For some discussion and (non-)replication of the Twenge work, see

Chris said...

Arjan, I don't recall the specifics of Sprouse and Almeida (except that they're in a spat with Gibson and Fedorenko).

Check out this PDF: http://web.mit.edu/tedlab/tedlab_website/researchpapers/Sprouse&Almeida_InPress_LCP.pdf

But I do know that grammaticality judgments are prone to many interpretations. Note that I recognize a difference between "grammaticality judgments" and "acceptability judgments", with the difference being context. Asking someone if a construction is grammatical is a tighter standard to meet than asking if it is acceptable in certain contexts.

Add to this Colin Phillips' stuff on grammatical illusions that are just plain weird, yet normal, and it all gets pretty muddled.

Chris said...

Unknown, Thanks, I remembered LL had been on top of the Twenge stuff but was a bit too lazy to look it up ;-)

I thought Fruehwald had posted something good at Val Systems too but couldn't find it quickly.

I need to start using Pinterest more seriously or something. My memory plus Google just ain't good enough anymore.
