Thursday, June 10, 2010

Is Arabic The Least Positive Language? (hint, no) ... sigh

Sometimes bad science reporting is a function of bad science. Garbage in, garbage out.

There's been some buzz about new research regarding the bias of negative and positive words in English as well as cross-linguistically. I have refrained from commenting because it sounded like typical bad reporting and misunderstanding of academic research. Then Andrew Sullivan got involved. Sigh. Sullivan has his strengths and weaknesses as a blogger. His strength shone brightly last summer when he helped publicize the Iranian green movement. His weakness, however, peeps out anytime he blogs about anything remotely related to science or academics (see HERE and HERE). His most recent silliness has the title The English Language Is An Optimist. His megaphone is so big, I feel someone must clear up the foggy facts and murky interpretations currently being disseminated.

To begin, the research in question is from Rozin et al., U Penn psychologists who appear to be focused on emotion research (full citation below). As far as I can tell, no linguists were involved (and boy oh boy, they should have been. Ya know, Penn has a linguistics department that is, let's just say, above average). The basic point of the research cited is this: Positive events are more common (more tokens), but negative events are more differentiated (more types). Sullivan simply posts a quote from another blog which regurgitates the research as if it were true, with no critical analysis on anyone's part. I will offer the much-needed critical analysis here.

Here are the four facts about English that everyone seems to find so fascinating:

  • Negative words are often composed of the positive root negated with a prefix (e.g., unhappy, insincere, unpleasant), while the reverse is exceptional (e.g., unselfish, uncontaminated).
  • Negated positive adjectives tend to have a negative valence, whereas negated negative adjectives tend to be neutral in valence.
  • Usually, only positive adjectives are used to refer to the whole positive-negative dimension.
  • In conjunctions or disjunctions, positive adjectives are usually mentioned before the opposite negative adjectives
One of the disappointing things about this buzz is that the facts gaining the buzz are decontextualized from the research reported in the paper (a common ingredient in these kinds of stories). As far as I can tell, all Rozin et al. did was this: take some random linguistic facts published in 1978, look up some arbitrary words in a frequency table, then administer a short one-on-one survey with a small group of informants. That's it! And that's not much. It's modest qualitative analysis masquerading as comprehensive quantitative data gathering.  The whole premise of this paper is based on a bold claim that "most of the events experienced in life have positive implications." They cite research on this that I have not looked into, so I have no clue what they mean by this. What do they mean by "event"? I suspect their use of "event" and the use of this word by semanticists (especially formal semanticists) is quite different. Linguists, and semanticists in particular, really care about defining what an "event" is. If you want to have some fun, ask five formal semanticists to define "event". Sparks will fly. Then, ask them 'when does the beginning of an event end?'. Oh my, fisticuffs are certain.

While linguists are obsessed with being quite disciplined with these kinds of things, psychologists don't seem to be. I confess that what little emotion research I've read is disappointing. The field seems plagued by vague terms and weak methodology. But that's not what my principal critique will center on. I'm more interested in what they actually did. Let's walk through their methodology, shall we?

Take some random linguistic facts published in 1978
Rozin et al. report that "positive words (tokens, not types) occur with much higher frequency than negative words in English", but they cite only three studies, all published before 1984 and one of them in the 1960s. Their four big facts come from a single study: Matlin, M. W., & Stang, D. J. (1978). The Pollyanna principle: Selectivity in language, memory, and thought. Cambridge, MA: Schenkman Publishing Company. 1978! In other words, before the advent of large, easily searchable corpora. Rozin et al. give no operational definition of what a "positive word" or a "negative word" is. They appear to just assume that such things exist and that they're easy to identify. In other words, they assume the linguistics part is easy, so why bother working hard at it? Bad psychologist, bad (imagine me slapping their noses with a newspaper while saying this). If we take a "negative word" to simply mean the negated form of another word, well, then, yeah, sure they're gonna be marked. If it's something else, we need to know what that something else is. If we don't have a good definition of what these things are, then how do we go about finding them? Well, Rozin et al. just decided arbitrary intuition was good enough:

These “reference” adjectives were selected in advance by the authors, such that some were negative and some positive. They were common adjectives in English, but were selected by convenience, with the proviso that we knew in advance for all cases that the positive asymmetries we were exploring were present for these words in English. (emphasis added)

Their eight adjectives were pleasant, sad, dirty, disgusting, bad, sincere, pure, and beautiful.

Look up some arbitrary words in a single frequency table
Once they came up with their list, they looked up each word in Leech's Word frequencies in written and spoken English.

We confirmed this in a preliminary study, searching for positive and negative valenced adjective frequency in an extensive corpus of over 100 million words of both spoken and written British English (Leech, Rayson, & Wilson, 1971, also available on the Internet). We searched for frequency of English usage for the seven adjectives we examined across languages in the first part of the present study and their opposites (opposites listed after the solidus: pleasant/aversive, sad/happy, dirty/clean, bad/good, sincere/no obvious opposite in English, pure/contaminated, beautiful/ugly). We also searched for the negation of any of these words, when it formed a word in English, which was the case for 5 of 7 positive words (unpleasant, unhappy, unclean, insincere, impure)

That right there was the sum total of corpus research they did. Mere frequency counts don't tell us much. This is the worst kind of corpus linguistics where simple word counts are imbued with magic and meaning. Nope. Nothing terribly meaningful in word counts all by themselves.  It would have been easy to gather collocation data and give us some sense of what significant co-occurrence was going on. But no, they give us nothing.
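
To make the collocation point concrete, here is a minimal sketch of the kind of thing I mean. It is my own toy illustration, not anything Rozin et al. did: the function name `pmi_collocates`, the window size, and the made-up corpus are all assumptions, and the joint probability is only a rough window-based estimate. The idea is to score each word that appears near a target adjective by pointwise mutual information (PMI), so we see what the adjective keeps company with rather than just how often it shows up.

```python
import math
from collections import Counter

def pmi_collocates(tokens, target, window=2, min_count=2):
    """Rank words found within +/- `window` tokens of `target` by PMI.

    Hypothetical helper for illustration; the probabilities are crude
    estimates from raw counts over the token list.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair[tokens[j]] += 1
    scores = {}
    for word, count in pair.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        # PMI = log2( P(target, word) / (P(target) * P(word)) )
        p_joint = count / n
        p_target = unigram[target] / n
        p_word = unigram[word] / n
        scores[word] = math.log2(p_joint / (p_target * p_word))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus, invented for this example only.
toy = ("the pure water was clean . the pure gold was bright . "
       "the dirty water was bad . the dirty room was bad .").split()

# Top-ranked collocate of "dirty" within three tokens.
print(pmi_collocates(toy, "dirty", window=3)[0][0])  # bad
```

On a real corpus you would want a proper tokenizer, sentence boundaries, and a significance measure (log-likelihood, say) alongside PMI, but even this much tells you more than a bare frequency count.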

Administer a short one-on-one survey with a small group of informants
We interviewed one native speaker of each of 20 languages, not including English. The languages were: Mandarin, Cantonese, Japanese, Korean, Vietnamese, Thai, Tagalog, Ibo, Arabic, Turkish, Tamil, Hindi, German, Icelandic, Swedish, French, Portuguese (Brazilian), Spanish, Russian, and Polish. The languages were selected by convenience ... The informants were asked ten questions about eight adjectives, half positive ... It was essential that all informants had an intuitive sense of the language they were being interviewed about, since the questions had little to do with “rules” of syntax, but rather relied on what “sounds right”. (emphasis & jumps added)

So they took one fluent speaker of Vietnamese, gave her/him an English adjective, and then asked these ten questions (and repeat for each speaker).
  • Is there a positive word?
  • Is there a negative word?
  • Can the positive word be negated?
  • Can the negative word be negated?
  • Is the negation of the positive word neutral or extreme? (# extreme)
  • Is the negation of the negative word neutral or extreme? (# extreme)
  • Would the informant rather be “unnegative” or “unpositive”? (# prefer unneg.)
  • Can the negative word be used on the positive end of the spectrum? (# yes)
  • Can the positive word be used on the negative end of the spectrum? (# yes)
  • Does it sound better to say the negative or positive word first? (# pos. first)
This looks more like a weak study of lexical access, or cross-linguistic priming, than the study of positive/negative semantic space. Survey tools like this are best as a beginning, very preliminary stage of deeper research (i.e., not published). Descriptive linguists/grammarians work with speakers for years to tease out these kinds of semantic judgments. This is no easy thing to do. With only one informant per language and so little information (except for English), it's hard to tell if this is just noise. The authors dismiss the noise potential with reasoning that, to me, sounds entirely illogical:

Any uncertainty or inaccuracy of a single informant would, of course, not bias our findings but would add “noise” and make it more difficult to demonstrate a strong commonality across languages.

If one and only one informant were inaccurate, then that's noise, but what if all were slightly confused by an awkward task? That's garbage. How would we know? The authors tried to address this by interviewing "10 native English speakers (8 of them students at the University of Pennsylvania), presenting our full protocol of questions for three of the adjectives: sad, good and pleasant, and for all five “unique” negative nouns." So they took a subset of their already small data and tested 10 informants in one language, assuming that the variation they found in that one language would be a good proxy for the variation in any other language. I just don't think that's a wise assumption.
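
The difference between noise and a shared confusion can be put in back-of-the-envelope form. The sketch below is entirely my own toy model, not anything from the paper: the function `expected_positive_rate` and its parameters `flip_prob` and `push_prob` are hypothetical, and the numbers are invented. It simply computes the expected share of "positive-dominant" answers when informants make random errors versus when an awkward task nudges everyone the same way.

```python
def expected_positive_rate(true_rate, flip_prob=0.0, push_prob=0.0):
    """Expected share of 'positive-dominant' answers across informants.

    true_rate: share of languages where the asymmetry really holds.
    flip_prob: chance an answer is flipped at random (symmetric noise).
    push_prob: chance a confusing task pushes an answer to
               'positive-dominant' whatever the truth (a shared error).
    All three are hypothetical parameters for illustration.
    """
    # Symmetric flips pull the observed rate toward 0.5.
    after_noise = true_rate * (1 - flip_prob) + (1 - true_rate) * flip_prob
    # A shared push moves answers in one direction only.
    return after_noise + (1 - after_noise) * push_prob

# Real asymmetry in 80% of languages, 20% random flips: signal weakened.
print(round(expected_positive_rate(0.8, flip_prob=0.2), 2))  # 0.68
# No real asymmetry (50/50), but the task nudges 30% of answers one way.
print(round(expected_positive_rate(0.5, push_prob=0.3), 2))  # 0.65
```

Under these assumptions the authors are right only about the first case: random error waters a real effect down. A confusion shared by all informants can manufacture a "commonality" out of nothing, and with one informant per language there is no way to tell the two apart.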

Ultimately my impression of this article was that it's weak research about a topic that people love so much, they're willing to take sound-bite blogging at face value. This is borderline rumor mongering. Did this research say that the English language is "optimistic", Andrew? No, it did not.  Did this particular study find that positive events outnumber negative events, Andrew? No, it did not.

Let me make my point by using their own data. Rozin et al. found that English showed the largest percentage of cases with positive dominating negative and Arabic the smallest*. Now, imagine I claimed that Arabic is the least positive language. How happy would you be with this interpretation? Should we be any happier with Rozin's interpretations? Sullivan's?

*I'm not sure I completely understand this result because they didn't publish their actual results, but I think it means that for the 7 adjectives, the biases in Arabic were all small (i.e., pos/neg were all similar).

Rozin, P., Berman, L., & Royzman, E. (2010). Biases in use of positive and negative words across twenty natural languages. Cognition & Emotion, 24(3), 536-548. DOI: 10.1080/02699930902793462


BThree said...

I think one of the major flaws in this study is that it ignores the morphology of Arabic, which does not encourage a lot of affixation in the formation of new lexical items. Arabic does have a negating particle, laa, which can be placed before an adjective, but it still exists as a separate word. It is relatively rare, and largely used to translate Western terminology. There is also ghayr, which is likewise a separate word placed before the adjective, as in the phrase "ghayr al-maghduubi" - "those who have not aroused anger". Contrast this with the English use of the Latin and Greek prefixes "non-" and "a-". Therefore, the burden of any negative connotations is carried entirely by the root. So, methodological shortcomings aside, it could be possible that Arabic has more "negative" words simply because the morphological poverty of the language places this burden on the lexicon.

Chris said...

BThree, you're absolutely right that the failure to take the internal morphological structure of the languages into consideration was a serious oversight.
