Thursday, October 4, 2007

Allies vs. Enemies

More on frequency and meaning. Here are the results of a “kitchen experiment” meant to test whether the relationship type “ally” can be reliably inferred from the mere co-occurrence of names joined by a conjunction.

Assumption: If two names are conjoined by “and”, they are probably allies, not enemies.

Method: I took four names that have clear ally/enemy relationships and Googled each one individually; then I Googled each pair of names as a quoted phrase, in both orders. The actual search queries were of the form "WINSTON CHURCHILL and FRANKLIN ROOSEVELT", but I abbreviated them a bit in the tables below (for example, replacing “and” with a dash) to make them fit.
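The pairwise queries in the method above can be generated mechanically. This is a minimal Python sketch, not necessarily how the searches were actually run; the helper name `build_queries` is hypothetical, and the name spellings simply follow the tables below. It produces one quoted “A and B” phrase for every ordered pair, which gives 12 queries for 4 names.

```python
from itertools import permutations

# The four names, spelled and capitalized as in the tables below.
NAMES = ["Adolf Hitler", "benito mussolini", "FRANKLIN ROOSEVELT", "WINSTON CHURCHILL"]

def build_queries(names):
    """Hypothetical helper: one quoted 'A and B' phrase per ordered pair,
    so both A-B and B-A are searched."""
    return [f'"{a} and {b}"' for a, b in permutations(names, 2)]

queries = build_queries(NAMES)
for q in queries:
    print(q)
```

Using ordered pairs (permutations rather than combinations) is what makes the flip-flop comparison possible later on.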

Names Alone            Google Hits
Adolf Hitler           2,460,000
benito mussolini       1,440,000
FRANKLIN ROOSEVELT     1,840,000
WINSTON CHURCHILL      2,330,000

Enemies                                  Google Hits
Adolf Hitler - WINSTON CHURCHILL         2,600
FRANKLIN ROOSEVELT - Adolf Hitler        596
WINSTON CHURCHILL - Adolf Hitler         1,680
WINSTON CHURCHILL - benito mussolini     504
benito mussolini - WINSTON CHURCHILL     7
benito mussolini - FRANKLIN ROOSEVELT    4
FRANKLIN ROOSEVELT - benito mussolini    1
Adolf Hitler - FRANKLIN ROOSEVELT        752

Allies                                   Google Hits
F. ROOSEVELT - WINSTON CHURCHILL         10,500
WINSTON CHURCHILL - F. ROOSEVELT         817
Adolf Hitler - benito mussolini          14,700
benito mussolini - Adolf Hitler          643

Results:
Allies
15,343 (14,700 + 643) -- Adolf Hitler + benito mussolini
11,317 (10,500 + 817) -- FRANKLIN ROOSEVELT + WINSTON CHURCHILL

Enemies
4,280 (2,600 + 1,680) -- WINSTON CHURCHILL + Adolf Hitler
1,348 (596 + 752) -- FRANKLIN ROOSEVELT + Adolf Hitler
511 (504 + 7) -- WINSTON CHURCHILL + benito mussolini
5 (4 + 1) -- FRANKLIN ROOSEVELT + benito mussolini
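The totals above are just the two orderings of each pair added together. Here is a minimal Python sketch of that bookkeeping, using the hit counts from the tables above with names shortened to surnames; the `pair_totals` helper and the `ALLIES` set are illustrative assumptions, not part of the original experiment.

```python
# Hit counts from the tables above, keyed by (first, second) as queried,
# i.e. "FIRST and SECOND". Surnames stand in for the full query strings.
ORDERED_HITS = {
    ("Hitler", "Churchill"): 2_600,
    ("Churchill", "Hitler"): 1_680,
    ("Roosevelt", "Hitler"): 596,
    ("Hitler", "Roosevelt"): 752,
    ("Churchill", "Mussolini"): 504,
    ("Mussolini", "Churchill"): 7,
    ("Mussolini", "Roosevelt"): 4,
    ("Roosevelt", "Mussolini"): 1,
    ("Roosevelt", "Churchill"): 10_500,
    ("Churchill", "Roosevelt"): 817,
    ("Hitler", "Mussolini"): 14_700,
    ("Mussolini", "Hitler"): 643,
}

# The two known alliances; every other pair is treated as enemies.
ALLIES = {frozenset({"Hitler", "Mussolini"}), frozenset({"Roosevelt", "Churchill"})}

def pair_totals(ordered_hits):
    """Sum both orderings of each pair into one order-independent count."""
    totals = {}
    for (a, b), hits in ordered_hits.items():
        key = frozenset({a, b})
        totals[key] = totals.get(key, 0) + hits
    return totals

totals = pair_totals(ORDERED_HITS)
# Print pairs from most to least frequent, labeled by known relationship.
for pair, hits in sorted(totals.items(), key=lambda kv: -kv[1]):
    label = "ally" if pair in ALLIES else "enemy"
    print(f"{' + '.join(sorted(pair)):<22} {hits:>6,}  {label}")
```

Keying the totals on a `frozenset` makes “A and B” and “B and A” land in the same bucket, which is exactly what summing the two orderings means.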

Discussion: The assumption is weakly supported. Roosevelt is conjoined with his ally Churchill about 8 times as often as with his enemy Hitler (11,317 vs. 1,348) and more than 2,000 times as often as with Mussolini (11,317 vs. 5). Churchill is conjoined with his ally Roosevelt more than twice as often as with his enemy Hitler (11,317 vs. 4,280) and more than 20 times as often as with Mussolini (11,317 vs. 511).
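The comparisons in the discussion are ratios of combined counts. The sketch below recomputes them for Roosevelt and Churchill, assuming the combined totals from the Results above; the `ally_to_enemy_ratio` helper is a hypothetical name used only for illustration.

```python
# Combined hit counts (both name orders summed), copied from the Results above.
PAIR_HITS = {
    frozenset({"Roosevelt", "Churchill"}): 11_317,
    frozenset({"Hitler", "Mussolini"}): 15_343,
    frozenset({"Churchill", "Hitler"}): 4_280,
    frozenset({"Roosevelt", "Hitler"}): 1_348,
    frozenset({"Churchill", "Mussolini"}): 511,
    frozenset({"Roosevelt", "Mussolini"}): 5,
}

def ally_to_enemy_ratio(person, ally, enemy):
    """How many times more often `person` is conjoined with `ally` than with `enemy`."""
    return PAIR_HITS[frozenset({person, ally})] / PAIR_HITS[frozenset({person, enemy})]

for person, ally in [("Roosevelt", "Churchill"), ("Churchill", "Roosevelt")]:
    for enemy in ("Hitler", "Mussolini"):
        ratio = ally_to_enemy_ratio(person, ally, enemy)
        print(f"{person}: {ally} vs. {enemy} = {ratio:,.1f}x")
```

A ratio above 1 is what the “and implies ally” assumption predicts; all four come out above 1 here, though the very small Mussolini counts make those two ratios fragile.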

The Flip-Flop Effect: The most linguistically interesting result is the more than ten-fold increase in hits that the “FRANKLIN ROOSEVELT and WINSTON CHURCHILL” query got over its “WINSTON CHURCHILL and FRANKLIN ROOSEVELT” counterpart (10,500 vs. 817). An even greater effect is seen with the Hitler/Mussolini flip-flop (14,700 vs. 643, more than twenty-fold). Why is the Roosevelt-first collocation so much more frequent? My hunch is that some kind of salience effect is at work: the more salient member of the collocation tends to be listed first.
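The flip-flop effect can be expressed as the ratio between the two orderings of the same pair, where a ratio near 1 would mean word order does not matter. A minimal Python sketch using the Allies counts above; `flip_flop_ratio` is a hypothetical helper name.

```python
# Ordered hit counts for the two ally pairs, from the Allies table above,
# keyed by (first, second) as queried.
ORDERED_HITS = {
    ("Roosevelt", "Churchill"): 10_500,
    ("Churchill", "Roosevelt"): 817,
    ("Hitler", "Mussolini"): 14_700,
    ("Mussolini", "Hitler"): 643,
}

def flip_flop_ratio(first, second, hits=ORDERED_HITS):
    """How many times more often 'first and second' occurs than 'second and first'."""
    return hits[(first, second)] / hits[(second, first)]

for first, second in [("Roosevelt", "Churchill"), ("Hitler", "Mussolini")]:
    print(f"{first} first vs. {second} first: {flip_flop_ratio(first, second):.1f}x")
```

By this measure the Hitler/Mussolini asymmetry (about 23x) is indeed larger than the Roosevelt/Churchill one (about 13x).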

Flaws: Surely this kitchen experiment has more flaws than can easily be enumerated, but one obvious flaw deserves mention: the normalization problem. Deciding which form of each name to use as a search term was not trivial. Roosevelt is often referred to by his initials, “FDR”, and both Hitler and Mussolini are commonly referred to by last name only. So this was, at best, an experiment in term collocation frequency, not in person reference.

Note: I'm certain that either Mark Liberman or Arnold Zwicky over at Language Log has used the term “kitchen experiment” in a post before, but a search of that site turned up nothing. Hmmm, am I just imagining that this term has been used before?
