Saturday, May 24, 2008

On the strengths and weaknesses of “theoretically”

Do scientists use the word “theoretically” the opposite of the way non-scientists do? Let’s see. Below, I walk through a cheap and dirty corpus linguistics method for investigating distributional differences.

This week a colleague of mine brought up an interesting point about the lay person’s use of the word “theoretically”, and I thought it merited some investigation and a post. My colleague’s point sprang from a remark that a political pundit made about the West Virginia primary and the impact of racism. The pundit was making what is now a rather conventional observation: roughly 20% of white Clinton voters in West Virginia were willing to publicly admit that race was a factor in their voting decision (I don’t know if this is true or not, but it has been repeated in the mainstream media many times; here’s one AP example). If we assume that most people who are at least slightly racist are also at least slightly private about their racism, then we can infer that more than 20% actually considered the race of the candidates when making their decisions, but some of them refused to admit it.

The pundit chose to express this reasoning by saying something like this (I don’t have a direct quote): ‘In WV, 20% of white Clinton voters admitted that race was a factor; theoretically, this means it’s possible that more than 20% actually considered race important’.

My colleague’s point was that this use of theoretically does not really mean the speaker is referencing some specific scientific theory which predicts a particular outcome. It means something closer to arguably or probably. I agreed and proposed that it might be acting like an inferential evidential, couching the claim in the guise of a fact derived by deductive reasoning.

This got me to wondering about how the word theoretically patterns in common usage. I decided to conduct a brief experiment comparing the words theoretically, arguably, and probably. In order to do this efficiently, I chose to use the freely available and wholly online Collins Cobuild Concordance and Collocations Sampler. This handy dandy online tool allows anyone to extract distributional facts about words quickly from a corpus of 56 million words composed of three subcorpora:

British books, ephemera, radio, newspapers, magazines (36m words)
American books, ephemera and radio (10m words)
British transcribed speech (10m words)

There are two basic tools available:

1) Corpus Concordance Sampler: provides the search word and the sentence it occurred in (well, not quite the sentence, but close enough for Saturday afternoon)

2) Collocation Sampler: provides the words that are statistically significantly associated with the search words (Mutual Information Score plus t-score of significance)

Let’s start with collocates. Below I’ve pasted the ten most significant collocates for the three words under investigation. You could interpret the first table as saying something like this: the pronoun it occurs within 8 words of theoretically more often than you would expect from random co-occurrence, and the t-score of about 3.5 measures how confident we can be that the excess is not due to chance (it is a significance score, not a ratio of likelihoods). It is standard within corpus linguistics to interpret this non-random co-occurrence to mean that these two words have some semantically meaningful association (pssst, we should be careful not to over-interpret the co-occurrence behavior of pronouns).

theoretically: Collins t-score

Collocate   Corpus Freq   Joint Freq   Significance
it          494702        33           3.555134
could       59556         12           3.027003
least       12333         8            2.717569
possible    10266         6            2.342936
be          234656        15           2.332595
can         113012        9            2.042261
was         340423        16           1.83627
should      35882         5            1.828091
less        14186         4            1.819667
is          407114        18           1.803011

arguably: Collins t-score

Collocate   Corpus Freq   Joint Freq   Significance
the         2313407       179          7.308255
most        43653         53           7.069587
is          407114        56           5.573264
best        20161         32           5.531725
greatest    2506          13           3.581149
important   13468         12           3.327601
was         340423        29           3.165729
finest      1067          7            2.631592
beautiful   4076          7            2.591662
more        94468         11           2.316599

probably: Collins t-score

Collocate   Corpus Freq   Joint Freq   Significance
would       97660         1076         26.18472
it          494702        2333         25.53529
will        111798        1092         25.52538
i           512080        2204         22.70137
think       70465         779          22.29877
you         421797        1797         20.274
ll          34908         545          20.02152
be          234656        1159         18.72308
d           43704         462          16.97461
most        43653         457          16.83863

Note that “d” is likely the contraction for “would” and “ll” is the contraction for “will”.
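For anyone who wants to poke at the significance column, here is a minimal sketch of how a collocation t-score relates to observed and expected counts. It assumes the common textbook approximation t = (O - E) / sqrt(O), where O is the joint frequency and E is the joint frequency expected by chance; I have not confirmed that this is exactly what the Collins tool computes, and the function names are my own. The sketch inverts the formula to recover E for three rows of the theoretically table.

```python
import math


def t_score(observed, expected):
    """Collocation t-score, assuming t = (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


def expected_from_t(observed, t):
    """Invert the assumed formula to recover the expected joint frequency."""
    return observed - t * math.sqrt(observed)


# (collocate, corpus frequency, joint frequency, t-score),
# copied from the theoretically table.
ROWS = [
    ("it", 494702, 33, 3.555134),
    ("could", 59556, 12, 3.027003),
    ("least", 12333, 8, 2.717569),
]


def summarize(rows):
    """Recover expected counts and observed/expected ratios for each row."""
    out = []
    for word, corpus_freq, joint, t in rows:
        expected = expected_from_t(joint, t)
        out.append({
            "word": word,
            "expected": expected,
            "ratio": joint / expected,
            # Should be roughly constant across rows if the formula fits.
            "expected_per_occurrence": expected / corpus_freq,
        })
    return out


for row in summarize(ROWS):
    print(f"{row['word']:<6} expected={row['expected']:.2f} "
          f"observed/expected={row['ratio']:.1f}")
```

If the assumed formula is right, the expected count divided by the collocate's corpus frequency should be nearly the same for every row, and for these three rows it is. The observed-to-expected ratio for it comes out around 2.6, a different number from its t-score of about 3.5, which is a reminder that the t-score measures significance rather than how many times more likely the pairing is.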

Analysis: the fact that jumps out at me is the pervasiveness of modal verbs on the lists for theoretically and probably. Theoretically has three modals in its top ten (could, can, should), plus the modal adjective possible, and probably has four (would, will, and the contractions ll and d). In all cases, they seem to be expressing epistemic modality in order to hedge the certainty of whatever is being claimed. On the other hand, the word arguably has zero modals in its top ten, but it does have four superlatives (most, best, greatest, finest), while probably has only one (most) and theoretically has none, unless you count least, which presumably comes from the fixed phrase at least. A picture appears to be emerging.
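The modal and superlative counts are easy to check mechanically. The sketch below tallies them from the three top-ten lists. The MODALS and SUPERLATIVES sets are my own assumptions about what to count: the contractions ll and d are treated as modals, and least is left out of the superlatives on the guess that it comes from the fixed phrase at least. With a strict list of modal verbs, theoretically gets three (could, can, should); it only reaches four if possible is counted too.

```python
# Top-ten collocates, in table order, for each target word.
TOP_TEN = {
    "theoretically": ["it", "could", "least", "possible", "be",
                      "can", "was", "should", "less", "is"],
    "arguably": ["the", "most", "is", "best", "greatest",
                 "important", "was", "finest", "beautiful", "more"],
    "probably": ["would", "it", "will", "i", "think",
                 "you", "ll", "be", "d", "most"],
}

# Assumption: modal verbs plus the contracted forms "ll" and "d".
MODALS = {"can", "could", "may", "might", "must", "shall",
          "should", "will", "would", "ll", "d"}

# Assumption: "least" is excluded; add it here to see the effect.
SUPERLATIVES = {"most", "best", "greatest", "finest"}


def tally(collocates):
    """Return the modals and superlatives found in a collocate list."""
    modals = [w for w in collocates if w in MODALS]
    superlatives = [w for w in collocates if w in SUPERLATIVES]
    return modals, superlatives


for word, collocates in TOP_TEN.items():
    modals, superlatives = tally(collocates)
    print(f"{word}: {len(modals)} modals {modals}, "
          f"{len(superlatives)} superlatives {superlatives}")
```

Hard-coded word sets are crude, but for thirty collocates they are more transparent than running a part-of-speech tagger.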

First Pass Interpretation: All three words appear to be used as hedges. But theoretically and probably appear to pattern closely together as generic hedges, while arguably seems to be a hedge that is strongly associated with superlative claims.

It seems to me that this use of theoretically as a hedge in lay discourse is in contrast with its use in scientific discourse. I would assume scientists are more likely to use theoretically in a sentence to strengthen their claim, not weaken it. However, in non-scientific circles I would guess that scientific theory is regarded with some suspicion. When a claim is based on a theory, non-scientists are more likely than not to consider it not really true. Using the word theoretically, for the lay person, is a way to say “I’m not sure…maybe not.”

Now let’s look at the concordance. I took the concordance output and performed a little judgment task. For each output set, I created two alternative documents by replacing the target word with one of its alternatives. So the theoretically document had a theoretically_into_probably alternative and a theoretically_into_arguably alternative. I used a five-point scale to judge the synonymy of each substitution (i.e., I asked myself ‘does this sentence mean the same thing with the substitution?’):

1. clearly the same meaning
2. kind of the same meaning
3. can’t decide
4. kind of different meaning
5. clearly different meaning

The winner, so to speak, of the most similar prize seems to be arguably into probably. That is to say, replacing the word arguably with probably generally preserves the meaning for me, although the reverse, probably into arguably, was pretty good too.

The cases where the theoretically into probably substitution works best seem to be the sentence-initial cases, where the word is acting as an adverbial hedge over the proposition encoded by the whole clause. But these cases are few. Although these two words seemed to have similar collocates, they do not seem very similar in their actual distributions. This is driven, I think, by the existence of the semantically dissimilar technical use for theoretically. Neither probably nor arguably has this sort of clearly technical usage. Statisticians would likely use the noun probability rather than the adverb probably. I doubt logicians use arguably at all.

theoretically into probably

score   sentence
5       on the way much of the material treated theoretically probably in the lectures.
5       for me the course was very stimulating theoretically probably and practically.
4       Theoretically Probably, this means that at one point during the
4       Theoretically Probably, HMS Champion has been reduced to a lot
5       Let's say it could be done -- theoretically probably."
2       theoretically probably followed in the footsteps of the more
5       Theoretically Probably, underdevelopment could last forever. His
3       Theoretically Probably, it is based on the false notion that
5       `It" exists simply because it is theoretically probably defined,
2       The Social Democrats theoretically probably could now force the Kohl government to

theoretically into arguably

score   sentence
5       on the way much of the material treated theoretically arguably in the lectures.
5       for me the course was very stimulating theoretically arguably and practically.
2       Theoretically Arguably, this means that at one point during the
2       Theoretically Arguably, HMS Champion has been reduced to a lot
4       Let's say it could be done -- theoretically arguably."
4       Theoretically Arguably, underdevelopment could last forever.
5       probably claim that their disciplined, theoretically arguably-informed way of perceiving the social
5       Theoretically Arguably, it is based on the false notion that
5       `It" exists simply because it is theoretically arguably defined, and the ensuing model is then
4       The Social Democrats theoretically probably could now force the Kohl government to

arguably into probably

score   sentence
2       The finishes are arguably probably the finest in Tudor Court. [p] Adjacent to
2       Arguably Probably one of the finest album releases this year.
2       Orfeo is arguably probably the greatest dance work created in Britain
1       A really dark orange, arguably probably the darkest we have ever seen.
1       for instance the first, and arguably probably most important, aim to develop lively
3       Indeed, songs like `Green', arguably probably the standout track of their debut,
2       it belongs to arguably probably the toughest bunch of surfers in the world.
2       Wayne Westner, arguably probably the longest hitter on the European Tour,
2       Arguably Probably, Sheila Henry's behaviour in holding the
2       A week there is arguably probably the most interesting seven days currently

arguably into theoretically

score   sentence
5       The finishes are arguably theoretically the finest in Tudor Court.
5       Arguably Theoretically one of the finest album releases this year.
5       Orfeo is arguably theoretically the greatest dance work created in Britain
5       A really dark orange, arguably theoretically the darkest we have ever seen.
5       for instance the first, and arguably theoretically most important, aim to develop lively
5       Indeed, songs like `Green', arguably theoretically the standout track of their debut,
5       it belongs to arguably theoretically the toughest bunch of surfers in the world.
5       Wayne Westner, arguably theoretically the longest hitter on the European Tour,
5       Arguably Theoretically, Sheila Henry's behaviour in holding the
5       A week there is arguably theoretically the most interesting seven days currently

probably into arguably

score   sentence
2       your property is probably arguably worth far more than you owe on it.
5       You've probably arguably guessed that we couldn't resist using white
2       It probably arguably would, but you'd just have been postponing
4       It probably arguably sounds silly,' she adds,
3       We came to agreement at 1am, probably arguably at the highest pitch of drunkenness
4       a member and an ANC team which it said would probably arguably include Mr Mandela.
5       I'm probably arguably all wrong," he said.
4       it was the last that probably arguably accounted for their rapid answers.
2       Those in the high intelligence group are probably arguably similarly subject to difficulties
1       It was probably arguably the most comprehensive compilation of facts

probably into theoretically

score   sentence
4       your property is probably theoretically worth far more than you owe on it.
5       You've probably theoretically guessed that we couldn't resist using white
3       It probably theoretically would, but you'd just have been postponing
5       It probably theoretically sounds silly,' she adds, `but when it was
5       We came to agreement at 1am, probably theoretically at the highest pitch of drunkenness
4       a member and an ANC team which it said would probably theoretically include Mr Mandela.
5       I'm probably theoretically all wrong," he said.
5       it was the last that probably theoretically accounted for their rapid answers.
5       Those in the high intelligence group are probably theoretically similarly subject to difficulties
5       It was probably theoretically the most comprehensive compilation of facts
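To put a rough number on the judgments above, here is a small sketch that averages the ten scores for each substitution (1 = clearly the same meaning, 5 = clearly different meaning). The scores are transcribed from the six tables; the dictionary keys and variable names are my own, and a mean over ten hand-scored sentences from a single judge is only a loose summary, not a statistical result.

```python
from statistics import mean

# Ten judgment scores per substitution, in table order.
SCORES = {
    "theoretically->probably": [5, 5, 4, 4, 5, 2, 5, 3, 5, 2],
    "theoretically->arguably": [5, 5, 2, 2, 4, 4, 5, 5, 5, 4],
    "arguably->probably": [2, 2, 2, 1, 1, 3, 2, 2, 2, 2],
    "arguably->theoretically": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "probably->arguably": [2, 5, 2, 4, 3, 4, 5, 4, 2, 1],
    "probably->theoretically": [4, 5, 3, 5, 5, 4, 5, 5, 5, 5],
}

# Lower mean = the substitution preserves meaning better.
MEANS = {name: mean(values) for name, values in SCORES.items()}
RANKED = sorted(MEANS, key=MEANS.get)

for name in RANKED:
    print(f"{name:<26} mean score {MEANS[name]:.1f}")
```

On these numbers arguably into probably has the lowest mean (most similar), probably into arguably comes second, and every substitution involving theoretically averages 4 or higher.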

2 comments:

Chris said...

Ugh! My sentence tables are all screwed up. They should have the scores displayed too. I'm not in the mood to edit the HTML right now...maybe later.

Chris said...

as a quick fix, I just reversed the order of columns.
