Monday, January 31, 2011

how (not) to do linguistics

Jonah Lehrer, the neuro-blogger, has a mixed track record as far as I'm concerned. His early blogging was nice but a tad lightweight; then he started to sound a bit too Malcolm Gladwell-esque (that is, I wasn't entirely sure he knew what he was talking about beyond having had a few short phone calls with one or two scientists before babbling on about a topic).

But he's hit a home run with this long New Yorker piece about the failure of the journal review process in science: The Truth Wears Off. He draws examples from medicine, physics, and psychology.

Perhaps the most disappointing part is realizing that the standards of testing and conclusiveness in linguistics are so far from those in the more established sciences.

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity.

Repeating studies is virtually unheard of in linguistics. Lehrer also discusses publication bias in journals: when a result is first discovered, journals favor positive findings; later, once the result is accepted, only negative results get published, because only those are "interesting" anymore. But I would extend this point: the same bias exists at every stage of the research process. We want to find things that happen; nobody wants to spend five years and thousands of hours discovering that X does NOT cause Y! So when young grad students begin scoping out a new study, they throw away anything that doesn't seem fruitful, where fruitful means likely to yield positive results. This bias affects the very foundation of the research process, namely the basic question: what should I study?

As a side note, engineers seem perfectly happy to follow through on null results. They need to know the full scope of their problem before solving it. Scientists can learn a lot from engineers (and vice versa).

[Psychology professor Jonathan] Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”

Coincidentally, I was recently tweeting with moximer and jasonpriem about this, and we agreed that research wikis are worth exploring. My vision would be something akin to Wikipedia, but where researchers store all of their data, stimuli, results, etc., finished or not. The data could be tagged as tentative, draft, failed, successful, etc. As the research goes on, the data get updated. Not only would this record failure (which, as Lehrer points out in the article, is as valuable as success), it would also record change. How did a study evolve over time?

True, the data would become huge over time across many disciplines, but that just means we need better and better data mining tools (and the boys at LingPipe are working away at those tools).
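The research wiki is only a vision, but its core idea (each study gets a page whose status tags and revisions are appended, never overwritten) can be sketched as a small data structure. What follows is a minimal Python sketch under my own assumptions: the class names `ResearchEntry`, `Revision`, and `Status`, and the example study, are hypothetical and not part of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    # The tags suggested in the post; a real wiki might define others.
    TENTATIVE = "tentative"
    DRAFT = "draft"
    FAILED = "failed"
    SUCCESSFUL = "successful"


@dataclass
class Revision:
    when: date
    status: Status
    note: str


@dataclass
class ResearchEntry:
    """Hypothetical wiki page for one study: its files plus its full history."""
    title: str
    files: list = field(default_factory=list)    # data, stimuli, results
    history: list = field(default_factory=list)  # Revision objects, oldest first

    def update(self, when: date, status: Status, note: str) -> None:
        # Append rather than overwrite, so failures and changes stay on record.
        self.history.append(Revision(when, status, note))

    @property
    def status(self) -> Optional[Status]:
        # The current status is simply the most recent revision's tag.
        return self.history[-1].status if self.history else None


# Usage with a made-up study:
entry = ResearchEntry("Hypothetical priming study", files=["stimuli.csv"])
entry.update(date(2011, 1, 10), Status.DRAFT, "pilot design")
entry.update(date(2011, 3, 2), Status.FAILED, "no effect found in pilot")
print(entry.status)        # Status.FAILED
print(len(entry.history))  # 2
```

The append-only history is the design choice that matters here: a study that moves from draft to failed still carries its earlier states, which is exactly the record of failure and change the wiki is meant to preserve.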

HT rapella

4 comments:

Rachel Cotterill said...

So true! I'm in computational linguistics now, and it's still true to some extent... people don't always even publish their data to *enable* repeatability. I wish it wasn't so!

Chris said...

Rachel, this is not restricted to academics either. Police detectives are afflicted with similar problems when analyzing crimes. It takes great discipline to see one's own biases, but we also need institutional incentives to do so, and right now the incentives are distorted.

Here my capitalist pig instincts take over and suggest that free market pressures (which are absent from both academic and law enforcement markets) play a good role in steadying the ship of progress.

G said...

Free market pressures helping to get better results in research??? Pull the other one, it has bells on.

While the free market (and academic ethics) may eventually get rid of researchers who produce results that are not replicable, it works both ways. The free market also provides large incentives for people to do sloppy-but-convincing work that will enable them to get their next academic position.

Ultimately, the free market has little to say about truth, because you cannot really judge the truth or falsity of research results until long after they are bought and paid for. Free markets work well when you know what you are buying. They work when you can make a detailed comparison of two things and pick the one that best meets your needs and budget.

Unfortunately, with research, you need to pay before you can see whether you have purchased a true or false result, and you rarely buy the same research twice. As a result, one can never make an informed decision on the research itself. Research is more like getting married than buying an iPod: what you will get out of either a marriage or a research project is unknowable at the moment of contract.

So, no, it is not a contract between two parties who can make accurate and informed estimates of the likely outcome. Why should you expect free markets to work in this case?

- said...

I'm new here, but I enjoyed this article and your follow-on (capitalist pig) comment, and would like to ask a question that's been on my mind:
How could we improve the system by rewarding researchers for revealing their flaws and/or errors?

It seems now that there's no incentive for people to confess it when they [we] blow it ... especially after the fact.

Inventors have incentive to work through their failures and find an actual working solution (their truth). What about researchers, detectives, theologians, etc?
