Bob Carpenter recently made the following comment on one of my posts: "I'm very excited to hear that linguists are beginning to take statistics seriously (again). I'd heard the same thing from Chris Manning a year or so ago, but then other linguists I queried were more skeptical about the role of statistics."
This brought to mind a post by Harvard economist Greg Mankiw called Why Aspiring Economists Need Math. Some of his comments are relevant to linguists (not all, though). I Googled around to see if anyone had already blogged something like this, but couldn't find much (I'd be happy to hear I missed something). Being a bold blogger with little fear of humiliation (often a poor combination, btw) I decided to take a stab at it (UPDATE: I finally discovered a post from summer 2009 by Liberman at LL on basically this same topic here).
Linguists should study math* because...
- Math is a tool that helps you.
- It's not that hard.
- Math is good training for the mind.
- Math is the future.
- You will be left behind without it.
- You will be a better linguist.
Math is a tool that helps you.
Math helps you find patterns and make reliable predictions, among other things. If you are truly serious about studying linguistics, you should be greedy to get your hands on any and all tools you can find that help you study whatever sub-field you specialize in. I have provided a list of Resources for Linguists on the right panel of this blog and I continue to update it as I find more. Tools are good.
It's not that hard.
The math a linguist needs ain't rocket science. No one is asking you to be brilliant, just competent. And you don't need to obsess over it; a few courses will do. The biggest challenge is to develop a set of learning materials geared towards non-majors. Math and stats books are generally poorly written for the lay audience, and that turns off aspiring linguists and such. As I said in my response to Carpenter: There is a natural hurdle left to encouraging linguistics students to study stats: they don't like it, that's why they're linguists. I recall a professor promoting linguistics to a large general ed undergrad course by saying it was one of the few analytical, empirical fields that did not require math. That resonated with a lot of 19 year olds. A little hand holding at the undergrad level would go a long way. A simple "stats for linguists" handbook would be perfect. I know there are some new R books focused on language data, but I don't know if they do enough hand holding.
Math is good training for the mind.
To quote Mankiw: Math is good training for the mind. It makes you a more rigorous thinker. Most athletes do push-ups. Tennis players do push-ups. Swimmers do push-ups. Cricket players do push-ups. Speed skaters do push-ups. Why do athletes from such a wide range of sports do the same exercise? Because it's a good basic exercise that helps them regardless of their sport. Math is push-ups for your mind. Nuff said.
Math is the future.
Like it or not, mathematical models are fast becoming the best way to understand complex phenomena. It's no coincidence that biologists, economists, sociologists, neuroscientists, etc. are developing mathematical models to understand their chosen phenomena. They work. Once a phenomenon reaches a certain level of complexity, the human mind is simply not able to understand it as a whole. Our brains evolved to reason about things close in time and space, but complex phenomena like language involve variables that are neither. How can we understand the interaction of thousands of variables? With mathematical models and statistical analysis. Math is not only "a" tool, it's the right tool.
You will be left behind without it.
Any 21st century linguist will be required to read about and understand mathematical models as well as statistical methods of analysis. Whether you are interested in Shakespearean meter (pdf), the sociolinguistic perception of identity (pdf), Hindi verb agreement violations (pdf), or the perception of vowel duration (pdf), the use of math as a tool of analysis is already here and its prevalence will only grow over the next few decades. If you're not prepared to read articles involving terms like Bayesian, (p<.01), k-means clustering, confidence interval, latent semantic analysis, bimodal and unimodal distributions, N-grams**, etc., then you will be but a shy guest at the feast of linguistics.
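To show how un-scary one of those terms can be, here is a minimal Python sketch of counting N-grams, which are just runs of N adjacent words. The function name ngram_counts and the toy sentence are my own inventions for illustration; real work would use a proper corpus and tokenizer.

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count every run of n adjacent tokens in a list of tokens."""
    # Zipping n staggered copies of the list yields each n-gram as a tuple.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)

# Toy sentence, made up for illustration only (not real data).
tokens = "the cat saw the dog and the cat ran".split()

bigrams = ngram_counts(tokens, n=2)
print(bigrams.most_common(1))  # -> [(('the', 'cat'), 2)]
```

That's the whole idea: slide a window over the words and tally what you see. The statistics built on top of such counts get more involved, but the starting point is counting.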
You will be a better linguist.
In sum, you want to be a good linguist. That's why you're getting into this. That's why you've read this far. Language problems challenge and fascinate you. You lie awake at night thinking about them. You want to be a part of the community of scholars who work to unfold the mysteries of language. Math is a tool that will help you enter that community and contribute to it in a highly productive way.
HAVING SAID THAT...
It's equally fair to say that those who are more math oriented than linguistics oriented (like the NLPers, computational linguists and such who barge into our language territory with their fancy schmancy algorithms) should tread softly as well. Yes, it is our responsibility as linguists to understand the math, but it is your responsibility to understand the linguistics, and failing to do so can lead to flawed, vacuous, and even comical results. To quote Cab Calloway in The Blues Brothers, "your lazy butts are in this too." I have consistently used this blog to critique such foolishness (and the folks at Language Log have perfected the genre). It is a mistake to take the linguistics part lightly. It's not all math. It's a little math, but it's mostly linguistics. Here are some of my previous attempts to hold non-linguists accountable for their failure to take the linguistics part seriously enough:
- The Full Liberman (taking aim at a psychologist)
- Thinking Words (taking aim at a philosopher)
- SEX! TORTURE! BANANA! (taking aim at psychologists)
- On Linguistic Fingerprinting (taking aim at physicists)
- Draft of a post on sentiment analysis, in press, so to speak (taking aim at NLPers)
**I can imagine a reader complaining that these terms are not necessarily math/stats terms, strictly speaking. Fair enough. But I believe it is basically a math/stats education that will help an aspiring linguist understand and make use of them. Also fair enough?