Monday, January 18, 2010

Why Linguists Should Study Math

Bob Carpenter recently made the following comment on one of my posts: I'm very excited to hear that linguists are beginning to take statistics seriously (again). I'd heard the same thing from Chris Manning a year or so ago, but then other linguists I queried were more skeptical about the role of statistics.

This brought to mind a post by Harvard economist Greg Mankiw called Why Aspiring Economists Need Math. Some of his comments are relevant to linguists (not all, though). I Googled around to see if anyone had already blogged something like this, but couldn't find much (I'd be happy to hear I missed something). Being a bold blogger with little fear of humiliation (often a poor combination, btw) I decided to take a stab at it (UPDATE: I finally discovered a post from summer 2009 by Liberman at LL on basically this same topic here).

Linguists should study math* because...
  • Math is a tool that helps you. 
  • It's not that hard.
  • Math is good training for the mind.
  • Math is the future.
  • You will be left behind without it.
  • You will be a better linguist.
Math is a tool that helps you.
Math helps you find patterns and make reliable predictions, among other things. If you are truly serious about studying linguistics, you should be greedy to get your hands on any and all tools you can find that help you study whatever sub-field you specialize in. I have provided a list of Resources for Linguists on the right panel of this blog and I continue to update it as I find more. Tools are good.

It's not that hard.
The math a linguist needs ain't rocket science. And no one is asking you to be brilliant, just competent. And you don't need to obsess over it; a few courses will do. The biggest challenge is to develop a set of learning materials that are geared towards non-majors. Math and stats books are generally poorly written for the lay audience, and that turns off aspiring linguists and such. As I said in my response to Carpenter: There is a natural hurdle left to encouraging linguistics students to study stats: they don't like it, that's why they're linguists. I recall a professor promoting linguistics to a large general ed undergrad course by saying it was one of the few analytical, empirical fields that did not require math. That resonated with a lot of 19-year-olds. A little hand holding at the undergrad level would go a long way. A simple "stats for linguists" handbook would be perfect. I know there are some new R books focused on language data, but I don't know if they do enough hand holding.

Math is good training for the mind.
To quote Mankiw: Math is good training for the mind. It makes you a more rigorous thinker. Most athletes do push-ups. Tennis players do push-ups. Swimmers do push-ups. Cricket players do push-ups. Speed skaters do push-ups. Why do athletes from such a wide range of sports do the same exercise? Because it's a good basic exercise that helps them regardless of their sport. Math is push-ups for your mind.  Nuff said. 

Math is the future.
Like it or not, mathematical models are fast becoming the best way to understand complex phenomena. It's no coincidence that biologists, economists, sociologists, neuroscientists, etc. are developing mathematical models to understand their chosen phenomena. They work. Once a phenomenon reaches a certain level of complexity, the human mind is simply not able to understand it as a whole. Our brains evolved to reason about things close in time and space, but complex phenomena like language involve variables that are neither. How can we understand the interaction of thousands of variables? With mathematical models and statistical analysis. Math is not only "a" tool, it's the right tool.

You will be left behind without it.
Any 21st-century linguist will be required to read about and understand mathematical models as well as understand statistical methods of analysis. Whether you are interested in Shakespearean meter (pdf), the sociolinguistic perception of identity (pdf), Hindi verb agreement violations (pdf), or the perception of vowel duration (pdf), the use of math as a tool of analysis is already here and its prevalence will only grow over the next few decades. If you're not prepared to read articles involving terms like Bayesian, (p<.01), k-means clustering, confidence interval, latent semantic analysis, bimodal and unimodal distributions, N-grams**, etc., then you will be but a shy guest at the feast of linguistics.

You will be a better linguist.
In sum, you want to be a good linguist. That's why you're getting into this. That's why you've read this far. Language problems challenge and fascinate you. You lie awake at night thinking about them. You want to be a part of the community of scholars who work to unfold the mysteries of language. Math is a tool that will help you enter that community and contribute to it in a highly productive way.

HAVING SAID THAT...

It's equally fair to say that those who are more math oriented than linguistics oriented (like the NLPers, computational linguists and such who barge into our language territory with their fancy schmancy algorithms) should tread softly as well. Yes, it is our responsibility as linguists to understand the math, but it is your responsibility to understand the linguistics, and failing to do so can lead to flawed, vacuous, and even comical results. To quote Cab Calloway in The Blues Brothers, "your lazy butts are in this too." I have consistently used this blog to critique such foolishness (and the folks at Language Log have perfected the genre). It is a mistake to take the linguistics part lightly. It's not all math. It's a little math, but it's mostly linguistics. Here are some of my previous attempts to hold non-linguists accountable for their failure to take the linguistics part seriously enough:
  • Draft of a post on sentiment analysis, in press, so to speak (taking aim at NLPers)
*For simplicity's sake, I chose to conflate the fields of mathematics and statistics into the single term "math." I'm sure objections can be raised.

**I can imagine a reader complaining that these terms are not necessarily math/stats terms, strictly speaking. Fair enough. But I believe it is basically a math/stats education that will help an aspiring linguist understand and make use of them. Also fair enough?

12 comments:

B H said...

*For simplicity's sake, I chose to conflate the fields of mathematics and statistics into the single term "math." I'm sure objections can be raised.

Objection 1: Mathematics also encompasses the logic and formal languages that Chomsky allegedly introduced to linguistics (and are the subject of Partee et al.'s Mathematical Methods in Linguistics). It's a shame that too few papers in generative linguistics take formalization seriously because it could actually clarify debates and definitions (like what exactly OT is). So in addition to more rigorous statistical undertakings, we should also be calling for more papers like Scholz and Pullum's recent work together contrasting the applicability of different logics in linguistics.

Chris said...

B H, good points. I'm not sure which paper you're referring to, though this one seems close:
Contrasting Applications of Logic in Natural Language Syntactic Description (pdf)

Chris said...

Ooops, pressed publish instead of edit. In any case, yes, logic is well worth any linguist's time, I agree. Thanks for the comments!

MISS LORI said...

It's great how you seem to have established a good following with others who are either involved in linguistics or have some knowledge of what you do. I still need you to help me find out how to advertise for my own blog. I particularly liked your push-up analogy, and the ending statement, nicely said.
"Math is push-ups for your mind. Nuff said."

Stan said...

Symbol manipulation in one area probably enhances one's effectiveness in others. I think it's fair to say that a basic understanding of maths is an advantage in almost any academic discipline, linguistics included. One could advance a similar case for music, or indeed psychology or history, and so on. Specialisation is de rigueur, unfortunately.

Chris said...

Stan, well said. Specialization can lead to re-inventing the wheel (as a square, usually) as well as general tunnel-vision. I'm a fan of multi-disciplinary teams. Sigh, these remain remarkably rare.

steve said...

*"I'm sure objections can be raised."

as a mathematician, i'd have to agree.

Chris said...

steve, yes, I probably could have just used "stats" and been safe. That's mostly what linguists need as a research tool (true for Mankiw's post as well). I used "math" simply because it's a term more people are going to relate to.

satur said...

just stumbled into your site:
do see "The Grid of Language: A Deep Structure Surfaces in Tagalog" by Luis Umali Stuart at LingBuzz (ling.auf.net/lingbuzz/000996).
i'd be happy to hear what you and your readers think.
LUS

WONDER_A said...

Great. Here I was deciding what branch of Linguistics to specialize in and I find this article. I'm severely dyscalculic!

techczech said...

I don't disagree with the sentiment but I must take strong exception to this statement: "mathematical models are fast becoming the best way to understand complex phenomenon"

1. Mathematical models do not equate to understanding; they are what they are, i.e. models: mathematical nets cast over the ocean of language to catch mathematically-catchable phenomena. On their own, they do nothing. For instance, what does Zipf's law tell us about language? Nothing. Nevertheless, it is a very good model of a certain property of written texts. What has tree-representation syntactic structure told us about language? Not a single thing (unless you're a Chomskyan). However, it is extremely useful for modelling certain aspects of syntax. Hidden Markov models proved the best way to simulate speech recognition. Yet, they do not help us understand how we recognize fluent speech in the least. They model certain properties of speech to make them computable.

Mathematical models never describe (let alone understand) reality. They only model its computable aspects. Sure, they have been very successful in many practical (spectacular) respects. But they have failed just as often - NLP being the prime example.

2. It is true that certain phenomena (such as word frequencies, results of experiments, probabilities, etc.) can only be approached through mathematical models. But these models tend to serve as metaphors for non-mathematical understanding and do not represent actual understanding. Your example in a later post about the mystery of the real impact of word frequencies is a case in point.

Chris said...

techczech, yes, you make some fair points. There are limits to the role math can play in any scholarly field. Nonetheless, I still think any 24-year-old entering a PhD program today would be putting themselves in a serious career hole if they didn't make some attempt at math competency.