Monday, April 21, 2008

On Jobs and NLP Degrees...

Thomas posted an interesting quandary recently. I'll summarize it this way: How does a person choose which M.S. program in NLP to attend? As far as Thomas and I are aware, there are no rankings for computational linguistics/NLP programs; so, is word of mouth all anyone has to go on? Does anyone out there know of any resources for helping someone like Thomas?

Does the NLP community out there care to contribute words of wisdom to the next generation of CL/NLP newbies?

You may wish to read my own discussion of what I perceive to be the difference between CL and NLP here.

Here was my advice to Thomas. You're free to attack it viciously.

I think the crucial question is about your goals: do you want to be an academic working on high-level problems like parsing and discourse (in which case you're looking at getting a PhD), or do you want to get a job in industry (a PhD is good in industry, but there are plenty of NLP jobs at the Master's level, and even some at the Bachelor's level)?

If industry is your answer, the school you choose won't really matter that much; it's the skills you develop. I'd strongly advise you to develop competency with machine learning, if you haven't already. You don't have to be great at it, just competent. That's a highly marketable skill set now, and will be for the foreseeable future. General competence with statistics and corpus linguistics is highly valued.

So, I'd ask each program where stats and ML fit into their programs (or how much flexibility they give you for taking electives).

And, just for the record, SUNY Buffalo has an M.S. in CL too. Not too late to apply. You can kinda surf Lake Erie (gotta be better than Georgia surfing).

(pssst, context for the surfing reference can be found on Thomas’ profile).

I scanned the last 10 or so NLP-related non-academic job postings on The Linguist List and found a fair bit of consistency in the skills they were asking for. Above all else, they all wanted good programming skills. If you search Monster.com for "computational linguistics", I think you'll see an even greater emphasis on programming skills.

Here's a representative sample of the "requirements" from those Linguist List job postings. Taken all together, they may look intimidating, but this is a mash-up of ten+ postings. It's just meant to sketch what industry is looking for.
  • Experience in one or more of the following: MS SQL Database Server; Internet Information Services/Apache Tomcat; Windows operating systems; .NET; Java.
  • Strong programming skills in at least two of the following programming languages: Python, C++, Java and Perl
  • Multimodal statistical algorithms for language processing and modeling in both speech and handwriting applications
  • Develop tools for efficiently processing corpora of speech and/or sketch/handwriting data;
  • Work with a team of researchers and developers to successfully integrate research components and validate functionality;
  • Experience desired with statistical language modeling for either speech or handwriting applications (e.g., familiarity with CMU-Cambridge LM toolkit, SRILM toolkit, ATT FST toolkit, MALLET, Libbow, etc.);
  • Strong algorithmic skills and analytical background;
  • Demonstrated success in working in a fast-paced environment;
  • Ability to work effectively and successfully either independently and/or in a collaborative team environment.
  • Experience in the creation and exploitation of domain and task ontologies in text analytics
  • Strong background in statistical modelling required.
  • Knowledge of machine translation or natural language processing techniques
  • Ability to perform linguistic data analysis.
  • Proficiency in one or more scripting languages (Perl, Python, Ruby) or programming languages, particularly C++, is a plus.
  • MS or PhD in Computational Linguistics or related field.
  • Work experience in production-quality NLP systems.
  • Familiarity with Unix/Linux operating system environment is a plus.
  • Experience in machine learning, information retrieval, or data mining are all pluses.
  • Experience in the building of domain-specific ontologies is useful
  • Experience in statistical analysis and machine learning
  • Development, analysis, and support of grammar engine rules for English
  • Experience in corpus or text analysis, conversation analysis, or computational linguistics
  • Experienced architect/developer to design scalable enterprise application friendly implementations of spell checking, sentiment, named entity extraction

9 comments:

THOMAS AMUNDSEN said...

Hi Chris,

Thanks for taking the time to reply to my post and publishing a blog about the topic.

Those are all good points to consider.

Just for the record, the Professional Master's program in Computational Linguistics at the University of Washington involves _no_ Machine Learning classes.

Georgia Tech has two machine learning classes that I can remember from when I was an undergrad there. (I actually took one of them for a few weeks before dropping the class.) Columbia only has one machine learning class. I'm not so sure about USC.

But the program at Washington looks like it's very focused on the application side of things, with lots of projects and hands-on experience.

Anyway, thanks for the advice!

Anonymous said...

If you want to go into academia, apply to a Ph.D. program directly unless you have lots of background to make up (usually in math and CS).

These departments all have different perspectives determined by their researchers and their faculty administrators. So your best bet is to check out the kinds of projects the professors work on and the grad students work on.

Visit and talk to the other students to see what they're doing. It may seem expensive and time-consuming, but it will be well worth it in the long run.

If you want to go somewhere good for comp ling, where are (1) Stanford, (2) Penn, (3) CMU and (4) UMass on your list? I know Stanford and CMU offer MS programs.

I also just wrote my own set of advice for students in the form of a blog post on a computational linguistics curriculum that'd be suitable for any level. No reason not to get started with the reading on your own.

Chris said...

Bob, thanks for the input. You're the right guy to give advice on this, to be sure.

Anonymous said...

Hi, I'm months late in commenting, but I'm a current comp ling MS student at Georgetown and all my electives have been in machine learning (they're very good about taking classes in the ling dept or cs dept). However, it's not a "professional masters" program in that there is a thesis requirement, but at least that demonstrates some research ability, which should make some employers happy, right?

Chris said...

Marianna, it's never too late to leave a comment, hehe.

And no, haha, demonstrating research ability will have little to no impact on an entry level NLP job, sorry.

I think Bob's post that he linked to above is well worth reading; I bow to his superior intellect. There are few people in the world better suited to giving this kind of advice than he.

I will say, and Bob or anyone else is welcome to comment of course, that I wouldn't put too much faith in industry's concern for written research. The best thing you can do is have code samples up and ready for inspection. One well-known CL company once asked me for a code sample 10,000 lines or longer. That is not the norm, of course; nonetheless, you should be prepared to show some original code if you want a serious position.

It might help if you take a look at the blog I link to from my main page, The Mendicant Bug, written by Jason, a recent MS in NLP from Carnegie Mellon who has just started his first job in industry.

And damn, I miss DC. Go have a drink at the Black Cat for me, okay?

Jessica said...

I'm late as well in posting, but I was wondering if you are aware of any good master's programs in Europe? I'm living in the Netherlands and currently finishing up my degree. I began in America with Computer Science (my university didn't offer anything with NLP which is part of the reason I'm here now), and now I'm doing Information Science at the University of Groningen in The Netherlands. I want to apply to a solid Master's program and then go into industry (hopefully something nlp/bio related).

Thanks :)

Chris said...

Jessica, thanks for the question. It's never too late. Blog posts exist outside of time, hehe.

The three European CL programs I think of first are these:

Germany -- Saarland University, Saarbrücken, Department of Computational Linguistics and Phonetics

Scotland - University of Edinburgh, Center for Cognitive Science

Switzerland - University of Edinburgh, Center for Cognitive Science

My apologies to all the good programs I have omitted.

As for Bio NLP, I suggest you contact Bob Carpenter for his suggestions; he works on that professionally and he knows everyone (and everyone knows him, hehe). You should also Google around for Bio NLP conferences and see who's been presenting and where they teach.

May I assume you have already attended an ESSLLI conference?

Also, I will repeat my basic piece of NLP advice: become competent in machine learning.

Jessica said...

I went to the Computational Linguistics Fall School in Potsdam, Germany last year (well, 2007 now), but I wasn't able to make it to ESSLLI in 2008 (I had just moved over here). I'm hoping to go this year. Now that I think about it, one of the professors at the Fall School was from the University of Edinburgh.

Thanks for the suggestions! I will check out those schools and try to talk with Bob Carpenter eventually here. I've used LingPipe briefly before but never looked at the blog :)

Abhishek Patnia said...

I am attending the HLT program at USC. Here are the highlights. The three main courses are:

1. NLP (the basics)
2. Statistical NLP (ML application to NLP)
3. Probabilistic Graphical Models

Also, the professors are good. Some of the big names are:

1. Hovy
2. Knight
3. Huang
4. Hobbs
