Does the NLP community out there care to contribute words of wisdom to the next generation of CL/NLP newbies?
You may wish to read my own discussion of what I perceive to be the difference between CL and NLP here.
Here was my advice to Thomas. You're free to attack it viciously.
I think the crucial question is about your goals: do you want to be an academic working on high-level problems like parsing and discourse (in which case you're looking at getting a PhD), or do you want to get a job in industry (a PhD is good in industry, but there are plenty of NLP jobs for Master's level, even some for Bachelor's)?
If industry is your answer, the school you choose won't really matter that much; it's the skills you develop. I'd strongly advise you to develop competency with machine learning, if you haven't already. You don't have to be great at it, just competent. That's a highly marketable skill set now, and will be for the foreseeable future. General competence with statistics and corpus linguistics is highly valued.
So, I'd ask each program where stats and ML fit into their programs (or how much flexibility they give you for taking electives).
And, just for the record, SUNY Buffalo has an M.S. in CL too. Not too late to apply. You can kinda surf Lake Erie (gotta be better than
(pssst, context for the surfing reference can be found on Thomas’ profile).
I scanned the last 10 or so NLP-related non-academic job postings on The Linguist List and found a fair bit of consistency in the skills they were asking for. Above all else, they all wanted good programming skills. If you search Monster.com for "computational linguistics" I think you'll see an even greater emphasis on programming skills.
Here's a representative sample of the "requirements" from those Linguist List job postings. Taken all together, they may look intimidating, but this is a mash-up of ten+ postings. It's just meant to sketch what industry is looking for.
- Experience in one or more of the following: MS SQL Database Server; Internet Information Services/Apache Tomcat; Windows operating systems; .NET; Java.
- Strong programming skills in at least two of the following programming languages: Python, C++, Java and Perl
- Multimodal statistical algorithms for language processing and modeling in both speech and handwriting applications
- Develop tools for efficiently processing corpora of speech and/or sketch/handwriting data;
- Work with a team of researchers and developers to successfully integrate research components and validate functionality;
- Experience desired with statistical language modeling for either speech or handwriting applications (e.g., familiarity with CMU-Cambridge LM toolkit, SRILM toolkit, ATT FST toolkit, MALLET, Libbow, etc.);
- Strong algorithmic skills and analytical background;
- Demonstrated success in working in a fast-paced environment;
- Ability to work effectively and successfully either independently and/or in a collaborative team environment.
- Experience in the creation and exploitation of domain and task ontologies in text analytics
- Strong background in statistical modelling required.
- Knowledge of machine translation or natural language processing techniques
- Ability to perform linguistic data analysis.
- Proficiency in one or more scripting languages (Perl, Python, Ruby) or programming languages, particularly C++, is a plus.
- MS or PhD in Computational Linguistics or related field.
- Work experience in production-quality NLP systems.
- Familiarity with Unix/Linux operating system environment is a plus.
- Experience in machine learning, information retrieval, or data mining are all pluses.
- Experience in the building of domain-specific ontologies is useful
- Experience in statistical analysis and machine learning
- Development, analysis, and support of grammar engine rules for English
- Experience in corpus or text analysis, conversation analysis, or computational linguistics
- Experienced architect/developer to design scalable enterprise application friendly implementations of spell checking, sentiment, named entity extraction