Saturday, April 26, 2008
Smitten with Kunis
Wednesday, April 23, 2008
"LingPipe, I hate you"
Actually, no, I do not hate LingPipe. But someone does. It is the entertaining aspect of Sitemeter that led me to this discovery.
Occasionally I check my Sitemeter page view details because it's comforting to see that people actually do read my blog (even if y'all don't comment, thpppt!). But far more entertainment value comes from the information about how someone got to my site: I can see what search words brought them here. I've been collecting some of the more amusing ones and meaning to post about them, but today I discovered someone had gotten to my blog by searching Google for, and I quote, "lingpipe ihate you".
I don't know what deviltry the evil duo at LingPipe is up to, but they appear to have made an enemy.
Monday, April 21, 2008
On Jobs and NLP Degrees...
Does the NLP community out there care to contribute words of wisdom to the next generation of CL/NLP newbies?
You may wish to read my own discussion of what I perceive to be the difference between CL and NLP here.
Here was my advice to Thomas. You're free to attack it viciously.
I think the crucial question is about your goals: do you want to be an academic working on high-level problems like parsing and discourse (in which case you're looking at getting a PhD), or do you want to get a job in industry (a PhD is good in industry, but there are plenty of NLP jobs at the Master's level, even some for a Bachelor's)?
If industry is your answer, the school you choose won't really matter that much; it's the skills you develop. I'd strongly advise you to develop competency with machine learning, if you haven't already. You don't have to be great at it, just competent. That's a highly marketable skill set now, and will be for the foreseeable future. General competence with statistics and corpus linguistics is highly valued.
So, I'd ask each program where stats and ML fit into their programs (or how much flexibility they give you for taking electives).
And, just for the record, SUNY Buffalo has an M.S. in CL too. Not too late to apply. You can kinda surf Lake Erie (gotta be better than
Here's a representative sample of the "requirements" from those Linguist List job postings. Taken all together, they may look intimidating, but this is a mash-up of ten+ postings. It's just meant to sketch what industry is looking for.
- Experience in one or more of the following: MS SQL Database Server; Internet Information Services/Apache Tomcat; Windows operating systems; .NET; Java.
- Strong programming skills in at least two of the following programming languages: Python, C++, Java and Perl
- Multimodal statistical algorithms for language processing and modeling in both speech and handwriting applications
- Develop tools for efficiently processing corpora of speech and/or sketch/handwriting data;
- Work with a team of researchers and developers to successfully integrate research components and validate functionality;
- Experience desired with statistical language modeling for either speech or handwriting applications (e.g., familiarity with CMU-Cambridge LM toolkit, SRILM toolkit, ATT FST toolkit, MALLET, Libbow, etc.);
- Strong algorithmic skills and analytical background;
- Demonstrated success in working in a fast-paced environment;
- Ability to work effectively and successfully either independently and/or in a collaborative team environment.
- Experience in the creation and exploitation of domain and task ontologies in text analytics
- Strong background in statistical modelling required.
- Knowledge of machine translation or natural language processing techniques
- Ability to perform linguistic data analysis.
- Proficiency in one or more scripting languages (Perl, Python, Ruby) or programming languages, particularly C++, is a plus.
- MS or PhD in Computational Linguistics or related field.
- Work experience in production-quality NLP systems.
- Familiarity with Unix/Linux operating system environment is a plus.
- Experience in machine learning, information retrieval, or data mining are all pluses.
- Experience in the building of domain-specific ontologies is useful
- Experience in statistical analysis and machine learning
- Development, analysis, and support of grammar engine rules for English
- Experience in corpus or text analysis, conversation analysis, or computational linguistics
- Experienced architect/developer to design scalable enterprise application friendly implementations of spell checking, sentiment, named entity extraction
Monday, April 14, 2008
Bacon Strength
It seems to me that an automated version of Six Degrees of Kevin Bacon ought to work AT LEAST this well, right? You simply recommend any movie that shares a cast member with a rated movie. The closer two movies are in a Kevin Bacon network, the more strongly you recommend it. Let's call this Bacon Strength. Hmmmmm, wait a second, I might be on to something ... this could be bigger than Google ... why am I telling YOU people about this ... the idea is mine, do you hear! MINE!!!!
Plus, I'm completely amazed that at least four Chuck Norris movies are available for immediate online viewing, but only the first season of the new Dr. Who. wtf?
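For the curious, here's a rough sketch of how Bacon Strength might be computed. It's only one guess at the idea: two movies count as linked if they share a cast member, a breadth-first search measures how many links separate a rated movie from every other movie, and the strength is taken to be 1 divided by that distance. The function names, the 1/distance scoring, and the toy cast data are all my own assumptions for illustration, not anything a real recommender is known to use.

```python
from collections import deque


def bacon_distances(casts, start):
    """Breadth-first search over movies; two movies are linked if they share a cast member.

    casts maps a movie title to the set of its cast members.
    Returns a dict of movie -> number of links from start (reachable movies only).
    """
    # Index: actor -> set of movies that actor appears in.
    movies_by_actor = {}
    for movie, cast in casts.items():
        for actor in cast:
            movies_by_actor.setdefault(actor, set()).add(movie)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        movie = queue.popleft()
        for actor in casts[movie]:
            for other in movies_by_actor[actor]:
                if other not in dist:
                    dist[other] = dist[movie] + 1
                    queue.append(other)
    return dist


def bacon_strength(casts, rated_movie):
    """Assumed scoring: strength = 1 / distance, so closer movies score higher."""
    dist = bacon_distances(casts, rated_movie)
    return {movie: 1.0 / d for movie, d in dist.items() if d > 0}


# Hypothetical toy data: movie titles and cast names are made up.
toy_casts = {
    "Movie A": {"actor_x", "actor_y"},
    "Movie B": {"actor_y", "actor_z"},
    "Movie C": {"actor_z", "actor_w"},
    "Movie D": {"actor_q"},
}

strengths = bacon_strength(toy_casts, "Movie A")
for movie, score in sorted(strengths.items(), key=lambda item: -item[1]):
    print(movie, score)
```

In the toy data, Movie B shares a cast member with Movie A and gets the strongest recommendation, Movie C is two links away and gets a weaker one, and Movie D shares nobody with anything and is never recommended at all.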