I recently received an email from a US undergraduate interested in tools and resources for NLP, particularly free tagged corpora. Luckily, the NLP field has matured into an open-access-friendly crowd, so there are lots of resources freely available. Maybe too many. To be honest, too many search hits are a pain. Newbies aren't looking for ridiculously long lists of resources to pick through, exactly BECAUSE they're newbies! They don't know how to choose between them. And all too often experienced NLPers will simply push their pet language or resource, not because it's appropriate for newbies, but because it's the expert's pet.
So my unsolicited teachable moment #333256: give newbies/students recommendations that are appropriate for them, not appropriate for you.
For example, with all due respect, no newbie NLPer should go anywhere near the Stanford NLP Annotated List of Resources. I'm the first to admit that's a GREAT list of resources. No argument from me. But most of those resources require at least basic familiarity with NLP before starting (most require more).
For true newbies, The Natural Language Toolkit remains my preferred option. Its excellent teaching book, tutorials, packaged corpora and data, and solid documentation make it the reigning king of NLP intro tools. Plus, it's a mature enough toolkit to be used for more extensive projects. Hard to go wrong.
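If "tagged corpus" still sounds abstract, here is a minimal sketch in plain Python of what one boils down to: words paired with part-of-speech tags. It assumes the slash-delimited word/TAG convention; the sample sentence and the `parse_tagged` helper are made up for illustration and are not part of the NLTK. The NLTK's packaged corpora give you the same kind of (word, tag) pairs without any hand parsing.

```python
from collections import Counter

# Hypothetical sample in the "word/TAG" style; not taken from any real corpus.
SAMPLE = "The/DT dog/NN barked/VBD at/IN the/DT mail/NN carrier/NN ./."


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a word containing '/' stays intact.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(SAMPLE)

# Count how often each tag occurs, a typical first exercise with a tagged corpus.
tag_counts = Counter(tag for _, tag in pairs)

print(pairs[:3])
print(tag_counts.most_common(2))
```

Once that clicks, swapping the toy sentence for a real packaged corpus is a small step.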
FWIW, this post was not a paid endorsement of any kind. I have no professional or personal relationship with anyone involved in the NLTK. I follow several people involved with the project on Twitter. That's as close to a personal involvement as I get. This post is not meant as a commercial advertisement, but rather as my own personal opinion.
2 comments:
I started with the NLTK, but I've moved to Java and R since I'm not THAT interested in NLP, but rather Corpus Linguistics. But yeah, the NLTK is great for beginners.
Matias: yes, I hear a lot of that. NLTK is a great starter, and mature enough to handle some serious projects. But R is like a black hole; it just seems to be drawing all things statistical into it.