Monday, April 14, 2008

Bacon Strength

Having only just recently taken the Netflix plunge, I had been ignoring the flurry of interest amongst computational linguists about Recommender Systems. I am now fully aware of the profound need for, and utility of, improving said systems. Somehow, Netflix got from the set [Blue Velvet, Chinatown, Midnight Cowboy] to the recommendation The Wild Bunch. There must be a sub-culture growing around the absurdity and humor derivable from such recommendations. Imagine you decided to follow such recommendations religiously. Honestly, how long would it take you to get to Glitter? Scary thought, huh? Now you realize how crucial Recommender Systems are to the survival of humankind.

It seems to me that an automated version of Six Degrees of Kevin Bacon ought to work AT LEAST this well, right? You simply recommend any movie that shares a cast member with a rated movie. The closer a movie is to one you've rated in the Kevin Bacon network, the more strongly you recommend it. Let's call this Bacon Strength. Hmmmmm, wait a second, I might be on to something ... this could be bigger than Google ... why am I telling YOU people about this ... the idea is mine, do you hear! MINE!!!!
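
For the curious, here is a rough sketch of what Bacon Strength might look like in Python. Everything in it is an assumption on my part: the function name bacon_strength, the placeholder movies and actors, and especially the scoring rule (1 divided by the number of shared-cast hops to the nearest rated movie), which is just one plausible way to turn "closer" into "stronger". It runs a breadth-first search outward from the rated movies over shared cast members.

```python
from collections import defaultdict, deque


def bacon_strength(casts, rated):
    """Score unrated movies by shared-cast distance from the rated ones.

    casts: dict mapping movie title -> set of cast member names
    rated: iterable of movie titles the user has already rated
    Returns {movie: 1 / hops}, where hops is the number of shared-cast
    links to the nearest rated movie (an assumed scoring rule).
    """
    # Index movies by cast member so shared-cast neighbours are easy to find.
    movies_by_actor = defaultdict(set)
    for movie, cast in casts.items():
        for actor in cast:
            movies_by_actor[actor].add(movie)

    # Multi-source breadth-first search starting from every rated movie.
    distance = {movie: 0 for movie in rated if movie in casts}
    queue = deque(distance)
    while queue:
        current = queue.popleft()
        for actor in casts[current]:
            for neighbour in movies_by_actor[actor]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)

    # Rated movies (distance 0) are not recommendations; unreachable
    # movies never enter `distance`, so they get no score at all.
    return {movie: 1.0 / hops for movie, hops in distance.items() if hops > 0}


# Made-up placeholder data, not real filmographies.
casts = {
    "Movie A": {"Actor 1", "Actor 2"},
    "Movie B": {"Actor 2", "Actor 3"},
    "Movie C": {"Actor 3", "Actor 4"},
    "Movie D": {"Actor 5"},
}
scores = bacon_strength(casts, rated=["Movie A"])
for movie, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(movie, score)
```

With this toy data, Movie B (one hop from Movie A) outranks Movie C (two hops), and Movie D, which shares no cast chain with anything rated, is never recommended. A real version would presumably also want to weight by the ratings themselves, which this sketch ignores.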

Plus, I'm completely amazed that at least four Chuck Norris movies are available for immediate online viewing, but only the first season of the new Dr. Who. wtf?

3 comments:

Jason M. Adams said...

Yeah, the online viewing selection blows. I was very disappointed with that when I signed back up in January or so. It got Glitter right for me (1 star), but I've also rated a relative crapload, so it should have plenty of data. I haven't seen it, but watching anything with Mariah Carey makes me start sawing at body parts with dull razors.

And friend me if you're bored.

THOMAS AMUNDSEN said...

Hi,

This might sound weird.

I am going to grad school this fall (for NLP), and I happened to stumble upon your pdf of which companies hire Computational Linguists.

I haven't yet decided where I am going to go to school. I was just wondering if you know about the quality and reputation of any of these programs - M.S. in Computer Science (concentration on NLP) at Georgia Tech, Columbia, or U Southern California. And the Professional Master's in Computational Linguistics at U Washington (in Seattle).

It's really hard to qualitatively compare these programs as there are no rankings for CL. I've talked to a few people that work in the field, but haven't got too much valuable insight.

Anyway, I'd appreciate any advice you can give me in my decision.

Thanks!

Chris said...

Thomas, it's a good question. As you said, there are no rankings for computational linguistics programs, so word of mouth is all anyone has to go on.

Of the three programs you mentioned, I'd say USC is probably the strongest (they also have paid summer internships!), but Emily Bender at Washington is very well respected. I don't know much about Georgia Tech.

I think the crucial question is about your goals: do you want to be an academic working on high level problems like parsing and discourse (in which case you're looking at getting a PhD), or do you want to get a job in industry?

If industry is your answer, the school you choose won't really matter that much; it's the skills you develop. I'd strongly advise you to develop competency with machine learning, if you haven't already. You don't have to be great at it, just competent. That's a highly marketable skill set now, and will be for the foreseeable future. General competence with statistics and corpus linguistics is highly valued.

So, I'd ask each program where stats and ML fit into their programs (or how much flexibility they give you for taking electives).

I think this topic is worth a full post on the blog. I'll try to post something more substantive this weekend (I'd love to hear Jason's thoughts too).

And, just for the record, SUNY Buffalo has an M.S. in CL too. Not too late to apply. You can kinda surf Lake Erie (gotta be better than Georgia surfing).
