Sunday, December 9, 2007

The Ling-O-Sphere

I spent a good deal of Sunday afternoon trolling around linguistics blogs. While there are dozens of linguists with blogs, it’s hard to keep track of them all. The Linguist List has a modest static list here. When I scan the blog roll at Language Log, it’s not even clear which ones are dedicated primarily to linguistics, since many of the blog names are intentionally obscure. Also, many are defunct or stale, as wishydig recently noted. I found a couple which had no postings in two years, and many with none for months. (UPDATE: while doing something else mildly productive, I literally clicked on EVERY single blog listed in Language Log's blog roll. If you deleted each one that was either dormant for at least 6 months or had little linguistics content, you’d delete at least 70%.)

It would be nice to create a single site that aggregates all of our posts with regular updates. I mean something beyond Technorati or Digg or

I put the term “linguistics” into each of the three major social bookmarking sites above and frankly, the results were far from encouraging. Even though Technorati has a “blogs” tab, the first page of hits was not really linguistics blogs, as far as I could tell (the second page was more relevant). The Digg results were disappointing, to say the least: one reference to a Chomsky interview and one to a study on swearing, but again, none of the top hits appeared to be from blogs I would consider “linguistic blogs” (e.g., none are on the Language Log Other language blogs list). The returns at least put Language Log on top, but most of the first page returns were resource pages for computational linguistics, not blogs per se.

Imagine a site which automatically checks a given set of linguistics websites, then updates a topic cloud which clusters posts according to relevance for a particular topic, with links to each post within the cloud, plus a blog roll of all participating blogs on the right margin. I could imagine this happening in one of two ways (I prefer the first, but it's computationally complicated):

1) Search the participating blogs and perform some sort of cluster analysis of the words in each post, taking all the posts together as a corpus (perhaps an LSA style analysis), then create the cloud.

2) Create a fixed set of topic key words, and search for semantically similar words in each post. I could imagine WordNet being used for this.
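To make the two options a bit more concrete, here is a rough Python sketch of each. It is only an illustration under some loud assumptions: option 1 uses plain TF-IDF vectors with cosine similarity and a greedy one-pass grouping as a much simpler stand-in for an LSA style analysis (real LSA would apply SVD to the term-document matrix), and option 2 uses a small hand-written word table (the hypothetical TOPIC_WORDS) where a real system might look up related words in WordNet. All function names, the threshold value, and the sample topics are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(posts):
    """Turn each post into a sparse {word: weight} TF-IDF vector."""
    token_lists = [tokenize(p) for p in posts]
    # Document frequency: in how many posts does each word appear?
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    n = len(posts)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {w: c * math.log(n / doc_freq[w]) for w, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(posts, threshold=0.1):
    """Option 1 (simplified): greedy one-pass clustering.

    Each post joins the first cluster whose first post is similar enough,
    otherwise it starts a new cluster. Returns lists of post indices.
    The threshold is an arbitrary assumption, not a tuned value.
    """
    vectors = tfidf_vectors(posts)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical hand-written table standing in for WordNet lookups.
TOPIC_WORDS = {
    "syntax": {"syntax", "parse", "parsing", "clause", "grammar"},
    "phonetics": {"phonetics", "vowel", "consonant", "formant"},
}


def tag_topics(post, topic_words=TOPIC_WORDS):
    """Option 2: label a post with every topic whose word set it touches."""
    tokens = set(tokenize(post))
    return sorted(t for t, words in topic_words.items() if tokens & words)


if __name__ == "__main__":
    sample_posts = [
        "Parsing relative clause syntax with a new grammar formalism",
        "Vowel formant measurements for phonetics fieldwork",
    ]
    print(cluster(sample_posts))
    print([tag_topics(p) for p in sample_posts])
```

Either result could feed the cloud: the clusters from option 1 or the topic labels from option 2 become the groupings, and each post index maps back to a link. Option 2 is easier to build but only ever finds the topics somebody thought to list, which is part of why I prefer the first.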

Whadda y'all thank?


Jason M. Adams said...

I've actually considered doing this. It wouldn't be hard, but it would take a bit of time and grad student life doesn't allow me to do much extra (and blogging takes up the rest of that time).

Chris said...

Thanks for the comments Jason. I kinda thought you might be a "go to" guy for something like this. Maybe this'll be a summer project, hehe.

Jason M. Adams said...

hehe, maybe. I actually just registered, so we'll see. is still up for grabs, though.

Wishydig said...

(Thanks for stopping by.)

Yeah-- it's the dynamic nature of blogging that makes this so difficult.

When I have the time to do so I go through all the blogs that the blogs in my sidebar link to. I find a few new ones occasionally. But every attempt to stay current results in a list that bursts like a supernova and eventually shrinks away to almost nothing.

It's the blogs that keep going strong that a helpful list would keep track of.
