Via Sitemeter, I discovered that someone from Chrahnoh ('Toronto') stumbled onto my blog by Googling "How would a linguist respond to the semantic web". Having never actually posted on the topic, I nonetheless found it an intriguing question worth some follow-up.
As a primer, the semantic web is a movement, of sorts. Its goal is to make data on the web more easily processed by computers by categorizing it better. The point is to make humans do less and computers do more. This should make the web more efficient for humans because it will make finding things and doing things online easier and faster.
There are several semantic web strategies, but they mostly involve categorization, as far as I can tell. The idea is that a pre-categorized set of web pages is easier to automatically sort and process than non-categorized pages. Just like a library. If a library is composed of a pile of books on a floor, it will be difficult to find what you want. But if that library is organized alphabetically and cross-referenced every which way, it is much easier to use. So, the semantic web is an attempt by humans to über-categorize web pages. This can be done by enforcing mark-up standards like HTML, which already requires web page source code to look a certain way. It could also be accomplished by post-processing: after someone has put up a web page, a bot comes along, processes it, and then assigns some categorization/indexing (this is Google-like). We're getting into heavy philosophical territory here, the kind that befuddled the greatest minds in history, including Wittgenstein, Russell, and Aristotle. There is a long and difficult history behind the idea of trying to categorize the way the world is: ontology.
My first impression is that linguists would love this. Linguists love ontologies and rules and categorization. Yippie! Linguists would insist on a certain cognitively natural ontology, but the basic idea fits nicely into the zeitgeist of traditional linguistics.
Having said that, this lone lousy linguist has trepidations. It seems ass-backwards. Imagine I started a movement to make the world an easier place to live in, so my strategy was to walk around sticking post-it notes onto EVERYTHING. If we could just put all the necessary post-it notes onto everything in the whole world, then everyone would know what everything is just by looking at the notes. Cool idea, huh!
No, bad idea. It's a classic fool's errand. While there may be a universal ontology, no one knows what it is. More to the point, it puts effort in the wrong place. We humans don't have post-it notes on everything to look at. We have a cognitive capacity that helps us look at new things and figure out what to do with them. We all have a super-Google in our heads, developed by evolution over a million years. It's not clear how much categorization information we store, but we clearly store associations between things. But I think it is the strategies for dealing with new things that make human cognition so powerful, not a reliance on fitting things into an ontology.
I think that's the right model for the web. Let everyone put everything online. Develop smart search technologies that can look at new, uncategorized things and figure out what to do with them right now, on the fly.
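To make the contrast concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the page names, the page text, the tags, and the two function names are my own assumptions, and the word-overlap scoring is only a crude stand-in for "smart search", not how any real search engine works. The point is just that a lookup by pre-assigned tags misses a page nobody has tagged yet, while a search that reads the text on the fly can still find it.

```python
import re
from collections import Counter

# Hypothetical toy "web": page names and text are invented for illustration.
PAGES = {
    "page_a": "Recipes for baking sourdough bread at home",
    "page_b": "A linguist looks at the semantic web and ontologies",
    "page_c": "Bread and butter issues in syntax for the working linguist",
}

# Strategy 1: pre-assigned categories (the post-it notes).
# page_c is new, and nobody has gotten around to tagging it.
TAGS = {
    "page_a": {"cooking"},
    "page_b": {"linguistics", "web"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def find_by_tag(tag):
    """Return the pages that carry the given pre-assigned tag."""
    return sorted(name for name, tags in TAGS.items() if tag in tags)


def search_on_the_fly(query):
    """Score every page against the query by reading its text directly.

    The score is the number of times the query words occur in the page.
    Pages with a score of zero are dropped; ties are broken by page name.
    """
    query_words = set(tokenize(query))
    scores = {}
    for name, text in PAGES.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score:
            scores[name] = score
    return sorted(scores, key=lambda name: (-scores[name], name))
```

With this toy data, `find_by_tag("linguistics")` only turns up `page_b`, because the untagged `page_c` is invisible to it, whereas `search_on_the_fly("linguist")` finds both `page_b` and `page_c` without anyone having labeled anything.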