Saturday, May 31, 2008
Linguists and The Semantic Web
As a primer, the semantic web is a movement, of sorts. Its goal is to make data on the web more easily processed by computers by categorizing it better. The point is to make humans do less and computers do more. This should make the web more efficient for humans because it will make finding things and doing things online easier and faster.
There are several semantic web strategies, but as far as I can tell, most of them involve categorization. The idea is that a pre-categorized set of web pages is easier to sort and process automatically than an uncategorized one. Just like a library. If a library is a pile of books on the floor, it will be difficult to find what you want. But if that library is organized alphabetically and cross-referenced every which way, it is much easier to use. So, the semantic web is an attempt by humans to über-categorize web pages. This can be done by enforcing mark-up standards, the way HTML already requires web page source code to look a certain way. It could also be accomplished by post-processing: after someone has put up a web page, a bot comes along, processes it, and then assigns some categorization/indexing (this is Google-like). We're getting into heavy philosophical territory here, the kind that befuddled the greatest minds in history, including Wittgenstein, Russell, and Aristotle. There is a long and difficult history behind the idea of trying to categorize the way the world is -- ontology.
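To make the post-processing option a bit more concrete, here is a toy sketch of what such a bot might do: read each page's text and file every word into an inverted index that maps terms to the pages containing them. This is a hypothetical illustration, not how Google actually works; the page URLs and function names are made up, and real search engines do far more than this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of (hypothetical) page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query term (a simple AND query)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index isn't mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical pages standing in for uncategorized web content.
pages = {
    "example.com/library": "A library organized alphabetically is easy to use.",
    "example.com/books": "A pile of books on a floor is hard to search.",
}
index = build_index(pages)
print(sorted(search(index, "library alphabetically")))  # prints ['example.com/library']
```

Note that the bot here assigns no categories at all; it only records which words occur where, which is closer to the "figure it out on the fly" approach than to a hand-built ontology.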
My first impression is that linguists would love this. Linguists love ontologies and rules and categorization. Yippie! Linguists would insist on a certain cognitively natural ontology, but the basic idea fits nicely into the zeitgeist of traditional linguistics.
Having said that, this lone lousy linguist has trepidations. It seems ass-backwards. Imagine I started a movement to make the world an easier place to live in, and my strategy was to walk around sticking post-it notes onto EVERYTHING. If we could just put all the necessary post-it notes onto everything in the whole world, then everyone would know what everything is just by looking at the notes. Cool idea, huh!
No, bad idea. It's a classic fool's errand. While there may be a universal ontology, no one knows what it is. More to the point, it puts effort in the wrong place. We humans don't have post-it notes on everything to look at. We have a cognitive capacity that helps us look at new things and figure out what to do with them. We all have a super-Google in our heads, developed by evolution over a million years. It's not clear how much categorization information we store, but we clearly store associations between things. Still, I think it is our strategies for dealing with new things that make human cognition so powerful, not a reliance on fitting things into an ontology.
I think that's the right model for the web. Let everyone put everything online. Develop smart search technologies that can look at new, uncategorized things and figure out what to do with them right now, on the fly.
Friday, May 30, 2008
Globalization and Language
It’s obvious that globalized communications and popular culture will tend to homogenize local language varieties — but some varieties of English seem to be diverging more rapidly than ever.
English is a tool, just like a piece of technology. Much of the world’s economy is tied up in English-speaking countries, and for that reason, English is like the cell phone provider offering the best plan. But if the dollar continues to drop, the most viable option could shift.
Languages evolve via as-yet-unknown cognitive mechanisms. I suspect that "globalized communications and popular culture" will not change the way languages evolve. At best they will simply speed up the existing process.
Saturday, May 24, 2008
On the strengths and weaknesses of “theoretically”
British books, ephemera, radio, newspapers, magazines (36m words)
American books, ephemera and radio (10m words)
British transcribed speech (10m words)
1) Corpus Concordance Sampler: provides the search word and the sentence it occurred in (well, not quite the sentence, but close enough for Saturday afternoon)
2) Collocation Sampler: provides the words that are statistically significantly associated with the search word (Mutual Information score plus a t-score of significance)
Collocate | Corpus Freq | Joint Freq | Significance |
it | 494702 | 33 | 3.555134 |
could | 59556 | 12 | 3.027003 |
least | 12333 | 8 | 2.717569 |
possible | 10266 | 6 | 2.342936 |
be | 234656 | 15 | 2.332595 |
can | 113012 | 9 | 2.042261 |
was | 340423 | 16 | 1.83627 |
should | 35882 | 5 | 1.828091 |
less | 14186 | 4 | 1.819667 |
is | 407114 | 18 | 1.803011 |
Collocate | Corpus Freq | Joint Freq | Significance |
the | 2313407 | 179 | 7.308255 |
most | 43653 | 53 | 7.069587 |
is | 407114 | 56 | 5.573264 |
best | 20161 | 32 | 5.531725 |
greatest | 2506 | 13 | 3.581149 |
important | 13468 | 12 | 3.327601 |
was | 340423 | 29 | 3.165729 |
finest | 1067 | 7 | 2.631592 |
beautiful | 4076 | 7 | 2.591662 |
more | 94468 | 11 | 2.316599 |
Collocate | Corpus Freq | Joint Freq | Significance |
would | 97660 | 1076 | 26.18472 |
it | 494702 | 2333 | 25.53529 |
will | 111798 | 1092 | 25.52538 |
i | 512080 | 2204 | 22.70137 |
think | 70465 | 779 | 22.29877 |
you | 421797 | 1797 | 20.274 |
ll | 34908 | 545 | 20.02152 |
be | 234656 | 1159 | 18.72308 |
d | 43704 | 462 | 16.97461 |
most | 43653 | 457 | 16.83863 |
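For anyone unfamiliar with the two measures the Collocation Sampler mentions, here is a minimal sketch of the common textbook formulas. It assumes the expected number of co-occurrences is E = f(node) × f(collocate) × span / N, with MI = log2(O / E) and t ≈ (O − E) / √O. I don't know exactly how the sampler computes its Significance column (window size, node frequency, and so on), so this sketch is not guaranteed to reproduce the numbers above; the node frequency, span, and function names are hypothetical.

```python
import math


def expected_cooccurrence(node_freq: int, collocate_freq: int,
                          corpus_size: int, span: int) -> float:
    """Co-occurrences expected by chance if the two words were independent.

    `span` is the number of word positions examined around each occurrence
    of the node word (an assumed window size; the sampler's is not stated).
    """
    return node_freq * collocate_freq * span / corpus_size


def mutual_information(observed: int, expected: float) -> float:
    """Mutual information in bits: log2 of observed over expected frequency."""
    return math.log2(observed / expected)


def t_score(observed: int, expected: float) -> float:
    """Common t-score approximation for collocations: (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


# Hypothetical example: a 56m-word corpus (36m + 10m + 10m, as listed above),
# an assumed node frequency of 1,000 and an assumed span of 8 words, applied
# to "could" (corpus freq 59556, joint freq 12) from the first table.
expected = expected_cooccurrence(1_000, 59_556, 56_000_000, 8)
print(f"E = {expected:.3f}, MI = {mutual_information(12, expected):.3f}, "
      f"t = {t_score(12, expected):.3f}")
```

MI rewards rare pairs that stick together (like "arguably finest"), while the t-score favors pairs that are frequent enough to be reliable, which is why function words like "it" and "the" rank so high in these tables.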
1. clearly the same meaning
2. kind of the same meaning
3. can’t decide
4. kind of different meaning
5. clearly different meaning
theoretically into probably
score | sentence |
5 | on the way much of the material treated |
5 | for me the course was very stimulating |
4 | |
4 | |
5 | Let's say it could be done -- |
2 | |
5 | |
3 | |
5 | `It" exists simply because it is |
2 | The Social Democrats |
theoretically into arguably
score | sentence |
5 | on the way much of the material treated |
5 | for me the course was very stimulating |
2 | |
2 | |
4 | Let's say it could be done -- |
4 | |
5 | probably claim that their disciplined, |
5 | |
5 | `It" exists simply because it is |
4 | The Social Democrats |
arguably into probably
score | sentence |
2 | The finishes are |
2 | |
2 | Orfeo is |
1 | A really dark orange, |
1 | for instance the first, and |
3 | Indeed, songs like `Green', |
2 | it belongs to |
2 | Wayne Westner, |
2 | |
2 | A week there is |
arguably into theoretically
score | sentence |
5 | The finishes are |
5 | |
5 | Orfeo is |
5 | A really dark orange, |
5 | for instance the first, and |
5 | Indeed, songs like `Green', |
5 | it belongs to |
5 | Wayne Westner, |
5 | |
5 | A week there is |
probably into arguably
score | sentence |
2 | your property is |
5 | You've |
2 | It |
4 | It |
3 | We came to agreement at 1am, |
4 | a member and an ANC team which it said would |
5 | I'm |
4 | it was the last that |
2 | Those in the high intelligence group are |
1 | It was |
probably into theoretically
score | sentence |
4 | your property is |
5 | You've |
3 | It |
5 | It |
5 | We came to agreement at 1am, |
4 | a member and an ANC team which it said would |
5 | I'm |
5 | it was the last that |
5 | Those in the high intelligence group are |
5 | It was |
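One quick way to summarize the six substitution tables is to average each one's judgments on the 1 to 5 scale (1 = clearly the same meaning, 5 = clearly different meaning). Here is a minimal sketch with the scores copied from the tables above; the dictionary name is just for illustration, and ten judgments per substitution is a tiny sample, so the averages are suggestive at best.

```python
from statistics import mean

# Scores copied from the six substitution tables above
# (1 = clearly the same meaning ... 5 = clearly different meaning).
substitution_scores = {
    "theoretically into probably": [5, 5, 4, 4, 5, 2, 5, 3, 5, 2],
    "theoretically into arguably": [5, 5, 2, 2, 4, 4, 5, 5, 5, 4],
    "arguably into probably": [2, 2, 2, 1, 1, 3, 2, 2, 2, 2],
    "arguably into theoretically": [5] * 10,
    "probably into arguably": [2, 5, 2, 4, 3, 4, 5, 4, 2, 1],
    "probably into theoretically": [4, 5, 3, 5, 5, 4, 5, 5, 5, 5],
}

for substitution, scores in substitution_scores.items():
    # A higher mean suggests the substitute word changes the meaning more.
    print(f"{substitution:30} mean {mean(scores):.1f}")
```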
Saturday, May 17, 2008
“hypercompetent”
In a recent Slate article, I ran across the following sentence:
Recently, polyglot conspiracy has posted about the sexism in the current political media coverage, and this may be an example.
Saturday, May 10, 2008
"Love Means Never Having to Say ..."
But I'm a linguist, so let's get down to business. As far as I can tell, she blogs in THREE languages!! Spanish, German, and English. Her most up-to-date posts appear to be in Spanish, so I presume this is her blog language of choice. However, as the weeks and months go by, some of her older posts appear in either English or German translations. I'm curious to know if she is translating these herself, or getting someone to translate for her? Some of the English is quite good and enjoyable (with occasional stutters, of course).
The current English post (from March 5) is on apologies. Linguists have long been interested in the apology as a speech act, of course. There are whole subfields of sociolinguistics and discourse pragmatics devoted to it.
I've long felt that my use of the casual apology has little to do with any attempt on my part to ask for forgiveness. The most common situation in which I use "I'm sorry" or "excuse me" is one where someone else has made a mistake of some sort. Imagine I'm walking into a store and someone has mistakenly used the entrance as the exit, and he bumps into me. I would most likely mumble lightly, "oops, sorry". Clearly, I am not at fault, yet I issue the apology. Why?
Here is my I-haven't-read-Grice-in-years analysis: by taking blame, so to speak, I am able to quickly signal to the offender that I am not issuing blame to them. Since they know they are to blame and not me (and they know that I know, blah blah blah), they can infer via the Maxim of Quality that I must be saying something else, like an indirect speech act. Using some chain of Gricean inference, they can probably construct the interpretation that I'm really saying "no apology is necessary".
It is an easy way for me to defuse their trepidation about MY reaction. At around 6 feet 4 inches and 260 lbs, I know I'm an intimidating presence. I don't want the other person to feel that their small mistake will be turned into a big one by the overreaction of some lumbering giant (actually, I'm quite quick on my feet, I was a helluva wrestler once, ya know).
So, here are 11 ways to say you're sorry (HT SenseList):
Catalan: Ho sento
Croatian: Žao mi je
Czech: Promiňte
Danish: Undskyld
Finnish: Anteeksi
Flemish: Het spijt me
Hungarian: Sajnálom
Luxembourgish: Et deet mer leed
Maltese: Jiddispjaċini
Norwegian: Beklager
Polish: Przepraszam
Cheers.
Monday, May 5, 2008
The Perils of Planning
“Such a decision would give
Sunday, May 4, 2008
Iron Man Linguistics
You see, the group which has kidnapped the unfortunate pair goes undefined throughout the movie. We are largely left to draw our own conclusions about their origin, ideology, and motivation (though we get some minor clarification late in the movie). The one thing we learn about their diversity is that they speak a wide variety of languages, as Yinsen lists some of them for Stark. I don't remember the full list, but I believe they included "Arabic, Ashkun, Farsi, Pashto" amongst others. So, kudos to the screenwriters for, at the very least, scanning Ethnologue for an appropriate set of languages to list.
But there's one other language that Yinsen mentions, and it got my attention: Hungarian. A few scenes after Yinsen lists the various languages the group speaks (a list that does not include Hungarian), he and Stark are being yelled at by an unnamed thug. Stark asks Yinsen what he's saying and Yinsen says something like "I don't know. He's speaking Hungarian."
This was meant as a bit of comic relief, I believe. So the screenwriters may have chosen Hungarian at random. Perhaps any language that American audiences would perceive as unusual or unexpected would have done the trick. Perhaps it would have been even funnier if he had said "I dunno, he's speaking Comanche (ba dum boom!)." I don't know, but my linguistics radar picked it up, and I went searching for any connections Hungary might have with Afghanistan.
Alas, I have found few. I would have to make some serious leaps of logic to connect the dots, and I don't think the movie was going for that. The clarifying scenes late in the movie suggest that this group's motivations are largely financial, not ideological or political, so we might assume this was some random Hungarian mercenary. As far as I can tell, this is the most logically consistent interpretation (unless I've misunderstood the movie's plot or dialogue, in which case ... never mind).