Why doesn't Google's translation tool automatically detect the language I paste in? This is not a terribly difficult problem to solve computationally. I suspect that if they took a bag o' trigrams (of characters, that is) from the input and compared it against a corpus for each language using some kind of simple tf–idf weight, they'd get a pretty high degree of accuracy. Here are some distinctive trigrams from a page on Omniglot. Wanna guess the language based solely on these? I doubt it will be difficult. And I suspect that just one or two of these trigrams are distinctive enough to make an accurate guess.
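Here's a minimal sketch of the bag-o'-trigrams idea, just to show how little machinery it needs. Everything in it is an assumption for illustration: the three one-sentence "corpora" are toy data I made up, the names `trigrams`, `build_models`, and `detect` are hypothetical, and the smoothed idf formula is one common variant, not anything Google is known to use.

```python
import math
from collections import Counter

def trigrams(text):
    """Return the character trigrams of text, lowercased and space-padded."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def build_models(corpus):
    """Build one unit-length tf-idf vector per language.

    corpus maps a language name to a sample text (toy data here).
    """
    counts = {lang: Counter(trigrams(text)) for lang, text in corpus.items()}
    # Document frequency: in how many languages does each trigram occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    n = len(corpus)
    models = {}
    for lang, c in counts.items():
        total = sum(c.values())
        # Smoothed idf, so trigrams shared by every language keep a small weight.
        weights = {g: (k / total) * (math.log((1 + n) / (1 + df[g])) + 1)
                   for g, k in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        models[lang] = {g: w / norm for g, w in weights.items()}
    return models

def detect(text, models):
    """Pick the language whose tf-idf vector best matches the input trigrams."""
    query = Counter(trigrams(text))
    scores = {lang: sum(weights.get(g, 0.0) * k for g, k in query.items())
              for lang, weights in models.items()}
    return max(scores, key=scores.get)

# Toy corpus: far too small for real use, but enough to show the mechanics.
corpus = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro duerme al sol",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
}
models = build_models(corpus)
print(detect("the dog sleeps", models))
print(detect("el perro duerme", models))
print(detect("der hund schläft", models))
```

A real detector would need much larger per-language samples; with one sentence each, anything that doesn't overlap the samples is basically a coin flip.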
UPDATE: thanks to the commenters for schooling me on this. In fact, Google DOES have a detect language function. I've been trying to find documentation on their methods but haven't had much luck. I did find this discussion of a different language detector that works rather differently from what I proposed. Rather than compare trigrams of letters to language models, it looks up whole words in dictionaries. While I admit to the greater simplicity of this method, I think my idea is more betterer 'cause it's more linguisticy.
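For comparison, the whole-word approach can be sketched in a few lines. This is my guess at the general shape, not the actual code of the detector discussed above; the word lists are tiny made-up stand-ins for real dictionaries, and `detect_by_words` is a hypothetical name.

```python
def detect_by_words(text, wordlists):
    """Count how many input tokens appear in each language's word list."""
    tokens = text.lower().split()
    hits = {lang: sum(token in words for token in tokens)
            for lang, words in wordlists.items()}
    return max(hits, key=hits.get)

# Toy word lists standing in for full dictionaries (an assumption).
wordlists = {
    "english": {"the", "and", "dog", "cat", "is"},
    "spanish": {"el", "la", "y", "perro", "gato", "es"},
    "german": {"der", "die", "und", "hund", "katze", "ist"},
}
print(detect_by_words("the dog and the cat", wordlists))
print(detect_by_words("el perro y el gato", wordlists))
```

It really is simpler, but it knows nothing about a word it hasn't seen, whereas trigrams can still pick up on spelling patterns inside unfamiliar words.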
Notes on my searching:
Lots of programming language detecting tools.
Several human language detecting tools, but few discuss their methodology.