(screenshot from WebCorp)
I have posted before about the use of Google as a linguistics search engine here. Today, I ran across WebCorp Live, which lets a user run linguistically interesting searches using the web as a corpus. From their site:
WebCorp LSE is a fully-tailored linguistic search engine to cache and process large sections of the web. WebCorp LSE offers:
* enhanced sentence boundary detection
* date identification
* 'boilerplate' removal
* collocation and other statistical analyses
* grammatical tagging
* language detection
* full pattern matching and wildcard search
In spirit, this is quite similar to Mark Davies's excellent BYU Corpus resources. If I get a chance to play with it some more, I might try running some of my old dissertation searches through it. That should be a good test.
UPDATE: See my original post, Google Linguistics, which talks more specifically about using Google for research.