Saturday, November 28, 2009

Google Linguistics 2

(screen shot from WebCorp)

I have posted before about the use of Google as a linguistics search engine here. Today, I ran across WebCorp Live, which allows a user to perform some linguistically interesting searches over the web as a corpus. From their site:

WebCorp LSE is a fully-tailored linguistic search engine to cache and process large sections of the web. WebCorp LSE offers:

* enhanced sentence boundary detection
* date identification
* 'boilerplate' removal
* collocation and other statistical analyses
* grammatical tagging
* language detection
* full pattern matching and wildcard search
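To give a rough sense of what two of these features involve, here is a minimal Python sketch of wildcard search and collocate counting over a toy corpus. This is not WebCorp's implementation or API. The sentences are invented, and the function names (`wildcard_to_regex`, `search`, `collocates`) and the convention that `*` matches exactly one word are my own assumptions for illustration.

```python
import re
from collections import Counter

# Toy corpus: invented sentences, not real web data.
corpus = [
    "She gave him the book.",
    "She gave the book to him.",
    "They gave her a chance.",
    "He gave up the search.",
]

def wildcard_to_regex(pattern):
    """Compile a query like 'gave * the'; '*' is assumed to match one word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def search(pattern, sentences):
    """Return every matching word sequence found in the sentences."""
    rx = wildcard_to_regex(pattern)
    return [m.group(0) for s in sentences for m in rx.finditer(s)]

def collocates(node, sentences, window=2):
    """Count words occurring within `window` tokens of each hit for `node`."""
    counts = Counter()
    for s in sentences:
        tokens = re.findall(r"\w+", s.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts

print(search("gave * the", corpus))
print(collocates("gave", corpus).most_common(3))
```

A real linguistic search engine would add tokenization, tagging, and statistical association measures on top of raw counts. The sketch only shows the basic shape of the two operations.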

In spirit, this is quite similar to Mark Davies' excellent BYU Corpus resources. If I get a chance to play with it some more, I might try running some of my old dissertation searches through it. That should be a good test.

UPDATE: See my original post, Google Linguistics, which talks more specifically about using Google for research.
