...there are still a fair number of misdated works, and there's no way to restrict a query by genre or topic. But in the end, the most important consequence of the Science paper, and of allowing public access to the data, is that it puts "culturomics" into conversational play.
Google Books can't use wildcards to search for parts of words. For example, try searching for freak* out (all forms: freak, freaked, freaking, etc.) or even a simple search like teenager* ... if Google Books doesn't know about part of speech tags or variant forms of a word, then how can it look at change in grammar? ... To use collocates with Google Books, you would have to manually download thousands or millions of hits to your hard drive, and then use another program to look for and categorize the collocates.
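To make the collocate complaint concrete, here is a minimal sketch of the "another program" step: once hits have been saved locally, count which words appear near any form of a node word. The function name `collocates`, the `window` parameter, and the sample sentences are all made up for illustration. They are not part of Google Books or of any corpus tool mentioned here, and the regular expression stands in for a wildcard such as freak*.

```python
import re
from collections import Counter

def collocates(texts, node_pattern, window=4):
    """Count words found within `window` tokens of any token matching node_pattern.

    Hypothetical helper for illustration; `texts` is assumed to be an
    iterable of plain-text hits that have already been downloaded.
    """
    node = re.compile(node_pattern)
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if node.fullmatch(tok):
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

# Invented sample hits, standing in for downloaded search results.
sample = [
    "Don't freak out, she said, but he was freaking out already.",
    "They freaked out when the lights went out.",
]

# r"freak\w*" plays the role of the wildcard query freak*.
top = collocates(sample, r"freak\w*", window=1)
print(top.most_common(3))
```

With real data the interesting work is in the parts this sketch skips: lemmatization, part-of-speech filtering, and a significance measure rather than raw counts.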
The Science paper says that "Culturomics is the application of high-throughput data collection and analysis to the study of human culture". But as long as the historical text corpus itself remains behind a veil at Google Books, then "culturomics" will be restricted to a very small corner of that definition, unless and until the scholarly community can reproduce an open version of the underlying collection of historical texts.
...this is just a collection of books - no newspapers, magazines, advertisements, or other orthographic places where culture resides. No websites, blogs, social networking sites. No spoken language, of course, so over 90 percent of the daily linguistic usage of the world isn't here ... The approach, in other words, shows trends but can't interpret or explain them. It can't handle ambiguity or idiomaticity.
The Binder Blog:
The value of the Ngrams Viewer rests on a bold conceit: that the number of times a word is used at certain periods of time has some kind of relationship to the culture of the time. For example, the fact that the word “slavery” peaks around 1860 suggests that people in 1860 had a lot to say about slavery. Another spike around the 1970s meshes nicely with the Civil Rights Movement. Well, that’s sort of interesting. However, I didn’t need ngrams to tell me that a lot of people were writing about slavery in 1860. These data are broad but not deep, which makes them relatively useless to most humanities majors interested in intensive study.
The one positive comment that I think bears repeating concerns the role this fun little tool might play in sparking the imagination of young students interested in the role technology can play in the humanities.
Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with. And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research.