Tuesday, February 21, 2012

I want my historical sentiment analysis ... and I want it NOW!

The hot new shiny thang in 21st Century NLP is undeniably sentiment analysis. It's being used to track pop star popularity, political approval, and corporate satisfaction. But as far as I can tell, all of the current efforts are focused on contemporary, real-time analysis of emerging trends and changes. Tonight, however, I find myself desperately in need of automated historical sentiment analysis. Recently, the political fact-checking website Politifact came under scrutiny from left-leaning US pundits like Rachel Maddow because of a "Mostly False" rating it gave to a promotional ad by MSNBC host Lawrence O'Donnell (video here). Here is as faithful a transcript of O'Donnell's words as I can make (the offending statement is the final clause, "and the critics called it welfare"):
Ya know, when the GIs came home after World War Two, six percent of the adults in this country were college graduates. The GI Bill pushed that up to twenty. The GI Bill put my father through college. He then was able to earn a living to put his five kids through college. It's the most successful educational program that we’ve ever had in this country, and the critics called it welfare.
Politifact rated the ad "Mostly False" here, stating:
We found no evidence of critics referring to the GI Bill as welfare. Yet some fretted that the law’s unemployment compensation element would encourage laziness. We see a touch of truth to O’Donnell’s claim, which we rate Mostly False.
I was hoping to use simple, fast, freely available online tools to do some digging on this. I reasoned that Google Ngrams and BYU's COHA would be perfect tools for this job. Alas, they are not quite powerful or nuanced enough to tease apart the issues. Here's what I did:

  • Used Wikipedia to discover that the original GI Bill was actually named the Servicemen's Readjustment Act of 1944.
  • Typed "GI Bill" into the Ngram Viewer. However, this tool does not allow collocates (I really want to see co-occurrences of "GI Bill" & "welfare").
  • Tried to use COHA to find collocates, but, sadly, that tool does not allow multi-word collocates ("You cannot have strings of two or more words in the [COLLOCATES] box.").
  • Tried a Google search on: welfare "GI Bill". This retrieved some hits, though they were too recent to count as examples of contemporaneous 1944 criticism.
  • When I simply Googled "Servicemen's Readjustment Act of 1944 welfare", I found several examples, including what purports to be contemporaneous reporting at As They Saw It ("News & articles published shortly after events occurred, they reflect the information available at that time and how people reacted"), but I cannot verify the veracity of this site. Nonetheless, it does seem to support two ideas: the word "welfare" was commonly used in reference to the GI Bill, and its use was overwhelmingly positive (see 1944: Social Service, Public).
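
For what it's worth, the multi-word collocate search that COHA wouldn't let me do is not hard to sketch once you have raw text in hand. Here's a minimal Python sketch, assuming plain-text documents you've already collected yourself. The function names, the sample sentences, and the 10-token window are my own hypothetical choices, not features of COHA or the Ngram Viewer:

```python
import re


def tokenize(text):
    """Lowercase, collapse dotted initials like "G.I." to "gi", and split into words."""
    text = re.sub(r"\b([a-z])\.", r"\1", text.lower())
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text)


def find_phrase(tokens, phrase):
    """Return every start index where the token sequence `phrase` occurs."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def multiword_collocates(text, node, collocate, window=10):
    """Return (node_start, collocate_start) pairs whose start positions are
    at most `window` tokens apart, on either side of the node phrase.
    Hypothetical helper: the window size is an arbitrary illustrative default."""
    tokens = tokenize(text)
    node_hits = find_phrase(tokens, tokenize(node))
    coll_hits = find_phrase(tokens, tokenize(collocate))
    return [(i, j) for i in node_hits for j in coll_hits if abs(i - j) <= window]


sample = ("Critics said the G.I. Bill would become a welfare handout. "
          "Veterans praised the bill.")
print(multiword_collocates(sample, "GI Bill", "welfare"))  # [(3, 8)]
```

Collapsing "G.I." and "GI" into one token matters here, since older texts often wrote "G.I. Bill". Note that this only finds co-occurrences; it says nothing about whether "welfare" is being used approvingly or as an insult.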

But this exposed a fundamental problem: in 1944, "welfare" didn't appear to carry a pejorative sentiment. In fact, it seems to have had a positive one. Even if I had full access to huge historical corpora from which to extract instances of "GI Bill" and "welfare" collocates, how would I distinguish between positive and negative uses of "welfare"? To do this properly, I'd need quality sentiment analysis (also, I fear there'd be no small amount of Perl scripting involved. And, ya know, Perl is like the Merlot of programming. It's okay as a table wine when you're too tired to go to the store for a nice Tempranillo or Cab Franc, but really, you'd never brag about drinking Merlot).
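
To make the problem concrete: one crude way to separate positive from negative uses of "welfare" is to look at the words surrounding each occurrence and count cue words. The sketch below is a toy (in Python rather than Perl, naturally). It assumes two tiny cue-word lists I made up for illustration; they are not a real sentiment lexicon, and a serious attempt would need cues calibrated to 1940s usage, which is exactly where the hard part lies:

```python
import re

# Hypothetical toy cue lists, NOT a validated sentiment lexicon. Real work
# would need cues calibrated to 1940s usage, which is the hard part.
POSITIVE = {"promote", "protect", "benefit", "improve", "care", "secure"}
NEGATIVE = {"handout", "dole", "laziness", "lazy", "dependence", "waste"}


def context_windows(text, target, window=5):
    """Return the tokens within `window` positions on either side of each target token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for i, tok in enumerate(tokens) if tok == target]


def score_context(context):
    """Crude polarity score: +1 per positive cue, -1 per negative cue."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in context)


def label_uses(text, target="welfare", window=5):
    """Label each occurrence of `target` as positive, negative, or neutral."""
    labels = []
    for ctx in context_windows(text, target, window):
        s = score_context(ctx)
        labels.append("positive" if s > 0 else "negative" if s < 0 else "neutral")
    return labels


text = ("The bill will promote the welfare of veterans. "
        "Critics called it a welfare handout that breeds laziness.")
print(label_uses(text))  # ['positive', 'negative']
```

A fixed window is a blunt instrument: the first sentence's window also reaches into the second one, and a single well-placed cue word can flip a label. That fragility is a big part of why I'd want proper, historically aware sentiment analysis rather than a hand-rolled script.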

In the end, I was unable to use freely available online corpus linguistics tools to properly evaluate Politifact's rating because the tools have not yet matured enough to provide quite the right analysis. The good news, however, is that these tools could very easily be enhanced using existing technologies to do exactly the job needed.

2 comments:

Umlud said...

Did you try "relief" or "the dole"?

Chris said...

Umlud: No, I was more interested in taking Politifact's claim as literally as possible in order to test the ability of online corpus linguistics tools to provide evidence for this sort of thing.

However, I understand why it would make sense to try synonyms. This article at mediaite.com makes the case:

--The memo also suggests that objections mentioning “relief” or “the dole” were equivalent to modern-day references to welfare. “If Mr. O’Donnell had used the word ‘relief’ instead of ‘welfare’” in the advertisement, the memo says, “no one would have understood what he meant.”--

http://www.mediaite.com/tv/rachel-maddow-declares-politifact-dead-over-bogus-lawrence-odonnell-gi-bill-ruling/