Free Online NLP Resources - NLTK Still Rules Them All

I recently received an email from a US undergraduate interested in tools and resources for NLP, particularly free tagged corpora. Luckily, the NLP field has matured into an open-access-friendly crowd, so there are lots of resources freely available. Maybe too many. To be honest, too many search result hits is a pain. Newbies don't want ridiculously long lists of resources to pick through, precisely BECAUSE they're newbies! They don't know how to choose between them. And all too often expert/experienced NLPers will simply push their pet language or resources, not because it's appropriate for newbies, but because it's the pet of the expert.

So my unsolicited teachable moment #333256: give newbies/students recommendations that are appropriate for them, not appropriate for you.

For example, with all due respect, no newbie NLPer should go anywhere near the Stanford NLP Annotated List of Resources. I'm the first to admit that's a GREAT list of resources. No argument from me. But most of those resources require at least basic familiarity with NLP before starting (and many require more).

For true newbies, the Natural Language Toolkit (NLTK) remains my preferred option. Its excellent teaching book, tutorials, packaged corpora and data, and solid documentation make it the reigning king of NLP intro tools. Plus, it's a mature enough toolkit to be used for more extensive projects. Hard to go wrong.

FWIW, this post was not a paid endorsement of any kind. I have no professional or personal relationship with anyone involved with NLTK. I follow several people involved with the project on Twitter. That's as close to personal involvement as I get. This post is not meant as a commercial advertisement, but rather as my own personal opinion.

Zero Dark Thirty - Torture Propaganda

I saw Zero Dark Thirty Monday afternoon, January 21st.

My comments below relate to Bigelow’s fictional representations of torture in Zero Dark Thirty, not to the debate over whether real torture led to the real capture of UBL. I believe torture did NOT. I believe Senators John McCain and Dianne Feinstein and the CIA when they say this movie is factually wrong. In the words of one friend: It’s a movie, not a documentary.

McCain & Feinstein letter here.
CIA response here.

I also do not share Andrew Sullivan’s first reaction. After seeing the film he wrote: “Having studied this subject for years, I saw nothing obviously wrong.” In fairness, his views are ever shifting. Sullivan’s detailed and complicated set of posts about this film is here.

That said, I left the theater with no doubt in my mind that Zero Dark Thirty is unapologetic pro-torture propaganda. It might be just a movie, but it is propaganda. Bigelow’s claims of “depiction not endorsement” ring hollow when echoed through the vast cocoon of dishonesty this movie is imprisoned within. Anyone who claims this movie is ambiguous or agnostic with respect to torture is lying, either to themselves or to you. Note, I fisk a couple of Bigelow's claims about her approach to this film at the end of this review.

I will argue two main points about this movie:
  • It is pro-torture propaganda.
  • It is cowardly.
There are four elements to this film that make up my argument:
  1. The early scenes of torture are causally linked to the eventual discovery of the name of UBL’s courier.
  2. Throughout the film, there are multiple other prisoners who confirm the name of the courier only under torture or threat of torture, and their confirmations are crucial to the hunt.
  3. Throughout the film, multiple characters lament the end of the US torture and rendition program implying that lack of torture as a tool will hinder or end the hunt for UBL.
  4. The scenes of torture are sanitized versions of the real thing which lead naïve audiences to conclude that torture is acceptable.
The first two points are directly refuted by McCain and Feinstein, as well as by the CIA. Including those elements, plus the third (characters implying torture is an effective tool of intelligence), can only be interpreted as pro-torture propaganda. Finally, the weak depiction of torture can best be interpreted as sanitizing the act to make torture more palatable.

Point 1: The early scenes of torture are causally linked to the eventual discovery of the name of UBL’s courier.
The early scenes of physical torture (the ones most often discussed) lead directly to the later scene where a "bluff" is used to finally get the name of UBL’s courier. The prisoner has been tortured for an unspecified length of time, but it has to be weeks or months, possibly years. Finally, after a 96-hour period of stress positions and sleep deprivation, the CIA team believes his memory is destroyed to the point where they can lie to him and pretend he already told them the names of his “brothers” in a cell, including the name of UBL’s courier. This is the bluff. They give him food, water, and rest and treat him nicely, plying him for information while lying to him. And he does give up the name. But here’s the thing that is clear: no torture, no bluff, no name. There is a clear causal sequence of physical and psychological torture that leads to the prisoner giving up the name. If he weren't sleep deprived and desperately afraid of more torture, he would not have given up the name. Why situating the extraction of information this way tames Sullivan's anger is beyond me. Does Sullivan accept that all the torture in the world is fine, as long as the torturers ask nicely in the end? It should go without saying that this causal link is not factual. Read the McCain letter and CIA response above.

Point 2: Throughout the film, there are multiple other men who confirm the name of the courier only under torture or threat of torture.
The film repeatedly depicts torture as the only thing that convinces multiple other handcuffed, beaten prisoners to confirm the name of UBL’s courier. In this film, nearly all of the information that leads to UBL’s compound is obtained from beaten men locked inside US dark sites (with one paper exception near the very end, but this is the fourth or fifth confirmation; by itself, it was ignored for years within the logic of the film). Without those confirmations, that early piece of info means nothing to the analysts. It should go without saying that this is not factual. Read the McCain letter and CIA response above.

Point 3: Throughout the film, multiple characters lament the end of the US torture and rendition program implying that lack of torture as a tool will hinder or end the hunt for UBL.
Add to all of this that multiple characters are shown lamenting the end of US torture after President Obama is elected. There are multiple implications that without torture, we would fail to find UBL. One character laments the end of the torture regime in almost the same tone Duvall used for his famous Apocalypse Now line, “Someday this war's gonna end...”, but with none of the irony or implications. The end of systematic torture by the US government is lamented multiple times; not one character is thankful. Not. One. This is not factually accurate. Many within the CIA and throughout the US intelligence community protested the use of torture from the very early stages, and many applauded President Obama’s ending of torture. This film's lack of balance, its failure to present the cacophony of anti-torture protest, is tantamount to endorsement. It is the lie of omission.

Point 4: The scenes of torture are sanitized versions of the real thing which lead naïve audiences to conclude that torture is acceptable.
Andrew Sullivan wrote “No one can look at those scenes and believe for a second that torture is not being committed.” This is flatly false and hopelessly naïve. Sullivan gives the average viewer too much credit for sensitivity. This film is not brave when depicting torture because the scenes of torture lack the voyeuristic close-ups and duration of torture porn in movies like Saw or Hostel. Using the techniques of torture porn would have been far creepier and more effectively anti-torture. True torture porn would be too intense and courageous for the milquetoast Bigelow. True torture porn would force the audience to confront the deep evil of the practice. But Bigelow lets us (and herself) off far too easily. Her torture scenes are flat and tame compared to real torture porn (Sullivan’s thin skin is surprising here). Bigelow keeps her camera shots short with quick cutaways. She doesn’t linger. She is unwilling to thrust the vile ugliness of real torture under our nose. Yes, she shows us stress positions, but for ever so short a duration. So short that it doesn't really sink in what is happening to the man’s body. Why not construct a 10-minute, single-shot scene with the stress position at center frame the whole time? How easy would the average person find it to watch that? How many people would walk out? How much ambiguity could possibly be left after such a viewing experience? But she did not do this. And for that reason she is a coward.

Compare her sanitized scenes with Tarantino’s scene in Reservoir Dogs of Mr. Blonde torturing a kidnapped policeman. Tarantino's ear-slicing scene is a much more courageous exposé of torture. In his scene, there is no doubt left that the perpetrator is a sick fuck with no moral core and no other interest than to inflict pain and suffering on his victim. Bigelow lacks the courage to paint CIA torturers this way. She eases her conscience by giving every torturer a moral out. Never do we see torture as pure sadism, pure vile desire to watch another human being writhe in pain.

This is Bigelow's cowardice. To ease her conscience, she ensures that every one of her torture victims is a murderer. And none of them die in front of our eyes. Her primary torture perpetrator is a CIA agent who consistently expresses recognition of the moral “murkiness” of his actions and eventually succumbs to the emotional weight and leaves the lifestyle, requiring a psychological break.

In this way, Bigelow makes torture look okay. THAT is why it is propaganda. Where is her artistic courage? Tarantino made his torture victim innocent, writhing in pain, and made him die before our eyes. Bigelow made her torture victim a known murderer, cut away from his pain quickly, and we last see him eating fresh food and resting. Who is the brave director?

By all credible accounts of US black site torture, many US torture victims were either innocent of any crime or guilty of little more than association. This is covered by Zero Dark Thirty, but with zero dark nuance. It is only through offhand comments tossed aside that we hear of such “depiction”. This movie never questions the deeply wrong idea that US torturers are free to use whatever means they care to, against anyone they choose, with no oversight. The very idea of oversight is ridiculed (see Point 3).

Tarantino and Coppola are truly brave directors who confronted torture and moral cowardice without flinching. If Bigelow thinks she has created nuanced ambiguous art with respect to torture, then she is genuinely bad at her job.

What is the point of America’s existence? I find this a remarkably easy question to answer. The point of my country's existence is to discover, no matter how long it takes, if it is possible for a large group of flawed human beings to form a more perfect union. To unite a diverse citizenry with a common bond of law and fairness. To rest our common safety, happiness, and prosperity on the foundation of transparency, trust, and freedom.

Are we Americans to content ourselves with being just another run-of-the-mill nation defined by arbitrary borders and historical accident? If yes, then I'll punch my ticket to that Mars thing.

I am neither naïve nor squeamish. I am prepared to accept rough men standing guard in the night prepared to visit violence on those who would do us harm. But if those rough men must use torture and un-American means to keep me safe, then the American experiment is dead. We have failed. We have failed the ideals of our founding fathers. We might as well call ourselves The United States of Fuck You We Don't Care Anymore.

When we give up the spirit of American freedom, we give up America.

Luckily, I believe Kathryn Bigelow is wrong. I think she is just a mediocre director trying to sell tickets. This is a film, not a documentary. And it’s a bad film at that. Bigelow is overrated. Zero Dark Thirty has zero dark signature. There is no signature Bigelow style. She makes films with all the artistic flair of a Guns & Ammo catalog. Her narrative is uneven and she traffics in cliché relationships and factual errors.

Bigelow responds
Kathryn Bigelow responded to some of these controversies in a rather unsatisfying January 15th op-ed, "Kathryn Bigelow addresses 'Zero Dark Thirty' torture criticism." It deserves a little fisking:
But I do wonder if some of the sentiments alternately expressed about the film might be more appropriately directed at those who instituted and ordered these U.S. policies, as opposed to a motion picture that brings the story to the screen.
No. My sentiments are directed at you because it is your movie that falsely implicates torture in the successful hunt for UBL. That's not a critique of the CIA's actual techniques, which remain classified. This is a critique of your film, which is public.
Those of us who work in the arts know that depiction is not endorsement. If it was, no artist would be able to paint inhumane practices, no author could write about them, and no filmmaker could delve into the thorny subjects of our time.
"Depiction is not endorsement"??? When you depict a set of events closely tracking recent historical events but then fake causality that did not exist, you have jumped the shark into the deep end of endorsement. It is cowardly and it IS "endorsement." If Bigelow doesn't get this, then it proves the point that she is genuinely not good at her job.

Louder Than Words - Book Review Part 2: Ch 2-4

I've finished chapter 4 of Ben Bergen's new embodied cognitive linguistics book Louder Than Words, which puts me about 1/3 of the way through (my Part 1 review here). For now, I'm just going to publish the highlights of my margin notes. This will likely be disjointed and out of context for anyone not also reading the book. I intend to write up a single review once done, but for now, I'm just doing a data dump. Here goes:

Chapter 2: Keep Your Mind on The Ball
  • I realize now that I am simply not the intended audience for this book. It is intended to occupy the space filled by Pinker, Oliver Sacks, and Malcolm Gladwell. It is pop intellectual writing for the lay audience, not for someone with formal education in cognitive linguistics. Fair enough; there's nothing wrong with that. I simply must resign myself to being constantly disappointed by the lack of detail. My problem, not Bergen's.
  • Starts with the lesson of athletes who learned that visualization techniques improved their physical performance. Concludes that "when we're visualizing, our brain is doing the same thing it would in actual practice." He wisely backs off this bold claim as the chapter goes along, but his point is that we use the same brain regions to visualize physical actions as we would if we actually performed them.
  • My first objection is that this kind of intentional visualization is not equivalent to automatic thinking. He addresses this briefly later, but most of the experiments he reviews do in fact require some kind of intentional thinking/doing. I mentioned in my Part 1 review that he fails to use the terms "salience" or "attention", two very important words to cognitive scientists.
  • He dances around the notion of table-top objects without using the term. Perhaps too in-the-weeds philosophy for this kind of book.
  • The Perky effect = the boiling frog effect?
  • Little complaint: too many figures (most, in fact) are first mentioned on pages where they do not appear, making me turn the page constantly to see what's being referenced. Plus, far too many typos. Basic Books needs to hire a Basic Copyeditor.
  • Big complaint: he's gotten lost in the woods of cog sci without making any obvious embodiment claims. Can Bergen give a simple explanation of the difference between "mental simulation" and "embodied mental simulation"? If yes, he forgot to include it in this chapter.
  • He writes about people thinking about an act like making a fist (page 44) and activating parts of the brain for actually making a fist. What about thinking about actions you've never performed before, like some wild yoga pose? Just thinking out loud... but not thinking out louder than words...

Chapter 3: Meaning and the Mind's Eye
  • Starts the chapter by saying humans are critically dependent on visual information, and that we even encode this fact in our language with sayings like "you see what I mean?" and "the argument was clear." Okay, I get that this is a lay book and he's trying to help the average Joe understand the basic point, but as a linguist I object to this on at least three grounds: 1) It's misleading. He cherry-picks a couple of examples as if they formed a grand pattern. But it's quite easy to come up with counterexamples, like "you feel me?" (the now well-known phrase popularized by The Wire) or "do you get it?"; 2) At best, these are English examples. Bergen's point is decidedly not bound to any one language. Does this pattern hold in other languages? Can we have some discussion of this? 3) These kinds of phrases are what linguists call "evidentials", and there is a long tradition of studying them cross-linguistically. Bergen makes no mention of this.
  • Bergen wants us to believe that being able to infer and reason given linguistic input is uniquely a feature of language itself. I find this a bit overstated.
  • The second half of this chapter really gets good, for me. It's mostly a review of the work of Rolf Zwaan and his students on how our embodied interaction with objects influences the way we imagine their orientation. This, to me, is the heart of the embodied cognition argument, and it's the best reading so far. Bergen does a great job reviewing Zwaan's many clever but nuanced experiments. Most of my notes are detailed methodology questions I want to ask Zwaan about how he actually performed his experiments. Good stuff, but very in-the-weeds.

Chapter 4: Over the Top
  • It is taking all of my discipline to forgive Bergen not only for naming this chapter after a Sylvester Stallone movie, but also for referencing the same movie throughout the chapter. Ohhhh Ben. It's okay to let that part of your childhood die.
  • Unlike Ben, I'll spare you the Stallone fanzine reminiscence, and make the simple point that a physical act described solely in language fires up our brains using the same areas we would use if we actually performed the action.
  • Here, Bergen's rhetoric about language leaves me frustrated. He wants us to be filled with the wonder and power of language. I'm not. Language is an amazing cognitive function, but it ain't magic, and I don't think we're doing anyone any good by adding smoke and mirrors to the cognitive linguistics discussion.
  • Bergen uses this as a stepping-off point to talk about mirror neurons. While I've been casually reading* about mirror neurons for several years, I ain't no neuroscientist. I do recall some actual neuro-bloggers complaining about overstated claims for mirror neurons, though. It will take some time to dig up the references.
  • Overall, this is a juicy chapter with lots of experimental paradigms to drool over, if that's your kind of thing (it's my kind of thing).
  • One complaint, though: this chapter does a great job of convincing me that there are priming effects with respect to actions and visualization. It does little to convince me there is some special "embodied simulation" effect. Priming is a well-known phenomenon that seems to explain many of the effects he discusses. I'm still waiting for that Ah Ha! moment that makes his argument all come together. But I'm a patient man, and this really is interesting stuff.
  • Just a heads up: after reading the discussion of "affordances", I can't help but want to recommend that Bergen read Pustejovsky #intheweeds.

*Hey, if Chewie can fly casually, then I can read casually.

Louder Than Words - Book Review Part 1

I have begun reading Ben Bergen's new cognitive linguistics book "Louder Than Words: The New Science of How the Mind Makes Meaning". From the book description:
In Louder than Words, cognitive scientist Benjamin Bergen draws together a decade’s worth of research in psychology, linguistics, and neuroscience to offer a new theory of how our minds make meaning...Through whimsical examples and ingenious experiments, Bergen leads us on a virtual tour of the new science of embodied cognition.
Let me note that the book description falsely suggests that this is Ben's theory; rather, as Ben rightly points out, embodied cognition is a research program dating back 40+ years. This book is Ben's attempt to survey the evidence for it. If this sounds like an update of Lakoff's 1987 Women, Fire, and Dangerous Things, you will be forgiven for making that connection, as Bergen is Lakoff's student turned professor. Bergen is Luke to Lakoff's Obi-Wan, as it were.

I am basically sympathetic to embodied cognitive science as I was trained in cognitive linguistics at SUNY Buffalo by a host of Berkeley linguistics alumni including Len Talmy, Jean-Pierre Koenig, and Robert Van Valin (as well as non-Berkeley cognitive linguists such as David Zubin and less directly Jürgen Bohnemeyer).

However, while I have a lot of Berkeley blood in me, I am a skeptic by nature (see my various critiques of Boroditsky and the Neo-Whorfians here). I never became a devotee of RRG as Van might have wanted. I never became a devotee of strong lexicalism as JP might have wanted. I am naturally inclined to decline the Kool-Aid, regardless of who is offering.

I hope this will make me a good close reader of Ben's book.

So far I have only read Lakoff's short introduction and Bergen's chapter one "The Polar Bear's Nose". I'll make this first entry short.

First, introductions to academic work are difficult. You have to explain background assumptions to non-experts in a way that doesn't turn off the experts who read it. I remember having this experience while reading a colleague's introductory chapter to her dissertation on the psycholinguistic processing of certain kinds of morphemes. She opened with a line something like "morphemes are the smallest meaningful unit...", and I kind of giggled at such a basic Ling 101 claim in a dissertation on psycholinguistics. But you have to have that kind of sentence just to prove your own basic competence.

Such was my response to Ben's intro. It was basic stuff that any grad student in linguistics or cognitive science has been through a hundred times, ad nauseam, but it has to be stated up front for the *others*.

I don't have anything major to say about the intro, but here are some things I've tweeted or noted in margins that bear adding:
  • Okay, this is trashy, but it bears stating. When I first read the title "Louder Than Words", my first thought was Brian Griffin's book "Faster Than The Speed of Love" from Family Guy.
  • Ugh! Overstatement. Reaction time and eye tracking are not "fine measures" that "peer inside the mind". They are useful, but they are the stone knives and bearskins of scientific tools. We use them because we don't have anything better ... yet. He claims these tools have provided results that are "revolutionary" (p. 5): pure hyperbole.
  • "Meaning is something that you do almost entirely in your mind." Bergen (p. 6). Dear Ben... hmmm... what's the non-almost part?
  • I think Bergen is guilty of constructing a straw man version of the Mentalese argument for symbolic reasoning. Bergen suggests that symbolic reasoning is incompatible with non-literal reference, variation, and creativity. False.
  • He hasn't explicitly mentioned "theory of mind" yet, but he's danced around it repeatedly. He clearly believes that humans alone possess the capacity to theorize about the mental states of others and that is the basis of language as our key cognitive advantage.
  • I don't like his rhetoric about the power of words. He makes them sound like knives you can stab people with ("A few words can change our religion. Words affect who we are" p. 3). First, I think this is classic overstatement. I do not think words are knives. If serious academics write like this, how can we reasonably differentiate ourselves from those idiot Neuro-linguistic programming morons who write things like "Neuro-Linguistic Programming describes the fundamental dynamics between mind (neuro) and language (linguistic) and how their interplay affects our body and behavior (programming)." We can hardly complain about charlatans like these if our own best and brightest cognitive linguists write almost identical sentences.
  • His treatment of "traditional theories of meaning" (p 6) is standard, and completely misses its utterly Western bias. I don't think the Chinese tradition of analyzing meaning looks anything like this.
  • His claim that we all have our own meaning of words like "dog" based on our experiences and memories (p. 12) will likely not be borne out by the evidence... but I'm open-minded.
  • The embodied simulation hypothesis, which states that we imagine ourselves performing an act in order to understand it, sounds a lot like the motor theory of speech perception, which has been around a while. The recent work on mirror neurons may prove valuable for both lines of research.

Okay, good enough for now. On to chapter 2, "Keep Your Mind on the Ball".

Django In Air Conditioning

I finally saw "Django Unchained" and sadly, I was a little disappointed. Mostly because it just didn't seem to be a spaghetti western. For me, spaghetti westerns are defined by Clint Eastwood and Sergio Leone. The overwhelming aesthetic that permeates those movies is hot dusty sweat. They were filmed in the dusty heat of Spain and that look seared itself into my feelings about spaghetti westerns. Django has none of that. The first half of the movie is set in the cold snowy mountains and the second half in the lush green South in Spring (where the temp never seems to rise above 72 degrees Fahrenheit). No one sweats in this movie. How can this be a spaghetti western?

More disappointing is that there are no great scenes. Some great moments, some great lines, but they just didn't congeal into a coherent whole. I can ignore the plot holes and ridiculousness (Hollywood is filled with those), but I need the scenes to have that characteristic Tarantino structure and tension. "Inglourious Basterds" starts with an insanely tense scene between the Nazi Hans Landa and the farmer LaPadite; then there is a funny and brutal sequence in the basement bar with a German movie star co-conspirator, the Basterds, and a Gestapo officer; then there's the dinner table scene between Goebbels, Landa, and Shosanna Dreyfus. That was tension incarnate! There's not a single scene in Django that lives up to any of those.

Then there's the music. Spaghetti westerns have operatic scores. But Django jumps from pop song to pop song faster than a wedding singer. It was jarring and incoherent. Just didn't work.

That said, it had enough juice to keep me entertained. It's a good movie, and Jamie Foxx has proved himself to be irresistible as an actor, but this was just not the masterpiece I was hoping for.

NLPers: How would you characterize your linguistics background?

