The basic issue was that a group of computational linguists from CMU (David Bamman, Brendan O’Connor, and Noah A. Smith) presented a paper about automatically learning character personas from freely available movie plot summaries at this summer's Association for Computational Linguistics conference in Bulgaria (full paper here).
Unfortunately, a couple of UT Austin scholars (Hannah Alpert-Abrams from comparative lit, and Dan Garrette from computer science) thought the paper made fatal flaws with respect to literary studies and asked LL to post their reply. In particular, they felt the the CMU team failed to use contemporary literary theory (or film theory), and instead relied on outdated ideas of persona. They made one other crucial complaint, that the data the CMU team used was flawed.
NLP engineers are good at finding data and working with it, but often bad at interpreting it. I don't mean they're bad at interpreting the results of complex analysis performed on data. I mean they are often bad at understanding the nature of their data to begin with. I think the most important argument the UT Austin team make against the CMU team is this (important point underlined and boldfaced just in case you're stupid):
By focusing on cinematic archetypes, Bamman et al.’s research misses the really exciting potential of their data. Studying Wikipedia entries gives us access into the ways that people talk about film, exploring both general patterns of discourse and points of unexpected divergence.In other words, the CMU team didn't truly understand what their data was. They didn't get data about Personas or Stereotypes in film. Rather, they got data about how a particular group of people talk about a topic. This is a well known issue in humanities studies of all kinds, but it's much less understood in sciences and engineering, as far as I can tell.
To his credit, CMU team member O'Connor addressed part of this in a response by saying:
We did not try to make a contribution to contemporary literary theory. Rather, we focus on developing a computational linguistic research method of analyzing characters in stories. We hope there is a place for both the development of new research methods, as well as actual new substantive findings.And here is where the culture clash erupts. While engineers and scientists are quite used to the idea that "proof of concept" methodology development is an acceptable topic for a refereed conference paper, it is almost unheard of in the humanities (the social sciences falls somewhere in between, and O'Connor notes this).
However, O'Connor didn't address their more substantive point that their underlying data was flawed. Again, with proof of concept papers, this is less of an issue. The UT Austin team made the point that the CMU team didn't ask questions that 'fit into academic discourse about film' (slight paraphrase). O'Connor countered that that was because they didn't even try. That was not their goal. As far as I can tell, the CMU team didn't give a hoot about the data at all. It happened to be a convenient data set that they could scrape freely and play with. If anyone has a movie plot data set that is balanced for things like gender, perspective, class, race, etc, I'm confident the CMU team would be happy to apply their process to it. But, the CMU team, as represented by O'Connor's reply, runs the risk as seeming aloof (at best). Showing such blatant disregard for the goals of the very humanities scholars they're trying to develop a method for will not win them many friends in English and comparative literature departments.
O'Connor mentioned that he believed "it’s most useful to publish part of the work early and get scholarly feedback, instead of waiting for years before trying to write a “perfect” paper." While I agree with the interactive feedback notion underlying his point, I have to say that he comes across as a bit smug and arrogant by saying it in this way. He was certainly not showing much respect to the traditions within humanities by adding the snide remark about a "perfect paper." Humanities is its own academic culture, with it's own traditions of what counts as publishable. Simply declaring his own academic traditions as preferable is not particularly respectful.
I also believe that the UT Austin team's response posted on Language Log was somewhat condescending and disrespectful of the CMU team (and some of the LL commenters called them out on it as well). This is a clash of academic cultures. Again, I am sympathetic to both sides. But they will continue to talk past each other until each understands the others' cultures better.
Accomplishments versus QuestsThere is a much larger point to be made about the kind of personalities that engineering tends to draw versus humanities. I'm speculating, but it's been my experience that engineers tend to be driven by accomplishment. Not solving big problems, just solving any problem. They spend a few hours getting a Python script to properly scrape and format plot summaries from an online database, and that makes them happy. They accomplished something. Humanities people tend to be driven by quests. Large scale goals to answer vague and amorphous questions.