The Lousy Linguist: clash of publishing cultures: NLP and literary study

Saturday, September 14, 2013

clash of publishing cultures: NLP and literary study

Language Log recently posted a clash of cultures guest post: Computational linguistics and literary scholarship. I am sympathetic to both sides (having lived in both worlds). The core issue was an NLP team asking NLP-type questions about film, and a humanities team asking humanities-type questions about data. And the two talked past each other. I believe this is largely due to two very different academic cultures, particularly with respect to the question: What counts as publishable?

The basic issue was that a group of computational linguists from CMU (David Bamman, Brendan O’Connor, and Noah A. Smith) presented a paper about automatically learning character personas from freely available movie plot summaries at this summer's Association for Computational Linguistics conference in Bulgaria (full paper here).

Unfortunately, a couple of UT Austin scholars (Hannah Alpert-Abrams from comparative lit, and Dan Garrette from computer science) thought the paper had fatal flaws with respect to literary studies and asked LL to post their reply. In particular, they felt the CMU team failed to use contemporary literary theory (or film theory), and instead relied on outdated ideas of persona. They made one other crucial complaint, that the data the CMU team used was flawed.

NLP engineers are good at finding data and working with it, but often bad at interpreting it. I don't mean they're bad at interpreting the results of complex analysis performed on data. I mean they are often bad at understanding the nature of their data to begin with. I think the most important argument the UT Austin team makes against the CMU team is this (important point underlined and boldfaced just in case you're stupid):

By focusing on cinematic archetypes, Bamman et al.’s research misses the really exciting potential of their data. Studying Wikipedia entries gives us access into the ways that people talk about film, exploring both general patterns of discourse and points of unexpected divergence.

In other words, the CMU team didn't truly understand what their data was. They didn't get data about Personas or Stereotypes in film. Rather, they got data about how a particular group of people talk about a topic. This is a well known issue in humanities studies of all kinds, but it's much less understood in sciences and engineering, as far as I can tell.

To his credit, CMU team member O'Connor addressed part of this in a response by saying:

We did not try to make a contribution to contemporary literary theory. Rather, we focus on developing a computational linguistic research method of analyzing characters in stories. We hope there is a place for both the development of new research methods, as well as actual new substantive findings.

And here is where the culture clash erupts. While engineers and scientists are quite used to the idea that "proof of concept" methodology development is an acceptable topic for a refereed conference paper, it is almost unheard of in the humanities (the social sciences fall somewhere in between, and O'Connor notes this).

However, O'Connor didn't address their more substantive point that their underlying data was flawed. Again, with proof of concept papers, this is less of an issue. The UT Austin team made the point that the CMU team didn't ask questions that 'fit into academic discourse about film' (slight paraphrase). O'Connor countered that that was because they didn't even try. That was not their goal. As far as I can tell, the CMU team didn't give a hoot about the data at all. It happened to be a convenient data set that they could scrape freely and play with. If anyone has a movie plot data set that is balanced for things like gender, perspective, class, race, etc., I'm confident the CMU team would be happy to apply their process to it. But the CMU team, as represented by O'Connor's reply, runs the risk of seeming aloof (at best). Showing such blatant disregard for the goals of the very humanities scholars they're trying to develop a method for will not win them many friends in English and comparative literature departments.

O'Connor mentioned that he believed "it’s most useful to publish part of the work early and get scholarly feedback, instead of waiting for years before trying to write a “perfect” paper." While I agree with the interactive feedback notion underlying his point, I have to say that he comes across as a bit smug and arrogant by saying it in this way. He was certainly not showing much respect to the traditions within humanities by adding the snide remark about a "perfect paper." Humanities is its own academic culture, with its own traditions of what counts as publishable. Simply declaring his own academic traditions as preferable is not particularly respectful.

I also believe that the UT Austin team's response posted on Language Log was somewhat condescending and disrespectful of the CMU team (and some of the LL commenters called them out on it as well). This is a clash of academic cultures. Again, I am sympathetic to both sides. But they will continue to talk past each other until each understands the other's culture better.

Accomplishments versus Quests

There is a much larger point to be made about the kind of personalities that engineering tends to draw versus humanities. I'm speculating, but it's been my experience that engineers tend to be driven by accomplishment. Not solving big problems, just solving any problem. They spend a few hours getting a Python script to properly scrape and format plot summaries from an online database, and that makes them happy. They accomplished something. Humanities people tend to be driven by quests. Large-scale goals to answer vague and amorphous questions.

5 comments:

Ted Underwood said...: I'm oddly positioned in this clash as a literary scholar, with very traditional literary training, who thinks the CMU team is doing top-notch work.

I confess I totally agree with the computer scientists' view of the issue here, which is basically that they're producing methods rather than literary results. I'm afraid my literary colleagues just aren't accustomed to separating those two things. The methods, as methods, are effectively invisible to us; we're only interested in critiquing results.

That's unfortunate. But selfishly, I can't complain, because it means I'll have a decent chance of being the first literary scholar to apply these methods to the periods and corpora that interest me most.; September 14, 2013 at 11:14 AM
Chris said...: Ted,

I think you're right that you have a head start in this field. Nothing wrong with that. I suspect both academic traditions will need to mingle more if either is to make much progress in the next few decades.; September 14, 2013 at 2:32 PM
Yisong Yue said...: I link hopped here from Brendan O'Connor's blog. I completely agree with Ted's comment, but I wanted to comment on the broader view from a computational person's perspective.

Computational people are often faced with the dual challenges of convincing both (A) domain experts as well as (B) other computational people working primarily with other domains. What often ends up happening is that (A) is under-prioritized in favor of (B) (although both do happen). As a consequence, conferences like EMNLP place a large emphasis on methodological advances. One could argue about what the right balance is -- I'm not quite sure myself.

The benefit of being methodologically focused is that it facilitates cross-over in how these methods can be applied to different domains (such as image/video analysis, behavioral analysis, biomedical domains, etc). We engineers want to develop methodological advances that are both generically useful and can stand the test of time. That is why I'm motivated to do the work I do.

I like taking on challenging datasets where there's some phenomenon of interest that is difficult for existing methods to model. Sometimes (often?), the phenomenon I end up focusing on lies outside what experts in that domain consider interesting or valuable. But the method is hopefully general enough to be able to capture other phenomena that domain experts do find interesting and valuable -- that is where (A) comes in.

I also find myself generally agreeing with Chris's comment on "Accomplishments versus Quests" (although obviously Chris's description is an over-simplification). In our hearts, we engineers DO want to discover new fundamental principles of building methods and tools that are broadly useful and also intellectually interesting -- that is our quest. However, the day-to-day churn of our research output does perhaps bias us towards "accomplishments" more so than other disciplines.; September 14, 2013 at 5:35 PM
Chris said...: Yisong,

Thanks for your perspective. I think the idea of interactive feedback at different stages of research is good for all disciplines. Unfortunately, I think it's rare for humanities conferences and journals to accept those kinds of papers. I've been out of the humanities for a while now, so things may have changed (though Ted's comments suggest they haven't), but I remember all the pressure being on theory and conclusions. You had to have something big to say to even consider publishing.; September 15, 2013 at 10:47 AM
Alon said...: @Ted:

I confess I totally agree with the computer scientists' view of the issue here, which is basically that they're producing methods rather than literary results.

The problem is not with producing methods rather than results. Anyone working in corpus stylistics, for example, would be happy to focus on methodological developments even with a less-than-interesting dataset.

The problem is that Bamman et al think they are building a method for analysing plots, and they are trying to pitch their findings in those terms. In fact, what they are building is a method for analysing how people talk about plots in a certain, very specific context.

There is no reason to believe that one maps to the other in a clear, predictable way, or even to believe that Bamman et al would yield the same results using an equivalent corpus from a source other than Wikipedia. These are issues that are well-known in corpus and computational approaches to language, and it wouldn't be unreasonable to expect Bamman et al to address them if they have any interest in promoting their method to the people who would actually welcome it.; October 11, 2013 at 10:49 AM
