Monday, January 25, 2016

Genetic Mutation, Thick Data, and Human Intuition

There are two stories trending heavily on my social network sites that are seemingly unrelated, yet they share one obvious conclusion: the value of human intuition in finding needles in big data haystacks. Reading them highlighted for me the special role humans can still play in the emerging 21st century world of big data.

In the first story, The Patient Who Diagnosed Her Own Genetic Mutation—and an Olympic Athlete's, a woman with muscular dystrophy sees a photo of an Olympic sprinter’s bulging muscles and thinks to herself, “she has the same condition I do.” What in the world would cause her to think that? There is no pattern in the data that would suggest this. The story is accompanied by a startling picture of two women who, at first glance, look nothing alike. But once guided by the needle in the haystack that this woman saw, a similarity is illuminated and eventually a connection is made between two medically disparate facts that, once combined, opened a new path of inquiry into muscle growth and dystrophy that is now a productive area of research. Mind you, no new chemical compound was discovered. No one built a new technique or method that allowed scientists to see something that couldn’t be seen before. Nope. Nothing *new* came into being, but rather a connection was found between two things that none of the world’s experts had seen before. One epiphany by a human being looking for a needle in a haystack. And she found it.

In the second story, Why Big Data Needs Thick Data, an anthropologist working closely to understand the user stories of just 100 Nokia cases discovers a pattern that Nokia’s own big data efforts missed. How? Because her case-study approach emphasized context. Money quote:
"For Big Data to be analyzable, it must use normalizing, standardizing, defining, clustering, all processes that strip the data set of context, meaning, and stories. Thick Data can rescue Big Data from the context-loss that comes with the processes of making it usable."
Traditional machine learning techniques are designed to find large patterns in big data, but those same techniques fail to address the needle in the haystack problem. This is where humans and intuition truly stand apart. Both of these articles are well worth reading in the context of discovering the gaps in current data analysis techniques that humans must fill.
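
To make the needle problem a little more concrete, here is a toy sketch in Python. It is not taken from either article: the numbers and the function name `mean_shift` are made up for illustration. It only shows how little one unusual case moves an aggregate summary of a large, otherwise uniform data set, which is roughly why a technique tuned to large patterns can pass right over the needle.

```python
from statistics import mean

def mean_shift(common_value, n_common, needle_value):
    """Return how far one unusual case moves the mean of an otherwise uniform data set."""
    haystack = [common_value] * n_common      # the ordinary cases
    with_needle = haystack + [needle_value]   # the same data plus one unusual case
    return abs(mean(with_needle) - mean(haystack))

# Hypothetical numbers: 9,999 ordinary measurements and one that is 2.5x larger.
shift = mean_shift(common_value=10.0, n_common=9_999, needle_value=25.0)
print(f"Shift in the mean caused by the needle: {shift:.4f}")
```

Run the same function with only nine ordinary cases and the needle becomes hard to miss, which is one way to read the appeal of a 100-case study.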

UPDATE: Here's a third story making a similar point: a human being using an automatically culled dictionary noticed a misogynist tendency in the examples it provided. A rabid feminist writes

And here's a fourth: Algorithms Need Managers, Too. Money quote: "Google’s hard goal of maximizing clicks on ads had led to a situation in which its algorithms, refined through feedback over time, were in effect defaming people with certain kinds of names."


Dominik Lukeš said...

I'm not sure that these are the best examples to give. They take the 'human' ability and make it seem somehow computer-like but at tasks computers can't perform or are bad at. But the thing is computers have two options for solving problems: algorithm or statistics. In both cases, choosing the type of solution is a human task. Monitoring the output is as well.

But humans are just as bad at all these tasks as they are good. There are no 'unit tests' for judgement - it is always contextual and determined by a lot of background knowledge not available to an algorithm. For every woman who saw a pattern and recognized something important in it, there are a million who saw a pattern that was made up out of randomness.

The problem is that we even have to have this conversation to start with. Stuff like the feeling that some sort of human turf is being invaded when a guy loses at chess to a supercomputer, when the surprise should be that he does not lose to a calculator. I'm always reminded of The Checklist Manifesto in this context. It's when medical practitioners use checklists to follow procedures rather than relying on their judgement for routine behaviors, and use judgement for non-routine ones. Computers are just glorified checklists. What is human is making judgements about whether to use the checklists - and very often making the wrong call.

What worries me is that all these 'AI' and big data people are buying their own hype - but the source of it seems to be their belief that human cognition ('intelligence') is somehow algorithmic but only in a flawed way. It should have been obvious how wrong this is when the first AI bubble burst.

Sorry, if this feels a bit ranty. I'm not sure I'm expressing myself properly here.

Chris said...

Dominik, sorry for the late reply, but I think you make very thoughtful comments and I wanted to wait until I had time to respond as thoughtfully. Unfortunately, time keeps slipping away. You make a very good point that both of these stories cherry-pick an anecdote where humans happened to do well and computers didn't, but surely there are many other examples to the contrary. Yes, I agree. I was taken by the allegory of the stories. And a big YES that the *story* of AI has been mis-told as a story of humans-as-computers. Intelligence is not strictly algorithmic, I suspect, but I cannot prove that. It is worth noting that the major success stories of contemporary AI involve so-called *deep learning*, which is simply a way of saying that the successful algorithms that humans are using to solve problems (like image categorization) work in ways so opaque that even the smartest people using them can't explain them. The magic of hidden layers solves everything. Presto!

Dominik Lukeš said...

Thanks for the response, Chris. I agree on the issue of 'deep learning'. But perhaps we should take cognitive acts such as image categorization to be the operational units of cognition. One of the things linguistics has never dealt with is the issue of word recognition and frame triggering. What sort of process is taking place when you see a word and recognize all the complex frames, schemas and scenarios that are associated with it? How do they interact when words are put together in a phrase or clause? How does that meaning come together? The reason, I think, is that we are thinking of all these acts as fundamental units. Which is probably the right way to go about it. However, when it came to modelling these 'units' computationally, we just contented ourselves with simple string matching and database lookup. (And before computers, formal logical models did the same thing in different ways.)

But that is the wrong way to model this. Or rather, models based around this approach are not modelling language but a caricature of language. Human analysts of language have the advantage of already being in possession of this 'black box' faculty of instant recognition and categorization (albeit subject to errors and inconsistencies), but computers need a way to replicate this. So perhaps starting with things like deep learning image categorization is the right place. Because word categorization is more like image recognition than like the sort of serial processes AI has dealt with so far. It is also possible that what has been done so far will not generalize or scale up. But without it, AI can't get anywhere near to what has been claimed for it.
