Monday, March 7, 2011

turning gaga into water = 200 terabytes

How much storage would it take to hold the first 5 years of a child's linguistic environment? Apparently, 200 terabytes. From Fast Company:

...cognitive scientist Deb Roy Wednesday shared a remarkable experiment that hearkens back to an earlier era of science using brand-new technology. From the day he and his wife brought their son home five years ago, the family's every movement and word was captured and tracked with a series of fisheye lenses in every room in their house. The purpose was to understand how we learn language, in context, through the words we hear. A combination of new software and human transcription called Blitzscribe allowed them to parse 200 terabytes of data to capture the emergence and refinement of specific words in Roy’s son’s vocabulary.

The data visualization techniques he uses are pretty cutting edge ... and awesome! I love the fact that he is trying to use visualization techniques to help us understand something beyond raw statistics (which is where most graphs and pie charts die miserable deaths). Statistics are like molecules. Visualize them one by one and it's difficult for the average person to conceptualize the big picture of how they work together to create a grander whole. Roy appears to be trying to get beyond the yawn-inducing graphs that plague modern science. I mean, he uses freaky-deaky time-worms! How cool is that!

Roy talks about feedback loops as well:

...“Caregiver speech dipped to a minimum and slowly ascended back out in complexity.” In other words, when mom and dad and nanny first hear a child speaking a word, they unconsciously stress it by repeating it back to him all by itself or in very short sentences. Then as he gets the word, the sentences lengthen again. The infant shapes the caregivers’ behavior, the better to learn.

He gave a TED talk recently, but the video is not yet available.

1 comment:

Anonymous said...

I saw the TED video some days ago, here's a link:

I really enjoyed the part where he explains his amazing project. But when he turned to talking about social media, I thought he was going a bit too "far out". I'm nevertheless in complete awe of what he has done, and I think his research will be highly valuable in understanding how children acquire their first language. I don't envy those who are working on transcribing all that material, though.
