Wednesday, October 16, 2013

Weka data mining and the power of the masses

I recently completed the 5-hour Weka Data Mining MOOC and I was very impressed. I beta-tested the first week last March and was enthusiastic. My enthusiasm was warranted.

The core idea is not to teach data mining per se, but rather to teach the user-friendly GUI that makes data mining a simple matter of button clicks. It's the WYSIWYG approach to data analysis, and it could tip the momentum behind data mining past the point where everyone gets to play. For example, below is the Weka GUI with its sample diabetes data displayed:

Below is the same data set after the decision tree classifier J48 has been run (with default parameters).

This took me all of 45 seconds with zero programming (I'll agree with you that 73.8% accuracy is meh, if you'll agree with me that 45 seconds and default parameters is hella rad).

To be clear, the course is not a data mining course per se. Rather, it's a tutorial on the Weka GUI. It shows you how to click buttons to load data sets, choose features, and run various learning algorithms such as decision trees, Naive Bayes, and logistic regression. What it does not do is teach you how these algorithms work (with the minor exception of a nice decision tree video). More than anything else, this MOOC shows you how valuable Weka is for rapid prototyping. With this tool, you could run a dozen algorithms with a dozen feature variations over a data set in minutes. With ZERO programming!
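For contrast, here is roughly what that "algorithms times feature variations" sweep looks like when you script it by hand. This is only a minimal sketch in plain Python, not Weka code: the eight data rows are made up for illustration (they are not the Weka diabetes set), the two classifiers are a majority-class baseline and a one-nearest-neighbour rule written from scratch, and names like `run_grid` and `loo_accuracy` are hypothetical. Accuracy is estimated with leave-one-out cross-validation.

```python
from collections import Counter
from itertools import combinations

FEATURES = ("glucose", "bmi", "age")

# Made-up toy rows (feature values, label); NOT the Weka diabetes data.
ROWS = [
    ((150, 34.0, 50), 1),
    ((85, 26.0, 31), 0),
    ((180, 23.0, 32), 1),
    ((90, 28.0, 21), 0),
    ((115, 25.0, 30), 0),
    ((165, 31.0, 53), 1),
    ((100, 35.0, 29), 0),
    ((95, 30.0, 40), 0),
]


def zero_r(train, query):
    """Baseline: predict the most common training label, ignoring features."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def one_nn(train, query):
    """Predict the label of the closest training row (squared Euclidean)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    return min(train, key=dist)[1]


def loo_accuracy(classify, rows):
    """Leave-one-out accuracy: hold out each row once, train on the rest."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += classify(train, x) == y
    return hits / len(rows)


def project(rows, idx):
    """Keep only the feature columns listed in idx."""
    return [(tuple(x[i] for i in idx), y) for x, y in rows]


def run_grid(rows=ROWS):
    """Evaluate every classifier on every non-empty feature subset."""
    results = {}
    for name, clf in (("ZeroR", zero_r), ("1-NN", one_nn)):
        for k in range(1, len(FEATURES) + 1):
            for idx in combinations(range(len(FEATURES)), k):
                subset = tuple(FEATURES[i] for i in idx)
                results[(name, subset)] = loo_accuracy(clf, project(rows, idx))
    return results


if __name__ == "__main__":
    for (name, subset), acc in sorted(run_grid().items()):
        print(f"{name:6s} {', '.join(subset):20s} {acc:.3f}")
```

Even this toy version takes sixty-odd lines to cover two algorithms and three features, which is exactly the kind of busywork the GUI replaces with a few clicks.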

I cannot stress enough how powerful this idea is. If you don't appreciate how much more culturally powerful Microsoft Word is than LaTeX, you may not appreciate this power. It's the power of the masses. LaTeX does not have the power of the masses. Python does not have the power of the masses. But Weka has the potential to bring data mining to high school students, English majors, hipsters, unemployed copywriters, etc. Weka has made me more excited about the future of data mining than any other single tool.
