Training Multiple SVM Classifiers with Apache Pig

Inspired by Twitter‘s publication about “Large Scale Machine Learning” I turned to Pig when it came to implement a SVM classifier for Record Linkage. Searching for different solutions I also came across a presentation of the Huffington Post using a similar approach to training multiple SVM models. The overall idea is to use Hadoop to train multiple models with different parameters at the same time, selecting the best model for the actual classification. There are some limitations to this approach, which I’ll try to address at the end of this post, but first let me describe my approach to training multiple SVM classifiers with Pig.

Disclaimer: This post does not describe the process of training one model in parallel but training multiple models at the same time on multiple machines.
Continue reading “Training Multiple SVM Classifiers with Apache Pig”