Hadoop: Counting Words

As you may know, Hadoop is a distributed system for counting words. Of course it is not, but the “Word Count” program is a widely accepted example of MapReduce. To be honest, it is so widely used that many people feel the “Word Count” example is overused. Then again, it is a straightforward example of how MapReduce works. In this post I give some other examples of counting words. One of the examples is implemented with the Hadoop Streaming API and Node.js.

  1. Bash
  2. Node.js

    Sample execution:

    Don’t forget the sort step between the mapper and the reducer ( | sort | ): it stands in for Hadoop’s shuffle and sort phase, which runs before the reducer starts.
  3. Node.js + Hadoop Streaming