Provisioning an HDP Dev Cluster with Vagrant

Setting up a production or development Hadoop cluster used to be much more tedious than it is today with tools like Puppet, Chef, and Vagrant. The Hadoop community has also invested heavily in simplifying deployment, responding to the demands of experienced system administrators. The latest of these investments is Ambari Blueprints.

With Ambari Blueprints, DevOps engineers can describe an automated setup of the individual components on each node across a cluster. The same blueprint can then be reused to replicate that setup on different clusters for development, integration, or production.

In this post we are going to set up a three-node HDP 2.1 cluster for development on a local machine using Vagrant and Ambari.
Most of what is presented here builds on previous work by various authors, which is referenced at the end of this post.

HDP Setup with Vagrant

Vagrant lets you quickly set up virtual environments that are described entirely in code. Although Vagrant uses Ruby, prior knowledge of the language is not required. Spinning up a virtual environment is as easy as running vagrant init and vagrant up from your command line. Vagrant runs your setup on VirtualBox, VMware, or any other supported provider.

The central components of a Vagrant setup are the Vagrantfile and the Box. The Vagrantfile describes the individual setup in Ruby. A Box is a Vagrant package containing the bare (or pre-provisioned) image of the underlying operating system. Boxes can be published and shared, and Vagrant Cloud is one good place to find the Box you need.

Here we use a CentOS 6.5 Box with Puppet pre-installed, provided by Puppet Labs.

For provisioning, Vagrant supports shell scripts, Chef, and Puppet, among others, and they can even be combined in the same setup. In the setup described here we use Puppet to provision HDP 2.1, and each node has its own Puppet script.

For further details about Vagrant and how to install it on your system, please refer to the Vagrant documentation.

For this example we want a three-node CentOS cluster, which we get by using a multi-machine setup in Vagrant. On node one we also install the Ambari server, which requires a slightly different Puppet script, as you will see later. In addition, we forward the Ambari port from localhost on the host machine to that guest.
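A multi-machine Vagrantfile for this setup could look roughly like the sketch below. It is not the original file. The box name, host names, IP addresses, memory size, and manifest file names (nodeone.pp, nodetwo.pp, nodethree.pp) are all assumptions chosen for illustration. Port 8080 is the Ambari web port used later in this post.

```ruby
# -*- mode: ruby -*-
# Sketch of a three-node Vagrantfile. Box name, host names, IPs, memory,
# and manifest names are illustrative assumptions, not the original values.

NODES = [
  { name: "one",   hostname: "one.cluster",   ip: "192.168.0.101", manifest: "nodeone.pp",   ambari_server: true },
  { name: "two",   hostname: "two.cluster",   ip: "192.168.0.102", manifest: "nodetwo.pp",   ambari_server: false },
  { name: "three", hostname: "three.cluster", ip: "192.168.0.103", manifest: "nodethree.pp", ambari_server: false }
].freeze

Vagrant.configure("2") do |config|
  # Assumed name of the Puppet Labs CentOS 6.5 box with Puppet pre-installed.
  config.vm.box = "puppetlabs/centos-6.5-64-puppet"

  NODES.each do |node|
    config.vm.define node[:name] do |n|
      n.vm.hostname = node[:hostname]
      # Private network so the nodes can reach each other by IP.
      n.vm.network "private_network", ip: node[:ip]

      n.vm.provider "virtualbox" do |vb|
        vb.memory = 2048 # assumed value; Hadoop services need more than a minimal VM
      end

      # Only node one runs the Ambari server, so only it forwards port 8080.
      if node[:ambari_server]
        n.vm.network "forwarded_port", guest: 8080, host: 8080
      end

      # Each node gets its own Puppet manifest; shared modules live in ./modules.
      n.vm.provision "puppet" do |puppet|
        puppet.manifests_path = "manifests"
        puppet.manifest_file  = node[:manifest]
        puppet.module_path    = "modules"
      end
    end
  end
end
```

With a file like this, vagrant up starts all three machines, while vagrant up one starts only node one.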

Provisioning HDP 2.1

An HDP cluster can be installed with Ambari by following the Ambari installation documentation step by step. Here we want to automate the whole process. According to the documentation, we first need to install the ntp service, disable iptables because it might interfere with our services, and finally install Apache Ambari. Along the way we have to make sure networking is set up correctly, because the hosts need to be able to discover each other, either through a proper DNS setup or through the /etc/hosts file. Since these steps have to be applied to each host, we place them in separate, reusable Puppet modules: interfering_services, ntp, and etchosts.

The interfering_services Puppet Module

This module disables iptables and PackageKit.
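In Puppet, this module would typically combine a service resource for iptables with a file edit for PackageKit. To make explicit what the module changes, the same two steps are sketched below in plain Ruby. The sketch assumes the CentOS 6 convention of disabling PackageKit's yum plugin by setting enabled=0 in /etc/yum/pluginconf.d/refresh-packagekit.conf. The helper names are hypothetical.

```ruby
# Hypothetical helpers mirroring what the interfering_services module does.
PACKAGEKIT_CONF = "/etc/yum/pluginconf.d/refresh-packagekit.conf"

# Commands that stop iptables now and keep it off after a reboot (CentOS 6).
IPTABLES_OFF = ["service iptables stop", "chkconfig iptables off"].freeze

# Returns the plugin config with every "enabled=<n>" line set to enabled=0.
# Only spaces/tabs are allowed around "=" so line breaks are preserved.
def disable_packagekit_refresh(conf_text)
  conf_text.gsub(/^enabled[ \t]*=[ \t]*\d+[ \t]*$/, "enabled=0")
end

# Applies the edit in place; meant to run as root on a CentOS 6 guest.
def disable_packagekit_refresh!(path = PACKAGEKIT_CONF)
  File.write(path, disable_packagekit_refresh(File.read(path)))
end
```

Keeping the text transformation separate from the file write makes the edit easy to check without touching a real system.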

The ntp Puppet Module

This module installs the ntp time service on each node of the cluster.
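In Puppet, this boils down to a package resource for ntp and a service resource for ntpd that is running and enabled at boot. The hypothetical Ruby helper below lists the equivalent CentOS 6 commands. It only executes them when dry_run is turned off.

```ruby
# Hypothetical helper mirroring the ntp module on CentOS 6:
# install the package, enable the service at boot, and start it.
NTP_COMMANDS = [
  %w[yum install -y ntp],
  %w[chkconfig ntpd on],
  %w[service ntpd start]
].freeze

# With dry_run: true (the default) nothing is executed and the command
# lines are only returned; with dry_run: false each command is run via
# Kernel#system and the first failure raises.
def provision_ntp(dry_run: true)
  NTP_COMMANDS.map do |cmd|
    unless dry_run
      raise "command failed: #{cmd.join(' ')}" unless system(*cmd)
    end
    cmd.join(" ")
  end
end
```

Passing each command as an argument array to system avoids going through a shell.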

The etchosts Puppet Module
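This module's job is to let every node resolve the others by name, so all three nodes need the same host entries. In Puppet this could be done with the built-in host resource type or a file template. The Ruby sketch below renders the resulting /etc/hosts content. The IP addresses and host names are assumptions, not values from the original setup.

```ruby
# Hypothetical cluster layout; IPs and names are illustrative assumptions.
NODES = [
  { ip: "192.168.0.101", fqdn: "one.cluster",   short: "one" },
  { ip: "192.168.0.102", fqdn: "two.cluster",   short: "two" },
  { ip: "192.168.0.103", fqdn: "three.cluster", short: "three" }
].freeze

# Builds an /etc/hosts file in which every cluster node appears with its
# private IP, fully qualified name, and short alias.
def hosts_file(nodes)
  lines = ["127.0.0.1   localhost localhost.localdomain"]
  nodes.each { |n| lines << "#{n[:ip]}   #{n[:fqdn]} #{n[:short]}" }
  lines.join("\n") + "\n"
end
```

The output of hosts_file(NODES) would be written to /etc/hosts on every node. Cluster host names point at private IPs rather than 127.0.0.1, so the other nodes can reach each one.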

Installing Ambari

All three nodes run an Ambari agent, and one of them also runs the Ambari server. Again, we use Puppet modules to provision the Ambari server and agents onto the nodes.

The ambari-server Puppet Module

First we need to register the repository that Ambari is installed from, so we add the Ambari repository location to the repository list. Then we install the Ambari server package and run its setup.
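The two halves of this module, a yum repository definition and the server install commands, are sketched below in Ruby. The repository and GPG key URLs are placeholders to be replaced with the location given in the Ambari installation documentation for your version. The repo section name and helper names are hypothetical. ambari-server setup -s runs the setup non-interactively with its default answers.

```ruby
# Hypothetical sketch of what the ambari-server module sets up.
REPO_FILE = "/etc/yum.repos.d/ambari.repo"

# Renders a yum repository definition for Ambari; baseurl and gpgkey are
# placeholders supplied by the caller.
def ambari_repo(baseurl:, gpgkey:)
  <<~REPO
    [ambari]
    name=Ambari
    baseurl=#{baseurl}
    gpgcheck=1
    gpgkey=#{gpgkey}
    enabled=1
  REPO
end

# Commands run once the repo file is in place: install the package,
# run the silent (default-answer) setup, then start the server.
AMBARI_SERVER_COMMANDS = [
  "yum install -y ambari-server",
  "ambari-server setup -s",
  "ambari-server start"
].freeze
```

Keeping gpgcheck=1 means yum verifies package signatures even on a dev cluster.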

The ambari-agent Puppet Module

As with the server, the agent setup first needs the Ambari repository before the agent itself can be installed and initialized.
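Besides installing the ambari-agent package, the key step for an agent is to point it at the Ambari server. This is done through the hostname setting in the [server] section of /etc/ambari-agent/conf/ambari-agent.ini, after which the agent is started with ambari-agent start. The Ruby sketch below performs that INI edit. The helper name and the server host name used in the usage note are assumptions.

```ruby
# Hypothetical sketch of the ambari-agent module's configuration step.
AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"
AGENT_COMMANDS = ["yum install -y ambari-agent", "ambari-agent start"].freeze

# Returns the INI text with hostname= replaced only inside the [server]
# section; all other lines and sections are left untouched.
def point_agent_to_server(ini_text, server_host)
  section = nil
  updated = ini_text.lines.map do |line|
    if (m = line.match(/\A\[([^\]]+)\]/))
      section = m[1]
      line
    elsif section == "server" && line.match?(/\Ahostname\s*=/)
      "hostname=#{server_host}\n"
    else
      line
    end
  end
  updated.join
end
```

On an agent node this would run between the package install and ambari-agent start, for example as File.write(AGENT_INI, point_agent_to_server(File.read(AGENT_INI), "one.cluster")).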

With these modules in place, we can reference them from the provisioning scripts of nodes one, two, and three. The scripts for nodes two and three are almost identical.

Puppet script for node one:

Puppet script for node two/three:
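As a rough, hypothetical sketch of how the node scripts relate, the Ruby helper below renders a node manifest that includes the shared modules and adds the Ambari server class only for node one. The class names (written with underscores here) are assumptions. The appended chain of Class references makes the modules apply in a fixed order, independent of Puppet's ordering setting.

```ruby
# Hypothetical generator for the per-node Puppet manifests.
BASE_CLASSES = %w[interfering_services ntp etchosts].freeze

# Returns manifest text: one "include" per class plus an ordering chain.
# Only the Ambari server node (with_server: true) declares ambari_server.
def node_manifest(with_server: false)
  classes = BASE_CLASSES.dup
  classes << "ambari_server" if with_server
  classes << "ambari_agent"
  includes = classes.map { |c| "include #{c}" }
  # Explicit chain, e.g. Class['ntp'] -> Class['etchosts'], fixes apply order.
  chain = classes.map { |c| "Class['#{c}']" }.join(" -> ")
  (includes + [chain]).join("\n") + "\n"
end
```

For example, File.write("manifests/nodeone.pp", node_manifest(with_server: true)) would produce node one's manifest. Calling node_manifest without arguments gives the shared script for nodes two and three.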

From here we could already provision the complete Hadoop cluster using Ambari’s installation wizard. Just point your browser to localhost:8080 and log in with admin as both user name and password.
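Before starting the wizard, it can help to confirm that all three agents have registered with the server. The Ruby sketch below asks the Ambari REST API for its hosts. It assumes the forwarded port 8080, the default admin/admin credentials, and a response body of the form {"items": [{"Hosts": {"host_name": ...}}]}. Treat that response shape as an assumption to check against your Ambari version.

```ruby
require "json"
require "net/http"
require "uri"

# Lists the host names registered with Ambari via GET /api/v1/hosts.
def registered_hosts(base_url = "http://localhost:8080", user: "admin", password: "admin")
  uri = URI.join(base_url, "/api/v1/hosts")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  raise "Ambari returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  host_names(response.body)
end

# Extracts host names from an /api/v1/hosts response body (shape assumed).
def host_names(body)
  JSON.parse(body).fetch("items").map { |item| item.fetch("Hosts").fetch("host_name") }
end
```

Once registered_hosts returns three names, every agent has reached the server.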

A better way would be to use Ambari Blueprints to provision the complete cluster automatically.
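To give an idea of what that involves, the Ruby sketch below builds the two JSON documents Ambari Blueprints work with. The first is a blueprint that maps components to host groups for a stack. The second is a cluster creation template that maps host groups to concrete hosts. The host group names, the component layout, the blueprint name, and the host names are hypothetical, and the minimal HDP 2.1 component list may need adjusting before Ambari accepts it. The post_json helper sends either document to the Ambari REST API with the X-Requested-By header that Ambari expects on modifying requests.

```ruby
require "json"
require "net/http"
require "uri"

# Turns component names into the {"name": ...} objects a blueprint expects.
def components(*names)
  names.map { |n| { "name" => n } }
end

# Hypothetical layout: one master host group and one group of two workers.
def blueprint(name)
  {
    "Blueprints" => { "blueprint_name" => name, "stack_name" => "HDP", "stack_version" => "2.1" },
    "host_groups" => [
      { "name" => "master", "cardinality" => "1",
        "components" => components("NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER",
                                   "APP_TIMELINE_SERVER", "HISTORYSERVER", "ZOOKEEPER_SERVER",
                                   "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT") },
      { "name" => "workers", "cardinality" => "2",
        "components" => components("DATANODE", "NODEMANAGER", "HDFS_CLIENT", "YARN_CLIENT") }
    ]
  }
end

# Maps the blueprint's host groups onto concrete host names.
def cluster_template(blueprint_name, master:, workers:)
  {
    "blueprint" => blueprint_name,
    "host_groups" => [
      { "name" => "master", "hosts" => [{ "fqdn" => master }] },
      { "name" => "workers", "hosts" => workers.map { |h| { "fqdn" => h } } }
    ]
  }
end

# POSTs a JSON payload to the Ambari REST API and returns the response.
def post_json(path, payload, base_url: "http://localhost:8080", user: "admin", password: "admin")
  uri = URI.join(base_url, path)
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, password)
  request["X-Requested-By"] = "ambari"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(payload)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

Registering the blueprint and then creating the cluster would take two calls. The first is post_json("/api/v1/blueprints/hdp-dev", blueprint("hdp-dev")). The second is post_json("/api/v1/clusters/dev", cluster_template("hdp-dev", master: "one.cluster", workers: %w[two.cluster three.cluster])).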

Further Readings
