Automated Kerberos Install for HDP w/ Ambari + Puppet

With the release of Ambari 2.x, kerberizing an HDP install has improved quite a bit. Compared to the approach in Kerberized Hadoop Cluster – A Sandbox Example, most of the steps described there are much easier now and can be automated. For a long time I wanted to include this in my existing Vagrant project for an end-to-end setup of a kerberized cluster. With the writing of this post I finally had the opportunity to do so.

In this post I would like to describe the parts added to the Vagrant setup to accomplish an end-to-end installation of a kerberized HDP cluster. Before the final step, in which the cluster is set up through the Ambari REST API, a KDC with credentials needs to be created. A Puppet module was written and included to install MIT Kerberos.


Azure VNet for Your HDP Cluster

In a series of blog posts I demonstrated how to create a custom OS Image for automatic provisioning of HDP with Vagrant on the Azure Cloud. On GitHub I share the result of a first layout among other provisioning scripts. Until now the setup was done without the proper network configuration required for the communication of the individual components of the cluster.

With the new release of the vagrant-azure plugin it will be possible to set up the cluster in a dedicated VNet. This was meant to be the last missing piece in the series of work I published to allow the automated provisioning of HDP in Azure. Unfortunately this is not quite true, as the current Azure Ruby SDK does not allow passing IP addresses to the machines. We therefore currently have to create host entries by hand. It should be possible to set up DNS or use Puppet to handle the host mapping in an automated fashion, but I was not able to do so as part of this work.
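
Until the host mapping is automated, the manual host entries can at least be generated rather than typed. The following is a minimal sketch, not part of the published scripts: the node names, the IP addresses, and the `example.internal` domain are hypothetical placeholders, and the output is only the text you would append to /etc/hosts on each machine.

```python
from typing import Dict, List


def render_hosts_entries(nodes: Dict[str, str], domain: str) -> List[str]:
    """Return one /etc/hosts line per node: '<ip> <fqdn> <short name>'."""
    lines = []
    # Sort by node name so the output is stable between runs.
    for name, ip in sorted(nodes.items()):
        lines.append(f"{ip} {name}.{domain} {name}")
    return lines


# Hypothetical node names and VNet addresses, for illustration only.
EXAMPLE_NODES = {
    "master1": "10.0.0.4",
    "data1": "10.0.0.5",
    "edge1": "10.0.0.6",
}

if __name__ == "__main__":
    print("\n".join(render_hosts_entries(EXAMPLE_NODES, "example.internal")))
```

Keeping the mapping in one dictionary means the same data could later feed a DNS setup or a Puppet host resource instead of a hand-edited file.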

HDP Ansible Playbook Example

In my existing collection of automated install scripts for HDP I always try to add further examples of different provisioners, providers, and settings. Recently I added hdp22-n1-centos6-ansible, an example Ansible environment for preparing an HDP 2.2 installation on one node.

Ansible differs from other provisioners like Puppet or Chef in its simplified approach, which relies on SSH. It behaves almost like a distributed shell and places few requirements on existing hosts. Where Puppet, for example, makes strong assumptions about the current state of a system with one or multiple nodes, Ansible more or less reflects the collection of tasks a system has gone through to reach its current state. While some celebrate Ansible for its simplicity, others abandon it for its lack of strong integrity.

In this post I would like to share a sample Ansible Playbook to prepare an HDP 2.2 Ambari installation using Vagrant with VirtualBox. You can download and view the example discussed in this post here.

Try Now: HDP 2.2 on Windows Azure

HDInsight, the Hadoop cloud offering from Windows Azure, is a great way to use Big Data as-a-service solutions, but there is more. With the general availability of HDP 2.2 announced this week, it is a great opportunity to extend the existing HDP Vagrant collection with the Windows Azure provider. In this blog post I want to demonstrate the steps needed to quickly set up a 6 node Hadoop cluster using the provided script. Except for preliminary setup steps, it only takes a little adjustment of the Vagrantfile and two commands to set up the whole cluster.

Our 6 node cluster will consist of two master nodes, three data nodes, and one edge node with the Apache Knox gateway installed among other client libraries. Let’s jump in right now.

Collection of HDP Vagrant Scripts

After writing about provisioning a Hadoop cluster with Vagrant I started a collection of cluster setups using the HDP distribution. The examples use different versions, operating systems, Vagrant providers, and node sizes. With Ambari blueprints, different scenarios can be provided with a simple command. With this post I would like to share these scripts on GitHub here. In addition, with the advent of HDP 2.2, two examples using the technical preview version of HDP were added to this repository:

The naming convention for each environment is as follows:


The environments can be run and set up with one simple command:

vagrant up && ./

As a requirement VirtualBox and Vagrant need to be installed.

The master_blueprint.json contains possible Ambari blueprint components and configurations.

Examples on GitHub:


Provisioning a Cluster Using Vagrant and OpenStack

Vagrant has become very popular for provisioning virtual machines for development. Usually it’s used in combination with VirtualBox on a local machine. But Vagrant supports multiple other virtualization providers; in fact, one can build a custom provider as needed. If the local machine is not sufficient for the needs of development, moving to the cloud using AWS, Rackspace, or OpenStack seems like a reasonable thing to do.

Provisioning a HDP Dev Cluster with Vagrant

Setting up a production or development Hadoop cluster used to be much more tedious than it is today with tools like Puppet, Chef, and Vagrant. Additionally, the Hadoop community kept busy investing in the ease of deployment, listening to the demands of experienced system administrators. The latest of such investments is Ambari Blueprints.

With Ambari Blueprints, DevOps teams can configure an automated setup of individual components on each node across a cluster. This setup can then be re-used to replicate it onto different clusters for development, integration, or production.
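
To make the idea concrete, here is a minimal sketch of the two JSON documents a blueprint-based install works with: the blueprint itself, which assigns components to host groups, and a cluster creation template, which maps real hosts onto those groups. The blueprint name, the host group layout, and the host names are hypothetical choices for illustration and not the layout used in the post. As far as I know, the blueprint is registered with a POST to `/api/v1/blueprints/<name>` and the cluster is created with a POST to `/api/v1/clusters/<name>` on the Ambari server.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Build an Ambari blueprint document as a plain dict.

    host_groups maps a group name to the list of component names it runs.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {"name": group, "components": [{"name": c} for c in components]}
            for group, components in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Build the cluster creation template that maps hosts to host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


if __name__ == "__main__":
    # Hypothetical three node layout: one master and two workers.
    blueprint = build_blueprint(
        "hdp-dev",
        "2.1",
        {
            "master": ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"],
            "workers": ["DATANODE", "NODEMANAGER"],
        },
    )
    template = build_cluster_template(
        "hdp-dev",
        {
            "master": ["one.cluster.example"],
            "workers": ["two.cluster.example", "three.cluster.example"],
        },
    )
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

Because the blueprint only names host groups, the same blueprint can be paired with a different cluster template to reproduce the setup in another environment.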

In this post we are going to set up a three node HDP 2.1 cluster for development on a local machine by using Vagrant and Ambari.
Most of what will be presented here builds upon previous work published by various authors, who are referenced at the end of this post.