Setting up a production or development Hadoop cluster used to be much more tedious than it is today with tools like Puppet, Chef, and Vagrant. In addition, the Hadoop community has kept investing in ease of deployment, listening to the demands of experienced system administrators. The latest of these investments is Ambari Blueprints.
With Ambari Blueprints, DevOps engineers can configure an automated setup of the individual components on each node across a cluster. This setup can then be reused to replicate the cluster for development, integration, or production.
In this post we are going to set up a three-node HDP 2.1 cluster for development on a local machine using Vagrant and Ambari.
Most of what is presented here builds on previous work published by various authors, which is referenced at the end of this post.
HDP Setup with Vagrant
Vagrant lets you set up virtual environments in a snap, completely described in code. Although Vagrant uses Ruby, prior knowledge of the language is not required. Spinning up your virtual environment is as easy as running vagrant init and vagrant up from your command line interface. Vagrant runs your setup on VirtualBox, VMware, or any other supported provider.
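In its simplest form the workflow looks like this (standard Vagrant commands):

# create a skeleton Vagrantfile in the current directory
vagrant init

# download the box (if needed), boot the machines, and run the provisioners
vagrant up

# open an SSH session into a running machine
vagrant ssh

# shut everything down and remove the virtual machines
vagrant destroy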
The central components of a Vagrant setup are a Vagrantfile and a Box. While Vagrantfiles describe the individual setup in the Ruby language, a Box is a Vagrant package of such a system, including the bare (or pre-installed) image of the underlying operating system. Boxes can be published and shared; a resource for finding the Box you need is, for example, Vagrant Cloud.
Here we are using a CentOS 6.5 box with pre-installed Puppet provided by Puppetlabs:
config.vm.box = "puppetlabs/centos-6.5-64-puppet"
config.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
For provisioning, Vagrant can be used with Shell, Chef, or Puppet, among others; they can even be combined. In the setup described here we are going to use Puppet as our provisioning system to set up HDP 2.1. Each node has its own Puppet script:
one.vm.provision "puppet" do |puppet|
  puppet.manifests_path = "manifest"
  puppet.module_path = "modules"
  puppet.manifest_file = "one.pp"
end
For further details about Vagrant and how to install it on your system, please refer to the official documentation.
For the purposes of this example we want a three-node CentOS cluster. This can be achieved with a multi-machine setup in Vagrant. On node one we want to install the Ambari server, which requires a slightly different Puppet script, as you will see later. We also forward the Ambari port from the guest system to localhost. This is the complete Vagrantfile used:
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "puppetlabs/centos-6.5-64-puppet"
  config.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
  config.vm.synced_folder "ssh", "/root/.ssh"

  config.vm.define :one do |one|
    one.vm.hostname = "one.cluster"
    one.vm.network :private_network, ip: "192.168.0.101"
    one.vm.provider :virtualbox do |vb|
      vb.customize ["modifyvm", :id, "--memory", 2048]
    end
    one.vm.network "forwarded_port", guest: 8080, host: 8080
    one.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "manifest"
      puppet.module_path = "modules"
      puppet.manifest_file = "one.pp"
    end
    one.vm.provision "shell" do |s|
      s.inline = "sudo chmod 600 /root/.ssh"
    end
  end

  config.vm.define :two do |two|
    two.vm.hostname = "two.cluster"
    two.vm.network :private_network, ip: "192.168.0.102"
    two.vm.provider :virtualbox do |vb|
      vb.customize ["modifyvm", :id, "--memory", 2048]
    end
    two.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "manifest"
      puppet.module_path = "modules"
      puppet.manifest_file = "two.pp"
    end
    two.vm.provision "shell" do |s|
      s.inline = "sudo chmod 600 /root/.ssh"
    end
  end

  config.vm.define :three do |three|
    three.vm.hostname = "three.cluster"
    three.vm.network :private_network, ip: "192.168.0.103"
    three.vm.provider :virtualbox do |vb|
      vb.customize ["modifyvm", :id, "--memory", 2048]
    end
    three.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "manifest"
      puppet.module_path = "modules"
      puppet.manifest_file = "three.pp"
    end
  end
end
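With this Vagrantfile in place, the whole cluster can be managed from the project directory. A sketch of the expected workflow:

# boot and provision all three nodes (one, two, three)
vagrant up

# boot or re-provision a single node
vagrant up one
vagrant provision one

# show the state of all defined machines
vagrant status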
Provisioning HDP 2.1
Installing an HDP cluster using Ambari can be achieved by following this documentation step by step. Here we want to automate the whole process. According to the documentation, we first need to install the NTP service, disable iptables as it might interfere with our services, and finally install Apache Ambari. During this setup we have to make sure networking is configured correctly, as the hosts need to be able to discover each other, either through a proper DNS setup or via the /etc/hosts file. As we need to apply this to each host separately, we are going to place these steps into separate, reusable Puppet modules: interfering_services, ntp, and etchosts.
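For reference, the manual steps these modules automate would look roughly like this on a single CentOS 6 node (a sketch; the modules below run the equivalent commands):

# install and enable the NTP service
yum install -y ntp
service ntpd start
chkconfig ntpd on

# disable the firewall so the cluster services can communicate
service iptables stop
service ip6tables stop
chkconfig iptables off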
The interfering_services Puppet Module
This module disables iptables and PackageKit.
class interfering_services {
  # Disable PackageKit
  file { 'packageKit':
    path    => "/etc/yum/pluginconf.d/refresh-packagekit.conf",
    ensure  => "present",
    replace => true,
    content => "enabled=0"
  }
  # Stop iptables
  exec { "stop_ip_tables":
    path    => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"],
    command => "service iptables stop"
  }
  exec { "stop_ip_tables6":
    path    => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"],
    command => "service ip6tables stop"
  }
}
The ntp Puppet Module
This module installs and starts the time service on each node.
class ntp {
  package { 'ntp':
    name   => "ntp",
    ensure => present
  }
  service { 'ntp-services':
    name    => "ntpd",
    ensure  => running,
    require => Package[ntp]
  }
}
The etchosts Puppet Module
# Ensure that the machines in the cluster can find each other without DNS
class etchosts ($ownhostname) {
  host { 'host_one':
    name         => 'one.cluster',
    host_aliases => ['one'],
    ip           => '192.168.0.101',
  }
  host { 'host_two':
    name         => 'two.cluster',
    host_aliases => ['two'],
    ip           => '192.168.0.102',
  }
  host { 'host_three':
    name         => 'three.cluster',
    host_aliases => ['three'],
    ip           => '192.168.0.103',
  }
  file { 'agent_hostname':
    path    => "/etc/hostname",
    ensure  => present,
    replace => true,
    content => "${ownhostname}", # own hostname
    owner   => 1546
  }
  file { 'agent_sysconfig':
    path    => "/etc/sysconfig/network",
    ensure  => present,
    replace => true,
    content => "NETWORKING=yes\nHOSTNAME=${ownhostname}" # own hostname
  }
}
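Once the module has been applied, name resolution between the nodes can be spot-checked from any guest, for example:

# the cluster hosts should now resolve without DNS
getent hosts two.cluster
ping -c 1 three.cluster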
Installing Ambari
All three nodes will run an Ambari agent, while node one additionally runs the Ambari server. Here again we are going to use Puppet modules to provision the Ambari server and agents onto the nodes.
The ambari-server Puppet Module
First we need to add the repository from which we want to install Ambari. Here we add the public-repo-1.hortonworks.com location to the repository list, then install the Ambari server package and run the setup.
class ambari_server ($ownhostname) {
  Exec { path => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"] }

  # Ambari repo
  exec { 'get-ambari-server-repo':
    command => "wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.6.1/ambari.repo",
    cwd     => '/etc/yum.repos.d/',
    creates => '/etc/yum.repos.d/ambari.repo',
    user    => root
  }

  # Ambari server
  package { 'ambari-server':
    ensure  => present,
    require => Exec['get-ambari-server-repo']
  }

  exec { 'ambari-setup':
    command => "ambari-server setup -s",
    user    => root,
    require => Package['ambari-server']
  }

  # The service's start/status attributes expect command strings
  service { 'ambari-server':
    ensure  => running,
    start   => 'ambari-server start',
    status  => 'ambari-server status',
    require => [Package['ambari-server'], Exec['ambari-setup']]
  }
}
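Once node one has been provisioned, a quick check that the server actually came up can be done from the host (standard Vagrant and Ambari commands):

# log into node one and query the Ambari server status
vagrant ssh one
sudo ambari-server status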
The ambari-agent Puppet Module
As with the server setup, for the agent we also need to add the repository before setting up the agent itself. Note that the file_line resource used below comes from the puppetlabs-stdlib module, which therefore needs to be available in the modules directory.
class ambari_agent ($ownhostname, $serverhostname) {
  Exec { path => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"] }

  # Ambari repo
  exec { 'get-ambari-agent-repo':
    command => "wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.6.1/ambari.repo",
    cwd     => '/etc/yum.repos.d/',
    creates => '/etc/yum.repos.d/ambari.repo',
    user    => root
  }

  # Ambari agent
  package { 'ambari-agent':
    ensure  => present,
    require => Exec['get-ambari-agent-repo']
  }

  # Point the agent at the Ambari server (file_line is provided by puppetlabs-stdlib)
  file_line { 'ambari-agent-ini-hostname':
    ensure  => present,
    path    => '/etc/ambari-agent/conf/ambari-agent.ini',
    line    => "hostname=${serverhostname}", # server host name
    match   => '^hostname=',
    require => Package['ambari-agent']
  }

  exec { 'hostname':
    command => "hostname ${ownhostname}", # own host name
    user    => root
  }

  exec { 'ambari-agent-start':
    command => "ambari-agent start",
    user    => root,
    require => [Package['ambari-agent'], Exec['hostname'], File_line['ambari-agent-ini-hostname']],
    onlyif  => 'ambari-agent status | grep "not running"'
  }
}
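Analogously, the agent can be checked on each node once it has been provisioned:

# verify the agent is running and pointed at the server on node one
vagrant ssh two
sudo ambari-agent status
grep "^hostname=" /etc/ambari-agent/conf/ambari-agent.ini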
Having set up these modules, we can now reference them from the provisioning scripts for nodes one, two, and three. The scripts for nodes two and three are almost identical.
Puppet script for node one:
include interfering_services

# Install and enable ntp
include ntp

# Ensure that servers can find each other even in the absence of DNS
class { 'etchosts':
  ownhostname => 'one.cluster'
}

# Install and enable the Ambari server
class { 'ambari_server':
  ownhostname => 'one.cluster'
}

# Install and enable the Ambari agent
class { 'ambari_agent':
  ownhostname    => 'one.cluster',
  serverhostname => 'one.cluster'
}

# Establish ordering
Class['interfering_services'] -> Class['ntp'] -> Class['etchosts'] -> Class['ambari_server'] -> Class['ambari_agent']
Puppet script for node two/three:
include interfering_services

# Install and enable ntp
include ntp

# Ensure that servers can find each other even in the absence of DNS
# (on node three, use 'three.cluster' for ownhostname)
class { 'etchosts':
  ownhostname => 'two.cluster'
}

class { 'ambari_agent':
  serverhostname => "one.cluster",
  ownhostname    => "two.cluster"
}

# Establish ordering
Class['interfering_services'] -> Class['ntp'] -> Class['etchosts'] -> Class['ambari_agent']
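Provisioning runs automatically on the first vagrant up; after changing one of these manifests, a node can be re-provisioned in place:

# re-run the Puppet provisioner on node two without recreating the VM
vagrant provision two

# or reboot the node with provisioning enabled
vagrant reload two --provision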
From here we are already able to provision the complete Hadoop cluster using Ambari’s automated installation process. Just point your browser to localhost:8080 and log in with admin as both user name and password.
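Whether all three hosts have registered with the server can also be confirmed via the Ambari REST API before entering the wizard (a minimal check, assuming the default admin/admin credentials):

# list the hosts known to Ambari; one, two, and three should appear
curl -u admin:admin http://localhost:8080/api/v1/hosts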
A better way would be to use Ambari Blueprints to provision the complete cluster automatically.