Building HDP on Docker

Docker is a great tool that automates the deployment of software inside containers on Linux. While the fundamental idea behind Docker is to stack specialized software together to form a complex system, there is no rule about how big or small the software in a container can or should be. Running the complete HDP stack in a single container is just as possible as running each HDP service in its own container.

Docker allows you to run applications inside containers. Running an application inside a container takes a single command: docker run. Containers are based on images that define software packages and configurations. hkropp/hdp-basic is such an image; it runs the HDP services. The image was built from an Ambari blueprint, with the build orchestrated by a Dockerfile. The hostname n1.hdp was used throughout the build process, so it must also be specified when running the image. The Dockerfile for this image is located here. This post describes how to build HDP on top of Docker.

Prerequisite Setup

Before getting started, a Docker environment needs to be installed. A quick way to get started is Boot2Docker, a VirtualBox image based on Tiny Core Linux with Docker installed. It can be used with Mac OS X or Windows. Other ways to install Docker can be found here.

Boot2Docker

Once installed, Boot2Docker can be used via the boot2docker command-line tool. With it we can initialize the VM, boot it up, and prepare our shell for docker.

# getting help
$ boot2docker
Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset|restart|config|status|info|ip|shellinit|delete|download|upgrade|version} [<args>]

# init a VM with 8GB RAM and 8 CPUs
$ boot2docker init --memory=8192 --cpus=8

# boot up the VM
$ boot2docker up

# shut down the VM
$ boot2docker down

# set up the shell (exports DOCKER_HOST and the TLS settings)
$ eval "$(boot2docker shellinit)"

# delete the VM completely (a new init is required to use it again)
$ boot2docker delete

# test that the client can reach the Docker daemon
$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
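To script the check above, the following sketch reads the output of docker version on stdin and confirms that the client reached the daemon and that both sides report the same API version. The function name api_versions_match is hypothetical, and the parsing assumes the Docker 1.7-era "Key: value" output shown above; newer Docker releases format this output differently.

```shell
#!/bin/bash
# Sketch: check the output of `docker version` (read from stdin).
# Returns 0 if client and server report the same API version, 1 if they
# differ, and 2 if there is no server section (daemon not reachable).
# Assumes the Docker 1.7-era "Key: value" format.
api_versions_match() {
  local output client server
  output=$(cat)
  client=$(printf '%s\n' "$output" | awk -F': ' '/^Client API version:/ {print $2}')
  server=$(printf '%s\n' "$output" | awk -F': ' '/^Server API version:/ {print $2}')
  if [ -z "$server" ]; then
    echo "Docker daemon not reachable; is the boot2docker VM up and the shell set up?" >&2
    return 2
  fi
  [ "$client" = "$server" ]
}

# Example:
#   docker version | api_versions_match && echo "client and server agree"
```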

Running hdp-basic

With the Docker environment set up, the image can be run like this:

$ docker run -d \
-p 8080:8080 \
-h n1.hdp \
hkropp/hdp-basic:0.1 \
/start-server

Unable to find image 'hkropp/hdp-basic:0.1' locally
0.1: Pulling from hkropp/hdp-basic

If the image is not already available locally, this fetches it from Docker Hub. The container then runs detached (in the background), as the -d flag indicates. The -p 8080:8080 flag publishes the container's port 8080 on the host VM. With this, Ambari can be accessed at http://$(boot2docker ip):8080, that is, the VM's IP address as printed by boot2docker ip, on port 8080. The hostname is set to n1.hdp because the image was configured with this hostname. The /start-server script, executed when the container starts, starts the Ambari server together with all installed services.
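Ambari can take a while to respond after the container starts, because start-server first starts the server and agent and then all HDP services. The sketch below builds the URL from the VM's IP and retries a check command until it succeeds or a timeout expires. ambari_url and wait_for are hypothetical helper names, and the timeout values in the example comment are arbitrary.

```shell
#!/bin/bash
# Sketch: helpers for reaching Ambari once the container is running.
# ambari_url and wait_for are hypothetical helper names.

# Print the Ambari URL for a VM IP; the port defaults to 8080 (published with -p 8080:8080).
ambari_url() {
  local ip=$1 port=${2:-8080}
  printf 'http://%s:%s\n' "$ip" "$port"
}

# Run a command every INTERVAL seconds until it succeeds; give up after TIMEOUT seconds.
# Usage: wait_for TIMEOUT INTERVAL command [args...]
wait_for() {
  local timeout=$1 interval=$2 start
  shift 2
  start=$(date +%s)
  until "$@"; do
    if [ $(( $(date +%s) - $start )) -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Example, run on the Mac:
#   url=$(ambari_url "$(boot2docker ip)")
#   wait_for 600 10 curl -sf -o /dev/null "$url" && echo "Ambari is up at $url"
```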

The Dockerfile

This image was built using this Dockerfile, while HDP itself was installed using Ambari Shell with Blueprints. A helpful feature of Ambari Shell is that a blueprint install can block further processing until the installation has finished (--exitOnFinish true). From the install-cluster.sh script:

java -jar /tmp/ambari-shell.jar --ambari.host=$HOST << EOF
blueprint add --file /tmp/blueprint.json
cluster build --blueprint hdp-basic
cluster assign --hostGroup host_group_1 --host $HOST
cluster create --exitOnFinish true
EOF
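The heredoc above hard-codes the blueprint name and host group. A minimal sketch, assuming you want to reuse the same command sequence with other values, generates it from a function; ambari_shell_script is a hypothetical name. The unquoted EOF delimiter lets the shell expand the variables before Ambari Shell reads the commands, which is also why $HOST works in the original script.

```shell
#!/bin/bash
# Sketch: print the Ambari Shell commands used by install-cluster.sh for
# arbitrary values. ambari_shell_script is a hypothetical helper name.
ambari_shell_script() {
  local blueprint_file=$1 blueprint_name=$2 host_group=$3 host=$4
  # Unquoted EOF: the shell substitutes the variables before printing.
  cat <<EOF
blueprint add --file $blueprint_file
cluster build --blueprint $blueprint_name
cluster assign --hostGroup $host_group --host $host
cluster create --exitOnFinish true
EOF
}

# Example, equivalent to the heredoc in install-cluster.sh:
#   ambari_shell_script /tmp/blueprint.json hdp-basic host_group_1 "$HOST" \
#     | java -jar /tmp/ambari-shell.jar --ambari.host="$HOST"
```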

The image is based on the centos:6.6 image. Throughout the build, a consistent hostname is used for configuration and installation. This is not easy to achieve with Docker builds. By design, Docker tries to keep the context a container can run in as unrestricted as possible, and assigning a fixed hostname to an image restricts this context. In addition, every build step runs in a new intermediate container with a new hostname, and changing the hostname inside a step requires privileges that build containers do not have. To work around this, the ENV instruction was used to set HOSTNAME, and to make the name resolvable, a script adding it to /etc/hosts was executed before any command that required the hostname. Because Docker does not keep changes to /etc/hosts between build steps, the script has to run in the same RUN instruction as the command that needs it.

Part of the Dockerfile:

# OS
FROM centos:6.6

# Hostname Help
ENV HOSTNAME n1.hdp
ADD set_host.sh /tmp/

...

RUN /tmp/set_host.sh && /tmp/install-cluster.sh

Part of the set_host.sh:

#!/bin/bash

echo $(head -1 /etc/hosts | cut -f1) n1.hdp >> /etc/hosts
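The one-liner above appends unconditionally and relies on the first line of /etc/hosts being tab-separated, because cut -f1 splits on tabs only. A slightly more defensive sketch, under the same assumption that the first line holds the container's IP, uses awk to split on any whitespace and skips the append if the name is already present. The add_host_entry function and its file argument are hypothetical; the argument exists so the function can be tried on a scratch file instead of /etc/hosts.

```shell
#!/bin/bash
# Sketch of a more defensive set_host.sh. Assumes, like the original, that the
# first line of the hosts file holds the container's IP address.
# add_host_entry and its file argument are hypothetical.
add_host_entry() {
  local hosts_file=$1 name=$2 ip
  # awk splits on any run of spaces or tabs, unlike cut -f1 (tabs only).
  ip=$(awk 'NR == 1 {print $1}' "$hosts_file")
  [ -n "$ip" ] || return 1
  # Append only if no line already lists the name as one of its host names.
  if ! awk -v n="$name" '{for (i = 2; i <= NF; i++) if ($i == n) found = 1} END {exit (found ? 0 : 1)}' "$hosts_file"; then
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Example, inside the Dockerfile's RUN step:
#   add_host_entry /etc/hosts n1.hdp
```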

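Ambari agents can also determine their hostname dynamically by running a script, configured through the hostname_script and public_hostname_script properties in ambari-agent.ini.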

Dockerfile:

# Setup networking for Ambari agent/server
ADD hostname.sh /etc/ambari-agent/conf/hostname.sh
#RUN sed -i "s/hostname=.*/hostname=n1.hdp/" /etc/ambari-agent/conf/ambari-agent.ini
# escape the brackets: an unescaped [agent] would be a regex character class
RUN sed -i "/^\[agent\]/ a public_hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "/^\[agent\]/ a hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "s/agent.task.timeout=900/agent.task.timeout=2000/" /etc/ambari-server/conf/ambari.properties
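The sed address needs care: written without backslashes, [agent] is a regular-expression bracket expression that matches any line containing one of the letters a, g, e, n or t, so the text would be appended after almost every line of ambari-agent.ini. The sketch below anchors on the escaped section header and skips the insert when the key already exists, so rerunning the step is harmless. It assumes GNU sed (as on CentOS) for sed -i and the one-line a command; insert_after_agent_section is a hypothetical name, and its key check looks at the whole file rather than only the [agent] section.

```shell
#!/bin/bash
# Sketch: insert "key=value" directly below the [agent] header of an INI file.
# Assumes GNU sed (sed -i and the one-line "a text" form).
# insert_after_agent_section is a hypothetical helper name.
insert_after_agent_section() {
  local ini_file=$1 line=$2 key
  key=${line%%=*}
  # Skip if the key is already present anywhere in the file.
  if grep -q "^${key}=" "$ini_file"; then
    return 0
  fi
  # \[ and \] match literal brackets; ^ anchors on the section header line.
  sed -i "/^\[agent\]/ a ${line}" "$ini_file"
}

# Example, mirroring the Dockerfile:
#   insert_after_agent_section /etc/ambari-agent/conf/ambari-agent.ini \
#     hostname_script=/etc/ambari-agent/conf/hostname.sh
```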

hostname.sh:

#!/bin/bash

# echo $(hostname -f) # for dynamic host name
echo "n1.hdp"
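Since hostname.sh always reports n1.hdp, a container started without -h n1.hdp would most likely leave that name unresolvable inside the container, and the failure would only show up later when Ambari Shell or the agent tries to use it. A minimal sketch of an early sanity check, for example at the top of start-server, compares the expected name with the actual one; check_expected_hostname is a hypothetical helper, not part of the image.

```shell
#!/bin/bash
# Sketch: fail early when the container hostname differs from the name the
# image was built with. check_expected_hostname is a hypothetical helper.
check_expected_hostname() {
  local expected=$1 actual=$2
  if [ "$actual" != "$expected" ]; then
    echo "hostname is '$actual' but the image expects '$expected'; start the container with -h $expected" >&2
    return 1
  fi
}

# Example, at the top of start-server:
#   check_expected_hostname n1.hdp "$(hostname -f)" || exit 1
```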

Starting HDP

start-server is the script executed when the container starts. It starts the Ambari server and agent, and then uses Ambari Shell again to start all installed HDP services.

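#!/bin/bash

# start the Ambari server and agent, retrying until the server listens on port 8080
while [ -z "$(netstat -tulpn | grep 8080)" ]; do
  ambari-server start
  ambari-agent start
  sleep 5
done

# give the server a moment before connecting with Ambari Shell
sleep 5

# start all installed HDP services
java -jar /tmp/ambari-shell.jar --ambari.host=n1.hdp << EOF
services start
EOF

# keep the container in the foreground by following the server log
while true; do
  sleep 3
  tail -f /var/log/ambari-server/ambari-server.log
done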
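The first loop in start-server detects the running server with netstat -tulpn | grep 8080, which also matches a port such as 18080 or a process ID containing 8080. A stricter sketch, assuming netstat's usual column layout (local address in the fourth column, state in the sixth), reads netstat output on stdin and succeeds only if some LISTEN line has a local address ending in :PORT. port_listening is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: succeed if netstat output (read from stdin) shows a LISTEN socket
# whose local address ends in :PORT. Assumes the usual netstat columns:
# $4 = local address, $6 = state. port_listening is a hypothetical helper.
port_listening() {
  awk -v p=":$1" '
    $6 == "LISTEN" {
      addr = $4
      # compare the end of the local address, e.g. 0.0.0.0:8080 or :::8080
      if (substr(addr, length(addr) - length(p) + 1) == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example, replacing the grep in start-server's first loop:
#   until netstat -tln | port_listening 8080; do
#     ambari-server start; ambari-agent start; sleep 5
#   done
```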

Further Reading

Upgrade Docker to Master on OSx

Docker is a fast-moving project enjoying a lot of popularity among developers across many fields. With this wide support, the Docker ecosystem evolves almost every day, ranging from Deis as a PaaS platform to cluster management with CoreOS and Kubernetes. Even Microsoft is considering Docker support for the next version of Windows Server. In this post I would like to demonstrate how to upgrade to a master build of Docker running on Mac OS X with Boot2Docker, which comes in handy when trying to keep up with the latest development or to work around a bug already fixed in a newer release. The notes here will likely also be useful in other environments.