Docker is a great tool that automates the deployment of software across a Linux operating system. While the fundamental idea behind Docker is to stack specialized software together to form a complex system, there is no particular rule of how big or small the software for a container can or should be. Running the complete HDP stack in a single container can be achieved as well as running each service of HDP in it’s own container.
Docker allows you to run applications inside containers. Running an application inside a container takes a single command: docker run. Containers are based off of images defining software packages and configurations. hkropp/hdp-basic is such an image in which the HDP services are running. The image was build using Ambari blueprint orchastrated by a Dockerfile. The hostname was specified to be n1.hdp throughout the build process and hence needs also to be specified when running it. The Dockerfile for this image is located here. This posts describes how to build HDP on top of Docker.
Prerequisite Setup
Before getting started a Docker environment needs to be installed. A quick way to get started is Boot2Docker. Boot2Docker is a VirtualBox image based on Tiny Core Linux with Docker installed. It can be used with Mac OS X or Windows. Other ways to install Docker can be found here.
Boot2Docker
Once installed Boot2Docker can be used via command line tool boot2docker. With it we can initialize the VM, boot it up, and prepare our shell for docker.
# getting help
$ boot2docker
Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset|restart|config|status|info|ip|shellinit|delete|download|upgrade|version} [<args>]
# init a VM with 8GB RAM and 8 CPUs
$ boot2docker init --memory=8192 --cpus=8
# boot up the image
$ boot2docker up
# shutdown the vm
$ boot2docker down
# setup the shell
$ boot2docker shellinit
# delete the vm completely (to use again an init required)
$ boot2docker delete
# test running
$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
Running hdp-basic
With the Docker environment setup the image can be run like this:
$ docker run -d
-p 8080:8080
-h n1.hdp
hkropp/hdp-basic:0.1
/start-server
Unable to find image 'hkropp/hdp-basic:0.1' locally
0.1: Pulling from hkropp/hdp-basic
If not already installed locally this will fetch the image from Docker Hub. After that the image is run in daemon mode as the -d flag indicates. The -p flag lets Docker know to expose this port to the host VM. With this Ambari can be accessed using the $ boot2docker ip and port 8080 – http://$(boot2docker ip):8080 The hostname is set to be n1.hdp because the image was configured with this hostname. By executing the /start-server script at boot time the Ambari server is started together with all installed services.
The Dockerfile
Building this image was achieved using this Dockerfile, while the installation of HDP was done using Ambari Shell with Blueprints. Helpful about Ambari Shell is the fact that an blueprint install can be executed blocking further process until the install has finished (–exitOnFinish true). From the install-cluster.sh script:
java -jar /tmp/ambari-shell.jar --ambari.host=$HOST << EOF
blueprint add --file /tmp/blueprint.json
cluster build --blueprint hdp-basic
cluster assign --hostGroup host_group_1 --host $HOST
cluster create --exitOnFinish true
EOF
The image is based from a centos:6.6 image. Throughout the build a consistent hostname is being used for the configuration and installation. Doing this with Docker builds is actually not very easy to achieve. By design Docker tries to make the context a container can run in as less restrictive as possible. Assigning a fixed host name to an image is restricting these context. In addition every build step creates a new image with a new host name. Setting the host name before each step requires root privileges which are not given. To work around this the ENV command was used to set the HOSTNAME and to make it resolvable before any command that required the hostname a script was executed to set it as part of the /etc/hosts file.
Part of the Dockerfile:
# OS
FROM centos:6.6
# Hostname Help
ENV HOSTNAME n1.hdp
ADD set_host.sh /tmp/
...
RUN /tmp/set_host.sh && /tmp/install-cluster.sh
Part of the set_host.sh:
#!/bin/bash
echo $(head -1 /etc/hosts | cut -f1) n1.hdp >> /etc/hosts
The Ambari agents support dynamic host configuration by defining a script.
Dockerfile:
# Setup networking for Ambari agent/server
ADD hostname.sh /etc/ambari-agent/conf/hostname.sh
#RUN sed -i "s/hostname=.*/hostname=n1.hdp/" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "/[agent]/ a public_hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "/[agent]/ a hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "s/agent.task.timeout=900/agent.task.timeout=2000/" /etc/ambari-server/conf/ambari.properties
hostname.sh:
#!/bin/bash
# echo $(hostname -f) # for dynamic host name
echo "n1.hdp"
Starting HDP
start-server is the script that is executed during startup of the container. Here the Ambari server and agent are started. The Ambari Shell is again being used to start up the all installed HDP services.
#!/bin/bash
while [ -z "$(netstat -tulpn | grep 8080)" ]; do
ambari-server start
ambari-agent start
sleep 5
done
sleep 5
java -jar /tmp/ambari-shell.jar --ambari.host=n1.hdp << EOF
services start
EOF
while true; do
sleep 3
tail -f /var/log/ambari-server/ambari-server.log
done
Further Readings