Docker is a great tool for automating the deployment of software on Linux. While the fundamental idea behind Docker is to stack specialized software together to form a complex system, there is no particular rule for how big or small the software in a container can or should be. Running the complete HDP stack in a single container is possible, as is running each HDP service in its own container.
Docker allows you to run applications inside containers. Running an application inside a container takes a single command: docker run. Containers are based on images that define software packages and configurations. hkropp/hdp-basic is such an image, in which the HDP services are running. The image was built using an Ambari blueprint orchestrated by a Dockerfile. The hostname was set to n1.hdp throughout the build process and hence also needs to be specified when running the image. The Dockerfile for this image is located here. This post describes how to build HDP on top of Docker.
Prerequisite Setup
Before getting started, a Docker environment needs to be installed. A quick way to get started is Boot2Docker, a VirtualBox image based on Tiny Core Linux with Docker installed. It can be used with Mac OS X or Windows. Other ways to install Docker can be found here.
Boot2Docker
Once installed, Boot2Docker can be used via the command line tool boot2docker. With it we can initialize the VM, boot it up, and prepare our shell for Docker.

# getting help
$ boot2docker
Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset|restart|config|status|info|ip|shellinit|delete|download|upgrade|version} [<args>]

# init a VM with 8GB RAM and 8 CPUs
$ boot2docker init --memory=8192 --cpus=8

# boot up the image
$ boot2docker up

# shutdown the vm
$ boot2docker down

# setup the shell
$ boot2docker shellinit

# delete the vm completely (to use again an init required)
$ boot2docker delete

# test running
$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
Running hdp-basic
With the Docker environment set up, the image can be run like this:

$ docker run -d -p 8080:8080 -h n1.hdp hkropp/hdp-basic:0.1 /start-server
Unable to find image 'hkropp/hdp-basic:0.1' locally
0.1: Pulling from hkropp/hdp-basic

If the image is not already available locally, Docker fetches it from Docker Hub. The container then runs in detached mode, as the -d flag indicates. The -p flag publishes container port 8080 on port 8080 of the host VM, so Ambari can be reached at the address reported by boot2docker ip: http://$(boot2docker ip):8080. The hostname is set to n1.hdp with the -h flag because the image was configured with this hostname. Executing the /start-server script when the container starts brings up the Ambari server together with all installed services.
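Because the services take a while to come up, it can be handy to wait for the Ambari UI before opening a browser. The following is a minimal sketch, not part of the image: the helper names ambari_url and wait_for_ambari are hypothetical, the address 192.168.59.103 is only an example value, and it assumes curl is installed and that the Ambari root page answers with a successful HTTP status once the server is ready.

```shell
#!/bin/bash
# Hypothetical helper: build the Ambari URL from a VM address and an optional port.
ambari_url() {
  local ip="$1" port="${2:-8080}"
  printf 'http://%s:%s\n' "$ip" "$port"
}

# Hypothetical helper: poll the URL with curl every 5 seconds until it
# responds successfully or the number of attempts is used up.
wait_for_ambari() {
  local url="$1" attempts="${2:-60}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep 5
    i=$((i + 1))
  done
  return 1
}

# Usage (needs a running Boot2Docker VM, so it is left commented out):
# wait_for_ambari "$(ambari_url "$(boot2docker ip)")" && echo "Ambari is up"

# Example address only; prints http://192.168.59.103:8080
ambari_url 192.168.59.103
```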
The Dockerfile
The image was built using this Dockerfile, while HDP itself was installed using Ambari Shell with Blueprints. A helpful feature of Ambari Shell is that a blueprint install can block further processing until the install has finished (--exitOnFinish true). From the install-cluster.sh script:

java -jar /tmp/ambari-shell.jar --ambari.host=$HOST << EOF
blueprint add --file /tmp/blueprint.json
cluster build --blueprint hdp-basic
cluster assign --hostGroup host_group_1 --host $HOST
cluster create --exitOnFinish true
EOF

The image is based on a centos:6.6 image. Throughout the build a consistent hostname is used for the configuration and installation. With Docker builds this is actually not very easy to achieve. By design Docker tries to keep the context a container can run in as unrestrictive as possible, and assigning a fixed hostname to an image restricts this context. In addition, every build step runs in a fresh intermediate container with a new hostname. Setting the hostname before each step requires root privileges, which are not given. To work around this, the ENV command was used to set HOSTNAME, and to make the name resolvable, a script that adds it to the /etc/hosts file was executed before any command that required the hostname.
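Since the here-document delimiter in install-cluster.sh is unquoted, the shell expands $HOST before the text reaches Ambari Shell. A small sketch of a dry run that prints the expanded commands instead of piping them into the jar can help with checking this; the function name blueprint_commands is hypothetical, while the command text is copied from the script.

```shell
#!/bin/bash
# Hypothetical helper: print the Ambari Shell commands for a given host.
# The unquoted EOF delimiter lets the shell substitute $host in the body.
blueprint_commands() {
  local host="$1"
  cat << EOF
blueprint add --file /tmp/blueprint.json
cluster build --blueprint hdp-basic
cluster assign --hostGroup host_group_1 --host $host
cluster create --exitOnFinish true
EOF
}

# Dry run: inspect what would be fed to ambari-shell.jar on stdin.
blueprint_commands n1.hdp

# Real run (commented out; needs the jar and a running Ambari server):
# blueprint_commands "$HOST" | java -jar /tmp/ambari-shell.jar --ambari.host="$HOST"
```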
Part of the Dockerfile:
# OS
FROM centos:6.6

# Hostname Help
ENV HOSTNAME n1.hdp
ADD set_host.sh /tmp/
...
RUN /tmp/set_host.sh && /tmp/install-cluster.sh
Part of the set_host.sh:
#!/bin/bash
echo $(head -1 /etc/hosts | cut -f1) n1.hdp >> /etc/hosts
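To see what this one-liner does without touching the real /etc/hosts, the same pipeline can be pointed at a sample file. This is only a sketch: the address and container ID below are made up, and it assumes the first line of the hosts file holds the container's own address followed by a tab, which is what cut -f1 relies on since it splits on tabs by default.

```shell
#!/bin/bash
# Stand-in for /etc/hosts; the address and container ID are invented sample values.
hosts_file="$(mktemp)"
printf '172.17.0.2\tf3a9c1d2e4b5\n127.0.0.1\tlocalhost\n' > "$hosts_file"

# Same pipeline as set_host.sh, pointed at the sample file:
# take the first line, keep the first tab-separated field (the address).
container_ip="$(head -n 1 "$hosts_file" | cut -f1)"

# Map the fixed hostname to that address.
echo "$container_ip n1.hdp" >> "$hosts_file"

# Prints: 172.17.0.2 n1.hdp
tail -n 1 "$hosts_file"
```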
The Ambari agent supports dynamic hostname configuration through a script, which is registered in ambari-agent.ini:
# Setup networking for Ambari agent/server
ADD hostname.sh /etc/ambari-agent/conf/hostname.sh
#RUN sed -i "s/hostname=.*/hostname=n1.hdp/" /etc/ambari-agent/conf/ambari-agent.ini
# The brackets are escaped so sed matches the literal [agent] section header
# rather than any line containing one of the letters a, g, e, n or t
RUN sed -i "/^\[agent\]/ a public_hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "/^\[agent\]/ a hostname_script=/etc/ambari-agent/conf/hostname.sh" /etc/ambari-agent/conf/ambari-agent.ini
RUN sed -i "s/agent.task.timeout=900/agent.task.timeout=2000/" /etc/ambari-server/conf/ambari.properties
The hostname.sh script:

#!/bin/bash
# echo $(hostname -f) # for dynamic host name
echo "n1.hdp"
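The sed edits to ambari-agent.ini can be tried out on a throwaway file first. The sketch below assumes GNU sed (for -i and the one-line "a" form) and uses an invented sample ini; only the appended hostname_script line comes from the Dockerfile. Note that an unescaped [agent] is a bracket expression matching any single one of those letters, which is why the pattern escapes the brackets and anchors to the line start.

```shell
#!/bin/bash
# Sample stand-in for ambari-agent.ini; the contents are illustrative only.
ini_file="$(mktemp)"
printf '[server]\nhostname=localhost\n\n[agent]\nprefix=/var/lib/ambari-agent/data\n' > "$ini_file"

# Append the setting directly after the literal [agent] section header.
sed -i '/^\[agent\]/ a hostname_script=/etc/ambari-agent/conf/hostname.sh' "$ini_file"

# Show the result; the new line should sit right below [agent].
cat "$ini_file"
```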
Starting HDP
start-server is the script that is executed when the container starts. It starts the Ambari server and agent, and then uses Ambari Shell again to start all installed HDP services.

#!/bin/bash

# Keep starting Ambari until something is listening on port 8080
while [ -z "$(netstat -tulpn | grep 8080)" ]; do
  ambari-server start
  ambari-agent start
  sleep 5
done

sleep 5

# Start all installed HDP services through Ambari Shell
java -jar /tmp/ambari-shell.jar --ambari.host=n1.hdp << EOF
services start
EOF

# Keep the container in the foreground by following the server log
while true; do
  sleep 3
  tail -f /var/log/ambari-server/ambari-server.log
done
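The first loop in start-server polls without an upper bound, so a container whose Ambari server never comes up will retry forever. One possible variation, shown here only as a sketch, is a small retry helper with a timeout. The names wait_until and port_8080_listening are hypothetical and not part of the original script, and the probe assumes netstat is available in the container.

```shell
#!/bin/bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of seconds has been spent waiting.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Hypothetical probe, similar in spirit to the check in start-server.
port_8080_listening() {
  netstat -tuln 2>/dev/null | grep -q ':8080 '
}

# Usage inside a start script (commented out; needs Ambari installed):
# ambari-server start && ambari-agent start
# wait_until 300 port_8080_listening || { echo "Ambari did not come up" >&2; exit 1; }
```

Passing the probe as a command keeps the helper independent of how readiness is detected, so the same function could wrap a curl check instead.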
Further Readings
- Docker Docs (Master Branch)
- Boot2Docker
- Ambari Shell
- Ambari Shell Documentation
- Docker: Up and Running (Amazon)
- Docker in Action (Manning) (Amazon)
- Docker on Ambari