Kerberos Ambari Blueprint Installs

Apache Ambari is rapidly improving its support for secure installations and for managing security in Hadoop. By now it is fairly convenient to create kerberized clusters in a snap, either with automated procedures or with the Ambari wizard.

With the latest release of Ambari, Kerberos setup is baked into blueprint installations, making separate methods like additional API calls unnecessary. In this post I would like to briefly discuss this new option of using pure Blueprint installs for secure cluster setups, and additionally explain some of the prerequisites for a sandbox-like demo install. Continue reading “Kerberos Ambari Blueprint Installs”
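
To give an idea of what such an install looks like, here is a minimal sketch, not taken from the post itself, of registering a blueprint that declares Kerberos as its security type via the Ambari REST API; host, admin credentials, stack version, and the trimmed-down host group are placeholder assumptions:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RegisterSecureBlueprint {
    public static void main(String[] args) throws Exception {
        // Minimal blueprint declaring Kerberos as the security type;
        // the host group is trimmed down for readability
        String blueprint =
              "{ \"Blueprints\": { \"stack_name\": \"HDP\", \"stack_version\": \"2.3\","
            + "    \"security\": { \"type\": \"KERBEROS\" } },"
            + "  \"host_groups\": [ { \"name\": \"host_group_1\", \"cardinality\": \"1\","
            + "    \"components\": [ { \"name\": \"NAMENODE\" }, { \"name\": \"DATANODE\" } ] } ] }";

        URL url = new URL("http://one.hdp:8080/api/v1/blueprints/kerberized-sandbox");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        // Ambari rejects write requests that lack this header
        con.setRequestProperty("X-Requested-By", "ambari");
        con.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(blueprint.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Ambari responded: " + con.getResponseCode());
    }
}

Posting a cluster creation template that references this blueprint would then trigger the actual install.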

Automated Kerberos Install for HDP w/ Ambari + Puppet

With the release of Ambari 2.x, kerberizing an HDP install has improved quite a bit. Looking back at Kerberized Hadoop Cluster – A Sandbox Example, most of the steps described there are much easier by now and can be automated. For a long time I had been looking to include this in my existing Vagrant project for an end-to-end setup of a kerberized cluster. With the writing of this post I finally had the opportunity to do so.

In this post I would like to describe the parts added to the Vagrant setup that are needed to accomplish an end-to-end installation of a kerberized HDP cluster. Before the final step of the cluster setup via the Ambari REST API, a KDC with credentials needs to be created. A Puppet module was created and included to handle the MIT Kerberos installation. Continue reading “Automated Kerberos Install for HDP w/ Ambari + Puppet”
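
As a hedged illustration of that hand-over, and not necessarily how the Vagrant project does it, Ambari 2.x can be given the KDC admin credential through its REST API before the kerberized cluster is created; host, cluster name, principal, and passwords are placeholders:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KdcAdminCredential {
    public static void main(String[] args) throws Exception {
        // Temporary credential: held in memory by Ambari, not persisted
        String body = "{ \"Credential\": { \"principal\": \"admin/admin@EXAMPLE.COM\","
                    + "  \"key\": \"kdc-admin-password\", \"type\": \"temporary\" } }";

        URL url = new URL("http://one.hdp:8080/api/v1/clusters/hdp/credentials/kdc.admin.credential");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("X-Requested-By", "ambari");
        con.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Ambari responded: " + con.getResponseCode());
    }
}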

Secure Kafka Java Producer with Kerberos

With its comprehensive security implementation, the recent release of Kafka 0.9 has reached an important milestone. In his blog post Kafka Security 101, Ismael from Confluent describes the security features of the release very well.

As part II of the previously published post about Kafka Security with Kerberos, this post discusses a sample implementation of a Java Kafka producer with authentication. It is part of a mini series of posts discussing secure HDP clients, connecting services to a secured cluster, and kerberizing the HDP Sandbox (Download HDP Sandbox). At the end of this post we will also create a Kafka Servlet to publish messages to a secured broker.

Kafka provides SSL and Kerberos authentication. Only Kerberos is discussed here. Continue reading “Secure Kafka Java Producer with Kerberos”
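
To preview where this is going, here is a minimal sketch of such a producer, assuming a broker at one.hdp:6667, a topic named test, and a JAAS file with a KafkaClient section passed to the JVM via the java.security.auth.login.config system property:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SecureKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "one.hdp:6667");
        // Kerberos (GSSAPI) authentication without wire encryption
        props.put("security.protocol", "SASL_PLAINTEXT");
        // Principal name the brokers run under
        props.put("sasl.kerberos.service.name", "kafka");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "hello from a kerberized producer"));
        }
    }
}

The JAAS file in turn points the client at a keytab or the ticket cache for the principal used to authenticate against the brokers.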

A Secure HDFS Client Example

It takes about 3 lines of Java code to write a simple HDFS client that can then be used to upload, read, or list files. Here is an example:

Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://one.hdp:8020");
FileSystem fs = FileSystem.get(conf);

This FileSystem API gives the developer a generic interface to (any supported) file system depending on the protocol being used, in this case hdfs. This is enough to alter data on the Hadoop Distributed File System, for example to list all the files under the root folder:

FileStatus[] fsStatus = fs.listStatus(new Path("/"));
for(int i = 0; i < fsStatus.length; i++){
   System.out.println(fsStatus[i].getPath().toString());
}

For a secured environment this is not enough; you would also need to consider these further aspects:

  1. A secure protocol
  2. Authentication with Kerberos
  3. Impersonation (proxy user), if designed as a service

What we discuss here for a sample HDFS client can, with some variation, also be applied to other Hadoop clients.
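
Here is a minimal sketch of how these three aspects enter the client code, with a placeholder principal, keytab, and proxy user; it is not the complete example from the post:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
        // Aspect 1: switch the client to the Kerberos-enabled RPC protocol
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Aspect 2: authenticate against the KDC from a keytab
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-user@EXAMPLE.COM", "/etc/security/keytabs/hdfs-user.keytab");

        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }

        // Aspect 3: act on behalf of an end user via impersonation (proxy user);
        // requires matching hadoop.proxyuser.* settings in core-site.xml on the cluster
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
                "alice", UserGroupInformation.getLoginUser());
        proxy.doAs((PrivilegedExceptionAction<FileStatus[]>) () ->
                FileSystem.get(conf).listStatus(new Path("/user/alice")));
    }
}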

Continue reading “A Secure HDFS Client Example”

Connecting Tomcat to a Kerberized HDP Cluster

At some point you might need to connect your dashboard, data ingestion service, or similar to a secured and kerberized HDP cluster. Most Java-based web containers support Kerberos for both client- and server-side communication. Kerberos requires very thoughtful configuration, but rewards its users with an almost completely transparent authentication implementation that simply works. The steps described in this post should enable you to connect your application to a secured HDP cluster. For further support, read the links listed at the end of this writing. A sample project is provided on GitHub for hands-on exercises. Continue reading “Connecting Tomcat to a Kerberized HDP Cluster”
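
As a sketch of one possible wiring, not necessarily the approach taken by the sample project, the Kerberos login can be performed once at web application startup with a ServletContextListener; principal and keytab path are placeholder assumptions:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

@WebListener
public class KerberosLoginListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Service principal of the web application, not of an end user
            UserGroupInformation.loginUserFromKeytab(
                    "tomcat/one.hdp@EXAMPLE.COM", "/etc/security/keytabs/tomcat.keytab");
        } catch (Exception e) {
            throw new IllegalStateException("Kerberos login failed", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) { }
}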

Installing HttpFS Gateway on a Kerberized Cluster

The HttpFS gateway is the preferred way of accessing the Hadoop filesystem using HTTP clients like curl. Additionally it can be used from the hadoop fs command line tool, ultimately being a replacement for the hftp protocol. HttpFS, unlike HDFS Proxy, has full support for all file operations with additional support for authentication. Given its stateless protocol, it is ideal for scaling out Hadoop filesystem access with HTTP clients.

In this post I would like to show how to install and set up an HttpFS gateway on a secure and kerberized cluster. By covering some troubleshooting topics, this post should also help you when you run into problems while installing the gateway. Continue reading “Installing HttpFS Gateway on a Kerberized Cluster”
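
Because HttpFS exposes the same REST API as WebHDFS, a kerberized Java client can also reach the gateway through the regular Hadoop FileSystem API. A minimal sketch, assuming the default gateway port 14000 and a placeholder principal and keytab:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HttpFsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the webhdfs:// scheme at the HttpFS gateway instead of the namenode
        conf.set("fs.defaultFS", "webhdfs://one.hdp:14000");
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Authentication happens via SPNEGO over HTTP after the keytab login
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-user@EXAMPLE.COM", "/etc/security/keytabs/hdfs-user.keytab");

        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}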

Kerberized Hadoop Cluster – A Sandbox Example

The groundwork of any secure system installation is strong authentication: the process of verifying the identity of a user by comparing known factors. Factors can be:

  1. Shared Knowledge
    A password or the answer to a question. It is the most common, and not seldom the only, factor used by computer systems today.
  2. Biometric Attributes
    For example fingerprints or iris patterns.
  3. Items One Possesses
    A smart card or phone. The phone is probably one of the most common factors in use today aside from shared knowledge.

A system that takes more than one factor into account for authentication is also known as a multi-factor authentication system. The importance of knowing the identity of a user with a high degree of certainty cannot be overestimated.

All other components of a secure environment, like Authorization, Audit, Data Protection, and Administration, rely heavily on strong authentication. Authorization or auditing only makes sense if the identity of a user cannot be compromised. In Hadoop today there exist solutions for nearly all aspects of enterprise-grade security, especially with the advent of Apache Argus. Continue reading “Kerberized Hadoop Cluster – A Sandbox Example”