Environments for an HDP install without connection to the internet require a dedicated HDP repository that all nodes can access. While the setup can differ slightly depending on whether the nodes have temporary or no internet access, in any case a file service holding a copy of the HDP repo is needed. Most enterprises already have a dedicated infrastructure in place for this, based for example on Aptly or Satellite. This post describes the setup of an Nginx host serving as an HDP repository host. Continue reading “HDP Repo with Nginx”
Secure Kafka Java Producer with Kerberos
The most recent release of Kafka, 0.9, with its comprehensive security implementation, marks an important milestone. In his blog post Kafka Security 101, Ismael from Confluent describes the security features of the release very well.
As part II of the previously published post about Kafka Security with Kerberos, this post discusses a sample implementation of a Java Kafka producer with Kerberos authentication. It is part of a mini series of posts about secure HDP clients, connecting services to a secured cluster, and kerberizing the HDP Sandbox (Download HDP Sandbox). At the end of this post we will also create a Kafka Servlet to publish messages to a secured broker.
Kafka provides SSL and Kerberos authentication. Only Kerberos is discussed here. Continue reading “Secure Kafka Java Producer with Kerberos”
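As a first impression, here is a minimal sketch of such a producer against a kerberized broker. The broker address, topic name, and JAAS file path are assumptions for illustration only:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Assumes the JVM was started with a JAAS login configuration, e.g.
// -Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
Properties props = new Properties();
props.put("bootstrap.servers", "one.hdp:6667");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("security.protocol", "SASL_PLAINTEXT"); // Kerberos authentication without SSL
props.put("sasl.kerberos.service.name", "kafka");

Producer<String, String> producer = new KafkaProducer<String, String>(props);
producer.send(new ProducerRecord<String, String>("test", "Hello Secure Kafka!"));
producer.close();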
A Secure HDFS Client Example
It takes about three lines of Java code to write a simple HDFS client that can then be used to upload, read, or list files. Here is an example:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
FileSystem fs = FileSystem.get(conf);
This FileSystem API gives the developer a generic interface to any supported file system, depending on the protocol being used, in this case hdfs. This is enough to work with data on the Hadoop Distributed File System, for example to list all the files under the root folder:
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

FileStatus[] fsStatus = fs.listStatus(new Path("/"));
for (FileStatus status : fsStatus) {
    System.out.println(status.getPath().toString());
}
For a secured environment this is not enough, because you would also need to consider these further aspects (see the sketch after this list):
- A secure protocol
- Authentication with Kerberos
- Impersonation (proxy user), if designed as a service
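Here is a minimal sketch of how the client from above could cover these points. The principal hdfs-user@MYCORP.NET, the keytab path, and the proxied end user alice are assumptions for illustration:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

final Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://one.hdp:8020");
conf.set("hadoop.security.authentication", "kerberos"); // use the secure protocol
UserGroupInformation.setConfiguration(conf);

// Authenticate the service itself against Kerberos using a keytab
UserGroupInformation.loginUserFromKeytab("hdfs-user@MYCORP.NET",
        "/home/hdfs-user/hdfs-user.keytab");

// Impersonate the end user; requires matching hadoop.proxyuser.* settings on the cluster
UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());
FileSystem fs = proxyUser.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws Exception {
        return FileSystem.get(conf);
    }
});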
What we discuss here for a sample HDFS client can, with some variation, also be applied to other Hadoop clients.
Connecting Tomcat to a Kerberized HDP Cluster
At some point you might need to connect your dashboard, data ingestion service, or similar to a secured and kerberized HDP cluster. Most Java-based web containers support Kerberos for both client- and server-side communication. Kerberos requires very thoughtful configuration but rewards its users with an almost completely transparent authentication implementation that simply works. The steps described in this post should enable you to connect your application to a secured HDP cluster. For further support, read the links listed at the end of this post. A sample project is provided on GitHub for hands-on exercises. Continue reading “Connecting Tomcat to a Kerberized HDP Cluster”
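As an illustration, a minimal sketch of what such a setup might look like inside the web application, assuming the JVM was pointed at the cluster's krb5.conf (e.g. via -Djava.security.krb5.conf in Tomcat's setenv.sh). The listener class, principal, and keytab path are assumptions, not taken from the sample project:

import java.io.IOException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical listener that performs the Kerberos login once at web app startup,
// so servlets can use Hadoop clients with the established credentials.
@WebListener
public class KerberosLoginListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab(
                    "tomcat@MYCORP.NET", "/etc/security/keytabs/tomcat.keytab");
        } catch (IOException e) {
            throw new IllegalStateException("Kerberos login failed", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // nothing to clean up
    }
}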