Sample HDFS HA Client

In any HDP cluster with a quorum-based HA setup there are two NameNodes configured, with one working as the active and the other as the standby instance. As the standby node does not accept any write requests, a client trying to write to HDFS needs to know which of the two NameNodes is the active one at any given time. The discovery process for that is configured through the hdfs-site.xml.

For any custom implementation it becomes relevant to understand and set the correct parameters if the current hdfs-site.xml configuration of the cluster is not available. This post gives a sample Java implementation of an HA HDFS client.
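As a rough illustration of the parameters involved, the following sketch collects the client-side HA properties that would otherwise come from hdfs-site.xml. The property keys are the standard HDFS HA client settings; the nameservice name "mycluster", the NameNode IDs "nn1" and "nn2", and the host names are placeholders chosen for this example, not values from any real cluster. The sketch uses a plain Map so that it runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HaClientConfig {

    // Builds the client-side HA properties for one nameservice with two NameNodes.
    // The NameNode IDs "nn1" and "nn2" are arbitrary labels assumed for this sketch.
    static Map<String, String> build(String nameservice, String nn1Address, String nn2Address) {
        Map<String, String> props = new LinkedHashMap<>();
        // Clients address the logical nameservice, not an individual NameNode host.
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice, "nn1,nn2");
        props.put("dfs.namenode.rpc-address." + nameservice + ".nn1", nn1Address);
        props.put("dfs.namenode.rpc-address." + nameservice + ".nn2", nn2Address);
        // The failover proxy provider tries the configured NameNodes to find the active one.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        // Host names and ports below are placeholders.
        Map<String, String> props = build("mycluster",
                "namenode1.example.com:8020", "namenode2.example.com:8020");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In an actual client, each entry would be passed to org.apache.hadoop.conf.Configuration via set(key, value), and the file system would then be obtained with FileSystem.get(URI.create("hdfs://mycluster"), conf).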

Big Team Big Win

Call For Abstract: Hadoop Summit 2017 in Munich

Next year's Hadoop Summit will be held in Munich on April 5-6, 2017, which will be an exceptional opportunity for the community in Munich to present itself to the best and brightest in the data community.

With only a few days left, please take this opportunity to submit your abstract now!

Submit Abstract: http://dataworkssummit.com/munich-2017
Deadline: Monday, November 21, 2016.
2017 Agenda: http://dataworkssummit.com/munich-2017/agenda/

The 2017 tracks include:

Applications
Enterprise Adoption
Data Processing & Warehousing
Apache Hadoop Core Internals
Governance & Security
IoT & Streaming
Cloud & Operations
Apache Spark & Data Science

Why DataWorks?

We want to expand the ecosystem to include technologies that were not explicitly in the Hadoop Ecosystem. For instance, in the community showcase we will have the following zones:

Apache Hadoop Zone
IoT & Streaming Zone
Cloud & Operations Zone
Apache Spark & Data Science Zone

The goal is to increase the breadth of technologies we can talk about and increase the potential of a data summit.

Future of Data Meetups

Want to present at Meetups?

If you would like to present at a Future of Data Meetup, please don’t hesitate to send me a message.

Want to host a Meetup? Become a Sponsor?

We are also looking for rooms and organizations willing to host one of our Future of Data Meetups or become a sponsor. Please reach out and let me know.

Meetups:

Connecting Livy to a Secured Kerberized HDP Cluster

Livy.io is a proxy service for Apache Spark that allows an existing remote SparkContext to be reused by different users. By sharing the same context, Livy provides an extended multi-tenant experience, with users being able to share RDDs and YARN cluster resources effectively.

In summary, Livy uses an RPC architecture to extend the created SparkContext with an RPC service. Through this extension the existing context can be controlled and shared remotely by other users. On top of this, Livy introduces authorization together with enhanced session management.

Figure: Livy architecture

Analytic applications like Zeppelin can use Livy to offer multi-tenant Spark access in a controlled manner.
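To make the remote control of a shared context more concrete, here is a minimal sketch of how a client could build requests against Livy's REST interface: one to create a session and one to submit a statement to it. The /sessions and /sessions/{id}/statements endpoints and the "kind", "proxyUser" and "code" fields are part of Livy's REST API; the host name, the port 8998, the user name and the class itself are assumptions made for this example. The sketch only constructs the requests and deliberately leaves out the SPNEGO/Kerberos authentication a secured cluster would require.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class LivyRequests {

    // Request that asks Livy to start a new interactive Spark session for the given user.
    static HttpRequest createSession(String livyUrl, String proxyUser) {
        String body = "{\"kind\": \"spark\", \"proxyUser\": \"" + escape(proxyUser) + "\"}";
        return post(livyUrl + "/sessions", body);
    }

    // Request that submits a code snippet to an existing session.
    static HttpRequest runStatement(String livyUrl, int sessionId, String code) {
        String body = "{\"code\": \"" + escape(code) + "\"}";
        return post(livyUrl + "/sessions/" + sessionId + "/statements", body);
    }

    // Minimal JSON string escaping (backslashes and quotes only); a real client
    // would use a JSON library instead.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    private static HttpRequest post(String url, String body) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder Livy endpoint and user name.
        String livyUrl = "http://livy.example.com:8998";
        HttpRequest create = createSession(livyUrl, "alice");
        HttpRequest statement = runStatement(livyUrl, 0, "sc.version");
        System.out.println(create.method() + " " + create.uri());
        System.out.println(statement.method() + " " + statement.uri());
    }
}
```

The requests could then be sent with java.net.http.HttpClient once authentication has been set up for the cluster.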

This post discusses setting up Livy with a secured HDP cluster.
