Secure Kafka Java Producer with Kerberos

With its comprehensive security implementation, the most recent Kafka release, 0.9, marks an important milestone. In his blog post Kafka Security 101, Ismael from Confluent describes the security features of the release very well.

As part II of the post about Kafka Security with Kerberos published here, this post discusses a sample implementation of a Java Kafka producer with authentication. It is part of a mini series of posts discussing secure HDP clients, connecting services to a secured cluster, and kerberizing the HDP Sandbox (Download HDP Sandbox). At the end of this post we will also create a Kafka Servlet to publish messages to a secured broker.

Kafka provides SSL and Kerberos authentication. Only Kerberos is discussed here. Continue reading “Secure Kafka Java Producer with Kerberos”
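To give a first impression of what the full post builds, below is a minimal sketch of a producer configured for Kerberos authentication via SASL. The broker address, topic name, and JAAS file path are placeholders, and the JAAS/krb5 settings are assumed to be supplied as JVM system properties.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SecureProducerSketch {

        public static void main(String[] args) {
            // Kerberos login is configured outside the code, typically via JVM options, e.g.:
            //   -Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf   (path is an assumption)
            //   -Djava.security.krb5.conf=/etc/krb5.conf
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:6667"); // placeholder broker; 6667 is the HDP default Kafka port
            props.put("security.protocol", "SASL_PLAINTEXT");           // Kerberos without TLS; use SASL_SSL to add encryption
            props.put("sasl.kerberos.service.name", "kafka");           // must match the broker's service principal
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // send() is asynchronous; the message goes to a placeholder topic
                producer.send(new ProducerRecord<>("test-topic", "key", "hello from a kerberized producer"));
            }
        }
    }

The JAAS file referenced above would contain a KafkaClient section pointing the Krb5LoginModule at the client's keytab and principal; the full post walks through those details.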

Apache Kafka: Queuing for Hadoop

Apache Kafka is a distributed system designed for streams. It is often categorized as a messaging system and serves a similar role, but it provides a fundamentally different abstraction. The key abstraction of Kafka to keep in mind is a structured commit log of events, where an event can be any kind of system, user, or machine emitted data. Kafka is built to be:

  • Fault-tolerant
  • High-throughput
  • Horizontally scalable
  • Capable of geographically distributing data streams and their processing
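
To make the commit-log abstraction concrete, the following minimal sketch uses the consumer API to address one partition of the log directly, rewind it to offset 0, and re-read the stored events. The broker address and the topic name events are placeholder assumptions.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class CommitLogReplaySketch {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9092"); // placeholder broker address
            props.put("group.id", "replay-example");                    // placeholder consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // The commit log is addressable by (topic, partition, offset):
                // assign one partition explicitly and rewind it to the very first offset.
                TopicPartition partition = new TopicPartition("events", 0); // placeholder topic
                consumer.assign(Collections.singletonList(partition));
                consumer.seek(partition, 0L);

                // Every record carries its offset, i.e. its position in the log.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }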

A constantly growing share of the data generated at today's companies is event data. While machine-generated data is often grouped under the umbrella term Internet of Things (IoT), it is crucial to understand that business itself is inherently event driven. A purchase, a customer claim, or a registration are just examples of such events. Business is interactive. When analyzing this data, time matters: most of it has its highest value when analyzed close to or even in real time.

Apache Kafka was created to solve two main problems arising from the ever increasing demand for stream data processing. First, designed for reliability, Kafka is capable of scaling with the growing volume of events passing through it. Second, Kafka lets various applications and platforms consume the same events, which helps to orchestrate today's complex architectures by providing a central message hub for each system. Today chances are that all this data will end up in Hadoop for further or even real-time analysis, making Kafka a queue to Hadoop. Continue reading “Apache Kafka: Queuing for Hadoop”