Simple Spark Streaming & Kafka Example in a Zeppelin Notebook

Apache Zeppelin is a web-based, multi-purpose notebook for data discovery, prototyping, reporting, and visualization. With it’s Spark interpreter Zeppelin can also be used for rapid prototyping of streaming applications in addition to streaming-based reports.

In this post we will walk through a simple example of creating a Spark Streaming application based on Apache Kafka. Continue reading “Simple Spark Streaming & Kafka Example in a Zeppelin Notebook” →

Connecting Livy to a Secured Kerberized HDP Cluster

Livy.io is a proxy service for Apache Spark that allows to reuse an existing remote SparkContext among different users. By sharing the same context Livy provides an extended multi-tenant experience with users being capable of sharing RDDs and YARN cluster resources effectively.

In summary Livy uses a RPC architecture to extend the created SparkContext with a RPC service. Through this extension the existing context can be controlled and shared remotely by other users. On top of this Livy introduces authorization together with enhanced session management.

livy-architecture

Analytic applications like Zeppelin can use Livy to offer multi-tenant spark access in a controlled manner.

This post discusses setting up Livy with a secured HDP cluster.

Continue reading “Connecting Livy to a Secured Kerberized HDP Cluster” →

Zeppelin Login with Demo LDAP of Knox

With the introduction of ZEPPELIN-548 it now supports Apache Shiro based AD and LDAP authentication. This quick example demonstrates the connection of Zeppelin to the Knox Demo LDAP server. Continue reading “Zeppelin Login with Demo LDAP of Knox” →

Fitbit Visualization with Apache Zeppelin

In a recent post I demonstrated how easy it is to connect to a REST API like the one of Fitbit with Scala to collect JSON data. Taking up the results of that post here, I would like to demonstrate how Apache Zeppelin can be used to also fetch but in the end visualize the data. Based on the once collected data Zeppelin allows to easily visualize the output through different graphs.

Apache Zeppelin itself is a notebook like, web-based data analytic tool with a specific focus on exploratory data analysis in modern BigData architectures supporting multiple interpreters like Tajo, Spark, Hive, HBase and more. Saying this, it is important to point out, that in this here described case only Scala is being used to display the received data. But this use case could easily be extended to include Apache Hive or Spark. Continue reading “Fitbit Visualization with Apache Zeppelin” →