The HttpFS gateway is the preferred way of accessing the Hadoop filesystem with HTTP clients like curl. It can also be used from the hadoop fs command line tool, ultimately replacing the hftp protocol. Unlike HDFS Proxy, HttpFS has full support for all file operations, with additional support for authentication. Given its stateless protocol, it is ideal for scaling out Hadoop filesystem access with HTTP clients.
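As a quick sketch, HttpFS exposes the same REST API as WebHDFS, by default on port 14000. The host and user names below are placeholders, not values from an actual cluster:

```shell
HTTPFS_HOST="httpfs-host.example.com"   # assumption: your gateway host
HTTPFS_PORT=14000                       # default HttpFS port
URL="http://${HTTPFS_HOST}:${HTTPFS_PORT}/webhdfs/v1/user/hdfs?op=LISTSTATUS&user.name=hdfs"

# List a directory through the gateway (WebHDFS-compatible REST API).
curl -s "$URL" || true  # network call; fails outside the cluster

# On a kerberized cluster, authenticate via SPNEGO after a kinit:
curl -s --negotiate -u : \
  "http://${HTTPFS_HOST}:${HTTPFS_PORT}/webhdfs/v1/user/hdfs?op=LISTSTATUS" || true

# The same gateway works from the hadoop fs CLI via the webhdfs scheme:
# hadoop fs -ls webhdfs://httpfs-host.example.com:14000/user/hdfs
```

Because the REST calls are stateless, any HTTP load balancer can be put in front of several HttpFS instances.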
In this post I would like to show how to install and set up an HttpFS gateway on a secure, kerberized cluster. By covering some troubleshooting topics, this post should also help you when you run into problems while installing the gateway. Continue reading “Installing HttpFS Gateway on a Kerberized Cluster”
Today I signed the Reactive Manifesto, which is prominently backed by developers from companies like Netflix, Typesafe, Twitter, and Oracle. It is my strong belief that the ever-growing scale of data processing needs a coherent approach towards event-driven architectures to meet today’s demands.
In the same area I see projects like Kafka, around which a new spinoff out of LinkedIn, Confluent, was recently announced. I also count Spark, backed by Databricks and currently seeing a lot of attention, as an example of a new generation of reactive applications.
These are some of the resources that got my attention:
- Reactive Manifesto
- Reactive Streams
- Advanced Reactive Programming with Akka and Scala
- Introducing Actors Akka Notes Part 1
- Clustering reactmq with Akka Cluster
- Replacing ZeroMQ with RTI Context DDS in an Actor Based System
- Evaluating Persistent Replicated Message Queues
- Reactive Queue with Akka Reactive Streams
- Making the Reactive Queue Durable with Akka Persistence
- Scala and the Akka Event Bus
With HDP 2.2 on the verge of release, it is a good idea to begin a deep dive into the new features in Hadoop with this webinar series:
YARN is changing the face of Big Data as we know it today. Breaking with by now well-established patterns like MapReduce, YARN gives clients the ability to run diverse distributed algorithms on one cluster under one resource management layer. In addition, Apache Slider brings the possibility to ‘slide’ existing long-running services onto the same cluster under the same resource provider.
With the upcoming release of HDP 2.2, HBase services run under the management of YARN. The same will be true for Storm, getting us one step closer to the vision of an Enterprise Hadoop Data Lake. One important aspect of this scenario is yet to come: YARN-796, aka YARN labels.
Recently the technical preview of Hortonworks Data Platform 2.2 was released with a Sandbox image for download, giving you the possibility to try out the concepts of tomorrow’s Big Data platform today with these tutorials. You can also try some of the virtual environments I’ve put together here: hdp22-n1-centos6-puppet or hdp22-n3-centos6-puppet.
We are experiencing the dawn of a new era in Hadoop, Hadoop v2. Are you YARN ready? Here are some resources to get you going:
After writing about provisioning Hadoop clusters with Vagrant, I started a collection of cluster setups using the HDP distribution. The examples use different versions, operating systems, Vagrant providers, and node sizes. With Ambari blueprints, different scenarios can be provisioned with a simple command. With this post I would like to share these scripts on GitHub. In addition, with the advent of HDP 2.2, two examples using the technical preview version of HDP were added to the repository: https://github.com/hkropp/vagrant-hdp
The naming convention for each environment is as follows:
The environments can be set up and run with one simple command:
vagrant up && ./install_blueprint.sh
As a prerequisite, VirtualBox and Vagrant need to be installed.
The master_blueprint.json contains possible Ambari blueprint components and configurations.
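For illustration, registering a blueprint and instantiating a cluster boils down to two calls against the Ambari REST API, roughly what a script like install_blueprint.sh would do. The host name, blueprint name, cluster name, template file name, and credentials below are assumptions for your environment:

```shell
AMBARI_HOST="ambari.example.com"   # assumption: your Ambari server
BP_NAME="hdp-multinode"            # assumption: must match the name in the blueprint JSON
BP_URL="http://${AMBARI_HOST}:8080/api/v1/blueprints/${BP_NAME}"

# Register the blueprint with Ambari (default admin/admin credentials assumed):
curl -s -u admin:admin -H "X-Requested-By: ambari" \
  -X POST -d @master_blueprint.json "$BP_URL" || true  # network call

# Instantiate a cluster from it, using a host-mapping template
# (cluster_template.json is a hypothetical file name):
curl -s -u admin:admin -H "X-Requested-By: ambari" \
  -X POST -d @cluster_template.json \
  "http://${AMBARI_HOST}:8080/api/v1/clusters/hdp-cluster" || true  # network call
```

The host-mapping template ties the host groups defined in the blueprint to the concrete Vagrant nodes, which is why the same blueprint can serve cluster layouts of different sizes.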
Examples on GitHub: https://github.com/hkropp/vagrant-hdp
The upcoming release of HDP 2.2 will contain some important forward-facing changes to the Hadoop platform. Together with partners, Hortonworks is shaping the future of Big Data in an open community. Looking at some of the key new features, we get a pretty clear picture of what the future of Hadoop is going to look like. The quickest way to get started now is to download the HDP Sandbox here. Continue reading “Tech. Preview: HDP 2.2”
Apache Argus, the Apache open source project with its comprehensive security offering for today’s Hadoop installations, is likely to become an important cornerstone of modern enterprise Big Data architectures. It is already quite sophisticated compared to other product offerings.
Key aspects of Argus are Administration, Authorization, and Audit Logging, covering most security demands. In the future we might even see Data Protection (encryption) as well.
Argus consists of four major components that, tied together, build a secure layer around your Hadoop installation. Within Argus it is the Administration Portal, a web application, that manages and accesses the Audit Server and the Policy Manager, two further important components of Apache Argus. On the client side, at Hadoop services like HiveServer2 or the NameNode, Argus installs specific agents that evaluate requests against the specified policies.
A key aspect of Argus is that the clients don’t have to query the Policy Server on every single call, but are updated at a certain interval. This improves scalability and also ensures that clients continue working even when the Policy Server is down.
Let’s go ahead and install the most recent version of Apache Argus using the HDP 2.1 Sandbox. By installing the Policy Manager and the Hive and HDFS agents, you should get a pretty good idea of how Argus operates, and a pretty solid environment to test specific use cases.
In this part we’ll only install the Argus Policy Manager, synced with our OpenLDAP installation for user and group management. We will use our kerberized HDP Sandbox throughout this post. Continue reading “Securing Your Datalake With Apache Argus – Part 1”