Securing Your Datalake With Apache Argus – Part 1

Apache Argus, the Apache open source project, with it’s comprehensive security offering for today’s Hadoop installations is likely to become an important cornerstone of modern enterprise BigData architectures. It’s by today already quite sophisticate compared to other product offerings.

Key aspects of Argus are the Administration, Authorization, and Audit Logging covering most security demands. In the future we might even see Data Protection (encryption) as well.

Argus a Comphrensive ApproachArgus consists of four major components that tied together build a secure layer around your Hadoop installation. Within Argus it is the Administration Portal, a web application, that is capable of managing and accessing the Audit Server and Policy Manager, also two important components of Apache Argus. At the client side or a the Hadoop services like the HiveServer2 or the NameNode Argus installs specific agents that encapsulate requests based on the policies specified.

Argus Architectur OverviewA key aspect of Argus is, that the clients don’t have to request the Policy Server on every single client call, but are updated in a certain interval. This improves the scalability and also ensures that clients continue working even when the Policy Server is down.

Let’s go ahead an install a most recent version of Apache Argus using the HDP Sandbox 2.1. By installing the Policy Manager, Hive, and HDFS Agent you should have a pretty good idea of how Argus operates and a pretty solid environment to test specific use cases.

In this part we’ll only install the Policy Manager of Argus synced together with our OpenLdap installation for user and group management. We will use our kerberized HDP Sandbox throughout this post. Continue reading “Securing Your Datalake With Apache Argus – Part 1”

Advertisement

Hadoop Security: 10 Resources To Get You Started

As Hadoop emerges into the center of todays enterprise data architecture, security becomes a critical requirement. This can be witnessed by the most recent acquisitions of leading Hadoop vendors and also by the numerous projects centered around security that have been launched or are getting more traction recently.

Here are 10 resources to get you started about the topic:

  1. Hadoop Security Design (2009 White Paper)
  2. Hadoop Security Design? – Just Add Kerberos? Really?(Black Hat 2010)
  3. Hadoop Poses a Big Data Security Risk: 10 Reasons Why
  4. Apache Knox – A gateway for Hadoop clusters
  5. Apache Argus
  6. Project Rhino
  7. Protegrity Big Data Protector
  8. Dataguise for Hadoop
  9. Secure JDBC and ODBC Clients’ Access to HiveServer2
  10. InfoSphere Optim Data Masking

Further Readings