Securing Your Datalake With Apache Argus – Part 1

Apache Argus, the Apache open source project with its comprehensive security offering for today’s Hadoop installations, is likely to become an important cornerstone of modern enterprise big data architectures. It is already quite sophisticated compared to other product offerings.

Key aspects of Argus are administration, authorization, and audit logging, covering most security demands. In the future we might even see data protection (encryption) as well.

Argus: A Comprehensive Approach

Argus consists of four major components that, tied together, build a secure layer around your Hadoop installation. Within Argus it is the Administration Portal, a web application, that manages and accesses the Audit Server and the Policy Manager, two other important components of Apache Argus. On the client side, at Hadoop services like HiveServer2 or the NameNode, Argus installs specific agents that intercept requests and evaluate them against the specified policies.

Argus Architecture Overview

A key aspect of Argus is that clients don’t have to ask the Policy Server on every single call but are updated at a configurable interval. This improves scalability and also ensures that clients continue working even when the Policy Server is down.

Let’s go ahead and install the most recent version of Apache Argus using the HDP Sandbox 2.1. By installing the Policy Manager together with the Hive and HDFS agents you should get a pretty good idea of how Argus operates, plus a fairly solid environment to test specific use cases.

In this part we’ll only install the Argus Policy Manager and sync it with our OpenLDAP installation for user and group management. We will use our kerberized HDP Sandbox throughout this post.

Getting Apache Argus

Installing Apache Argus currently involves quite some hassle; much of the current development effort is devoted to improving this in future releases. To get Apache Argus up and running we’ll have to check out the code, build it with Maven, and upload each individual component to the appropriate nodes.
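As a rough sketch of the checkout and build (the repository URL is the one quoted in the comments below; the exact Maven goals may vary between releases):

    git clone git://git.apache.org/incubator-argus.git
    cd incubator-argus
    # Build all components; the admin, usersync, and agent tarballs
    # should end up under target/ once the build succeeds.
    mvn clean compile package install assembly:assembly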

Preliminary Setup

Throughout this and the following posts we will demonstrate Argus’s capability of applying rule-based authorization to central Hadoop services like HDFS or Hive. To do so we need, for one thing, central user/group management in a directory service like OpenLDAP. The cluster should also be kerberized to ensure proper authentication, and OpenLDAP should contain example users and groups that we can elaborate on.

You’ll find a tutorial for installing OpenLDAP here. For kerberizing you can refer to this post here. As example users and groups for OpenLDAP please download base.ldif, users.ldif, and groups.ldif. Make sure to create a principal for each user in the KDC:
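For illustration only (the admin DN, password, user name, and realm below are placeholders, not values from this post), loading the LDIF files and creating a matching principal might look like this:

    # Load the example entries into OpenLDAP
    ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f base.ldif
    ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f users.ldif
    ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f groups.ldif

    # Create one Kerberos principal per LDAP user (repeat for every user)
    kadmin.local -q "addprinc -pw secret mike@EXAMPLE.COM"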

Installing the Portal and Policy Manager

We’ll start our installation of Apache Argus by setting up the Policy Manager and Administration Portal. To do so we have to untar the argus-0.4.0-admin.tar file prior to configuring and running the provided installation script.
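For example (the extracted directory name is an assumption based on the tarball name):

    tar -xvf argus-0.4.0-admin.tar
    cd argus-0.4.0-admin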

In install.properties it is important to point to an existing database with proper credential settings. The database created here stores the policies set up by administrative users. Those policies are cached at the client so that not every query needs to go through the database. Another benefit of caching the policies at the agents is that the cluster continues to be available even if the Policy Server is down. The only slight downside of this approach is a small delay when updating or creating policies; usually this should not be more than 30 seconds.

Below is a basic configuration for our Argus installation using MySQL in combination with very basic authentication values:
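Here is a minimal sketch of the relevant install.properties entries. The property names follow the 0.4.0 admin template; every host name and password is a placeholder you should replace with your own values:

    # MySQL client and JDBC driver used by the setup script
    SQL_COMMAND_INVOKER=mysql
    SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
    db_root_password=hadoop
    db_host=localhost

    # Database storing the policies
    db_name=argus
    db_user=argus
    db_password=argus

    # Database receiving the audit events
    audit_db_name=argus_audit
    audit_db_user=argus_audit
    audit_db_password=argus_audit

    # Where the administration portal will be reachable
    policymgr_external_url=http://localhost:6080
    authentication_method=NONE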

Running install.sh from the same folder should install the Argus Policy Manager on your machine. After the script has run successfully you should be able to access the administration portal by pointing your browser to http://localhost:6080 (make sure your Sandbox forwards the port correctly). If anything goes wrong, dig through the error messages and check whether your configuration is correct.
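For example, running the script as root and verifying that the portal is up:

    cd argus-0.4.0-admin
    ./install.sh
    # The portal should now be listening on port 6080
    netstat -tlnp | grep 6080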

Congratulations, you have a running Policy Manager for your Hadoop setup.

Argus Policy Manager

Synchronising Users and Groups Using LDAP

Argus can be set up to synchronise users and groups from an existing directory service. The component we’ll have to install for this is called ugsync, which probably stands for user-group synchronisation. Untar the corresponding archive in your argus directory.
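The archive name below is an assumption based on the 0.4.0 build output; adjust it to whatever your build produced:

    tar -xvf argus-0.4.0-usersync.tar
    cd argus-0.4.0-usersync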

The installation of the user-group synchronisation module follows the same pattern as the Policy Manager we’ve installed previously. You should also find an install.properties file here that we are going to adjust to our needs. If you have followed the OpenLDAP setup of a previous post you should be able to use the sample configuration below:
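As a sketch, the LDAP-related entries might look like the following. The property names follow the 0.4.0 usersync template, while the bind DN, password, and search base are placeholders matching a stock OpenLDAP layout:

    POLICY_MGR_URL=http://localhost:6080

    # Pull users and groups from LDAP instead of /etc/passwd
    SYNC_SOURCE=ldap
    # Synchronisation interval in minutes
    SYNC_INTERVAL=5

    SYNC_LDAP_URL=ldap://localhost:389
    SYNC_LDAP_BIND_DN=cn=admin,dc=example,dc=com
    SYNC_LDAP_BIND_PASSWORD=secret

    SYNC_LDAP_USER_SEARCH_BASE=ou=users,dc=example,dc=com
    SYNC_LDAP_USER_OBJECT_CLASS=person
    SYNC_LDAP_USER_NAME_ATTRIBUTE=uid
    SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE=memberof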

If set up correctly, running the install.sh script should install the user-group synchronisation service. Pointing your browser to http://localhost:6080/index.html#!/users/usertab should display the groups and users contained in LDAP.

Argus UGSYNC

Congratulations, again! We are now ready to create policies based on the users we have in our directory service. What’s left is to secure the Hadoop services using the Argus agents, which is what we’ll do next. See the upcoming posts for more information on how to secure your data lake.

5 thoughts on “Securing Your Datalake With Apache Argus – Part 1”

  1. I am seeing a permissions error. Any ideas?

    c:\ranger>git clone git://git.apache.org/incubator-argus.git
    Cloning into ‘incubator-argus’…
    fatal: remote error: access denied or repository not exported: /incubator-argus.git

    c:\ranger>

      1. Steve, Apache Argus has been renamed to Apache Ranger.
        Best to download the HDP 2.2 Sandbox and try the included Ranger tutorials if you want to demo it.
        Let me know if you need any assistance.
