Kerberized Hadoop Cluster – A Sandbox Example

The groundwork of any secure system installation is strong authentication: the process of verifying the identity of a user by checking known factors. Factors can be:

  1. Shared Knowledge
    A password or the answer to a question. It is the most common, and often the only, factor used by computer systems today.
  2. Biometric Attributes
    For example fingerprints or iris patterns.
  3. Items One Possesses
    A smart card or a phone. The phone is probably one of the most common factors in use today aside from shared knowledge.

A system that takes more than one factor into account for authentication is also known as a multi-factor authentication system. The value of knowing the identity of a user with a high degree of certainty cannot be overestimated.

All other components of a secure environment, like Authorization, Audit, Data Protection, and Administration, rely heavily on strong authentication. Authorization and auditing only make sense if the identity of a user cannot be compromised. In Hadoop today there exist solutions for nearly all aspects of an enterprise-grade security layer, especially with the advent of Apache Argus.

HDP Security Overview + Argus (source: hortonworks.com)

They all start with implementing strong authentication using Kerberos. With Hadoop there is no real choice here, as one is either left with no authentication (simple) or Kerberos. The only real alternative would be to disallow any direct access to the cluster and route all access through the Knox gateway inside a DMZ.
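The simple-versus-Kerberos choice surfaces directly in core-site.xml. A minimal sketch of the relevant properties (the Ambari wizard described below sets these for us):

    <!-- core-site.xml: "simple" means no authentication, "kerberos" enables it -->
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>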

Knox Gateway Overview (source: hortonworks.com)

Not least since Microsoft integrated Kerberos into Active Directory, it can be seen as the most widely used authentication protocol today. When speaking of Kerberos, people commonly refer to Kerberos 5, which was published in 1993.

From here we’ll go ahead and configure a Hadoop installation with Kerberos. As an example we are going to use the Hortonworks Sandbox together with Ambari.

Setting up Kerberos on CentOS

Before we can set up a kerberized environment, we need to install Kerberos on the sandbox VM. In addition, we also need to create a security realm to use during setup.

Installing Kerberos on CentOS:
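A minimal sketch of the packages involved, assuming the sandbox runs CentOS 6 with yum:

    # install the KDC, the admin server, and the client tools
    yum install -y krb5-server krb5-libs krb5-workstation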

Creating our realm:
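The realm is defined in /etc/krb5.conf. A sketch of the relevant sections, assuming the realm MYCORP.NET with the KDC and admin server both running on the sandbox (replace the EXAMPLE.COM defaults accordingly):

    [libdefaults]
      default_realm = MYCORP.NET

    [realms]
      MYCORP.NET = {
        kdc = sandbox.mycorp.net
        admin_server = sandbox.mycorp.net
      }

    [domain_realm]
      .mycorp.net = MYCORP.NET
      mycorp.net = MYCORP.NET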

Before proceeding with this configuration, it is a good idea to already add sandbox.mycorp.net to your /etc/hosts file, and to change the ACL configuration for krb5 (/var/kerberos/krb5kdc/kadm5.acl) to contain the following:
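A sketch of both files, assuming the sandbox resolves to 127.0.0.1 from inside the VM and administrative principals follow the */admin naming convention:

    # /etc/hosts: make the KDC host name resolvable
    127.0.0.1   localhost sandbox.mycorp.net sandbox.hortonworks.com

    # /var/kerberos/krb5kdc/kadm5.acl: grant full admin rights to all */admin principals
    */admin@MYCORP.NET    *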

Now we are ready to create our Kerberos database:
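A sketch using kdb5_util; the -s flag creates a stash file so the KDC can start without prompting for the master password:

    kdb5_util create -s -r MYCORP.NET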

After that we can start kadmin and create an administrative user according to the ACL we defined earlier:
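For example, a principal matching the */admin pattern from the ACL (the name itself is just an assumption):

    kadmin.local -q "addprinc admin/admin@MYCORP.NET"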

We can now start the Kerberos service krb5kdc and test our administrative user:
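A sketch, assuming the CentOS 6 service tooling on the sandbox:

    # start the KDC (and optionally the kadmin service) and enable them at boot
    service krb5kdc start
    service kadmin start
    chkconfig krb5kdc on
    chkconfig kadmin on

    # obtain and inspect a ticket for the administrative user
    kinit admin/admin@MYCORP.NET
    klist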

Kerberizing Hadoop with Ambari

UPDATE:
Read here for using the Ambari security wizard (available since Ambari 2.x) to kerberize an HDP cluster with an existing KDC.

Ambari gives us a smooth wizard we can follow in order to kerberize our Hadoop installation. Go to Ambari Admin and follow the Security menu and the provided enabling process.

Ambari Security Workflow

By clicking on the “Enable Security” button you will start the wizard:

Ambari Security - Get Started

You can then use Ambari to configure the realm for the required keytabs that we’ll be creating throughout the rest of this process.

Ambari Security - Enable Security Wizard

This will create a CSV file that we can download and use for generating a keytab creation script. First download the CSV file as follows:

Ambari Security - Download CSV w/ keytabs

Move the downloaded CSV file to your sandbox using scp. Ambari provides us with a script, which will generate the keytabs based on the CSV file we previously created.
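A sketch of both steps; the CSV file name and the sandbox SSH port forwarding (2222) are assumptions, and keytabs.sh is the helper script referenced above:

    # from your workstation: copy the CSV onto the sandbox
    scp -P 2222 ~/Downloads/host-principal-keytab-list.csv root@127.0.0.1:/root/

    # on the sandbox: turn the CSV into a keytab generation script and run it
    sh keytabs.sh host-principal-keytab-list.csv > gen_keytabs.sh
    sh gen_keytabs.sh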

This should create the keytabs needed to run the Hadoop services in a kerberized environment. The keytabs are made available in the keytabs_sandbox.hortonworks.com folder or as a tar archive keytabs_sandbox.hortonworks.com.tar.

Unfortunately the keytabs.sh script is not complete. In order to also create a keytab for the YARN ResourceManager, go into the gen_keytabs.sh script, copy for example the line kadmin.local -q "addprinc -randkey oozie/sandbox.hortonworks.com@MYCORP.NET", and change oozie to rm, as shown below.
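The copied line would then look like this; the matching xst export line is an assumption based on how the script handles the other principals:

    # gen_keytabs.sh: additional ResourceManager principal and keytab
    kadmin.local -q "addprinc -randkey rm/sandbox.hortonworks.com@MYCORP.NET"
    kadmin.local -q "xst -k rm.service.keytab rm/sandbox.hortonworks.com@MYCORP.NET"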

Make sure you move the keytabs to /etc/security/keytabs and set the permissions accordingly:
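A sketch, assuming the generated files carry a .keytab extension; the exact owners depend on the service, but a common pattern is root (or the service user) plus the hadoop group with restrictive modes:

    mkdir -p /etc/security/keytabs
    mv keytabs_sandbox.hortonworks.com/*.keytab /etc/security/keytabs/
    chown root:hadoop /etc/security/keytabs/*.keytab
    chmod 400 /etc/security/keytabs/*.service.keytab
    # the spnego keytab is shared by several services and needs to stay group-readable
    chmod 440 /etc/security/keytabs/spnego.service.keytab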

We are now ready to have Ambari restart the services while applying the Kerberos configuration. This should bring everything back up in a kerberized environment.

Ambari Security - Done

Further Readings

 

13 thoughts on “Kerberized Hadoop Cluster – A Sandbox Example”

  1. Best blog for Ambari + Kerberos. This tutorial cleared a lot of doubts. I’m not able to start the Timeline Server service; it fails at 35 percent. Any idea? The error is as follows:

    2014-10-31 09:32:57,633 - Generating config: /etc/hadoop/conf/mapred-site.xml
    2014-10-31 09:32:57,633 - File['/etc/hadoop/conf/mapred-site.xml'] {'owner': 'mapred', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
    2014-10-31 09:32:57,635 - Changing owner for /etc/hadoop/conf/mapred-site.xml from 514 to mapred
    2014-10-31 09:32:57,635 - XmlConfig['capacity-scheduler.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'configuration_attributes': ..., 'configurations': ...}
    2014-10-31 09:32:57,647 - Generating config: /etc/hadoop/conf/capacity-scheduler.xml
    2014-10-31 09:32:57,647 - File['/etc/hadoop/conf/capacity-scheduler.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
    2014-10-31 09:32:57,648 - Changing owner for /etc/hadoop/conf/capacity-scheduler.xml from 514 to hdfs
    2014-10-31 09:32:57,649 - File['/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid'] {'action': ['delete'], 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps
    cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1'}
    2014-10-31 09:32:57,677 - Execute['ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-yarn-client/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver'] {'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1', 'user': 'yarn'}
    2014-10-31 09:32:58,772 - Execute['ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1'] {'initial_wait': 5, 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1', 'user': 'yarn'}
    2014-10-31 09:33:03,903 - Error while executing command 'start':
    Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 122, in execute
    method(env)
    File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py", line 42, in start
    service('timelineserver', action='start')
    File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/service.py", line 59, in service
    initial_wait=5

    1. Ambari disables the Application Timeline Server (ATS) automatically as part of the “kerberization”. The ATS does not play well together with Kerberos at this time. You’ll notice that’s mentioned when starting the process, and again on the final page of the wizard (shown above, as the last figure). It is not advisable to try to start it manually.

    2. A bit late, but if this is not working, configure a new keytab like ats/_HOST@MYCORP.NET and set your parameters:

      yarn.timeline-service.principal = ats/_HOST@MYCORP.NET
      yarn.timeline-service.keytab = /etc/security/keytabs/ats.service.keytab
      yarn.timeline-service.http-authentication.type = kerberos
      yarn.timeline-service.http-authentication.kerberos.principal = HTTP/_HOST@MYCORP.NET
      yarn.timeline-service.http-authentication.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab

      This is working for me with HDP2.1.3.
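      For completeness, a sketch of how such an ats principal and keytab could be created with kadmin (the names follow the pattern used above and are assumptions):

        kadmin.local -q "addprinc -randkey ats/sandbox.hortonworks.com@MYCORP.NET"
        kadmin.local -q "xst -k /etc/security/keytabs/ats.service.keytab ats/sandbox.hortonworks.com@MYCORP.NET"
        chown yarn:hadoop /etc/security/keytabs/ats.service.keytab
        chmod 400 /etc/security/keytabs/ats.service.keytab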
