Kerberized Hadoop Cluster – A Sandbox Example

The groundwork of any secure system installation is strong authentication: the process of verifying a user's identity by checking one or more known factors. Factors can be:

  1. Shared Knowledge
    A password or the answer to a secret question. It is the most common, and often the only, factor used by computer systems today.
  2. Biometric Attributes
    For example, fingerprints or an iris pattern.
  3. Items One Possesses
    A smart card or a phone. The phone is probably the most common factor in use today aside from shared knowledge.

A system that takes more than one factor into account for authentication is known as a multi-factor authentication system. The value of knowing a user's identity with a high degree of certainty cannot be overestimated.

All other components of a secure environment, like Authorization, Audit, Data Protection, and Administration, rely heavily on strong authentication. Authorization and auditing only make sense if the identity of a user cannot be compromised. In Hadoop today, solutions exist for nearly all aspects of an enterprise-grade security layer, especially with the advent of Apache Argus. Continue reading “Kerberized Hadoop Cluster – A Sandbox Example”
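To make the Kerberos part concrete: in a Kerberized cluster, a client application proves its identity with a principal and a keytab before it may touch HDFS. Below is a minimal sketch using Hadoop's UserGroupInformation API; the realm, principal name, and keytab path are placeholders for whatever your sandbox uses.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client libraries to use Kerberos instead of simple auth.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab path; adjust to your sandbox.
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-user@EXAMPLE.COM",
                "/etc/security/keytabs/hdfs-user.keytab");

        System.out.println("Logged in as: "
                + UserGroupInformation.getLoginUser().getUserName());
    }
}
```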

Apache Knox: A Hadoop Bastion

Lately, a lot of effort has gone into making Hadoop setups more secure for enterprise-ready installations. With Apache Knox, your cluster gains a gateway that acts like a bastion server, shielding your nodes from direct access. Knox is stateless and can therefore easily scale horizontally, with the obvious limitation that it also supports only stateless protocols. Knox provides the following functionality (a minimal client sketch follows the list):

  1. Authentication
    Users and groups can be managed using LDAP or Active Directory.
  2. Federation/SSO
    Knox uses HTTP-header-based identity federation.
  3. Authorization
    Authorization is mainly supported at the service level through access control lists (ACLs).
  4. Auditing
    Access through Knox is audited.
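
To get a feel for how these features appear to a client, here is a minimal sketch of a WebHDFS request routed through Knox over HTTPS with HTTP Basic authentication. The gateway host, the sandbox topology name, and the guest credentials are assumptions taken from the typical Knox demo setup, and the sketch assumes the gateway's TLS certificate is trusted by the JVM.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KnoxWebHdfsClient {
    public static void main(String[] args) throws Exception {
        // Placeholder gateway host and topology; Knox exposes WebHDFS under
        // /gateway/{topology}/webhdfs/v1/.
        URL url = new URL("https://knox.example.com:8443"
                + "/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Knox validates the Basic credentials against the configured LDAP.
        String credentials = Base64.getEncoder().encodeToString(
                "guest:guest-password".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + credentials);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```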

Here we are going to explore the necessary steps for a Knox setup. In this setup, the authentication process goes through an LDAP directory service running on the same node as Knox but separated from the Hadoop cluster. Knox comes with an embedded Apache Directory server for demo purposes. You can also read here how to set up a secure OpenLDAP. Depending on your Knox version, the demo LDAP service can be started like this:
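```sh
cd {GATEWAY_HOME}
# Older Knox releases ship the demo LDAP server as a plain jar:
java -jar bin/ldap.jar conf &
# Newer releases provide a wrapper script instead:
# bin/ldap.sh start
```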

Continue reading “Apache Knox: A Hadoop Bastion”

Hadoop Security: 10 Resources To Get You Started

As Hadoop moves into the center of today's enterprise data architecture, security becomes a critical requirement. This can be witnessed by the recent acquisitions made by leading Hadoop vendors and by the numerous security-centered projects that have been launched or have gained traction lately.

Here are 10 resources to get you started on the topic:

  1. Hadoop Security Design (2009 White Paper)
  2. Hadoop Security Design? – Just Add Kerberos? Really? (Black Hat 2010)
  3. Hadoop Poses a Big Data Security Risk: 10 Reasons Why
  4. Apache Knox – A gateway for Hadoop clusters
  5. Apache Argus
  6. Project Rhino
  7. Protegrity Big Data Protector
  8. Dataguise for Hadoop
  9. Secure JDBC and ODBC Clients’ Access to HiveServer2
  10. InfoSphere Optim Data Masking

Further Reading