Lately a lot of effort went into making Hadoop setups more secure for enterprise ready installations. With Apache Knox comes a connecting strap for your cluster that acts like a bastion server shielding direct access to your nodes. Knox is stateless and can therefor easily scale horizontally with the obvious limitation of also just supporting stateless protocols. Knox provides the following functionality:
- Authentication
Users and groups can be managed using LDAP or Active Directory - Federation/SSO
Knox uses HTTP header based identity federation - Authorization
Authorization is mainly supported on service level through access control lists (ACL) - Auditing
Access through Knox is audited for
Here we are going to explore the necessary steps for a Knox setup. In this setup the authentication process is going through a LDAP directory service running on the same node as Knox while separated from the Hadoop cluster. Knox comes with an embedded Apache Directory for demo purposes. You can also read here on how to setup a secure OpenLDAP. Knox LDAP service can be started like this:
cd {KNOX_HOME}
bin/ldap.sh start
Here we are going to explorer necessary steps to setup Apache Know for your environment. Continue reading “Apache Knox: A Hadoop Bastion”