Automated Kerberos Install for HDP w/ Ambari + Puppet

With the release of Ambari 2.x, kerberizing an HDP install has improved quite a bit. Compared to what was described in Kerberized Hadoop Cluster – A Sandbox Example, most of the steps are much easier now and can be automated. For a long time I have wanted to include this in my existing Vagrant project for an end-to-end setup of a kerberized cluster. With the writing of this post I finally had the opportunity to do so.

In this post I describe the parts added to the Vagrant setup to accomplish an end-to-end setup of a kerberized HDP cluster. Before the final step, kerberizing the cluster through the Ambari REST API, a KDC with credentials needs to be created. A Puppet module was created and included to install MIT Kerberos.

Install and Setup MIT Kerberos with Puppet

In addition to the already existing parameters, the realm (:krb5_realm) and the KDC host (:krb5_kdc) have to be added to the HDP config file required for the setup (example config hdp.rb):

{
    :hdp_ambari => "2.2.0.0",
    :blueprint_name => "n1-hdp-basic",
    :krb5_realm => "MYCORP.NET",
    :krb5_kdc => "one.hdp",
    :hdp_os => "centos7",
....
}

These parameters are added to puppet.facter in the Vagrantfile:

node_config.vm.provision "puppet" do |puppet|
  puppet.environment_path = ENVIRONMENT_PATH
  puppet.environment = opts[:node_env]
  puppet.module_path = MODULES_PATH
  #puppet.manifest_file = opts[:manifest_file].to_s
  puppet.facter = {
    "ownhostname" => opts[:name],
    "ambarihostname" => AMBARI_HOST_NAME,
    "blueprint_name" => hdp_conf[:blueprint_name],
    "hdp_ambari" => hdp_conf[:hdp_ambari],
    "hdp_os" => hdp_conf[:hdp_os], 
    "hdp_stack" => hdp_conf[:hdp_stack],
    "hdp_update" => hdp_conf[:hdp_update],
    "hdp_util" => hdp_conf[:hdp_util],
    "krb5_realm" => hdp_conf[:krb5_realm],
    "krb5_kdc" => hdp_conf[:krb5_kdc],
    "jdk_version" => hdp_conf[:jdk_version],
  }
end

The Puppet module for the KDC setup is placed under puppet/modules/kerberos and contains a pretty straightforward implementation of a typical module. Probably the only thing that might catch your attention is the use of the rngd service. In some environments there is not enough entropy (randomness) for the KDC to create and encrypt its database; without rngd the install would time out. The service is left running at the end. You can stop it manually after the install if you really want.

class kerberos($krb5_kdc="one.hdp", $krb5_realm="MYCORP.NET") {
  
  Exec {
    path => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"] 
  }

  package { ['krb5-server', 'krb5-libs', 'krb5-workstation']:
    ensure  => present,
  }

  exec { 'install_rng':
    command => 'yum -y install rng-tools',
  }

  file { "/etc/krb5.conf":
    owner => root,
    group => root,
    mode => "755",
    replace => true,
    content => epp("kerberos/krb5.conf.epp", {"krb5_kdc" => $krb5_kdc, "krb5_realm" => $krb5_realm}),
    require => Package[['krb5-server', 'krb5-libs', 'krb5-workstation']],
  }

  file { "/var/kerberos/krb5kdc/kdc.conf":
    owner => root,
    group => root,
    mode => "755",
    replace => true,
    content => epp("kerberos/kdc.conf.epp", {"krb5_realm" => $krb5_realm}),
    require => File["/etc/krb5.conf"],
  }

  file { "/var/kerberos/krb5kdc/kadm5.acl":
    owner => root,
    group => root,
    mode => "755",
    replace => true,
    content => epp("kerberos/kadm5.acl.epp", {"krb5_kdc" => $krb5_kdc, "krb5_realm" => $krb5_realm}),
    require => File["/var/kerberos/krb5kdc/kdc.conf"],
  }

  service {"rngd": # "start_rng":
    ensure => "running",
    #command => "/etc/init.d/rngd start",
    require => Exec['install_rng'],
  }
 
  exec {"create_kdb5":
    command => "kdb5_util create -s -P hadoop",
    creates => "/var/kerberos/krb5kdc/principal",
    require => Service["rngd"], # Exec["start_rng"],
  }

  exec {"create_krb5_adm":
    command => 'kadmin.local -q "addprinc -pw hadoop hdp/admin"',
    require => Exec["create_kdb5"],
  }

  service {"kadmin":
    ensure => "running",
    require => Exec["create_krb5_adm"],
  }

  service {"krb5kdc":
    ensure => "running",
    require => Exec["create_krb5_adm"],
  }

  #exec {"stop_rng":
  #  command => "/etc/init.d/rngd stop",
  #  require => Exec["create_krb5_adm"],
  #} 
}

Creating the KDC is only one part; the cluster also needs to be kerberized. A separate Puppet module was created for that, which can be added to the node manifest if wanted.

class hdp_setup::kerberize_cluster($ambarihostname='one.hdp', $blueprint_name='n1-hdp-basic', $krb5_kdc='one.hdp', $krb5_realm='MYCORP.NET', ) {
    include hdp_setup
    
    Exec {
        path => ["/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/"]
    }
    
    exec { "/vagrant/bin/kerberize_cluster.py admin admin ${ambarihostname} 8080 ${blueprint_name} ${krb5_kdc} ${krb5_realm} hdp/admin@${krb5_realm} hadoop":
        timeout => 3600,
    }
}
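The exec above passes nine positional arguments to the script. How the script reads them is not shown in this post; the following is only a sketch of one way to parse them, with the argument names and their order inferred from the command line above (the names themselves are assumptions).

```python
import argparse


def parse_args(argv):
    """Parse the nine positional arguments in the order used by the exec above.

    All argument names are hypothetical; only the order is taken from the post.
    """
    parser = argparse.ArgumentParser(description="Kerberize an Ambari-managed cluster")
    for name in ("ambari_user", "ambari_password", "ambari_host"):
        parser.add_argument(name)
    parser.add_argument("ambari_port", type=int)
    for name in ("clustername", "kdc_host", "realm", "principal", "princ_password"):
        parser.add_argument(name)
    return parser.parse_args(argv)


if __name__ == "__main__":
    import sys
    print(parse_args(sys.argv[1:]))
```

Using named arguments instead of indexing into sys.argv makes it harder to mix up, for example, the KDC host and the realm.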

The module is quite simple and can be added to a node manifest in the following way:

...
class {'hdp_setup::kerberize_cluster':
  ambarihostname => $ambarihostname,
  blueprint_name => $blueprint_name,
  krb5_kdc => $krb5_kdc,
  krb5_realm => $krb5_realm,
     
}
...
Class['interfering_services']
-> Class['ntp'] 
-> Class['etchosts'] 
-> Class['hdp_mysql::oozie']
-> Class['hdp_mysql::hive']
-> Class['hdp_mysql::ranger'] 
-> Class['ambari_server'] 
-> Class['ambari_agent'] 
-> Class['hdp_setup::blueprint_install']
...
-> Class['hdp_setup::kerberize_cluster']

Automated Kerberos Install

As you can already see from the above Puppet module, a Python script is used to set up Kerberos on the cluster. The script uses the Ambari REST API to add the Kerberos service to the cluster, create the appropriate config, enable Kerberos, and restart the cluster. The full script can be found here.

Adding the Kerberos service with its component KERBEROS_CLIENT to the cluster:

execute_request('POST', '/api/v1/clusters/%s/services/KERBEROS' % clustername)
execute_request('POST', '/api/v1/clusters/%s/services/KERBEROS/components/KERBEROS_CLIENT' % clustername)
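The execute_request helper is not reproduced in this post. As a rough idea of what such a helper has to do, here is a minimal Python 3 sketch using only the standard library. The connection constants and the split into build_request and execute_request are assumptions for illustration, not the original script's code. What is certain is that Ambari expects HTTP basic authentication and an X-Requested-By header on modifying requests.

```python
import base64
import json
import urllib.request

# Assumed connection settings; the real script takes these from its arguments.
AMBARI_URL = "http://one.hdp:8080"
AMBARI_USER = "admin"
AMBARI_PASSWORD = "admin"


def build_request(method, path, body=None):
    """Build (but do not send) an authenticated Ambari REST request."""
    credentials = "%s:%s" % (AMBARI_USER, AMBARI_PASSWORD)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(AMBARI_URL + path, data=data, method=method)
    request.add_header("Authorization", "Basic " + token)
    # Ambari rejects POST/PUT/DELETE requests that lack this header.
    request.add_header("X-Requested-By", "ambari")
    return request


def execute_request(method, path, body=None):
    """Send the request and return the decoded JSON response, or None if empty."""
    with urllib.request.urlopen(build_request(method, path, body)) as response:
        text = response.read().decode("utf-8")
    return json.loads(text) if text.strip() else None
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without a running Ambari server.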

Next we need to create the configuration for the Kerberos service. It consists of two parts, kerberos-env and krb5-conf. Typically the krb5-conf configuration would also contain the content of the krb5.conf file under /etc/, and Ambari would manage that file. If you are managing it yourself, you can disable this by setting manage_krb5_conf to false. That is done here, as krb5.conf is created and managed by Puppet with the above module; the template can be found here. So we leave content empty and tell Ambari not to manage krb5.conf:

print("Create kerberos-env")
msg = '''[ { "Clusters": {
    "desired_config": {
      "type": "kerberos-env",
      "tag": "version1",
      "properties": {
        "kdc_type": "mit-kdc",
        "encryption_types": "aes des3-cbc-sha1 rc4 des-cbc-md5",
        "realm": "%s",
        "kdc_host": "%s",
        "admin_server_host": "%s",
        "executable_search_paths": "/usr/bin, /usr/kerberos/bin, /usr/sbin, /usr/lib/mit/bin, /usr/lib/mit/sbin"
      }
    }
  }
} ]''' % (realm, kdc_host, kdc_host)
# Strip newlines so the payload is logged on a single line.
log_debug('kerberos-env msg: %s' % msg.replace('\n', ''))
execute_request('PUT', '/api/v1/clusters/%s' % clustername, msg)

print("Create krb5-conf")
msg = '''[ { "Clusters": 
    { "desired_config": { 
        "type": "krb5-conf",
        "tag": "version1",
        "properties": {
            "conf_dir" : "/etc",
            "content" : "",
            "domains" : "",
            "manage_krb5_conf" : "false"
        }
} } } ]'''
log_debug('krb5-conf msg: %s' % msg.replace('\n', ''))
execute_request('PUT', '/api/v1/clusters/%s' % clustername, msg )
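Hand-written JSON templates like these are easy to get wrong, since every brace has to be balanced by eye. As an alternative, the body can be assembled from a Python dictionary and serialized with json.dumps. This is only a sketch; the helper name desired_config is made up for illustration, and the values are the ones used in this post.

```python
import json


def desired_config(config_type, tag, properties):
    """Return the JSON body for a PUT that sets one desired config.

    Hypothetical helper, not part of the original script.
    """
    payload = [{
        "Clusters": {
            "desired_config": {
                "type": config_type,
                "tag": tag,
                "properties": properties,
            }
        }
    }]
    return json.dumps(payload)


# Example values from this post; property values are passed as strings.
msg = desired_config("krb5-conf", "version1", {
    "conf_dir": "/etc",
    "content": "",
    "domains": "",
    "manage_krb5_conf": "false",
})
```

The resulting msg can be passed to execute_request in the same way as the string templates, and realm or host values can be inserted as ordinary dictionary values without any escaping concerns.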

Next the KERBEROS_CLIENT component is added to every host and installed, and then all services are stopped:

msg = '''{"host_components" : [{"HostRoles" : {"component_name":"KERBEROS_CLIENT"}}]}'''
for host in return_hosts():
  execute_request('POST', '/api/v1/clusters/%s/hosts?Hosts/host_name=%s' % (clustername, host["Hosts"]["host_name"]), msg )

# Moving the KERBEROS service to INSTALLED installs KERBEROS_CLIENT on the hosts.
msg = '{"ServiceInfo": {"state" : "INSTALLED"}}'
execute_and_wait_completed('PUT', '/api/v1/clusters/%s/services/KERBEROS' % clustername, msg)

# Moving all services to INSTALLED stops the cluster.
print("Stop the cluster")
execute_and_wait_completed('PUT', '/api/v1/clusters/%s/services' % clustername, msg)
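execute_and_wait_completed is not shown here either. Ambari handles state changes like these asynchronously, so the script has to poll until the operation has finished. A minimal sketch of such a wait loop follows. The function name, the set of terminal states, and the injected fetch_status callable are assumptions, not the original script's code.

```python
import time

# Assumed set of terminal states for an Ambari request.
FINISHED = {"COMPLETED", "FAILED", "ABORTED", "TIMEDOUT"}


def wait_completed(fetch_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or time runs out.

    fetch_status is a hypothetical callable that returns the current status
    string of the request being tracked.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in FINISHED:
            return status
        if waited >= timeout:
            raise RuntimeError("request still %s after %.0f seconds" % (status, waited))
        sleep(interval)
        waited += interval
```

In a real script, fetch_status would presumably GET the request resource that Ambari returns for the PUT and read its status field. Passing it in as a callable keeps the loop independent of how the HTTP call is made.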

Last but not least, Kerberos needs to be enabled by providing the admin credentials required to create the principals. Ambari will create the required principals and distribute the keytabs to each host. With this, enabling Kerberos is complete and the cluster can be restarted.

print("Enable Kerberos")
msg = '''{
  "session_attributes" : {
    "kerberos_admin" : {
      "principal" : "%s", "password" : "%s"
    }
  },
  "Clusters": {
    "security_type" : "KERBEROS"
  }
}''' % (principal, princ_password)
execute_and_wait_completed('PUT', '/api/v1/clusters/%s' % clustername, msg)

print("Restart Cluster")
msg = '{"ServiceInfo": {"state" : "STARTED"}}'
execute_and_wait_completed('PUT', '/api/v1/clusters/%s/services' % clustername, msg)

With that, the kerberized install is done.
