Connecting Livy to a Secured Kerberized HDP Cluster

Livy.io is a proxy service for Apache Spark that allows an existing remote SparkContext to be reused by different users. By sharing the same context, Livy provides a multi-tenant experience in which users can share RDDs and YARN cluster resources effectively.

In summary, Livy uses an RPC architecture to extend the created SparkContext with an RPC service. Through this extension the existing context can be controlled and shared remotely by other users. On top of this, Livy adds authorization and enhanced session management.

(Figure: Livy architecture)

Analytic applications like Zeppelin can use Livy to offer multi-tenant spark access in a controlled manner.

This post discusses setting up Livy with a secured HDP cluster.

As a long-running service, one of the requirements for connecting Livy to a secured HDP cluster is the existence of a service principal. The keytab of this service principal has to be readable by the livy user, as does the keytab of the hive principal, which is needed for the HiveContext.
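A minimal sketch of creating such a principal and exporting its keytab with MIT Kerberos kadmin; the principal and keytab names match the configuration below, but the realm, paths, group ownership and the exact kadmin invocation are assumptions that depend on your KDC setup:

# create the Livy service principal and export it to a keytab (example realm HDP.CORP)
$ kadmin.local -q "addprinc -randkey livy/node1.hdp@HDP.CORP"
$ kadmin.local -q "xst -k /etc/security/keytabs/livy.service.keytab livy/node1.hdp@HDP.CORP"
# make the keytab readable by the livy user only
$ chown livy:hadoop /etc/security/keytabs/livy.service.keytab
$ chmod 400 /etc/security/keytabs/livy.service.keytab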

Livy requires this service principal to be configured through a couple of parameters, namely livy.server.launch.kerberos.[principal|keytab] and livy.server.auth.kerberos.[principal|keytab]. In addition, livy.server.auth.type needs to be set to kerberos.

livy.impersonation.enabled = true
livy.server.auth.type = kerberos
livy.server.launch.kerberos.principal = livy/node1.hdp@HDP.CORP
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.service.keytab
livy.server.auth.kerberos.principal = HTTP/node1.hdp@HDP.CORP
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab

Setting livy.server.auth.type to kerberos also enables authentication for the Livy server endpoint itself, so clients need to authenticate as well. To configure Zeppelin with authentication for Livy, for example, you need to set the following in the interpreter settings:

"zeppelin.livy.principal": "zeppelin/node1.hdp@HDP.CORP",
"zeppelin.livy.keytab": "/etc/security/keytabs/zeppelin.service.keytab",

The launch parameters are used during startup of the Livy server, which is started with the following environment settings:

export SPARK_HOME=/usr/hdp/current/spark-client
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=/usr/lib/jvm/java-1.8.0-openjdk/bin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LIVY_SERVER_JAVA_OPTS="-Xmx2g"

A kinit is not required with Livy 0.3, which is the version used here.

With Livy 0.2 it is required to kinit as the livy user before starting the web service:

$ su livy
$ kinit -kt /etc/security/keytabs/livy.service.keytab livy/node1.hdp@HDP.CORP
$ bin/livy-server start

Authorization

With authentication enabled, setting up authorization will likely be required as well. For this, Livy provides access control settings that determine which users have access to its resources.

livy.server.access_control.enabled = true
livy.server.access_control.users = livy,zeppelin

Further, for services like Zeppelin, impersonation settings are required. In order for the zeppelin user to be able to impersonate other users, it has to be a superuser.

livy.superusers=zeppelin
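As a sketch of what impersonation then looks like, a superuser such as zeppelin can request a session on behalf of another user via the proxyUser field of the session request; the user name alice is a placeholder, and host and port are the same assumptions as above:

# create an interactive session that runs as the impersonated user
$ kinit -kt /etc/security/keytabs/zeppelin.service.keytab zeppelin/node1.hdp@HDP.CORP
$ curl --negotiate -u : -X POST \
    -H 'Content-Type: application/json' \
    -d '{"kind": "spark", "proxyUser": "alice"}' \
    http://node1.hdp:8998/sessions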

HiveContext

If you have issues with the HiveContext being set up and see exceptions similar to this:

INFORMATION: 16/11/05 18:20:35 INFO metastore: Trying to connect to metastore with URI thrift://node1.hdp:9083
INFORMATION: 16/11/05 18:20:35 ERROR TSaslTransport: SASL negotiation failure
INFORMATION: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
INFORMATION:    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
INFORMATION:    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
INFORMATION:    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
INFORMATION:    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
INFORMATION:    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
INFORMATION:    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
INFORMATION:    at java.security.AccessController.doPrivileged(Native Method)
INFORMATION:    at javax.security.auth.Subject.doAs(Subject.java:422)
INFORMATION:    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
INFORMATION:    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
INFORMATION:    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:420)
INFORMATION:    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
INFORMATION:    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
INFORMATION:    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
INFORMATION:    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
INFORMATION:    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
INFORMATION:    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
INFORMATION:    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
INFORMATION:    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
INFORMATION:    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
INFORMATION:    at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
INFORMATION:    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
INFORMATION:    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
INFORMATION:    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
INFORMATION:    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
INFORMATION:    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
INFORMATION:    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
INFORMATION:    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:345)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:233)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:236)
INFORMATION:    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
INFORMATION:    at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:95)
INFORMATION:    at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:82)
INFORMATION:    at com.cloudera.livy.repl.SparkInterpreter.restoreContextClassLoader(SparkInterpreter.scala:305)
INFORMATION:    at com.cloudera.livy.repl.SparkInterpreter.start(SparkInterpreter.scala:82)
INFORMATION:    at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:59)
INFORMATION:    at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:57)
INFORMATION:    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
INFORMATION:    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
INFORMATION:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
INFORMATION:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
INFORMATION:    at java.lang.Thread.run(Thread.java:745)
INFORMATION: Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
INFORMATION:    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
INFORMATION:    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
INFORMATION:    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
INFORMATION:    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
INFORMATION:    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
INFORMATION:    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
INFORMATION:    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
INFORMATION:    ... 49 more
INFORMATION: 16/11/05 18:20:35 WARN metastore: Failed to connect to the MetaStore Server...
INFORMATION: 16/11/05 18:20:35 INFO metastore: Waiting 1 seconds before next connection attempt.

You can either remove the hive-site.xml under /usr/hdp/current/spark-client/conf or make sure it is the correct one. Just copy it from /etc/hive/conf/:

$ cp /etc/hive/conf/hive-site.xml /usr/hdp/current/spark-client/conf/
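To check whether the metastore connection and the HiveContext come up correctly after the copy, you can run a short statement in an interactive session. A minimal sketch against the REST API, with host, port and session id 0 assumed:

# run a statement in an existing interactive session (id 0) to exercise the HiveContext
$ curl --negotiate -u : -X POST \
    -H 'Content-Type: application/json' \
    -d '{"code": "sqlContext.sql(\"show databases\").show()"}' \
    http://node1.hdp:8998/sessions/0/statements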

It should also be possible to disable the HiveContext completely by setting livy.repl.enableHiveContext to false.

livy.repl.enableHiveContext = false


12 thoughts on “Connecting Livy to a Secured Kerberized HDP Cluster”

  1. How does Livy proxy the user? Per task? Do you know how quotas are assigned to users, like how do you stop one Livy user from using all of the resources available to the Executors?


  2. Thanks for this post. I am trying to set up Zeppelin/Livy/Spark. All of these are running on the same machine. My end goal is to be able to run Zeppelin/Livy/Spark with impersonation. So far, I have successfully configured Zeppelin with Spark. However, I want to use multi-tenancy, and for that I wanted to configure Zeppelin with Livy and Spark.

    For Livy, I provided the following two paths
    export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    With the above settings, I can run the following command successfully in Zeppelin:
    %livy.spark
    sc.version

    However, the following command fails:
    %livy.sql
    select * from myDB.table1

    I see the following error:
    <console>:14: error: not found: value sqlContext
    sqlContext.sql("select * from datalake.combination2").show(1000)

    I have not enabled Shiro authentication for Zeppelin yet. My assumption was that Livy would log into Spark using the default user as I provide the Spark home directory. Could you please point out how can I fix the above issue?


    1. Hi, thanks for the post. I can start the Livy server with Kerberos enabled. But when I do a requests.post(host, headers, data) it throws an Authentication required error. Any help would be useful here.


      1. There are two parts to this: either you need SPNEGO auth towards Livy, or Livy's auth to YARN/Hadoop is failing.
        The error logs would be useful to help.
        Second, kinit and use: curl --negotiate -u :
        Search for curl SPNEGO for details.


      2. Hi,
        SPNEGO auth seems to be working fine on the server (I presume).

        I start Livy after setting the following parameters
        livy.impersonation.enabled = true
        livy.server.auth.type = kerberos
        livy.server.launch.kerberos.principal = @.COM
        livy.server.launch.kerberos.keytab = /home/pathtokeytab/.keytab
        livy.server.auth.kerberos.principal = HTTP/@.COM
        livy.server.auth.kerberos.keytab = /home/pathtokeytab/.keytab

        The server logs give me this output:

        17/02/02 23:18:50 INFO StateStore$: Using BlackholeStateStore for recovery.
        17/02/02 23:18:50 INFO BatchSessionManager: Recovered 0 batch sessions. Next session id: 0
        17/02/02 23:18:50 INFO InteractiveSessionManager: Recovered 0 interactive sessions. Next session id: 0
        17/02/02 23:18:50 INFO LivyServer: SPNEGO auth enabled (principal = HTTP/@.COM)
        17/02/02 23:18:51 INFO KerberosAuthenticationHandler: Login using keytab /home/pathtokeytab/.keytab, for principal HTTP/@.COM
        17/02/02 23:18:51 WARN RequestLogHandler: !RequestLog
        17/02/02 23:18:51 INFO WebServer: Starting server on http://:8998

        This is running in the US data center. I believe this means that Livy has started successfully with Kerberos.

        Now from the client machine in Singapore I run the commands below:

        c:\FAST\Python2.7.12>python
        Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import requests
        >>> import json
        >>> from requests_kerberos import HTTPKerberosAuth, REQUIRED
        >>> headers = {'Content-Type': 'application/json'}
        >>> krb = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
        >>> r = requests.post('http://:8998/sessions', headers=headers, auth=krb)
        >>> r.raise_for_status()
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "c:\FAST\Python2.7.12\lib\site-packages\requests-2.11.1-py2.7.egg\requests\models.py", line 883, in raise_for_status
            raise HTTPError(http_error_msg, response=self)
        requests.exceptions.HTTPError: 403 Client Error: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) for url: http://:8998/sessions
        >>>


      3. Hi, very interesting. I can’t remember if I ever did this with Python like that.

        Your issue is “pretty straightforward” as you can see in the error message:
        “403 Client Error: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) for url: http://:8998/sessions”

        You are having issues obtaining proper Kerberos credentials on your machine. Now this again could be related to multiple aspects around your current setup.

        The most common reason people get this error is improper access rights on the keytab, so that the executing user is not able to read it.

        In your case, where is the KDC Realm? In US? Can you access it from your location? What are your krb5 confs on your local machine? Are you sharing the same Realm?

        Put simply, the authentication on your local machine is not working properly. Try to enable debug logs for Kerberos, and can you try to curl from a machine in the US DC?
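        A minimal sketch of enabling that debug output, assuming a Linux client with MIT Kerberos for the curl test and the Livy start environment from the post for the server side; the host is a placeholder:

        # client side: trace the Kerberos library while curl negotiates
        $ KRB5_TRACE=/dev/stderr curl --negotiate -u : http://<livy-host>:8998/sessions
        # server side: add JVM Kerberos debugging before starting the Livy server
        $ export LIVY_SERVER_JAVA_OPTS="-Xmx2g -Dsun.security.krb5.debug=true"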


      4. Hi,
        The KDC realm is in the US.
        I do have an account in the US domain where the hadoop servers reside, and if i try to do a kinit from my local machine with the full path which incudes the KDC realm of the US domain it does generate the kerberos cache but I cant seem to figure out where the Keytab file is. I cant seem to find the Keytab file that resides on the US domain too qhen i run the kinit there. I have raised this with my engg team. curl command with negotiate fails for me with the 401 authentication required error. Funnily if i run the same command on the browser it goes through, engg thinks its because the browsers authenticate to the US domain as opposed to the curl command which authenticates to the Asia domain in the firm
        But whats really perplexing is if i use the hdfs.ext.kerberos python library and use kerberos auth it authentcates correctly.
        Nevertheless thanks for your original post and your immediate responses to my comments. Ill keep you posted on what engg comes back with


    1. Please note the Livy conf settings:

      livy.impersonation.enabled = true
      livy.server.auth.type = kerberos
      livy.server.launch.kerberos.principal = @domain.COM
      livy.server.launch.kerberos.keytab = /home/rc/.keytab
      livy.server.auth.kerberos.principal = HTTP/@domain.COMM
      livy.server.auth.kerberos.keytab = /home/rc/.keytab


      1. Hi,

        So the Kerberos issue got resolved.
        I was using the wrong SPNEGO principal and keytab. Once I got that regenerated and applied the Kerberos auth name rules correctly to strip my domain name (SGP domain) before the Kerberos auth is called, it worked fine and authenticated successfully.

        The issue now is that if I turn on impersonation, it fails with the error “User not authorised” in Livy.


  3. Hey. Great post about Livy and Kerberos.
    I have a kerberized EMR cluster running on AWS that comes with most settings around Kerberos and Livy pre-configured.

    I created a principal called “dataengineering” and I can programmatically hit Livy if:
    – I ssh to the server
    – kinit dataengineering
    – then call a python script passing the dataengineering principal.

    However, I am struggling to access the Livy UI through the browser. I get an “HTTP ERROR: 403”.

    output from “/var/log/livy/livy-livy-server.out”
    WARN AuthenticationFilter: AuthenticationToken ignored: Unauthorized access

    Livy configuration file (/usr/lib/livy/conf/livy.conf)

    livy.impersonation.enabled = true
    livy.superusers = dataengineering,livy,HTTP
    livy.server.auth.type = kerberos
    livy.server.launch.kerberos.principal = livy/@EC2.INTERNAL
    livy.server.launch.kerberos.keytab = /etc/livy.keytab
    livy.server.auth.kerberos.principal = HTTP/@EC2.INTERNAL
    livy.server.auth.kerberos.keytab = /etc/livy.keytab

    EMR comes with entries for both HTTP and livy on “/etc/livy.keytab”

    I am unsure how the browser handles it, and as per the configuration above plus the logs I believe a principal called “HTTP” is used. I can see this line in the logs: LivyServer: SPNEGO auth enabled (principal = HTTP/@EC2.INTERNAL).

    Have you ever faced this issue?

