Connecting Livy to a Secured Kerberized HDP Cluster

Livy.io is a proxy service for Apache Spark that allows an existing remote SparkContext to be reused by different users. By sharing the same context, Livy provides a multi-tenant experience in which users can share RDDs and use YARN cluster resources effectively.

In summary, Livy uses an RPC architecture to extend the created SparkContext with an RPC service. Through this extension the existing context can be controlled and shared remotely by other users. On top of this, Livy adds authorization together with enhanced session management.

(Figure: Livy architecture)

Analytic applications like Zeppelin can use Livy to offer multi-tenant spark access in a controlled manner.

This post discusses setting up Livy with a secured HDP cluster.

As a long-running service, one of the requirements for connecting Livy to a secured HDP cluster is the existence of a service principal. The keytab of this service principal has to be readable by the livy user, as does the keytab of the hive principal for the HiveContext.

Livy requires that this service principal is configured with a couple of different parameters, namely livy.server.launch.kerberos.[principal|keytab] and livy.server.auth.kerberos.[principal|keytab]. In addition, livy.server.auth.type needs to be set to kerberos.
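Put together, the relevant livy.conf entries could look like the following sketch; the principals, hosts, and keytab paths are placeholders for your environment:

```
# livy.conf -- Kerberos settings (placeholder principals and keytab paths)
livy.server.auth.type = kerberos

# Principal/keytab used by Livy to launch sessions against YARN/HDFS
livy.server.launch.kerberos.principal = livy/_HOST@EXAMPLE.COM
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.service.keytab

# SPNEGO principal/keytab used to authenticate HTTP clients
livy.server.auth.kerberos.principal = HTTP/_HOST@EXAMPLE.COM
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab
```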

The livy.server.auth.type setting also enables authentication for the Livy server itself. For example, to configure Zeppelin to authenticate against Livy, you need to set the corresponding credentials in the interpreter settings.
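As a sketch, Zeppelin's Livy interpreter exposes matching properties in its interpreter settings (property names as in the Zeppelin Livy interpreter; the URL, principal, and keytab path are placeholders):

```
zeppelin.livy.url = http://livy-host:8998
zeppelin.livy.principal = zeppelin@EXAMPLE.COM
zeppelin.livy.keytab = /etc/security/keytabs/zeppelin.service.keytab
```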

The launch parameters are used during startup of the Livy server.

A manual kinit is not required with Livy 0.3, which is the version being used here.

With Livy 0.2 it is required to kinit the livy user before starting the web service:
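For example (the principal and paths are placeholders, and the start command depends on how Livy was installed):

```shell
# Obtain a ticket for the livy service user from its keytab ...
kinit -kt /etc/security/keytabs/livy.service.keytab livy/livy-host@EXAMPLE.COM

# ... then start the Livy server from its installation directory
./bin/livy-server
```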

Authorization

With authentication enabled, configuring authorization will likely be required as well. For this, Livy provides access control settings that control which users have access to its resources.
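A sketch of those settings as named in the livy.conf.template of Livy 0.3 (the exact property names may differ between Livy versions, so check the template shipped with your release; the user list is an example):

```
# Restrict access to the Livy server to a whitelist of users
livy.server.access_control.enabled = true
livy.server.access_control.users = livy,zeppelin
```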

Further, for services like Zeppelin, impersonation settings are required. In order for the zeppelin user to be able to impersonate other users, it needs to be a superuser.
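A sketch of the impersonation-related entries in livy.conf (livy.impersonation.enabled and livy.superusers are documented in livy.conf.template; the zeppelin user name is an example):

```
# livy.conf: allow impersonation through Livy
livy.impersonation.enabled = true

# Users allowed to impersonate others, e.g. the zeppelin service user
livy.superusers = zeppelin
```

In addition, Hadoop itself has to allow the livy service user to proxy end users, via the hadoop.proxyuser.livy.hosts and hadoop.proxyuser.livy.groups properties in core-site.xml.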

HiveContext

If you have issues with the HiveContext setup and run into exceptions during session creation, the hive-site.xml used by Spark is a common culprit.

You can either remove the hive-site.xml under /usr/hdp/current/spark-client/conf or make sure it is the correct one by copying it from /etc/hive/conf/:
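The copy itself is a one-liner:

```shell
# Replace Spark's hive-site.xml with the Hive client's configuration
cp /etc/hive/conf/hive-site.xml /usr/hdp/current/spark-client/conf/
```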

It should also be possible to disable the HiveContext completely by setting livy.repl.enableHiveContext to false.

Further Readings

11 thoughts on “Connecting Livy to a Secured Kerberized HDP Cluster”

  1. Thanks for this post. I am trying to setup Zeppelin/Livy/Spark. All these are running on the same machine. My end goal is to be able to run Zeppelin/Livy/Spark with impersonation. So far, I have successfully configured Zeppelin with Spark. However, I want to use multi-tenancy, and for that I wanted to configure Zeppelin with Livy and Spark.

    For Livy, I provided the following two paths
    export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    With the above settings, I can run the following command successfully in Zeppelin:
    %livy.spark
    sc.version

    However, the following command fails:
    %livy.sql
    select * from myDB.table1

    I see the following error:
    :14: error: not found: value sqlContext
    sqlContext.sql("select * from datalake.combination2").show(1000)

    I have not enabled Shiro authentication for Zeppelin yet. My assumption was that Livy would log into Spark using the default user as I provide the Spark home directory. Could you please point out how can I fix the above issue?

    1. Hi, thanks for the post. I can start the Livy server with Kerberos enabled. But when I do a requests.post(host, headers, data) it throws an "Authentication required" error. Any help would be useful here.

      1. There are two parts to this. Either you need SPNEGO auth towards Livy, or Livy's auth to YARN/Hadoop is failing.
        The error logs would be useful to help.
        Second, kinit and use: curl --negotiate -u :
        Search for curl SPNEGO for details.
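        For example (realm and host are placeholders):

        ```shell
        # Get a ticket, then let curl perform SPNEGO with an empty user
        kinit user@EXAMPLE.COM
        curl --negotiate -u : http://livy-host:8998/sessions
        ```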

        1. Hi,
          SPNEGO auth seems to be working fine on the server (I presume).

          I start Livy after setting the following parameters
          livy.impersonation.enabled = true
          livy.server.auth.type = kerberos
          livy.server.launch.kerberos.principal = @.COM
          livy.server.launch.kerberos.keytab = /home/pathtokeytab/.keytab
          livy.server.auth.kerberos.principal = HTTP/@.COM
          livy.server.auth.kerberos.keytab = /home/pathtokeytab/.keytab

          The server logs give me this o/p.

          17/02/02 23:18:50 INFO StateStore$: Using BlackholeStateStore for recovery.
          17/02/02 23:18:50 INFO BatchSessionManager: Recovered 0 batch sessions. Next session id: 0
          17/02/02 23:18:50 INFO InteractiveSessionManager: Recovered 0 interactive sessions. Next session id: 0
          17/02/02 23:18:50 INFO LivyServer: SPNEGO auth enabled (principal = HTTP/@.COM)
          17/02/02 23:18:51 INFO KerberosAuthenticationHandler: Login using keytab /home/pathtokeytab/.keytab, for principal HTTP/@.COM
          17/02/02 23:18:51 WARN RequestLogHandler: !RequestLog
          17/02/02 23:18:51 INFO WebServer: Starting server on http://:8998

          This is running in the US data center. I believe this means that Livy has started successfully with Kerberos.

          Now from the client machine In Singapore I run the below commands

          c:\FAST\Python\2.7.12>python
          Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
          Type "help", "copyright", "credits" or "license" for more information.
          >>> import requests
          >>> import json
          >>> from requests_kerberos import HTTPKerberosAuth, REQUIRED
          >>> headers = {'Content-Type': 'application/json'}
          >>> krb = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
          >>> r = requests.post('http://:8998/sessions', headers=headers, auth=krb)
          >>> r.raise_for_status()
          Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "c:\FAST\Python\2.7.12\lib\site-packages\requests-2.11.1-py2.7.egg\requests\models.py", line 883, in raise_for_status
          raise HTTPError(http_error_msg, response=self)
          requests.exceptions.HTTPError: 403 Client Error: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) for url: http://:8998/sessions
          >>>

          1. Hi, very interesting. I can't remember if I ever did this with Python like that.

            Your issue is "pretty straightforward" as you can see in the error message:
            "403 Client Error: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) for url: http://:8998/sessions"

            You are having issues obtaining proper Kerberos credentials on your machine. Now this again could be related to multiple aspects around your current setup.

            The most common reason people get this error is improper access rights on the keytab, so that the executing user is not able to read it.

            In your case, where is the KDC realm? In the US? Can you access it from your location? What are your krb5 confs on your local machine? Are you sharing the same realm?

            Put simply, authentication on your local machine is not working properly. Try to enable debug logs for Kerberos, and can you try to curl from a machine in the US DC?
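            Something along these lines could help with debugging (KRB5_TRACE is the MIT Kerberos trace variable for command-line tools, and the -D flag is the JDK's Kerberos debug switch for Java clients):

            ```shell
            # MIT Kerberos: trace library calls of kinit/curl to stderr
            export KRB5_TRACE=/dev/stderr

            # JVM-based clients: turn on the JDK's Kerberos debug logging
            export JAVA_OPTS="-Dsun.security.krb5.debug=true"
            ```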

          2. Hi,
            The KDC realm is in the US.
            I do have an account in the US domain where the Hadoop servers reside, and if I do a kinit from my local machine with the full path, which includes the KDC realm of the US domain, it does generate the Kerberos cache. But I can't seem to figure out where the keytab file is; I can't find the keytab file that resides on the US domain either when I run the kinit there. I have raised this with my engineering team. The curl command with --negotiate fails for me with the 401 authentication required error. Funnily, if I run the same command in the browser it goes through; engineering thinks it's because the browser authenticates to the US domain, as opposed to the curl command, which authenticates to the Asia domain in the firm.
            But what's really perplexing is that if I use the hdfs.ext.kerberos Python library with Kerberos auth, it authenticates correctly.
            Nevertheless, thanks for your original post and your immediate responses to my comments. I'll keep you posted on what engineering comes back with.

    1. Please note the livy conf settings

      livy.impersonation.enabled = true
      livy.server.auth.type = kerberos
      livy.server.launch.kerberos.principal = @domain.COM
      livy.server.launch.kerberos.keytab = /home/rc/.keytab
      livy.server.auth.kerberos.principal = HTTP/@domain.COMM
      livy.server.auth.kerberos.keytab = /home/rc/.keytab

      1. Hi,

        So the kerberos issue got resolved.
        I was using the wrong SPNEGO principal and keytab. Once I got that regenerated and applied kerberos auth name rules correctly to remove my domain name (SGP domain) before the kerberos auth is called, it worked fine and successfully authenticated.

        The issue now is, if I turn on impersonation, it fails with the error "User not authorised" in Livy.
