Sliding Applications onto YARN

A Hadoop cluster installation usually comes with some well-established services that serve particular use cases. Complex use cases can rarely be satisfied with MapReduce alone. There could be Elasticsearch for search or a Cassandra cluster for indexing. These and other complementary components of a Hadoop cluster, such as HBase, Storm, or Hive, add complexity to cluster planning, management, and monitoring. Think, for example, of memory planning for a DataNode that also runs Cassandra: you would have to decide up front how much of the available memory to allocate to each. And what happens as you add Cassandra nodes to the cluster or remove them?

YARN was designed to manage different kinds of workloads on a Hadoop setup besides MapReduce. In modern Hadoop installations, the way to deal with the challenges above is therefore to port the needed services to YARN. Some of the common services have been or are being ported to YARN in the YARN-Ready program led by Hortonworks. Because porting an existing service to YARN can be quite challenging in its own right, Apache Slider (incubating) was developed with the promise of running long-running services inside YARN unchanged.

What is Apache Slider?

With Apache Slider, it is possible to deploy long-running applications unchanged in a YARN context. This gives existing distributed applications consistent and flexible resource management alongside the other jobs executed on a Hadoop installation. The aspects of a long-running application you typically care about are:

  • Installation & Configuration
  • Starting & Restart
  • Reconfiguration & Rolling Updates
  • Status Reporting & Logging
  • Rebalancing (Deactivate/Activate)
  • Upgrades

Apache Slider aims to support all six of these dimensions, in addition to providing Security, High Availability, and Packaging. Overall, the aim of Apache Slider can be summed up as:

Make it possible and easy to deploy and manage existing applications on a YARN cluster

The project is currently incubating at the Apache Software Foundation. It is ready to be downloaded and tried out on existing use cases and applications. It is available as a Tech Preview with HDP and will reach GA with the next HDP release.

How Slider Works

Before looking at how Slider works, let us step back a little and look at how YARN works today. In YARN, a client talks to the Resource Manager (RM) to create an Application Master (AM). The AM then requests the resources it requires, depending on the capacity it obtains in negotiation with the RM. As the RM grants these resources, the AM is able to launch Containers, which are managed by the YARN Node Managers installed on each node. Containers are the smallest unit of execution within the YARN framework.

[Figure: YARN Workflow]
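To make the negotiation concrete, here is a toy Python model of it. Nothing here is YARN's real API: `ResourceManager`, `NodeManager`, and `ApplicationMaster` below are hypothetical stand-ins that track only memory and place containers first-fit, which is far simpler than YARN's actual schedulers. The point is the flow: the client's submission makes the RM start the AM in a container of its own, and the AM then receives only as many containers as the cluster can currently grant.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    """Toy stand-in for a YARN Node Manager: one node with some free memory."""
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)


@dataclass
class ResourceManager:
    """Toy stand-in for the YARN Resource Manager; it only tracks memory."""
    nodes: List[NodeManager]
    next_id: int = 1

    def _place(self, memory_mb: int) -> Optional[str]:
        # First-fit placement: pick the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container_id = f"container_{self.next_id}@{node.name}"
                self.next_id += 1
                node.containers.append(container_id)
                return container_id
        return None  # no node has enough capacity left

    def launch_application_master(self, app_name: str, am_memory_mb: int) -> "ApplicationMaster":
        # The client's submission: the RM starts the AM in its own container.
        am_container = self._place(am_memory_mb)
        if am_container is None:
            raise RuntimeError("not enough capacity to start the Application Master")
        return ApplicationMaster(app_name, self, am_container)

    def allocate(self, memory_mb: int, count: int) -> List[str]:
        # Grant as many of the requested containers as current capacity allows.
        granted = []
        for _ in range(count):
            container_id = self._place(memory_mb)
            if container_id is None:
                break
            granted.append(container_id)
        return granted


@dataclass
class ApplicationMaster:
    """Toy Application Master: negotiates containers with the RM for its application."""
    app_name: str
    rm: ResourceManager
    am_container: str
    containers: List[str] = field(default_factory=list)

    def request_containers(self, memory_mb: int, count: int) -> List[str]:
        granted = self.rm.allocate(memory_mb, count)
        self.containers.extend(granted)
        return granted


# Client side: submit the application, then let the AM ask for worker containers.
rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 4096)])
am = rm.launch_application_master("demo", am_memory_mb=1024)
granted = am.request_containers(memory_mb=2048, count=4)
print(am.am_container, granted)
```

With 2 x 4096 MB of capacity, the 1024 MB AM plus three 2048 MB containers fit, so the fourth request is not granted. In YARN, such an outstanding request would typically remain pending until capacity frees up rather than being dropped as in this sketch.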

Apache Slider works similarly to YARN itself: it uses a client to create an Application Master (AM) through the Resource Manager (RM), and again the AM is responsible for managing the required resources in negotiation with the RM. What Slider provides is a specialized implementation of the client and the Application Master. In addition, every Container started by Slider runs a special Slider Agent that provides an interface to the running application.

Slider Components Overview:

  • AppMaster
    The AppMaster handles the common YARN interactions. In addition, it communicates directly with the Slider Client, for example to exchange the status of its running components or to forward configurations. The AppMaster is also responsible for orchestrating publishing, such as keeping the registry up to date.
  • Slider Agent
    The Slider Agent starts (and restarts) its assigned service using the configuration it is given. It is also responsible for applying reconfigurations, which might require restarts. In general, the agent is responsible for the service it was assigned to, including heartbeat information as well as logging and monitoring. It also handles port allocation and publishing.
  • Slider Client
    The client is used to issue standard or custom application lifecycle commands. It is the control interface to the deployed, running service.
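The agent duties listed above (starting and restarting, applying reconfigurations, and heartbeating) can be illustrated with a small simulation. This is a hypothetical sketch, not Slider's actual agent: its real protocol with the AppMaster is not modeled, `SliderAgentSketch` and `FakeService` are invented names, and a "heartbeat" is just a method call that returns a status dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeService:
    """Hypothetical stand-in for the daemon an agent supervises."""
    running: bool = False
    config: Dict[str, str] = field(default_factory=dict)

    def start(self, config: Dict[str, str]) -> None:
        self.config = dict(config)
        self.running = True

    def stop(self) -> None:
        self.running = False


@dataclass
class SliderAgentSketch:
    """Toy model of the agent duties: (re)start, reconfigure, heartbeat.
    Not Slider's real agent or its protocol with the AppMaster."""
    component: str
    service: FakeService
    config: Dict[str, str]
    restarts: int = 0
    log: List[str] = field(default_factory=list)

    def start(self) -> None:
        self.service.start(self.config)
        self.log.append(f"{self.component}: started with {self.config}")

    def reconfigure(self, new_config: Dict[str, str]) -> None:
        # In this sketch every configuration change requires a restart.
        self.config = dict(new_config)
        self.service.stop()
        self.start()
        self.restarts += 1

    def heartbeat(self) -> Dict[str, object]:
        # Restart a service that has died, then report its status.
        if not self.service.running:
            self.log.append(f"{self.component}: not running, restarting")
            self.start()
            self.restarts += 1
        return {"component": self.component,
                "running": self.service.running,
                "restarts": self.restarts}


agent = SliderAgentSketch("MEMCACHED", FakeService(), {"port": "11211"})
agent.start()
agent.service.stop()             # simulate a crash of the supervised daemon
status = agent.heartbeat()       # the agent notices and restarts it
agent.reconfigure({"port": "11212"})
print(status)                    # {'component': 'MEMCACHED', 'running': True, 'restarts': 1}
print(agent.service.config)      # {'port': '11212'}
```

Keeping restart logic inside the agent, next to the process it supervises, means a crashed daemon can be brought back without a round trip to the AppMaster; the heartbeat then only has to report the outcome.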

How Slider Views & Runs Your Application

To Slider, each application is a set of components. From its perspective, these components are essentially daemons: a binary executable run in a container together with some configuration. Each component can also have multiple instances. Taking HBase as an example, you would typically run a RegionServer, a Master, and maybe also an HBase REST service, which are all different components of the same application. In addition, you would preferably run multiple RegionServers, which in Slider's terms are instances of the same component.
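The component/instance view maps naturally onto a small data model. The sketch below describes the HBase example as three components with instance counts and emits a `components` section shaped like the `resources.json` files in Slider's example packages. The key names (`yarn.component.instances`, `yarn.role.priority`, `yarn.memory`) and component names are modeled on those example files and are assumptions to verify against the Slider release you download; the memory sizes are arbitrary illustrative values.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """One daemon type of an application; Slider runs `instances` copies of it."""
    name: str
    instances: int
    memory_mb: int
    priority: int  # assumed to be unique per component


def components_section(components: List[Component]) -> Dict[str, dict]:
    # Key names modeled on resources.json in Slider's example packages (assumption).
    # Values are written as strings, as in those example files.
    return {
        "components": {
            c.name: {
                "yarn.role.priority": str(c.priority),
                "yarn.component.instances": str(c.instances),
                "yarn.memory": str(c.memory_mb),
            }
            for c in components
        }
    }


# The HBase example from the text: one Master, several RegionServers, one REST server.
hbase = [
    Component("HBASE_MASTER", instances=1, memory_mb=1024, priority=1),
    Component("HBASE_REGIONSERVER", instances=3, memory_mb=2048, priority=2),
    Component("HBASE_REST", instances=1, memory_mb=512, priority=3),
]

doc = components_section(hbase)
print(json.dumps(doc, indent=2))

# Assuming one container per instance, this is what the AM must obtain from YARN.
total_containers = sum(c.instances for c in hbase)
total_memory_mb = sum(c.instances * c.memory_mb for c in hbase)
print(total_containers, total_memory_mb)  # 5 7680
```

Scaling the RegionServers then amounts to changing one instance count rather than repackaging the application. The totals leave out the container the AppMaster itself occupies.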

Slider applications are packaged in a description format that Slider's components understand. It is essential that the Slider AppMaster and the Slider Agent can read and understand the package format distributed with each application. Furthermore, the package must provide the binaries the Slider Agent is to execute.
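What "understanding the package format" involves can be sketched as a pre-flight check on an application package. The layout assumed here, a `metainfo.xml` descriptor at the archive root that declares components under `<application><components>`, plus binaries under `package/files/`, is modeled on Slider's example packages and should be treated as an assumption; `inspect_package` is a hypothetical helper, not part of Slider.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List

DESCRIPTOR = "metainfo.xml"         # assumed location: archive root
BINARIES_PREFIX = "package/files/"  # assumed location of bundled binaries


def inspect_package(data: bytes) -> List[str]:
    """Return the component names declared in a Slider-style app package.

    Raises ValueError if the descriptor or the binaries are missing.
    The package layout checked here is an assumption, not a Slider API."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        if DESCRIPTOR not in names:
            raise ValueError("package has no metainfo.xml at its root")
        if not any(n.startswith(BINARIES_PREFIX) and not n.endswith("/") for n in names):
            raise ValueError("package ships no binaries for the agent to run")
        root = ET.fromstring(zf.read(DESCRIPTOR))
    # The <name> child of every <component> element in the descriptor.
    return [c.findtext("name") for c in root.iter("component")]


# Build a minimal package in memory to exercise the check.
METAINFO = """<metainfo>
  <application>
    <name>MEMCACHED</name>
    <components>
      <component><name>MEMCACHED</name></component>
    </components>
  </application>
</metainfo>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metainfo.xml", METAINFO)
    zf.writestr("package/files/memcached.tar.gz", b"placeholder, not a real tarball")
print(inspect_package(buf.getvalue()))  # ['MEMCACHED']
```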

Slider By Example

To help you get up and running, Slider ships with some example applications that can be deployed using Slider. Downloading Slider and extracting it on your local machine gives you access to several working examples inside the app-packages folder.

The best way to get started with Apache Slider is to take one of the existing examples in app-packages, copy it to a separate folder, and adapt it to run a simple service of your own, such as Tomcat or Memcached.
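The copy-and-adjust workflow can be scripted. In this minimal sketch, the example directory name and the `MEMCACHED` component are hypothetical, and the `yarn.component.instances` key is modeled on the `resources.json` files in Slider's example packages (an assumption to verify against your release). The demo runs against a throwaway directory so that the original example stays untouched.

```python
import json
import shutil
import tempfile
from pathlib import Path


def start_from_example(example_dir: Path, target_dir: Path) -> Path:
    """Copy an example package so it can be adapted without touching the original."""
    shutil.copytree(example_dir, target_dir)
    return target_dir


def set_instances(resources_json: Path, component: str, count: int) -> None:
    """Set one component's instance count in a resources.json-style file.
    The key name is an assumption modeled on Slider's example packages."""
    doc = json.loads(resources_json.read_text())
    doc["components"][component]["yarn.component.instances"] = str(count)
    resources_json.write_text(json.dumps(doc, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for one of the shipped examples.
    example = Path(tmp, "app-packages", "example")
    example.mkdir(parents=True)
    (example / "resources.json").write_text(
        json.dumps({"components": {"MEMCACHED": {"yarn.component.instances": "1"}}})
    )

    mine = start_from_example(example, Path(tmp, "my-memcached"))
    set_instances(mine / "resources.json", "MEMCACHED", 2)

    # Read both files back while the temporary directory still exists.
    adapted = json.loads((mine / "resources.json").read_text())
    original = json.loads((example / "resources.json").read_text())

print(adapted["components"]["MEMCACHED"], original["components"]["MEMCACHED"])
```

Working on a copy keeps the shipped example available as a known-good reference when your adapted package misbehaves.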
