Tech. Preview: HDP 2.2

The upcoming release of HDP 2.2 will bring some important forward-looking changes to the Hadoop platform. Together with its partners, Hortonworks is shaping the future of Big Data in an open community. A look at some of the key new features gives a fairly clear picture of where Hadoop is heading. The quickest way to get started now is to download the HDP Sandbox here.

Key new features of HDP 2.2

SQL Enterprise Ready

For a very long time the community has worked to bring Hive SQL closer to ANSI SQL compliance. HDP 2.2 now adds UPDATE and DELETE with ACID capabilities. This helps with streaming ingest and baseline update scenarios in Hive, such as modifying dimension tables or fact tables.
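As a rough illustration of what the new UPDATE and DELETE support looks like from a client's point of view, the sketch below assembles the HiveQL statements one might send. It assumes the requirements of the first Hive ACID release as I understand them: the table is bucketed, stored as ORC, and flagged `transactional`, and the session uses the `DbTxnManager`. The table and column names (`dim_customer`, `id`, `name`, `city`) are hypothetical, and the settings should be checked against the Hive documentation for your version.

```python
# Sketch only: builds the HiveQL for a transactional (ACID) table.
# Table/column names are hypothetical; settings reflect the author's
# understanding of early Hive ACID requirements, not a verified recipe.

SESSION_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
}


def acid_statements(table="dim_customer", buckets=4):
    """Return the statements for creating and modifying an ACID table."""
    # Session-level settings that enable the transaction manager.
    stmts = [f"SET {key}={value}" for key, value in SESSION_SETTINGS.items()]
    # ACID tables are assumed to be bucketed, ORC-backed and transactional.
    stmts.append(
        f"CREATE TABLE {table} (id INT, name STRING, city STRING) "
        f"CLUSTERED BY (id) INTO {buckets} BUCKETS "
        "STORED AS ORC TBLPROPERTIES ('transactional'='true')"
    )
    # Row-level changes, e.g. correcting a dimension table in place.
    # The bucketing column (id) is deliberately not updated.
    stmts.append(f"UPDATE {table} SET city = 'Berlin' WHERE id = 42")
    stmts.append(f"DELETE FROM {table} WHERE id = 7")
    return stmts


if __name__ == "__main__":
    for statement in acid_statements():
        print(statement + ";")
```

The statements can be pasted into the Hive CLI or Beeline on the Sandbox; the Python wrapper only keeps the example self-contained.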

Together with a Cost Based Optimizer, this will ensure that Apache Hive stays the de facto standard for SQL on Hadoop. SQL will be the preferred way to access data in Hadoop, especially in an Enterprise environment. Stinger.next is another major step in that direction.

Slider & Rolling Upgrades: Fulfillment of the Datalake Promise

With HDP 2.2, all components finally run on, or can be managed by, a central resource manager: YARN. Apache Slider helps to fulfill the Datalake promise: one cluster for various (ideally all) data and services.

Rolling upgrades are another aspect of this. With rolling upgrades, the Lake, as the central point of service, keeps serving its users while the cluster is being upgraded. You can also read about Slider here.

Source: http://hortonworks.com/blog/announcing-hdp-2-2/

Administration and Security

Ease of deployment, ease of administration, and Enterprise-grade security have lately been the driving factors of Hadoop development. With HDP 2.2 comes a giant leap towards convenient operation of clusters with Apache Ambari.

Apache Ranger, formerly Apache Argus or XA Secure, is one of the most sophisticated security offerings currently available for Hadoop, and it is finally delivered within one of the leading Hadoop distributions.

Complete List of HDP 2.2 New Features

Apache Hadoop YARN

  • Slide existing services onto YARN through ‘Slider’
  • GA release of HBase, Accumulo, and Storm on YARN
  • Support long running services: handling of logs, containers not killed when AM dies, secure token renewal, YARN Labels for tagging nodes for specific workloads
  • Support for CPU Scheduling and CPU Resource Isolation through CGroups

Apache Hadoop HDFS

  • Heterogeneous storage: Support for archival tier
  • Rolling Upgrade (this item applies to the entire HDP Stack: YARN, Hive, HBase, everything. Comprehensive Rolling Upgrade is now supported across the HDP Stack)
  • Multi-NIC Support
  • Heterogeneous storage: Support memory as a storage tier (Tech Preview)
  • HDFS Transparent Data Encryption (Tech Preview)

Apache Hive, Apache Pig, and Apache Tez

  • Hive Cost Based Optimizer: Function Pushdown & Join re-ordering support for other join types: star & bushy.
  • Hive SQL Enhancements including:
    • ACID Support: Insert, Update, Delete
    • Temporary Tables
  • Metadata-only queries return instantly
  • Pig on Tez
  • Including DataFu for use with Pig
  • Vectorized shuffle
  • Tez Debug Tooling & UI

Apache HBase, Apache Phoenix, & Apache Accumulo

  • HBase & Accumulo on YARN via Slider
  • HBase HA
    • Replicas update in real-time
    • Fully supports region split/merge
    • Scan API now supports standby RegionServers
  • HBase Block cache compression
  • HBase optimizations for low latency
  • Phoenix Robust Secondary Indexes
  • Performance enhancements for bulk import into Phoenix
  • Hive over HBase Snapshots
  • Hive Connector to Accumulo
  • HBase & Accumulo wire-level encryption
  • Accumulo multi-datacenter replication

Apache Storm

  • Storm-on-YARN via Slider
  • Ingest & notification for JMS (IBM MQ not supported)
  • Kafka bolt for Storm – supports sophisticated chaining of topologies through Kafka
  • Kerberos support
  • Hive update support – Streaming Ingest
  • Connector improvements for HBase and HDFS
  • Deliver Kafka as a companion component
  • Kafka install, start/stop via Ambari
  • Security Authorization Integration with Ranger

Apache Spark

  • Refreshed Tech Preview to Spark 1.1.0 (available now)
  • ORC File support & Hive 0.13 integration
  • Planned for GA of Spark 1.2.0
  • Operations integration via YARN ATS and Ambari
  • Security: Authentication

Apache Solr

  • Added Banana, a rich and flexible UI for visualizing time series data indexed in Solr

Cascading

  • Cascading 3.0 on Tez distributed with HDP — coming soon

Hue

  • Support for HiveServer 2
  • Support for Resource Manager HA

Apache Falcon

  • Authentication Integration
  • Lineage – now GA (it has been a tech preview feature…)
  • Improve UI for pipeline management & editing: list, detail, and create new (from existing elements)
  • Replicate to Cloud – Azure & S3

Apache Sqoop, Apache Flume & Apache Oozie

  • Sqoop import support for Hive types via HCatalog
  • Secure Windows cluster support: Sqoop, Flume, Oozie
  • Flume streaming support: sink to HCat on secure cluster
  • Oozie HA now supports secure clusters
  • Oozie Rolling Upgrade
  • Operational improvements for Oozie to better support Falcon
  • Capture workflow job logs in HDFS
  • Don’t start new workflows for re-run
  • Allow job property updates on running jobs

Apache Knox & Apache Ranger (Argus) & HDP Security

  • Apache Ranger – Support authorization and auditing for Storm and Knox
  • Introducing REST APIs for managing policies in Apache Ranger
  • Apache Ranger – Support native grant/revoke permissions in Hive and HBase
  • Apache Ranger – Support Oracle DB and storing of audit logs in HDFS
  • Apache Ranger – Support for running in Windows environments
  • Apache Knox to protect YARN RM
  • Apache Knox support for HDFS HA
  • Apache Ambari install, start/stop of Knox

Apache Slider

  • Allow on-demand creation and running of different versions of heterogeneous applications
  • Allow users to configure different application instances differently
  • Manage operational lifecycle of application instances
  • Expand / shrink application instances
  • Provide an application registry for publishing and discovery
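To make the "expand / shrink application instances" item above more concrete, here is a minimal sketch of what flexing a Slider application amounts to: changing the desired instance count of one component in the application's resource specification. The dictionary mirrors the general shape of a Slider `resources.json` as I recall it (`yarn.component.instances` per component); the application name `hbase1` and the component values are hypothetical, and the `slider flex` command line it prints should be verified against the Slider documentation.

```python
import copy

# Hypothetical resource specification, modeled on a Slider resources.json.
RESOURCES = {
    "schema": "http://example.org/specification/v2.0.0",
    "metadata": {},
    "global": {},
    "components": {
        "slider-appmaster": {},
        "HBASE_MASTER": {
            "yarn.role.priority": "1",
            "yarn.component.instances": "1",
        },
        "HBASE_REGIONSERVER": {
            "yarn.role.priority": "2",
            "yarn.component.instances": "2",
        },
    },
}


def flex(resources, component, count):
    """Return a copy of the spec with a new instance count for one component."""
    if count < 0:
        raise ValueError("instance count must not be negative")
    if component not in resources["components"]:
        raise KeyError(f"unknown component: {component}")
    updated = copy.deepcopy(resources)  # leave the caller's spec untouched
    updated["components"][component]["yarn.component.instances"] = str(count)
    return updated


def flex_command(app_name, component, count):
    """Build the CLI call assumed to perform the same change on a live app."""
    return f"slider flex {app_name} --component {component} {count}"


if __name__ == "__main__":
    grown = flex(RESOURCES, "HBASE_REGIONSERVER", 5)
    print(grown["components"]["HBASE_REGIONSERVER"])
    print(flex_command("hbase1", "HBASE_REGIONSERVER", 5))
```

On a real cluster, YARN would then allocate or release containers until the running instance count matches the requested one.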

Apache Ambari

  • Support for HDP 2.2 Stack, including support for Kafka, Knox and Slider
  • Enhancements to Ambari Web configuration management including: versioning, history and revert, setting final properties and downloading client configurations
  • Launch and monitor HDFS rebalance
  • Perform Capacity Scheduler queue refresh
  • Configure High Availability for ResourceManager
  • Ambari Administration framework for managing user and group access to Ambari
  • Ambari Views development framework for customizing the Ambari Web user experience
  • Ambari Stacks for extending Ambari to bring custom Services under Ambari management
  • Ambari Blueprints for automating cluster deployments
  • Performance improvements and enterprise usability guardrails
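For readers new to Ambari Blueprints, the sketch below shows roughly what a blueprint and the matching cluster creation request look like as JSON documents. It assumes the general structure of the Ambari REST API as I understand it (a blueprint registered under `/api/v1/blueprints/<name>`, then a cluster created under `/api/v1/clusters/<name>` with a host mapping). The blueprint name, host names, and component selection are hypothetical and far from a complete HDP 2.2 layout.

```python
import json


def make_blueprint(name="small-hdp22"):
    """Describe the cluster layout: which components run in which host group."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": "2.2",
        },
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "SECONDARY_NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                "name": "workers",
                "cardinality": "3",
                "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
            },
        ],
    }


def make_cluster_request(blueprint_name, hosts_by_group):
    """Map concrete hosts (hypothetical FQDNs) onto the blueprint's host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for group, fqdns in hosts_by_group.items()
        ],
    }


if __name__ == "__main__":
    blueprint = make_blueprint()
    request = make_cluster_request(
        "small-hdp22",
        {
            "master": ["m1.example.com"],
            "workers": ["w1.example.com", "w2.example.com", "w3.example.com"],
        },
    )
    # These documents would be POSTed to the Ambari server; no call is made here.
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(request, indent=2))
```

Keeping the layout (blueprint) separate from the host mapping (cluster request) is what makes the same blueprint reusable across environments.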
