Storm Flux: Easy Streaming Deployment

With Flux for Apache Storm, deploying streaming topologies for real-time processing becomes less programmatic and more declarative. Using Flux for deployments makes it less likely that you will have to re-compile your project just because you have re-configured or re-arranged your topology. It leverages YAML, a human-readable serialization format, to describe a topology as a whole. You might still need to write some classes, but by taking advantage of existing, generic implementations this becomes less likely.

While Flux can also be used with an existing topology, for this post we'll take the Hive streaming example you can find here (blog post) and create the required topology from scratch using Flux. For experiments and demo purposes you can use the following Vagrant setup to run an HDP cluster locally.

Making Flux Part of Your Project

The main Flux class can be used to deploy a topology described in YAML notation to Storm. With Flux, topologies are still deployed using the storm executable; the only difference is that Flux acts as the main class instead of a custom class describing the topology. In order to ship Flux with an existing project, it needs to be part of the classpath. Maven can be used to achieve this.
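A minimal sketch of the required dependency, assuming a Storm release where Flux ships as the flux-core module (0.10.x and later); the storm.version property is a placeholder for the version your cluster runs:

```xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>flux-core</artifactId>
    <version>${storm.version}</version>
</dependency>
```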

Additionally, we can set Flux as the main class using the Maven Shade plugin as follows:
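A possible configuration along the lines of the setup suggested in the Flux documentation; the plugin version is illustrative. The ManifestResourceTransformer makes org.apache.storm.flux.Flux the default main class of the shaded jar:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- merge service descriptors of the bundled dependencies -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <!-- set Flux as the entry point of the fat jar -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>org.apache.storm.flux.Flux</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```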

Flux YAML Topology

Essentially a topology consists of six core parts: a name, topology-specific config, class components, spouts, bolts, and a streams definition. All of these parts can be configured in a single YAML file.
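The overall layout of such a file might look like the following sketch; the topology name is illustrative, and the empty sections get filled in over the rest of this post:

```yaml
name: "hive-streaming"        # illustrative topology name

config:                       # topology-specific configuration
  topology.workers: 1

components: []                # reusable objects, e.g. ZkHosts or HiveOptions

spouts: []                    # sources, e.g. the Kafka spout

bolts: []                     # processing steps, e.g. the Hive bolt

streams: []                   # the wiring between spouts and bolts
```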

The topology relies on certain components for streaming stock prices to Hive. For one, the topology relies on the Hive streaming bolt that comes with Storm. The other component is the Kafka spout, as stock prices get published to the stock_price topic. The Kafka spout also needs Zookeeper, which is also configured in the components section of the topology.

ZkHosts takes a comma-separated list of nodes as a single string as a constructor argument. This can be provided by using the constructorArgs configuration. The configuration of Zookeeper looks as follows:
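A sketch of the component definition; the class name follows the storm.kafka package of the Storm 0.10.x line, and the host string is a placeholder for your own Zookeeper quorum:

```yaml
components:
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "one.hdp:2181"        # placeholder: comma-separated list of ZK nodes
```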

The Zookeeper component can be referenced by the Kafka spout configuration, as this needs a ZkHosts instance as a constructor argument. Configuring our Kafka spout:
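A sketch of the spout wiring: the ref keyword lets Flux pass the zkHosts component defined above into the SpoutConfig constructor. The topic name is the one used throughout this post; the Zookeeper root path and the consumer id are assumptions:

```yaml
components:
  - id: "stockSpoutConfig"
    className: "storm.kafka.SpoutConfig"
    constructorArgs:
      - ref: "zkHosts"        # the Zookeeper component defined above
      - "stock_price"         # Kafka topic
      - "/kafka_storm"        # ZK root path for offset storage (assumption)
      - "StockSpout"          # consumer id (assumption)

spouts:
  - id: "stockSpout"
    className: "storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "stockSpoutConfig"
    parallelism: 1
```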

For Hive streaming we need a HiveOptions instance to define batch sizes and timeouts. But the most important part is the definition of the column fields, including the partition fields. For this we need to configure the DelimitedRecordHiveMapper carefully. The configuration for Hive streaming can be found below:
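A sketch along the lines of the simple_hive example shipped with Flux; the metastore URI, database, table, field names, and batch settings are assumptions that would need to match your Hive setup and the stock-price schema. Package names follow the Storm 0.10.x line (backtype.storm for core classes, org.apache.storm for storm-hive):

```yaml
components:
  - id: "columnFields"
    className: "backtype.storm.tuple.Fields"
    constructorArgs:
      - ["day", "open", "high", "low", "close", "volume", "adj_close"]

  - id: "partitionFields"
    className: "backtype.storm.tuple.Fields"
    constructorArgs:
      - ["name"]

  - id: "hiveMapper"
    className: "org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper"
    configMethods:
      - name: "withColumnFields"
        args:
          - ref: "columnFields"
      - name: "withPartitionFields"
        args:
          - ref: "partitionFields"

  - id: "hiveOptions"
    className: "org.apache.storm.hive.common.HiveOptions"
    constructorArgs:
      - "thrift://one.hdp:9083"   # Hive metastore URI (placeholder)
      - "default"                 # database (assumption)
      - "stock_prices"            # table (assumption)
      - ref: "hiveMapper"
    configMethods:
      - name: "withTxnsPerBatch"
        args: [2]
      - name: "withBatchSize"
        args: [100]
      - name: "withIdleTimeout"
        args: [10]
```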

After the stock information is read from the Kafka topic, a simple bolt splits each line and emits the individual fields to the Hive bolt. The StockDataBolt does not take any arguments and can be directly configured in the bolts section. With this we have all the required configuration to define our stream:
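A sketch of the remaining two sections; the package of StockDataBolt is hypothetical, and shuffle grouping is an illustrative choice:

```yaml
bolts:
  - id: "stockDataBolt"
    className: "storm.example.StockDataBolt"   # hypothetical package
    parallelism: 1

  - id: "hiveBolt"
    className: "org.apache.storm.hive.bolt.HiveBolt"
    constructorArgs:
      - ref: "hiveOptions"
    parallelism: 1

streams:
  - name: "kafka --> split"
    from: "stockSpout"
    to: "stockDataBolt"
    grouping:
      type: SHUFFLE

  - name: "split --> hive"
    from: "stockDataBolt"
    to: "hiveBolt"
    grouping:
      type: SHUFFLE
```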

Deploying the Topology

Each topology is still deployed in the same manner using storm jar, only the main class is now Flux. The deployment can be local by supplying --local as a parameter, or remote with --remote. A full list of options can be seen here.
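Deploying to the cluster might then look as follows; the jar name is a placeholder for whatever your Maven build produces:

```sh
storm jar target/hive-streaming-topology-1.0-SNAPSHOT.jar \
    org.apache.storm.flux.Flux --remote topology.yaml
```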

The output of the deployment should describe the kind of topology that was deployed, listing the configured spouts, bolts, and streams.

Publishing data to the Kafka topic for stocks:
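As one possibility, the console producer that ships with Kafka can be used; the broker address, the HDP install path, and the sample data file are assumptions based on a local HDP setup:

```sh
# feed sample stock data line by line into the stock_price topic
cat data/stock_prices.csv | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
    --broker-list one.hdp:6667 --topic stock_price
```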
