Getting Started with ORC and HCatalog

ORC (Optimized Row Columnar) is a columnar file format designed to improve Hive performance. Through the Hive metastore and HCatalog, ORC data can also be read, written, and processed by MapReduce, Pig, Cascading, and other tools. It is very similar to Parquet, which is being developed by Cloudera and Twitter. Both formats are part of the most current Hive release and can be used immediately. In this post I would like to describe some of the key concepts of ORC and demonstrate how to get started quickly using HCatalog.