How to Build Optimal Hive Tables Using ORC, Partitions and Metastore Statistics

hive-logo

Creating Hive tables is a common experience to all of us that use Hadoop. It enables us to mix and merge datasets into unique, customized tables. And, there are many ways to do it.

We have some recommended tips for Hive table creation that can increase your query speeds and optimize and reduce the storage space of your tables. And it’s simpler than you might think. Continue reading

Advertisements

How to Write ORC Files and Hive Partitions in Spark

sporc

ORC, or Optimized Row Columnar, is a popular big data file storage format. Its rise in popularity is due to it being highly performant, very compressible, and progressively more supported by top-level Apache products, like Hive, Crunch, Cascading, Spark, and more.

I recently wanted/needed to write ORC files from my Spark pipelines, and found specific documentation lacking. So, here’s a way to do it. Continue reading

How to Sqoop an RDBMS Source Directly to a Hive Table In Any Format

This tutorial will accomplish a few key feats that make ingesting data to Hive far less painless. In this writeup, you will learn not only how to Sqoop a source table directly to a Hive table, but also how to Sqoop a source table in any desired format (ORC, for example) instead of just plain old text.

Continue reading

Apache Crunch Tutorial 9: Reading & Ingesting Orc Files

LITTLE_CRUNCH

This post is the ninth in a hopefully substantive and informative series of posts about Apache Crunch, a framework for enabling Java developers to write Map-Reduce programs more easily for Hadoop.

Continue reading

Benefits of the Orc File Format in Hadoop, And Using it in Hive

logo

As a developer/engineer in the Hadoop and Big Data space, you tend to hear a lot about file formats. All have their own benefits and trade-offs: storage savings, split-ability, compression time, decompression time, and much more. All of these factors play a huge role in what file formats you use for your projects, or as a team or company-wide standard. Continue reading