This tutorial will accomplish a few key feats that make ingesting data into Hive far less painful. In this writeup, you will learn not only how to Sqoop a source table directly to a Hive table, but also how to Sqoop a source table in any desired format (ORC, for example) instead of just plain old text.
It’s time to break down what they mean, how to use them, and how to get the best of both worlds.
This post is the ninth in a hopefully substantive and informative series of posts about Apache Crunch, a framework that helps Java developers write MapReduce programs for Hadoop more easily.
As a developer/engineer in the Hadoop and Big Data space, you tend to hear a lot about file formats. All have their own benefits and trade-offs: storage savings, splittability, compression time, decompression time, and much more. All of these factors play a huge role in which file formats you use for your projects, or adopt as a team or company-wide standard.