How to Sqoop an RDBMS Source Directly to a Hive Table In Any Format

This tutorial will accomplish a few key feats that make ingesting data into Hive far less painful. In this writeup, you will learn not only how to Sqoop a source table directly to a Hive table, but also how to Sqoop a source table in any desired format (ORC, for example) instead of just plain old text.
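
As a rough illustration of the idea, the sketch below builds a Sqoop command line that imports a source table straight into an ORC-backed Hive table using Sqoop's HCatalog integration. This is one possible approach and not necessarily the one the full post uses. The JDBC URL, user name, and table names are hypothetical placeholders, and the helper `build_sqoop_orc_import` is invented for this example. It only assembles the command; nothing is executed.

```python
import shlex


def build_sqoop_orc_import(jdbc_url, username, source_table,
                           hive_db, hive_table, mappers=1):
    """Return the argument list for a Sqoop import into an ORC Hive table.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,            # JDBC URL of the source RDBMS
        "--username", username,
        "-P",                             # prompt for the password at run time
        "--table", source_table,          # source table to pull
        "--hcatalog-database", hive_db,   # target Hive database
        "--hcatalog-table", hive_table,   # target Hive table
        "--create-hcatalog-table",        # create the Hive table if it is missing
        "--hcatalog-storage-stanza", "stored as orcfile",  # storage format
        "-m", str(mappers),               # number of parallel map tasks
    ]


# Placeholder values, assumed for illustration only.
cmd = build_sqoop_orc_import(
    "jdbc:mysql://dbhost:3306/sales", "etl_user",
    "orders", "default", "orders_orc",
)
print(shlex.join(cmd))
```

Changing the storage stanza is what selects the on-disk format, so in principle the same command shape covers other formats that Hive can store.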


Preparing for the HDPCD Exam: Data Ingestion


In order to do big data, you need… DATA. No surprise there! Hadoop is a different beast than other environments, and getting data into HDFS can be a bit intimidating if you’re not familiar with it. If only there were good documentation about these tasks…

Luckily, there is good documentation! This post will cover the basics of ingesting data into a Hadoop cluster, following the HDPCD Exam study guide.

What is a Hadoop Developer?


The Big Data industry has a problem: what makes a Hadoop Developer? Is it someone who has general knowledge about the many tools available in a typical Hadoop ecosystem? Or is it someone who regularly commits to the Apache projects and pushes Hadoop to new levels? I think it’s somewhere between the two.

Pulling Data from Teradata to Hadoop with Apache Sqoop


If you have a Hadoop cluster, it’s rare that you don’t have some traditional row-column data you want to query. To query that RDBMS (Relational Database Management System) data, you’ll want to pull it from its source system (perhaps a SQL Server, Oracle Database, or Teradata warehouse) and store it on Hadoop.