Preparing for the HDPCD Exam: Data Analysis With Hive


With your data now in HDFS in an “analytic-ready” format (it’s all cleaned and in common formats), you can now put a Hive table on top of it.

Apache Hive is a RDBMS-like layer for data in HDFS that allows you to run batch or ad-hoc queries in a SQL-like language. This post will go over what you need to know about Apache Hive in preparation for the HDPCD Exam.  Continue reading