Create a Hive UDF: More Flexible Array Access

 

This article shows how to create a simple UDF that makes working with arrays in Hive more flexible, for example by supporting negative indexes for element access.
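As a rough illustration of what negative indexing means here, the sketch below resolves an index against a list in plain Java, with `-1` referring to the last element. It is not the UDF from the article: the class name `NegativeIndexSketch` and the method `elementAt` are hypothetical, the Hive UDF wrapper is left out entirely, and returning `null` for an out-of-range index is an assumption about the desired behavior.

```java
import java.util.List;

// Hypothetical helper, not the article's UDF: it only shows the index arithmetic.
final class NegativeIndexSketch {
    private NegativeIndexSketch() {}

    // Returns the element at `index`, counting from the end when index is negative
    // (-1 is the last element). Returns null for a null list or an out-of-range
    // index; that null-on-miss behavior is an assumption made for this sketch.
    static <T> T elementAt(List<T> array, int index) {
        if (array == null) {
            return null;
        }
        int size = array.size();
        // Map a negative index onto its non-negative equivalent.
        int resolved = index < 0 ? size + index : index;
        if (resolved < 0 || resolved >= size) {
            return null;
        }
        return array.get(resolved);
    }
}
```

With a three-element list, `elementAt(list, -1)` would give the last element and `elementAt(list, -3)` the first, while `elementAt(list, -4)` falls outside the list and gives `null`.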

How to Create a Simple Hive UDF

There are many functions in Hive that can help analyze your data. But there are times when you need more functionality, sometimes custom, or at least functionality that doesn't require paragraphs of ugly, layered subqueries.

That’s where Hive UDFs come in very handy.

Preparing for the HDPCD Exam: Data Analysis With Hive

With your data now in HDFS in an “analytic-ready” format (it’s all cleaned and in common formats), you can put a Hive table on top of it.

Apache Hive is an RDBMS-like layer for data in HDFS that allows you to run batch or ad-hoc queries in a SQL-like language. This post will go over what you need to know about Apache Hive in preparation for the HDPCD Exam.

Preparing for the HDPCD Exam: Data Transformation

So after getting data into HDFS, it’s often not pretty. At the very least, it’s a little disorganized, sparse, and generally not ready for analytics. It’s a Certified Developer’s job to clean it up a little.

That’s where Apache Pig can come in handy! This post will cover the basics of transforming data in HDFS using Apache Pig, in preparation for the HDPCD Exam.