Apache Ambari: Hello World!

Hello World with Ambari Views

Previously, I gave a brief overview of what an Ambari View is and how it can be beneficial to you. Let’s dive in!

Preparing for the HDPCD Exam: Data Analysis With Hive

With your data in HDFS in an “analytic-ready” format (cleaned and in common formats), you can now put a Hive table on top of it.

Apache Hive is an RDBMS-like layer for data in HDFS that allows you to run batch or ad-hoc queries in a SQL-like language. This post will go over what you need to know about Apache Hive in preparation for the HDPCD Exam.

Preparing for the HDPCD Exam: Data Transformation

So after getting data into HDFS, it’s often not pretty. At the very least, it’s a little disorganized, sparse, and generally not ready for analytics. It’s a Certified Developer’s job to clean it up a little.

That’s where Apache Pig can come in handy! This post will cover the basics of transforming data in HDFS using Apache Pig in preparation for the HDPCD Exam.

Preparing for the HDPCD Exam: Data Ingestion

In order to do big data, you need… DATA. No surprise there! Hadoop is a different beast than other environments, and getting data into HDFS can be a bit intimidating if you’re not familiar with it. If only there were good documentation about these tasks…

Luckily, there is good documentation! This post will cover the basics involved in ingesting data into a Hadoop cluster using the HDPCD Exam study guide.

How to Run a Jar in Oozie with Java Actions

You probably know how jars work. Jars, short for Java Archives, are zipped-up packages of Java class files with or without dependencies included. In most cases, it’s just your application code, and dependencies live elsewhere and are exported into a classpath. While we’ll cover that topic another day, let’s focus on the task at hand: getting your jar running in Oozie.

Loading Data into Hive Using a Custom SerDe

Welcome back! If you read my previous post, you know that we’ve run into an issue with our Chicago crime data that we just loaded into Hive. Specifically, one of the columns has commas included implicitly in the row data. Read on to learn how to fix this!

Analyzing Chicago Crime Data with Apache Hive on HDP 2.3

After a brief hiatus in the great state of Alaska, I’m back to discuss actually analyzing data on the new Hadoop cluster that we set up together in previous blog posts. Specifically, we’ll be looking at crime data from the City of Chicago from 2001 to the day this was first written, 8/26/2015. There are a couple of things we need to take care of before we get started though, Sherlock.

The Basics of Administrating a Hadoop Cluster

So assuming you followed and completed my first post, Getting Started with Hortonworks Data Platform 2.3, you should now have your very own Hadoop cluster (albeit, it pales slightly in comparison to Yahoo!’s reported 4,500-node cluster).
