If you’ve spent any time with a Hortonworks Data Platform cluster, you’re familiar with Ambari. It’s one of the finest open source cluster management tools, letting you easily launch a cluster, add or remove nodes, change configurations, and add services. Using Ambari takes a lot of the guesswork out of managing a Hadoop cluster, and I absolutely love it.
The one downside of Ambari is that it can be tedious to add functionality to the core client. For that reason, the smart people building the tool in Apache decided to add something called an Ambari View. An Ambari View is a way to extend the functionality of Ambari without going down the rabbit hole of modifying Ambari’s source code. Views are essentially plug-and-play tools that only require restarting your cluster to work.
In the following blog post, I’ll discuss getting your first View off the ground and share several tips for actually using Views.
Hello World with Ambari Views
Previously, I gave a brief overview of what an Ambari View is and how it can be beneficial to you. Let’s dive in!
With your data now in HDFS in an “analytic-ready” format (it’s all cleaned and in common formats), you can now put a Hive table on top of it.
Apache Hive is an RDBMS-like layer for data in HDFS that allows you to run batch or ad-hoc queries in a SQL-like language. This post will go over what you need to know about Apache Hive in preparation for the HDPCD Exam.
So after getting data into HDFS, it’s often not pretty. At the very least, it’s a little disorganized, sparse, and generally not ready for analytics. It’s a Certified Developer’s job to clean it up a little.
That’s where Apache Pig can come in handy! This post will cover the basics of transforming data in HDFS using Apache Pig, in preparation for the HDPCD Exam.
In order to do big data, you need… DATA. No surprise there! Hadoop is a different beast than other environments and getting data into HDFS can be a bit intimidating if you’re not familiar. If only there were good documentation about these tasks…
Luckily, there is good documentation! This post will cover the basics involved in ingesting data into a Hadoop cluster using the HDPCD Exam study guide.
The Big Data industry has a problem: what makes a Hadoop Developer? Is it someone who has general knowledge about the many tools available in a typical Hadoop ecosystem? Or is it someone who regularly commits to the Apache projects and pushes Hadoop to new levels? I think it’s somewhere in between.
So assuming you followed and completed my first post, Getting Started with Hortonworks Data Platform 2.3, you should now have your very own Hadoop cluster (albeit one that pales slightly next to Yahoo!’s reported 4,500 node cluster).
As my first post, I’m going to walk through setting up Hortonworks Data Platform (HDP) 2.3. HDP is very nice because it is free to use at any level for any sized cluster, from curious developers with virtual environments to Fortune 50 companies with 100+ node clusters. The cost comes from purchasing support for Hortonworks’ software. To get your very own Hadoop cluster going, read on!
I learned today about a cool ETL/data pipeline/make-your-life-easier tool that was recently released by the NSA (not kidding) as a way to manage the flow of data in and out of a system: Apache NiFi. To me, that functionality seems to match PERFECTLY with what people like to do with Hadoop. This guide will just set up NiFi, not do anything with it (that’ll come later!).