The Big Data industry has a problem: what makes a Hadoop Developer? Is it someone who has general knowledge of the many tools in a typical Hadoop ecosystem? Or is it someone who regularly commits to the Apache projects and pushes Hadoop to new levels? I think it's somewhere in between. A Hadoop Developer knows the core Hadoop components such as Hive, Pig, Oozie, HDFS, Sqoop, and Flume (a non-comprehensive list), they can use all these tools together to ingest, analyze, and present data, and they're not afraid to look into the source code to see what's really going on behind the scenes.
From there, I propose that there are certain tiers of developer (modeled after the Sith in Star Wars, naturally): Apprentice Hadoopster, Marauding Hadoopster, Master Hadoopster, and Hadoop Lord.
The Apprentice Hadoopster has basic working knowledge of the Forc… I mean Hadoop tools: he has read the tutorials, spun up a virtual machine or two, scratched at CSV data with Pig or Hive, and can hold a conversation about Hadoop.
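For the curious, "scratching at CSV data with Hive" at the apprentice level tends to look something like the sketch below. This is a hypothetical example, not from the exam: the file path, table, and column names are all made up for illustration.

```sql
-- Hypothetical: expose a CSV file sitting in HDFS as a Hive table.
-- Assumes a comma-delimited file under /data/movies with three columns.
CREATE EXTERNAL TABLE movies (
  title  STRING,
  year   INT,
  rating DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/movies';

-- A first taste of analysis: average rating per year.
SELECT year, AVG(rating) AS avg_rating
FROM movies
GROUP BY year;
```

Nothing fancy, but it is exactly the kind of query an apprentice runs against a sandbox virtual machine to get a feel for the tools.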
The Marauding Hadoopster is well versed in the Hadoop tools. He can use his knowledge to solve any problem with Hadoop, though his solutions may not be the most elegant, or even the best ones. He can answer questions about virtually any tool: syntax, usage, and origin (whether or not he's always right is less important). He can show an Apprentice Hadoopster the path to enlightenment, but he might not yet be the best mentor for the apprentice.
The Master Hadoopster is an expert in all things Hadoop. He has spent years using the various tools and is familiar with how and when they should be used. He also knows when NOT to use a certain tool. He knows where to look for information, he knows who to ask for help, and he knows the difference between a data problem and a big data problem. He can and should mentor and train apprentices and marauders.
The Hadoop Lords are Doug Cutting and Mike Cafarella, the inventors of Hadoop. That is all.
So there are really only three levels of Hadoopster (unless you are Doug or Mike, in which case: hey!) that you should worry about. And honestly, Hadoop is such a huge and intricate ecosystem of tools that a Master in one area will be an Apprentice in another. No one person can know everything about every tool in Hadoop. Anyone who tells you otherwise doesn't know much about Hadoop at all, I guarantee it.
So say you’ve been doing the “Hadoop Thing” for a while now and feel that you deserve some recognition of your mastery of Big Data. One way to satisfy that desire is the Hortonworks Data Platform Certified Developer (HDPCD) Exam. From that website:
The purpose of this exam is to provide organizations that use Hadoop with a means of identifying suitably qualified staff to develop Hadoop applications for storing, processing, and analyzing data stored in Hadoop using the open-source tools of the Hortonworks Data Platform (HDP), including Pig, Hive, Sqoop and Flume.
I think that’s a noble purpose for a certification exam. And since the exam is given by one of the largest distributors of Hadoop in the world, I think they have a decent grasp of what a Hadoop Developer should be able to do.
In the coming posts, I’ll go over everything you need to know for this exam, according to Hortonworks. Not as a way to help others cheat on it, but to help them along the path of enlightenment. I also want to make sure that I understand everything a Certified Developer needs to know.
Knowledge is Power!