This series is designed to be a “get off the ground” guide to Apache Oozie, a job-scheduling framework for Hadoop. Oozie offers multi-action workflow scheduling, the ability to run actions in parallel, and a solid set of APIs. This guide is designed to help you answer your Oozie technical questions.
In the world of Hadoop, we have jobs. We have complex, multi-action, iterative jobs. We have jobs that need to be scheduled by something other than your co-worker Clint sitting on the command line on a 32-bit Thinkpad.
In comes Apache Oozie, one of the earliest and most popular tools in the Hadoop ecosystem. Besides the initially unsettling name, what do we know about Oozie?
Oozie is an Apache top-level project used to schedule jobs and workflows on Apache Hadoop. Oozie combines multiple jobs sequentially into one logical sequence of work, known to many as a workflow. Oozie is integrated with the native Hadoop stack (which means it’s probably part of your distribution), with YARN (Yet Another Resource Negotiator) as its architectural center, and supports Hadoop jobs for MapReduce as well as Pig, Hive, Sqoop, and Spark (all top-level Apache projects).
Oozie can also schedule jobs specific to a system, like Java programs (jars), shell and Python scripts, and more. Anything that can run on your cluster can, by and large, be run via an Oozie workflow.
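To give you a feel for what that looks like in practice, here is a minimal sketch of a workflow definition. An Oozie workflow is an XML file (conventionally `workflow.xml`); this one runs a single shell script and then finishes. The script name `prepare_data.sh` and the `${jobTracker}`, `${nameNode}`, and `${wfAppPath}` properties are placeholders you would supply in your own job properties file.

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <!-- Every workflow begins at the start node -->
    <start to="first-action"/>

    <!-- A single shell action; placeholder script and properties -->
    <action name="first-action">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>prepare_data.sh</exec>
            <file>${wfAppPath}/prepare_data.sh</file>
        </shell>
        <!-- On success go to end; on failure go to the kill node -->
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Chaining more actions is just a matter of pointing each action’s `<ok to="...">` at the next one.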
Why Use Oozie?
Simple. You need one or more of the following:
- chain actions together in a sequence to form a complete job
- run certain actions in parallel
- schedule jobs to run on a regular basis (hourly, daily, weekly, etc)
Under the hood, Oozie workflows are Directed Acyclic Graphs (DAGs) of actions, and Oozie coordinators are the components that make scheduled, recurring workflows possible.
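The scheduling piece, for instance, looks something like the sketch below: a coordinator definition that triggers a workflow once a day. The app name, dates, and HDFS path are placeholder values for illustration.

```xml
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory containing the workflow.xml to run each day -->
            <app-path>${nameNode}/user/oozie/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```

The coordinator handles the “when”; the workflow it points at handles the “what.”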
Oozie can be enhanced with the use of another Apache top-level project, Apache Falcon, for a finer degree of instance control and abstraction.
Now… you know what Oozie is. Next time I’ll talk about the technical components, how they work, and how to get a workflow built and scheduled! Thanks for reading.