I learned today about a cool ETL/data pipeline/make-your-life-easier tool that was recently released by the NSA (not kidding) as a way to manage the flow of data in and out of systems: Apache NiFi. To me, that functionality seems to match PERFECTLY with what people like to do with Hadoop. This guide will just set up NiFi, not do anything with it (that’ll come later!)
Things you’ll need:
- Maven > 3.1
And to use with Hadoop, obviously you’ll need:
- HDP > 2.1
You don’t even need root access!
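Before building, it's worth a quick pre-flight check that the tools this guide leans on are actually on your PATH. A small sketch; the tool list is just the ones used in the steps below:

```shell
# Check that the build tools used in this guide are installed.
missing=""
for tool in mvn java wget; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "all build prerequisites found"
else
  echo "missing:$missing"
fi
```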
Here’s how to get it running on your HDP 2.3 cluster:
First we need to actually get the source code:
wget https://github.com/apache/nifi/archive/master.zip
unzip master.zip
cd nifi-master
export MAVEN_OPTS="-Xms1024m -Xmx3076m -XX:MaxPermSize=256m"
mvn -T 2.0C clean install
(Takes about 8 minutes to run all the tests)
After that’s done,
cd nifi-assembly/target
tar -zxvf nifi-0.3.1-SNAPSHOT-bin.tar.gz
cd nifi-0.3.1-SNAPSHOT
vi conf/nifi.properties
On line 106 (:106 in vim), change the web port:
nifi.web.http.port=9000

Then install and start the service:

bin/nifi.sh install
service nifi start

Or, if you’re not root, just run bin/nifi.sh start instead.
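If you’d rather skip the vim session, the same edit can be made non-interactively with sed. This is a sketch against a scratch copy of the file (the demo path and sample contents here are made up); on a real install you’d point it at conf/nifi.properties:

```shell
# Demo: rewrite the nifi.web.http.port key in a scratch copy of the config.
# (On a real install, target conf/nifi.properties instead of the demo file.)
printf 'nifi.web.http.host=\nnifi.web.http.port=8080\n' > /tmp/nifi.properties.demo

# Replace whatever value is there with port 9000.
# (GNU sed syntax; on BSD/macOS use: sed -i '' ...)
sed -i 's/^nifi\.web\.http\.port=.*/nifi.web.http.port=9000/' /tmp/nifi.properties.demo

grep '^nifi\.web\.http\.port=' /tmp/nifi.properties.demo
```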
Then navigate to http://localhost:9000/nifi
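If you want to script the “is it up yet?” check instead of refreshing the browser, a curl loop like this works (the URL and retry count are my assumptions, not from the guide; tune them to taste):

```shell
# Poll the NiFi UI until it answers, or give up after a few attempts.
NIFI_URL="${NIFI_URL:-http://localhost:9000/nifi}"
nifi_status="down"
for attempt in 1 2 3; do
  if curl -sf -o /dev/null "$NIFI_URL" 2>/dev/null; then
    nifi_status="up"
    break
  fi
  echo "attempt $attempt: $NIFI_URL not reachable yet"
  sleep 1
done
echo "NiFi is $nifi_status at $NIFI_URL"
```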
There you have it! With any luck, you now have NiFi installed!