If the Hindenburg taught us anything, it’s that you don’t want to mix Zeppelins with Sparks any day.
Apache Zeppelin, however, is a wonderful tool that combines Apache Spark with interactive data analytics and shareable notebooks and makes your big data usable!
Let’s get it installed and then do stuff!
FYI: My VirtualBox crashed horrendously when I tried to shut it down after writing this post (unrelated to installing Zeppelin), so I’m currently stuck and can’t do follow-up posts.
Boot up your HDP2.3 VM that we set up previously and log in as root (password: hadoop).
I like adding a user to control different services so:
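The exact commands for this step aren't shown, so here is a minimal sketch of one way to do it as root. The user name zeppelin is taken from the /home/zeppelin paths used later in this post; the ensure_user helper and its "already exists" check are made up for illustration so the step is safe to rerun.

```shell
# Hypothetical helper: create a user with a home directory unless it already exists.
# Assumption: run as root on the HDP VM.
ensure_user() {
  if id "$1" >/dev/null 2>&1; then
    echo "user $1 already exists"
  else
    useradd -m "$1" && echo "created user $1"
  fi
}
```

Calling `ensure_user zeppelin` as root should leave you with a zeppelin user and a /home/zeppelin directory.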
Then as root, let’s get the latest version of Git, mostly because it’s a good thing to know how to do:
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
yum remove git
wget https://github.com/git/git/archive/v2.4.8.tar.gz
tar xzf v2.4.8.tar.gz
cd git-2.4.8
make prefix=/home/zeppelin/prerequisites/git all
make prefix=/home/zeppelin/prerequisites/git install
echo 'export PATH=$PATH:/home/zeppelin/prerequisites/git/bin' >> /home/zeppelin/.bashrc
source /home/zeppelin/.bashrc
git --version
Now let’s switch to our new zeppelin user:
su - zeppelin
Let’s find out what version of Hadoop we’re running:
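The command itself isn't shown here. `hadoop version` is the usual way to ask, so this is a small sketch built on it. The wrapper function name is made up for illustration, and the PATH check is only there so it fails politely on a box without Hadoop.

```shell
# Print the first line of `hadoop version` (the line carrying the version number).
# Assumption: the hadoop client is on the PATH, as it normally is on the HDP VM.
show_hadoop_version() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version | head -n 1
  else
    echo "hadoop not found on PATH"
  fi
}
show_hadoop_version
```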
You should get 2.7.blahblahblah
And if you want to use Spark (you do) we need that version too:
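The command isn't shown here either. Starting spark-shell prints a banner with the version, but as a lighter-weight sketch, assuming spark-submit is on the PATH, `spark-submit --version` prints the same banner and exits. The wrapper function name is hypothetical.

```shell
# Show the Spark banner, which includes the version number.
# The banner goes to stderr, so it is redirected to stdout here.
show_spark_version() {
  if command -v spark-submit >/dev/null 2>&1; then
    spark-submit --version 2>&1
  else
    echo "spark-submit not found on PATH"
  fi
}
show_spark_version
```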
You should see the ASCII art go flying by and it’ll tell you 1.3.1.
Neat, let’s do it!
git clone https://github.com/apache/incubator-zeppelin.git
cd /home/zeppelin/incubator-zeppelin
mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
Make sure you pay attention to dashes and equal signs in that above command. Anyways, this will take a few minutes since we have to download stuff. Holy cow it’s still going.
Hey, 15 minutes later it worked!
Now let’s do the configuration dance:
cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml
Edit that file and change the port (the zeppelin.server.port property, 8080 by default) to something that doesn’t conflict with our cluster. I picked 4999 because I’m edgy.
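If you'd rather not open an editor, the port change can be sketched with sed. This assumes GNU sed (which is what the VM ships) and that the `<value>` line sits directly under the `<name>zeppelin.server.port</name>` line, as it does in the template. The set_zeppelin_port helper is a made-up name.

```shell
# Hypothetical helper: rewrite the <value> on the line following the
# zeppelin.server.port <name> line. Usage: set_zeppelin_port FILE PORT
set_zeppelin_port() {
  site_xml="$1"
  port="$2"
  sed -i "/<name>zeppelin\.server\.port<\/name>/{n;s#<value>[0-9]*</value>#<value>${port}</value>#;}" "$site_xml"
}
```

For example: `set_zeppelin_port /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml 4999`. It's worth opening the file afterwards to confirm the change landed, since the sed pattern silently does nothing if the layout differs.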
cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh
hdp-select status hadoop-client
Remember the output of that hdp-select command; we need it below.
Now add the following to zeppelin-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557"
where the value after -Dhdp.version= is the version number that the hdp-select command printed (yours may differ).
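To avoid copying the version by hand, here is a rough sketch that pulls it out of the hdp-select output and appends both lines to zeppelin-env.sh. It assumes the output looks like `hadoop-client - <version>`, with the version as the last field; both helper names are made up for illustration.

```shell
# Hypothetical helper: read an hdp-select status line on stdin, print its last field.
hdp_version_from_status() {
  awk '{print $NF}'
}

# Hypothetical helper: append the two exports to an env file.
# Usage: write_zeppelin_env FILE VERSION
write_zeppelin_env() {
  env_file="$1"
  hdp_version="$2"
  {
    echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf'
    echo "export ZEPPELIN_JAVA_OPTS=\"-Dhdp.version=${hdp_version}\""
  } >> "$env_file"
}
```

Usage would be something like `write_zeppelin_env /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh "$(hdp-select status hadoop-client | hdp_version_from_status)"`. Run it once only, since it appends rather than replaces.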
NOW DID ALL THIS WORK??
cd /home/zeppelin/incubator-zeppelin
bin/zeppelin-daemon.sh start
Credit where it's due: I took most of the steps from https://zeppelin.incubator.apache.org/docs/0.5.6-incubating/install/yarn_install.html but tweaked them a little since we're already running a bunch of stuff from HDP.