Getting Started with Apache Zeppelin

If Hindenburg taught us anything, you don’t want to mix Zeppelins with Sparks any day.

Apache Zeppelin, however, is a wonderful tool that combines Apache Spark with interactive data analytics and shareable notebooks and makes your big data usable!

Just don’t fill your computer with hydrogen gas.

Let’s get it installed and then do stuff!

FYI: My VirtualBox crashed horrendously when I tried to shut it down after writing this post (unrelated to installing Zeppelin), so I’m currently stuck and can’t do follow-up posts.

Boot up your HDP2.3 VM that we set up previously and log in as root (password: hadoop).

I like adding a user to control different services so:

useradd zeppelin

Then as root, let’s get the latest version of Git, mostly because it’s a good thing to know how to do:

yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
yum remove git
mkdir /home/zeppelin/prerequisites
cd /home/zeppelin/prerequisites
wget -O git-2.4.8.tar.gz https://github.com/git/git/archive/v2.4.8.tar.gz
tar xzf git-2.4.8.tar.gz
cd git-2.4.8
make prefix=/home/zeppelin/prerequisites/git all
make prefix=/home/zeppelin/prerequisites/git install
echo 'export PATH=$PATH:/home/zeppelin/prerequisites/git/bin' >> /home/zeppelin/.bashrc
source /home/zeppelin/.bashrc
git --version
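If you want to double-check that the shell is now picking up the freshly built Git rather than an older copy, here is a small sketch. It assumes GNU sort (for sort -V) and that git --version prints "git version X.Y.Z"; version_ge and git_version_from_output are hypothetical helper names, not part of Git.

```shell
# version_ge A B: succeeds if version A >= version B.
# sort -V orders version strings, so if B sorts first (or they are equal), A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# git_version_from_output: read "git version X.Y.Z" on stdin, print X.Y.Z.
git_version_from_output() {
  awk '/^git version/ { print $3; exit }'
}
```

For example: version_ge "$(git --version | git_version_from_output)" 2.4.8 && echo "new git is on the PATH". Running command -v git also tells you which binary you are actually getting.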

Now let’s switch to our new zeppelin user:

su - zeppelin

Let’s find out what version of Hadoop we’re running:

hadoop version

You should get 2.7.blahblahblah

And if you want to use Spark (you do) we need that version too:

spark-shell

You should see the ASCII art go flying by, and it’ll tell you 1.3.1. Type :quit to get back out of the shell.
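Reading versions off the banner works fine, but if you would rather grab them non-interactively, something like the following sketch could do it. It assumes the first matching line of hadoop version looks like "Hadoop 2.7.1.<HDP build suffix>" and that the Spark banner contains "version X.Y.Z"; both function names are hypothetical.

```shell
# hadoop_version_from_output: read `hadoop version` output on stdin and print
# only the first three dot-separated parts of the version, e.g. 2.7.1.
hadoop_version_from_output() {
  awk '/^Hadoop [0-9]/ { split($2, p, "."); print p[1] "." p[2] "." p[3]; exit }'
}

# spark_version_from_banner: read a Spark banner on stdin and print the first
# "version X.Y.Z" it finds; the Spark line comes before any "Scala version" line.
spark_version_from_banner() {
  grep -o 'version [0-9][0-9.]*' | head -n 1 | awk '{ print $2 }'
}
```

Usage: hadoop version | hadoop_version_from_output, and spark-submit --version 2>&1 | spark_version_from_banner (the 2>&1 catches the banner whichever stream it lands on). If your Spark build doesn't accept --version, stick with spark-shell.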

Neat, let’s do it!

cd /home/zeppelin/
git clone https://github.com/apache/incubator-zeppelin.git 

cd /home/zeppelin/incubator-zeppelin
mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests

Make sure you pay attention to the dashes and equal signs in the command above. Anyways, this will take a few minutes since Maven has to download stuff. Holy cow, it’s still going.

Hey, 15 minutes later it worked!
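If your Spark or Hadoop versions differ from mine, the flags in that mvn command follow a pattern: the Spark profile uses only major.minor, while spark.version gets the full version. A small hypothetical helper can assemble them. The -Phadoop-2.6 -Pyarn -DskipTests part is copied straight from the command above, and whether a matching spark-X.Y profile exists depends on your Zeppelin checkout.

```shell
# zeppelin_mvn_flags SPARK_VERSION HADOOP_VERSION (hypothetical helper)
# Assumes SPARK_VERSION is of the form X.Y.Z, e.g. 1.3.1 -> profile spark-1.3.
zeppelin_mvn_flags() {
  local spark_version=$1 hadoop_version=$2
  # Strip the last ".Z" component to get the profile suffix.
  local spark_profile=${spark_version%.*}
  printf '%s\n' "-Pspark-${spark_profile} -Dspark.version=${spark_version} -Dhadoop.version=${hadoop_version} -Phadoop-2.6 -Pyarn -DskipTests"
}
```

Usage: mvn clean package $(zeppelin_mvn_flags 1.3.1 2.7.0), left unquoted on purpose so the shell splits the output into separate arguments.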

Now let’s do the configuration dance:

cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml

Edit that file and change the port (the zeppelin.server.port property, which defaults to 8080, the same port Ambari uses) to something that doesn’t conflict with our cluster. I picked 4999 because I’m edgy.
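If you’d rather not open an editor, sed can change the port for you. This sketch assumes GNU sed (for in-place -i) and that the <value> line comes right after the <name>zeppelin.server.port</name> line, which is the usual layout of Hadoop-style config files; set_xml_property is a hypothetical helper.

```shell
# set_xml_property FILE NAME VALUE (hypothetical helper)
# Finds the line containing <name>NAME</name>, moves to the next line (n),
# and rewrites whatever sits between <value> and </value> on it.
set_xml_property() {
  local file=$1 name=$2 value=$3
  sed -i "/<name>${name}<\/name>/{n;s|<value>[^<]*</value>|<value>${value}</value>|}" "$file"
}
```

Usage: set_xml_property /home/zeppelin/incubator-zeppelin/conf/zeppelin-site.xml zeppelin.server.port 4999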

cp /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh.template /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh

hdp-select status hadoop-client 

Remember the output of that hdp-select command; we need it in a second. Now add the following to zeppelin-env.sh:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557"

Replace 2.3.0.0-2557 with the version number that hdp-select printed on your machine.
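The copy-paste step can also be scripted. A sketch, assuming hdp-select status hadoop-client prints a line like "hadoop-client - 2.3.0.0-2557"; both function names are hypothetical.

```shell
# hdp_version_from_select: read hdp-select status output on stdin and print
# the last field of the hadoop-client line (the HDP version).
hdp_version_from_select() {
  awk '$1 == "hadoop-client" { print $NF; exit }'
}

# append_zeppelin_env FILE HDP_VERSION: append the two exports to FILE.
append_zeppelin_env() {
  local file=$1 hdp_version=$2
  {
    echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf'
    echo "export ZEPPELIN_JAVA_OPTS=\"-Dhdp.version=${hdp_version}\""
  } >> "$file"
}
```

Usage (run it once, since it appends): append_zeppelin_env /home/zeppelin/incubator-zeppelin/conf/zeppelin-env.sh "$(hdp-select status hadoop-client | hdp_version_from_select)"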

NOW DID ALL THIS WORK??

cd /home/zeppelin/incubator-zeppelin
bin/zeppelin-daemon.sh start
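The daemon script may return before the web server is actually listening, so a small wait loop can help. This sketch relies on bash’s /dev/tcp pseudo-device (so it needs bash, not plain sh) and the port you set earlier; wait_for_port is a hypothetical helper.

```shell
# wait_for_port HOST PORT TIMEOUT_SECONDS (hypothetical helper)
# Polls once per second until something accepts TCP connections on HOST:PORT.
wait_for_port() {
  local host=$1 port=$2 timeout=$3 waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Try the connection in a subshell: a failed connect only exits the
    # subshell, and fd 3 is closed again when the subshell ends.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Usage: wait_for_port localhost 4999 60 && echo "Zeppelin is listening on 4999"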
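
If it worked, the Zeppelin notebook UI should be reachable in your browser at http://<your VM's address>:4999 (or whichever port you picked).

Credit where it's due: I took most of the steps from https://zeppelin.incubator.apache.org/docs/0.5.6-incubating/install/yarn_install.html but tweaked them a little since we're already running a bunch of stuff from HDP.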