Spark History Server Automatic Cleanup

I wonder how much paper you’d need to print 1.5 TB of logs…

If you’ve been running Spark applications for a few months, you might start to notice some odd behavior from the history server (default port 18080). Specifically, it may take forever to load the page, show links to applications that no longer exist, or even crash. Three parameters take care of this once and for all.
Continue reading

The Basics of Administrating a Hadoop Cluster

Assuming you followed and completed my first post, Getting Started with Hortonworks Data Platform 2.3, you should now have your very own Hadoop cluster (albeit one that pales slightly in comparison to Yahoo!’s reported 4,500-node cluster).

Continue reading

Getting Started with Hortonworks Data Platform 2.3

For my first post, I’m going to walk through setting up Hortonworks Data Platform (HDP) 2.3. HDP is very nice because it is free to use at any level and for clusters of any size, from curious developers with virtual environments to Fortune 50 companies with 100+ node clusters. The cost comes only if you require support for Hortonworks’ software. To get your very own Hadoop cluster going, read on!
Continue reading