Friday, July 27, 2012

Data Architect's journey with Hadoop

Posting here has been pretty lame.  Mostly b/c I've been busy as hell with work.  We've had two large releases in the span of 6 months that, while awesome, have left me with ZERO time to do anything here.

Anyway, it's finally time to have some fun.  We're getting ready to roll out our own API platform so each team can call our service to record events they want us to track.  Considering the amount of data this service will be consuming (a metric shit-ton), we're looking at Hadoop as the back-end.

When offered the opportunity to POC Hadoop, I jumped at the chance.  I love new stuff, and this is totally different from anything I've used before.  And it relies on two things I'm exceptionally WEAK at: Linux and Java.  What better way to strengthen my weaknesses than to jump head first into something that will test the hell out of them?  (It's how I got to where I am today.)

Anyway, enough of that.  My Hadoop cluster is currently being set up by our IT department.  In the meantime I'm prepping as much as I can.  I figured I would catalog my journey here because, the way I see the data industry going, many traditional data warehouse developers will be looking at Hadoop as a solution.

So to get started, I hit the books and the net.


  1. Brent Ozar gives a really good overview of what Hadoop is, and how we, as data warehouse developers, may interact with it:   http://www.brentozar.com/archive/2011/11/hadoop-basics-for-sql-server-dbas/
  2. Hadoop: The Definitive Guide (Tom White).  It's the first edition, so it's a few years old, but it is of course still relevant.
  3. Prior to getting our POC environment, I want to set up a local dev Hadoop environment.  As we're going to POC a Cloudera version, I'm starting here:   https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM+CDH3u3
    1. I'm also setting up Eclipse as my Java IDE.  I'm going to rig something up on Windows using the following guide.  When I get this up and running, I'll make sure to post about it:   http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

So that's it for now.  I will update as this project moves forward over the coming weeks.  Hopefully I won't suck at updating this blog, and more importantly, hopefully I won't fail more than usual getting this project off the ground.

GLHF
