Yesterday was a very good day for the Hadoop project.
Yahoo! announced that they used a roughly 3,800-node cluster to sort through a petabyte of data in a little over 16 hours. It’s an amazing feat for any project, but especially for one with as much potential as Hadoop.
The other good news was the release of mrtoolkit, a map-reduce library written in Ruby. It uses Hadoop Streaming and should make it easy to write and run jobs that crunch data. It comes out of the New York Times developer group, and I applaud them.
I’ll have to figure out how mrtoolkit differs from Wukong; hopefully the two can eventually be merged in some form.