Making Hadoop 1000x Faster for Graph Problems
Wednesday, July 27, 2011 at 9:07AM
HighScalability Team in Hadoop, Map Reduce, Paper, Strategy, graph

Dr. Daniel Abadi, author of the DBMS Musings blog and cofounder of Hadapt, whose product improves Hadoop performance by 50x on relational data, is now taking his talents to graph data in "Hadoop's tremendous inefficiency on graph data management (and how to avoid it)," which shares the secrets of getting Hadoop to perform 1000x better on graph data.

TL;DR:

The 1000x comes from stacking three fixes, each worth roughly 10x: store the graph in a graph-optimized store (RDF-3X triple stores on each worker) instead of flat files in HDFS, partition the graph so that connected vertices land on the same machine, and replicate data along partition boundaries (an n-hop guarantee) so most queries run locally in parallel rather than through Hadoop joins.

Voila! That's a 10x * 10x * 10x = 1000x performance improvement on graph problems, using techniques that make a lot of sense. What may be less obvious is the whole idea of keeping the Hadoop shell and making the component parts more efficient for graph problems. Hadoop stays Hadoop externally, but internally it has graph superpowers. These are strategies you can use.
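To make the data-placement idea concrete, here is a minimal Python sketch. It is my own illustration, not code from the paper or from Hadoop, and the edge list, partition assignment, and function names are all hypothetical. It shows graph-aware partitioning plus a 1-hop replication guarantee at the partition boundary, so a neighborhood query is answered on a single worker instead of through a cross-node join.

from collections import defaultdict

# Toy edge list: two natural clusters {a, b, c} and {x, y, z}, plus one bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]  # bridge edge crossing the two clusters

# Pretend output of a graph partitioner (the paper uses METIS); hash
# partitioning would instead scatter these vertices arbitrarily across workers.
partition_of = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

# Place each edge on the partition of both endpoints. Edges inside a cluster
# land on one worker; the bridge edge is replicated to both (1-hop guarantee).
placed = defaultdict(list)
for u, v in edges:
    for p in {partition_of[u], partition_of[v]}:
        placed[p].append((u, v))

def neighbors(vertex):
    """Answer a 1-hop query entirely on the vertex's home partition."""
    home = placed[partition_of[vertex]]
    return {w for u, w in home if u == vertex} | {u for u, w in home if w == vertex}

print(neighbors("c"))  # {'a', 'b', 'x'} -- finds 'x' without contacting partition 1

The same pattern generalizes to an n-hop guarantee: replicate everything within n hops of a partition boundary, and any query whose radius fits within n hops never has to leave the worker it starts on.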

What I found most intriguing is thinking about the larger consequences of Hadoop being inefficient. There's more in play than I had previously considered. The most obvious angle is money; we are used to thinking this way about mass-produced items. If a widget can be cost-reduced by 10 cents and millions of them are made, we are talking real money. If Hadoop is going to be used for the majority of data mining problems, then making it more efficient adds up to real savings. Going to the next level, the more efficient Hadoop becomes, the quicker important problems facing the world will be solved. Interesting.

For more details, please read the original article and the paper describing the work: Scalable SPARQL Querying of Large RDF Graphs.
