High Scalability

Entries in mapreduce (6)

Map-Reduce With Ruby Using Hadoop

Tuesday, January 4, 2011 at 9:03AM

A demonstration, with repeatable steps, of how to quickly fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.
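Map-reduce scripts written in a scripting language such as Ruby are usually run on Hadoop through Hadoop Streaming. The mapper reads raw lines on stdin and writes `key<TAB>value` lines to stdout, and Hadoop sorts those lines by key before feeding them to the reducer. The sketch below is a minimal word count illustrating that protocol. It is written in Python rather than Ruby, it is not the post's own code, and the file name `wordcount.py` and its `map`/`reduce` arguments are hypothetical.

```python
#!/usr/bin/env python3
"""Word-count mapper and reducer for Hadoop Streaming (a sketch).

Assumption: the scripts are run via Hadoop Streaming, which pipes input
lines to the mapper on stdin and hands the reducer the mapper's output
sorted by key. The "map"/"reduce" CLI arguments are hypothetical.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts of consecutive records that share a key.

    Relies on the input being sorted by key, which Hadoop guarantees
    between the map and reduce phases.
    """
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(args):
    """Run the stage named by the single command-line argument."""
    stages = {"map": mapper, "reduce": reducer}
    if len(args) != 1 or args[0] not in stages:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
        return
    for record in stages[args[0]](sys.stdin):
        print(record)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Locally, the Hadoop pipeline can be approximated with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`; on a cluster, the same two commands would be passed to the streaming jar's `-mapper` and `-reducer` options (the jar's path depends on the Hadoop version).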

Phil Whelan

Tagged amazon ec2, hdfs, jclouds, whirr in Hadoop, amazon, ruby

eHarmony.com describes how they use Amazon EC2 and MapReduce

Monday, June 29, 2009 at 7:31AM

This slide show presents the experience of eHarmony.com (one of the biggest dating sites) in using Amazon EC2 and MapReduce to scale its service.

Go to the Slideshare presentation

mg1313

Need help with your Hadoop deployment? This company may help!

Wednesday, October 15, 2008 at 8:05AM

A group of top Silicon Valley engineers (ex-Yahoo, Facebook, Google) has come together to found a startup called Cloudera. The company has not launched yet; it intends to help other companies adopt a promising software platform called Hadoop.

Hadoop is an open-source software project (written in Java) designed to let developers write and run applications that process huge amounts of data. While it could potentially improve a wide range of other software, the ecosystem supporting its implementation is still developing, and that is where Cloudera hopes to make a place for itself.

More on Hadoop: it implements the MapReduce framework introduced by Google, which divides an application into many small blocks of work, and its distributed file system (HDFS) creates multiple replicas of each data block and places them on different computer nodes.
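To make the two ideas in this paragraph concrete, here is a toy single-process model: the input is split into blocks, each block is assigned to several nodes, and a word count runs as one map task per block followed by a shuffle (grouping by key) and a reduce. All function names are hypothetical, and the round-robin replica placement is a simplifying assumption; HDFS's actual placement policy is rack-aware.

```python
"""A toy, single-process model of block replication plus MapReduce (a sketch).

Assumptions: round-robin replica placement (real HDFS is rack-aware) and
an in-memory shuffle; nothing here is Hadoop's API.
"""
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Chop a list of records into consecutive blocks of block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def run_mapreduce(blocks, map_fn, reduce_fn):
    """Run map_fn over every block, shuffle by key, then reduce each key."""
    groups = defaultdict(list)
    for block in blocks:  # each block is one map task
        for record in block:
            for key, value in map_fn(record):
                groups[key].append(value)  # shuffle: collect values per key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def word_map(line):
    """Map step: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word, 1


def word_reduce(word, counts):
    """Reduce step: total the counts for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick fox", "the lazy dog", "the end"]
    blocks = split_into_blocks(lines, block_size=2)
    print(place_replicas(len(blocks), ["node1", "node2", "node3"], replication=2))
    print(run_mapreduce(blocks, word_map, word_reduce))
```

Because each map task only needs its own block, a real scheduler can run it on any node holding a replica of that block, which is why replication and task splitting work well together.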

It is already in use at large companies like Yahoo.

Read more about Cloudera here.

mg1313

Is MapReduce going mainstream?

Saturday, October 4, 2008 at 11:09PM

Compares MapReduce with other parallel processing approaches and suggests a new paradigm for clouds and grids.

natis

MapReduce framework Disco

Wednesday, September 3, 2008 at 2:42PM

Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks, often in only tens of lines of code.
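Part of Disco's appeal is that a job is just a pair of Python functions. The sketch below roughly follows the `map(line, params)` / `reduce(iter, params)` shape of Disco's word-count tutorial, but it runs the functions with a hypothetical `run_locally` helper instead of submitting a Disco job, so it needs no cluster and does not use Disco's API. `itertools.groupby` stands in for Disco's own grouping utility.

```python
"""Disco-style word count, run locally without a Disco cluster (a sketch).

`run_locally` is a hypothetical stand-in for the cluster; it is not part
of Disco. The map/reduce signatures mirror Disco's (line, params) and
(iter, params) style.
"""
from itertools import groupby
from operator import itemgetter


def map_words(line, params):
    # Emit (word, 1) for every word on the line; params is unused here.
    for word in line.split():
        yield word, 1


def reduce_counts(pairs, params):
    # Sort so equal keys are adjacent, then sum each run of counts.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_locally(lines, map_fn, reduce_fn, params=None):
    """Feed every line through map_fn, then all pairs through reduce_fn."""
    pairs = (pair for line in lines for pair in map_fn(line, params))
    return list(reduce_fn(pairs, params))


if __name__ == "__main__":
    text = ["to be or not to be", "that is the question"]
    for word, count in run_locally(text, map_words, reduce_counts):
        print(word, count)
```

The job logic itself is the two short functions; everything else is plumbing that Disco would provide on a real cluster.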

tmielika

Product, Python, erlang, map-reduce, mapreduce

Distributed Computing & Google Infrastructure

Monday, August 11, 2008 at 10:27PM

A couple of videos about distributed computing, with direct reference to Google's infrastructure. You will get acquainted with:

- MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabytes) data sets on commodity hardware
- GFS and the way it stores its data in 64 MB chunks
- Bigtable, the non-relational database built at Google

Cluster Computing and MapReduce Lectures 1-5.
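GFS's fixed 64 MB chunk size turns locating data into simple arithmetic: a client converts a byte offset within a file into a chunk index and then asks the master where that chunk lives. The helpers below (hypothetical names, not a GFS API) sketch that arithmetic, including a read that straddles a chunk boundary.

```python
"""Mapping byte offsets onto GFS-style fixed-size chunks (a sketch).

The helper names are illustrative only; they are not a GFS client API.
"""
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB = 67,108,864 bytes


def chunk_index(offset):
    """Index (0-based) of the chunk containing byte `offset`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunk_count(file_size):
    """Number of chunks needed to hold `file_size` bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def chunk_ranges(offset, length):
    """Yield (chunk index, start within chunk, byte count) for a read."""
    end = offset + length
    while offset < end:
        idx = chunk_index(offset)
        start = offset - idx * CHUNK_SIZE
        n = min(CHUNK_SIZE - start, end - offset)  # stop at chunk boundary
        yield idx, start, n
        offset += n
```

For example, a 20-byte read starting 10 bytes before the end of chunk 0 is split into 10 bytes from chunk 0 and 10 bytes from chunk 1, so the client would look up two chunk locations.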

Todd Hoff

BigTable, GFS, google, mapreduce