High Scalability

Entries in hive (3)

Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2

Monday, August 3, 2009 at 11:18AM

This tutorial will show you how to use Amazon EC2 and Cloudera's Distribution for Hadoop to run batch jobs for a data intensive web application.

During the tutorial, we will perform the following data processing steps... Read more on the Cloudera website.

mg1313

Hive - A Petabyte Scale Data Warehouse using Hadoop

Wednesday, June 10, 2009 at 7:02AM

This post about using Hive and Hadoop for analytics comes straight from Facebook engineers.

Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics.

These products range from simple reporting applications, like Insights for the Facebook Ad Network, to more advanced kinds, such as Facebook's Lexicon product.

As a result, a flexible infrastructure is critical: one that caters to the needs of these diverse applications and users, and that also scales up cost-effectively with the ever-increasing amounts of data being generated on Facebook. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook.

Read the rest of the article on the Engineering @ Facebook Notes page.

Todd Hoff