Entries in hive (3)

Monday
Aug 03 2009

Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2

This tutorial shows you how to use Amazon EC2 and Cloudera's Distribution for Hadoop to run batch jobs for a data-intensive web application.

During the tutorial, we will perform the following data processing steps... Read more on the Cloudera website.

Wednesday
Jun 10 2009

Hive - A Petabyte Scale Data Warehouse using Hadoop

This post about using Hive and Hadoop for analytics comes straight from Facebook engineers.

Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics.

These products range from simple reporting applications like Insights for the Facebook Ad Network, to more advanced kinds such as Facebook's Lexicon product.

As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up cost-effectively with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook.

Read the rest of the article on the Engineering @ Facebook Notes page.

Monday
May 11 2009

Facebook, Hadoop, and Hive

Facebook has the second-largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data); Yahoo has the largest.

Learn how they do it and what the challenges are on the DBMS2 blog, a blog for people who care about database and analytic technologies.