Entries in data (4)

Thursday
Jun272019

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:

Click to read more ...

Tuesday
Feb192019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we’re going to introduce you to the advanced Redis Cluster sharding opportunities, discuss its advantages and limitations, when you should deploy, and how to connect to your Redis Cluster.

Sharding with Redis Cluster

Click to read more ...

Wednesday
Jun262013

Leveraging Cloud Computing at Yelp - 102 Million Monthly Vistors and 39 Million Reviews

This is a guest post by Yelp's Jim Blomo. Jim manages a growing data mining team that uses Hadoop, mrjob, and oddjob to process TBs of data. Before Yelp, he built infrastructure for startups and Amazon. Check out his upcoming talk at OSCON 2013 on Building a Cloud Culture at Yelp.

In Q1 2013, Yelp had 102 million unique visitors (source: Google Analytics) including approximately 10 million unique mobile devices using the Yelp app on a monthly average basis. Yelpers have written more than 39 million rich, local reviews, making Yelp the leading local guide on everything from boutiques and mechanics to restaurants and dentists. With respect to data, one of the most unique things about Yelp is the variety of data: reviews, user profiles, business descriptions, menus, check-ins, food photos... the list goes on.  We have many ways to deal data, but today I’ll focus on how we handle offline data processing and analytics.

In late 2009, Yelp investigated using Amazon’s Elastic MapReduce (EMR) as an alternative to an in-house cluster built from spare computers.  By mid 2010, we had moved production processing completely to EMR and turned off our Hadoop cluster.  Today we run over 500 jobs a day, from integration tests to advertising metrics.  We’ve learned a few lessons along the way that can hopefully benefit you as well.

Job Flow Pooling

Click to read more ...

Monday
Aug032009

Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2

This tutorial will show you how to use Amazon EC2 and Cloudera's Distribution for Hadoop to run batch jobs for a data intensive web application.

During the tutorial, we will perform the following data processing steps.... read more on Cloudera website