Entries by HighScalability Team (1576)

Friday
Oct 30 2009

Hot Scalability Links for October 30 2009

Thursday
Oct 29 2009

Paper: No Relation: The Mixed Blessings of Non-Relational Databases

This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.

The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?
There are a few things to like about this paper. A running example is used to show the different ways to model data depending on which type of solution you are targeting, especially covering how many-to-many relationships are modeled, how data integrity is maintained, and how optional attributes are supported. There's also a brief survey of some of the major systems.
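To make the modeling difference concrete, here's a minimal sketch of the two approaches to a many-to-many relationship; the entities and field names are hypothetical, not drawn from the paper's running example:

```python
# Hypothetical illustration (not the paper's example): the same
# many-to-many relationship modeled relationally and non-relationally.

# Relational style: normalize with a join table. Integrity lives in
# the schema; reads require a join.
students = {1: {"name": "Ada"}, 2: {"name": "Alan"}}
courses = {10: {"title": "Databases"}}
enrollments = [(1, 10), (2, 10)]  # the join table

def courses_for_student(student_id):
    return [courses[c] for s, c in enrollments if s == student_id]

# Non-relational style: denormalize by embedding the relationship on
# both sides. Reads become single key lookups, but keeping the two
# copies consistent is now the application's job.
students_doc = {1: {"name": "Ada", "course_ids": [10]},
                2: {"name": "Alan", "course_ids": [10]}}
courses_doc = {10: {"title": "Databases", "student_ids": [1, 2]}}

def enroll(student_id, course_id):
    students_doc[student_id]["course_ids"].append(course_id)  # write 1
    courses_doc[course_id]["student_ids"].append(student_id)  # write 2
```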
The most interesting section of the report is where it tackles the problem of design for non-relational systems. The approach has two different phases: design questions and design strategies.
The questions you should ask yourself about your problem are:

Click to read more ...

Wednesday
Oct 28 2009

And the winner is: MySQL or Memcached or Tokyo Tyrant?

Matt, from the ever excellent MySQL Performance Blog, decided to run a test using a simple scenario drawn from his client experience in the gaming space. The scenario: read a row based on a primary key, update the row, write it to disk, and use the row to look up another row. Matt ran three different tests explained in a series of three articles: MySQL and MySQL + Memcached, Memcached Only, and Tokyo Tyrant.
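The access pattern under test is roughly the following loop; the store interface and field names here are hypothetical stand-ins, not Matt's actual benchmark harness:

```python
import random

def run_iteration(store, max_id):
    """One pass of the benchmarked access pattern (hypothetical
    store interface and field names, not Matt's actual harness)."""
    key = random.randint(1, max_id)
    row = store.read(key)                # read a row by primary key
    row["counter"] += 1                  # update the row
    store.write(key, row)                # write it back to disk/the store
    return store.read(row["other_id"])   # use it to look up another row
```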

The lovingly compiled details along with many cool graphs are in the articles, but in general the lessons learned are:

Click to read more ...

Monday
Oct 26 2009

Facebook's Memcached Multiget Hole: More machines != More Capacity 

When you are on the bleeding edge of scale like Facebook is, you run into some interesting problems. As of 2008 Facebook had over 800 memcached servers supplying over 28 terabytes of cache. With those staggering numbers it's a fair bet to think they've seen their share of Dr. House worthy memcached problems.

Jeff Rothschild, Vice President of Technology at Facebook, describes one such problem they've dubbed the Multiget Hole.

You fall into the multiget hole when your memcached servers are CPU bound. Adding more memcached servers seems like the obvious way to add capacity so more requests can be served, yet against all logic adding servers doesn't help you serve more requests. This puts you in a hole that adding more servers can't dig you out of. What's the treatment?
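Here's a rough back-of-the-envelope model of the hole; the numbers are invented for illustration, but the shape of the result is the point:

```python
# Illustrative model of the multiget hole (made-up numbers). A page
# view multigets the cache keys for all of a user's friends. With keys
# hashed uniformly across n servers, one multiget fans out to roughly
# min(keys, n) servers, each paying a fixed per-request CPU cost for
# parsing and network handling.

def per_server_request_rate(page_views, keys_per_view, num_servers):
    fanout = min(keys_per_view, num_servers)
    return page_views * fanout / num_servers

for n in (10, 20, 40):
    print(n, per_server_request_rate(10_000, keys_per_view=100, num_servers=n))
# Prints the same rate for every cluster size: while the cluster is
# smaller than the typical multiget, every request still touches every
# server, so adding servers divides the data but not the request load.
```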

Click to read more ...

Thursday
Oct 22 2009

Paper: The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM 

Stanford Info Lab is taking pains to document a direction we've been moving in for a while now: using RAM not just as a cache, but as the primary storage medium. Many quality products have been built on this model. Even if the vision isn't radical, the paper does produce a lot of data backing up the transition, which is in itself helpful. From the abstract:
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
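The core mechanism is easy to sketch: hash keys across the main memories of many servers. This toy model is just to fix the idea; the real RAMCloud design also has to solve durability, crash recovery, and low-latency networking:

```python
import hashlib

class RamCloudSketch:
    """Toy model only: the aggregate DRAM of many servers addressed
    as one key-value store. Not the actual RAMCloud implementation."""

    def __init__(self, num_servers):
        # Each "server" is a dict standing in for one machine's DRAM.
        self.nodes = [{} for _ in range(num_servers)]

    def _node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def write(self, key, value):
        self._node_for(key)[key] = value

    def read(self, key):
        # No disk in the read path: every read is DRAM speed plus one
        # network hop.
        return self._node_for(key)[key]

store = RamCloudSketch(num_servers=1000)
store.write("user:42", {"name": "Ada"})
print(store.read("user:42"))
```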

Related Articles

Monday
Oct 19 2009

Drupal's Scalability Makeover - You give up some control and you get back scalability

Drupal 7 is having a scalability makeover. Karoly Negyesi, Drupal Core Developer and Public Development Team Lead, explains the process in this video: Drupal 7 APIs, scalability mindset. Karoly states the general theme of the changes as: You give up some control and you get back scalability. An interesting comment on the politics of scalability?

Makeover may not be quite the right word though. A makeover implies a cosmetic change, looking better by changing the surface. Drupal's changes will go deeper than that, right to Drupal's core. It's a genuine change that will hopefully allow one of the Internet's most venerable Content Management Systems (CMSs) to compete with a constant stream of younger and sexier models.

Drupal is based on an older LAMP stack approach where PHP modules are scooped up and merged together each time a request is made to Drupal. Drupal's most intriguing idea is how it is built, expanded, and changed by weaving a single system out of individual components called modules. Built-in modules include comments, RSS, contact forms, forums, and Clean URLs. Add-on modules include things like CSE, which adds Google's Custom Search Engine, plus modules for AdSense, CAPTCHA, and Sitemaps. Drupal establishes AOP extension points that allow modules to work remarkably well together, creating a site that feels like one single site even though it has been constructed from dozens of modules hunted and gathered from all over the digital world.

The problem is that the PHP code can directly access the database and directly render to the UI; there is little required layering. Part of Drupal's amazing configurability and extensibility has been how easy it is for everything to work together by changing the database. But when there's no layering it's almost impossible to optimize the system. If you have 20 different modules, they can make 20 separate calls to the database when what we really want is one call. And because of the direct SQL access, when the number of writes increases there's no systematic way to distribute the writes across multiple servers. So as Drupal sites grow in the number of modules and the number of users, both performance and scalability tank.
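A hypothetical sketch of what that layering buys you (this is not Drupal's actual API, just an illustration of the principle):

```python
# Hypothetical sketch, not Drupal's real API.

# Unlayered: each module queries the database directly, so a page
# assembled from 20 modules costs 20 round trips, and nothing sits
# between the modules and the database to redirect writes.
def render_page_unlayered(modules, db):
    return [module.query(db) for module in modules]

# Layered: modules declare what they need; the layer batches the
# requests into one round trip and, because it owns all database
# access, can later route reads and writes to different servers.
def render_page_layered(modules, data_layer):
    needs = [module.declare_needs() for module in modules]
    results = data_layer.batch_fetch(needs)   # one round trip
    return [m.render(r) for m, r in zip(modules, results)]
```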

The younger models architect their systems differently. Sites like Google, Amazon, and Facebook are written in terms of an API and a framework, a service-based approach. Using a service-based approach, the web tier can be programmed in terms of services that are themselves scalable, so the entire system is scalable. When the API is skipped there are no leverage points that can be made to scale. It becomes a big ball of mud.

More layering and more APIs is exactly the direction Drupal is taking. Exactly how is Drupal changing?

Click to read more ...

Friday
Oct 16 2009

Paper: Scaling Online Social Networks without Pains

We saw in Why are Facebook, Digg, and Twitter so hard to scale? that scaling social networks is a lot harder than you might think. This paper, Scaling Online Social Networks without Pains, from a team at Telefonica Research in Spain, hopes to meet the challenges of status distribution, user-generated content distribution, and managing the social graph through a technique they call One-Hop Replication (OHR). OHR abstracts and delegates the complexity of scaling away from the social network application. The abstract:
Online Social Networks (OSN) face serious scalability challenges due to their rapid growth and popularity. To address this issue we present a novel approach to scale up OSN called One Hop Replication (OHR). Our system combines partitioning and replication in a middleware to transparently scale up a centralized OSN design, and therefore, avoid the OSN application to undergo the costly transition to a fully distributed system to meet its scalability needs. OHR exploits some of the structural characteristics of Social Networks: 1) most of the information is one-hop away, and 2) the topology of the network of connections among people displays a strong community structure. We evaluate our system and its potential benefits and overheads using data from real OSNs: Twitter and Orkut. We show that OHR has the potential to provide out-of-the-box transparent scalability while maintaining the replication overhead costs in check.
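As a toy illustration of the idea as described in the abstract (my reading, not the authors' code): partition users by community, then replicate each user's one-hop neighbors onto the user's home partition so reads stay local:

```python
# Toy illustration of One-Hop Replication; my reading of the
# abstract, not the authors' implementation.

def place_with_one_hop_replicas(partition_of, friends):
    """partition_of: user -> home partition (e.g. from community
    detection); friends: user -> set of friends (the one-hop set)."""
    contents = {}  # partition -> user records held there
    for user, home in partition_of.items():
        contents.setdefault(home, set()).add(user)  # primary copy
        for friend in friends[user]:
            # Replicate each friend's record onto the user's home
            # partition, so a user's whole one-hop neighborhood is
            # readable without crossing partitions.
            contents[home].add(friend)
    return contents

partition_of = {"a": 0, "b": 0, "c": 1}
friends = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(place_with_one_hop_replicas(partition_of, friends))
# Partition 0 holds a, b, and a replica of c; writes to c must now
# propagate to every partition holding a replica -- the overhead cost
# the paper keeps in check.
```
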
Thursday
Oct 15 2009

Hot Scalability Links for Oct 15 2009 

Update: Social networks in the database: using a graph database. Anders Nawroth puts graphs through their paces by representing, traversing, and performing other common social network operations using a graph database.

Update: Deployment with Capistrano by Charles Max Wood. Simple step-by-step for using Capistrano for deployment.

Log-structured file systems: There's one in every SSD by Valerie Aurora. SSDs have totally changed the performance characteristics of storage! Disks are dead! Long live flash!

An Engineer's Guide to Bandwidth by DGentry. It's a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions.

Analyzing air traffic performance with InfoBright and MonetDB by Vadim of the MySQL Performance Blog.

Scalable Delivery of Stream Query Result by Y. Zhou, A. Salehi, and K. Aberer. In this paper, we leverage Distributed Publish/Subscribe System (DPSS), a scalable data dissemination infrastructure, for efficient stream query result delivery.

Tuesday
Oct 13 2009

Why are Facebook, Digg, and Twitter so hard to scale?

Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder to scale than traditional websites. Why is that? Let's find out.
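One way to see the difficulty before diving in: a traditional page view reads one user's data, while a social page view aggregates fresh data from everyone the user is connected to. A minimal sketch, with hypothetical helper functions:

```python
# Illustrative sketch with hypothetical helpers: why social pages
# resist the usual partitioning tricks.

def render_profile(user, get_user_data):
    # Traditional page: one user's data, served from one partition.
    return get_user_data(user)

def render_news_feed(user, friends_of, get_user_data):
    # Social page: fresh data from hundreds of other users, who land
    # on different partitions no matter how you shard the user base.
    return [get_user_data(friend) for friend in friends_of(user)]
```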

Traditional websites are easier to scale than social networking sites for two reasons:

Click to read more ...

Monday
Oct 12 2009

High Performance at Massive Scale – Lessons learned at Facebook

Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is:

Facebook has grown into one of the largest sites on the Internet today serving over 200 billion pages per month. The nature of social data makes engineering a site for this level of scale a particularly challenging proposition. In this presentation, I will discuss the aspects of social data that present challenges for scalability and will describe the core architectural components and design principles that Facebook has used to address these challenges. In addition, I will discuss emerging technologies that offer new opportunities for building cost-effective high performance web architectures.

There's a lot of interesting material in this talk that we'll get into later, but I thought you might want a head start on learning how Facebook handles 30K+ machines, 300 million active users, 20 billion photos, and 25TB per day of logging data.

Click to read more ...