High Scalability -

Thursday

Oct142010

I, Cloud

Thursday, October 14, 2010 at 4:50PM

Do we need Three Laws of Cloud? Not yet. Neither should we be overly concerned regarding reports of cloud leading to the elimination of IT.

Click to read more ...

Lori MacVittie, F5 Networks |

5 Comments |

Permalink |

Print Article

Email Article

Tuesday

Oct122010

The CIO’s Problem: Cloud “Mess” or Cloud “Mash”

Tuesday, October 12, 2010 at 4:44PM

Loved those mainframe days – you only needed one, but then came along the AS400’s and soon you had ten – but wait, you needed client server and SOA, oh sh%# – now I have ten thousand servers and I need to consolidate server and datacenter operations!

Is Cloud Computing going to follow the same path?

Click to read more ...

Lori MacVittie, F5 Networks |

1 Comment |

Permalink |

Print Article

Email Article

Friday

Oct082010

4 Scalability Themes from Surgecon

Friday, October 8, 2010 at 8:42AM

Robert Haas in his SURGE Recap of the Surge conference, reflected a bit, and came up with an interesting checklist of general themes from what he was seeing. I'm directly quoting his post, so please see the post for a full discussion. He uses this framework to think about the larger picture and where PostgreSQL stands in its progression.

Make use of the academic literature. Inventing your own way to do something is fine, but at least consider the possibility that someone smarter than you has thought about this problem before.
Failures are inevitable, so plan for them. Try to minimize the possibility of cascading failures, and plan in advance how you can operate in degraded mode if disaster (or the Slashdot effect) strikes.
Disk technology matters. Drive firmware bugs are common and nightmarish, and you can expect very limited help from the manufacturer, especially if the drive is billed as consumer-grade rather than enterprise-grade. SSDs can save you a lot of money, both because a given number of dollars buys more IOs-per-second, and because electricity isn't free.
Large data sets require horizontal scalability. In the era of 1TB drives, "large" doesn't mean quite what it used to, but even though the amount of data you can manage with one machine is growing all the time, the amount of data people want to manage is growing even faster.

HighScalability Team |

Post a Comment |

Permalink |

Print Article

Email Article

Strategy

Thursday

Oct072010

Hot Scalability Links For Oct 8, 2010

Thursday, October 7, 2010 at 2:46PM

So what happened at the Surge 2010 Conference? A lot...
- Cosimo Streppone with a nicely detailed day by day summary.
- Conference organizer Theo Schlossnagle says: Surge. Year one. Done. Awesome.
- Robert Haas with SURGE Recap
- From SEOmoz we have their SurgeCon Review – our thoughts on a fabulous web scalability conference
- Scaling myYearbook.com
- Availability, The Cloud and Everything by Joe Williams of Cloudant.
- There's Plenty of Room at the Bottom. An Invitation to Explore with Network Flows by Benjamin Black.
Quotable quotes for 200 Alex:
- postwait: That which cannot be measured cannot be scaled.
- gsylvest: Cameron Purdy:Instead of performance, think about scalability. Michael Nygaard:Instead of scalability, think of response time distributions
- jkalucki: Sometimes this job is a bit too much like high-energy physics: Blast the invisible with a beam of hell fury, then decode the backscatter.
- DrHayt: Shard for availability, not scalability. Do what is necessary to make sure that your shards to not share the same failure domain.
Click to read more ...

HighScalability Team |

2 Comments |

Permalink |

Print Article

Email Article

hot links

Tuesday

Oct052010

Who's Hiring?

Box.net is Looking for a Software Engineer.
Wiredrive is looking for an Operations Engineer
deviantART is Hiring a Web Application Developer

Cool Products and Services

Download this free whitepaper from Joyent: Performance and Scale In Cloud Computing
CloudSigma. Instantly scalable European cloud servers.
ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
www.site24x7.com : Monitor End User Experience from a global monitoring network.

Click to read more ...

HighScalability Team |

Post a Comment |

Permalink |

Print Article

Email Article

Paper: An Analysis of Linux Scalability to Many Cores

Monday, October 4, 2010 at 6:19AM

An Analysis of Linux Scalability to Many Cores, by a number of MIT researchers, is a refreshingly practical paper on what it takes to scale Linux and common applications like Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce to run on 48 core systems. A very timely paper given moderately massive multicore systems are reportedly the near future of computing.

This paper must have taken a lot of work. They both tracked down bottlenecks in a number of applications and the Linux kernel and they also tried to fix them. Modestly speaking the authors said they made "modest" changes to the kernel and applications, but there's nothing modest about what they did. It's excellent work.

After the next bit, which is the abstract, there is a list of the problems they found and how they fixed them.

Click to read more ...

HighScalability Team |

4 Comments |

Permalink |

Print Article

Email Article

Paper

Friday

Oct012010

Hot Scalability Links For Oct 1, 2010

Friday, October 1, 2010 at 8:22AM

Zynga Moves 1 Petabyte Of Data Daily; Adds 1,000 Servers A Week by Leena Rao. 10 percent of the world’s internet population (approximately 215 million monthly users) has played a Zynga game.
Jeremy Cole writes an incredible article explaining why MySQL will swap on a NUMA system when there's apparently a lot of free RAM. The MySQL “swap insanity” problem and the effects of the NUMA architecture. These boxes are now essentially appliances, maybe it's time for a special built OS that doesn't try to act like a timesharing system sharing scarce resources?
Linux Kernel Tuning for C500k by Jared "Lucky" Kuolt. Instead of running 100 servers with 10,000 connections each, we’d rather run 2 servers with 500,000 connections apiece.

Click to read more ...

HighScalability Team |

Post a Comment |

Permalink |

Print Article

Email Article

Friday

Oct012010

Google Paper: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Friday, October 1, 2010 at 7:47AM

This paper, Large-scale Incremental Processing Using Distributed Transactions and Notifications by Daniel Peng and Frank Dabek, is Google's much anticipated description of Percolator, their new real-time indexing system.

The abstract:

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efﬁciency.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

HighScalability Team |

Post a Comment |

Permalink |

Print Article

Email Article

Paper,

google

Thursday

Sep302010

More Troubles with Caching

Thursday, September 30, 2010 at 2:52PM

As a tasty pairing with Facebook And Site Failures Caused By Complex, Weakly Interacting, Layered Systems, is another excellent tale of caching gone wrong by Peter Zaitsev, in an exciting twin billing: Cache Miss Storm and More on dangers of the caches. This is fascinating case where the cause turned out to be software upgrade that ran long because it had to be rolled back. During the long recovery time many of the cache entries timed out. When the database came back, slam, all the clients queried the database to repopulate the cache and bad things happened to the database. The solution was equally interesting:

Click to read more ...

HighScalability Team |

6 Comments |

Permalink |

Print Article

Email Article

Strategy,

cache

Thursday

Sep302010

Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems

Thursday, September 30, 2010 at 6:37AM

Facebook has been so reliable that when a site outage does occur it's a definite learning opportunity. Fortunately for us we can learn something because in More Details on Today's Outage, Facebook's Robert Johnson gave a pretty candid explanation of what caused a rare 2.5 hour period of down time for Facebook. It wasn't a simple problem. The root causes were feedback loops and transient spikes caused ultimately by the complexity of weakly interacting layers in modern systems. You know, the kind everyone is building these days. Problems like this are notoriously hard to fix and finding a real solution may send Facebook back to the whiteboard. There's a technical debt that must be paid.

The outline and my interpretation (reading between the lines) of what happened is:

Click to read more ...

HighScalability Team |

17 Comments |

Permalink |

admin,

postmortem