High Scalability -

Entries by HighScalability Team (1576)

Thursday

Oct 21, 2010

Machine VM + Cloud API - Rewriting the Cloud from Scratch

Thursday, October 21, 2010 at 8:26AM

Write a little "Hello World" program these days and it runs inside a bewildering Russian doll of nested environments, each layer adding its own special performance and complexity tax. First, a language executes in its own environment of data structure libraries, memory management, and so on. That, more often than not, will run inside a language VM like the JVM, CLR, or V8. The language VM will in turn run inside a process that runs inside an OS. An application will run in one or more threads inside a process. And the whole thing will run inside a machine-sharing VM layer like Xen. And across all of that are frameworks for monitoring, elasticity, storage, and so on. That's a lot of overhead for such a little program.

What if we could remove all these taxes and run directly on the new bare metal, which some consider to be a combination of Machine VM + Cloud API? That's exactly what a system called Mirage, described in the paper Turning down the LAMP: Software Specialisation for the Cloud, sets out to do by treating the cloud virtual hardware as a compiler target, and converting high-level language source code directly into kernels that run on it.

Click to read more ...

HighScalability Team |

9 Comments |

Permalink |

Print Article

Email Article

Paper,

amazon

Tuesday

Oct 19, 2010

Who's Hiring?

Playfish is hiring an Engineering Director, Marketing & CRM and a Sr. Systems Operation Engineer.
Electronic Arts Mobile is looking for Server Engineers.
Tagged is hiring Engineers, Product Managers, Game Designers, and more.
Undertone is looking for a Software Engineer and a QA Engineer.
Box.net is Looking for a Software Engineer.
Wiredrive is looking for an Operations Engineer
deviantART is Hiring a Web Application Developer

Fun and Informative Events

Membase Meetups Coming to Major US Cities. The first of these technical meetups is on October 28 at Zynga’s San Francisco offices.

Cool Products and Services

Download this free whitepaper from Joyent: Performance and Scale In Cloud Computing
CloudSigma. Instantly scalable European cloud servers.
ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
www.site24x7.com : Monitor End User Experience from a global monitoring network.

Click to read more ...


Troubles with Sharding - What can we learn from the Foursquare Incident?

Friday, October 15, 2010 at 8:15AM

For everything given, something seems to be taken. Caching is a great scalability solution, but caching also comes with problems. Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, contributed its own post-mortem of what went wrong.

Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes?

Click to read more ...


Strategy,

postmortem,

sharding

Friday

Oct 8, 2010

4 Scalability Themes from Surgecon

Friday, October 8, 2010 at 8:42AM

In his SURGE Recap of the Surge conference, Robert Haas reflected a bit and came up with an interesting checklist of general themes from what he was seeing. I'm directly quoting his post, so please see the post for a full discussion. He uses this framework to think about the larger picture and where PostgreSQL stands in its progression.

Make use of the academic literature. Inventing your own way to do something is fine, but at least consider the possibility that someone smarter than you has thought about this problem before.
Failures are inevitable, so plan for them. Try to minimize the possibility of cascading failures, and plan in advance how you can operate in degraded mode if disaster (or the Slashdot effect) strikes.
Disk technology matters. Drive firmware bugs are common and nightmarish, and you can expect very limited help from the manufacturer, especially if the drive is billed as consumer-grade rather than enterprise-grade. SSDs can save you a lot of money, both because a given number of dollars buys more IOs-per-second, and because electricity isn't free.
Large data sets require horizontal scalability. In the era of 1TB drives, "large" doesn't mean quite what it used to, but even though the amount of data you can manage with one machine is growing all the time, the amount of data people want to manage is growing even faster.


Strategy

Thursday

Oct 7, 2010

Hot Scalability Links For Oct 8, 2010

Thursday, October 7, 2010 at 2:46PM

So what happened at the Surge 2010 Conference? A lot...
- Cosimo Streppone with a nicely detailed day by day summary.
- Conference organizer Theo Schlossnagle says: Surge. Year one. Done. Awesome.
- Robert Haas with SURGE Recap
- From SEOmoz we have their SurgeCon Review – our thoughts on a fabulous web scalability conference
- Scaling myYearbook.com
- Availability, The Cloud and Everything by Joe Williams of Cloudant.
- There's Plenty of Room at the Bottom. An Invitation to Explore with Network Flows by Benjamin Black.
Quotable quotes for 200 Alex:
- postwait: That which cannot be measured cannot be scaled.
- gsylvest: Cameron Purdy: Instead of performance, think about scalability. Michael Nygaard: Instead of scalability, think of response time distributions.
- jkalucki: Sometimes this job is a bit too much like high-energy physics: Blast the invisible with a beam of hell fury, then decode the backscatter.
- DrHayt: Shard for availability, not scalability. Do what is necessary to make sure that your shards do not share the same failure domain.
Click to read more ...


hot links

Tuesday

Oct 5, 2010

Who's Hiring?

Box.net is Looking for a Software Engineer.
Wiredrive is looking for an Operations Engineer
deviantART is Hiring a Web Application Developer

Cool Products and Services

Download this free whitepaper from Joyent: Performance and Scale In Cloud Computing
CloudSigma. Instantly scalable European cloud servers.
ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
www.site24x7.com : Monitor End User Experience from a global monitoring network.

Click to read more ...


Paper: An Analysis of Linux Scalability to Many Cores

Monday, October 4, 2010 at 6:19AM

An Analysis of Linux Scalability to Many Cores, by a number of MIT researchers, is a refreshingly practical paper on what it takes to scale Linux and common applications like Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce to run on 48-core systems. A very timely paper, given that moderately massive multicore systems are reportedly the near future of computing.

This paper must have taken a lot of work. The authors not only tracked down bottlenecks in a number of applications and in the Linux kernel, they also tried to fix them. They say they made "modest" changes to the kernel and applications, but there's nothing modest about what they did. It's excellent work.

After the next bit, which is the abstract, there is a list of the problems they found and how they fixed them.

Click to read more ...


Paper

Friday

Oct 1, 2010

Hot Scalability Links For Oct 1, 2010

Friday, October 1, 2010 at 8:22AM

Zynga Moves 1 Petabyte Of Data Daily; Adds 1,000 Servers A Week by Leena Rao. 10 percent of the world’s internet population (approximately 215 million monthly users) has played a Zynga game.
Jeremy Cole writes an incredible article explaining why MySQL will swap on a NUMA system when there's apparently a lot of free RAM: The MySQL “swap insanity” problem and the effects of the NUMA architecture. These boxes are now essentially appliances. Maybe it's time for a specially built OS that doesn't try to act like a timesharing system sharing scarce resources?
Linux Kernel Tuning for C500k by Jared "Lucky" Kuolt. Instead of running 100 servers with 10,000 connections each, we’d rather run 2 servers with 500,000 connections apiece.

Click to read more ...


Friday

Oct 1, 2010

Google Paper: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Friday, October 1, 2010 at 7:47AM

This paper, Large-scale Incremental Processing Using Distributed Transactions and Notifications by Daniel Peng and Frank Dabek, is Google's much anticipated description of Percolator, their new real-time indexing system.

The abstract:

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.
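To make the batch versus incremental distinction in the abstract concrete, here is a toy single-process sketch in Python. It is not Percolator and does not reflect its API: the names IncrementalIndex, upsert, and build_batch are made up for illustration, and the sketch leaves out the distributed transactions and notifications named in the paper's title. It only shows that applying one small mutation at a time can reach the same index state as rebuilding from the whole repository.

```python
from collections import defaultdict


def build_batch(docs):
    """Batch style: rebuild the whole inverted index from every document."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings[term].add(doc_id)
    return dict(postings)


class IncrementalIndex:
    """Incremental style: touch only the terms that one document changes."""

    def __init__(self):
        self._docs = {}                    # doc_id -> set of terms indexed for it
        self._postings = defaultdict(set)  # term -> set of doc_ids

    def upsert(self, doc_id, text):
        new_terms = set(text.lower().split())
        old_terms = self._docs.get(doc_id, set())
        # Remove the document from terms it no longer contains.
        for term in old_terms - new_terms:
            self._postings[term].discard(doc_id)
            if not self._postings[term]:
                del self._postings[term]
        # Add the document to terms it newly contains.
        for term in new_terms - old_terms:
            self._postings[term].add(doc_id)
        self._docs[doc_id] = new_terms

    def search(self, term):
        return set(self._postings.get(term.lower(), set()))

    def postings(self):
        return {term: set(ids) for term, ids in self._postings.items()}
```

The cost of an upsert depends on the size of the changed document, while build_batch always scans the full repository, which is the trade-off the abstract describes in much larger terms.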


Paper,

google

Thursday

Sep 30, 2010

More Troubles with Caching

Thursday, September 30, 2010 at 2:52PM

As a tasty pairing with Facebook And Site Failures Caused By Complex, Weakly Interacting, Layered Systems, here is another excellent tale of caching gone wrong by Peter Zaitsev, in an exciting twin billing: Cache Miss Storm and More on dangers of the caches. This is a fascinating case where the cause turned out to be a software upgrade that ran long because it had to be rolled back. During the long recovery time many of the cache entries timed out. When the database came back, slam, all the clients queried the database to repopulate the cache and bad things happened to the database. The solution was equally interesting:

Click to read more ...
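The fix from Peter Zaitsev's posts is in the full article and is not reproduced here. As a generic illustration of the miss-storm mechanism described above, the following Python sketch shows one common mitigation, per-key request coalescing, in which only one caller reloads an expired key while the others wait for its result. The class name CoalescingCache and the loader parameter are hypothetical, and this is an in-process toy, not the approach from the posts.

```python
import threading
import time


class CoalescingCache:
    """Toy cache: on a miss, only one caller per key goes to the backend."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader            # hypothetical function: key -> value (the "database")
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (value, expires_at)
        self._key_locks = {}             # key -> lock serializing reloads of that key
        self._guard = threading.Lock()   # protects the two dicts above

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        with self._guard:
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another thread may have refilled the entry while we waited.
            with self._guard:
                entry = self._fresh(key)
                if entry is not None:
                    return entry[0]
            value = self._loader(key)    # only one thread per key reaches this line at a time
            with self._guard:
                self._entries[key] = (value, time.monotonic() + self._ttl)
            return value
```

With this guard, many simultaneous misses on the same key produce a single backend query instead of one query per client. It does nothing for a storm spread across many different keys, which needs other measures such as staggered expiry or rate limiting.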


Strategy,

cache