Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2

Lessons learned from OpenX's large-scale deployment to Amazon EC2:
I posted a note the other day about collectl and its ganglia interface, but perhaps I wasn't provocative enough to get any responses, so let me ask it a different way: how do people monitor their clusters, and more importantly, how often? Do you monitor to get a general sense of what the system is doing, OR do you monitor with the expectation that when something goes wrong you'll have enough data to diagnose the problem? Or both? I suspect both...

Many cluster-based monitoring tools have a data collection daemon running on each target node that periodically sends data to a central management station. That machine typically writes the data to a database from which it can extract historical plots. Some even put up graphics in real time. From my experience working with large clusters - and I'm talking many hundreds or even thousands of nodes - most have to limit both the amount of data they manage centrally and the frequency at which they collect it, otherwise they'll overwhelm the management station, because most DBs can't write hundreds of counters many times per minute from thousands of nodes.

As a related example, how many of you run sar at the default monitoring interval of 10 minutes? Do you really think you're getting useful information? What happens if you have a 2-minute burst of 100% network load and you're idle the other 8 minutes? Sar will happily tell you the network load was 20% and you'll never know your network is tanking. The point of all this is that I do think there's a place for central monitoring, though I'm personally not a fan because of the inaccuracy of infrequent data samples, but I also appreciate that some data is better than none, as long as you realize the inherent accuracy problems.

And that's where collectl comes in, and my previous comment about ganglia. When I wrote collectl my overarching design goal, from which I haven't wavered, was to provide highly accurate local data with minimal overhead, so you can take samples in the 1-10 second range without fear of impacting the rest of the system. You can literally sample just about everything going on every 10 seconds and use <0.1% of the CPU. If you're willing to give up a few more tenths of a percent you can even monitor processes and slab activity, though you should only sample those at a 60-second frequency because it IS expensive to monitor them.

However, I also realize this doesn't do any good if you have thousands of machines you want to watch, and that's where the socket interface comes in, over which collectl can send data to a central manager at that same frequency OR, if you prefer, send its remote data at a different rate, giving you the best of both worlds. Collectl can provide its data to a central management station while at the same time logging locally for accuracy, which lets you do a deep dive into the data if a problem arises for which there is not enough data stored centrally.

My point about the ganglia interface was a response to the fact that a lot of people running large (as well as smaller) clusters do use ganglia but, like most central monitoring setups, have to give up the accuracy of finer-grained data. I was just wondering whether anyone reading this forum uses ganglia and might be interested in trying out the collectl interface to it. -mark
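To make the sar arithmetic above concrete, here is a minimal, self-contained Java sketch (not sar or collectl code) that reproduces mark's example: a 2-minute burst at 100% inside a 10-minute window averages out to 20%, while 10-second samples still expose the burst.

```java
// AveragingDemo.java - illustrates how coarse sampling intervals hide bursts.
// Standalone sketch; the 2-minute burst / 10-minute window numbers come from the post above.
public class AveragingDemo {
    public static void main(String[] args) {
        int seconds = 600;                        // one 10-minute sar window
        double[] utilization = new double[seconds];

        // 2-minute burst at 100% network utilization, idle the rest of the time.
        for (int s = 0; s < 120; s++) utilization[s] = 100.0;

        // What a single 10-minute average reports: the burst is invisible.
        System.out.printf("10-minute average: %.1f%%%n", average(utilization, 0, seconds));

        // What 10-second samples report: twelve samples sit at 100%.
        double peak = 0;
        for (int start = 0; start < seconds; start += 10) {
            peak = Math.max(peak, average(utilization, start, start + 10));
        }
        System.out.printf("Worst 10-second sample: %.1f%%%n", peak);
    }

    private static double average(double[] values, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) sum += values[i];
        return sum / (to - from);
    }
}
```

Running it prints a 20.0% window average against a 100.0% worst 10-second sample, which is exactly the gap the post is warning about.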
A photographic picture, to me, is a window: an address to that specific moment. What do you think about that?
This is an intriguing quote from Bill Venners, from an interview with Twitter's Alex Payne about Twitter's heretical switch from a pure Ruby on Rails stack to Ruby on Rails on the front end and JVM/Scala on the back end:
So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things. I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to trade off some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers. I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages are faster than Ruby, and Scala is too. And people can be quite productive in these other languages too, including Scala.

I feel Alex's Max Payne. You might wonder why the geekosphere cares so passionately which technology stack Twitter uses. Well, it's Twitter and it's Ruby on Rails. That's like the Lindsay Lohan and Samantha Ronson of tech buzz. It creates its own self-sustaining posting reaction. Boom!

It took some giant cojones to switch from a well-defended platform like Ruby on Rails to an obscure language like Scala. Few people would have been brave enough to pull the trigger on that decision. Twitter didn't take this large leap out of ignorance or incompetence. Twitter's Steve Jenson said they "spent several weeks going over our options, running extensive load tests, and presented our findings to the team at each stage. We did our due diligence." They did the work and came to conclusions valid for their situation. They have to follow their own bliss. They aren't telling you to use Scala. They aren't telling you not to use Ruby. Have at it. But they have chosen the path less traveled and seem happy with the direction they are heading. If you aren't happy with their decision, then that's a you problem, not a them problem.

This points out for me the evolving nature of the two-tier web: a client tier and a back-end tier accessed only via an API. That back-end tier can be implemented in anything at all. Twitter likes the JVM for its undeniable performance, reliability, and scalability. They may like Scala because it has a lot of cool features, is concise, has static typing, performs well, and is a pleasure to develop in. When people jumped with a happy little grin on their faces to Ruby, they loved a lot of things about Ruby that helped them make that decision to switch. Maybe Scala is cool, effective, and fun to use too?

So instead of worrying how the homeboys have ever so indirectly dissed Ruby, maybe it's worth taking a look at Scala to see if it's actually any good. As a start, try Bill Venners on the rise of Scala. It's a great overview of Scala and why it's a worthy next evolution. Then maybe take a peek at First Steps to Scala. And if you want to look at some real-life code, try Kestrel - a tiny queue system based on Starling. Knowing only the vaguest bits about Scala before my own little exploration, I came out impressed and interested.
I like a lot of things Bill had to say: Scala means scalable language, as in it scales to different sized tasks and domains; Scala's static typing is valuable, especially in a team project; Scala has the feel of a scripting language, but can also work as a systems language; Scala is an artful blend of a fully object-oriented and fully functional language; Scala supports imperative programming, but with a functional bent; Scala can express more in types than Java, so you get more rigorous type checking; Scala is concise, so you can say more with less; writing Scala libraries can feel like creating internal DSLs; Scala helps programmer productivity like Ruby and Python. Sounds like Scala is worth a deeper look to me.

Can't we all just get along? Well, that's not how this post started, at least. It started with Bill's refreshingly anti-establishment statement: if you have enough traffic, at some point the cost of servers outweighs the cost of programmers. This is an idea we don't hear much anymore in this era of abundance. Servers cost real cash though, especially for bootstrappers and the truly humongous. So if you can cut your cash requirements by putting in a little programmer elbow grease, that makes a lot of sense to me. Performance still matters.
Want your apps to run faster? Here’s what not to do. By Bart Smaalders, Sun Microsystems. Performance anti-patterns:
- Fixing Performance at the End of the Project
- Measuring and Comparing the Wrong Things
- Algorithmic Antipathy
- Reusing Software
- Iterating Because That’s What Computers Do Well
- Premature Optimization
- Focusing on What You Can See Rather Than on the Problem
- Software Layering
- Excessive Numbers of Threads
- Asymmetric Hardware Utilization
- Not Optimizing for the Common Case
- Needless Swapping of Cache Lines Between CPUs
For more detail, go there.
Update 4: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported, with more to follow).
Update 3: Scaling Digg and Other Web Applications.
Update 2: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog, a very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades.
Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?
Site: http://digg.com
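IDDB itself is Digg's internal system, so the snippet below is only a rough Java sketch of the general idea it describes: hashing a key to decide which storage server holds that piece of an index or table. The class, method, and server names are hypothetical, not Digg's actual API.

```java
import java.util.List;

// Minimal sketch of key-based partitioning across several storage servers,
// in the spirit of partitioning indexes and tables across MySQL / MemcacheDB
// backends. Names are invented for illustration.
public class PartitionedIndex {
    private final List<String> storageServers; // e.g. JDBC URLs or memcache hosts

    public PartitionedIndex(List<String> storageServers) {
        this.storageServers = storageServers;
    }

    /** Pick the storage server responsible for a given key. */
    public String serverFor(String key) {
        int bucket = Math.floorMod(key.hashCode(), storageServers.size());
        return storageServers.get(bucket);
    }

    public static void main(String[] args) {
        PartitionedIndex index = new PartitionedIndex(
                List.of("db-shard-01", "db-shard-02", "db-shard-03"));
        System.out.println("user:42 lives on " + index.serverFor("user:42"));
        System.out.println("story:9001 lives on " + index.serverFor("story:9001"));
    }
}
```

A real system also needs a plan for adding servers without remapping every key (a lookup table or consistent hashing); the plain modulo hashing used here reshuffles almost everything when the server count changes.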
It's been a while since I've said anything about collectl, and I wanted to let this group know I'm currently working on an interface to ganglia, since I've seen a variety of posts ranging from how much data to log and where to log it, to which tools/mechanisms to use for logging. From my perspective there are essentially two camps on the monitoring front. One says to have distributed agents all sending their data to a central point, but don't send too much or too often. The other camp (which is the one I'm in) says do it all locally with a highly efficient data collector, because you need a lot of data (I also read a post in here about logging everything) and you can't possibly monitor hundreds or thousands of nodes remotely at the granularity necessary to get anything meaningful.

Enter collectl and its evolving interface for ganglia. This will allow you to log lots of detailed data on local nodes at the usual 10-second interval (or more frequently if you prefer) at about 0.1% system overhead, while sending a subset at a lower rate to the ganglia gmonds. This would give you the best of both worlds, but I don't know if people are too married to the centralized concept to try something different. I don't know how many people who follow this forum have actually tried it - I know at least a few of you have - but to learn more just go to http://collectl.sourceforge.net/ and look at some of the documentation, or just download the rpm and type 'collectl'. -mark
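Here is a rough Java sketch of the two-rate idea mark describes: sample and log locally every 10 seconds for accuracy, while forwarding only a coarser feed to the central station. It is illustrative only; it is not collectl's implementation or ganglia's gmond protocol.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Sketch of dual-rate collection: detailed local logging every 10 seconds,
// with only every 6th sample (roughly once a minute) forwarded centrally.
public class DualRateCollector {
    private static final int LOCAL_INTERVAL_SECONDS = 10;
    private static final int REMOTE_EVERY_N_SAMPLES = 6;

    public static void main(String[] args) throws IOException, InterruptedException {
        try (PrintWriter localLog = new PrintWriter(new FileWriter("collector.log", true))) {
            long sampleCount = 0;
            while (true) {
                double load = sampleCpuLoad();
                localLog.printf("%d cpu=%.2f%n", System.currentTimeMillis(), load);
                localLog.flush();

                if (sampleCount % REMOTE_EVERY_N_SAMPLES == 0) {
                    sendToCentralStation(load);   // coarser feed for the central view
                }
                sampleCount++;
                Thread.sleep(LOCAL_INTERVAL_SECONDS * 1000L);
            }
        }
    }

    private static double sampleCpuLoad() {
        // Stand-in for reading /proc/stat or similar; may return -1 on some platforms.
        return java.lang.management.ManagementFactory
                .getOperatingSystemMXBean().getSystemLoadAverage();
    }

    private static void sendToCentralStation(double load) {
        // Stand-in for shipping the sample to a central manager; here we just print it.
        System.out.println("forwarding cpu load " + load + " to central station");
    }
}
```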
Art of scalability - a series in my blog about scalability principles and guidelines. To read the whole series:
- Art of scalability (1) - Scalability principles
- Art of scalability (2) - Scalability guidelines part 1
- Art of scalability (3) - Scalability guidelines part 2
- Art of scalability (4) - Scalability guidelines part 3
eBay [1]
Started in 1995 under the initial name AuctionWeb (V1):
- very simple architecture
- based on Perl
- no database; they used plain files for data persistence
Because of rapid growth they needed to improve their architecture, and so V2 (clever name) was born:
- replaced Perl with C/C++
- started using a database in a master-slave configuration
- C++ back-end
- XSLT front-end
Any request leads to an XML file being created in C++, and the XSLT processor then transforms it into HTML.
*pretty sophisticated architecture for the 90s; XSLT was cutting-edge back then*
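As a concrete illustration of that request flow, here is a small JAXP-based Java sketch of the "generate XML, render HTML through XSLT" step. eBay's V2 generated the XML in C++ with its own stylesheets; the item document and stylesheet below are invented purely for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Minimal sketch: per-request XML is turned into HTML by an XSLT stylesheet.
public class XsltRenderDemo {
    public static void main(String[] args) throws Exception {
        String itemXml =
                "<item><title>Vintage radio</title><price currency='USD'>12.50</price></item>";

        String stylesheet =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
                "  <xsl:template match='/item'>" +
                "    <html><body>" +
                "      <h1><xsl:value-of select='title'/></h1>" +
                "      <p>Price: <xsl:value-of select='price'/><xsl:text> </xsl:text>" +
                "         <xsl:value-of select='price/@currency'/></p>" +
                "    </body></html>" +
                "  </xsl:template>" +
                "</xsl:stylesheet>";

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));

        StringWriter html = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(itemXml)),
                              new StreamResult(html));
        System.out.println(html);   // the HTML the browser would receive
    }
}
```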
That held up pretty well for a while, but in the late 90s eBay experienced exponential growth.
They started having some trouble with outages and needed improvements, so V3 was developed:
- based on Java
- search engine still used C++
- proof that relational databases can scale (aggressive caching)
- developed a messaging layer for making a lot of asynchronous calls; they actually ended up being sued because of the delay before images appeared on the site after an item was posted :-) (a sketch of the idea follows after this list)
- although they switched to Java, the basic principle of generating XML files for each request was still used.
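To illustrate the asynchronous-messaging bullet above, here is a tiny in-process Java sketch: posting an item returns immediately while image processing is handled later by a queue consumer. eBay's real messaging layer was a distributed system; the queue and class names here are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a messaging layer decoupling item posting from image processing.
public class AsyncImagePipeline {
    private final BlockingQueue<String> imageJobs = new LinkedBlockingQueue<>();

    public void postItem(String itemId) {
        // The listing itself is visible immediately...
        System.out.println("item " + itemId + " posted");
        // ...while image resizing/publishing happens later, off the request path.
        imageJobs.add(itemId);
    }

    public void startImageWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String itemId = imageJobs.take();   // blocks until a job arrives
                    System.out.println("processing images for item " + itemId);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncImagePipeline pipeline = new AsyncImagePipeline();
        pipeline.startImageWorker();
        pipeline.postItem("1234");
        Thread.sleep(500);   // give the worker a moment before the demo exits
    }
}
```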
Combining the need for a multilingual website with the boom of AJAX technologies and Flash applications, they started to doubt their XSLT system and moved on to what became, in 2006, V4 of eBay:
- removed everything that could be replaced with Java; eBay loves Java
Everything is Java (Code):
- Image - Java class
- Link - Java class
- JavaScript - Java classes
- Content - Java classes
=> lots of code to write; they're using Eclipse for development.
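To give a flavor of what "everything is a Java class" means in practice, here is a small hypothetical sketch where links, images, and content blocks are Java objects that render their own markup. These classes are invented for illustration and are not eBay's actual framework.

```java
// Hypothetical illustration of the "everything is a Java class" style:
// page elements are Java objects that know how to render their own HTML.
public class JavaEverywhereDemo {

    interface PageElement {
        String render();
    }

    static class Link implements PageElement {
        private final String href, text;
        Link(String href, String text) { this.href = href; this.text = text; }
        public String render() { return "<a href=\"" + href + "\">" + text + "</a>"; }
    }

    static class Image implements PageElement {
        private final String src, alt;
        Image(String src, String alt) { this.src = src; this.alt = alt; }
        public String render() { return "<img src=\"" + src + "\" alt=\"" + alt + "\"/>"; }
    }

    static class Content implements PageElement {
        private final PageElement[] children;
        Content(PageElement... children) { this.children = children; }
        public String render() {
            StringBuilder html = new StringBuilder("<div>");
            for (PageElement child : children) html.append(child.render());
            return html.append("</div>").toString();
        }
    }

    public static void main(String[] args) {
        PageElement page = new Content(
                new Image("/img/logo.gif", "eBay"),
                new Link("/item/1234", "Vintage radio"));
        System.out.println(page.render());
    }
}
```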
For more details check:
Eclipse at Ebay
Tailoring Eclipse to the eBay architecture
[1] Images and ideas/info are from the links above.
Ladar Levison of Lavabit has written an incredible article on how they took a centralized off-the-shelf email server that could handle only a few thousand users and built their own custom distributed infrastructure for handling hundreds of thousands of email users. Lavabit processes 70 gigabytes of data per day, is made up of 26 servers, hosts 260,000 email addresses, and processes 600,000 emails a day. That's a lot of email.
Lavabit's mission has a little edge to it too:
Lavabit was founded as a direct reaction to the larger free e-mail services available. We felt it was possible to create an e-mail service that was fast, reliable, feature rich and didn't achieve profitability by prostituting its user base to marketers.
What I really like about this article is that Lavabit has some challenging elements in dealing with different email protocols while being able to scale to a lot of users. There's more going on than just trying to scale out a database. Many products contain complicated bits like this, so it's interesting to see how Ladar handled them. There are lots of useful details that will help anyone build their own system. Putting in this extra work is what Ladar thinks makes Lavabit different:
One of the ways to gain an advantage over your competition is to invest the time and money needed to build systems that are better than what is easily available to your competition. It is the custom platform we developed that has allowed us to thrive while many other free email companies either stopped offering their service for free, or shut down altogether.
Since Ladar was so thorough I saved the article as a separate HTML file. Please select the visit link to read the entire article. I'd like to thank Ladar again for taking the time and making the effort to document their architecture for the community at large to learn from.