Stack Overflow Architecture
Update 2: Stack Overflow Architecture Update - Now At 95 Million Page Views A Month
Update: Startup – ASP.NET MVC, Cloud Scale & Deployment shows an interesting alternative approach for a Windows stack using ServerPath/GoGrid for a dedicated database machine, elastic VMs for the front end, and a free load balancer.
Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky. In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good.
I fell in deep like with Stack Overflow for purely selfish reasons: it helped me solve a few difficult problems that were jabbing my eyes out with pain. I also appreciate their no-apologies, anthropologically based design philosophy. Use design to engineer in the behaviours you want to encourage and minimize the responses you want to discourage. It's the conscious awareness of the mechanisms that creates such a satisfying synergy.
What is key about the Stack Overflow story for me is the strong case they make for scale up as a viable solution for a certain, potentially large, class of problems. The publicity these days all goes to scaling out with NoSQL databases.
If you need to Google scale then you really have no choice but to go the NoSQL direction. But Stack Overflow is not Google and neither are most sites. When thinking about your design options keep Stack Overflow in mind. In this era of multi-core, large RAM machines and advances in parallel programming techniques, scale up is still a viable strategy and shouldn't be tossed aside just because it's not cool anymore. Maybe someday we'll have the best of both worlds, but for now there's a big painful choice to be made and that choice decides your fate.
Joel boasts that with 1/10 the hardware they get performance comparable to similarly sized sites. He wonders if these other sites have good programmers. Let's see how they did it and you be the judge.
Site: http://stackoverflow.com
The Stats
Platform
- 2 x Lenovo ThinkServer RS110 1U
  - 4 cores, 2.83 GHz, 12 MB L2 cache
  - 500 GB datacenter hard drives, mirrored
  - 8 GB RAM
  - 500 GB RAID 1 mirror array
- 1 x Lenovo ThinkServer RD120 2U
  - 8 cores, 2.5 GHz, 24 MB L2 cache
  - 48 GB RAM
Lessons Learned
This is a mix of lessons taken from Jeff and Joel and comments from their posts.
It's true there's not much about their architecture here. We know about their machines, their tool chain, and that they use a two-tier architecture where they access the database directly from the web server code. We don't know how they implement tags, etc. If you're interested, you can glean some of this information from an explanation of their schema.
Discussion
As an architecture profile candidate Stack Overflow has earned two important HighScalability badges: the Microsoft Stack Badge and the Scale Up Badge. Both are controversial and interesting topics of discussion.
Microsoft Stack Badge
The Microsoft Stack Badge was earned because Stack Overflow uses the entire Microsoft stack: OS, database, C#, Visual Studio, and ASP.NET. People are always interested in how MS compares to LAMP, but I don't have many case studies to show them.
Markus Frind of Plenty of Fish fame is often used as a Microsoft stack poster child, but since he explicitly uses as little of the stack as possible he's not really a good example. Stack Overflow on the other hand is brash in proclaiming their love for MS, even when that love is occasionally spurned.
It's hard to separate out the Microsoft stack and the scale up approach because for licensing reasons they tend to go together. If you find yourself in the position of transitioning from scale up to scale out by adding dozens of cores, MS licensing will bite you.
Licensing aside, I personally find C#, Visual Studio, and .NET a very productive environment. C#/.NET is at least as good as Java/JVM. ASP.NET has always been a confusing mess to me. The knock against SQL Server is that you have to pay for it; if that doesn't bother you, it's a solid choice. The Windows OS may not be as solid as other alternatives but it works well enough.
So for a scale up solution a Microsoft stack works, especially if you are already Windows centric.
Scale Up Badge
This won't be a reenactment of the scale out vs scale up vs rent vs buy wars. For a thorough discussion of these issues please take a look at Scaling Up vs. Scaling Out and Server Hosting — Rent vs. Buy?. If you aren't confused and if your head doesn't hurt after reading all that then you haven't properly understood the material :-)
The Scale Up Badge was awarded because Stack Overflow uses a scale up strategy to meet their scaling requirements. When they reach a limit they scale vertically by buying a bigger machine and adding more memory.
Stack Overflow is in the sweet spot for scale up. It's not too large, but with an Alexa ranking of 1,666 and 16 million page views a month it's still a substantial site. Not Google scale, and probably will never have to be, but those are numbers many sites would be thrilled to have. Yet they aren't uploading large amounts of media. They aren't dealing with billions of tweets across complex social networks with millions of users. Their number of users is self limiting. And there are still directions they can take if they need to scale (caching, more web servers, faster disks, more denormalization, more memory, some partitioning, etc). All-in-all it's a well done and very useful two-tier CRUD application.
NoSQL is Hard
So should Stack Overflow have scaled out instead of up, just in case?
What some don't realize is NoSQL is hard. Relational databases have many, many faults, but they make a lot of common tasks simple while hiding both the cost and complexity. If you want to know how many black Prius cars are in inventory, for example, then that's pretty easy to do.
Not so with most NoSQL databases (I'll speak generally here, some NoSQL databases have more features than others). You would have to program a counter of black Prius cars yourself, up front, in code. There are no aggregate operators. You must maintain secondary indexes. There's no searching. There are no distributed queries across partitions. There's no Group By or Order By. There are no cursors for easy paging through result sets. Returning even 100 large records at a time may time out. There may be quotas that are very restrictive because they must limit the amount of IO for any one operation. Query languages may lack expressive power.
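To make the black Prius example concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `inventory` table, the in-memory SQLite database, and the plain dict standing in for a key-value store are all hypothetical, and no particular NoSQL product is being modeled.

```python
import sqlite3
from collections import Counter

# Hypothetical inventory data, made up for this illustration.
cars = [
    ("Prius", "black"),
    ("Prius", "black"),
    ("Prius", "silver"),
    ("Civic", "black"),
]

# Relational: the count is an ad hoc declarative query, decided at read time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (model TEXT, color TEXT)")
db.executemany("INSERT INTO inventory VALUES (?, ?)", cars)
sql_count = db.execute(
    "SELECT COUNT(*) FROM inventory WHERE model = ? AND color = ?",
    ("Prius", "black"),
).fetchone()[0]

# Key-value style (a dict stands in for the store): there is no aggregate
# operator, so the application maintains the counter itself on every write.
kv_store = {}
counters = Counter()

def put_car(car_id, model, color):
    """Store the record and update the hand-maintained counter."""
    kv_store[car_id] = {"model": model, "color": color}
    counters[(model, color)] += 1

for car_id, (model, color) in enumerate(cars):
    put_car(car_id, model, color)

kv_count = counters[("Prius", "black")]
print(sql_count, kv_count)
```

The point of the sketch is the asymmetry: the SQL side can answer a question nobody planned for, while the key-value side can only answer questions whose counters were coded up front and kept correct on every write path.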
The biggest problem of all is that transactions cannot span arbitrary boundaries. There are no ACID guarantees beyond a single record or small entity group. Once you wrap your head around what this means for the programmer it's not a pleasant prospect at all. References must be manually maintained. Relationships must be manually maintained. There are no cascading deletes that act correctly during a failure. Every copy of denormalized data must be manually tracked and updated, taking into account the possibility of partial failures and externally visible inconsistency.
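A rough sketch of what a missing cascading delete means in practice, again in Python. Everything here is an assumption made for illustration: the `questions` and `answers` names, the dicts standing in for a store with single-record guarantees only, and the `fail_after` parameter that simulates a crash partway through.

```python
import sqlite3

# Relational: the cascade is declared once and runs inside one transaction.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT)")
db.execute(
    "CREATE TABLE answers ("
    " id INTEGER PRIMARY KEY,"
    " question_id INTEGER REFERENCES questions(id) ON DELETE CASCADE,"
    " body TEXT)"
)
db.execute("INSERT INTO questions VALUES (1, 'How do I scale up?')")
db.executemany("INSERT INTO answers VALUES (?, 1, ?)", [(1, "Buy RAM"), (2, "Add cores")])
with db:  # commits on success, rolls back on an exception
    db.execute("DELETE FROM questions WHERE id = 1")
sql_answers_left = db.execute("SELECT COUNT(*) FROM answers").fetchone()[0]

# Key-value style: each dict operation stands in for a single-record write.
questions = {1: {"title": "How do I scale up?"}}
answers = {1: {"question_id": 1, "body": "Buy RAM"},
           2: {"question_id": 1, "body": "Add cores"}}
answers_by_question = {1: [1, 2]}  # hand-maintained secondary index

def delete_question(qid, fail_after=None):
    """Delete children, then the parent. fail_after simulates a crash
    after that many child deletes have already been applied."""
    deleted = 0
    for aid in list(answers_by_question.get(qid, [])):
        if fail_after is not None and deleted == fail_after:
            raise RuntimeError("simulated crash mid-delete")
        del answers[aid]
        answers_by_question[qid].remove(aid)
        deleted += 1
    answers_by_question.pop(qid, None)
    del questions[qid]

try:
    delete_question(1, fail_after=1)
except RuntimeError:
    pass
# Readers can now see a question with only one of its two answers.
partial_state = (len(questions), len(answers))

delete_question(1)  # the application must notice and retry the cleanup
final_state = (len(questions), len(answers))
print(sql_answers_left, partial_state, final_state)
```

The manual version works only because the delete was written to be safely retried; detecting that a retry is needed is yet another piece of code the application owns.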
All this functionality must be written manually by you in your code. While flexibility to write your own code is great in an OLAP/map-reduce situation, declarative approaches still cover a lot of ground and make for much less brittle code.
What you gain is the ability to write huge quantities of data. What you lose is complacency. The programmer must be very aware at all times that they are dealing with a system where it costs a lot to perform distributed operations and failure can occur at any time.
All this may be the price of building a truly scalable and distributed system, but is this really the price you want to pay?
The Multitenancy Problem
With StackExchange, Stack Overflow has gone into the multi-tenancy business. They are offering StackExchange either self-hosted or as a hosted white label application.
It will be interesting to see if their architecture can scale to handle a large number of sites. Salesforce is the king of multitenancy, and although it's true they use Oracle as their database, they use very little of Oracle and have written their own table structure, indexing, and query processor on top of it, all in order to support multitenancy.
Salesforce went extreme because supporting a lot of different customers is way more difficult than it seems, especially once you allow customization and support versioning.
Clearly all customers can't run in one server for security, customization, and scaling reasons.
You may think you can just create a database for each customer, share a server among a certain number of customers, and then add more servers as needed. As long as a customer doesn't need more than one server you are golden.
This doesn't seem to work well in practice. Oddly, database managers aren't optimized for adding or updating databases. Creating databases is a heavyweight operation and can degrade performance for existing customers as system locks are taken. Upgrades are also problematic. Adding columns locks tables, which causes problems in high traffic situations. Adding new indexes can also take a very long time and degrade performance. Plus each customer will likely have specializations that make upgrading even more complicated.
To get around these problems Salesforce's Craig Weissman, Chief Architect, created an innovative approach where tables are not created for each customer. All data from all customers is mapped into the same data table, including indexes. The schema for that table looks something like orgid, oid, value0, value1...value500. "orgid" is the organization ID, and it is what keeps one customer's data from being mixed up with another's. It's a very wide and sparse table, which Oracle seems to handle well. Hundreds and hundreds of "tables" and custom fields are mapped into the data table.
With this approach Salesforce has no option other than to build their own infrastructure to interpret what's in that table. Oracle is left to handle transactions, concurrency, and deadlock detection. The advantage is that, because there's an interpreted layer, handling versions and upgrades is relatively simple: the handling logic can be baked into that layer. Strange but true.
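A toy sketch of the shared wide table idea, in Python with an in-memory SQLite database. This is not Salesforce's implementation, just an illustration under assumptions: the `metadata` dict, the `insert` and `select` helpers, the tenant names, and the extra `objname` column used to tell logical tables apart are all hypothetical, and the table has four value slots instead of hundreds.

```python
import sqlite3

NUM_SLOTS = 4  # the article describes value0...value500; kept tiny here

db = sqlite3.connect(":memory:")
slot_cols = ", ".join(f"value{i} TEXT" for i in range(NUM_SLOTS))
# objname is an assumption of this sketch, used to separate logical tables.
db.execute(f"CREATE TABLE data (orgid TEXT, objname TEXT, oid INTEGER, {slot_cols})")

# Hypothetical metadata: each tenant's logical fields, in slot order.
metadata = {
    ("acme", "contact"): ["name", "email"],
    ("globex", "invoice"): ["number", "amount", "currency"],
}

def insert(orgid, objname, oid, record):
    """Map a tenant's named fields onto the generic value slots."""
    fields = metadata[(orgid, objname)]
    slots = [record.get(f) for f in fields] + [None] * (NUM_SLOTS - len(fields))
    placeholders = ", ".join("?" for _ in range(3 + NUM_SLOTS))
    db.execute(f"INSERT INTO data VALUES ({placeholders})", [orgid, objname, oid, *slots])

def select(orgid, objname):
    """Interpret the generic slots back into the tenant's field names."""
    fields = metadata[(orgid, objname)]
    cols = ", ".join(f"value{i}" for i in range(len(fields)))
    rows = db.execute(
        f"SELECT oid, {cols} FROM data WHERE orgid = ? AND objname = ? ORDER BY oid",
        (orgid, objname),
    )
    return [{"oid": row[0], **dict(zip(fields, row[1:]))} for row in rows]

insert("acme", "contact", 1, {"name": "Ada", "email": "ada@example.com"})
insert("globex", "invoice", 1, {"number": "INV-7", "amount": "120.00", "currency": "USD"})

# "Adding a column" for one tenant is a metadata change, not an ALTER TABLE.
metadata[("acme", "contact")].append("phone")
insert("acme", "contact", 2, {"name": "Grace", "email": "grace@example.com", "phone": "555-0100"})

acme_contacts = select("acme", "contact")
globex_invoices = select("globex", "invoice")
print(acme_contacts)
```

Note how the schema change for one tenant never touches the physical table, so no table lock is taken and other tenants are unaffected. The cost is that typing, indexing, and query planning all have to be rebuilt in the interpreting layer, which this sketch does not attempt.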
Reader Comments (33)
Amazing post and Stackoverflow is something which most of us can relate to and can target ourselves to reach somewhere close to that traffic, which makes this article even more personal and almost a moment of thought.
I believe a lot of performance can be gained if pages are divided into different parts which are cached for different time periods. There are lots of areas on a webpage that don't need frequent updates and shouldn't be fetched from the database based on one overall cache expiration time.
On LAMP, there are lots of ways to improve performance with half the headache, e.g. eAccelerator combined with page caching can decrease a database load time from 25 secs to 2 secs (Yeah... Try that).
It's great to be a performance-holic as it eventually improves the quality and user experience.
For anyone reading this, the architecture's changed quite a bit since the first post. See the links below for updated info (most up to date first):
2012-10-12 :: http://meta.stackoverflow.com/questions/10369/which-tools-and-technologies-build-the-stack-exchange-network
2011-09-30 :: http://blog.serverfault.com/2011/09/30/the-stack-exchange-architecture-2011-edition-episode-1/
2011-02-11 :: http://blog.serverfault.com/2011/02/11/stack-exchanges-architecture-in-bullet-points/
Here is another database that is compatible with Mono:
http://www.kellermansoftware.com/p-43-ninja-net-database-pro.aspx
Wow ! useful information so I can analyze my architecture. The point excited me is that they are using MVC with LINQ-to-SQL.
Can you share some IIS configurations?
Nice post. BTW why is the date in comments showing as 'November 29, 1990' ???
Why do the comments in the posts have Nov 29 1990 for the date ?
> Why do the comments in the posts have Nov 29 1990 for the date ?
The import script that I wrote didn't handle comment timestamps correctly so they all went to the default.