Thursday
Nov042010
Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance

Facebook gave a MySQL Tech Talk where they talked about many things MySQL, but one of the more subtle and interesting points was their focus on controlling the variance of request response times and not just worrying about maximizing queries per second.
But first the scalability porn. Facebook's OLTP performance numbers were as usual, quite dramatic:
- Query response times: 4ms reads, 5ms writes.
- Rows read per second: 450M peak
- Network bytes per second: 38GB peak
- Queries per second: 13M peak
- Rows changed per second: 3.5M peak
- InnoDB disk ops per second: 5.2M peak
Some thoughts on creating quality, not quantity:
- They don't care about average response times, instead, they want to minimize variance. Every click must be responded to quickly. The quality of service for each request matters.
- It's OK if a query is slow as long as it is always slow.
- They don't try to get the highest queries per second out of each machine. What is important is that the edge cases are not the bad.
- They figure out why the response time for the worst query is bad and then fix it.
- The performance community is often focussed on getting the highest queries per second. It's about making sure they have the best mix of IOPs available, cache size, and space.
To minimize variance they must be able notice, diagnose, and then fix problems:
- They measure how things work in operation. They can monitor at subsecond levels so they catch problems.
- Servers have miniature fractures in their performance which they call "stalls." They've built tools to find these.
- Dogpile collection. Every second it notices if something is wrong and ships it out for analysis.
- Poor man's profiler. Attach GDB to servers to know what's going on, they can see when stalls happen.
- Problems are usually counter-intuitive this can never happen type problems.
- Extending a table locks the entire instance.
- Flushing dirty pages was actually blocking.
- How statistics get sampled in InnoDB.
- Problems happen on medium-loaded systems too. Their systems aren't that loaded to ensure quality of service, yet the problems still happen.
- Analyze and understand every layer of the software stack to see how it performs at scale.
- Monitoring system monitors different aspects of performance so they can notice a change in performance, drill down to the host, then drill down to the query that might be causing the problem, then kill the query, and then trace it back to the source file where it occurred.
- They have a small team, so they make very specific changes to Linux and MySQL to support their use cases. Longer term changes are made by others.
Please watch the MySQL Tech Talk for more color and details.
Related Articles
- Good Hacker News Thread
Reader Comments (16)
Cheers for another great post and the reminder that predictability is more important to scalability than sheer performance.
Excellent information. All I ever read is how to get the highest queries/s. This approach made me turn my head a bit at first, but then it made complete sense.
Thanks for the extract. Sometimes it's hard to watch another 1.5h long movie.
Anyone know why Facebook chose mysql over postgresql? I'm not suggesting that I would do either and I'm sure there will be plenty of people with their own personal opinions of which is best but I'm truly curious for Facebook's reasons.
I think that you're missing an important point and potentially confusing the matter.
Queries/sec is an important measure that should not be ignored. These numbers are the only reasonable way of benchmarking and comparing different architectures. When you're designing your system, those numbers, however abstract and meaningless, are about the only way of giving you an idea of the relative merits of each choice.
But when you go live and measure your actual performance, what you should get is essentially a nice, stable performance baseline under average load. If only the real world was so nice. Instead, you probably sometimes get your expected performance, and have peaks and valleys where things do not quite go as expected.
What you can do? Once you've deployed your carefully designed architecture you cannot easily roll it back. It does not feel right to throw away all the time and effort involved, especially if the problem is happening only in a few corner cases. However, you still need to improve performance. What to do?
That's the time to look at your variance and worst case numbers. Anything you do to improve those gives you a better average response time, which is what you're looking for.
Rob Mills:
I have no special knowledge about facebook, but the story usally goes something like this:
* Hey, check out this cool thing I built with mysql!
* We're getting some users, we better throw better hardware at our database
* Boy, we're really growing, let's throw the best hardware we can at it!
* Yikes, the best hardware we can isn't good enough. We can spend 6-12 months porting this over to a database that scales better vertically or we can spend 2-6 months to shard our data.
From there you can guess which option most fast-growing technology companies pick.
Great post , I have always wondered why variance does not get equal importance as average(iops) in all the performance talks. Average of 1ms and 1s is 500ms, but I would rather prefer 2 500ms anyday!
Also good response by Zach on why not postgres.
@Rob Mills
In general Postgresql is faster at complex queries with a lot of joins and such, while MySQL is faster at simple queries such as primary key look up.
At facebook's scale I doubt they are doing any joins or complex queries, hence the decision to use MySQL makes sense. Probably the same reason why other large scale sites such as Google, Yahoo also use MySQL.
for people plugging PG, there're multiple reasons.
One is that PG doesn't scale that well on multiple cores as MySQL nowadays.
Another is in fundamental differences of storage architecture - all MySQL/InnoDB data is either a clustered primary key, or secondary key with PK pointers - logical relationships between entries allow to have index-only scans, which are a must for web-facing databases (good response times, no variance).
One more reason is that in heavily indexed databases vacuum will have to do full index passes, rather than working with LRU.
As for sharding, etc - there's no way to scale vertically infinitely - so the "stupid people shard" point is very very moot.
It is much cheaper to go the commodity hardware path.
Zach,
I thought it might involve a little of that.
Andy,
Thanks for that explanation. Is this the difference between myisam and innodb? myisam is designed for primary key lookups right where innodb is more for transactional? Or do I have that wrong?
@Rob Mills,
I doubt many people are using MyISAM. With MyISAM you could lose your data if your computer crashes. Also MyISAM has very poor concurrency because it uses table locks. Any time you write to an MyISAM table it locks the entire table and no one else could even read from it. That's a non-starter for many usages.
One reason InnoDB is faster than Postgresql for simple queries is that it uses clustering index. When you look up by PK in InnoDB, you can get all your data from the index in one go. With Postgresql it requires an additional trip to the table data itself.
@Andy,
If I remember correctly (and the Facebook employees were correct), the their MySQL database usage is remarkably simple and does ZERO joins. It is all handled by the PHP code, which they compile into machine native code using HipHop.
@jbo5112
I guess Facebook does the same with MySql , they use a version of MySql known as MySql @ Facebook , where in they have contributions to it both by MySql and Facebook developers.
You can check Mysql @ facebook and the facebook page for more details on the same
@Consultuning
I know this is an old thread, but I think it's worth pointing out where you and the article author differ in your starting assumptions just so as to de-muddy the water.
You said "Anything you do to improve [the outliers] gives you a better average response time" . In my experience sometimes you can trade average performance for dependability and that's what the author is talking about e.g. in OLAP you might use forced intermediate materialisation if you need to ensure the optimiser never gets caught chasing its tail even though that might slow down the typical query a little.
If, like facebook, consistency is your priority then it's not best left to an afterthought after going through general good practice and tuning. It really can mean making painful, unpurist choices right from the start that could even guarantee you a lower average. This other post seems to explain that well:
I see your point, in that reading this article could give the impression that they don't mind page clicks taking 30 seconds so long as they all do. Obviously you should start with an average low enough to equate to an acceptable typical user experience. This article is about making it not just typical but guaranteed. Assuming, that they have sensible upper limits, benchmarking % executions under that limit does seem like a reasonable approach, at least for testing your own setups. Yes, the semi-arbitrariness of what you consider acceptable doesn't give you such an easily generalisable number to go bragging on message boards about but it does better help you provide a good service in your own context.
How are queries per second calculated? If the same query is executed in multiple shards simultaneously, is it counted as one or the # of shards?
Great post everyone. Lots of good information.
Paul -