Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning
In Scaling writes in MySQL (slides) Philip Tellis, while working for Yahoo, describes how using time based partitions they were able to increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). This was capacity enough to handle the load during Michael Jackson's memorial service. In summary, the secrets to scalable writes are:
- Bulk inserts push up insert rate
- Partitioning lets you insert more records
- Partition based on incoming data for fast inserts
Partitioning is a standard approach for handling high write loads because it means data can be written to different hard disks in parallel. In this example Phillip created a separate table for each day with each table having it's own database file. Each table is partitioned on time, 12 partitions per day, 2 hours of data per partition. Huge log streams are often handled this way. Other advantages of this approach: 1) fast drop table operations 2) space for dropped tables is reclaimed immediately. The problems are that it's difficult to: 1) read by a natural primary key 2) make cross partition queries. But the goal here is fast writes, not analytics or easy reads, so if you want to do something meaningful with the data you may need to load it into another system.
If you are ravenous for even more information take a look at Maximal write througput in MySQL, from the wizards at the MySQL Performance Blog.
Reader Comments (4)
Umm, have to disagree with:
"Partitioning is a standard approach for handling high write loads because it means data can be written to different hard disks in parallel." Not true when you are partitioning on time of the insert, because in the given period, you always write into just one partition. Partitioning is usually used to avoid having huge tables and indices, because they are difficult to work with. To spread the load on more disks you should use different techniques (hash partitioning could be one of them, but there more).
@Michal - from the slides they partition using the equation ((time div 3600) mod 24) - which would spread the writes over 24 partitions, so in that respect his comment is correct.
Hi Todd,
Writing to multiple hard disks, each on a separate box is achieved through sharding, and you normally do this to increase write throughput. My goal was to increase write throughput on a single box. We don't use sharding. We use partitioning which is a feature built in to MySQL 5.1. It splits a single table into multiple files on disk, but all these files belong to the same table and must be kept together at all times.
The trick here was to make sure that all btree operations happened in RAM. Partitioning helps because each partition is then smaller than the amount of RAM allocated to the innodb_buffer_pool. This helps us minimise disk operations which is what makes the write throughput go up.
Oh, and thanks for the link :)