- High Scalability

I recently came across a TPC-C benchmark results held on MySQL based RDS databases. You can see it here. I think the results may bring light to many questions concerning MySQL scalability in general and RDS scalability in particular. (For disclosure, I'm working for ScaleBase where we run an internal scale out TPC-C benchmark these days, and will publish results soon).

TPC-C

TPC-C is a standard database benchmark, used to measure databases. The database vendors invest big bucks in running this test, and showing off which database is faster, and can scale better.

It is a write intensive test, so it doesn’t necessarily reflect the behavior of the database in your application. But it does give some very important insights on what you can expect from your database under heavy load.

The Benchmark Process

First of all, I have some comments for the benchmark method itself.

Generally - the benchmarks were held in an orderly fashion and in a rather methodological way – which increases the credibility of the results.
The benchmark generator client was tpcc-mysql which is an open-source implementation provided by Percona. It’s a good implementation, although I think DBT-2, provided by Apache, is more widely used.
Missing is the mix weights of the different TPC-C transactions in the benchmark. The TPC-C benchmark contains 6 types of transactions (“new order”, “delivery”, “stock query”, etc.) and the mix weights are important.
The benchmark focused on throughput, not latency. I think that although TPC-C is mostly about throughput (transactions per minute) it’s recommended to address the latency (“avg response time” for example) as well.
TPC-C “number of warehouses” determines the size of the database being tests. Number of warehouses ranged between 1 and 32 in this benchmark. In MBs we’re talking 100MB-3GB. That’s usually a small database, and I would be interested in seeing how the benchmark ranks with 1000 warehouses (around 100 GB) or even more.
The entire range RDS instance types were examined, starting from small to quadruple Extra Large (4XL). Some very interesting results turned out, which will be the focus of this blog post.

Results Analysis

The benchmark results are surprising.

With hardly any dependency on the database size, MySQL reaches its optimal throughput at around 64 concurrent users. Anything above that causes throughput degradation.
Throughput is improving as machines get stronger. However, there is a sweet-spot, a point where adding hardware doesn’t help performance. The sweet spot is around the XL machine, which reaches a throughput of around 7000 tpm. 2XL and 4XL machines don’t improve this limit.

It would seem that the Scale-Up scalability approach is rather limited, and bounded. And unfortunately, has some rather close bounds.

So, what’s the bottleneck?

CPU is unlikely candidate, as CPU power doubles on 2XL and quadruple on 4XL machines.
I/O is also an unlikely candidate. Since machines memory (RAM) doubles on 2XL and quadruple on 4XL machines. More RAM means more database cache buffers, thus reduced I/O.
Concurrency? Number of concurrent users? Well we saw that optimal throughput is achieved with around 64 concurrent sessions on the database. See figure 4b in the benchmark report. While with 1 user the throughput 1,000 transactions per user, with 256 users it drops to 1 transaction per user!

Results Explanation

It definitely weird that when more parallel users are added, we get performance degradation – a lower throughput than the one we have with fewer parallel users.

Well, the bottleneck is the database server itself. A database is a complex sophisticated machine. With tons of tasks to perform in each and every millisecond it runs. For each query, the database needs to parse, optimize, find an execution plan, execute it, and manage transaction logs, transaction isolation and row level locks.

Consider the following example. A simple update command needs an execution plan to get the qualifying rows to update and then, reading those rows, lock each and every row. If the command updates 10k rows – this can take a while. Then, each update is executed on the tables, on each of the relevant indexes of that table, and also written to the transaction log file. And that’s for just a simple update.

Or another example - A complex query takes a long time to run. Even 1 second is a lot of time in a highly loaded database. During its run, rows from source table(s) of that query are updated (or added, or deleted). The database must maintain “Isolation level” (see our isolation level blog). Meaning that the user’s result must be the snapshot of the query as it was when the query started. New values in a row that was updated after the query began, even if committed, should not be part of the result. Rather, the database should go and find the “old snapshot” of the row, meaning the way the row looked at the beginning of the query. InnoDB stores information about old versions of rows in its tablespace. This information is stored in a data structure called a rollback segment. And if we get back to the previous example – here’s another task the database has to do – update the rollback segment.

We must remember, every command issued to the database, actually generates dozens of recursive commands inside the database engine. When sessions or users concurrency goes up, load inside the database engine increases exponentially. The database is a great machine, but it has its limits.

So What Do We Do?

So, we understand the database is crowded, and has hardware sweet-spots, which complicates the Scale Up solution – which is expensive, and doesn’t give the required performance edge. Even specialized databases have their limit, and while the sweet-spot changes, every database has one.

There are allot of possible solutions to this problem – adding a caching layer is a must, to decrease the number of database hits, and any other action that can reduce the number of hits on the database (like NoSQL solutions) is welcomed.

But the actual database solution is scale-out. Instead of 1 database engine, we’ll have several, say 10. Instead of 128 concurrent users or even 256 concurrent users (that according to the TPC-C benchmark bring worst results), we’ll have 10 databases with 26 users on each, and each database can reach 64 users (up to 640 concurrent users). And of course, if needed, we can add more databases to handle the increased scale.

It will be interesting to see how scale out solutions for databases handle TPC-C.