How to choose an in-memory NoSQL solution: Performance measuring

The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.
We selected three popular in-memory database management systems: Redis (standalone and as the cloud service Azure Redis Cache), Tarantool and Couchbase, plus one caching system, Memcached. Memcached is not a database management system and has no persistence, but we included it because it is also widely used as fast storage. Our “firing field” was a group of four virtual machines in the Microsoft Azure cloud. The virtual machines are located in one datacenter, close to each other, which reduces the impact of network overhead on latency measurements. Images of these VMs can be downloaded via these links: one, two, three and four (login: nosql, password: qwerty). The pair of VMs named nosql-1 and nosql-2 is set up for benchmarking Tarantool and Couchbase, and the pair named nosql-3 and nosql-4 for Redis, Azure Redis Cache and Memcached. The databases and tests are already installed and configured on these images.
Our virtual machines were basic A3 instances with 4 cores, 7 GB of RAM and a 120 GB disk.
Databases and their configurations
An in-memory NoSQL database management system stores all of its data in main memory and persists each update on disk. Persistence is provided by saving each data modification request in a binary log. Since the log is written in append-only mode, it is rarely a bottleneck. Both read and write workloads are processed without significant disk head movement.
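To make the idea concrete, here is a minimal sketch of an append-only log in front of an in-memory dictionary. It is not how Redis or Tarantool implement persistence: the class name `AppendOnlyStore`, the JSON record layout and the text (rather than binary) log format are all assumptions chosen for readability.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store: data lives in a dict, every write is logged first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append mode: every record lands at the end of the file.
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "set":
                    self.data[record["key"]] = record["value"]
                elif record["op"] == "del":
                    self.data.pop(record["key"], None)

    def _append(self, record):
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # push the record to disk before replying

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self.data.pop(key, None)

    def get(self, key):
        # Reads are served from memory and never touch the disk.
        return self.data.get(key)

    def close(self):
        self.log.close()


# Usage: write, "crash", then recover the state by replaying the log.
path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(path)
store.set("user:1", "alice")
store.set("user:2", "bob")
store.delete("user:2")
store.close()

recovered = AppendOnlyStore(path)  # simulates a restart
print(recovered.get("user:1"), recovered.get("user:2"))
```

Calling `fsync` on every write is the most conservative choice; it is used here only to keep the sketch simple.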
Redis has been around since 2009 and is created and maintained by Salvatore Sanfilippo; the newest version is 3.0.5.
We provided two configuration files for Redis 3.0.4: one with an append-only file for data persistence and one without it, so that Redis acts as a cache server. Redis was built from source.
We also used Microsoft Azure Redis Cache. It is a cloud service based on Redis and managed by Microsoft.
Tarantool is an open-source NoSQL database management system and Lua application server developed at my.com. The first version of Tarantool was released in 2008 and the newest version is 1.6.7.
We provided four configuration files for Tarantool 1.6.7-126-gb35aff9: with and without a write-ahead log, each with tree (ordered) and hash (unordered) indices. No tuning options exist in Tarantool and no tuning was done.
Memcached is a general-purpose distributed memory caching system that was first developed in 2003.
Memcached has no mode with data persistence, so it was compared only against other databases configured without an append-only binary log. We used Memcached 1.4.14-0ubuntu9 from the Ubuntu repository.
Couchbase Server, originally known as Membase, is an open-source, distributed NoSQL document-oriented database. The newest stable release is 4.0.
We used the Couchbase 4.0.0-4047-1 package from the official site without any extra configuration.
The append-only file option in Redis and the write-ahead log option in Tarantool enable data persistence for these databases. This paper compares only similar configurations of different databases. For example, we do not compare Redis with append-only files enabled against Tarantool with write-ahead logs disabled.
Yahoo! Cloud Serving Benchmark
Yahoo! Cloud Serving Benchmark, or YCSB, is a powerful utility for measuring the performance of a wide range of NoSQL databases, including in-memory and on-disk solutions. YCSB is an industry standard for measuring the performance of NoSQL solutions, which is why we use it. We rely on the Redis and Tarantool drivers included in YCSB and on a Memcached driver that we wrote on top of the spymemcached library. The source of this YCSB branch can be seen here.
YCSB provides a few core workload types, stored as configuration files in a dedicated directory. There are six major workload types, named by letters from A to F.
Workload A is an update heavy workload. It has a mix of 50/50 reads and writes. An application example is a session store recording recent actions.
Workload B is primarily a read workload. It has a 95/5 read/write mix. Application example: photo tagging; add a tag is an update, but most operations are to read tags.
Workload C is 100% read. Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).
In Workload D, new records are inserted and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest.
In Workload E, short ranges of records are queried instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id).
In Workload F, the client will read a record, modify it, and write back the changes. Application example: user database, where user records are read and modified by the user or to record user activity.
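The operation mixes can be sketched as a small table plus a weighted random choice. This is an illustration, not YCSB's code. The proportions for A, B and C are the ones quoted above; the 95/5 and 50/50 splits for D, E and F are assumptions based on YCSB's stock workload files, and the names `WORKLOADS`, `next_operation` and `sample_mix` are made up for this example. Key selection (which record each operation touches) is left out.

```python
import random

# Operation mixes. A, B and C follow the proportions quoted in the text;
# D, E and F are assumptions based on YCSB's stock workload files.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "read_modify_write": 0.50},
}


def next_operation(workload, rng):
    """Pick the next operation type according to the workload's mix."""
    mix = WORKLOADS[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=1)[0]


def sample_mix(workload, count, seed=0):
    """Draw `count` operations and report the observed share of each type."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(count):
        op = next_operation(workload, rng)
        counts[op] = counts.get(op, 0) + 1
    return {op: n / count for op, n in counts.items()}


observed = sample_mix("B", 100_000)
print(observed)  # shares close to 0.95 read / 0.05 update
```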
We changed two parameters in each of these configuration files: recordcount to 2000000 and operationcount to 5000000. YCSB is a multithreaded tester, and we ran it with 8, 16, 32, 64, 128 and 256 threads.
Now we will show and describe several sets of plots that we drew in R. The sources of the plot scripts can be downloaded here.
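The resulting test matrix (six workloads times six thread counts) can be sketched as a list of command lines. This is a hedged illustration rather than the scripts used for the article. It assumes the standard `bin/ycsb` launcher with its `-P`, `-p` and `-threads` options, a binding named `redis`, and that passing `recordcount` and `operationcount` with `-p` is equivalent to editing the workload files. Connection properties such as host and port are omitted.

```python
import itertools

WORKLOADS = ["a", "b", "c", "d", "e", "f"]
THREAD_COUNTS = [8, 16, 32, 64, 128, 256]
RECORD_COUNT = 2_000_000
OPERATION_COUNT = 5_000_000


def ycsb_command(phase, binding, workload, threads):
    """Build one YCSB command line as an argument list (nothing is executed)."""
    return [
        "bin/ycsb", phase, binding,
        "-P", f"workloads/workload{workload}",
        "-p", f"recordcount={RECORD_COUNT}",
        "-p", f"operationcount={OPERATION_COUNT}",
        "-threads", str(threads),
    ]


def run_matrix(binding):
    """All run-phase commands for one database binding."""
    return [
        ycsb_command("run", binding, workload, threads)
        for workload, threads in itertools.product(WORKLOADS, THREAD_COUNTS)
    ]


commands = run_matrix("redis")  # binding name is an assumption
print(len(commands))
print(" ".join(commands[0]))
```

Each argument list could be passed to `subprocess.run` after a matching `load` phase has populated the database.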
Plots
Tarantool (HASH)
Tarantool (TREE)
Redis
Azure Redis Cache
Memcached
CouchBase
Tarantool, with both hash and tree indices, delivers the highest throughput on all investigated workloads. It has a lock-free in-memory engine that uses no mutexes or other concurrency primitives and relies on cooperative multitasking. After considering these graphs, we can conclude that high throughput is one of the strengths of the Tarantool database.
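The reason cooperative multitasking can do without mutexes is that a task is only interrupted at points where it explicitly yields. The sketch below shows this with Python's `asyncio`, which is only a stand-in and says nothing about Tarantool's actual implementation. The names `worker` and `main` are illustrative.

```python
import asyncio


async def worker(state, increments):
    """Increment a shared counter, yielding to other tasks between steps."""
    for _ in range(increments):
        # There is no await inside this read-modify-write, so no other task
        # can interleave here and no lock is needed.
        state["counter"] += 1
        # Explicit yield point: control returns to the event loop.
        await asyncio.sleep(0)


async def main(workers=8, increments=1000):
    state = {"counter": 0}
    await asyncio.gather(*(worker(state, increments) for _ in range(workers)))
    return state["counter"]


total = asyncio.run(main())
print(total)  # 8000: no update is lost even though no lock is used
```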
Tarantool also shows the lowest average latency for read requests. As the plots show, this holds for every workload. At the 95th percentile Tarantool reaches the lowest latency as well. (This measure is related to average latency, but the two are not the same.)
At the 99th percentile, however, Tarantool does not reach the lowest latency on every workload. By this measure, Tarantool is very close to Redis in all cases and is beaten by it in some of them. This can be explained as follows: Tarantool executes most queries with a small latency and a small share of them with a large latency, while Redis executes all requests with a moderate latency.
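A small worked example shows how one system can win on the average and the 95th percentile and still lose on the 99th. All latency values below are invented for illustration and are not measurements from these tests. `percentile` uses the nearest-rank definition, which may differ from the method YCSB uses internally.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    return {
        "mean": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical latencies in milliseconds, invented for illustration.
steady = [5.0] * 1000                    # every request takes the same time
spiky = [1.0] * 980 + [50.0] * 20        # mostly fast, 2% slow
one_outlier = [1.0] * 999 + [10_000.0]   # 999 fast requests, 1 very slow

for name, samples in [("steady", steady), ("spiky", spiky),
                      ("one_outlier", one_outlier)]:
    print(name, summarize(samples))
```

The `spiky` series has a lower mean (1.98 ms against 5 ms) and a lower 95th percentile (1 ms against 5 ms) than `steady`, but a higher 99th percentile (50 ms against 5 ms). The `one_outlier` series shows that a single extreme request can push the mean (about 11 ms) above both percentiles (1 ms each).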
For cases without write-ahead logs, Memcached and Couchbase exhibit better latency.
In every case Tarantool beats Redis on average latency and at the 95th percentile, but not at the 99th percentile. The situation mirrors the one for read latency and can be explained in the same way.
Conclusion
We described YCSB and presented the results of comparing four popular databases, but the most important idea in this paper is how to choose the right solution for a given workload. By looking at the plots in this article, it is easy to find the most suitable solution for your workload type, number of database clients and expectations.
The links to our VM images, YCSB with the Memcached module, and the R scripts are published so that you can run your own tests and verify our results, or get results for instances with different configurations (both hardware and software).
Across all the tests we ran, Tarantool showed the best result for requests per second and, in many tests, for latency, on every type of workload examined. We therefore conclude that for most typical projects Tarantool is a better fit than popular solutions such as Redis, Couchbase or Memcached. This is the basis of our decision to use Tarantool for our projects here at my.com.
Reader Comments (10)
any news on automatic-sharding for tarantool ?
also, is it possible to have something simmilar to redis pipeline (sending batch of commands) ?
Broken link in the article on the link to the source of the plot scripts.
The url : http://articles.rvncerr.org/how-to-chose-an-in-memory-nosql-solution-performance-measuring/r/
Resolves as a 404 error
Why is the average latency higher than the 95%ile, and 99%iles? That doesn't look right.
Do try Aerospike. My guess would be its 2-3x faster than products you just compared. It also has automatic sharding and both memory and/or disk storage.
Found this component for sharding in Tarantool:
https://github.com/tarantool/shard
Looks like this provides automatic sharding...
Thanks for the data.
I have to say the HW details and configurations you have chosen seems random however.
- is this a single node test? from the text, sounds like this was a single node test which isn't meaningful in any way. is that the case? testing clustered/distributed system SW against a single node arch is not meaningful in my view. Replication and durability in caching is critically important for mission critical data. Without replicated durability of data, any node failure (which could be more than once a week on azure due to azure fabric updates etc) will cost you a ton of additional latency to warm up your cold cache.
- Unknown Versions: I cannot find the version of Couchbase you mentioned on their downloads site. 4.0 release has the build number 4051. The version you tested must be a preview or a daily build.
- Old Clients: YCSB repo you guys used seem out of date. Couchbase repo is 3 years old.
- Too few clients: you seemed to have stopped too early in YCSB client concurrency. 250 is not a large enough count. Many tests are done using 500 or more clients.
Inserts and updates seems to be faster on Memcache and Couchbase, but the reads appear to slower than the others.
Memcache, being a pure in-memory cache without persistence, I expected it to perform better than the others in the read latency comparisons as well. What do you think could be the reason for slow reads in Memcache?
After reading this, I would rather use Redis for a read heavy application than Memcache.
Hi All,
Sorry for the late answer. Thank you for your remarks.
> Broken link in the article on the link to the source of the plot scripts.
I've fixed it.
> Why is the average latency higher than the 95%ile, and 99%iles? That doesn't look right.
Why not? Imagine 999 fast queries and 1 slow query.
> is this a single node test?
Yes. I'm going to try distributed solutions soon.
> I cannot find the version of Couchbase you mentioned on their downloads site.
When I was running the benchmark, CouchBase 4.x hadn't been released.
> YCSB repo you guys used seem out of date. Couchbase repo is 3 years old.
I used memcached driver for benchmarking CouchBase.
> Too few clients.
Thank you! There will be up to 1024 clients in my next try.
Measuring average latency is non sense: check out Gil Tene's presentation about measuring latency the right way. http://www.infoq.com/presentations/latency-lessons-tools
Nice work!
May I suggest that you also do a benchmark when there's a failure (for the distributed configuration that you're gonna try) to compare how the performance of these systems are affected.
Many Thanks !