Entries in Database (63)

Tuesday
Mar172015

In-Memory Computing at Aerospike Scale: When to Choose and How to Effectively Use JEMalloc

This is a guest post by Psi Mankoski (email), Principal Engineer, Aerospike.

When a customer’s business really starts gaining traction and their web traffic ramps up in production, they know to expect increased server resource load. But what do you do when memory usage still keeps on growing beyond all expectations? Have you found a memory leak in the server? Or else is memory perhaps being lost due to fragmentation? While you may be able to throw hardware at the problem for a while, DRAM is expensive, and real machines do have finite address space. At Aerospike, we have encountered these scenarios along with our customers as they continue to press through the frontiers of high scalability.

In the summer of 2013 we faced exactly this problem: big-memory (192 GB RAM) server nodes were running out of memory and crashing again within days of being restarted. We wrote an innovative memory accounting instrumentation package, ASMalloc [13], which revealed there was no discernable memory leak. We were being bitten by fragmentation.

This article focuses specifically on the techniques we developed for combating memory fragmentation, first by understanding the problem, then by choosing the best dynamic memory allocator for the problem, and finally by strategically integrating the allocator into our database server codebase to take best advantage of the disparate life-cycles of transient and persistent data objects in a heavily multi-threaded environment. For the benefit of the community, we are sharing our findings in this article, and the relevant source code is available in the Aerospike server open source GitHub repo. [12]

Executive Summary

  • Memory fragmentation can severely limit scalability and stability by wasting precious RAM and causing server node failures.

  • Aerospike evaluated memory allocators for its in-memory database use-case and chose the open source JEMalloc dynamic memory allocator.

  • Effective allocator integration must consider memory object life-cycle and purpose.

  • Aerospike optimized memory utilization by using JEMalloc extensions to create and manage per-thread (private) and per-namespace (shared) memory arenas.

  • Using these techniques, Aerospike saw substantial reduction in fragmentation, and the production systems have been running non-stop for over 1.5 years.

Introduction

Click to read more ...

Wednesday
Mar042015

10 Reasons to Consider a Multi-Model Database

This is a guest post by Nikhil Palekar, Solutions Architect, FoundationDB.

The proliferation of NoSQL databases is a response to the needs of modern applications. Still, not all data can be shoehorned into a particular NoSQL model, which is why so many different database options exist in the market. As a result, organizations are now facing serious database bloat within their infrastructure.

But a new class of database engine recently has emerged that can address the business needs of each of those applications and use cases without also requiring the enterprise to maintain separate systems, software licenses, developers, and administrators.

These multi-model databases can provide a single back end that exposes multiple data models to the applications it supports. In that way, multi-model databases eliminate fragmentation and provide a consistent, well-understood backend that supports many different products and applications. The benefits to the organization are extensive, but some of the most significant benefits include:

1. Consolidation

Click to read more ...

Thursday
Jan222015

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet fitting for every gun. How you select the database to use is very dependent on every other factor of your work: 

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?

Background information

Click to read more ...

Wednesday
Aug272014

The 1.2M Ops/Sec Redis Cloud Cluster Single Server Unbenchmark

This is a guest post by Itamar Haber, Chief Developers Advocate, Redis Labs.

While catching up with the world the other day, I read through the High Scalability guest post by Anshu and Rajkumar's from Aerospike (great job btw). I really enjoyed the entire piece and was impressed by the heavy tweaking that they did to their EC2 instance to get to the 1M mark, but I kept wondering - how would Redis do?

I could have done a full-blown benchmark. But doing a full-blown benchmark is a time- and resource-consuming ordeal. And that's without taking into account the initial difficulties of comparing apples, oranges and other sorts of fruits. A real benchmark is a trap, for it is no more than an effort deemed from inception to be backlogged. But I wanted an answer, and I wanted it quick, so I was willing to make a few sacrifices to get it. That meant doing the next best thing - an unbenchmark.

An unbenchmark is, by (my very own) definition, nothing like a benchmark (hence the name). In it, you cut every corner and relax every assumption to get a quick 'n dirty ballpark figure. Leaning heavily on the expertise of the guys in our labs, we measured the performance of our Redis Cloud software without any further optimizations. We ran our unbenchmark with the following setup:

Click to read more ...

Wednesday
Aug202014

Part 2: The Cloud Does Equal High performance

This a guest post by Anshu Prateek, Tech Lead, DevOps at Aerospike and Rajkumar Iyer, Member of the Technical Staff at Aerospike.

In our first post we busted the myth that cloud != high performance and outlined the steps to 1 Million TPS (100% reads in RAM) on 1 Amazon EC2 instance for just $1.68/hr. In this post we evaluate the performance of 4 Amazon instances when running a 4 node Aerospike cluster in RAM with 5 different read/write workloads and show that the r3.2xlarge instance delivers the best price/performance.

Several reports have already documented the performance of distributed NoSQL databases on virtual and bare metal cloud infrastructures:

Click to read more ...

Monday
Aug182014

1 Aerospike server X 1 Amazon EC2 instance = 1 Million TPS for just $1.68/hour

This a guest post by Anshu Prateek, Tech Lead, DevOps at Aerospike and Rajkumar Iyer, Member of the Technical Staff at Aerospike.

Cloud infrastructure services like Amazon EC2 have proven their worth with wild success. The ease of scaling up resources, spinning them up as and when needed and paying by unit of time has unleashed developer creativity, but virtualized environments are not widely considered as the place to run high performance applications and databases.

Cloud providers however have come a long way in their offerings and need a second review of their performance capabilities. After showing 1 Million TPS on Aerospike on bare metal servers, we decided to investigate cloud performance and in the process, bust the myth that cloud != high performance.

We examined a variety of Amazon instances and just discovered the recipe for processing 1 Million TPS in RAM on 1 Aerospike server on a single C3.8xlarge instance - for just $1.68/hr !!!

According to internetlivestats.com, there are 7.5k new tweets per second, 45k google searches per second and 2.3 Million emails sent per second. What would you build if you could process 1 Million database transactions per second for just $1.68/hr?

Click to read more ...

Wednesday
Aug132014

Hamsterdb: An Analytical Embedded Key-value Store

 

In this post, I’d like to introduce you to hamsterdb, an Apache 2-licensed, embedded analytical key-value database library similar to Google's leveldb and Oracle's BerkeleyDB.

hamsterdb is not a new contender in this niche. In fact, hamsterdb has been around for over 9 years. In this time, it has dramatically grown, and the focus has shifted from a pure key-value store to an analytical database offering functionality similar to a column store database. 

hamsterdb is single-threaded and non-distributed, and users usually link it directly into their applications. hamsterdb offers a unique (at least, as far as I know) implementation of Transactions, as well as other unique features similar to column store databases, making it a natural fit for analytical workloads. It can be used natively from C/C++ and has bindings for Erlang, Python, Java, .NET, and even Ada. It is used in embedded devices and on-premise applications with millions of deployments, as well as serving in cloud instances for caching and indexing.

hamsterdb has a unique feature in the key-value niche: it understands schema information. While most databases do not know or care what kind of keys are inserted, hamsterdb supports key types for binary keys...

Click to read more ...

Wednesday
Aug062014

Tokutek White Paper: A Comparison of Log-Structured Merge (LSM) and Fractal Tree Indexing

What data structure does your database use? It's not something the typical database user spends much time pondering. Since data structure is destiny, what data structure your database uses is a key point to consider in your selection process.

We know CouchDB uses a modified B+ tree. We've learned a lot fascinating details over the years about the use of Log-structured merge-trees in Cassandra, HBase and LevelDB. So B+ trees and LSMs seem familiar by now.

What may not be so familiar is Tokutek's Fractal Tree Indexing technology that is supposed to be even better than B+ trees and LSMs.

As a comparison between Fractal Tree Indexing and LSMs, Bradley Kuszmaul, Chief Architect at Tokutek, has written a detailed paper, a must read for the algorithmically inclined or someone interested in database internals: A Comparison of Log-Structured Merge (LSM) and Fractal Tree Indexing.

Here's a quick intro to Fractal Tree (FT) indexes:

Click to read more ...

Wednesday
Jul092014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Wednesday
Jan152014

Vedis - An Embedded Implementation of Redis Supporting Terabyte Sized Databases

I don't know about you, but when I first learned about Redis my initial thought was wow, why hasn't anyone done this before? My next thought was why put this functionality in a separate process? Why not just embed it in your own server code and skip the network path completely? Especially in a Service Oriented Architecture there's no need for an extra hop or extra software installation and configuration.

Now you can embed Redis-like code directly into your server with Vedis - an embeddable datastore C library built with over 70 commands similar in concept to Redis but without the networking layer since Vedis run in the same process of the host application. It's transactional, cross platform, thread safe, key-value, supports terabyte sized databases, has a GPL-like license (which isn't great for commercial apps), and supports an on-disk as well as in-memory datastore.

More about Vedis:

Click to read more ...