Entries in Product (120)

Wednesday
May 6, 2015

Varnish Goes Upstack with Varnish Modules and Varnish Configuration Language

This is a guest post by Denis Brækhus and Espen Braastad, developers on the Varnish API Engine from Varnish Software. Varnish has long been used by discriminating backends, so it's interesting to see what they are up to.

Varnish Software has just released Varnish API Engine, a high-performance HTTP API gateway that handles authentication, authorization, and throttling, all built on top of Varnish Cache. The Varnish API Engine can easily extend your current set of APIs with a uniform access-control layer that has built-in caching for high-volume read operations, and it provides real-time metrics.

Varnish API Engine is built using well-known components like memcached, SQLite and, most importantly, Varnish Cache. The management API is written in Python. A core part of the product is written as an application on top of Varnish using VCL (Varnish Configuration Language) and VMODs (Varnish Modules) for extended functionality.

We would like to use this as an opportunity to show how you can create your own flexible yet high-performance applications in VCL with the help of VMODs.

VMODs (Varnish Modules)

Click to read more ...

Thursday
Feb 13, 2014

Snabb Switch - Skip the OS and Get 40 million Requests Per Second in Lua

Snabb Switch is a toolkit for solving novel problems in networking. If you are building a new packet-processing network appliance, you can use Snabb Switch to get the job done more quickly.

Here's a great impassioned overview from erichocean:

Or, you could just avoid the OS altogether: https://github.com/SnabbCo/snabbswitch

Our current engineering target is 1 million writes/sec and > 10 million reads/sec, on a single box, on top of an architecture similar to that, for our fully transactional, MVCC database (writes do not block reads, and vice versa) that runs in the same process (a la SQLite), which we've also merged with our application code and our caching tier, so we're down to—literally—a single process for what would have been at least three separate tiers in a traditional setup.

The result is that we had to move to measuring request latency in microseconds exclusively. The architecture (without additional application-specific processing) supports a wire-to-wire messaging speed of 26 nanoseconds, or approx. 40 million requests per second. And that's written in Lua!

To put that in perspective, that kind of performance is about 1/3 of what you'd need to be able to do to handle Facebook's messaging load (on average, obviously, Facebook bursts higher than the average at times...).

Point being, the OS is just plain out-of-date for how to solve heavy data plane problems efficiently. The disparity between what the OS can do and what the hardware is capable of delivering is off by a few orders of magnitude right now. It's downright ridiculous how much performance we're giving up for supposed "convenience" today.

Wednesday
Jan 15, 2014

Vedis - An Embedded Implementation of Redis Supporting Terabyte-Sized Databases

I don't know about you, but when I first learned about Redis my initial thought was wow, why hasn't anyone done this before? My next thought was why put this functionality in a separate process? Why not just embed it in your own server code and skip the network path completely? Especially in a Service Oriented Architecture there's no need for an extra hop or extra software installation and configuration.

Now you can embed Redis-like code directly into your server with Vedis - an embeddable datastore C library with over 70 commands similar in concept to Redis, but without the networking layer, since Vedis runs in the same process as the host application. It's transactional, cross-platform, thread-safe, key-value, supports terabyte-sized databases, has a GPL-like license (which isn't great for commercial apps), and supports an on-disk as well as an in-memory datastore.
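
To make the "skip the network path" point concrete, here is a minimal C++ sketch of what embedding Vedis looks like, written against its documented C API (vedis_open, vedis_exec, vedis_exec_result, vedis_value_to_string, as best I recall them); treat the exact signatures as approximate and check the Vedis headers before relying on them.

    #include <cstdio>
    #include "vedis.h"   // single-file Vedis library, compiled into your own server

    int main() {
        vedis* store = nullptr;
        // ":mem:" selects the in-memory datastore; pass a file path for on-disk storage.
        if (vedis_open(&store, ":mem:") != VEDIS_OK) return 1;

        // Commands execute in-process: no socket, no extra network hop.
        vedis_exec(store, "SET user:1 'alice'", -1);
        vedis_exec(store, "GET user:1", -1);

        // Fetch the result of the last executed command.
        vedis_value* result = nullptr;
        vedis_exec_result(store, &result);
        std::printf("user:1 = %s\n", vedis_value_to_string(result, nullptr));

        vedis_close(store);
        return 0;
    }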

More about Vedis:

Click to read more ...

Wednesday
Apr 17, 2013

Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS

Tachyon (github) is an interesting new filesystem brought to you by the folks at the UC Berkeley AMP Lab:

Tachyon is a fault tolerant distributed file system enabling reliable file sharing at memory speed across cluster frameworks, such as Spark and MapReduce. It offers up to 300 times higher throughput than HDFS by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read.
It has a Java-like File API, native support for raw tables, a pluggable file system, and it works with Hadoop with no modifications.
 
It might work well for streaming media too, as you wouldn't have to wait for the complete file to hit the disk before rendering.

Thursday
Feb 14, 2013

When all the Program's a Graph - Prismatic's Plumbing Library

At some point as a programmer you might have the insight/fear that all programming is just doing stuff to other stuff.

Then, after coding the same stuff over and over again, you may observe that stuff in a program often takes the form of interacting patterns of flows.

Then you may think: hey, a program isn't only useful for coding datastructures; a program is itself a kind of datastructure, and with a meta-level jump you could program a program in terms of flows over data and flows over other flows.

That's the kind of stuff Prismatic is making available in the Graph extension to their plumbing package (code examples), which is described in an excellent post: Graph: Abstractions for Structured Computation.

You may remember Prismatic from a previous profile we did on HighScalability: Prismatic Architecture - Using Machine Learning On Social Networks To Figure Out What You Should Read On The Web. We learned how Prismatic, an interest-driven content suggestion service, builds programs in terms of graph stuff:

Click to read more ...

Thursday
Nov 29, 2012

Performance data for LevelDB, Berkeley DB and BangDB for Random Operations

This is a guest post by Sachin Sinha, Founder of Iqlect and developer of BangDB.

The goal of this paper is to provide performance data for the following embedded databases under various scenarios for random operations such as writes and reads. The data is presented graphically to make it, to some extent, self-explanatory.

  • LevelDB:

    LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. LevelDB is based on an LSM (Log-Structured Merge) tree and uses SSTables and a MemTable for the database implementation. It's written in C++ and available under a BSD license. LevelDB treats keys and values as arbitrary byte arrays and stores keys in ordered fashion. It uses Snappy for data compression. Writes and reads are concurrent for the db, but writes perform best with a single thread, whereas reads scale with the number of cores (a minimal usage sketch follows this list).

  • Berkeley DB:

    Berkeley DB (BDB) is a library that provides a high performance embedded database for key/value data. It's the most widely used database library, with millions of deployed copies. BDB can be configured to run as anything from a concurrent data store to a transactional data store to a fully ACID-compliant db. It's written in C and available under the Sleepycat Public License. BDB treats keys and values as arbitrary byte arrays and stores keys both in ordered fashion using a BTREE and in un-ordered fashion using a HASH. Writes and reads are concurrent for the db, and scale well with the number of cores, especially reads.

  • BangDB:

    BangDB is a high performance embedded database for key/value data. It's a new entrant into the embedded db space. It's written in C++ and available under a BSD license. BangDB treats keys and values as arbitrary byte arrays and stores keys both in ordered fashion using a BTREE and in un-ordered fashion using a HASH. Writes and reads are concurrent and scale well with the number of cores.
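
To make the access pattern concrete, here is a minimal LevelDB sketch (using its public C++ API) of the kind of random put/get these benchmarks exercise; the path, key and value are illustrative only, and the other two libraries have analogous calls.

    #include <cassert>
    #include <string>
    #include "leveldb/db.h"

    int main() {
        leveldb::DB* db;
        leveldb::Options options;
        options.create_if_missing = true;

        // Open (or create) an on-disk database; LevelDB keeps keys in sorted order.
        leveldb::Status s = leveldb::DB::Open(options, "/tmp/bench_db", &db);
        assert(s.ok());

        // Random write: key and value are arbitrary byte arrays.
        s = db->Put(leveldb::WriteOptions(), "key:42", "value:42");
        assert(s.ok());

        // Random read.
        std::string value;
        s = db->Get(leveldb::ReadOptions(), "key:42", &value);
        assert(s.ok() && value == "value:42");

        delete db;
        return 0;
    }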

The comparison has been done on similar grounds (as much as possible) for all the dbs in order to measure the data as accurately as possible.

The results of the test show BangDB to be faster in both reads and writes:

Click to read more ...

Tuesday
Oct 9, 2012

Batoo JPA - The new JPA Implementation that runs over 15 times faster...

This post is by Hasan Ceylan, an Open Source software enthusiast from Istanbul.

I loved JPA 1.0 back in the early 2000s. I started using it together with EJB 3.0 even before the stable releases. I loved it so much that I contributed bits and pieces to the JBoss 3.x implementations.

Click to read more ...

Monday
Sep 10, 2012

Russ’ 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware

My name is Russell Sullivan, I am the author of AlchemyDB: a highly flexible NoSQL/SQL/DocumentStore/GraphDB-datastore built on top of redis. I have spent the last several years trying to find a way to sanely house multiple datastore-genres under one roof while (almost paradoxically) pushing performance to its limits.

I recently joined the NoSQL company Aerospike (formerly Citrusleaf) with the goal of incrementally grafting AlchemyDB’s flexible data-modeling capabilities onto Aerospike’s high-velocity horizontally-scalable key-value data-fabric. We recently completed a peak-performance TPS optimization project: starting at 200K TPS, pushing to the recent community edition launch at 500K TPS, and finally arriving at our 2012 goal: 1M TPS on $5K hardware.

Getting to one million over-the-wire client-server database-requests per-second on a single machine costing $5K is a balance between trimming overhead on many axes and using a shared nothing architecture to isolate the paths taken by unique requests.
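
As a toy illustration of the shared nothing idea (my own sketch, not Aerospike or AlchemyDB code): hash every key to a partition, give each partition to exactly one worker thread, and the hot path never needs a lock because no data structure is shared between threads.

    #include <functional>
    #include <string>
    #include <thread>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // One partition per worker; each map is touched by exactly one thread.
    struct Partition {
        std::unordered_map<std::string, std::string> kv;
    };

    int main() {
        const int kPartitions = 4;
        std::vector<Partition> parts(kPartitions);

        // Route requests up front: a given key always lands in the same partition.
        std::vector<std::vector<std::pair<std::string, std::string>>> batches(kPartitions);
        for (int i = 0; i < 100000; ++i) {
            std::string key = "key:" + std::to_string(i);
            size_t owner = std::hash<std::string>{}(key) % kPartitions;
            batches[owner].emplace_back(std::move(key), "value");
        }

        // Each worker applies only its own batch to its own partition, with no locks.
        std::vector<std::thread> workers;
        for (int p = 0; p < kPartitions; ++p) {
            workers.emplace_back([&, p] {
                for (auto& req : batches[p]) parts[p].kv[req.first] = req.second;
            });
        }
        for (auto& w : workers) w.join();
        return 0;
    }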

Even if you aren't building a database server, the techniques described in this post might be interesting as they are not database-server specific. They could be applied to an FTP server, a static web server, and even to a dynamic web server.

Here is my personal recipe for getting to this TPS per dollar...

Click to read more ...

Monday
Aug 20, 2012

The Performance of Distributed Data-Structures Running on a "Cache-Coherent" In-Memory Data Grid

This is a guest post by Ron Pressler, the founder and CEO of Parallel Universe, a Y Combinator company building advanced middleware for real-time applications.

A little over a month ago, we open-sourced a new in-memory data grid called Galaxy. An in-memory data grid, or IMDG, is a clustered data storage and processing middleware that uses RAM as the authoritative and primary storage, and distributes data over a cluster for purposes of data and processing scalability and high-availability. A common feature of IMDGs is co-location of code and data, meaning that application code runs on all cluster nodes, each instance processing those data items residing in the local node's RAM.

While quite a few commercial and open-source IMDGs are available (like Terracotta, Gigaspaces, Oracle Coherence, GemFire, Websphere eXtreme Scale, Infinispan and Hazelcast), Galaxy has adopted a completely different architecture from all other IMDGs, to service some usage scenarios ill-fitted to the other solutions.

All other IMDGs, as well as most distributed NoSQL databases (like Riak and Cassandra), employ what are known as distributed hash tables (DHTs) to partition and locate data items in the cluster. DHTs assign a data item to one or more cluster nodes based on a static hash value computed for each item's key (those systems provide access to items by keys). This means that an item's owning cluster node(s) can be easily located, and that an access requires just one network roundtrip in the worst case (for a read or a write). However, that one network roundtrip is also required in the common case.
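
In sketch form (the names are mine, not Galaxy's or any particular product's), the DHT placement described above is just a deterministic function of the key, so any node can compute an item's owner locally but still usually has to cross the network to reach it:

    #include <functional>
    #include <string>
    #include <vector>

    // Simplified DHT-style placement: a static hash of the key picks the owning
    // node(s). Real systems typically layer consistent hashing / virtual nodes on
    // top so that membership changes move less data, but the principle is the same.
    std::vector<int> owners_of(const std::string& key, int cluster_size, int replicas) {
        int primary = static_cast<int>(std::hash<std::string>{}(key) % cluster_size);
        std::vector<int> owners;
        for (int i = 0; i < replicas; ++i)
            owners.push_back((primary + i) % cluster_size);  // primary plus successors
        return owners;
    }

    // Any node can evaluate owners_of("user:17", 8, 2) without coordination, then
    // needs at most one roundtrip to reach an owner -- and usually at least one, too.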

Galaxy, on the other hand, dynamically migrates items among cluster nodes, making a different tradeoff: accessing an item might take more than one network roundtrip in the worst-case scenario, but the common case requires no hops at all ...

Click to read more ...

Tuesday
Aug 14, 2012

MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)

This is an interview with MemSQL cofounders Eric Frenkiel and Nikita Shamgunov, in which they try to answer critics by going into more depth about their technology.

MemSQL ruffled a few feathers with their claim of being the fastest database in the world. According to their benchmarks, MemSQL can execute 200K TPS on an EC2 Quadruple Extra Large, and on a 64-core machine they can push 1.2 million transactions a second.

Benchmarks are always a dark mirror, so make of them what you will, but the target market for MemSQL is clear: projects looking for something both fast and familiar. Fast as in a novel design using a combination of technologies like MVCC, code generation, lock-free data structures, skip lists, and in-memory execution. Familiar as in SQL and nothing but SQL. The only interface to MemSQL is SQL.
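
To give a feel for just one of those ingredients, here is a minimal lock-free (Treiber-style) stack; it is a generic illustration of what "lock-free data structure" means, not MemSQL's code, and a production version would also need safe memory reclamation (hazard pointers or epochs).

    #include <atomic>
    #include <utility>

    // Writers never block each other: they retry a compare-and-swap on the head
    // pointer instead of taking a lock. Illustration only (no ABA protection).
    template <typename T>
    class LockFreeStack {
        struct Node { T value; Node* next; };
        std::atomic<Node*> head_{nullptr};
    public:
        void push(T value) {
            Node* n = new Node{std::move(value), head_.load(std::memory_order_relaxed)};
            // On failure, compare_exchange_weak reloads the current head into n->next.
            while (!head_.compare_exchange_weak(n->next, n, std::memory_order_release,
                                                std::memory_order_relaxed)) {}
        }
        bool pop(T& out) {
            Node* n = head_.load(std::memory_order_acquire);
            // On failure, n is reloaded with the current head; retry until it sticks.
            while (n && !head_.compare_exchange_weak(n, n->next, std::memory_order_acquire,
                                                     std::memory_order_relaxed)) {}
            if (!n) return false;
            out = std::move(n->value);
            delete n;  // unsafe under concurrent pops without a reclamation scheme
            return true;
        }
    };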

It’s right to point out that MemSQL gets a boost by being a first release. Only a limited subset of SQL is supported, neither replication nor sharding is implemented yet, and writes queue in memory before flushing to disk. The next release will include a baseline distributed system, native replication, n-way joins, and subqueries. Maintaining performance as more features are added is a truer test.

And MemSQL is RAM based, so of course it’s fast, right? Even among in-memory databases MemSQL hopes to convince you they’ve made some compelling design choices. The reasoning for their design goes something like:

Click to read more ...