
This is a guest post by Ron Pressler, the founder and CEO of Parallel Universe, a Y Combinator company building advanced middleware for real-time applications.
A little over a month ago, we open-sourced a new in-memory data grid called Galaxy. An in-memory data grid, or IMDG, is a clustered data storage and processing middleware that uses RAM as the authoritative and primary storage, and distributes data over a cluster for purposes of data and processing scalability and high-availability. A common feature of IMDGs is co-location of code and data, meaning that application code runs on all cluster nodes, each instance processing those data items residing in the local node's RAM.
While quite a few commercial and open-source IMDGs are available (like Terracotta, Gigaspaces, Oracle Coherence, GemFire, Websphere eXtreme Scale, Infinispan and Hazelcast), Galaxy has adopted a completely different architecture from all other IMDGs, to service some usage scenarios ill-fitted to the other solutions.
All other IMDGs, as well as most distributed NoSQL databases (like Riak and Cassandra) employ what is known as distributed hash-tables (DHTs) to partition and locate data items in the cluster. DHTs assign a data item to one or more cluster nodes based on a static hash value computed for each item's key (those systems provide access to items by keys). This means that an item's owning cluster-node(s) can be easily located, and require just one network roundtrip per access in the worst case (for a read or a write). However, that one network roundtrip is also required in the common case.
Galaxy, on the other hand, dynamically migrates items among cluster nodes making a different tradeoff: accessing an item might take more than one network roundtrip in the worst-case scenario, but the common case requires no hops at all ...
Click to read more ...