Entries in CAP (3)

Monday
Oct 18, 2010

NoCAP

In this post I wanted to spend some time on the CAP theorem and clear up some of the confusion I often see when people associate CAP with scalability without fully understanding the implications that come with it, or the alternative approaches.

You can read the full article here.

Thursday
Aug 12, 2010

Think of Latency as a Pseudo-permanent Network Partition

The title of this post is a quote from Ilya Grigorik's post Weak Consistency and CAP Implications. Beyond the article being excellent on its own, I thought this idea added something to the great NoSQL versus RDBMS debate, in which Mike Stonebraker argues that network partitions are rare, so designing eventually consistent systems for such a rare occurrence is not worth losing ACID semantics over. Even if network partitions are rare, latency between datacenters is not, so the game is still on.

The rare-partition argument seems to flow from a centralized-distributed view of systems. Such systems are scale-out in that they grow by adding distributed nodes, but the nodes generally do not cross datacenter boundaries. The assumption is that the network is fast enough that distributed operations are roughly homogeneous between nodes.
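
To make the latency point concrete, here is a rough back-of-the-envelope sketch in Python. The round-trip times are illustrative assumptions, not measurements from either post: once a write must be synchronously acknowledged by a replica in another datacenter, it cannot complete faster than that round trip, and for any strictly ordered sequence of writes the link starts to behave much like a partition.

```python
# Back-of-the-envelope sketch: how cross-datacenter round-trip time (RTT)
# bounds the latency of synchronously replicated (strongly consistent) writes.
# The RTT figures below are illustrative assumptions, not measurements.

LOCAL_RTT_MS = 0.5      # node-to-node inside one datacenter
CROSS_DC_RTT_MS = 80.0  # e.g. coast-to-coast between datacenters

def sync_write_latency_ms(rtt_ms: float, replicas: int = 1) -> float:
    """A write that waits for each remote replica to acknowledge in turn
    pays at least one RTT per replica (ignoring disk and CPU time)."""
    return rtt_ms * replicas

def max_ordered_writes_per_sec(rtt_ms: float) -> float:
    """Upper bound on strictly ordered writes to one item when every
    write must cross the link before the next can start."""
    return 1000.0 / rtt_ms

print("sync write within a DC:        %.1f ms" % sync_write_latency_ms(LOCAL_RTT_MS))
print("sync write across DCs:         %.1f ms" % sync_write_latency_ms(CROSS_DC_RTT_MS))
print("ordered writes/sec across DCs: %.1f" % max_ordered_writes_per_sec(CROSS_DC_RTT_MS))
# With an 80 ms RTT the link, for any one item, "heals" only ~12 times per
# second -- which is why latency reads here as a pseudo-permanent partition.
```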

Click to read more ...

Tuesday
Feb 3, 2009

Paper: Optimistic Replication

To scale in the large you have to partition. Data has to be spread around, replicated, and kept consistent (keeping replicas sufficiently similar to one another despite operations being submitted independently at different sites). The result is a highly available, well-performing, and scalable system. Partitioning is required, but it's a pain to do efficiently and correctly. Until quantum teleportation becomes a reality, how data is kept consistent across a bewildering number of failure scenarios is a key design decision. This excellent paper by Yasushi Saito and Marc Shapiro takes us on a wild ride (OK, maybe not so wild) through different approaches to achieving consistency. What's cool about this paper is that it goes over some real systems we are familiar with and covers how they work: DNS (single-master, state-transfer), Usenet (multi-master), PDAs (multi-master, state-transfer, manual or application-specific conflict resolution), Bayou (multi-master, operation-transfer, epidemic propagation, application conflict resolution), and CVS (multi-master, operation-transfer, centralized, manual conflict resolution). The paper then goes on to explain in detail the different approaches to achieving consistency. Most of us will never have to write the central nervous system of an application like this, but knowing about the different approaches and their tradeoffs is priceless. The abstract:

Data replication is a key technology in distributed data sharing systems, enabling higher availability and performance. This paper surveys optimistic replication algorithms that allow replica contents to diverge in the short term, in order to support concurrent work practices and to tolerate failures in low-quality communication links. The importance of such techniques is increasing as collaboration through wide-area and mobile networks becomes popular. Optimistic replication techniques are different from traditional “pessimistic” ones. Instead of synchronous replica coordination, an optimistic algorithm propagates changes in the background, discovers conflicts after they happen and reaches agreement on the final contents incrementally. We explore the solution space for optimistic replication algorithms. This paper identifies key challenges facing optimistic replication systems — ordering operations, detecting and resolving conflicts, propagating changes efficiently, and bounding replica divergence — and provides a comprehensive survey of techniques developed for addressing these challenges.
If you can't wait to know the ending, here's the summary of the paper:
We summarize some of the lessons learned from our own experience and in reviewing the literature. Optimistic, asynchronous data replication is an appealing technique; it indeed improves networking flexibility and scalability. Some environments or application areas could simply not function without optimistic replication. However, optimistic replication also comes with a cost. The algorithmic complexity of ensuring eventual consistency can be high. Conflicts usually require application-specific resolution, and the lost update problem is ultimately unavoidable. Hence our recommendations: (1) Keep it simple. Traditional, pessimistic replication, with many off-the-shelf solutions, is perfectly adequate in small-scale, fully connected, reliable networking environments. Where pessimistic techniques are the cause of poor performance or lack of availability, or do not scale well, try single-master replication: it is simple, conflict-free, and scales well in practice. State transfer using Thomas’s write rule works well for many applications. Advanced techniques such as version vectors and operation transfer should be used only when you need flexibility and semantically rich conflict resolution. (2) Propagate operations quickly to avoid conflicts. While connected, propagate often and keep replicas in close synchronization. This will minimize divergence when disconnection does occur. (3) Exploit commutativity. Commutativity should be the default; design your system so that non-commutative operations are the uncommon case. For instance, whenever possible, partition data into small, independent objects. Within an object, use monotonic data structures such as an append-only log, a monotonically increasing counter, or a union-only set. When operations are dependent upon each other, represent the invariants explicitly.
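
To make a couple of those recommendations concrete, here is a minimal, illustrative sketch in Python. It is my own toy code, not code from the paper: a state-transfer register reconciled with Thomas's write rule (the state with the newer timestamp wins) and a union-only set whose merges commute, so replicas converge no matter the order in which they exchange state. The class names and the caller-supplied timestamp scheme are assumptions made for illustration.

```python
# Toy sketches of two ideas from the paper's summary: Thomas's write rule
# for state-transfer replication, and a commutative, union-only set.
# Illustrative code only; not taken from the paper.

class LWWRegister:
    """State-transfer replica reconciled with Thomas's write rule:
    when two replicas exchange state, the newer timestamp wins."""
    def __init__(self, value=None, timestamp=0):
        self.value, self.timestamp = value, timestamp

    def write(self, value, timestamp):
        # Local update; the caller supplies a logical or wall-clock timestamp.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

    def merge(self, other):
        # Thomas's write rule: keep whichever state carries the higher timestamp.
        if other.timestamp > self.timestamp:
            self.value, self.timestamp = other.value, other.timestamp


class UnionOnlySet:
    """A monotonic, union-only set: adds commute, so replicas can apply
    operations in any order and converge by taking the set union."""
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        self.items |= other.items


# Two replicas diverge while "disconnected", then reconcile by merging state.
a, b = LWWRegister(), LWWRegister()
a.write("draft-1", timestamp=1)
b.write("draft-2", timestamp=2)
a.merge(b); b.merge(a)
assert a.value == b.value == "draft-2"     # the newer write wins on both replicas

s1, s2 = UnionOnlySet(), UnionOnlySet()
s1.add("x"); s2.add("y")
s1.merge(s2); s2.merge(s1)
assert s1.items == s2.items == {"x", "y"}  # same result regardless of merge order
```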

Related Articles

  • The End of an Architectural Era (It’s Time for a Complete Rewrite)
  • Big Table
  • Google's Paxos Made Live – An Engineering Perspective
  • Dynamo: Amazon’s Highly Available Key-value Store
  • Eventually Consistent - Revisited by Werner Vogels

Click to read more ...