NoCAP
Monday, October 18, 2010 at 4:00PM
In this post I wanted to spend some time on the CAP theorem and clarify some of the confusion I often see when people associate CAP with scalability without fully understanding the implications that come with it and the alternative approaches.
You can read the full article here.
Reader Comments (5)
As pointed out in many reactions to that post, especially on Alex Popescu's blog, you have many misconceptions about what CAP is about.
This, in my opinion, best illustrates that the whole post misses the point, confuses people even more and raises a big red warning flag over any system created by "Gigaspaces".
You can of course achieve scalability without compromising consistency. CAP does not state otherwise. What it does state is that in a distributed system you will have to sacrifice some amount of either Availability or Partition tolerance.
Nothing less, nothing more. This has been formally proven.
Coming back to the quote from above, I would say you chose the name because you want to ride the hype wave which, funnily enough, you criticize in your very post.
I won't even go into details of why the system you "came up with" (it's not the first with such a design) has many real-world problems; Alex and Jeff already gave some very good points regarding this.
Why do you think that CAP and NoSQL models became a hot topic these days when the theorem itself has existed since 2002?
Hint - see Avinash Lakshman's description of the motivation behind the Cassandra project here.
This entire post was triggered by recent discussions and meetings I had in the past few weeks, where different people came up with Eventual Consistency as part of their requirements for scalability. In other words, people associate CAP with Scalability, and for a reason.
Have you read the entire thread of discussion with Alex/Jeff?
Gilbert and Lynch define partition tolerance as follows:
"The network will be allowed to lose arbitrarily many messages sent from one node to another."
There is nothing in their definition specifically about WAN or about the two nodes needing to live in separate geographical locations.
Quoting my response to Alex in that regard:
Of course they became a hot topic because of today's load figures, but you totally missed my point here again.
I said CAP does not say you can't achieve scalability. It says you can't have all three properties in a system completely at once. There, can't be clearer than that.
Of course they associate CAP with scalability. I never said otherwise!
Many real-world systems want WAN separation because of a) disaster recovery (think of a DC going down because of fire or electricity problems) and b) being closer to the end user (think CDN).
But you don't even have to have a WAN setup to get problems when your internal network goes down for whatever reason.
In your diagram you have a client write to a RAM cache, which replicates synchronously to another cache, which in turn writes asynchronously to disk.
What if either of the two goes down? Your system becomes unavailable. That is a problem; now tell me, how do you still guarantee availability?
In fact you did not even create one single point of failure per shard/partition, no, you created two!
Also, where is the replication for the disk? Don't tell me RAID is a solution.
Another point: writing to disk asynchronously only helps overall throughput to some extent, because you can use smarter writing patterns; it does not magically make the disk a lot faster, and you still need to get all writes to disk.
The system you outline achieves two things: read performance and consistency. It lacks availability and, compared to eventually consistent systems, write throughput.
If I want to be strict, I could also say your system does not guarantee durability in the ACID sense, because a successfully acknowledged write could be forgotten if the systems go down before the data is written to disk. Depending on the lag between RAM and disk, this could be a large amount of data, and depending on the system at hand, it could be a real disaster.
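To make the concern concrete, here is a minimal sketch of the write path as I read the diagram - synchronous replication to a backup RAM cache, then an asynchronous flush to disk. The Python code and the names in it are illustrative assumptions, not taken from the post or from any GigaSpaces API; the point is only the window between the client acknowledgement and the disk flush.

```python
import queue
import threading

class RamCacheNode:
    """Toy in-memory store standing in for one primary or backup cache node."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

class WritePath:
    """Sketch of the diagrammed write path: sync replicate, async persist."""
    def __init__(self, primary, backup, disk_path):
        self.primary = primary
        self.backup = backup
        self.disk_path = disk_path
        self.flush_queue = queue.Queue()
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, key, value):
        # 1. Apply the write to the primary RAM cache.
        self.primary.put(key, value)
        # 2. Replicate synchronously to the backup RAM cache. If the backup
        #    is unreachable, the write cannot be acknowledged without giving
        #    up either consistency or availability.
        self.backup.put(key, value)
        # 3. Acknowledge the client before the data reaches disk.
        self.flush_queue.put((key, value))
        return "ack"  # durability now depends on both RAM copies surviving

    def _flusher(self):
        # Asynchronous flush: batching/smarter write patterns help throughput,
        # but every acknowledged write still has to reach the disk eventually.
        while True:
            key, value = self.flush_queue.get()
            with open(self.disk_path, "a") as f:
                f.write(f"{key}={value}\n")
```

If both RAM copies are lost between the acknowledgement in step 3 and the flush, the acknowledged write is gone - that is the durability gap described above.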
Somewhat true, but it again misses the point, because it's more about availability.
"NoCAP" implies that you can overcome the restrictions the theorem imposes. You cannot.
One more time: there is no "scalability" in CAP.
I see in the comments on Alex's post that you outlined some failover mechanism. That was not mentioned at all in your post or diagram.
How does this differ from all the Master-Slave replica schemes employed by many (No)SQL databases?
How does a failed master node coming back know that it is a slave?
When a master fails, who tells the slave to become a master?
What if the link between the nodes is just down but everything else works? How do they determine who's the master?
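For what it's worth, here is a minimal sketch of why that last question is hard. It is a hypothetical illustration, not the post's mechanism or any GigaSpaces API: if each node simply promotes itself when it cannot reach its peer, a broken link produces two masters.

```python
class Node:
    """Toy node that decides its role only from whether it can see its peer."""
    def __init__(self, name, is_master):
        self.name = name
        self.is_master = is_master

    def on_peer_unreachable(self):
        # Naive rule: "if I can't reach the master, I must become the master."
        # A slave cannot tell a dead master apart from a broken link.
        if not self.is_master:
            self.is_master = True

# Healthy pair: one master, one slave.
a = Node("a", is_master=True)
b = Node("b", is_master=False)

# The link between them fails, but both nodes keep running.
a.on_peer_unreachable()
b.on_peer_unreachable()

# Split brain: both now claim mastership and will accept conflicting writes.
assert a.is_master and b.is_master
```

Avoiding this usually requires something outside the pair itself, such as a quorum of witnesses or an external arbiter, which is exactly why these questions matter.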
Frost
Based on your feedback and that of many others, I realized that the implementation details of the model I outlined are not clear enough. I was assuming that, since this model serves lots of mission-critical applications today, it is better known than it really is. I will post a follow-up with more detail on how all three properties of CAP are handled in the proposed model.
The model was designed primarily with write scalability in mind. What makes you come up with this assumption?
As for availability and partition tolerance, it does assume that two failures wouldn't happen at the same time (you can also configure it with 3 or 4 replicas, but for the sake of discussion and practicality I think it's fair to assume 1 replica per partition). If the two nodes live in two separate data centers, as in many deployments, then it is fair to assume that the two data centers wouldn't fail at the same time. I believe that covers most of the more likely failure scenarios.
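As a rough illustration of that assumption (the numbers and the helper below are mine, not from the post): if each copy of a partition fails independently with probability p during some window, then with one backup per partition the chance of losing both copies at once is roughly p squared.

```python
# Back-of-the-envelope check of the "two failures at once" assumption.
# p is an assumed probability that a single node or data center is down
# during a given window; copies are assumed to fail independently.
def prob_partition_unavailable(p, copies):
    # All copies of a partition must be down at the same time.
    return p ** copies

for p in (0.01, 0.001):
    for copies in (2, 3):  # primary + 1 backup, primary + 2 backups
        print(f"p={p}, copies={copies}: {prob_partition_unavailable(p, copies):.2e}")
```

The independence assumption is exactly what a shared failure (same rack, same switch, same data center) breaks, which is why the WAN/geography question keeps coming up in this thread.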
You may find the following post - Elasticity for the Enterprise -- Ensuring Continuous High Availability in a Disaster Failure Scenario - a good reference in that regard. (Unfortunately I couldn't mention the name of the customer behind this architecture.)
My problem with the P and A properties in CAP is that they are fairly vague.
I think that there are more degrees to which A and P can be handled than what comes up in this and many other discussions. In other words, one can address A and P to certain degrees, but not for every possible failure scenario. What does that mean - that if you don't support three data centers failing at the same time over the internet, you're not partition tolerant? I believe it comes down to common sense, i.e. what's considered a likely failure scenario and what's not, and even that may change over the course of time and with the application. I would say that if you can't handle a likely failure scenario then you're not partition tolerant. The question now becomes: what's a likely failure scenario?
Anyway, I'm working on another follow-up post that will put that and other perspectives together. Hopefully this will help provide a more aggregated view of the different perspectives.