So, Why is Twitter Really Not Using Cassandra to Store Tweets?

A firestorm of accusations circulated recently, claiming that Cassandra, the elected-by-major-adopters emperor of the NoSQL movement, has no clothes. It was said Twitter was dumping Cassandra; Reddit outages were linked to Cassandra; and even Facebook, Cassandra's birthplace, was said to have abandoned Cassandra. Shouts of NoSQL Fail! were heard in the streets. Much gloating followed. Is the emperor really naked? Casually dressed maybe, but not naked.
(Note: after this point the article contains a flow chart that is NSFW. Some people are very sensitive about cussing, so if that's you, please go back, don't read on. Danger! There are no nude pictures or anything, just some strong language. But this is my most favorite flow chart of all time, so it's worth it :-)
Is Twitter really abandoning Cassandra? Not according to Twitter, which came out with a post, Cassandra at Twitter Today, explaining that they are using Cassandra in production for geolocation and analytics. Twitter, however, will not be using Cassandra to store tweets. Why? Twitter’s Ryan King says: "This is a change in strategy. Instead we’re going to continue to maintain our existing Mysql-based storage. We believe that this isn’t the time to make large scale migration to a new technology."
Twitter is busy fighting other fires, and they don't have the time to retrofit something that is (more or less) working, namely their MySQL-based tweet storage, with a completely new technology based on Cassandra.
This is the perfect opportunity to share a flow chart that I copied many years ago from a lone cubicle wall, deep in the heart of Dilbert-land. This may be something like the thought process Twitter went through in making their decision:
Flow Chart for Project Decision Making
I still get a laugh every time I read it. It's so true. Every company has these decisions to make when deciding where to put resources. Should you build, buy, rebuild, expand, or hang on with your fingernails? The soul of an engineer says do it right and start over. The best business decision might be quite different.
Joel Spolsky once declared rewriting working code from scratch to be one of the things you should never ever do. Remember Netscape? Many no longer do, but at one time they were the web. From those heights they were brought low, says Joel, because "they made the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch."
Twitter may not want to make the same mistake. Does this mean Cassandra and NoSQL suck? No, I think it's just smart project planning. It's actually OK to have multiple platforms for different purposes.
Does Twitter work well enough though? Brad McCarty, in Twitter implements more features; ignores its broken platform, makes a good case that Twitter should shore up its infrastructure before moving on. But the problems Twitter seems to be having are with facilities, not the core Tweet engine. Until recently Twitter's up-time has been pretty good. So fix that, keep the Tweet storage engine going, even if it's ugly, and start moving on with new competition-crushing features. Seem reasonable?
Some of this is inevitable. Cassandra has been cloistered inside the loving arms of Facebook and dedicated early adopters. Only recently has Cassandra left home and had to enter a very complicated world where customers have a very wide variety of needs. It will take a while for Cassandra to mature, for substantial new features to be added, for existing features to be rearchitected, and for people to figure out what Cassandra is really good at. Until then there will be problems. And for developers switching to a new tool chain there will be pain, as they will have to learn, relearn, and overcome a lot of obstacles. This process will not be pretty. So we will hear about problems with Cassandra and every other product out there. That's just what it means to be new. Reddit seems to be in this zone.
Is Facebook really abandoning Cassandra? Nope. Facebook still has a 150-node Cassandra cluster used for Inbox Search, which supports close to 500M users and over 150TB of data, and is growing rapidly every day. Now, it would be interesting to see the rate of adoption of Cassandra into other groups. When a new feature is being implemented, is Cassandra being selected internally? Is it still being developed? We don't know, but Facebook is still using Cassandra for a major feature.
Related Articles
- Twitter Changes Tweet Storage Strategy, Confirms Realtime Analytics Product by MG Siegler at TechCrunch.
- Why are Facebook, Digg, and Twitter so hard to scale?
- Six Ways Twitter May Reach its Big Hairy Audacious Goal of One Billion Users
- Database Drama by Jeremy Zawodny
- Twitter: Comparing its Velocity, not Downtime by Alex Williams
- Excellent discussion on Reddit
Reader Comments (19)
Did ... you just say that Twitter's uptime has been pretty good?
Do you not know what a failwhale is?
From a high-level architectural view, if you aren't growing you're dying. Cassandra's use in Facebook doesn't appear to be expanding. Considering I hear they have a scaling problem, it suggests that perhaps the actual use cases of Cassandra are not as wide as they appear... or perhaps it isn't scalable after all?
As for Twitter, they seem to have some smart people. I wonder why they couldn't make it work for them? Would you have a better chance?
Used to like your blog until you turned into a Cassandra fanboy. Cassandra != NoSQL. If Facebook might move from one NoSQL option to another, does that mean NoSQL is the problem? No. What do you think is going on over there? Cassandra != NoSQL. There are other options that work better depending on what you are trying to do. Does the emperor have no clothes? What emperor are we talking about? Cassandra != NoSQL
Just Rename High Scalability to Cassandra Scalabil, please don't remove your HighScalability tattoo yet. I just love that thing.
There's an answer on Quora from a FB engineer (I think) saying they have no plans to use Cassandra for anything else, but will keep using it for inbox search.
it seems it's still under development...
Recently Twitter did get a lot better:
http://www.publicstatic.net/2010/07/twitter-upgrades/
But it's still averaging just above 99% availability. That's not so good. :-)
http://demo.leemba.com/platform/?groupId=1&platformId=11
The uptime for Twitter has seemed to improve a bit as of late. But they need to continue to grow and improve with that growth. @jgwentworth
I came back today to get the link to the flow chart for a co worker but it is gone? What gives? You link to someone else and the image got them suspended? :)
Hey Mark, I'm not sure what's up. I serve most images off my Flickr account (http://www.flickr.com/photos/13733851@N00/) so I don't think I got suspended :-) It's there when I look.
what makes me anxious about Cassandra is that it was just sort of dumped in the open source arena. it has been a couple of years now, and the thing doesn't even exist in the official Maven repositories. this means that we have to spend a LOT of time dicking around with multiple versions of Thrift, Cassandra and all manner of unpleasantness.
this is why enthusiasm for Cassandra is, shall we say, "limited".
Digg is throwing down and saying they can be the exception to this rule. Quite a bet.
MongoDB is where it's at.
Does anyone seriously think that being in the official Maven repository is the ultimate evidence of a project's maturity, quality, or anything else at all? I'd almost say that non-participation in such an insular and half-assed attempt to displace real package managers with a language-specific one (which is by definition a big waste of everyone's time) is a sign of smart engineering and resource management. I don't see the Linux kernel there, so it must suck too. I don't see any of Cassandra's obvious competitors there either. Maybe some people set their sights beyond the asylum walls.
Digg is throwing down and saying they can be the exception to this rule. Quite a bet.
I really don't think the demise of Netscape was mainly because of the rewrite, even though it clearly was a bad move where Netscape was at the time. Rewrites are done all the time. (SourceForge.net is currently undergoing a rewrite, for example). Netscape's problem was that they didn't have control over all the functionality of the existing system, and tried to add features that affected the rewritten part at the same time, introducing even more bugs and problems.
I think Twitter's decision not to rewrite is a good one at this point, considering the trouble it has experienced. A rewrite might be a good decision at a later stage, though, whether they move to Cassandra or something else. They just need more control over what they already have, in my eyes.
"because they made the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch."
I find this very interesting, and having worked for a huge bank still using COBOL, it seems that this advice is widely followed. I'm wondering what you think about Digg's "rewriting from the ground up"?
Do we know enough about Digg's situation to make a judgement? I'm personally not so against rewriting, especially for core capabilities that make or break a company. Maybe Digg is in that situation? Twitter seemed to be in more of a Wouldn't It Be Nice situation, and in that case it's easier, if still distasteful, to nurse the old system along. In the case of older apps there simply may not be enough ROI to justify the investment.
"single worst strategic mistake"
I generally like and agree with Joel. Really, I do. But it's been my experience that working code is not enough. If that code is not proven robust, and if that code is core to what you do, and if no one in your organization knows how it works, that code cannot be left alone until the day one of its inputs or dependencies causes it to fail. That risk is typically greater than the one you will take in sending someone into that code with the mission to (1) understand it and (2) if it's cryptic and arcane, make it less so. There is also the option to (3) take all that custom stuff we wrote, like MyCompanyLinkedList, and point it at a 3rd party library so the code base we are talking about shrinks to the core logic we are selling.
I'm certainly not trying to fully contradict Joel's observations, but I think they are far too broadly applied. In the case of Big Data problems you have to rewrite / refactor earlier than you do with other projects because there is a huge tail of testing and data conversion that is typically not a factor in other projects. Just my perspective on this, though.
Most social networks (like Facebook) have stopped using MySQL as their main database and switched to Cassandra or another NoSQL DB. We can consider this change a big boost for this new open-source data store, Cassandra, which was originally developed by Facebook to solve the problem of inbox search and to be fast, reliable, and able to handle read and write requests at the same time.
source: Why does large Social Network projects switch to use Cassandra instead of Mysql?