Entries in nosql (54)

Monday
Dec062010

What the heck are you actually using NoSQL for?

It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? How can I make all the tools work together?

In the NoSQL space this kind of real-world data is still a bit vague. When asked, vendors tend to give very general answers like NoSQL is good for BigData or key-value access. What does that mean for the developer in the trenches who has to solve a specific problem and faces a dozen confusing choices with no obvious winner? Not a lot. It's often hard to take that next step and imagine how their specific problems could be solved in a way that's worth the trouble and risk.

Let's change that. What problems are you using NoSQL to solve? Which product are you using? How is it helping you? Yes, this is part of the research for my webinar on December 14th, but I'm a huge believer that people learn best by example, so if we can come up with real, specific examples I think that will really help people visualize how they can make the best use of all these new product choices in their own systems.

Here's a list of use cases I came up with after some trolling of the interwebs. The sources are so varied I can't attribute every one, so I'll put a list at the end of the post. Please feel free to add your own. I separated out the use cases for a few specific products simply because I had a lot of use cases for them and they were clearer on their own. This is not meant as an endorsement of any sort. Here's a master list of all the NoSQL products. If you would like to provide a specific set of use cases for a product I'd be more than happy to add that in.

Click to read more ...

Thursday
Oct282010

NoSQL Took Away the Relational Model and Gave Nothing Back

Update: Benjamin Black said he was the source of the quote and also said I was wrong about what he meant. His real point: The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them.

At A NoSQL Evening in Palo Alto, an audience member (sorry, I couldn't tell who) said something I found really interesting: NoSQL took away the relational model and gave nothing back.

The idea being that NoSQL has focussed on ease of use, scalability, performance, etc., but it has lost the idea of how data relates to other data. True to its name, the relational model is very good at capturing and managing relationships. With NoSQL all relationships have been pushed back onto the poor programmer to implement in code rather than having the database manage them. We've sacrificed usability. NoSQL is about concurrency, latency, and scalability, but it's not about data.

My ears perked up because I said something similar a while back while commenting on VoltDB's criticism of the NoSQL transaction model: I agree completely that moving the repair logic to the programmer is a recipe for disaster. Having programmers worry about read repair, vector clocks, the commutativity of transactions, how to design compensatory transactions to make up for previous failed transactions, and the other very careful bits of design, is asking for a very fragile system. ACID transactions are clean and understandable and that's why people like them.

Relationships are very much in the same spirit. Managing relationships without explicit support or multiple object transaction support puts a huge burden on the programmer. At one level, key-value systems are far simpler to use. That's great. But for more complex data all that work really comes back and falls on the overburdened shoulders of the programmer. What I liked about this comment during the event is that it put an emphasis on making the programmer's life easier across a wider variety of use cases, which is always a good thing, and it was worth surfacing.
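To make that burden concrete, here is a minimal sketch of a "follows" relationship kept in a key-value store. Everything in it is an assumption made for illustration: a plain Python dict stands in for the store, and the key names and the put, get, add_follow, and delete_user helpers are hypothetical, not any product's API.

```python
# A plain dict stands in for a key-value store (hypothetical, not a real product).
store = {}

def put(key, value):
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

def add_follow(follower, followee):
    """Record 'follower follows followee' in both directions.

    The store knows nothing about the relationship, so the application
    has to keep the two lists consistent by itself.
    """
    following = set(get(f"following:{follower}", []))
    followers = set(get(f"followers:{followee}", []))
    following.add(followee)
    followers.add(follower)
    put(f"following:{follower}", sorted(following))
    # If the process dies here, the two keys disagree and only
    # application-level repair code can fix it.
    put(f"followers:{followee}", sorted(followers))

def delete_user(user):
    """Remove a user and every reference to it; there is no cascading delete."""
    for other in get(f"following:{user}", []):
        remaining = [u for u in get(f"followers:{other}", []) if u != user]
        put(f"followers:{other}", remaining)
    for other in get(f"followers:{user}", []):
        remaining = [u for u in get(f"following:{other}", []) if u != user]
        put(f"following:{other}", remaining)
    store.pop(f"following:{user}", None)
    store.pop(f"followers:{user}", None)
```

In a relational database the same thing would be one join table with foreign keys, and the two-step write would sit inside a single transaction. Here both the bookkeeping and the failure handling live in application code.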

Related Articles

  • Interesting Hacker News Thread, especially the commentary on state of the art IO subsystems.

Thursday
Oct282010

Notes from A NOSQL Evening in Palo Alto 

I, along with 180 other people and a veritable who's who of NoSQL vendors, attended the A NoSQL Evening in Palo Alto meetup on Tuesday. The format was a panel of 10 vendors--10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality--sitting in two rows of chairs in front of what seemed like a pretty diverse audience. Tim Anglade (founder, A NOSQL Summer) moderated. Tim kept things moving by asking a few leading questions and the panel chimed in with answers. Quite a few questions came from the audience, which was refreshing.

Overall a genial evening with some good discussion. I was pleased that the panel members didn't just automatically slip into marketing speak. Most of the discussions were on point rather than just another excuse to hit the talking points. There were some complaints about the talk not being technical enough, but I don't think that was really the purpose of this kind of talk. The panel format is excellent at giving a wide range of views on general topics, and that's exactly how the evening went.

Some key takeaways:

  • Good energy. A lot of people are trying to do good things and are excited to be in a space where technology still matters more than politics. Real problems are being solved for customers and that's motivating.
  • NoSQL took away the relational model and gave nothing back. Using NoSQL for complex data puts way too much pressure on the programmer.
  • NoSQL will not converge. There's no consensus on what the next thing will be, so we are unlikely to see any standardization in the NoSQL world any time soon. There is a convergence on some features, but it seems the products will evolve to serve specific markets. This is not a bad thing. NoSQL doesn't need to converge on one stack. Products can remain differentiated by being able to solve specific problems.
  • NoSQL has a parallel to the "back to the land movement". As the relational world and the framework world got ever more complex and expensive, a counter movement developed that sought out simplicity and transparency. 

Click to read more ...

Sunday
Sep052010

Hilarious Video: Relational Database vs NoSQL Fanbois

This is so funny I laughed until I cried! Definitely NSFW. OMG it's hilarious, but it's also not a bad overview of the issues. Especially loved: You read the latest post on HighScalability.com and think you are a f*cking Google and architect and parrot slogans like Web Scale and Sharding but you have no idea what the f*ck you are talking about. There are so many more gems like that.

Thanks to Alex Popescu for posting this on MongoDB is Web Scale. Whoever made this deserves a Webby.

Wednesday
Sep012010

Paper: The Case for Determinism in Database Systems  

Can you have your ACID cake and eat your distributed database too? Yes, explains Daniel Abadi, Assistant Professor of Computer Science at Yale University, in an epic post, The problems with ACID, and how to fix them without going NoSQL, coauthored with Alexander Thomson, on their paper The Case for Determinism in Database Systems. We've already seen VoltDB offer the best of both worlds, but this sounds like a completely different approach.

The solution, they propose, is: 

Click to read more ...

Sunday
Jul112010

So, Why is Twitter Really Not Using Cassandra to Store Tweets?

A firestorm of accusations circled around recently saying that Cassandra, the elected-by-major-adopters emperor of the NoSQL movement, has no clothes. It was said Twitter was dumping Cassandra; Reddit outages were linked to Cassandra; and even Facebook, Cassandra's cradle of birth, was said to have abandoned Cassandra. Shouts of NoSQL Fail! were heard in the streets. Much gloating followed. Is the emperor really naked? Casually dressed maybe, but not naked.

(Note: after this point the article contains a flow chart that is NSFW. Some people are very sensitive about cussing, so if that's you, please go back, don't read on. Danger! There are no nude pictures or anything, just some strong language. But this is my most favorite flow chart of all time, so it's worth it :-)

Is Twitter really abandoning Cassandra?

Click to read more ...

Monday
Jun282010

VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process

What do you get when you take a SQL database and start a new implementation from scratch, taking advantage of the latest research and modern hardware? Mike Stonebraker, the sword wielding Johnny Appleseed of the database world, hopes you get something like his new database, VoltDB: a pure SQL, pure ACID, pure OLTP, shared nothing, sharded, scalable, lockless, open source, in-memory DBMS, purpose-built for running hundreds of thousands of transactions a second. VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra, and 45 times faster than Oracle, with near-linear scaling.

Will VoltDB kill off the new NoSQL upstarts? Will VoltDB cause a mass extinction of ancient databases? Probably no to both questions, but it's a product with a definite point of view and is worth a look as the transaction component in your system. But will it be right for you? Let's see...

Click to read more ...

Thursday
Apr292010

Product: SciDB - A Science-Oriented DBMS at 100 Petabytes

Scientists are doing it for themselves. Doing what? Databases. The idea is that most databases are designed to meet the needs of businesses, not science, so scientists are banding together at scidb.org to create their own Domain Specific Database, for science. The goal is to be able to handle datasets in the 100PB range and larger.

SciDB, Inc. is building an open source database technology product designed specifically to satisfy the demands of data-intensive scientific problems. With the advice of the world's leading scientists across a variety of disciplines including astronomy, biology, physics, oceanography, atmospheric sciences, and climatology, our computer scientists are currently designing and prototyping this technology.

The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure. Many of the world's leading computer scientists with expertise in database systems have contributed to the design and architecture of the system to meet the needs of the world's scientists.

SciDB looks like a cool project and follows what might be considered a trend: instead of beating a general tool into submission, build a specialized tool that does what you need it to do. More details about SciDB can be found in the paper A Demonstration of SciDB: A Science-Oriented DBMS. A nice succinct poster summarizing the product is also available.

Some interesting bits from the paper:

Click to read more ...

Monday
Apr122010

Poppen.de Architecture

This is a guest post by Alvaro Videla describing their architecture for Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung.

What is Poppen.de?

Poppen.de (NSFW) is the top dating website in Germany, and while it may be a small site compared to giants like Flickr or Facebook, we believe it's a nice architecture to learn from if you are starting to get some scaling problems.

The Stats

  • 2.000.000 users
  • 20.000 concurrent users
  • 300.000 private messages per day
  • 250.000 logins per day
  • We have a team of eleven developers, two designers and two sysadmins for this project.

Click to read more ...

Tuesday
Apr062010

Strategy: Make it Really Fast vs Do the Work Up Front

In Cool spatial algos with Neo4j: Part 1 - Routing with A* in Ruby, Peter Neubauer not only does a fantastic job explaining a complicated routing algorithm using the graph database Neo4j, but he also surfaces an interesting architectural conundrum: make it really fast so the work can be done on the reads, or do all the work on the writes so the reads are really fast.

The money quote pointing out the competing options is:

[Being] able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services.

The poster boy for the precompute strategy is SimpleGeo, a startup that is building a "scaling infrastructure for geodata." Their strategy for handling geodata is to use Cassandra and build two clusters: one for indexes and one for records. The records cluster is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the write, so reads are very fast. Ad hoc queries are not allowed. You can only search on what has been precomputed.
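A rough sketch of the do-the-work-up-front idea follows. It is not SimpleGeo's code or schema: two Python dicts stand in for the records and index clusters, and the one-degree grid cell, the key layout, and the write_place and read_places helpers are all made up for illustration.

```python
import math
from collections import defaultdict

records = {}               # stands in for the records cluster: id -> record
index = defaultdict(list)  # stands in for the index cluster: key -> [ids]

def cell(lat, lon):
    """Bucket a coordinate into a one-degree grid cell (a made-up scheme)."""
    return (math.floor(lat), math.floor(lon))

def write_place(place_id, name, category, lat, lon):
    """Do the expensive work at write time: one index entry per lookup scenario."""
    records[place_id] = {"name": name, "category": category, "lat": lat, "lon": lon}
    c = cell(lat, lon)
    index[("cell", c)].append(place_id)
    index[("cell+category", c, category)].append(place_id)

def read_places(key):
    """A read is one key lookup plus record fetches; nothing is scanned."""
    return [records[pid] for pid in index.get(key, [])]

write_place(1, "Cafe A", "cafe", 37.44, -122.16)
write_place(2, "Bookshop B", "bookstore", 37.45, -122.15)
write_place(3, "Cafe C", "cafe", 40.71, -74.01)

cafes_nearby = read_places(("cell+category", (37, -123), "cafe"))
by_name = read_places(("name", "Cafe A"))  # never indexed, so nothing comes back
```

The last line is the trade-off in miniature: a lookup that was not anticipated at write time returns nothing, because no index entry was ever built for it.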

What I think Peter is saying is because a graph database represents the problem in such a natural way and graph navigation is so fast, it becomes possible to run even large complex queries in real-time. No special infrastructure is needed.
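For contrast, here is a minimal sketch of the compute-on-read side: A* run at query time over a tiny in-memory graph. It is plain Python, not the Ruby and Neo4j code from Peter's post; the graph, coordinates, and edge costs are invented, and the straight-line heuristic assumes every edge costs at least the straight-line distance between its endpoints.

```python
import heapq
import math

def a_star(graph, coords, start, goal):
    """Return (cost, path) for the cheapest route, or (inf, []) if none exists.

    graph:  node -> list of (neighbor, edge_cost)
    coords: node -> (x, y), used for the straight-line heuristic
    """
    def heuristic(node):
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    best = {start: 0.0}                     # cheapest known cost to each node
    came_from = {}
    frontier = [(heuristic(start), start)]  # ordered by cost so far + estimate
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return best[goal], path[::-1]
        for neighbor, cost in graph.get(node, []):
            candidate = best[node] + cost
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                came_from[neighbor] = node
                heapq.heappush(frontier, (candidate + heuristic(neighbor), neighbor))
    return math.inf, []

# A made-up road network: waypoints with coordinates and directed road costs.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
graph = {
    "A": [("B", 1.0), ("C", 1.5)],
    "B": [("D", 2.0)],
    "C": [("D", 1.0)],
}
cost, route = a_star(graph, coords, "A", "D")
```

Because the route is computed when it is asked for, changing an edge cost in graph to reflect live conditions affects the very next query, with no index to rebuild.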

If you are creating a geo service, which approach would you choose? Before you answer, let's first ponder: is the graph database solution really solving the same problem as SimpleGeo is solving?

Click to read more ...