Monday, April 21, 2008

The Search for the Source of Data - How SimpleDB Differs from a RDBMS

Update 2: Yurii responds with the Top 10 Reasons to Avoid Document Databases FUD.
Update: Top 10 Reasons to Avoid the SimpleDB Hype by Ryan Park provides a well-written counter-take. Am I really that fawning? If so, doesn't that make me a dear?

All your life you've used a relational database. At the tender age of five you banged out your first SQL query to track your allowance. Your RDBMS allegiance was just assumed, like your politics or religion would have been assumed 100 years ago. They now say (you know them) that relations won't scale and we have to do things differently. New databases like SimpleDB and BigTable are what's different. As a long-time RDBMS user, what can you expect of SimpleDB? That's what Alex Tolley of MyMeemz.com set out to discover. Like many brave explorers before him, Alex reported on his adventures to the Royal Society of the AWS Meetup. He told a wild tale of cultures and practices so different from our own that you could almost not believe him. But Alex brought back proof.

Using a relational database is a no-brainer when you have a big organization behind you. Someone else worries about the scaling, the indexing, backups, and so on. When you are out on your own, there's no one to hear you scream when your site goes down. In these circumstances you just want a database that works and that you never have to worry about again. That's what attracted Alex to SimpleDB. It's trivial to set up and use, no schema is required, you can insert data on the fly with no upfront preparation, and it will scale with no work on your part. You become free from DIAS (Database Induced Anxiety Syndrome). You don't have to think about or babysit your database anymore. It will just work. And from a business perspective your database becomes a variable cost rather than a high fixed cost, which is excellent for the angel food funding. Those are very nice features in a database. But for those with a relational database background, there are some major differences that take getting used to.

No schema. You don't have to define a schema before you use the database. SimpleDB is an attribute-value store, and you can use any attributes you like, any time you like. It doesn't care. Very different from the Victorian world of the RDBMS.

No joins. In relational theory the goal is to minimize update and deletion anomalies by normalizing your data into separate tables related by keys. You then join those tables together when you need the data back. In SimpleDB there are no joins. For many-to-1 relationships this works out great: in SimpleDB an attribute can have multiple values, so there's no need for a join to recover all the values. They are stored together. For many-to-many relationships life is not so simple. You must code them by hand in your program. This is a common theme in SimpleDB: what the RDBMS does for you automatically must generally be coded by hand with SimpleDB. The wages of scale are more work for the programmer. What a surprise.
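To make the difference concrete, here is a minimal sketch that uses plain Python dicts as a hypothetical stand-in for SimpleDB domains; it does not call the real SimpleDB API. The `put`, `query`, and `students_of_teacher` helpers and the student/course data are invented for illustration. A multi-valued `course` attribute keeps all of a student's courses on one item, so reading them back needs no join, while "which students does a given teacher have?" is a many-to-many traversal you write yourself.

```python
from collections import defaultdict


def put(domain, item, attrs):
    """Store attribute values on an item; no schema, attributes appear on the fly.

    domain: dict of item name -> {attribute name -> set of string values}.
    A list value adds several values to one multi-valued attribute.
    """
    record = domain.setdefault(item, defaultdict(set))
    for name, values in attrs.items():
        record[name].update(values if isinstance(values, list) else [values])


def query(domain, attr, value):
    """Names of items whose (possibly multi-valued) attr contains value."""
    return sorted(name for name, rec in domain.items() if value in rec.get(attr, ()))


def students_of_teacher(students, courses, teacher):
    """Hand-coded many-to-many "join": teacher -> courses -> students."""
    result = set()
    for course in query(courses, "teacher", teacher):
        result.update(query(students, "course", course))
    return sorted(result)


courses, students = {}, {}
put(courses, "cs101", {"teacher": "hopper"})
put(courses, "cs201", {"teacher": "hopper"})
put(courses, "ma101", {"teacher": "noether"})
# One item holds all of a student's courses: no enrollment table, no join.
put(students, "ann", {"course": ["cs101", "ma101"]})
put(students, "bob", {"course": "cs201"})
put(students, "cat", {"course": ["ma101"]})

print(sorted(students["ann"]["course"]))  # ['cs101', 'ma101']
print(students_of_teacher(students, courses, "hopper"))  # ['ann', 'bob']
```

In an RDBMS the teacher-to-students question is a single query over a join table; here each hop is an explicit loop in application code.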

Two-step query process. In an RDBMS you can select which columns are returned in a query. Not so in SimpleDB. A SimpleDB query returns only record IDs, not the values of the records. You need to make another trip to the database to get the record contents. So to minimize your latency you would need to spawn off multiple threads to fetch records in parallel. See, more work for the programmer.
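A minimal sketch of the two-step pattern with a thread pool. Everything here is hypothetical: `STORE` is an in-memory stand-in for a remote domain, `query_ids` plays the role of the ID-only query, `get_attributes` simulates one network round trip per item with a `time.sleep`, and the assumption is that contents are fetched one item at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data standing in for a remote domain.
STORE = {f"item{i}": {"title": f"Title {i}", "price": str(i * 10)} for i in range(20)}


def query_ids(predicate):
    """Step 1: the query returns only item names (IDs), no attribute values."""
    return [name for name, attrs in STORE.items() if predicate(attrs)]


def get_attributes(item_id):
    """Step 2: a separate round trip per item to fetch its attributes."""
    time.sleep(0.05)  # simulated network latency for one request
    return item_id, STORE[item_id]


def fetch_records(predicate, max_workers=10):
    ids = query_ids(predicate)
    # Run the per-item fetches concurrently: with N threads, wall-clock time is
    # roughly one round trip per batch of N items instead of one per item.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(get_attributes, ids))


records = fetch_records(lambda attrs: int(attrs["price"]) >= 150)
print(sorted(records))  # ['item15', 'item16', 'item17', 'item18', 'item19']
```

Threads suit this job because the work is waiting on I/O, not computing; the pool size caps how many requests are in flight at once.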

No sorting. Records are not returned in sorted order. Values for multi-value attribute fields are not returned in sorted order either. That means if you want sorted results you must do the sorting yourself. It also means you must get all the results back before you can sort them. More work for the programmer.
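A minimal sketch of client-side sorting once the whole result set is in hand. The `sort_results` helper and the sample `results` dict are hypothetical; the code assumes, as with SimpleDB, that attribute values come back as strings, which is why numeric fields must be converted before comparing (as strings, "100" sorts before "9").

```python
def sort_results(records, key_attr, numeric=False):
    """Sort a complete, unordered result set on the client.

    records: dict of item name -> {attribute -> list of string values}.
    Values arrive as strings, so numeric=True converts them before comparing.
    """
    def sort_key(item):
        name, attrs = item
        values = attrs[key_attr]
        # A multi-valued attribute sorts by its smallest value.
        smallest = min(int(v) for v in values) if numeric else min(values)
        return (smallest, name)  # item name breaks ties deterministically

    # Multi-valued attributes are unordered too, so sort each value list.
    return [
        (name, {attr: sorted(vals) for attr, vals in attrs.items()})
        for name, attrs in sorted(records.items(), key=sort_key)
    ]


results = {
    "b": {"price": ["100"], "tag": ["sale", "blue"]},
    "a": {"price": ["9"], "tag": ["red"]},
    "c": {"price": ["25"], "tag": ["new", "green", "blue"]},
}
print([name for name, _ in sort_results(results, "price", numeric=True)])  # ['a', 'c', 'b']
print([name for name, _ in sort_results(results, "price")])  # ['b', 'c', 'a']
```

Because `sort_results` needs every record up front, its cost includes fetching the entire result set first, which is exactly the point made above.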

Broken cursor. SimpleDB returns only 250 results at a time. When there are more results, you page through the result set using a token mechanism. The kicker is that you must iterate through the result set sequentially, so iterating through a large result set will take a while, and you can't use your secret EC2 weapon of massive cheap CPU to parallelize the process. More work for the programmer: since you'll never be able to read fast enough to perform your calculations in a low-latency environment, you have to move logic from the read part of the process to the write part.
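A minimal sketch of why token paging is inherently sequential. `query_page` is a hypothetical stand-in for one paged request over a made-up result set in `ALL_IDS`; real tokens are opaque strings, and this one is just an offset. Each call needs the token returned by the previous call, so there is nothing to hand to a second thread.

```python
PAGE_SIZE = 250  # per-request result limit described above

# Hypothetical full result set for one query.
ALL_IDS = [f"item{i:05d}" for i in range(1000)]


def query_page(next_token=None):
    """Hypothetical stand-in for one paged query request.

    Returns (page_of_ids, next_token); next_token is None on the last page.
    Real tokens are opaque; this one is just an offset encoded as a string.
    """
    start = int(next_token) if next_token else 0
    end = start + PAGE_SIZE
    return ALL_IDS[start:end], (str(end) if end < len(ALL_IDS) else None)


def fetch_all():
    """Walk the result set page by page.

    Each request needs the token from the previous response, so the pages
    cannot be fetched in parallel: N pages always cost N sequential round trips.
    """
    ids, token, round_trips = [], None, 0
    while True:
        page, token = query_page(token)
        round_trips += 1
        ids.extend(page)
        if token is None:
            return ids, round_trips


ids, round_trips = fetch_all()
print(len(ids), round_trips)  # 1000 4
```

Round trips grow linearly with result size, which is why precomputing what you need at write time is the main way to keep reads short.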

The promise of scaling is fulfilled. Alex tested retrieving 10 record IDs from databases of three different sizes. With 1K records it took an average of 141 msecs to retrieve the 10 record IDs; with 100K records, 266 msecs on average; and with 1,000K records, an average of 433 msecs. It's not fast, but it is relatively consistent: a thousandfold increase in database size only roughly tripled the latency. That seems to be a theme with these databases. BigTable isn't exactly a speed demon either. One could conclude that, for certain needs at least, SimpleDB scales well enough that you can feel comfortable your database won't bottleneck your system or cause it to crash under load.

If you have a complex OLAP-style database, SimpleDB is not for you. But if you have a simple structure, you want ease of use, and you want it to scale without your ever lifting a finger again, then SimpleDB makes sense. The cost is that everything you currently know about using databases is useless, and all the cool things we take for granted that a database does, SimpleDB does not do.

SimpleDB shifts work out of the database and onto programmers, which is why the SimpleDB programming model sucks: it requires a lot more programming to do simple things. I'll argue, however, that this is the kind of suckiness programmers like. Programmers like problems they can solve with more programming. We don't even care how twisted and inelegant the code is because we can make it work. And as long as we can make it work we are happy. What programmers can't do is make the database scalable through more programming; that is not a problem more programming can solve. So for programmers the right trade-off was made: a scalable database you don't have to worry about, in exchange for more programming work you already know how to do. How does that sound?

Related Articles



  • The new attack on the RDBMS by techno.blog("Dion")
  • The End of an Architectural Era (It’s Time for a Complete Rewrite) - A really fascinating paper bolstering many of the anti-RDBMS threads that have popped up on the intertubes.

Reader Comments (19)

    Thanks for this article. You neatly summed up some things I've been thinking about -- see my article Why Programmers Don’t Like Relational Databases (http://typicalprogrammer.com/programming/programmers-vs-rdbms/).

    Programmers jump on every bandwagon that promises to replace relational databases. SimpleDB and BigTable are just the latest offerings. There are real uses for things like SimpleDB, just like there are real uses for plain text files or XML over an RDBMS. That doesn't mean that SimpleDB or BigTable will replace or kill off relational database systems, though.

    You wrote "Programmers like problems they can solve with more programming." I will add that programmers like problems they can solve without learning anything hard. In that respect programmers are a lot like people who talk about dieting and exercise and losing weight, because talking substitutes for actually making hard lifestyle changes. Learning enough relational theory and SQL to appreciate relational databases is hard -- it's not something that can be slammed through in a weekend with a "Head First" book. Like chubby, sweaty newbies at the gym, programmers learning RDBMSs will have a hard time at first, and they may be ridiculed by DBAs and programmers more proficient with relational databases. If they stick to it there's a big payoff, but most programmers give up and decide that relational databases are old technology, "enterprisey," legacy, too rigid, etc.

    Like a magic diet pill on a late-night infomercial there's always something like SimpleDB out there that promises quick fixes without the hard work. There's a whole culture of conspiracy-minded programmers out there blogging about how RDBMSs suck and how some new technology is better, but it's kept down by so-called relational supremacists. Sometimes the new technology is actually something like hierarchical or multivalued databases that went extinct before most programmers were born. SimpleDB looks appealing because it doesn't demand learning anything new and hard, just that the programmer write a lot of code to solve problems already addressed and optimized in even the most low-end RDBMSs. If all you know about RDBMSs is the received wisdom found on blogs ("they suck") then you don't know what you're giving up with tools like SimpleDB, or when those tools might not be appropriate choices.

    December 31, 1999 | Unregistered Commenter Greg Jorgensen

    Those coming from a Hibernate/JPA background should check out the SimpleJPA project: http://code.google.com/p/simplejpa/

    It is essentially a JPA implementation over SimpleDB. While it will obviously have the restrictions present with SimpleDB (no joins, etc), it does a great job of handling the other issues for the user such as the two step query process. I've found it much easier to get my existing schemas up and running using it rather than doing all calls to SimpleDB natively.

    December 31, 1999 | Unregistered Commenter jsjenkins168

    I'd argue that programmers aren't afraid of learning how to make RDBMSes faster... what they DON'T like is learning techniques that are only useful for one single application. After you learn the basics about database performance -- and avoid WTFs in your code -- the scalability tips and tricks are vastly different depending on the database. And what about after the project is done? Who will perform the care and feeding of the database after the solution is live and the developer has moved on?

    Developers would prefer a simpler model that requires knowing developer tricks that are useful across databases... the dangers of broken cursors, multithreaded programming, stale caches, etc. At least now the programmers have control over the environment, no matter who the DBA is.

    SimpleDB may be sub-optimal... but for environments that lack a helpful, competent DBA, it's probably a really good route to take.

    December 31, 1999 | Unregistered Commenter bex

    I am really enjoying watching RDBMS fans squirm as they try to figure out how to disparage this new wave of databases that will trounce their current skillsets.

    This isn't OODB all over again. These distributed DBs (bigtable, simpledb, couchdb, et al) are ALREADY powering the most powerful sites on the internets. They aren't some theoretical wonder spewed from some academic. They are for real.

    So, you are right. Traditional RDBMS won't go away. They will simply be relegated to toy-db status. And as we watch RDBMS fans make poor choices by sticking to their guns and then having their apps crumble at the exact time they become popular (because RDBMS don't scale), they will win ever more scorn.

    December 31, 1999 | Unregistered Commenter Jackson

    Dear Jackson,

    Traditional RDBMS won't go away. They will simply be relegated to toy-db status.

    Umm, no. They will be relegated to jobs that need multi-row transactional integrity. If you do not understand why, please refrain from writing any applications that handle money.

    December 31, 1999 | Unregistered Commenter Joshua Haberman

    Amen.

    RDBMS's aren't going anywhere.

    December 31, 1999 | Unregistered Commenter Jon Gilkison

    If Google's AppEngine is an indication of what the future of these non-RDBMS systems will look like, I want nothing to do with it. AppEngine's so slow, a crappy P4 server with MySQL or PSQL will outperform it. Period!

    Fact is, most web app developers DO NOT have scalability issues. They are struggling to GET users! The way they get and retain users is with a GREAT user experience. You cannot provide a great user experience with SimpleDB or AppEngine. They're just too slow, don't show changes immediately, are cumbersome, and take you forever to program what all these other DBs do for you for free.

    This is as silly as coding your app in assembly so it runs faster. I think we all know where this leads... a total failure.

    So, if you're a developer and you're reading this, if you have scaling issues, you can easily buy another server since it costs next to nothing. Do not waste your time reinventing the wheel and sacrificing user experience... this will lead to a disaster.

    December 31, 1999 | Unregistered Commenter Anonymous

    This is a joke right?

    December 31, 1999 | Unregistered Commenter Anonymous

    "Fact is, most of the web app developers DO NOT have scalability issues. They are struggling to GET users!"

    Having vertical scalability is a nice benefit, but the real reason to use this new breed of databases is that they fit the web model better than a traditional RDBMS. SQL has its place, but your blog, for example, is not one of them.

    December 31, 1999 | Unregistered Commenter Anonymous

    I like the 'no schema'. It's much more suited to many web applications.

    Many times, a version control system with good indexing (OLAP) would be a much better model than a RDBMS.

    December 31, 1999 | Unregistered Commenter Anonymous

    The "don't show changes immediately" is a really bad feature though...

    December 31, 1999 | Unregistered Commenter Anonymous

    Todd, thanks for the link to my site. By the way, the rest of the content on your site is *awesome* -- very informative and interesting. I just disagreed with this particular article. No schema, no sorting, and no joins "differences" from an RDBMS; I think they're fundamental problems in many data storage situations.

    December 31, 1999 | Unregistered Commenter Ryan Park

    My name is Yurii, not Yurri :)

    December 31, 1999 | Unregistered Commenter Yurii Rashkovskii

    My apologies. Oddly even with you pointing out the mistake it took me forever to see the difference. Explains my problems at scrabble :-)

    December 31, 1999 | Unregistered Commenter Todd Hoff

    In web applications I think that both kinds of databases have their place.

    Web app data often has two distinct segments. One segment is composed of read-intensive data that doesn't have to be joined or sorted, but there is tons of it. For this data, SimpleDB, BigTable, or CouchDB look like good choices because you can scale out the number of entries without doing much programming. You don't need the smarts and complexity of an RDBMS here. In the case of Amazon, you pay proportionally to your use. eBay listings are a good example of this kind of data.

    The other data segment does have to be joined and sorted, and handling insert and update anomalies is critical. Also, you want to add features, optionality, and data-driven behaviour to your web application without a lot of programming. For this kind of data, an RDBMS is very useful because you can scale out the number of paid users, sales, and data-driven features without doing much programming. A good example is any web application functionality that involves money or optional features.

    One factor that could tip the selection in favour of the SimpleDB style databases is that it is probably easier to use a program that searches and indexes the database because table joins are not used to determine key search concepts. That is only a gut feeling at this point, but Google seems to have done well with the general idea.

    December 31, 1999 | Unregistered Commenter Jay Godse
    December 31, 1999 | Unregistered Commenter farhaj

    Hi Todd,

    I've just open-sourced a high-level C# library for SimpleDB named Simple Savant. The project is here: http://www.codeplex.com/SimpleSavant

    It addresses some of the limitations of working with SimpleDB vs a relational database. You may want to check it out.

    Regards,
    Ashley

    December 31, 1999 | Unregistered Commenter Ashley Tate

    The link for Yurii's article is broken...

    January 19, 2010 | Unregistered Commenter Cyberelfo

    Sybase 12 is a perfectly useful relational database.

    Sure, it's probably not going to win any performance prizes, nor does it handle massive scale, nor is its T-SQL syntax the most modern, but it does the job. If we were to start over we'd probably go for Oracle or DB2, but with JPA we can migrate if there is ever the time or the need. At the moment there is no time and certainly no need.

    So when someone comes along and tells me SimpleDB will make everything better, I say so what?

    We'll take a look at Couch and Mongo and Simple if we need a loose data model, but until then I'll trade all of that for an aging but reliable platform with millions of lines of battle-hardened code plugged into it.

    Sure, we're a super-tanker and aren't going to be able to turn around quickly (or go much faster), but we carry a lot of cargo and are running at full speed already and have been steaming along for a decade. Sure, you can try and catch up but we have cashflow (which is still growing incidentally) so if we really need to evolve faster I'll promote the grad who is doing peripheral projects trying new technologies out for us to a more central role. Until then I'm taking his enthusiastic evangelisms with the salt I've had to sweat out over my career supporting the numerous ways we have used to reliably store and retrieve data for our (happy) clients and business users.

    Everyone in IT eventually realises that the 'new' is usually just a more polished and easier version of the 'old'. We adopt the new to save time and clean things up, not because the world has suddenly changed. Graph and hierarchical databases are no exception; they have been discussed and used for all of my career, and as such they will always have a place.

    But please, don't assume any technology is 'the killer app' until you've seen it happen. Most just add to the mix of choices we all have to make our lives easier. At the moment I don't see SimpleDB making my team's life any easier; in fact, I see migrating to it breaking everything and getting us all fired.

    November 7, 2011 | Unregistered Commenter Dan
