Friday, December 14, 2007

The Current Pros and Cons List for SimpleDB

Not surprisingly, opinions on SimpleDB range from "it sucks, don't take my database away" to "it will change the world, who needs a database anyway?" From a quick survey of the blogosphere, here's where SimpleDB stands at the moment:

SimpleDB Cons



  • No SLA. We don't know how reliable it will be, how fast it will be, or how consistent the performance will be.
  • Consistency constraints are relaxed. Reading data immediately after a write may not reflect the latest updates. To programmers used to transactions, this may be surprising, but many people think this is one of the tradeoffs that needs to be made to scale.
  • The database is a core competency. If you don't control your database, you can't outcompete your rivals.
  • When your database is out of your control, you can't guarantee it will work properly. You can't create the proper indexes or apply other optimizations.
  • No join or IN operator. You'll need to make multiple client-side calls to simulate joins, which will be slow.
  • No stored procedures, referential integrity, and other relational goodies. This is not a professional product.
  • Attribute size limited to 1024 bytes. It's not designed for content serving.
  • Latency from outside Amazon will be high.
  • Setting up and maintaining a database is cheap and easy these days, so why bother? It costs too much compared to running your own servers.
  • What happens when you need to super scale to very large datasets?
  • No API support from common languages like PHP, Ruby, etc.
  • All your existing code and infrastructure needs to be rewritten.
  • Not geographically distributed with nearest datacenter routing.
  • Queries are lexicographical, so you’ll need to store data in a form that sorts correctly as strings. As Inside Looking Out says, this means zero-padding your integers, adding positive offsets to negative integer sets, and converting dates into something like ISO 8601.
  • Attribute values are typeless, which could lead to a lot of type-related errors and inefficient queries.
  • The 10 GB maximum per domain is too limiting.
  • It's not Dynamo. Amazon is keeping the really good stuff to themselves.
  • Text searching is not supported. You'll need to construct your own fast search indexes.
  • Queries are limited to 5 seconds running time. It's only for getting and setting, nothing more SQLish.
  • No cloned APIs for unit testing. You need to be able to develop locally against other data stores.
  • Your data is under Amazon's control, so there could be security and privacy problems.
  • The XML based protocol unnecessarily increases overhead, latency, and cost.
  • Lock-in. If you decide to leave Amazon’s cloud, how do you move all your data and get a similar system up and working outside the cloud?
  • Open cash register. Since SDB is charged on use, a malicious user can simply set up a loop to query your site, which costs you an unbounded amount of money.
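The lexicographical-query point above is easier to judge with a concrete encoding in hand. The following is a minimal Python sketch, not SimpleDB client code: the names `encode_int`, `decode_int`, and `encode_datetime`, and the `OFFSET` and `WIDTH` bounds, are assumptions chosen for illustration.

```python
from datetime import datetime, timezone

# Assumed bounds for illustration; pick values that fit your own data.
OFFSET = 10**9   # added to every integer so negatives become non-negative
WIDTH = 10       # digits needed for the largest shifted value

def encode_int(n: int) -> str:
    """Offset and zero-pad an integer so string order matches numeric order."""
    if not -OFFSET <= n < OFFSET:
        raise ValueError("value outside the assumed range")
    return str(n + OFFSET).zfill(WIDTH)

def decode_int(s: str) -> int:
    """Reverse encode_int."""
    return int(s) - OFFSET

def encode_datetime(dt: datetime) -> str:
    """Normalize to UTC and emit fixed-width ISO 8601, which sorts chronologically."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Without the padding, the string "10" sorts before "9"; with it, string comparison and numeric comparison agree for every value inside the assumed range. Note that `astimezone` treats a naive datetime as local time, so passing timezone-aware values is the safer habit.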

SimpleDB Pros



  • SimpleDB is not a relational database. Relational databases are too complex and don't scale well. Keeping data access simple is a selling point, not a weakness.
  • Low setup costs and pay-as-you-go expansion make it perfect for startups. The price is reasonable given the functionality and the hands-off administration.
  • Setting up and maintaining a highly available clustered database that is constantly growing is extremely difficult. Building your application on a building block that does all this for you adds a lot of value.
  • Setting up a database inside EC2 is a pain. SimpleDB makes getting basic database functionality trivial. No need to worry about scaling, capacity planning, or partitioning.
  • It has a decent query language, which is unusual for this type of data store.
  • Data is stored across multiple nodes, which supports parallel query execution.
  • It's built on Erlang and that's cool.
  • You don't need to seek funding to hire a database team and buy hardware.


Depending on how you weight each factor, SimpleDB could be way behind or way ahead of other options. What's interesting is to see what people think is important. For many people the only real database is relational, and if it doesn't have transactions, joins, etc., it's not real. Databases, like beauty, seem to be in the eye of the beholder.
Reader Comments (19)

    1) SLA: It is in beta after all. S3 didn't have an SLA while it was in beta, and it does now. I'm sure SimpleDB will have one when it comes out of beta.

    2) The consistency may not be an issue for a lot of applications. Many applications, those built on a messaging system for example, don't need immediate readability. The site does say writes are generally readable within a second.

    3) SOAP and REST are common, standard APIs, which means any language can use them. You can bet that within a few weeks the developer community will provide API wrappers for the various languages, much like they do for S3.

    4) It isn't supposed to be a relational database. Not every system needs a relational database; they are used by just about every system only because they are so common. To claim it isn't a professional product is a bit premature. I could say the same thing about, say, WebSphere, because it is such a horrible implementation of Java EE even though it has been around for many, many years. Oh wait, I just did...

    5) The lexicographical approach is IMHO the worst 'feature'. Surely it couldn't have been that hard to define a simple schema allowing at least numbers. Yuck. That alone almost makes me not even want to take a look at it.

    6) I would bet their dataset limitations will be lifted once it is out of beta. Look at EC2, for example. During beta they only had the little server available; now you can get a pretty good machine and have a few choices.

    7) Their site mentions an interesting scenario: using SimpleDB's query capability to return keys for items stored in S3.

    December 31, 1999 | Unregistered CommenterRobert
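Point 2 in the comment above, together with the relaxed-consistency con in the list, suggests one way a client might cope when it really must see its own write: poll until the value shows up. This is a minimal Python sketch under that assumption; `read_until` and its `read` callable are hypothetical helpers, not part of any SimpleDB API.

```python
import time

def read_until(read, expected, timeout=2.0, interval=0.1):
    """Poll read() until it returns `expected` or `timeout` seconds elapse.

    `read` is a hypothetical zero-argument callable standing in for a
    remote get against an eventually consistent store.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if value == expected:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("write not visible yet")
        time.sleep(interval)
```

Polling adds latency and extra billable requests, so it probably only makes sense for the few code paths that cannot tolerate a stale read.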

    You miss the point completely. This is not supposed to be a relational database at all. The database may be a core competency, but that doesn't mean you have to host it yourself.
    Most of the other points are also off track.

    December 31, 1999 | Unregistered CommenterAnonymous

    > This is not supposed to be a relational database at all.

    Correct. The list collects problems various people have with SimpleDB; it is not my personal opinion. And to the extent people think a database must be relational to be a real database, they will find fault with SimpleDB.

    December 31, 1999 | Unregistered CommenterTodd Hoff

    On the contrary, the lack of joins is a blessing. Once your data reaches the millions of records, you start removing all join queries. I'm sure many would think of this as a "premature optimization", but if you're building to scale you might as well think about these matters from the get-go. Think parallel execution of gets. That's definitely better.

    December 31, 1999 | Unregistered CommenterCyril David
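The "parallel execution of gets" idea in the comment above, and the client-side join mentioned in the cons list, might look roughly like this. It is a Python sketch against in-memory dictionaries: `ORDERS`, `CUSTOMERS`, `get_attributes`, and `join_orders_with_customers` are all hypothetical stand-ins, and a real client would make a network call per get.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two SimpleDB domains.
ORDERS = {
    "o1": {"customer": "c1", "total": "0000000042"},
    "o2": {"customer": "c2", "total": "0000000007"},
}
CUSTOMERS = {"c1": {"name": "Ada"}, "c2": {"name": "Grace"}}

def get_attributes(domain, item_name):
    """Placeholder for one remote get; returns {} for a missing item."""
    return domain.get(item_name, {})

def join_orders_with_customers(order_ids):
    """Simulate a join: fetch orders in parallel, then their customers in parallel."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        orders = list(pool.map(lambda oid: get_attributes(ORDERS, oid), order_ids))
        # Deduplicate customer ids so each customer is fetched only once.
        customer_ids = sorted({o["customer"] for o in orders if "customer" in o})
        fetched = pool.map(lambda cid: get_attributes(CUSTOMERS, cid), customer_ids)
        customers = dict(zip(customer_ids, fetched))
    return [
        {**o, "customer_name": customers[o["customer"]].get("name")}
        for o in orders
        if "customer" in o
    ]
```

The join costs two round-trip "waves" instead of one query, but each wave runs concurrently, so wall-clock time depends on the slowest get rather than the sum of all of them.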

    A year ago SimpleDB didn't have "DB" in the name, it was called something different. When I heard they were putting "DB" in the name I thought this was a mistake that would lead to lots of confusion. Mhh....

    Try this: call it "Super Hashtable" and then read the specs. Ahhh, finally one of these "distributed hashtables" with enough features to be really useful.

    December 31, 1999 | Unregistered CommenterThorsten

    There is a better way to get the pros of SimpleDB while paying a smaller price on the cons, relative to existing databases. I refer to it as PaaS – Persistence as a Service (using Hibernate): http://natishalom.typepad.com/nati_shaloms_blog/2007/09/paas-persistenc.html
    The basic idea is that you front-end your database with an In-Memory Data Grid (IMDG), which takes care of synchronization with whatever database you choose in the background. The application interacts only with the IMDG.
    This way you get the speed of memory and can still keep your system fully in sync with the existing database, so other applications, such as reporting systems, can keep working with the database as if nothing had changed. One option is to run the in-memory data grid as the data cloud on EC2 while the database still lives inside your existing network. The data grid delivers updates to the database reliably, so you don't even need to think about moving the data to EC2.

    Nati S.
    GigaSpaces (http://www.gigaspaces.com)
    Write Once Scale Anywhere

    December 31, 1999 | Unregistered CommenterNati Shalom

    > – Persistence as a Service (using Hibernate)

    Who sets up, maintains, backs up, replicates, fails over, and scales the database Hibernate goes against?

    December 31, 1999 | Unregistered CommenterTodd Hoff

    > Who sets up, maintains, backs up, replicates, fails over, and scales the database Hibernate goes against?

    The main idea of PaaS (Persistence as a Service) is decoupling the application from the underlying database. Doing that relaxes many of the complex requirements on your existing database. Let me explain:
    You don't have to use a clustered database, since scaling and performance are achieved through the In-Memory Data Grid. In addition, the volume of updates that hits the database is reduced, since you persist only the end result of each transaction, not the in-flight transactions. And because updates to the database are done asynchronously, they happen in batches, which lets you push more updates through the database without adding more instances.
    What's even more interesting is that you don't even need a fully highly available database: if it fails, the in-memory data grid acts as a front-end data store for your application and queues all updates while the database is down. It automatically replays those updates once the database is brought back to life.

    So, back to your question: the Hibernate mapping also happens in the background.

    We were able to achieve 30k ops/sec of consistent updates very easily against a standard single instance of MySQL. Read scalability was not an issue, since all reads happen in memory.

    HTH
    Nati S.
    GigaSpaces (http://www.gigaspaces.com)
    Write Once Scale Anywhere

    December 31, 1999 | Unregistered Commenternatis
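The write-behind pattern described in the comment above (serve reads from memory, persist only the latest value per key, batch the writes, and replay them after a database outage) can be sketched in a few lines of Python. This is a toy, single-threaded illustration and not GigaSpaces code; `WriteBehindCache` and the `db_write_batch` callable are hypothetical names.

```python
class WriteBehindCache:
    """Toy write-behind cache: memory serves reads, the database gets batched writes."""

    def __init__(self, db_write_batch, batch_size=100):
        self.data = {}          # in-memory copy that serves all reads
        self.pending = {}       # latest unpersisted value per key, in insertion order
        self.db_write_batch = db_write_batch  # hypothetical: takes a list of (key, value)
        self.batch_size = batch_size

    def put(self, key, value):
        self.data[key] = value
        # Coalesce: only the end result for each key is persisted.
        self.pending.pop(key, None)
        self.pending[key] = value

    def get(self, key):
        return self.data.get(key)

    def flush(self):
        """Persist queued updates in batches; keep them queued if the database is down."""
        while self.pending:
            batch = list(self.pending.items())[: self.batch_size]
            try:
                self.db_write_batch(batch)
            except ConnectionError:
                return False    # queue stays intact and is replayed on the next flush
            for key, _ in batch:
                del self.pending[key]
        return True
```

A production data grid would also need durability for the pending queue itself, concurrency control, and a background flusher; none of that is modeled here.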

    I think SimpleDB is designed around the requirements of Amazon and not of an ERP-like application. For Amazon it is OK to miss data. The same goes for Google: how would you know they missed a page matching your search criteria? Google and Amazon have the luxury of not displaying 100% accurate data. These database designs make it very clear how difficult it is to make an RDBMS scalable. But Salesforce has done that. So shouldn't Salesforce be doing this?

    December 31, 1999 | Unregistered CommenterBPMGuy

    Large web and search engine companies like Google, Yahoo and, as said, Amazon are a good fit for SimpleDB because they can easily get away with the minor losses it causes; others cannot.

    December 31, 1999 | Unregistered Commenterfarhaj

    I still think the database is only one half of the equation. They should make "database appliances" instead, basically high speed multi-core CPU with solid state disks and lots and lots of memory. There's only so much one can push a database, before they have to shift their focus to the hardware part.

    December 31, 1999 | Unregistered Commenterearth4energy

    There are way too many cons..

    December 31, 1999 | Unregistered CommenterEarth4Energy Scam

    No consistency? Sometimes for a sequence of queries, you really want to know your last one has executed.
    No API support? Hey, why not leave the join out too.

    Well, I guess they did call it "simple" ...

    December 31, 1999 | Unregistered Commenterjamorama

    I'm all for SimpleDB. Having an alternative to ordinary databases, along with the added manpower to get them working, saves a lot of costs.

    December 31, 1999 | Unregistered CommenterCellphone

    Not really a fan of having data under the control of additional parties.

    December 31, 1999 | Unregistered CommenterCell Phone

    SimpleDB needs more features implemented to counter scripting.

    December 31, 1999 | Unregistered CommenterDigital Camera

    Simple Database is necessary.

    December 31, 1999 | Unregistered Commenterearth 4 energy

    The cons are just too much. What the hell is this?

    December 31, 1999 | Unregistered Commenterwotlk leveling guide

    I used SimpleDB before, but I wasn't happy with its performance. :S

    December 31, 1999 | Unregistered Commenteredu backlinks
