Entries in nosql (54)

Tuesday
Mar232010

Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL

O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usually insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you find useful:

Click to read more ...

Tuesday
Mar162010

1 Billion Reasons Why Adobe Chose HBase 

Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. 

One of the knocks against HBase has been it's complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team:

HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. The advantages far outweigh the problems. Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well. Distributed storage is delegated to HDFS, so is distributed processing, cluster state goes to Zookeeper. All these systems are developed and tested separately, and are good at what they do. More than that, this allows you to scale your cluster on separate vectors. This is not optimal, but it allows for incremental investment in either spindles, CPU or RAM. You don’t have to add them all at the same time.

Highly recommended, especially if you need some sort of balance to the recent gush of Cassandra articles. 

Friday
Feb262010

MySQL and Memcached: End of an Era?

If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follow this pattern today, largely because with enough elbow grease, it works.

This was a pre-cloud, relational database dominated world, built from parts scrounged from the remnants of enterprises and datacenters past. Twitter and Digg started in this era, but are evolving into something different, as scaling pressures increase and new purpose built technologies pop into being.

With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era.

Click to read more ...

Thursday
Feb252010

Paper: High Performance Scalable Data Stores 

The world of scalable databases is not a simple one. They come in every race, creed, and color. Rick Cattell has brought some harmony to that world by publishing High Performance Scalable Data Stores, a nicely detailed one stop shop paper comparing scalable databases soley on the content of their character. Ironically, the first step in that evaluation is dividing the world into four groups:

  • Key-value stores: Redis, Scalaris, Voldmort, and Riak.
  • Document stores: Couch DB, MongoDB, and SimpleDB.
  • Record stores: BigTable, HBase, HyperTable, and Cassandra.
  • Scalable RDBMSs: MySQL Cluster, ScaleDB, Drizzle, and VoltDB.

The paper describes each system and then compares them on the dimensions of Concurrency Control, Data Storage Replication, Transaction Model, General Comments, Maturity, K-hits, License Language.

And the winner is: there are no winners. Yet. Rick concludes by pointing to a great convergence:

I believe that a few of these systems will gain critical mass and key players, and will pull away from the others by next year.  At that point, open source contributors will likely migrate to those players.

From the paper:

 

Click to read more ...

Wednesday
Feb242010

Hot Scalability Links for February 24, 2010

  • Cassandra @ Twitter: An Interview with Ryan King. Great interview by Alex Popescu on Twitter's thought process for switching to Cassandra. Twitter chose Cassandra because it had more big system features out of the box. Is that Cassandra FTW?
  • I Had Downtime Today. Here’s What I’m Doing About It by Patrick McKenzie. Awesome deep dive into went wrong with Bingo Card Creator. Sh*t happens. How do you design a process to help prevent it from happening and how do you deal with problems with integrity when they do?
  • High Availability Principle : Request Queueing by Ashish Soni. Queue request to ride out traffic spikes: 1) Request Queuing allows your system to operate at optimal throughput. 2) Your users only experience linear degradation versus exponential degradation. 3) Your system experiences NO degradation.
  • pfffft twatter tweeter by Knowbuddy. The reason you should care [about NoSQL] is because now you have more options--you're not stuck trying to wedge your system into a relational model if you don't want to. And isn't /. all about freedom of choice?
  • Wordpress, Varnish and Edge Side Includes. Using Varnish to go from .63 requests per second to 537.44 requests per second.
  • Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop by Ashish Thusoo and Namit Jain. How does Facebook deal with 12 TB of compressed new data everyday? They get a bad case of the Hives.
  • Click to read more ...

    Tuesday
    Feb162010

    Seven Signs You May Need a NoSQL Database

    While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database...

    Click to read more ...

    Wednesday
    Feb032010

    NoSQL Means Never Having to Store Blobs Again

    Morgan Tocker has an awesome article and comment thread in the MySQL Performance Blog about When should you store serialized objects in the database? Before the NoSQL age is was very common to simulate schemalessness by storing blobs in MySQL. Sharding was implemented by running multiple MySQL instances and spreading writes across them. While not ideal for the purpose, developers felt comfortable with MySQL. They knew how to install it, back it up, replicate it, in short:  they knew how to make it work. Yet they also needed to store objects without the penalty of joins. Searches and aggregate queries were handled by indexes kept in separate tables, this offloaded the fast path to objects.

    This all made perfect sense. Usually we just want stuff to work and going with what you know is often the best path to that goal. And what we have known is MySQL. All the different pros and cons of this approach are covered wonderfully in the post.

    But the world has changed.

    Click to read more ...

    Wednesday
    Nov252009

    Brian Aker's Hilarious NoSQL Stand Up Routine

    Brian Aker gave this 10 minute lightning talk on NoSQL at the Nov 2009 OpenSQLCamp in Portland, Oregon. It's incredibly funny, probably because there's a lot of truth to what he's saying.

    Here are the slides and here are the notes. Found though #nosql.

    Monday
    Nov092009

    10 NoSQL Systems Reviewed

    Jonathan Ellis reviews in the NoSQL Ecosystem the origin of the NoSQL movement and 10 different NoSQL products and how their 1) support for multiple datacenters,  2) the ability to add new machines to a live cluster transparently to the your applications, 3) Data Model, 4) Query API, 5) Persistence Design. The 10 systems reviewed are: Cassandra, CouchDB, HBase, MongoDB, Neo4J, Redis, Riak, Scalaris, Tokyo Cabinet, Voldemort.

    A very thorough and thoughtful article on the entire NoSQL space. It's clear from the article that NoSQL is not monolithic, there is a very wide variety of approaches to not being a relational database.

    Click to read more ...

    Thursday
    Nov052009

    A Yes for a NoSQL Taxonomy

    NorthScale's Steven Yen in his highly entertaining NoSQL is a Horseless Carriage presentation has come up with a NoSQL taxonomy that thankfully focuses a little more on what NoSQL is, than what it isn't:

    • key‐value‐cache
      • memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracoqa
    •  key‐value‐store
      • keyspace, flare, schema‐free, RAMCloud
    • eventually‐consistent key‐value‐store
      • dynamo, voldemort, Dynomite, SubRecord, Mo8onDb, Dovetaildb
    • ordered‐key‐value‐store
      • tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
    • data‐structures server
      •  redis
    • tuple‐store
      • gigaspaces, coord, apache river
    • object database
      • ZopeDB, db4o, Shoal
    • document store
      •  CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Perservere, Riak Basho, Scalaris
    • wide columnar store
      • BigTable, Hbase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

    "Who will win?" Steven asks. He answers:  the most approachable API with enough power will win. Steven touts the contender with the most devastating knock out punch will be document stores because "everyone groks documents." Though the thought is there will be just a few winners and products will converge in functionality.

    Steven is banking on the "worse is better" model of dominance, which is hard to argue with as it has been so successful an adoption pattern in our field. The convergence idea is something I also agree with. What we have now are a lot features masquerading as products. Over time they will merge together to become more full featured offerings.

    The key question though is what is enough power to win? Just getting a value back for a key won't be enough. Who are you putting your money on?

    Click to read more ...