High Scalability -

Entries in nosql (54)

Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL

Tuesday, March 23, 2010 at 11:57AM

O'Reilly Radar's James Turner conducted a very informative interview with Joe Stump, current CTO of SimpleGeo and former lead architect at Digg, in which Joe makes some of his usual insightful comments on his experience using Cassandra vs MySQL. As Digg started out with a MySQL-oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you may find useful:

Click to read more ...

HighScalability Team |

23 Comments |

Strategy,

nosql

1 Billion Reasons Why Adobe Chose HBase

Tuesday, March 16, 2010 at 11:46AM

Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency.

One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost according to the Adobe team:

HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. The advantages far outweigh the problems. Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well. Distributed storage is delegated to HDFS, so is distributed processing, cluster state goes to Zookeeper. All these systems are developed and tested separately, and are good at what they do. More than that, this allows you to scale your cluster on separate vectors. This is not optimal, but it allows for incremental investment in either spindles, CPU or RAM. You don’t have to add them all at the same time.

Highly recommended, especially if you need some sort of balance to the recent gush of Cassandra articles.

HighScalability Team |

2 Comments |

Example,

nosql

MySQL and Memcached: End of an Era?

Friday, February 26, 2010 at 9:06AM

If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follows this pattern today, largely because with enough elbow grease, it works.
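To make the "glue code" concrete, here is a minimal sketch of that pattern: writes are routed to a shard by hashing the key, and reads go through a cache first. It is only an illustration under stated assumptions. Plain Python dicts stand in for the MySQL shards and for memcached, and the names `NUM_SHARDS`, `shard_for`, `write_user`, and `read_user` are hypothetical, not taken from any site's code.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

# Assumption: plain dicts stand in for MySQL shards and for memcached.
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}


def shard_for(key: str) -> dict:
    """Pick a shard by hashing the key (crc32 is stable across runs)."""
    return shards[zlib.crc32(key.encode("utf-8")) % NUM_SHARDS]


def write_user(key: str, record: dict) -> None:
    """Write to the owning shard, then invalidate any cached copy."""
    shard_for(key)[key] = dict(record)
    cache.pop(key, None)


def read_user(key: str):
    """Cache-aside read: try the cache first, fall back to the shard."""
    if key in cache:
        return cache[key]
    record = shard_for(key).get(key)
    if record is not None:
        cache[key] = record  # populate the cache for later reads
    return record
```

Even in this toy form, the application has to know about both the sharding rule and the cache invalidation rule, which is the elbow grease the paragraph above refers to.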

This was a pre-cloud, relational-database-dominated world, built from parts scrounged from the remnants of enterprises and datacenters past. Twitter and Digg started in this era, but are evolving into something different, as scaling pressures increase and new purpose-built technologies pop into being.

With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era.

Click to read more ...

HighScalability Team |

17 Comments |

Memcached,

MySQL,

nosql

Paper: High Performance Scalable Data Stores

Thursday, February 25, 2010 at 7:58AM

The world of scalable databases is not a simple one. They come in every race, creed, and color. Rick Cattell has brought some harmony to that world by publishing High Performance Scalable Data Stores, a nicely detailed one-stop-shop paper comparing scalable databases solely on the content of their character. Ironically, the first step in that evaluation is dividing the world into four groups:

Key-value stores: Redis, Scalaris, Voldemort, and Riak.
Document stores: CouchDB, MongoDB, and SimpleDB.
Record stores: BigTable, HBase, Hypertable, and Cassandra.
Scalable RDBMSs: MySQL Cluster, ScaleDB, Drizzle, and VoltDB.

The paper describes each system and then compares them on the dimensions of Concurrency Control, Data Storage, Replication, Transaction Model, General Comments, Maturity, K-hits, License, and Language.

And the winner is: there are no winners. Yet. Rick concludes by pointing to a great convergence:

I believe that a few of these systems will gain critical mass and key players, and will pull away from the others by next year. At that point, open source contributors will likely migrate to those players.

From the paper:

Click to read more ...

HighScalability Team |

6 Comments |

BigData,

nosql,

papers

Hot Scalability Links for February 24, 2010

Wednesday, February 24, 2010 at 8:57AM

Cassandra @ Twitter: An Interview with Ryan King. Great interview by Alex Popescu on Twitter's thought process for switching to Cassandra. Twitter chose Cassandra because it had more big system features out of the box. Is that Cassandra FTW?

I Had Downtime Today. Here’s What I’m Doing About It by Patrick McKenzie. Awesome deep dive into what went wrong with Bingo Card Creator. Sh*t happens. How do you design a process to help prevent it from happening and how do you deal with problems with integrity when they do?

High Availability Principle : Request Queueing by Ashish Soni. Queue requests to ride out traffic spikes: 1) Request Queuing allows your system to operate at optimal throughput. 2) Your users only experience linear degradation versus exponential degradation. 3) Your system experiences NO degradation.

pfffft twatter tweeter by Knowbuddy. The reason you should care [about NoSQL] is because now you have more options--you're not stuck trying to wedge your system into a relational model if you don't want to. And isn't /. all about freedom of choice?

Wordpress, Varnish and Edge Side Includes. Using Varnish to go from 0.63 requests per second to 537.44 requests per second.

Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop by Ashish Thusoo and Namit Jain. How does Facebook deal with 12 TB of compressed new data every day? They get a bad case of the Hives.

Click to read more ...

HighScalability Team |

Seven Signs You May Need a NoSQL Database

Tuesday, February 16, 2010 at 7:40AM

While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database...

Click to read more ...

HighScalability Team |

12 Comments |

nosql

NoSQL Means Never Having to Store Blobs Again

Wednesday, February 3, 2010 at 8:19AM

Morgan Tocker has an awesome article and comment thread in the MySQL Performance Blog about When should you store serialized objects in the database? Before the NoSQL age it was very common to simulate schemalessness by storing blobs in MySQL. Sharding was implemented by running multiple MySQL instances and spreading writes across them. While not ideal for the purpose, developers felt comfortable with MySQL. They knew how to install it, back it up, replicate it, in short: they knew how to make it work. Yet they also needed to store objects without the penalty of joins. Searches and aggregate queries were handled by indexes kept in separate tables, which kept that work off the fast path to objects.

This all made perfect sense. Usually we just want stuff to work and going with what you know is often the best path to that goal. And what we have known is MySQL. All the different pros and cons of this approach are covered wonderfully in the post.
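As a rough illustration of the blob-plus-index approach described above, the sketch below stores each object as one serialized blob keyed by id and keeps a separate index table for searches. It is a sketch under assumptions, not the scheme from the linked post: `sqlite3` stands in for MySQL, JSON stands in for whatever serialization a site used, and the table, column, and function names are all hypothetical.

```python
import json
import sqlite3

# Assumption: sqlite3 stands in for MySQL; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")
conn.execute(
    "CREATE TABLE index_by_city ("
    "city TEXT NOT NULL, entity_id TEXT NOT NULL, "
    "PRIMARY KEY (city, entity_id))"
)


def save_entity(entity_id: str, obj: dict) -> None:
    """Store the whole object as one serialized blob and refresh its index row."""
    body = json.dumps(obj).encode("utf-8")
    with conn:  # one transaction covers the blob and its index entry
        conn.execute(
            "REPLACE INTO entities (id, body) VALUES (?, ?)", (entity_id, body)
        )
        conn.execute("DELETE FROM index_by_city WHERE entity_id = ?", (entity_id,))
        if "city" in obj:
            conn.execute(
                "INSERT INTO index_by_city (city, entity_id) VALUES (?, ?)",
                (obj["city"], entity_id),
            )


def load_entity(entity_id: str):
    """Fast path: a single primary-key lookup, no joins."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


def find_ids_by_city(city: str) -> list:
    """Search path: consult the separate index table, not the blobs."""
    rows = conn.execute(
        "SELECT entity_id FROM index_by_city WHERE city = ? ORDER BY entity_id",
        (city,),
    )
    return [r[0] for r in rows]
```

New fields can be added to an object without a schema change, but every searchable field needs its own index table that the application must keep in sync.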

But the world has changed.

Click to read more ...

HighScalability Team |

Brian Aker's Hilarious NoSQL Stand Up Routine

Wednesday, November 25, 2009 at 2:02PM

Brian Aker gave this 10-minute lightning talk on NoSQL at the Nov 2009 OpenSQLCamp in Portland, Oregon. It's incredibly funny, probably because there's a lot of truth to what he's saying.

Here are the slides and here are the notes. Found through #nosql.

HighScalability Team |

10 Comments |

funny,

nosql

10 NoSQL Systems Reviewed

Monday, November 9, 2009 at 8:04AM

In NoSQL Ecosystem, Jonathan Ellis reviews the origin of the NoSQL movement and compares 10 different NoSQL products on 1) support for multiple datacenters, 2) the ability to add new machines to a live cluster transparently to your applications, 3) data model, 4) query API, and 5) persistence design. The 10 systems reviewed are: Cassandra, CouchDB, HBase, MongoDB, Neo4j, Redis, Riak, Scalaris, Tokyo Cabinet, Voldemort.

A very thorough and thoughtful article on the entire NoSQL space. It's clear from the article that NoSQL is not monolithic; there is a very wide variety of approaches to not being a relational database.

Click to read more ...

General Chicken |

3 Comments |

nosql

A Yes for a NoSQL Taxonomy

Thursday, November 5, 2009 at 7:50AM

NorthScale's Steven Yen in his highly entertaining NoSQL is a Horseless Carriage presentation has come up with a NoSQL taxonomy that thankfully focuses a little more on what NoSQL is, than what it isn't:

key‐value‐cache
- memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta
key‐value‐store
- keyspace, flare, schema‐free, RAMCloud
eventually‐consistent key‐value‐store
- dynamo, voldemort, Dynomite, SubRecord, MotionDb, Dovetaildb
ordered‐key‐value‐store
- tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
data‐structures server
- redis
tuple‐store
- gigaspaces, coord, apache river
object database
- ZopeDB, db4o, Shoal
document store
- CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris
wide columnar store
- BigTable, HBase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

"Who will win?" Steven asks. He answers: the most approachable API with enough power will win. Steven predicts that the contender with the most devastating knockout punch will be document stores because "everyone groks documents." He also expects that there will be just a few winners and that products will converge in functionality.

Steven is banking on the "worse is better" model of dominance, which is hard to argue with as it has been so successful an adoption pattern in our field. The convergence idea is something I also agree with. What we have now are a lot of features masquerading as products. Over time they will merge together to become more full-featured offerings.

The key question, though, is what is enough power to win? Just getting a value back for a key won't be enough. Who are you putting your money on?

Click to read more ...

HighScalability Team |

13 Comments |

key-value store,

nosql,

papers