High Scalability -

1 Comment |

Permalink |

cloud

Tuesday

Jul132010

DbShards Part Deux - The Internals

Tuesday, July 13, 2010 at 7:45AM

This is a follow up article by Cory Isaacson to the first article on DbShards, Product: dbShards - Share Nothing. Shard Everything, describing some of the details about how DbShards works on the inside.

The dbShards architecture is a true “shared nothing” implementation of Database Sharding. The high-level view of dbShards is shown here:

The above diagram shows how dbShards works for achieving massive database scalability across multiple database servers, using native DBMS engines and our dbShards components. The important components are:

7 Comments |

Permalink |

Product

Monday

Jun282010

VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process

Monday, June 28, 2010 at 7:36AM

What do you get when you take a SQL database and start a new implementation from scratch, taking advantage of the latest research and modern hardware? Mike Stonebraker, the sword wielding Johnny Appleseed of the database world, hopes you get something like his new database, VoltDB: a pure SQL, pure ACID, pure OLTP, shared nothing, sharded, scalable, lockless, open source, in-memory DBMS, purpose-built for running hundreds of thousands of transactions a second. VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra, and 45 times faster than Oracle, with near-linear scaling.

Will VoltDB kill off the new NoSQL upstarts? Will VoltDB cause a mass extinction of ancient databases? Probably no and no to both questions, but it's a product with a definite point-of-view and is worth a look as the transaction component in your system. But will it be right for you? Let's see...

21 Comments |

Permalink |

databases,

nosql

Wednesday

Jun232010

Product: dbShards - Share Nothing. Shard Everything.

Wednesday, June 23, 2010 at 7:37AM

I met the CodeFutures folks, makers of dbShards, at Gluecon. They occupy an interesting niche in the database space, somewhere between NoSQL, which jettisons everything SQL, and high end analytics platforms that completely rewrite the backend while keeping a SQL facade.

High concept: I think of dbShards as a sort of commercial OLTP mashup of features from HSCALE (partitioning) + MySQL Proxy (transparent intermediate layer) + Memcached (client side sharding) + Gigaspaces (parallel query) + MySQL (transactions).

You may find dbShards interesting if you are looking to keep SQL, need scale out writes and reads, need out of the box parallel query capabilities, and would prefer to use a standard platform like MySQL as a base. To learn more about dbShards I asked Cory Isaacson (CEO and CTO) a few devastatingly difficult questions (not really).

Who are you, what is dbShards, and what problem was dbShards created to solve?

1 Comment |

Permalink |

MySQL,

Shard,

databases

Thursday

Apr292010

Product: SciDB - A Science-Oriented DBMS at 100 Petabytes

Thursday, April 29, 2010 at 6:42AM

Scientists are doing it for themselves. Doing what? Databases. The idea is that most databases are designed to meet the needs of businesses, not science, so scientists are banding together at scidb.org to create their own Domain Specific Database, for science. The goal is to be able to handle datasets in the 100PB range and larger.

SciDB, Inc. is building an open source database technology product designed specifically to satisfy the demands of data-intensive scientific problems. With the advice of the world's leading scientists across a variety of disciplines including astronomy, biology, physics, oceanography, atmospheric sciences, and climatology, our computer scientists are currently designing and prototyping this technology

The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure. Many of the world's leading computer scientists with expertise in database systems have contributed to the design and architecture of the system to meet the needs of the world's scientists.

SciDB looks like a cool project and follows what might be considered a trend, instead of beating a general tool into submission, build a specialized tool that does what you need it to do. More details about SciDB can be found in the paper A Demonstration of SciDB: A Science-Oriented DBMS. A nice succinct poster is available summarizing the product.

Some interesting bits from the paper:

4 Comments |

Permalink |

nosql

Friday

Apr092010

Vagrant - Build and Deploy Virtualized Development Environments Using Ruby

Friday, April 9, 2010 at 8:38AM

One of the cool things we are seeing is more tools and tool chains for performing very high level operations quite simply. Vagrant is such a tool for building and distributing virtualized development environments.

Web developers use virtual environments every day with their web applications. From EC2 and Rackspace Cloud to specialized solutions such as EngineYard and Heroku, virtualization is the tool of choice for easy deployment and infrastructure management. Vagrant aims to take those very same principles and put them to work in the heart of the application lifecycle. By providing easy to configure, lightweight, reproducible, and portable virtual machines targeted at development environments, Vagrant helps maximize your productivity and flexibility.

If you've created a build and deployment system before Vagrant does a lot of the work for you:

4 Comments |

Permalink |

automation

Monday

Mar222010

7 Secrets to Successfully Scaling with Scalr (on Amazon) by Sebastian Stadil

Monday, March 22, 2010 at 2:42AM

This is a part interview part guest with Sebastian Stadil, founder of Scalr, a cheaper open-source version of RightScale. Scalr takes care of all the web site infrastructure bits to on Amazon (and other clouds) so you don’t have to.

I first met Sebastian at one of the original Silicon Valley Cloud Computing Group meetups, a group which he founded. The meetings started in the tiny offices of Intalio where Sebastian was working with this new fangled Amazon thing to create an auto-scaling server farm on EC2. I remember distinctly how Sebastian met everyone at the door with a handshake and a smile, making us all feel welcome. Later I took one of the classes he created on how to use AWS. I guess he figured all this cloud stuff was going somewhere and decided to start Scalr.

My only regret about this post is that the name Amazon does not begin with the letter ‘S’, that would have made for an epic title.

4 Comments |

Permalink |

amazon,

cloud

Tuesday

Jan262010

Product: HyperGraphDB - A Graph Database

Tuesday, January 26, 2010 at 11:30AM

With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.

From the NoSQL Archive the summary on HyperGraphDB is: API: Java (and Java Langs), Written in:Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web.

So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was:

A HyperGraphDB database is a generalized graph of entities. The generalization is two-fold:

Links/edges "point to" an arbitrary number of elements instead of just two as in regular graphs

Links can be pointed to by other links as well.

OK, but I wish there was some explanation of why this is valuable. What can I do with it that I can't do with normal graphs? Given that there have been concerns over the complexity of the API this would seem a natural topic to cover. I assume it's cool, it sounds cool, but I would like to know why :-)

In any case it looks like an interesting product to take a look at. Database options are expanding fast.

10 Comments |

Permalink |

graph

Friday

Nov062009

Product: Resque - GitHub's Distrubuted Job Queue

Friday, November 6, 2009 at 7:43AM

Queuing work for processing in the background is a time tested scalability strategy. Queuing also happens to be one of those much needed tools where it easy enough to forge for your own that we see a lot of different versions made. Resque is GitHub's take on a job queue and they've used it to process million and millions of jobs so far.

What is Resque?

Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP, and Kestrel, but found them all wanting in one way are another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.