Entries in Product (120)

Friday
Aug272010

OpenStack - The Answer to: How do We Compete with Amazon?

The Silicon Valley Cloud Computing Group had a meetup Wednesday on OpenStack, whose tag line is the open source, open standards cloud. I was shocked at the large turnout. 287 people registered and it looked like a large percentage of them actually showed up. I wonder, was it the gourmet pizza, the free t-shirts, or are people really that interested in OpenStack? And if they are really interested, why are they that interested? On the surface an open cloud doesn't seem all that sexy a topic, but with contributions from NASA, from Rackspace, and from a very avid user community, a lot of interest there seems to be. 

The brief intro blurb to OpenStack is:

Click to read more ...

Tuesday
Jul132010

DbShards Part Deux - The Internals

This is a follow up article by Cory Isaacson to the first article on DbShards, Product: dbShards - Share Nothing. Shard Everything, describing some of the details about how DbShards works on the inside.

The dbShards architecture is a true “shared nothing” implementation of Database Sharding. The high-level view of dbShards is shown here:

The above diagram shows how dbShards works for achieving massive database scalability across multiple database servers, using native DBMS engines and our dbShards components. The important components are:

Click to read more ...

Monday
Jun282010

VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process

What do you get when you take a SQL database and start a new implementation from scratch, taking advantage of the latest research and modern hardware? Mike Stonebraker, the sword wielding Johnny Appleseed of the database world, hopes you get something like his new database, VoltDB: a pure SQL, pure ACID, pure OLTP, shared nothing, sharded, scalable, lockless, open source, in-memory DBMS, purpose-built for running hundreds of thousands of transactions a second. VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra, and 45 times faster than Oracle, with near-linear scaling.

Will VoltDB kill off the new NoSQL upstarts? Will VoltDB cause a mass extinction of ancient databases? Probably no and no to both questions, but it's a product with a definite point-of-view and is worth a look as the transaction component in your system. But will it be right for you? Let's see...

Click to read more ...

Wednesday
Jun232010

Product: dbShards - Share Nothing. Shard Everything.

I met the CodeFutures folks, makers of dbShards, at Gluecon. They occupy an interesting niche in the database space, somewhere between NoSQL, which jettisons everything SQL, and high end analytics platforms that completely rewrite the backend while keeping a SQL facade.

High concept: I think of dbShards as a sort of commercial OLTP mashup of features from HSCALE (partitioning) + MySQL Proxy (transparent intermediate layer) + Memcached (client side sharding) + Gigaspaces (parallel query) + MySQL (transactions).

You may find dbShards interesting if you are looking to keep SQL, need scale out writes and reads, need out of the box parallel query capabilities, and would prefer to use a standard platform like MySQL as a base. To learn more about dbShards I asked Cory Isaacson (CEO and CTO) a few devastatingly difficult questions (not really).

Who are you, what is dbShards, and what problem was dbShards created to solve?

Click to read more ...

Thursday
Apr292010

Product: SciDB - A Science-Oriented DBMS at 100 Petabytes

Scientists are doing it for themselves. Doing what? Databases. The idea is that most databases are designed to meet the needs of businesses, not science, so scientists are banding together at scidb.org to create their own Domain Specific Database, for science. The goal is to be able to handle datasets in the 100PB range and larger.

SciDB, Inc. is building an open source database technology product designed specifically to satisfy the demands of data-intensive scientific problems. With the advice of the world's leading scientists across a variety of disciplines including astronomy, biology, physics, oceanography, atmospheric sciences, and climatology, our computer scientists are currently designing and prototyping this technology

The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure. Many of the world's leading computer scientists with expertise in database systems have contributed to the design and architecture of the system to meet the needs of the world's scientists.

SciDB looks like a cool project and follows what might be considered a trend, instead of beating a general tool into submission, build a specialized tool that does what you need it to do. More details about SciDB can be found in the paper A Demonstration of SciDB: A Science-Oriented DBMS. A nice succinct poster is available summarizing the product.

Some interesting bits from the paper:

Click to read more ...

Friday
Apr092010

Vagrant - Build and Deploy Virtualized Development Environments Using Ruby

One of the cool things we are seeing is more tools and tool chains for performing very high level operations quite simply. Vagrant is such a tool for building and distributing virtualized development environments.

Web developers use virtual environments every day with their web applications. From EC2 and Rackspace Cloud to specialized solutions such as EngineYard and Heroku, virtualization is the tool of choice for easy deployment and infrastructure management. Vagrant aims to take those very same principles and put them to work in the heart of the application lifecycle. By providing easy to configure, lightweight, reproducible, and portable virtual machines targeted at development environments, Vagrant helps maximize your productivity and flexibility.

If you've created a build and deployment system before Vagrant does a lot of the work for you:

Click to read more ...

Monday
Mar222010

7 Secrets to Successfully Scaling with Scalr (on Amazon) by Sebastian Stadil

This is a part interview part guest with Sebastian Stadil, founder of Scalr, a cheaper open-source version of RightScale. Scalr takes care of all the web site infrastructure bits to on Amazon (and other clouds) so you don’t have to.

I first met Sebastian at one of the original Silicon Valley Cloud Computing Group meetups, a group which he founded. The meetings started in the tiny offices of Intalio where Sebastian was working with this new fangled Amazon thing to create an auto-scaling server farm on EC2. I remember distinctly how Sebastian met everyone at the door with a handshake and a smile, making us all feel welcome. Later I took one of the classes he created on how to use AWS. I guess he figured all this cloud stuff was going somewhere and decided to start Scalr.

My only regret about this post is that the name Amazon does not begin with the letter ‘S’, that would have made for an epic title.

Click to read more ...

Tuesday
Jan262010

Product: HyperGraphDB - A Graph Database

With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.

From the NoSQL Archive the summary on HyperGraphDB is: API: Java (and Java Langs), Written in:Java,  Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web.

So it has some interesting features, like software transactional memory and P2P  for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was:

A HyperGraphDB database is a generalized graph of entities. The generalization is two-fold:

  1. Links/edges "point to" an arbitrary number of elements instead of just two as in regular graphs 
  2. Links can be pointed to by other links as well.

OK, but I wish there was some explanation of why this is valuable. What can I do with it that I can't do with normal graphs? Given that there have been concerns over the complexity of the API this would seem a natural topic to cover. I assume it's cool, it sounds cool, but I would like to know why :-)

In any case it looks like an interesting product to take a look at. Database options are expanding fast.

Click to read more ...

Friday
Nov062009

Product: Resque - GitHub's Distrubuted Job Queue

Queuing work for processing in the background is a time tested scalability strategy. Queuing also happens to be one of those much needed tools where it easy enough to forge for your own that we see a lot of different versions made. Resque is GitHub's take on a job queue and they've used it to process million and millions of jobs so far.

What is Resque?

Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP,  and Kestrel, but found them all wanting in one way are another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.

Click to read more ...

Thursday
Oct222009

Paper: The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM 

Stanford Info Lab is taking pains to document a direction we've been moving for a while now, using RAM not just as a cache, but as the primary storage medium. Many quality products have built on this model. Even if the vision isn't radical, the paper does produce a lot of data backing up the transition, which is in itself helpful. From the The Abstract:
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale grace-fully to meet the needs of large-scale Web applications, and improvements in disk capacity have far out-stripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.

Related Articles