Entries in Product (120)

Saturday, Oct 25, 2008

Product: Puppet the Automated Administration System

Update: Digg on their choice and use of Puppet. They chose Puppet over cfengine and bcfg2 because they liked Puppet's resource abstraction layer (RAL), the ability to implement configuration management incrementally, support for bundles, and the overall design philosophy.

Puppet implements a declarative (what, not how) configuration language for automating common administration tasks. It's the system every large site writes for themselves, and it's already made for you! iLike was able to "easily" scale from 0 to hundreds of servers using Puppet. I can't believe I've never seen this before. It looks really cool. What is Puppet and how can it help you scale your website operations? From the Puppet website:

Puppet has been developed to help the sysadmin community move to building and sharing mature tools that avoid the duplication of everyone solving the same problem. It does so in two ways:

  • It provides a powerful framework to simplify the majority of the technical tasks that sysadmins need to perform.
  • The sysadmin work is written as code in Puppet's custom language, which is shareable just like any other code.

This means that your work as a sysadmin can get done much faster, because you can have Puppet handle most or all of the details, and you can download code from other sysadmins to help you get done even faster. The majority of Puppet implementations use at least one or two modules developed by someone else, and there are already tens of recipes available in Puppet's CookBook.

This sounds good. But does it work in the field? HJK Solutions' Adam Jacob says it does:

Puppet enables us to get a huge jump-start on building automated, scalable, easy to manage infrastructures for our clients. Using Puppet, we:

  1. Automate as much of the routine systems administration tasks as possible.
  2. Get 10 minute unattended build times from bare metal, most of which is data transfer. Puppet takes it the rest of the way, getting the machines ready to have applications deployed on them. It's down to two and a half minutes for Xen.
  3. Bootstrap our clients' production environments while building their development environments. I can't stress how cool this really is. Because we are expressing the infrastructure at a higher level, when it comes time to deploy your production systems, it's really a non-event. We just roll out the Puppet Master and an operating system auto-install environment, and it's finished.
  4. Cross-pollinate between clients with similar architectures. We work with several different shops using Ruby on Rails, all of whom have very similar infrastructure needs. By using Puppet in all of them, when we solve a problem for one client, we've effectively solved it for the others. I love being able to tell a client that we solved a problem for them, and all it's going to cost is the time it takes for us to add the recipe.

Puppet, today, is a tool that is good enough to handle the vast majority of issues encountered in building scalable infrastructures. Even the places where it falls short are almost always just a matter of it being less elegant than it could be, and the entire community is working on making those parts better.

Related Articles

  • Operations is a competitive advantage... (Secret Sauce for Startups!) by Jesse Robbins
  • Infrastructure 2.0 by John Willis
  • Puppet, iLike and Infrastructure 2.0 by John Willis
  • Why are people paying 3 to 5 million for configuration management software? by Adam Jacob


Wednesday, Oct 1, 2008

    Joyent - Cloud Computing Built on Accelerators

    Kent Langley was kind enough to create a profile template for Joyent, Kent's new employer. Joyent is an infrastructure and development company that has put together a multi-site, multi-million dollar hosting setup for their own applications and for the use of others. Joyent competes with the likes of Amazon and GoGrid in the multi-player cloud computing game and hosts Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App. The template was originally created with web services in mind, not cloud providers, but I think it still works in an odd sort of way. Remember, anyone can fill out a profile template for their system and share their wonderfulness with the world.

    Getting to Know You

  • What is the name of your system and where can we find out more about it? Joyent Accelerator, cloud computing IaaS. My name is Kent Langley, Sr. Director, Joyent, Inc. (www.productionscale.com). The Joyent website is located at www.joyent.com. The scope of this exercise is the Joyent Accelerator product: http://www.joyent.com/accelerator/
  • What is your system for? It is essentially a system that provides infrastructure primitives as a service (IaaS) for building cloud computing applications, migrating enterprise data center operations to secure private clouds, or just hosting your blog. There is a page on the site called What Scales on Joyent: http://www.joyent.com/accelerator/what-scales-on-joyent/ Java, PHP, Ruby, Erlang, Perl, and Python all work beautifully on Joyent. There is no lock-in. Ever. We try to run an open cloud. It's also a "loving cloud" if you ask our CTO. We have some of the largest Rails applications in the world, very high volume ejabberd XMPP infrastructure, exceptionally large Drupal installations, commerce sites in private clouds, .NET with Mono, Tomcat, Resin, Glassfish, and much more all running on Accelerators. Joyent Accelerators are the perfect building blocks for almost any PaaS (Platform as a Service) play as well. Of particular note, Java runs exceptionally well on Accelerators: Accelerators are 64-bit, so you can run 64-bit Java and have a JVM that can address as much as 32 GiB of RAM. This gives excellent vertical scalability for any running JVM.
  • Why did you decide to build this system? There is demand for a high-end but reasonably priced elastic computing infrastructure.
  • How is your project financed? Self-Funded at this time.
  • What is your revenue model? We sell Joyent Accelerators, do scale consulting, and some related services. We also have a growing partner channel.
  • How do you market your product? WebSite, Word of Mouth, Blogs, Email Lists, Twitter, Event Sponsorships, Open Source participation, forums, friendfeed, and more...
  • How long have you been working on it? I, Kent Langley, have been working with Joyent for about 2.5 years as a client. I've been with the company as an employee for about 2 months. Joyent has existed formally for about 4 years.
  • How big is your system? Try to give a feel for how much work your system does. We have hundreds and hundreds of servers representing significant compute power across thousands of cores in multiple locations.
  • Number of monthly page views? Billions and Billions (multiple billion+ page view per month clients)
  • What is your in/out bandwidth usage? That's a secret.
  • How many documents do you serve? How many images? How much data? Billions per month.
  • How fast are you growing? Fast enough to give me grey hairs.
  • What is your ratio of free to paying users? Very low. Most of our users have paid accounts. We do have some free offerings to help people get started. But, the demand for those services has been high so the lines are a little long.
  • What is your user churn? About Average for the industry we think.
  • How many accounts have been active in the past month? Thousands.

    How is your system architected?

  • What is the architecture of your system? Talk about how your system works in as much detail as you feel comfortable with. Our technology stack is predicated on something we call a Pod. We have several pods and plans to add more. From top to bottom you'd find:
    • BigIP F5 application switches
    • Force10 switching (1 Gb and 10 Gb)
    • Custom Dell hardware with some secret sauce
    • Tier 1 hosting providers
    • Essentially a custom Solaris Nevada based OS core with a pkgsrc install system
  • What particular design/architecture/implementation challenges does your system have? Automation. Automation. Automation. Self-Service.
  • What did you do to meet these challenges? We have an amazing team of Systems Developers that work very hard to improve our ability to grow and manage systems each day. We have some great updates on our Roadmap coming up that should be very exciting for existing and potential customers.
  • How does your system evolve to meet new scaling challenges? Our system is by its nature evolutionary. As technology grows and changes, we grow with it. A recent example is when a client needed a private cloud computing environment to achieve PCI compliance in a cloud environment, so we worked with the client to create this. While it is in production for two clients already, we consider this a beta product, but you should expect to see it as a formal offering soon. This is an example of a way we have evolved our systems to respond to the changing cloud computing marketplace.
  • Do you use any particularly cool technologies or algorithms? ZFS, BigIP, DTrace, OpenSolaris Nevada, and an in-house custom provisioning system we call MCP (hats off to Tron).
  • What do you do that is unique and different that people could best learn from? We know our approach to things is a little different. But, we think that helps us inhabit a space that is different enough from other vendors in the Cloud Computing space that we offer a significant value proposition to a large cross-section of the IT industry. From the lone developer with a great idea that comes in and picks up a $199 per year 1/4 GiB Accelerator to a deployment that has literally hundreds of Accelerators running the largest Rails applications on the planet, we are able to take good care of them both.
  • What lessons have you learned? If at first you don't succeed. Try, try again. Get over your mistakes and move on.
  • Why have you succeeded? We care. Our clients care. That's a nice fit.
  • How are you thinking of changing your architecture in the future? MORE secret sauce... But seriously, we have some great additions coming up soon. I'll be in touch.

    How is your team setup?

  • How many people are in your team? Joyent has a small employee to client ratio. But, that's because we do what we do well. We are divided into several of the normal divisions you might expect like client support, marketing, sales, development, operations, and the business units.
  • Where are they located? Our corporate office is in Sausalito, CA. We have a development team in Seattle. We have a support organization that follows the sun and spans the globe. IM is a big deal at Joyent.
  • Is there anything that you would do different or that you have found surprising? I'd say managing expectations is the most challenging thing. I think that's where we stand to improve the most and where most of the surprises come from.

    What infrastructure do you use?

  • Which languages do you use to develop your system? Ruby
  • How many servers do you have? Not as many as Google!
  • How is functionality allocated to the servers?
  • How are the servers provisioned? We have a custom cloud provisioning system called MCP.
  • What operating systems do you use? Customized Sun Solaris Nevada
  • Which web server do you use? Apache and Nginx are the work horses in Joyent Accelerators
  • Which database do you use? MySQL and PostgreSQL are included with every Accelerator. Oracle works if you bring your own licenses. CouchDB works. We are certifying more all the time.
  • Do you use a reverse proxy? Well, our clients often use Nginx, and now that there is a viable port of Varnish to OpenSolaris we are seeing more of that. Some of our clients use Squid as well. Most popular reverse proxy software will run fine on our setup.
  • Do you colocate, use a grid service, use a hosting service, etc? We are that.
  • What is your storage strategy? DAS/SAN/NAS/SCSI/SATA/etc/other? We provide NFS to our clients for $0.15/GiB. 1 GiB = 1024 MiB.
  • How much capacity do you have? Many Terabytes
  • How do you grow capacity? Add hardware
  • Do you use a storage service? We are a storage service.
  • Do you use storage virtualization? Not really. It's been and continues to be tested. But, you can't beat the real thing still in many cases.
  • How do you handle session management? Our clients do this at the application layer, depending on their development platform of choice. We can of course also use our BigIP load balancing infrastructure to help out with that.
  • How is your database architected? Master/slave? Shard? Other? All of the above, client by client. Master-master MySQL, master-slave MySQL, Oracle clusters, MySQL Cluster, PostgreSQL, etc. all work fine.
  • How do you handle load balancing? We have F5 BigIP's and we do what we call a managed load balancing service. For example, if you have two application servers, you need to load balance. Just ask us to set you up a VIP and we'll add the nodes you specify for a cost per node. All the pricing information is here. http://www.joyent.com/accelerator/pricing/
  • Which web framework/AJAX Library do you use? We have clients that use just about everything you can think of.
  • Which real-time messaging frameworks do you use? We have very large clients running ejabberd. Erlang works great on our systems.
  • Which distributed job management system do you use? This is client by client. We do not offer this out of the box.
  • How do you handle ad serving? This is up to the client. We've seen just about all of them.
  • What is your object and content caching strategy? We usually recommend memcached, it's pre-installed and ready to turn on.
  • What is your client side caching strategy? I'd say most of our clients use cookies.

    How do you handle customer support?

    We have a customer support team that is dedicated to helping our customers. Our services pretty much assume that you will have some degree of ability with building and deploying systems. However, if you don't, we have standard and extended plans, plus partners, that can all be combined in various ways to help our clients. Our support follows the sun around the world.

    How is your data center setup?

  • How many data centers do you run in? Several. :) Currently only domestic on both coasts and elsewhere.
  • How is your system deployed in data centers? In-House Automated provisioning systems
  • Are your data centers active/active, active/passive? Everything is always on. Our clients often co-locate in multiple locations so that they can have solid DR scenarios to keep investors happy and recover quickly should a truck hit a telephone pole or something.
  • How do you handle syncing between data centers and fail over and load balancing? This is a complex topic and can be very simple or very complex. It's a bit out of scope for this document.
  • Which DNS service do you use? We run our own based on PowerDNS
  • Which switches do you use? Force10
  • Which email system do you use? Mostly Postfix
  • How do you handle spam? Filter at a variety of levels
  • How do you backup and restore your system? High level snapshots; clients are primarily responsible for their own data. However, we have ways to help them.
  • How are software and hardware upgrades rolled out? We do quarterly releases of key software and our Accelerators. Sometimes we get a little behind but try to roll with it. You get root on your Accelerator so you are not dependent on the Joyent release cycle at all.
  • How do you handle major changes in database schemas on upgrades? This is up to the clients and highly platform and applications specific.
  • What is your fault tolerance and business continuity plan? Lots of redundancy.
  • Do you have a separate operations team managing your website? No. We do it ourselves.
  • Do you use a content delivery network? If so, which one and what for? Yes. We are currently partnered with Limelight.
  • How much do you pay monthly for your setup? Accelerator plans range from $199 per year to $4000 per month. Significant discounts can be had if you pay ahead. But, it's very important to note that we do not require or even want contracts. Some companies try to force us into contracts, and if you just MUST lock yourself in for years, we'll tie you down. But, we don't recommend it at all. In essence, pay for what you need, when you need it, at month-to-month granularity. http://www.joyent.com/accelerator/pricing/

    SUMMARY

    The Joyent Accelerator is an extremely flexible tool for building and deploying all manner of infrastructure. If you have questions, please just contact us at sales@joyent.com. Email at that address is usually the best way to reach us.

    Related Articles

  • Scaling in the Cloud with Joyent's Jason Hoffman (podcast)
  • Amazon Web Services or Joyent Accelerators: Reprise by Jason Hoffman


Sunday, Sep 28, 2008

    Product: Happy = Hadoop + Python

    Has a Java-only Hadoop been getting you down? Now you can be Happy. Happy is a framework for writing map-reduce programs for Hadoop using Jython. It files off the sharp edges on Hadoop and makes writing map-reduce programs a breeze. There's really no history yet on Happy, but I'm delighted at the idea of being able to map-reduce in other languages. The more ways the better. From the website:

    Happy is a framework that allows Hadoop jobs to be written and run in Python 2.2 using Jython. It is an 
    easy way to write map-reduce programs for Hadoop, and includes some new useful features as well. 
    The current release supports Hadoop 0.17.2.
    
    Map-reduce jobs in Happy are defined by sub-classing happy.HappyJob and implementing a 
    map(records, task) and reduce(key, values, task) function. Then you create an instance of the 
    class, set the job parameters (such as inputs and outputs) and call run().
    
    When you call run(), Happy serializes your job instance and copies it and all accompanying 
    libraries out to the Hadoop cluster. Then for each task in the Hadoop job, your job instance is 
    de-serialized and map or reduce is called.
    
    The task results are written out using a collector, but aggregate statistics and other roll-up 
    information can be stored in the happy.results dictionary, which is returned from the run() call.
    
    Jython modules and Java jar files that are being called by your code can be specified using 
    the environment variable HAPPY_PATH. These are added to the Python path at startup, and 
    are also automatically included when jobs are sent to Hadoop. The path is stored in happy.path 
    and can be edited at runtime. 
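
    To make the quoted description concrete, here is a minimal word-count sketch against the API it describes (happy.HappyJob, map(records, task), reduce(key, values, task), and run()). The collector call and the inputpaths/outputpath attribute names are assumptions drawn from Happy's examples, so treat it as illustrative rather than authoritative:

    import happy

    class WordCount(happy.HappyJob):
        def __init__(self, inputpath, outputpath):
            happy.HappyJob.__init__(self)
            self.inputpaths = [inputpath]   # job parameters, set before run()
            self.outputpath = outputpath
            self.inputformat = "text"

        def map(self, records, task):
            # each record is a (key, line) pair from the input
            for _, line in records:
                for word in line.split():
                    task.collect(word, "1")   # results go through a collector

        def reduce(self, key, values, task):
            count = 0
            for _ in values:   # count the "1"s emitted for this word
                count += 1
            task.collect(key, str(count))

    wc = WordCount("my-input.txt", "my-word-counts")
    results = wc.run()   # job is serialized and shipped to the Hadoop cluster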
    


Thursday, Sep 25, 2008

    GridGain: One Compute Grid, Many Data Grids

    GridGain was kind enough to present at the September 17th instance of the Silicon Valley Cloud Computing Group. I've been curious about GridGain so I was glad to see them there. In short, GridGain is: an open source computational grid framework that enables Java developers to improve general performance of processing intensive applications by splitting and parallelizing the workload. GridGain can also be thought of as a set of middleware primitives for building applications. GridGain's peer group of competitors includes GigaSpaces, Terracotta, Coherence, and Hadoop. The speaker for GridGain was the President and Founder, Nikita Ivanov. He has a very pleasant down-to-earth way about him that contrasts nicely with a field given to religious discussions of complex taxonomic definitions. Nikita first talked about cloud computing in general. He feels Java is the perfect gateway for cloud computing. Which is good because GridGain only works with Java. The Java centricity of GridGain may be an immediate deal killer or a non-issue for a Java shop. Being so close to the language does offer a lot of power, but it sure sucks in a multi-language environment. Nikita gave a few definitions which are key to understanding where GridGain stands in the grid matrix:

  • Compute Grids: parallel execution.
  • Data Grids: parallel data storage.
  • Grid Computing: Compute Grids + Data Grids
  • Cloud Computing: datacenter + API. The key is automation via programmability as a way to deploy applications. The advantage is a unified programming model. Build an application on one node and you can run on many nodes without code change. Moving peak loads to the cloud can give you a 10x-100x cost reduction. Cloud computing poses a number of challenges: deployment, data sharing, load balancing, failover, discovery (nodes, availability), provisioning (add, remove), management, monitoring, development process, debugging, inter and external clouds (syncing data, syncing code, failover jobs). Nikita talked some about these issues, but he didn't go in-depth. But he showed a good understanding of the issues involved, so I would be inclined to think GridGain handles them well. The cloud computing section is new to the standard GridGain presentation. GridGain is moving their grid into the cloud with new features like a cloud management layer available in Q1 2009. This move competes with GigaSpaces' early move to the cloud with their RightScale partnership. It's a good move. Like peanut butter and chocolate, grids and clouds go better together. Grids have been underutilized largely because of infrastructure issues. A cloud platform makes it affordable to grow and manage grids, so we might see an uptick in grid adoption as clouds and grids hook up. GridGain positions themselves as a developer centric framework, according to their analysis of cloud computing in Java:
  • Heavy UI oriented. These types of applications or frameworks usually provide UI-based consoles, management applications, plugins, etc. that are the only way to manage resources on the cloud, such as starting and stopping the image. The key characteristic of this approach is that it requires substantial user input and human interaction, and thus these systems tend to be less dynamic and less on-demand. Good examples would be RightScale, GigaSpaces, and ElasticGrid.
  • Heavy framework oriented. This approach strongly emphasizes dynamism of resource management on the cloud. The key characteristic is that it requires no human interaction and all resource management can be done programmatically by the grid/cloud middleware, and thus it is more dynamic, automated, and truly on-demand. Google App Engine (for Python) and GridGain would be good examples.

    I think there's a misunderstanding of RightScale here. The UI is to configure the automated system, not manage the system. The automated system monitors and responds to events without human interaction. Won't their automated cloud layer have to do something similar? To bootstrap any complex system out of the mud of complexity a helpful UI is needed. The framework approach of GridGain's infrastructure is developer friendly, but that won't fly for external management within the cloud.

    GridGain's True Nature: One Compute Grid, Many Data Grids

    With these definitions in place we can now learn the secret of GridGain: One Compute Grid, Many Data Grids. Ding! Ding! Ding! Once I understood this I understood GridGain's niche. GridGain has focused on making it dead simple to distribute work across a compute grid. It's a job management mechanism. GridGain doesn't include a data grid. It will work against any data grid. For some reason this fact was something I'd never pulled out of the noise before. And when I would read Nikita's blog with all the nifty little code samples I never really appreciated what was happening. Yes, I'm just that dumb, but I also think GridGain should expose the magic of what's going on behind the scenes more, rather than push the simple 30-second-let's-write-code-live style demo. Seeing the mechanics would make it easier to build a mental model of the value being added by GridGain.

    Transparent and Low Configuration Implementation of Key Features

    A compute grid is just a bunch of CPUs that calculations/jobs/work can be run on. As a developer, your problems are broken up into smaller tasks and spread across all your nodes so the result is calculated faster because it is happening in parallel. GridGain enthusiastically supports the MapReduce model of computation. When deploying a grid a few key problems come up:
  • How do you get your code to all nodes? Not just the first time, but every time a JAR file changes, how is it distributed across all nodes?
  • How do all the other nodes find each other when they come up? Clearly for work to be sent to nodes someone must know about them.
  • How are jobs distributed to the nodes? Somehow jobs must be sent to a node, the calculations made, and the results assembled.
  • How are failures handled? Somehow when a node goes down and new nodes come on-line work must be rescheduled.
  • How does each node get the data it needs to do its work? Scalable computation without scalable data doesn't work for most problems.

    Much of the drama is lost with GridGain because most of these capabilities are implemented almost transparently. Discovery happens automatically. When nodes come up they communicate with each other and transparently form a grid. You don't see this, it just happens. In fact, this was one of GridGain's issues when porting to the cloud: they used multicast for discovery and Amazon doesn't support multicast. So they had to use another messaging service, which GridGain supports doing out of the box, and they are now working on their own unicast version of the discovery service. Deploying new code is always a frustrating problem. Over the same transparently formed grid, code updates are transparently auto-deployed. Again, this is one of those things you see happen from Eclipse and it loses most of the impact. It just looks like how it's supposed to work, but rarely does. With GridGain you do a build and your code changes are automatically sent through to each node in the grid. Very nice. To mark a method as gridified, an annotation (or an API call) is used:
    @Gridify(taskClass = GridifyHelloWorldTask.class, timeout = 3000)
    public static int sayIt(String phrase) {
        // Simply print out the argument.
        System.out.println(">>> Printing '" + phrase + "' on this node from grid-enabled method.");
        return phrase.length();
    }
    
    The task class is responsible for splitting method execution into sub-jobs. For a full example go here. The @Gridify annotation uses AOP (aspect-oriented programming) to automatically "gridify" the method. I assume this registers the method with the job scheduling system. When the application comes up and triggers execution, the method is scheduled through the job scheduling system and allocated to nodes. Again, you don't see this, and they really don't talk enough about how this part works. Notice how much complexity is nicely hidden by GridGain with very little configuration on the developer's part. There aren't a billion different XML files where every single part of the system has to be defined ahead of time. The dynamic, transparent nature of the core features makes it simple to use.
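
    Since the mechanics stay hidden, here is a language-neutral sketch (in Python) of the split/execute/aggregate lifecycle a gridified call conceptually goes through. This is a mental model of what happens behind the annotation, not GridGain's actual implementation:

    # Conceptual sketch of what a compute grid does with a gridified call.
    # A mental model only, not GridGain's implementation.

    def split(phrase):
        # The task class breaks the argument into one sub-job per word.
        return phrase.split()

    def execute(word):
        # Each sub-job runs on some node in the grid.
        print ">>> Printing '%s' on this node from grid-enabled method." % word
        return len(word)

    def aggregate(results):
        # The per-node results are reduced into the final answer.
        return sum(results)

    jobs = split("Hello World")
    results = [execute(job) for job in jobs]   # in reality, one per node
    print aggregate(results)                   # 10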

    Integrating with the Data Grid

    We haven't talked about data at all. If you are just concerned with a program like a Monte Carlo simulation then the compute grid is all you need. But most calculations require data. Where does your massive compute grid pull the data from? That's where the data grid comes in. A data grid is the controlled sharing and management of large amounts of distributed data. GridGain leaves the data grid up to other software by integrating with packages like JBoss Cache, Oracle Coherence, and GigaSpaces. Remember: One Compute Grid, Many Data Grids. GridGain accesses the data grid through an API, so you can plug in any data grid you want to support with a little custom code. Google and Hadoop use a distributed file system (DFS) as their data grid. This makes sense. When you need to feed lots of CPUs the data can't come from a centralized store. The data must be parallelized and that's what a DFS does. A DFS splats data across a lot of spindles so it can be pulled relatively quickly by lots of CPUs in parallel. Other products like Coherence and GigaSpaces store data in an in-memory data grid instead of a filesystem. Serving data from memory is faster, but you are limited by the amount of memory you have. If you have a petabyte of data your choice is clear, but if your problem is a bit smaller then maybe an in-memory solution would work. The closer data is to the business logic the better performance will be. GridGain controls job execution while the data grid is responsible for the availability and integrity of the data. GridGain doesn't care what data grid you use, but your choice has implications for performance. A compute grid and an in-memory data grid in the same cloud will smoke configurations where the data grid comes from disk or is located outside the cloud.

    GridGain is Linearly Scalable for a Pure CPU Benchmark

    The good folks at GridDynamics are doing some serious cloud testing of different products and different clouds. They did a test, Scalability Benchmark of Monte Carlo Simulation on Amazon EC2 with GridGain Software, that found GridGain was linearly scalable to 512 nodes on Amazon's EC2. A Monte Carlo simulation is a CPU-only test; it does not use a data grid. A data grid based test would be more useful to me, as everything changes once large amounts of data start flying around, but it does indicate the core of GridGain is quite scalable.

    Wrapping Up

    Grid products like Coherence and GigaSpaces include both compute grid and data grid features. Why choose a compute-grid-only system like GridGain when other products include both capabilities? GridGain might say they win business on the quality of their compute grid, excellent support and documentation, and the ability to cleanly integrate into almost any existing ecosystem through their well thought out API abstraction layer and their out-of-the-box support for almost every important Java framework. Others may counter that performance is far better when the business logic and the job management are integrated. All interesting issues to trade off in your own decision making process. GridGain is free, as their business model is based on providing support and consultation. A non-starter for many is the Java-only restriction. What is unique about GridGain is how easy and transparent they made it to use and deploy. That's some thoughtful engineering.

    Related Articles

  • Gridify Blog
  • Ten Useful Gridgain How-To Tips
  • 10 Reasons to Use GridGain
  • What is Grid Gain?
  • Developers Productivity: Unsung Hero of GridGain
  • GridGain vs Hadoop
  • Cameron Purdy: Defining a Data Grid
  • Compute Grids vs. Data Grids


Tuesday, Sep 16, 2008

    Product: Func - Fedora Unified Network Controller

    Func is used to manage a large network using bash or Python scripts. It targets easy and simple remote scripting and one-off tasks, providing a secure (SSL certificate based) XMLRPC API for communication in place of raw SSH. Any kind of application can be written on top of it. Other configuration management tools specialize in mass configuration: they say here's what the machine should look like and keep it that way. Func instead allows you to program your cluster. If you've ever tried to securely remote script a gang of machines using SSH keys, you know what a total nightmare that can be. Some example commands:

    Using the command line:

    # run the yum "update" command on every matching minion
    func "*.example.org" call yumcmd update

    Using the Python API:

    import func.overlord.client as fc

    # target minions by glob; multiple globs are separated with ";"
    client = fc.Client("*.example.org;*.example.com")
    client.yumcmd.update()                # run yum update everywhere
    client.service.start("acme-server")   # start a service on all minions
    print client.hardware.info()          # gather hardware info from each box

    Func may certainly overlap in functionality with other tools like Puppet and cfengine, but as programmers we always need more than one way to do it, and I can definitely see how I could have used Func on a few projects.

    Related Articles
  • High Scalability Operations Tag
  • Open source project: Func, the Fedora Unified Network Controller by Michael DeHaan.
  • Func, the Fedora Unified Network Controller by Luca Foppiano
  • Multi-system administration with Func by Jake Edge.


Friday, Sep 5, 2008

    Product: Tungsten Replicator

    With Tungsten Replicator, Continuent is trying to deliver a better master/slave replication system. Their goals: scalability, reliability with seamless failover, and no performance loss. From their website:

    The Tungsten Replicator implements open source database-neutral master/slave replication. Master/slave replication is a highly flexible technology that can solve a wide variety of problems, including the following:

  • Availability - failing over to a slave database if your master database dies.
  • Performance Scaling - spreading reads across many copies of data.
  • Cross-Site Clustering - maintaining active database replicas across WANs.
  • Change Data Capture - extracting changes to load data warehouses or update other systems.
  • Zero Downtime Upgrade - performing upgrades on a slave server which then becomes the master.

    The Tungsten Replicator architecture is flexible and designed to support the addition of new databases easily. It includes pluggable extractor and applier modules to help transfer data from master to slave. The Replicator includes a number of specialized features designed to improve its usefulness for particular problems like availability:

  • Replicated changes have transaction IDs and are stored in a transaction history log that is identical for each server. This feature allows masters and slaves to exchange roles easily.
  • Smooth procedures for planned and unplanned failover.
  • Built-in consistency check tables and events allow users to check consistency between tables without stopping replication or applications.
  • Support for statement as well as row replication.
  • Hooks to allow data transformations when replicating between different database types.

    Tungsten Replicator is not a toy. It is designed to allow the commercial construction of robust database clusters.
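
    To picture the architecture described above, here is a toy sketch of the pluggable extractor, transaction history log, and applier pipeline. It is an illustration of the design, not Tungsten Replicator code; the class names are made up:

    # Toy model of master/slave replication through a transaction history log.
    # Illustrative only; these are not Tungsten Replicator classes.

    class ListExtractor(object):
        """Stand-in extractor: pulls changes from a 'master' in commit order."""
        def __init__(self, changes):
            self.changes = changes
        def extract(self):
            for tx_id, change in enumerate(self.changes):
                yield tx_id, change   # replicated changes carry transaction IDs

    class PrintApplier(object):
        """Stand-in applier: replays each logged change on a 'slave'."""
        def apply(self, tx_id, change):
            print "slave applying tx %d: %s" % (tx_id, change)

    thl = []   # transaction history log, identical on every server
    extractor = ListExtractor(["INSERT ...", "UPDATE ...", "DELETE ..."])
    applier = PrintApplier()

    for tx_id, change in extractor.extract():
        thl.append((tx_id, change))   # identical log lets servers swap roles
    for tx_id, change in thl:
        applier.apply(tx_id, change)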

    Related Articles

  • Tungsten ScaleOut Stack - an open source collection of integrated projects for database scale-out making use of commodity hardware.
  • Continuent Intros Tungsten Replicator by Shamila Janakiraman.


Wednesday, Sep 3, 2008

    MapReduce framework Disco

    Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks, often in only tens of lines of code.
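
    Since Disco jobs are plain Python, the canonical word count is short. This sketch follows the style of Disco's documented examples (disco.core.Job, result_iterator, and disco.util.kvgroup); the input URL is a placeholder, and API details may differ across Disco versions:

    from disco.core import Job, result_iterator

    def map(line, params):
        # emit (word, 1) for every word on an input line
        for word in line.split():
            yield word, 1

    def reduce(iter, params):
        # group the sorted (word, count) pairs and sum counts per word
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    job = Job().run(input=["http://example.com/mytext.txt"],
                    map=map, reduce=reduce)
    for word, count in result_iterator(job.wait()):
        print word, count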


Friday, Aug 29, 2008

    Product: ScaleOut StateServer is Memcached on Steroids

    ScaleOut StateServer is an in-memory distributed cache across a server farm or compute grid. Unlike middleware vendors, StateServer aims at being a very good data cache; it doesn't try to handle job scheduling as well. StateServer is what you might get when you take Memcached and merge in all the value-added distributed caching features you've ever dreamed of. True, Memcached is free and ScaleOut StateServer is very far from free, but for those looking for a satisfying out-of-the-box experience, StateServer may be just the caching solution you are looking for. Yes, "solution" is one of those "oh my God I'm going to pay through the nose" indicator words, but it really applies here. Memcached is a framework whereas StateServer has already prepackaged most features you would otherwise need to add through your own programming efforts.

    Why use a distributed cache? Because it combines the holy quadrinity of computing: better performance, linear scalability, high availability, and fast application development. Performance is better because data is accessed from memory instead of through a database to a disk. Scalability is linear because as more servers are added data is transparently load balanced across the servers, so there is automated in-memory sharding. Availability is higher because multiple copies of data are kept in memory and the entire system reroutes on failure. Application development is faster because there's only one layer of software to deal with, the cache, and its API is simple. All the complexity is hidden from the programmer, which means all a developer has to do is get and put data.

    StateServer follows the RAM-is-the-new-disk credo. Memory is assumed to be the system of record, not the database. If you want data to be stored in a database and have the two kept in sync, then you'll have to add that layer yourself (a sketch of such a cache-aside layer follows the highlights list below). All the standard memcached techniques should work as well for StateServer. Consider however that a database layer may not be needed. Reliability is handled by StateServer because it keeps multiple data copies, reroutes on failure, and has an option for geographical distribution for another layer of added safety. Storing to disk wouldn't make you any safer.

    Via email I asked them a few questions. The key question was how they stacked up against Memcached? As that is surely one of the more popular challenges they would get in any sales cycle, I was very curious about their answer. And they did a great job differentiating themselves. What did they say? First, for an in-depth discussion of their technology take a look at ScaleOut Software Technology, but here are a few of the highlights:

  • Platforms: .Net, Linux, Solaris
  • Languages: .Net, Java and C/C++
  • Transparent Services: server farm membership, object placement, scaling, recovery, creating and managing replicas, and handling synchronization on object access.
  • Performance: Scales with measured linear throughput gain to farms with 64 servers. StateServer was subjected to maximum access load in tests that ramped from 2 to 64 servers, with more than 2.5 gigabytes of cached data and a sustained throughput of over 92,000 accesses per second using a 20 Mbits/second Infiniband network. StateServer provided linear throughput increases at each stage of the test as servers and load were added.
  • Data cache only. Doesn't try to become middleware layer for executing jobs. Also will not sync to your database.
  • Local Cache View. Objects are cached on the servers where they were most recently accessed. Application developers can view the distributed cache as if it were a local cache which is accessed by the customary add, retrieve, update, and remove operations on cached objects. Object locking for synchronization across threads and servers is built into these operations and occurs automatically.
  • Automatic Sharding and Load Balancing. Automatically partitions all of distributed cache's stored objects across the farm and simultaneously processes access requests on all servers. As servers are added to the farm, StateServer automatically repartitions and rebalances the storage workload to scale throughput. Likewise, if servers are removed, ScaleOut StateServer coalesces stored objects on the surviving servers and rebalances the storage workload as necessary.
  • High Availability. All cached objects are replication on up to two additional servers. If a server goes offline or loses network connectivity, ScaleOut StateServer retrieves its objects from replicas stored on other servers in the farm, and it creates new replicas to maintain redundant storage as part of its "self-healing" process. Uses a quorum-based updating scheme.
  • Flexible Expiration Policies. Optional object expiration after sliding or fixed timeouts, LRU memory reclamation, or object dependency changes. Asynchronous events are also available to signal object expiration.
  • Geographical Scaleout. Has the ability to automatically replicate to a remote cache using the ScaleOut GeoServer option.
  • Parallel Query. Perform fully parallel queries on cached objects. Developers can attach metadata or "tags" to cached objects and query the cache for all matching objects. ScaleOut StateServer performs queries in parallel across all caching servers and employs patent-pending technology to ensure that query operations are both highly available and scalable. This is really cool technology that really leverages the advantage of in-memory databases. Sharding means you have a scalable system then can execute complex queries in parallel without you doing all the work you would normally do in a sharded system. And you don't have to resort to the complicated logics need for SimpleDB and BigTable type systems. Very nice.
  • Pricing: - Development Edition: No Charge - Professional Edition: $1,895 for 2 servers - Data Center Edition: $71,995 for 64 servers - GeoServer Option First two data centers $14,995, Each add'l data center $7,495. - Support: 25% of software license fee Some potential negatives about ScaleOut StateServer:
  • I couldn't find a developer forum. There may be one, but it eluded me. One thing I always look for is a vibrant developer community and I didn't see one. So if you have problems or want to talk about different ways of doing things, you are on your own.
  • The sales group wasn't responsive. I sent them an email with a question and they never responded. That always makes me wonder how I'll be treated once I've put money down.
  • The lack of developer talk made it hard for me to find negatives about the product itself, so I can't evaluate its quality in production. In the next section the headings are my questions and the responses are from ScaleOut Software.
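
    As promised above, here is the kind of do-it-yourself database-sync layer the RAM-as-system-of-record discussion alluded to: a minimal cache-aside sketch using the python-memcached client. The load_user_from_db and save_user_to_db helpers are hypothetical stand-ins for your data layer; this illustrates the pattern only and is not code from either product.

    import memcache  # python-memcached client

    mc = memcache.Client(["127.0.0.1:11211"])

    def get_user(user_id):
        key = "user:%s" % user_id
        user = mc.get(key)                     # 1. try the cache first
        if user is None:
            user = load_user_from_db(user_id)  # 2. hypothetical DB fetch
            mc.set(key, user, time=300)        # 3. cache for five minutes
        return user

    def update_user(user_id, user):
        save_user_to_db(user_id, user)         # hypothetical DB write
        mc.delete("user:%s" % user_id)         # invalidate so readers refetch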

    Why use ScaleOut StateServer instead of Memcached?

    I've [Dan McMillan, VP Sales] included some data points below based on our current understanding of the Memcached product. We don't use and haven't tested Memcached internally, so this comparison is based in part upon our own investigations and in part on what we are hearing from our own customers during their evaluations and comparisons. We are aware that Memcached is successfully being used on many large, high volume sites. We believe strong demand for ScaleOut is being driven by companies that need a ready-to-deploy solution that provides advanced features and just works. We also hear that Memcached is often seen as a low cost solution in the beginning, but development and ongoing management costs sometimes far exceed our licensing fees.

    What sets ScaleOut apart from Memcached (and other competing solutions) is that ScaleOut was architected from the ground up to be a fully integrated and automated caching solution. ScaleOut offers both scalability and high availability, where our competitors typically provide only one or the other. ScaleOut is considered a full-featured, plug-n-play caching solution at a very reasonable price point, whereas we view Memcached as a framework in which to build your own caching solution. Much of the cost in choosing Memcached will be in development and ongoing management. ScaleOut works right out of the box.

    I asked ScaleOut Software founder and chief architect, Bill Bain, for his thoughts on this. He is a long-time distributed caching and parallel computing expert and is the architect of ScaleOut StateServer. He had several interesting points to share about creating a distributed cache by using an open source (i.e. build it yourself) solution versus ScaleOut StateServer. First, he estimates that it would take considerable time and effort for engineers to create a distributed cache that has ScaleOut StateServer's fundamental capabilities. The primary reason is that the open source method only gives you a starting point, but it does not include most capabilities that are needed in a distributed cache. In fact, there is no built-in scalability or availability, the two principal benefits of a distributed cache. Here is some of the functionality that you would have to build:
  • Scalable storage and throughput. You need to create a means of storing objects across the servers in the farm in a way that will scale as servers are added, such as creating and managing partitions. Dynamic load balancing of objects is needed to avoid hot spots, and to our knowledge this is not provided in memcached.
  • High availability. To ensure that objects are available in the case of a server failure, you need to create replicas and have a means of automatically retrieving them in case a server fails. Also, just knowing that a server has failed requires you to develop a scalable heart-beating mechanism that spans all servers and maintains a global membership. Replicas have to be atomically updated to maintain the coherency of the stored data.
  • Global object naming. The storage, load-balancing, and high availability mechanisms need to make use of efficient, global object naming and lookup so that any client can access any object in the distributed cache, even after load-balancing or recovery actions.
  • Distributed locking. You need distributed locking to coordinate accesses by different clients so that there are no conflicts or synchronization issues as objects are read, updated and deleted. Distributed locks have to automatically recover in case of server failures.
  • Object timeouts. You also will need to build the capability for the cache to handle object timeouts (absolute and sliding) and to make these timeouts highly available.
  • Eventing. If you want your application to be able to catch asynchronous events such as timeouts, you will need a mechanism to deliver events to clients, and this mechanism should be both scalable and highly available.
  • Local caching. You need the ability to internally cache deserialized data on the clients to keep response times fast and avoid deserialization overhead on repeated reads. These local caches need to be kept coherent with the distributed cache.
  • Management. You need a means to manage all of the servers in the distributed cache and to collect performance data. There is no built-in management capability in memcached, and this requires a major development effort.
  • Remote client support. ScaleOut currently offers both a standard configuration (installed as a Windows service on each web server) and a remote client configuration (Installed on a dedicated cache farm).
  • ASP.Net/Java interoperability. Our Java/Linux release will offer true ASP.Net/Java interop, allowing you to share objects and manage sessions across platforms. Note: we just posted our "preview" release last week.
  • Indexed query functionality. Our forthcoming ScaleOut 4.0 release will contain this feature, which allows you to query the store to return objects based on metadata.
  • Multiple data center support. With our GeoServer product, you can automatically replicate cached information to up to 8 remote data centers. This provides a powerful solution for disaster recovery, or even "active-active" configurations. GeoServer's replication is both scalable and highly available. In addition to the above, we hope that the fact that ScaleOut Software provides a commercial solution that is reasonably priced, supported and constantly improved would be viewed as an important plus for our customers. In many cases, in-house and open source solutions are not supported or improved once the original developer is gone or is assigned to other priorities.
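
    To make the first couple of bullets concrete, here is a toy version of the partitioning machinery (consistent hashing with virtual nodes) you would have to build and harden yourself on top of memcached. This is a from-scratch illustration, not ScaleOut's or memcached's actual algorithm:

    import bisect
    import hashlib

    class HashRing(object):
        """Toy consistent-hash ring: maps keys to servers with minimal
        remapping when servers join or leave the farm."""

        def __init__(self, servers, replicas=100):
            self.replicas = replicas
            self.ring = []   # sorted list of (hash, server) points
            for server in servers:
                self.add(server)

        def _hash(self, value):
            return int(hashlib.md5(value).hexdigest(), 16)

        def add(self, server):
            for i in range(self.replicas):   # virtual nodes smooth the load
                self.ring.append((self._hash("%s#%d" % (server, i)), server))
            self.ring.sort()

        def remove(self, server):
            self.ring = [(h, s) for h, s in self.ring if s != server]

        def server_for(self, key):
            point = self._hash(key)
            i = bisect.bisect(self.ring, (point, ""))  # first point >= hash
            return self.ring[i % len(self.ring)][1]

    ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
    print ring.server_for("user:42")   # which cache server owns this key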

    Do you find yourself in competition with the likes of Terracotta, GridGain, GigaSpaces, and Coherence type products?

    Our ScaleOut technology has previously been targeted to the ASP.Net space. Now that we are entering the Java/Linux space, we will be competing with companies like the ones you mentioned above, which are mainly Java/Linux focused as well. We initially got our start with distributed caching for ecommerce applications, but grid computing seems to be a strong growth area for us as well. We are now working with some large Wall Street firms on grid computing projects that involve some (very large) grid operations. I would like to reiterate that we are very focused on data caching only. We don't try to do job scheduling or other grid computing tasks, but we do improve performance and availability for those tasks via our distributed data cache.

    What architectures are your customers using with your GeoServer product?

    GeoServer is a newer, add-on product that is designed to replicate the contents of two or more geographically separated ScaleOut object stores (caches). Typically a customer might use GeoServer to replicate object data between a primary data center site and a DR site. GeoServer facilitates continuous (async) replication between sites, so if site A goes offline, site B is immediately available to handle the workload. Our ScaleOut technology offers 3 primary benefits: scalability, performance and high availability. From a single web farm perspective, ScaleOut provides high availability by making either 1 or 2 (this is configurable) replica copies of each master object and storing the replica on an alternate host server in the farm. ScaleOut provides uniform access to the object from any server, and protects the object in the case of a server failure. With GeoServer, these benefits are extended across multiple sites. It is true that distributed caches typically hold temporary, fast-changing data, but that data can still be very critical to ecommerce or grid computing applications. Loss of this data during a server failure, worker process recycle or even a grid computation process is unacceptable. We improve performance by keeping the data in-memory, while still maintaining high availability.

    Related Articles

  • RAM is the new disk
  • Latency is Everywhere and it Costs You Sales - How to Crush it
  • A Bunch of Great Strategies for Using Memcached and MySQL Better Together
  • Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
  • Google's Paxos Made Live – An Engineering Perspective
  • Industry Chat with Bill Bain and Marc Jacobs - Joe Rubino interviews William L. Bain, Founder & CEO of ScaleOut Software and Marc Jacobs, Director at Lab49, on distributed caches and their use within Financial Services.


Tuesday, Jul 29, 2008

    Ehcache - A Java Distributed Cache 

    Ehcache is a pure Java cache with the following features: fast, simple, small footprint, minimal dependencies, provides memory and disk stores for scalability into gigabytes, scalable to hundreds of caches, is a pluggable cache for Hibernate, tuned for high concurrent load on large multi-CPU servers, provides LRU, LFU and FIFO cache eviction policies, and is production tested. Ehcache is used by LinkedIn to cache member profiles. The user guide says it's possible to get a 2.5 times system speedup for persistent Object Relational Caching, a 1000 times system speedup for Web Page Caching, and a 1.6 times system speedup for Web Page Fragment Caching. From the website:

    Introduction

    Ehcache is a cache library. Before getting into ehcache, it is worth stepping back and thinking about caching generally.

    About Caches

    Wiktionary defines a cache as "a store of things that will be required in future, and can be retrieved rapidly". That is the nub of it. In computer science terms, a cache is a collection of temporary data which either duplicates data located elsewhere or is the result of a computation. Once in the cache, the data can be repeatedly accessed inexpensively.

    Why caching works: Locality of Reference

    While ehcache concerns itself with Java objects, caching is used throughout computing, from CPU caches to the DNS system. Why? Because many computer systems exhibit locality of reference: data that is near other data or has just been used is more likely to be used again.

    The Long Tail

    Chris Anderson of Wired Magazine coined the term The Long Tail to refer to ecommerce systems: the idea that a small number of items may make up the bulk of sales, a small number of blogs might get the most hits, and so on. While there is a small list of popular items, there is a long tail of less popular ones. The Long Tail is itself a vernacular term for a Power Law probability distribution. These don't just appear in ecommerce, but throughout nature. One form of a Power Law distribution is the Pareto distribution, commonly known as the 80:20 rule. This phenomenon is useful for caching. If 20% of objects are used 80% of the time and a way can be found to reduce the cost of obtaining that 20%, then the system performance will improve.

    Will an Application Benefit from Caching?

    The short answer is that it often does, due to the effects noted above. The medium answer is that it often depends on whether it is CPU bound or I/O bound. If an application is I/O bound then the time taken to complete a computation depends principally on the rate at which data can be obtained. If it is CPU bound, then the time taken principally depends on the speed of the CPU and main memory. While the focus for caching is on improving performance, it is also worth realizing that it reduces load. The time it takes something to complete is usually related to the expense of it. So, caching often reduces load on scarce resources.

    Speeding up CPU bound Applications

    CPU bound applications are often sped up by:
  • improving algorithm performance
  • parallelizing the computations across multiple CPUs (SMP) or multiple machines (clusters)
  • upgrading the CPU speed
    The role of caching, if there is one, is to temporarily store computations that may be reused again. An example from ehcache would be large web pages that have a high rendering cost. Another is caching of authentication status, where authentication requires cryptographic transforms.

    Speeding up I/O bound Applications

    Many applications are I/O bound, either by disk or network operations. In the case of databases they can be limited by both. There is no Moore's law for hard disks: a 10,000 RPM disk was fast 10 years ago and is still fast. Hard disks are speeding up by using their own caching of blocks into memory. Network operations can be bound by a number of factors:
  • time to set up and tear down connections
  • latency, or the minimum round trip time
  • throughput limits
  • marshalling and unmarshalling overhead
    The caching of data can often help a lot with I/O bound applications. Some examples of ehcache uses are:
  • Data Access Object caching for Hibernate
  • Web page caching, for pages generated from databases

    Increased Application Scalability

    The flip side of increased performance is increased scalability. Say you have a database which can do 100 expensive queries per second. After that it backs up, and if connections are added to it it slowly dies. In this case, caching may be able to reduce the workload required. If caching can cause 90 of that 100 to be cache hits and not even get to the database, then the database can scale 10 times higher than otherwise.

    How much will an application speed up with Caching?

    The short answer is that it depends on a multitude of factors:
  • how many times a cached piece of data can and is reused by the application
  • the proportion of the response time that is alleviated by caching
    In applications that are I/O bound, which is most business applications, most of the response time is getting data from a database. Therefore the speedup mostly depends on how much reuse a piece of data gets. In a system where each piece of data is used just once, it is zero. In a system where data is reused a lot, the speedup is large. The long answer, unfortunately, is complicated and mathematical. It is considered next.
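
    As a rough illustration of that short answer, here is a back-of-the-envelope speedup model (my own sketch, not from the ehcache guide): average the hit and miss costs weighted by the hit ratio, in the spirit of Amdahl's law.

    # My own back-of-the-envelope model of cache speedup, not from the
    # ehcache documentation.
    def cache_speedup(hit_ratio, miss_cost_ms, hit_cost_ms):
        """Average response time without the cache divided by with it."""
        avg = hit_ratio * hit_cost_ms + (1.0 - hit_ratio) * miss_cost_ms
        return miss_cost_ms / avg

    # 90% of requests served from a 1 ms cache instead of a 50 ms database:
    print cache_speedup(0.9, 50.0, 1.0)   # ~8.5x faster on average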

    Related Articles

  • Caching Category on High Scalability
  • Product: Memcached
  • Manage a Cache System with EHCache


Monday, Jul 21, 2008

    Eucalyptus - Build Your Own Private EC2 Cloud

    Update: InfoQ links to a few excellent Eucalyptus updates: a Velocity Conference video by Rich Wolski and a Virtualization.com interview, Rich Wolski on Eucalyptus: Open Source Cloud Computing.

    Eucalyptus is generating some excitement on the Cloud Computing group as a potential vendor-neutral, EC2-compatible cloud platform. Two reasons why Eucalyptus is potentially important: private clouds and cloud portability.

    Private clouds. Let's say you want a cloud-like infrastructure for architectural purposes, but you want it to run on your own hardware in your own secure environment. How would you do this today? Hm....

    Cloud portability. With the number of cloud offerings increasing, how can you maintain some level of vendor neutrality among this "swarm" of different options? Portability is a key capability for cloud customers, as the only real power customers have is in where they take their business, and the only way you can change suppliers is if there's a ready market of fungible services. And the only way there can be a market is if there's a high degree of standardization. What should you standardize on? The options are usually to form a great committee and take many years to spec out something that doesn't exist, nobody will build, and will never really work. Or have each application create a high enough layer interface that portability is difficult, but possible. Or you can take a popular existing API, make it the general API, and accommodate everyone else using an adapter layer and the necessary special glue to take advantage of each cloud's value-add features.

    With great foresight Eucalyptus has chosen to create a cloud platform based on Amazon's EC2. As this is the most successful cloud platform, it makes a lot of sense to use it as a model. We see something similar with the attempts to port Google AppEngine to EC2, thus making GAE a standard framework for web apps. So developers would see GAE on top of EC2. A lot of code would be portable between clouds using this approach. Even better would be to add in ideas from RightScale, 3Tera, and Mosso to get a higher level view of the cloud, but that's getting ahead of the game. Just what is Eucalyptus? From their website:

    Overview

    Elastic Computing, Utility Computing, and Cloud Computing are (possibly synonymous) terms referring to a popular SLA-based computing paradigm that allows users to "rent" Internet-accessible computing capacity on a for-fee basis. While a number of commercial enterprises currently offer Elastic/Utility/Cloud hosting services and several proprietary software systems exist for deploying and maintaining a computing Cloud, standards-based open-source systems have been few and far between.

    EUCALYPTUS -- Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems -- is an open-source software infrastructure for implementing Elastic/Utility/Cloud computing using computing clusters and/or workstation farms. The current interface to EUCALYPTUS is interface-compatible with Amazon.com's EC2 (arguably the most commercially successful Cloud computing service), but the infrastructure is designed to be modified and extended so that multiple client-side interfaces can be supported. In addition, EUCALYPTUS is implemented using commonly-available Linux tools and basic web service technology, making it easy to install and maintain.

    Overall, the goal of the EUCALYPTUS project is to foster community research and development of Elastic/Utility/Cloud service implementation technologies, resource allocation strategies, service level agreement (SLA) mechanisms and policies, and usage models. The current release is version 1.0 and it includes the following features:

  • Interface compatibility with EC2
  • Simple installation and deployment using Rocks cluster-management tools
  • Simple set of extensible cloud allocation policies
  • Overlay functionality requiring no modification to the target Linux environment
  • Basic "Cloud Administrator" tools for system management and user accounting
  • The ability to configure multiple clusters, each with private internal network addresses, into a single Cloud

    The initial version of EUCALYPTUS requires Xen to be installed on all nodes that can be allocated, but no modifications to the "dom0" installation or to the hypervisor itself. (A short sketch of pointing a standard EC2 client at a Eucalyptus cloud follows the links below.) For more discussion see:

  • James Urquhart's excellent blog The Wisdom of Clouds.
  • Simon Wardley's post Open sourced EC2 .... not by Amazon.
  • Google Cloud Computing Group.
  • Eucalyptus and You by James Urquhart
  • Open Virtual Machine Format on LayerBoom. The Open Virtual Machine Format, or OVF is a proposed universal format that aims to create a secure, extensible method of describing and packaging virtual containers.
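
    Because the interface is EC2-compatible, standard EC2 tooling should work when pointed at a Eucalyptus front end, which is the portability argument above in miniature. A hedged sketch with the Python boto library (port 8773 and the /services/Eucalyptus path follow Eucalyptus convention; the hostname, keys, and image id are placeholders):

    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    # Point a stock EC2 client at a private Eucalyptus front end.
    region = RegionInfo(name="eucalyptus", endpoint="cloud.example.com")
    conn = EC2Connection(
        aws_access_key_id="YOUR-ACCESS-KEY",
        aws_secret_access_key="YOUR-SECRET-KEY",
        is_secure=False,
        region=region,
        port=8773,
        path="/services/Eucalyptus",
    )

    print conn.get_all_images()            # the same calls you'd make on EC2
    # conn.run_instances("emi-12345678")   # EMIs play the role of AMIs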

