High Scalability

Entries by HighScalability Team (1576)

Wednesday

Dec162009

Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud

Wednesday, December 16, 2009 at 9:28AM

"But it is not complicated. [There's] just a lot of it."
-- Richard Feynman on how the immense variety of the world arises from simple rules.

Contents:

Have We Reached the End of Scaling?
Applications Become Black Boxes Using Markets to Scale and Control Costs
Let's Welcome our Neo-Feudal Overlords
The Economic Argument for the Ambient Cloud
What Will Kill the Cloud?
The Amazing Collective Compute Power of the Ambient Cloud
Using the Ambient Cloud as an Application Runtime
Applications as Virtual States
Conclusion

We have not yet begun to scale. The world is still fundamentally disconnected and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications.

Today 350 million users on Facebook and five million followers on Twitter seem like a lot, but consider that we have no planet-wide applications yet. None.

Tomorrow the numbers foreshadow a new Cambrian explosion of connectivity that will look as different as the image of a bare lifeless earth looks to us today. We will have 10 billion people, we will have trillions of things, and we will have a great multitude of social networks densely interconnecting all these people to people, things to things, and people to things.

How can we possibly build planet-scalable systems to handle this massive growth if building much smaller applications currently stresses architectural best practices past breaking? We can't. We aren't anywhere close to building applications at this scale, except for perhaps Google and a few others, and there's no way you and I can reproduce what they are doing. Companies are scrambling to raise hundreds of millions of dollars in order to build even more datacenters. As the world becomes more and more global and more and more connected, handling the load may require building applications 4 or 5 orders of magnitude larger than any current system. The cost for an infrastructure capable of supporting planet-scale applications could be in the 1 trillion dollar range (very roughly estimated at $100 million a data center times 10K data centers).

If you aren't Google, or a very few other companies, how can you possibly compete? For a glimmer of a possible direction that may not require a kingdom's worth of resources, please take a look at this short video:


HighScalability Team |


ambient

Thursday

Nov262009

What I'm Thankful For on Thanksgiving

Thursday, November 26, 2009 at 7:35AM

I try to keep this blog targeted and on topic. So even though I may be thankful for the song of the tiniest sparrow at sunrise, I'll save you from all that. It's hard to tie scalability and the giving of thanks together, especially as it sometimes occurs to me that this blog may be a self-indulgent waste of time. But I think I found a sentiment in A New THEORY of AWESOMENESS and MIRACLES by James Bridle that manages to marry the topic of this blog and giving thanks meaningfully together:

I distrust commercial definitions of innovation, and particularly of awesomeness. It’s an overused term. When I think of awesomeness, I want something awe-inspiring, vast and mind-expanding.

So I started thinking about things that I think are awesome, or miraculous, and for me, it kept coming back to scale and complexity.

We’re not actually very good about thinking about scale and complexity in real terms, so we have to use metaphors and examples. Douglas Adams writes somewhere about how big the Hitchhiker’s Guide to the Galaxy actually is—imagine a sheet of paper, then a filing cabinet full of sheets of paper, then a room full of filing cabinets, then a skyscraper full of rooms, then a city full of skyscrapers, a country, a planet, a solar system and so on. I couldn’t find the exact quote, so his thoughts on space will have to do:

Just wonderful. I especially love the quote "So I started thinking about things that I think are awesome, or miraculous, and for me, it kept coming back to scale and complexity." This perfectly sums up why the topic of scalability is so endlessly diverting. It can take you anywhere you want to go and everything eventually ends up back again.

Thanks for reading and...

Happy Thanksgiving!


Wednesday

Nov252009

Brian Aker's Hilarious NoSQL Stand Up Routine

Wednesday, November 25, 2009 at 2:02PM

Brian Aker gave this 10 minute lightning talk on NoSQL at the Nov 2009 OpenSQLCamp in Portland, Oregon. It's incredibly funny, probably because there's a lot of truth to what he's saying.

Here are the slides and here are the notes. Found through #nosql.


funny,

nosql

Tuesday

Nov242009

Hot Scalability Links for Nov 24 2009

Tuesday, November 24, 2009 at 7:06AM

Eventual Consistency by Example by Sergio Bossa. Attempts to clear up some misconceptions about eventual consistency as discussed in Amazon's Dynamo paper.
Boston Big Data Summit keynote outline by Curt Monash. Interesting topics: Big Data and the cloud actually have relatively little to do with each other and The NoSQL movement is a lot like the Ron Paul campaign.
I think RDBMS has set the industry back by 10 years by Henry G. Baker, Ph.D, from 1992. "I can categorically state that relational databases set the commercial data processing industry back at least ten years and wasted many of the billions of dollars that were spent on data processing." Henry thought OO databases would change things. They didn't. The question is why?
Intel cloud service tests the scalability of your code. Intel has a cloud-based tool that can test how your application will perform on a number of multicore processor configurations -- 1, 2, 4, 8, or 16 hardware threads.
Mapreduce 1, a lecture by Brian Harvey.
Gear6 has released a software version of their cache product. Interesting departure from the appliance model. Appliances are good because they allow you complete control and something to hang some margin off of. Yet if you want to sell into the cloud you have to build software components, not a hardware solution. Seems like a good idea for those who want a tricked out memcached solution out of the box.
Hadoop at Twitter (part 1): Splittable LZO Compression. How Twitter is using Hadoop to analyze a tweasure trove of tweets.
A funny/insightful/sad/truish Dilbert cartoon on how clouds fit into Dilbert's world.


Tuesday

Nov172009

10 eBay Secrets for Planet Wide Scaling

Tuesday, November 17, 2009 at 11:27AM

You don't even have to make a bid: Randy Shoup, an eBay Distinguished Architect, gives this presentation on how eBay scales for free. Randy has done a fabulous job in this presentation, and in other talks listed at the end of this post, of getting at the heart of the principles behind scalability. It's more about ideas of how things work and fit together than about a particular technology stack.

Impressive Stats

In case you weren't sure, eBay is big, with lots of: users, data, features, and change...

Over 89 million active users worldwide
190 million items for sale in 50,000 categories
Over 8 billion URL requests per day
Hundreds of new features per quarter
Roughly 10% of items are listed or ended every day
In 39 countries and 10 languages
24x7x365
70 billion read / write operations / day
Processes 50TB of new, incremental data per day
Analyzes 50PB of data per day
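To get a feel for these daily totals, it can help to convert them to average per-second rates. The sketch below is only back-of-the-envelope arithmetic on the figures quoted above; it assumes load is spread evenly over the day (real traffic peaks will be higher) and that 1 TB means 10**12 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate for a quantity quoted per day."""
    return daily_total / SECONDS_PER_DAY


# Figures taken from the list above; the even-load assumption is ours.
url_requests_per_sec = per_second(8e9)          # 8 billion URL requests per day
read_write_ops_per_sec = per_second(70e9)       # 70 billion read/write operations per day
new_data_mb_per_sec = per_second(50e12) / 1e6   # 50 TB of new data per day, in MB/s

print(f"{url_requests_per_sec:,.0f} URL requests per second")
print(f"{read_write_ops_per_sec:,.0f} read/write operations per second")
print(f"{new_data_mb_per_sec:,.1f} MB of new data per second")
```

Even as averages, these work out to tens of thousands of requests and hundreds of thousands of storage operations every second.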

10 Lessons


Example,

Strategy,

ebay

Monday

Nov162009

Building Scalable Systems Using Data as a Composite Material

Monday, November 16, 2009 at 8:50AM

Think of building websites as engineering composite materials. A composite material is when two or more materials are combined to create a third material that does something useful that the components couldn't do on their own. Composites like reinforced concrete have revolutionized design and construction. When building websites we usually bring different component materials together, like creating a composite, to get the features we need rather than building a completely new thing from scratch that does everything we want.

This approach has been seen as a hack because it leads to inelegancies like data duplication; great gobs of component glue; consistency issues; and messy operations. But what if the composite approach is really a strength, not a hack, but a messy part of the world that needs to be embraced rather than belittled?

The key is to see data as a material. Right now we are arguing which is the best single material to build with. Is it NoSQL, relational, massively parallel, graph, in-memory, or something else entirely? It all seems a bit crazy. Each material has both limits and capabilities. What we need to think of building is a composite material that combines the best characteristics of what is available into something better.


Strategy

Wednesday

Nov112009

Hot Scalability Links for Nov 11 2009

Wednesday, November 11, 2009 at 8:03AM

The Cost of Latency by James Hamilton. James summarizes latency info from Steve Souders, Greg Linden, and Marissa Mayer. Speed [is] an undervalued and under-discussed asset on the web.
Dynamo - Part I: a followup and re-rebuttals. Dynamo under attack as having Design flaws and the resounding rebuttal in response.
Programming Bits and Atoms. Thinking about programming and scaling as a problem in physics. Absolutely fascinating and inspiring.
Scaling Servers with the Cloud: Amazon S3. Build a static site using S3 for pennies. An oldie but still a goodie.
Are Wireless Road Trains the Cure for Traffic Congestion? The concept of road trains--up to eight vehicles zooming down the road together--has long been considered a faster, safer, and greener way of traveling long distances by car.
Erlang at Facebook by Eugene Letuchy. How Facebook uses Erlang to implement Chat, AIM Presence, and Chat Jabber support.
Yahoo Open Sources Traffic Server. Traffic Server enables the session management, authentication, configuration management, load balancing, and routing for an entire cloud computing stack.
How Complex Systems Fail by Richard Cook. Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety
Heroku vs EngineYard Cloud vs Joyent by Eliot Sykes. Rails hosting options head-to-head.


Product: Resque - GitHub's Distributed Job Queue

Friday, November 6, 2009 at 7:43AM

Queuing work for processing in the background is a time-tested scalability strategy. Queuing also happens to be one of those much-needed tools that is easy enough to build yourself that we see a lot of different versions made. Resque is GitHub's take on a job queue, and they've used it to process millions and millions of jobs so far.

What is Resque?

Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.
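The shape of that idea can be sketched in a few lines. This is not Resque's API and it is not Ruby: it is a minimal Python illustration that swaps Redis for an in-memory dictionary, and the names `JobQueues`, `enqueue`, `work`, and `Archive` are all hypothetical. The one thing it keeps from the description is the contract that a job is any class that responds to `perform`.

```python
from collections import deque


class JobQueues:
    """In-memory stand-in for Redis-backed named queues (illustration only)."""

    def __init__(self):
        self._queues = {}  # queue name -> deque of (job_class, args)

    def enqueue(self, queue_name, job_class, *args):
        # Any class exposing a callable `perform` counts as a job.
        if not callable(getattr(job_class, "perform", None)):
            raise TypeError(f"{job_class!r} does not respond to perform")
        self._queues.setdefault(queue_name, deque()).append((job_class, args))

    def work(self, queue_names):
        """Drain the named queues in the order given and collect the results."""
        results = []
        for name in queue_names:
            queue = self._queues.get(name, deque())
            while queue:
                job_class, args = queue.popleft()
                results.append(job_class.perform(*args))
        return results


class Archive:
    """Hypothetical job: an existing class becomes a job by defining perform."""

    @staticmethod
    def perform(repo_id, branch="master"):
        return f"archived repo {repo_id} at {branch}"


queues = JobQueues()
queues.enqueue("low", Archive, 7)
queues.enqueue("high", Archive, 42, "stable")
print(queues.work(["high", "low"]))
```

Listing "high" before "low" in the call to `work` is what gives the first queue priority in this sketch; a real worker would poll a shared store rather than drain a local dictionary.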

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP, and Kestrel, but found them all wanting in one way or another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.


ruby

Thursday

Nov052009

A Yes for a NoSQL Taxonomy

Thursday, November 5, 2009 at 7:50AM

NorthScale's Steven Yen in his highly entertaining NoSQL is a Horseless Carriage presentation has come up with a NoSQL taxonomy that thankfully focuses a little more on what NoSQL is, than what it isn't:

key‐value‐cache
- memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta
key‐value‐store
- keyspace, flare, schema‐free, RAMCloud
eventually‐consistent key‐value‐store
- dynamo, voldemort, Dynomite, SubRecord, MotionDb, Dovetaildb
ordered‐key‐value‐store
- tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
data‐structures server
- redis
tuple‐store
- gigaspaces, coord, apache river
object database
- ZopeDB, db4o, Shoal
document store
- CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris
wide columnar store
- BigTable, Hbase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

"Who will win?" Steven asks. He answers: the most approachable API with enough power will win. Steven touts the contender with the most devastating knock out punch will be document stores because "everyone groks documents." Though the thought is there will be just a few winners and products will converge in functionality.

Steven is banking on the "worse is better" model of dominance, which is hard to argue with as it has been so successful an adoption pattern in our field. The convergence idea is something I also agree with. What we have now are a lot of features masquerading as products. Over time they will merge together to become more full-featured offerings.

The key question though is what is enough power to win? Just getting a value back for a key won't be enough. Who are you putting your money on?
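One way to picture what "more power" than a key lookup might mean is to compare the interfaces of a few taxonomy categories side by side. The sketch below is a toy in-memory illustration in Python, not the API of any product listed above; the class and method names (`KeyValueStore`, `OrderedKeyValueStore.scan`, `DocumentStore.find`) are assumptions made up for this example.

```python
import bisect


class KeyValueStore:
    """Plain key-value store: the only question it answers is 'value for key?'."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class OrderedKeyValueStore(KeyValueStore):
    """Keeps keys sorted, which adds range scans on top of get/put."""

    def __init__(self):
        super().__init__()
        self._keys = []

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        super().put(key, value)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


class DocumentStore(KeyValueStore):
    """Values are dict-like documents that can be queried by field."""

    def find(self, **criteria):
        return [doc for doc in self._data.values()
                if all(doc.get(field) == value for field, value in criteria.items())]


users = DocumentStore()
users.put("u1", {"name": "ann", "city": "Portland"})
users.put("u2", {"name": "bob", "city": "Boston"})
print(users.find(city="Portland"))

events = OrderedKeyValueStore()
for key in ("2009-11-05", "2009-11-04", "2009-11-17"):
    events.put(key, f"post on {key}")
print(events.scan("2009-11-04", "2009-11-06"))
```

Each step up adds one kind of question the store can answer without the application scanning everything itself, which is roughly the trade between approachability and power that the question above is getting at.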


key-value store,

nosql,

papers

Wednesday

Nov042009

Damn, Which Database do I Use Now?

Wednesday, November 4, 2009 at 6:28AM

With so many database options available these days, as with the rest of life, it's natural to wonder how it all fits together. Amazon complicated, or rather expanded, the available options by introducing RDS, their relational database service. RDS is MySQL safely cocooned as a manageable cloud element, resting boldly within an energy providing elastic CPU pool, supported by a virtually infinite supply of very capable virtualized storage.

MySQL in AWS is now easy to start, stop, monitor, back up, snapshot, expand, and effortlessly move up and down the instance hierarchy. What it's not, contrary to what you might expect, is a scale-out solution; it's a scale-up solution. You get more by buying a bigger instance, not by horizontally adding more instances. There's a limit. Admittedly a larger limit now with Amazon's new high memory instances.

That's OK, well maybe not for people who helped grow Amazon's ecosystem by offering a similar product, but so many projects use MySQL that this is a big win for a lot of people. It makes life easier even if the promise of infinite relational database storage is yet to be realized.

If one of the reasons you were considering using a Platform as a Service is to knock the database item off your worry list, RDS is one more reason to consider playing your own general contractor and orchestrating all the elements together yourself. As more services become packaged into cloud capable components this is likely how many systems will be bolted together in the future.

But we are left wondering: how does RDS fit together with SimpleDB and all the other database options?


amazon