Entries by HighScalability Team (1576)

Monday
Aug 22, 2011

Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub

One of the best projects I've ever worked on was creating a large scale web site publishing system that was almost entirely static. A large team of very talented creatives made the artwork, writers penned the content, and designers generated templates. All assets were managed in a database. Then everything was extracted, after applying many different filters, to a static site that was uploaded via FTP to dozens of web servers. It worked great: reliable, fast, cheap, and simple. Updates were a bit of a pain, as they required pushing a lot of files to a lot of servers, and that took time, but otherwise it was a solid system.

Alas, this elegant system was replaced with a newfangled dynamic, database-backed system: content was pulled from a database by a front-end written in a dynamic language. With a recent series of posts from Amazon's Werner Vogels chronicling his experience of transforming his All Things Distributed blog into a static site using S3's ability to serve web pages, I get the pleasure of asking: are we back to static sites again?

It's a pleasure to ask the question because in many ways a completely static site is the holy grail of content-heavy sites. A static site is one in which files (HTML, images, sound, movies, etc.) sit in a filesystem, and the web server simply translates each URL to a file, reads the file from the file system, and sends it to the browser in response to the HTTP request. Not much can go wrong in this path. Not much going wrong is a virtue. It means you don't need to worry about things. It will just work. And it will just keep on working over time; bit rot hits program- and service-heavy sites a lot harder than static sites.
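The whole static path can be sketched in a few lines. This is a minimal illustration of the URL-to-file translation just described, not any particular web server's implementation; the document root and file names are hypothetical:

```python
import mimetypes
import os

def serve_static(doc_root, url_path):
    """Translate a URL path directly to a file on disk and return a
    (status, headers, body) triple -- the entire request path for a
    static site."""
    # Map the URL onto the filesystem, refusing path traversal.
    rel = os.path.normpath(url_path.lstrip("/"))
    if rel.startswith(".."):
        return 403, {}, b"Forbidden"
    file_path = os.path.join(doc_root, rel)
    if os.path.isdir(file_path):
        file_path = os.path.join(file_path, "index.html")
    if not os.path.isfile(file_path):
        return 404, {}, b"Not Found"
    content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
    with open(file_path, "rb") as f:
        body = f.read()
    return 200, {"Content-Type": content_type}, body
```

Everything a dynamic site does per request (templating, queries, session state) simply isn't in this path, which is why so little can go wrong.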

Here's how Werner makes his site static:

Click to read more ...

Friday
Aug 19, 2011

Stuff The Internet Says On Scalability For August 19, 2011

You may not scale often, but when you scale, please drink HighScalability:

Click to read more ...

Thursday
Aug 18, 2011

Paper: The Akamai Network - 61,000 servers, 1,000 networks, 70 countries 

Update: as of the end of Q2 2011, Akamai had 95,811 servers deployed globally.

Akamai is the CDN to the stars. It claims to deliver between 15 and 30 percent of all Web traffic, with major customers like Facebook, Twitter, Apple, and the US military. Akamai has traditionally been quite secretive, but we get a peek behind the curtain in this paper: The Akamai Network: A Platform for High-Performance Internet Applications by Erik Nygren, Ramesh Sitaraman, and Jennifer Sun.

Abstract:
Comprising more than 61,000 servers located across nearly 1,000 networks in 70 countries worldwide, the Akamai platform delivers hundreds of billions of Internet interactions daily, helping thousands of enterprises boost the performance and reliability of their Internet applications. In this paper, we give an overview of the components and capabilities of this large-scale distributed computing platform, and offer some insight into its architecture, design principles, operation, and management. 

Delivering applications over the Internet is a bit like living in the Wild West; there are problems: peering point congestion, inefficient communication protocols, inefficient routing protocols, unreliable networks, scalability limits, application limitations, and a slow rate of change adoption. A CDN is the White Hat trying to remove these obstacles for enterprise customers. They do this by creating a delivery network that is a virtual network layered over the existing Internet. The paper goes on to explain how they make this happen using edge networks and a sophisticated software infrastructure. With such a powerful underlying platform, Akamai is clearly Google-like in its ability to deliver products few others can hope to match.
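The overlay network rests on steering each client toward a good edge server. Here is a toy sketch of that mapping step (picking the lowest-latency edge for a client's region) with made-up server names and latency numbers; Akamai's real mapping system is far more sophisticated than a latency table:

```python
def pick_edge_server(client_region, edge_latencies):
    """Pick the edge server with the lowest measured latency to the
    client's region -- a toy version of the mapping a CDN's DNS layer
    performs on every lookup."""
    candidates = edge_latencies.get(client_region)
    if not candidates:
        raise ValueError(f"no edge servers known for {client_region}")
    return min(candidates, key=candidates.get)

# Hypothetical latency measurements (ms) from each client region
# to each edge server.
latencies = {
    "eu-west": {"edge-lhr": 12, "edge-fra": 18, "edge-iad": 85},
    "us-east": {"edge-iad": 9, "edge-lhr": 80, "edge-fra": 95},
}
```

In practice the decision also weighs server load, network health, and cost, and is refreshed continuously as conditions change.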

Detailed and clearly written, it's well worth a read.

Click to read more ...

Monday
Aug 15, 2011

Should any cloud be considered one availability zone? The Amazon experience says yes.

Amazon has a very well-written account of their 8/8/2011 downtime: Summary of the Amazon EC2, Amazon EBS, and Amazon RDS Service Event in the EU West Region. Power failed, backup generators failed to kick in, there weren't enough resources for EBS volumes to recover, API servers were overwhelmed, a DNS failure caused failovers to alternate availability zones to fail, and a double fault occurred as the power event interrupted the repair of a different bug. All kinds of typical stuff that just seems to happen.

Considering the previous outage, the big question for programmers is: what does this mean? What does it mean for how systems should be structured? Have we learned something that can't be unlearned?
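One structural answer is to treat every zone, or even every cloud, as something that can fail whole, and to build failover across them into the client. A minimal sketch, with hypothetical endpoint names standing in for deployments in different zones or providers:

```python
class FailoverClient:
    """Treat each cloud (or region) as a single availability zone and
    fail over between them -- a sketch of the structure the outage
    argues for, not a production client."""

    def __init__(self, endpoints):
        # Endpoints ordered by preference, e.g. nearest first.
        self.endpoints = list(endpoints)

    def request(self, send):
        """Try each endpoint in turn. `send` is a callable that raises
        on failure, standing in for a real HTTP call."""
        errors = []
        for endpoint in self.endpoints:
            try:
                return send(endpoint)
            except Exception as exc:
                errors.append((endpoint, exc))
        raise RuntimeError(f"all endpoints failed: {errors}")
```

The hard part, which this sketch ignores, is the data: failing over stateless requests is easy, while keeping state replicated across zones so the alternate endpoint can actually serve them is the real design problem.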

Click to read more ...

Friday
Aug 12, 2011

Stuff The Internet Says On Scalability For August 12, 2011

Submitted for your scaling pleasure: you may not scale often, but when you scale, please drink us:

  • Quotably quotable quotes:
    • @mardix : There is no single point of truth in #NoSQL . #Consistency is no longer global, it's relative to the one accessing it. #Scalability
    • @kekline : RT @CurtMonash: "...from industry figures, Basho/Riak is our third-biggest competitor." How often do you encounter them? "Never have" #nosql
    • @dave_jacobs : Love being in a city where I can overhear a convo about Heroku scalability while doing deadlifts. #ahsanfrancisco
    • @sufw : How can it be possible that Tagged has 80m users and I have *never* heard of it!?!
    • @EventCloudPro : One of my vacation realizations? Whole #bigdata thing has turned into a lotta #bighype - many distinct issues & nothing to do w/ #bigdata
  • NoSQL as dynamic duos. NoSQL combinations - what works best? A common pattern seems to be Redis as a cache and Riak as the distributed backend.
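That Redis-plus-Riak pairing is the classic cache-aside pattern. A self-contained sketch, using plain dicts as stand-ins for the Redis and Riak clients:

```python
class CacheAsideStore:
    """Cache-aside over a fast cache (Redis's role) and a durable
    distributed store (Riak's role). Plain dicts stand in for both
    clients so the sketch is self-contained."""

    def __init__(self, cache, backend):
        self.cache = cache      # would be a Redis client in practice
        self.backend = backend  # would be a Riak client in practice

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            # Cache miss: fall back to the distributed backend and
            # populate the cache for subsequent reads.
            value = self.backend.get(key)
            if value is not None:
                self.cache[key] = value
        return value

    def put(self, key, value):
        self.backend[key] = value
        self.cache[key] = value  # write-through keeps the cache warm
```

A real deployment would also set a TTL on cached entries so the cache can't serve stale data forever after a write it didn't see.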

Click to read more ...

Wednesday
Aug 10, 2011

LevelDB - Fast and Lightweight Key/Value Database From the Authors of MapReduce and BigTable

LevelDB is an exciting new entrant into the pantheon of embedded databases, notable both for its pedigree, being authored by the makers of the now mythical Google MapReduce and BigTable products, and for its emphasis on efficient disk based random access using log-structured-merge (LSM) trees. 

The plan is to keep LevelDB fairly low-level. The intention is that it will be a useful building block for higher-level storage systems. Basho is already investigating using LevelDB as one of its storage engines.

In the past many systems were built around embedded databases, though most developers now use database servers accessed via RPCs. An embedded database is a database distributed as a library and linked directly into your application. The application is responsible for providing a service-level API, sharding, backups, initiating consistency checks, initiating rollbacks, startup, shutdown, queries, etc. The application becomes the container for the database and the manager of the database.

Architectures using embedded databases typically never expose a raw database abstraction at all. They have a service API, and the services make the embedded database library calls transparently behind the scenes. Often an embedded database will provide multiple access types, like indexed access for key-value uses and btrees for range queries and cursors.
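As a concrete sketch of this pattern, here is a small service API wrapped around SQLite, an embedded database that ships in Python's standard library; callers never see the raw database, only the service's methods. The schema and method names are illustrative:

```python
import sqlite3

class UserService:
    """A service-level API over an embedded database, as described
    above: the application owns the database's whole lifecycle and
    exposes only domain operations, never SQL."""

    def __init__(self, path=":memory:"):
        # The application is responsible for startup...
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add_user(self, name):
        cur = self.db.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.db.commit()
        return cur.lastrowid

    def get_user(self, user_id):
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def close(self):
        # ...and for shutdown of the embedded store.
        self.db.close()
```

Sharding, backups, and replication would all live at this service layer too, which is exactly the work a database server normally does for you.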

BerkeleyDB is one well-known example of an embedded database, SQLite is another, the file system is perhaps the most commonly used database of all, and there have been many, many other btree libraries in common use. I've used C-tree on several projects. In a battle of old versus new, a user named IM46 compared LevelDB to BerkeleyDB and found that LevelDB solidly outperforms BerkeleyDB for larger databases.

Programmers usually thought doing this stuff was easy, wrote their own failed on-disk btree library (raises hand), and then looked around for a proven product. It's only relatively recently that databases have gone upmarket and included a network layer and higher-level services.

Building a hybrid application/database architecture is still a very viable option when you want everything to be just so. If you are going to load balance requests across sharded application servers anyway, a heavyweight external database infrastructure may not be necessary.

The LevelDB mailing list started off very active and has died down a bit, but is still nicely active and informative. Here are some excellent FAQish tips, performance suggestions, and porting issues extracted from the list:

Click to read more ...

Tuesday
Aug 9, 2011

Sponsored Post: Box, BetterWorks, New Relic, NoSQL Now!, Surge, Tungsten, AppDynamics, ScaleOut, Couchbase, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Everything is sexier in the cloud. Box is hiring operations engineers and infrastructure automation engineers to help us revolutionize the way businesses collaborate. Please apply here.
  • BetterWorks is hiring a PHP Software Engineer in Los Angeles to help make enterprise software be as beautiful and usable as an Apple product. Please apply here.  

Fun and Informative Events

  • NoSQL Now! is a new conference covering the dynamic field of NoSQL technologies. August 23-25 in San Jose. For more information please visit: http://www.NoSQLNow.com
  • Surge 2011: The Scalability and Performance Conference. Surge is a chance to identify emerging trends and meet the architects behind established technologies. Early Bird Registration.
  • Curious about Couchbase Server 2.0? Register for a series of weekly 30-minute webinars. Couchbase has announced CouchConf Berlin! Join us on November 7, 2011.

Cool Products and Services

  • Breaking the Cross-Site Data Barrier: Tungsten Multi-Master Replication for MySQL. Please register here.
  • New Relic - real user monitoring optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ScaleOut StateServer - Scale Out Your Server Farm Applications! 
  • CloudSigma. Instantly scalable European cloud servers.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Aug 8, 2011

Tagged Architecture - Scaling to 100 Million Users, 1000 Servers, and 5 Billion Page Views

This is a guest post by Johann Schleier-Smith, CTO & co-founder, Tagged.

Five snapshots on how Tagged scaled to more than 1,000 servers

Since 2004, Tagged has grown from a tiny social experiment to one of the largest social networks, delivering five billion pages per month to many millions of members who visit to meet and socialize with new people. One step at a time, this evolution forced us to evolve our architecture, eventually arriving at an enormously capable platform.

V1: PHP webapp, 100k users, 15 servers, 2004

Tagged was born in the rapid-prototyping culture of an incubator that usually launched two new concepts each year in search of the big winner...

Click to read more ...

Friday
Aug 5, 2011

Stuff The Internet Says On Scalability For August 5, 2011

Submitted for your beginning of the end of summer scaling pleasure: 

For much more Stuff the Internet has to say, please keep on reading on...

Click to read more ...

Thursday
Aug 4, 2011

Jim Starkey is Creating a Brave New World by Rethinking Databases for the Cloud

Jim Starkey, founder of NuoDB, in this thread on the Cloud Computing group, delivers a masterful post on why he thinks the relational model is the best overall compromise amongst the different options, why NewSQL can free itself from the limitations of legacy SQL architectures, and how this creates a brave new lock free world....

I'll [Jim Starkey] go into more detail later in the post for those who care, but the executive summary goes like this: network latency is relatively high and human attention span is relatively low, so human-facing computer systems have to perform their work in a small number of trips between the client and the database server. But the human condition leads inexorably to data complexity. There are really only two strategies to manage this problem. One is to use coarse-granularity storage, glomming related data together into a single blob and letting intelligence on the client make sense of it. The other is storing fine-granularity data on the server and using intelligence on the server to aggregate data to be returned to the client.

NoSQL uses the former for a variety of reasons...
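The two strategies can be sketched side by side. This toy example (hypothetical order data, in-memory stores standing in for real databases) contrasts the coarse-grained blob approach with server-side aggregation over fine-grained rows:

```python
import json

# Strategy 1 (the NoSQL style): coarse granularity. Store one blob
# per order; the client fetches it in a single trip and does the
# interpreting itself.
blob_store = {}

def save_order_blob(order_id, order):
    blob_store[order_id] = json.dumps(order)

def load_order_blob(order_id):
    # One round trip; all intelligence lives on the client.
    return json.loads(blob_store[order_id])

# Strategy 2 (the relational style): fine granularity. Store rows
# and aggregate on the server before returning.
order_lines = []  # (order_id, item, qty, price) rows

def add_line(order_id, item, qty, price):
    order_lines.append((order_id, item, qty, price))

def order_total(order_id):
    # Server-side aggregation: only the answer crosses the network.
    return sum(q * p for oid, _, q, p in order_lines if oid == order_id)
```

Both keep the client-server trip count low; they differ in where the intelligence sits, which is exactly the axis Starkey is arguing along.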

Click to read more ...