Entries in Strategy (358)

Wednesday, Apr 30, 2014

10 Tips for Optimizing NGINX and PHP-fpm for High Traffic Sites

Adrian Singer has boiled down 7 years of experience into a set of 10 very useful tips on how best to optimize NGINX and PHP-FPM for high-traffic sites (a configuration sketch follows the list):

  1. Switch from TCP to UNIX domain sockets. When communicating with processes on the same machine, UNIX domain sockets perform better than TCP because they involve less copying and fewer context switches.
  2. Adjust worker processes. Set worker_processes in your nginx.conf file to the number of cores your machine has, and increase worker_connections.
  3. Set up upstream load balancing. Multiple upstream backends on the same machine produce higher throughput than a single one.
  4. Disable access log files. On a high-traffic site, access logs involve a lot of I/O that has to be synchronized across all threads, which can have a big impact.
  5. Enable GZip.
  6. Cache information about frequently accessed files.
  7. Adjust client timeouts.
  8. Adjust output buffers.
  9. Tune /etc/sysctl.conf.
  10. Monitor. Continually monitor the number of open connections, free memory, and the number of waiting threads, and set alerts if thresholds are breached. Install the NGINX stub_status module.
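To make the list concrete, here's a minimal nginx.conf fragment sketching how several of the tips fit together. It's a sketch, not tuned advice: every value is illustrative, and the PHP-FPM socket paths are assumptions that would have to match sockets your PHP-FPM pools actually listen on.

    # Illustrative nginx.conf fragment -- values are examples, not tuned advice
    worker_processes 8;                        # tip 2: one worker per CPU core

    events {
        worker_connections 4096;               # tip 2: raise per-worker connection cap
    }

    http {
        # tips 1 and 3: several PHP-FPM backends reached over UNIX domain sockets
        upstream php_backend {
            server unix:/var/run/php-fpm-pool1.sock;   # assumed pool socket paths
            server unix:/var/run/php-fpm-pool2.sock;
        }

        access_log off;                        # tip 4: skip access-log I/O
        gzip on;                               # tip 5: compress responses

        open_file_cache max=10000 inactive=5m; # tip 6: cache metadata of hot files

        client_body_timeout 10;                # tip 7: drop slow clients sooner
        send_timeout 10;
        keepalive_timeout 15;

        output_buffers 1 32k;                  # tip 8: output buffering

        server {
            listen 80;

            location ~ \.php$ {
                include fastcgi_params;
                fastcgi_pass php_backend;      # routed through the upstream above
            }

            location /nginx_status {           # tip 10: counters for monitoring
                stub_status on;
                allow 127.0.0.1;
                deny all;
            }
        }
    }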

Please take a look at the original article as it includes excellent configuration file examples.

Wednesday, Apr 23, 2014

Here's a 1300 Year Old Solution to Resilience - Rebuild, Rebuild, Rebuild

How is it possible that a wooden Shinto shrine built in the 7th century is still standing? The answer depends on how you answer this philosophical head scratcher: With nearly every cell in your body continually being replaced, are you still the same person?

The Ise Grand Shrine has been in continuous existence for over 1300 years because every twenty years an exact replica is built on an adjacent footprint, and the former shrine is then dismantled.

Now that's resilience. If you want something to last, make it a living part of a culture. It's not so much the building that is remade; what is rebuilt and passed down from generation to generation is the meme that the shrine is important and worth preserving. The rest is an unfolding of that imperative.

You can see echoes of this same process in Open Source projects like Linux and the libraries and frameworks that get themselves reconstructed in each new environment.

The patterns of recurrence in software are the result of a Darwinian selection process that keeps simplicity and value alive in human minds.

A blog post on Pursuing Wabi has some fabulous photos of the shrine along with a brief description of why it's the way it is:

Click to read more ...

Monday, Apr 21, 2014

This is why Microsoft won. And why they lost.

My favorite kind of history is one told from an insider's perspective. The story of Richard the Lionheart is full of great battles and dynastic intrigue. The story of one of his soldiers, not so much. Yet the soldier's story, told by someone who experienced the real consequences of decisions made and actions taken, is the more revealing one.

We get such a history in Chat Wars, a wonderful article written by David Auerbach, who in 1998 worked at Microsoft on MSN Messenger Service, Microsoft’s instant messaging app (for a related story see The Rise and Fall of AIM, the Breakthrough AOL Never Wanted).

It's as if Herodotus visited Microsoft and wrote down his experiences. It has that same sort of conversational tone, insightful on-the-ground observations, and facts no outsider might ever believe.

Much of the article is a play-by-play account of the cat and mouse game David played, changing Messenger to track AOL's Instant Messenger protocol changes. AOL repeatedly tried to make it so Messenger could not interoperate with AIM, and each time the Messenger team countered with changes of its own. AOL finally won the game with a radical and unexpected play. A great read for programmers.

For a general audience, David's explanation of how and why Microsoft came to dominance, and why they lost that dominance, is most revealing. It stares directly into the heart of the entropy that brings everything down in the end.

Why Microsoft Won 

Click to read more ...

Wednesday, Apr 16, 2014

Six Lessons Learned the Hard Way About Scaling a Million User System 

Ever come to a point where you feel you've learned enough to share your experiences in the hopes of helping others traveling the same road? That's what Martin Kleppmann has done in the lovingly written Six things I wish we had known about scaling, an article well worth your time.

It's not advice about scaling a Twitter, but about building a million-user system, which is the sweet spot for a lot of projects. His conclusion rings true:

Building scalable systems is not all sexy roflscale fun. It’s a lot of plumbing and yak shaving. A lot of hacking together tools that really ought to exist already, but all the open source solutions out there are too bad (and yours ends up bad too, but at least it solves your particular problem).

Here's a gloss on the six lessons (plus a bonus lesson):

Click to read more ...

Monday, Apr 7, 2014

Google Finds: Centralized Control, Distributed Data Architectures Work Better than Fully Decentralized Architectures

For years a war has been fought in the software architecture trenches between the ideal of decentralized services and the power and practicality of centralized services. Centralized architectures, at least at the management and control plane level, are winning. And Google not only agrees, they are enthusiastic adopters of this model, even in places you wouldn't expect it to work.

Here's an excerpt from Google Lifts Veil On “Andromeda” Virtual Networking, an excellent article by Timothy Morgan, that includes a money quote from Amin Vahdat, distinguished engineer and technical lead for networking at Google:

Like many of the massive services that Google has created, the Andromeda network has centralized control. By the way, so did the Google File System and the MapReduce scheduler that gave rise to Hadoop when it was mimicked, so did the BigTable NoSQL data store that has spawned a number of quasi-clones, and even the B4 WAN and the Spanner distributed file system that have yet to be cloned.

“What we have seen is that a logically centralized, hierarchical control plane with a peer-to-peer data plane beats full decentralization,” explained Vahdat in his keynote. “All of these flew in the face of conventional wisdom,” he continued, referring to all of those projects above, and added that everyone was shocked back in 2002 that Google would, for instance, build a large-scale storage system like GFS with centralized control. “We are actually pretty confident in the design pattern at this point. We can build a fundamentally more efficient system by prudently leveraging centralization rather than trying to manage things in a peer-to-peer, decentralized manner.”

The context of the article is Google's impressive home brew SDN (software defined network) system that uses a centralized control architecture instead of the Internet's decentralized Autonomous System model, which thinks of the Internet as individual islands that connect using routing protocols.

SDN completely changes that model as explained by Greg Ferro:

Click to read more ...

Tuesday, Apr 1, 2014

The Mullet Cloud Selection Pattern

In a recent thread on Hacker News one of the commenters mentioned that they use Digital Ocean for personal stuff, but use AWS for business.

This DO-for-personal, AWS-for-business split has become popular enough that we can now give it a name: the Mullet Cloud Selection Pattern - business in the front and party in the back.

Providers like DO are cheap and the lightweight composable container model has an aesthetic appeal to developers. Even though it seems like much of the VM infrastructure has to be reinvented for containers, the industry often follows the lead of developer preference.

The mullet is dead. Long live the mullet! Developers are ever restless, always eager to move on to something new.

Thursday, Mar 27, 2014

Strategy: Cache Stored Procedure Results

Caching is not new of course, but I don't think I've heard of caching stored procedure results before. It's like memoization in the database. Brent Ozar covers this idea in How to Cache Stored Procedure Results.

The benefit is the usual one for doing work in the database: it doesn't take per-developer, per-app work; code it once in the stored proc and it works for everyone, everywhere, for all time. The disadvantage is the usual one as well: it adds extra load to an already busy database, so it should only be applied to heavy computations.

Brent positions this strategy as an emergency band-aid to apply when you need to take pressure off a database now. Developers can then work on moving the cache off the database and into its own tier. Interesting idea. And as the comments show, the implementation is never as simple as it seems.
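To make the shape of the idea concrete, here's a minimal T-SQL sketch of the memoization pattern. This is not Brent's implementation: the cache table, procedure, Orders schema, and 5-minute freshness window are all invented for illustration.

    -- Hypothetical cache table; all names invented for illustration
    CREATE TABLE dbo.ReportCache (
        CacheKey    varchar(100) NOT NULL PRIMARY KEY,
        CachedAt    datetime     NOT NULL,
        ResultValue money        NOT NULL
    );
    GO

    CREATE PROCEDURE dbo.GetDailyRevenue @Day date
    AS
    BEGIN
        DECLARE @Key varchar(100) = 'revenue:' + CONVERT(varchar(10), @Day, 120);

        -- Memoization check: serve the cached answer if it's under 5 minutes old
        IF EXISTS (SELECT 1 FROM dbo.ReportCache
                   WHERE CacheKey = @Key
                     AND CachedAt > DATEADD(MINUTE, -5, GETDATE()))
        BEGIN
            SELECT ResultValue FROM dbo.ReportCache WHERE CacheKey = @Key;
            RETURN;
        END;

        -- Cache miss: run the heavy computation once, then store the result
        DECLARE @Result money = (SELECT SUM(Total)
                                 FROM dbo.Orders          -- invented example table
                                 WHERE OrderDate = @Day);

        DELETE FROM dbo.ReportCache WHERE CacheKey = @Key;
        INSERT INTO dbo.ReportCache (CacheKey, CachedAt, ResultValue)
        VALUES (@Key, GETDATE(), @Result);

        SELECT @Result AS ResultValue;
    END;

Even this toy version hints at the trouble: two concurrent callers can race to recompute a cold key, which is exactly the kind of wrinkle the comments on Brent's post dig into.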

Wednesday, Mar 26, 2014

Oculus Causes a Rift, but the Facebook Deal Will Avoid a Scaling Crisis for Virtual Reality

Facebook has been teasing us. While many of their recent acquisitions have been surprising, shocking is the only word that adequately describes Facebook's 5-day whirlwind acquisition of Oculus, immersive virtual reality visionaries, for a now paltry-sounding $2 billion.

The backlash is a pandemic, jumping across social networks with the speed only a meme powered by the directly unaffected can generate.

For more than 30 years VR has been the dream burning in the heart of every science fiction fan. Now that this future might finally be here, Facebook’s ownage makes it seem like a wonderful and hopeful timeline has been choked off, killing the Metaverse before it even had a chance to begin.

For the many who voted for an open future with their Kickstarter dollars, there’s a deep and personal sense of betrayal, despite Facebook’s promise to leave Oculus alone. The intensity of the reaction is because Oculus matters to people. It's new, it's different, it creates a better future. It's important in a way sending messages or taking pictures never can be.

Let’s use Andy Baio as a sane example of the loyal opposition:

I can palpably feel the oxygen sucked out of the room. Infinite bright possibilities from indie game devs, quietly shelved.

Jigsus backs this up on reddit:

Within the last hour EVERY friend I know was developing a rift game has canceled. That's around 11 projects just gone.

So WTF? Why in this reality would Oculus sell to Facebook? At first blush it makes little sense. A more mismatched couple could hardly be created in a fantasy immersive 3D game.

Yet there's a reason and a method here that's not only about a sweet payoff. When you look deeper at the Facebook deal, it's about Oculus' existential need to scale. And scale is something Facebook is very, very good at.

Let’s explore why this deal makes more sense for Oculus than it might first appear and the central role scalability plays in the decision making...

Click to read more ...

Wednesday, Mar 19, 2014

Strategy: Three Techniques to Survive Traffic Surges by Quickly Scaling Your Site

Matthew Might, as a first responder to a surprise traffic surge on his inexpensive Linode-hosted blog, took emergency steps that you might find useful in a similar situation:

  1. Find the bottleneck. Reloading the page in Firebug showed the first page took 24 seconds to load, and after that everything else loaded quickly. In retrospect, this pattern meant the site was thread-limited, as the CPU was idle.
  2. Cut image sizes in half with a shell script using ImageMagick's convert. Load time is now 12 seconds.
  3. Turn dynamic content into static content using a static index.html file copied from the browser's "view source" output. Load time is now 6 seconds.
  4. Add threads to the Apache configuration file (steps 2 and 4 are sketched after this list). Load time is now 2 seconds. Crisis averted.
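Steps 2 and 4 might look roughly like the hedged sketch below. The paths and numbers are illustrative, not what Matthew actually ran, and the Apache stanza assumes the worker MPM:

    # Step 2: halve every image in place with ImageMagick (keep backups first)
    for f in images/*.jpg; do
        convert "$f" -resize 50% "$f"
    done

    # Step 4: raise Apache's thread limits (worker MPM; numbers illustrative)
    <IfModule mpm_worker_module>
        StartServers          4
        ServerLimit           8
        ThreadsPerChild      64
        MaxClients          512    # ServerLimit x ThreadsPerChild
    </IfModule>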

Because of this quick thinking and quick action the patient survived to serve pages another day.

And in fine post-mortem tradition, some of the future changes are:

Click to read more ...

Wednesday, Mar 5, 2014

10 Things You Should Know About Running MongoDB at Scale

Guest post by Asya Kamsky, Principal Solutions Architect at MongoDB.

This post outlines ten things you need to know for operating MongoDB at scale based on my experience working with MongoDB customers and open source users:

  1. MongoDB requires DevOps, too. MongoDB is a database. Like any other data store, it requires capacity planning, tuning, monitoring, and maintenance. Just because it's easy to install and get started and it fits the developer paradigm more naturally than a relational database, don't assume that MongoDB doesn't need proper care and feeding. And just because it performs super-fast on a small sample dataset in development doesn't mean you can get away without having a good schema and indexing strategy, as well as the right hardware resources in production! But if you prepare well and understand the best practices, operating large MongoDB clusters can be boring instead of nerve-wracking.
  2. Successful MongoDB users monitor everything and prepare for growth. Tracking current capacity and capacity planning are essential practices in any database system, and MongoDB is no different. You need to know how much work your cluster is currently capable of sustaining and what demands will be placed on it during times of highest use. If you don't notice growing load on your servers you'll eventually get caught without enough capacity. To monitor your MongoDB deployment, you can use MongoDB Management Service (MMS) to visualize your operations by viewing the opcounters (operation counters) chart; a quick shell equivalent is sketched after this list.
  3. The obstacles to scaling performance as your usage grows may not be what you'd expect. Having seen hundreds of users' deployments, I can say the performance bottlenecks usually are (in this order):
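(A quick aside on item 2: you can eyeball the same counters MMS charts directly from the mongo shell; the values in the comment below are illustrative, not real output.)

    // serverStatus() counters are cumulative since startup;
    // the MMS opcounters chart plots their per-second deltas
    db.serverStatus().opcounters
    // e.g. { "insert": 102345, "query": 4512903, "update": 89021,
    //        "delete": 3100, "getmore": 120034, "command": 541200 }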

    Click to read more ...
