advertise
Wednesday
Apr232014

Here's a 1300 Year Old Solution to Resilience - Rebuild, Rebuild, Rebuild

How is it possible that a wooden Shinto shrine built in the 7th century is still standing? The answer depends on how you answer this philosophical head scratcher: With nearly every cell in your body continually being replaced, are you still the same person?

The Ise Grand Shrine has been in continuous existence for over 1300 years because every twenty years an exact replica has been rebuilt on an adjacent footprint. The former temple is then dismantled.

Now that's resilience. If you want something to last make it a living part of a culture. It's not so much the building that is remade, what is rebuilt and passed down from generation to generation is the meme that the shrine is important and worth preserving. The rest is an unfolding of that imperative.

You can see echoes of this same process in Open Source projects like Linux and the libraries and frameworks that get themselves reconstructed in each new environment.

The patterns of recurrence in software are the result of Darwinian selection process that keeps simplicity and value alive in human minds. 

A blog post on Persuing Wabi has some fabulous photos of the shrine along with a brief description of why it's the way it is:

Click to read more ...

Monday
Apr212014

This is why Microsoft won. And why they lost.

My favorite kind of histories are those told from an insider's perspective. The story of Richard the Lionheart is full of great battles and dynastic intrigue. The story of one of his soldiers, not so much. Yet the soldiers' story, as someone who has experienced the real consequences of decisions made and actions taken, is more revealing.

We get such a history in Chat Wars, a wonderful article written by David Auerbach, who in 1998 worked at Microsoft on MSN Messenger Service, Microsoft’s instant messaging app (for a related story see The Rise and Fall of AIM, the Breakthrough AOL Never Wanted).

It's as if Herodotus visited Microsoft and wrote down his experiences. It has that same sort of conversational tone, insightful on-the-ground observations, and facts no outsider might ever believe.

Much of the article is a play-by-play account of the cat and mouse game David plays changing Messenger to track AOL's Instant Messenger protocol changes. AOL repeatedly tried to make it so Messenger could not interoperate with AIM and each time Messenger countered with changes of their own. AOL finally won the game with a radical and unexpected play. A great read for programmers. 

For a general audience David's explanation of how and why Microsoft came to dominance and why they lost that dominance is most revealing. It stares directly into the heart of the entropy that brings everything down in the end.

Why Microsoft Won 

Click to read more ...

Friday
Apr182014

Stuff The Internet Says On Scalability For April 18th, 2014

Hey, it's HighScalability time:


Scaling to the top of "Bun Mountain" in Hong Kong
  • 44 trillion gigabytes: size of the digital universe by 2020; 6 Times: we have six times more "stuff" than the generation before us.
  • Quotable Quotes:
    • Facebook: Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB.
    • @windley: The problem with the Internet of Things is right now it’s more like the CompuServe of Things
    • Chip Overclock: If you want to eventually generate revenue, you must first optimize for developer productivity; everything else is negotiable.
    • @igrigorik: if you are gzipping your images.. you're doing it wrong: http://bit.ly/1pb8JhK  - check your server configs! and your CDN... :)
    • Seth Lloyd: The arrow of time is an arrow of increasing correlations.
    • @kitmacgillivray: When will Google enable / require all android apps to have full deep search integration so all content is visible to the engine?
    • Neal Ford: Yesterday's best practices become tomorrow's anti-patterns.
    • Rüdiger Möller: just made a quick sum up of concurrency related issues we had in a 7-15 developer team soft realtime application (clustered inmemory data grid + GUI front end). 95% of threads created are not to improve throughput but to avoid blocking (e.g. IO, don't block messaging receiver thread, avoid blocking the event thread in a GUI app, ..).
    • Ansible: When done correctly, automation tools are giving them time back -- and helping out of this problem of needing to wear many hats.

  • Amazon and Google are in an epic battle to dominate the cloud—and Amazon may already have won: If Amazon’s entire public cloud were a single computer, it would have five times more capacity than those of its next biggest 14 competitors—including Google—combined. Every day, one-third of people who use the internet visit a site or use a service running on Amazon’s cloud.

  • What books would you select to help sustain or rebuild civilization? Here's Neal Stephenson’s list. He was too shy to do so, but I would certainly add his books to my list. 

  • 12-Step Program for Scaling Web Applications on PostgreSQL. Great write up of lessons learned by wanelo.com that they used to sustain 10s of thousand of concurrent users at 3K req/sec. Highly detailed. 1) Add more cache, 2) Optimize SQL, 3) Upgrade Hardware and RAM, 4) Scale reads by replication, 5) Use more appropriate tools, 6) Move write heave table out, 7) Tune Postures and your File System, 8) Buffer and serialize frequent updates, 9) Optimize DB Schema, 10) Shard busy tables vertically, 11) Wrap busy tables with services, 12) Shard services backend horizontally.

  • Is this a soap opera? It turns out Google and not Facebook is buying Titan Aerospace, makers of high flying solar powered drones. Google's fiber network would make a great backbone network for a drone and loon powered wireless network, completely routing around the telcos.

  • Building Carousel, Part I: How we made our networked mobile app feel fast and local. Dropbox changes to an eventualy consistent / optimistic replication syncing model to make their app "feel fast, responsive, and local, even though the data on which users operate is ultimately backed by the Dropbox servers."  Lesson: don’t block the user! Instead of requiring changes to be propagated to the server synchronously. Local and remote photos are treated as equivalent objects. Actions take effect immediately locally then work there way out globally. Changes are used to stay consistent. A fast hash is used to tell what photos have not been backed up to dropbox. Remote operations happen asynchronously.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Apr162014

Six Lessons Learned the Hard Way About Scaling a Million User System 

Ever come to a point where you feel you've learned enough to share your experiences in the hopes of helping others traveling the same road? That's what Martin Kleppmann has done in an lovingly written Six things I wish we had known about scaling, an article well worth your time.

It's not advice about scaling a Twitter, but of building a million user system, which is the sweet spot for a lot of projects. His conclusion rings true:

Building scalable systems is not all sexy roflscale fun. It’s a lot of plumbing and yak shaving. A lot of hacking together tools that really ought to exist already, but all the open source solutions out there are too bad (and yours ends up bad too, but at least it solves your particular problem).

Here's a gloss on the six lessons (plus a bonus lesson):

Click to read more ...

Tuesday
Apr152014

Sponsored Post: Apple, HelloSign, CrowdStrike, Gengo, Layer, The Factory, Airseed, ScaleOut Software, Couchbase, Tokutek, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngine, Site24x7  

Who's Hiring?

  • Apple is hiring a Senior Engineer in their Mobile Services team. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here

  • Apple is hiring a Software Engineer in their Messaging Services team. We build the cloud systems that power some of the busiest applications in the world, including iMessage, FaceTime and Apple Push Notifications. You'll have the opportunity to explore a wide range of technologies, developing the server software that is driving the future of messaging and mobile services. Please apply here.

  • Senior Engineer wanted for large scale, security oriented distributed systems application that offers career growth and independent work environment. Use your talents for good instead of getting people to click ads at CrowdStrike. Please apply here.

  • Ops Engineer - Are you passionate about scaling and automating cloud-based websites? Love Puppet and deployment scripts? Want to take advantage of both your sys-admin and DevOps skills? Join HelloSign as our second Ops Engineer and help us scale as we grow! Apply at http://www.hellosign.com/info/jobs

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • We’re looking for talented and driven engineers who are passionate about scalability and reliability to help us build Layer, the open communications layer for the Internet. Layer enables app developers to easily build secure, scalable messaging, voice and video features into any app. For more information and our full list of openings, please visit: https://layer.com/jobs

  • The Factory is seeking a collaborative, talented and entrepreneurial Sr. Front End Engineer. Backed by the co-founder of Skype+Rdio and led by Rdio's founding team, we're changing the way products are built and companies get launched (think incubator/accelerator without the nagging outside influence or funding/timing constraints).  Our goal is to build a platform to better launch startups and opensource what we do along the way http://www.thefactory.com/pdfs/sr_frontend.pdf

  • Airseed -- a Google Ventures backed, developer platform that powers single sign-on authentication, rich consumer data, and analytics -- is hiring lead backend and fullstack engineers (employees #4, 5, 6). More info here: https://www.airseed.com/jobs

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Safeguard Commercial Success with a Strategic Monitoring Approach. Does your enterprise have adequate monitoring strategies in place to deal with tech challenges that can otherwise impact revenue and damage your corporate brand? Register now to learn why your business must elevate the strategic importance of technology monitoring and learn how to safeguard your business

  • The Biggest MongoDB Event Ever Is On. Will You Be There? Join us in New York City June 23-25 for MongoDB World! The conference lineup includes Amazon CTO Werner Vogels and Cloudera Co-Founder Mike Olson for keynote addresses.  You’ll walk away with everything you need to know to build and manage modern applications. Register before April 4 to take advantage of super early bird pricing.

  • April 3 Webinar: The BlueKai Playbook for Scaling to 10 Trillion Transactions a Month. As the industry’s largest online data exchange, BlueKai knows a thing or two about pushing the limits of scale. Find out how they are processing up to 10 trillion transactions per month from Vice President of Data Delivery, Ted Wallace. Register today.

Cool Products and Services

  • GigOM Interviews Aerospike at Structure Data 2014 on Application Scalability. Aerospike Technical Marketing Director, Young Paik explains how you can add rocket fuel to your big data application by running the Aerospike database on top of Hadoop for lightning fast user-profile lookups. Watch this interview.

  • Couchbase: NoSQL and the Hybrid Cloud. If a NoSQL database can be deployed on-premise or it can be deployed in the cloud, why can’t it be deployed on-premise and in the cloud? It can, and it should. Read how in this article converting three use cases for hybrid cloud deployments of NoSQL databases: master / slave, cloud burst, and multi-master.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Apr142014

How do you even do anything without using EBS?

In a recent thread on Hacker News discussing recent AWS price changes, seldo mentioned they use AWS for business, they just never use EBS on AWS. A good question was asked:

How do you even do anything without using EBS?

Amazon certainly makes using EBS the easiest path. And EBS has a better reliability record as of late, but it's still often recommended to not use EBS. This avoids a single point of failure at the cost of a lot of complexity, though as AWS uses EBS internally, not using EBS may not save you if you use other AWS services like RDS or ELB.

If you don't want to use EBS, it's hard to know where to even start. A dilemma to which Kevin Nuckolls gives a great answer:

Well, you break your services out onto stateless and stateful machines. After that, you make sure that each of your stateful services is resilient to individual node failure. I prefer to believe that if you can't roll your entire infrastructure over to new nodes monthly then you're unprepared for the eventual outage of a stateful service.

Most databases have replication but you need to make sure that the characteristics of how the database handles a node failure are well understood. Worst case you use EBS, put your state on it, snapshot it regularly, and ship those snapshots to another region because when EBS fails it fails hard.

Also, logs make every machine stateful. Use something like logstash to centralize that state.

If ELB is down in a given region then DNS failover to another region. Assuming you feel comfortable rolling your entire infrastructure monthly, have good images / configuration management, and have the state replicated in the backup region. That or sidestep ELB in your region to a team of stateless load balancers that terminate SSL.

Jeremy Edberg to a question about how to run databases without EBS, says:

By having good replication, either hand rolled or built in.

At Netflix we use Cassandra and store all data on local instance storage. We don't use EBS for databases.

Friday
Apr112014

Stuff The Internet Says On Scalability For April 11th, 2014

Hey, it's HighScalability time:

(Image: Daly and Newton/Getty Images)
DNA nanobots deliver drugs in living cockroaches which have as much compute power as a Commodore 64
  • 40,000: # of people it takes to colonize a star system; 600,000: servers vulnerable to heartbleed
  • Quotable Quotes:
    • @laurencetratt: High frequency traders paid $300M to reduce New York <-> Chicago network transfers from 17 -> 13ms. 
    • @talios: People read http://highscalability.com  for sexual arousal - jim webber #cm14
    • @viktorklang: LOL RT @giltene 2nd QOTD: @mjpt777 “Some people say ‘Thread affinity is for sissies.’Those people don’t make money.”
    • @pbailis: Reminder: eventual consistency is orthogonal to durability and data loss as long as you correctly resolve update conflicts.
    • @codinghorror: Converting a low-volume educational Discourse instance from Heroku at ~$152.50/month to Digital Ocean at $10/month.
    • @FrancescoC: Scary post on kids who can't tell the diff. between atomicity & eventual consistency architecting bitcoin exchanges 
    • @jboner: "Protocols are a lot harder to get right than APIs, and most people can't get APIs right" -  @daveathomas at #reactconf
    • @vitaliyk: “Redundancy is ambiguous because it seems like a waste if nothing unusual happens. Except that something unusual happens—usually.” @nntaleb
    • Blazes: Asynchrony * partial failure is hard.
    • David Rosenthal: I have long thought that the fundamental challenge facing system architects is to build systems that fail gradually, progressively, and slowly enough for remedial action to be effective, all the while emitting alarming noises to attract attention to impending collapse. 
    • Brian Wilson: Moral of the story: design for failure and buy the cheapest components you can. :-)

  • Just damn. DNA nanobots deliver drugs in living cockroaches: Levner and his colleagues at Bar Ilan University in Ramat-Gan, Israel, made the nanobots by exploiting the binding properties of DNA. When it meets a certain kind of protein, DNA unravels into two complementary strands. By creating particular sequences, the strands can be made to unravel on contact with specific molecules – say, those on a diseased cell. When the molecule unravels, out drops the package wrapped inside.

  • Remember those studies where a guerilla walks through the middle of a basketball game and most people don't notice? Attention blindness. 1000 eyeballs doesn't mean anything will be seen. That's human nature. Heartbleed -- another horrible, horrible, open-source FAIL.

  • Remember the Content Addressable Web? Your kids won't. The mobile web vs apps is another front on the battle between open and closed systems.

  • In Public Cloud Instance Pricing Wars - Detailed Context and Analysis Adrian Cockcroft takes a deep stab at making sense of the recent price cuts by Google, Amazon, and Microsoft. AWS users should migrate to the new m3, r3, c3 instances; AWS and Google instance prices are essentially the same for similar specs; Microsoft doesn't have the latest Intel CPUs and isn't pricing against like spec'ed machines;  IBM Softlayer pricing is still higher; Moore's law dictates price curves going forward.

  • Seth Lloyd: Quantum Machine Learning - QM algorithms are a win because they give exponential speedups on BigData problems. The mathematical structure of QM, because a wave can be at two places at once, is that the states of QM systems are in fact vectors in high dimensional vector spaces. The kind of transformations that happen when particles of light bounce of CDs, for example, are linear transformations on these high dimensional vector spaces. Quantum computing is the effort to exploit quantum systems to allow these linear transformations to perform the kind of calculations we want to perform. Or something like that. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Apr102014

Paper: Scalable Atomic Visibility with RAMP Transactions - Scale Linearly to 100 Servers

We are not yet at the End of History for database theory as Peter Bailis and the Database Group at UC Berkeley continue to prove with a great companion blog post to their new paper: Scalable Atomic Visibility with RAMP Transactions. I like the approach of pairing a blog post with a paper. A paper almost by definition is formal, but a blog post can help put a paper in context and give it some heart.

From the abstract:

Databases can provide scalability by partitioning data across several servers. However, multi-partition, multi-operation transactional access is often expensive, employing coordination-intensive locking, validation, or scheduling mechanisms. Accordingly, many real-world systems avoid mechanisms that provide useful semantics for multi-partition operations. This leads to incorrect behavior for a large class of applications including secondary indexing, foreign key enforcement, and materialized view maintenance. In this work, we identify a new isolation model—Read Atomic (RA) isolation—that matches the requirements of these use cases by ensuring atomic visibility: either all or none of each transaction’s updates are observed by other transactions. We present algorithms for Read Atomic Multi-Partition (RAMP) transactions that enforce atomic visibility while offering excellent scalability, guaranteed commit despite partial failures (via synchronization independence), and minimized communication between servers (via partition independence). These RAMP transactions correctly mediate atomic visibility of updates and provide readers with snapshot access to database state by using limited multi-versioning and by allowing clients to independently resolve non-atomic reads. We demonstrate that, in contrast with existing algorithms, RAMP transactions incur limited overhead—even under high contention—and scale linearly to 100 servers.

What is RAMP?

Click to read more ...

Tuesday
Apr082014

Microservices - Not a free lunch!

This is a guest post by Benjamin Wootton, CTO of Contino, a London based consultancy specialising in applying DevOps and Continuous Delivery to software delivery projects. 

Microservices are a style of software architecture that involves delivering systems as a set of very small, granular, independent collaborating services.

Though they aren't a particularly new idea, Microservices seem to have exploded in popularity this year, with articles, conference tracks, and Twitter streams waxing lyrical about the benefits of building software systems in this style.

This popularity is partly off the back of trends such as Cloud, DevOps and Continuous Delivery coming together as enablers for this kind of approach, and partly off the back of great work at companies such as Netflix who have very visibly applied the pattern to great effect.

Let me say up front that I am a fan of the approach. Microservices architectures have lots of very real and significant benefits:

Click to read more ...

Monday
Apr072014

Google Finds: Centralized Control, Distributed Data Architectures Work Better than Fully Decentralized Architectures

For years a war has been fought in the software architecture trenches between the ideal of decentralized services and the power and practicality of centralized services. Centralized architectures, at least at the management and control plane level, are winning. And Google not only agrees, they are enthusiastic adopters of this model, even in places you don't think it should work.

Here's an excerpt from Google Lifts Veil On “Andromeda” Virtual Networking, an excellent article by Timothy Morgan, that includes a money quote from Amin Vahdat, distinguished engineer and technical lead for networking at Google:

Like many of the massive services that Google has created, the Andromeda network has centralized control. By the way, so did the Google File System and the MapReduce scheduler that gave rise to Hadoop when it was mimicked, so did the BigTable NoSQL data store that has spawned a number of quasi-clones, and even the B4 WAN and the Spanner distributed file system that have yet to be cloned.

 

"What we have seen is that a logically centralized, hierarchical control plane with a peer-to-peer data plane beats full decentralization,” explained Vahdat in his keynote. “All of these flew in the face of conventional wisdom,” he continued, referring to all of those projects above, and added that everyone was shocked back in 2002 that Google would, for instance, build a large-scale storage system like GFS with centralized control. “We are actually pretty confident in the design pattern at this point. We can build a fundamentally more efficient system by prudently leveraging centralization rather than trying to manage things in a peer-to-peer, decentralized manner.

The context of the article is Google's impressive home brew SDN (software defined network) system that uses a centralized control architecture instead of the Internet's decentralized Autonomous System model, which thinks of the Internet as individual islands that connect using routing protocols.

SDN completely changes that model as explained by Greg Ferro:

Click to read more ...