@florind : I just realized that the Santa story is a classical scalability myth.
@juokaz : doesn't always use dating sites, but when he does, he finds out about them on High Scalability http://bit.ly/xYfBmq. True story
@niclashulting : "The Yahoo! homepage is updated 45,000 times every five minutes." A content strategy is vital.
Google’s Data Center Engineer Shares Secrets of ‘Warehouse’ Computing. Cade Metz interviews Google's guru of the datacenter, Luiz André Barroso: delivering good Internet service requires designing the software and hardware of the entire datacenter to work together as one computer; split application pieces across an array of computers; modesty is key, select modest machines with modest processors and spread applications as thin as possible; wimpy cores won't process work fast enough to be useful, that's too thin.
Hackerspace Global Grid. You know the nation state is in trouble when hackers are proposing to create their own space program and satellite network.
Brewster Kahle - Universal Access to All Knowledge - The Long Now. The Internet Archive snapshots one copy of every web page every 2 months. In January 1996 you could go to AltaVista and literally look at the web, it was 30 million pages, about the size of two coke machines.
Need long-lasting persistence? Is 2000 years enough? Take a look at the Rosetta Project. They've figured out how to micro-etch text onto a nickel disk using a 10 micron wide excimer beam. It's not digital, but human readable text, as long as you have a 500x microscope hanging around.
Your Ideal Performance: Consistency Tradeoff. Paul Cannon gives a clear explanation of different tuning strategies for N (replication), W (write nodes), R (read nodes). Priority = no data loss: N=5, W=3, R=3. Since W+R > N, any node set chosen for reading will always intersect with any node set chosen for writing, and so Abby’s data is guaranteed to be consistent, even if she loses up to two nodes within a replication set. Priority = speed + low cost: N=W=R=1. Priority = many nodes + fast consistency + high consistency: N=3 and R=W=1. Sound impossible? Read up on the magic of Probabilistically Bounded Staleness.
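To see why W+R > N matters, here is a minimal Python sketch that brute-forces every possible write set and read set and checks whether they always share a replica. The function names are made up for illustration, and this is only a counting argument, not how any particular datastore implements quorums.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every w-node write set shares a replica with every r-node read set."""
    replicas = range(n)
    return all(
        bool(set(write_set) & set(read_set))
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be down while both a write and a read quorum can still form."""
    return n - max(w, r)


if __name__ == "__main__":
    # The three configurations from the item above.
    for n, w, r in [(5, 3, 3), (1, 1, 1), (3, 1, 1)]:
        print(
            f"N={n} W={w} R={r}: "
            f"overlap guaranteed={quorums_always_overlap(n, w, r)}, "
            f"tolerated failures={tolerated_failures(n, w, r)}"
        )
```

With N=5, W=3, R=3 the overlap is guaranteed and two replicas can be lost. With N=3, R=W=1 it is not guaranteed, which is the gap Probabilistically Bounded Staleness tries to quantify.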
Mark Atwood - A Modest Proposal for a heretical Key Value Store. Proposes a new KV store that works on real hardware, uses fast disk streaming IO, and can use random writes. Such a system would have no networking, no REST, no JSON. It should be implemented in the kernel with about six system calls, and a buffer mediated API. It should have simple string based hierarchical name spaces. It would store binary objects. It would have some simple access control and have mutable objects. I must admit I did not get the joke until it was explained. My cheeks are red with shame.
Jeff Darcy on Scaling Filesystems vs. Other Things. I'm not exactly sure what this thread is about, but it was interesting. It started off with "Block devices are the wrong place to scale and do HA. It’s always expensive (NetApp), unreliable (SPOF), or administratively complex (Gluster)" and went somewhere from there.
We need a more efficient SQS. Everything is fine with HTTP until usage-based billing makes you question your most basic assumptions as a programmer: HTTP everywhere may not be such a good idea. The HTTP overhead for SQS is 4x, which increases bandwidth costs and decreases performance.
Facebook tells the gripping story of The Life of a Typeahead Query. A great architecture breakdown. Many technical decisions in search boil down to a trilemma among performance, recall, and relevance. Our typeahead is perhaps unusual in our high priority on performance; spending more than 100 msec to retrieve a result will cause the typeahead to "stutter," leading to a bad user experience that is almost impossible to compensate for with quality.
Along the same lines the Google Bots now aren't so interesting looking either: Google Bot Is Still the enemy - We really appreciate that the Crawl team helps us prove that we could serve 8.6M users a day with our product, (and against unique page requests no less) but it would be nice if they could do it once a month rather than once a day.