Friday, November 21, 2014
Stuff The Internet Says On Scalability For November 21st, 2014

Hey, it's HighScalability time:
- 80 million: bacteria xferred in a juicy kiss;
- Quotable Quotes:
- James Hamilton: Every day, AWS adds enough new server capacity to support all of Amazon's global infrastructure when it was a $7B annual revenue enterprise.
- @iglazer: What is the test that could most destroy your business model? Test for that. @adrianco #defragcon
- @zhilvis: Prefer decoupling over duplication. Coupling will kill you before duplication does by @ICooper #buildstufflt
- @jmbroad: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." ~George Box
- @RichardWarburto: Optimisation may be premature but measurement isn't.
- @joeerl: Hell hath no version numbers - the great ones saw no need for version numbers - they used port numbers instead. See, for example, RFC 821,
- JustCallMeBen: tldr: queues only help to flatten out burst load. Make sure your maintained throughput is high enough.
- @rolandkuhn: «the event log is a database of the past, not just of the present» — @jboner at #reactconf
- @ChiefScientist: CRUD is dead. -- @jboner #reactconf
- @fdmts: 30T of flash disks cabled up and online. Thanks @scalableinfo!
- monocasa: Immutable, statically linked, minimal system images running micro services on top of a hypervisor is a very old concept too. This is basically the direction IBM went in the 60's with their hypervisors and they haven't looked back.
- Kiril Savino: Scaling is the process of decoupling load from latency.
- Perhaps they were controlled by a master AI? Google and Stanford Built Similar Neural Networks Without Knowing It: Neural networks can be plugged into one another in a very natural way. So we simply take a convolutional neural network, which understands the content of images, and then we take a recurrent neural network, which is very good at processing language, and we plug one into the other. They speak to each other—they can take an image and describe it in a sentence.
- You know how you never really believed the view in MVC was ever really separate? Now this is MVC. WatchKit apps run on the iPhone as an extension, only the UI component runs on the watch. XWindows would be so proud.
- Shopify shows how they Build an Internal Cloud with Docker and CoreOS: Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.
- Machine learning isn't just about creating humavoire AIs. It's a technology, like electricity, that will transform everything it affixes with its cyclops gaze. Here's a seemingly mundane example from Google, as discussed on the Green (Low Carbon) Data Center Blog. Google has turned inward, applying Machine Learning to its data center fleet. The result: Google achieved from 8% to 25% reduction in its energy used to cool the data center with an average of 15%. Who wouldn’t be excited to save an average of 15% on their cooling energy costs by providing new settings to run the mechanical plant? < And this is how the world will keep those productivity increases reaching skyward.
- Does anyone say "I love my water service"? Or "I love my garbage service"? Then why would anyone say "I love Facebook"? That's when you've arrived. When you are so much a part of the way things are that people don't even think of loving them or not. They just are. The Fall of Facebook.
- How Nike thinks about app development: Lots of micro services: Nike's plan: Build a series of services that do little things like checkout and reading data and then bring them together into larger apps that'll be easier to tweak in the future.
- Some slides and videos from React San Francisco 2014 are now available.
- Cross platform development sucks, hard, but here's something interesting. Google Inbox shares 70% of its code across Android, iOS, and the Web using J2ObjC, which converts Android Java code to iOS-ready Objective-C code. How Google Inbox shares 70% of its code across Android, iOS, and the Web.
- Dominic Umbeer with a nice set of notes on GOTO Berlin 2014 Day 1 and Day 2.
- If I said a post on Hacker News had 892+ comments, what would be the topic? Docker? iOS vs Android? Nope. How about .NET. Microsoft takes .NET open source and cross-platform. Some very good comments, but is it too late? .NET/C#/Visual Studio is an excellent platform, so maybe not. But at least now there's a chance.
- Networking is still the bottleneck and Intel wants to pop the top off. Intel's Omni-Path architecture will come to market next year, offering 100 Gb/sec links on switches that are denser and zippier than InfiniBand gear.
- This. trhway: their schematics [Fabric, the next-generation Facebook data center network] of datacenter reminds about schematics of a big server 15 years ago. Server racks instead of CPU-boards. "The datacenter is the computer."
- Marc Gravell: one thing that I've learned over and over again is: at the application level, sure: do what you want - you're not going to upset the runtime / etc - but at the library level (in these areas, and others) you often need to do quite a bit of work to minimize CPU and memory (allocation / collection) overhead.
- How a Memory Is Made: On the other hand, he said, these experiments are “limited, because in the real world, real memory is not about single strong memories.” Rather, said Silva, we remember events as “strings” of individual sensory memories.
- Surprising performance boost by pinning single threaded application to a core? No.
- Autoscaling, welcome to Google Compute Engine: Autoscaler can respond to a number of different metrics such as CPU load, QPS on a HTTP Load Balancer and metrics...Autoscaler performs well even in unexpected scenarios such as sudden traffic spikes...an application could scale from zero to handling over 1.5 million requests per second using Autoscaler.
- How beautiful...Memex #001 Final. Is there a science fiction book where Vannevar Bush was able to realize his vision? That would be interesting.
- A lost art, but this stuff really makes a difference. Coding for Performance: Data alignment and structures: This article collects the general knowledge and Best-Known-Methods (BKMs) for aligning of data within structures in order to achieve optimal performance.
- If you like the JVM tool chain, but not the Java language, then consider Clojure at Scale: Why Python Just Wasn’t Enough for AppsFlyer: 2 Billion events per day...We started to encounter issues like one of the critical Python processes taking too long to digest the incoming messages...We’ve been toying around with the idea of introducing Functional Programming into the company for some time...the entire system is based on micro-services...Clojure provides its own approach to concurrency and it might take some time to adjust to it...This is a huge advantage: coding is more focused on the logic itself, rather than the plumbing around locks...We experienced a significant performance boost when we moved AppsFlyer to Clojure. In addition, using functional programming allows us to have a really small code base with only a few hundred lines of code for each service. The effects of working in Clojure dramatically speed up the development time and allow us to create a new service in days.
- When might you want to replace TCP? When you discover Netflix is chewing up 9.5% of upstream traffic on the North American Internet with ACKs. Since connections are usually asymmetric, meaning upstream connections often suck, relatively speaking, ACK drops on the upstream can cause throttling and degradation on the downstream. Replace with what? A UDP based protocol that doesn't use ACKs.
- Here's how Pinterest built Pinterest News. The problem: rank millions of events a day and use that to construct a feed for each individual Pinner. The process: decide if creating the service on multiple platforms for 10s of millions of users has the required ROI; decide it doesn't need to be real-time; build out an infrastructure that could scale to 10 percent of users rather than 100 percent; initially build out the feature on iOS. They used two internal services on the backend: Zen and PinLater. They built a queuing system on top of Redis.
- ScaleScale shows there's a lot more to DNS than a simple lookup. A lot more. Global Routing with Anycasted DNS: It’s common for us to do something like: pick a region (e.g. US-WEST or US-EAST), then within the region, send 95% of traffic to colo and 5% to AWS, and make that sticky so most of the time the same 5% of users go to AWS to ensure good cache locality, and if colo infrastructure gets overloaded, start shifting weight so more traffic goes to AWS with auto-scaling enabled, and on and on. There’s a lot going on in our stack. On the delivery side, we’re touching stuff from the hardware/NIC level (crazy packet filtering), doing deep traffic engineering in BGP, leveraging low level kernel features to get as precise as routing DNS queries to specific cores to maximize cache locality, and hitting a totally custom written nameserver that executes complex routing algorithms for every single request. At a higher level, what we’ve built is a big globally distributed real-time system, and we’ve tried to use the right tools for the right jobs.
- Here's LinkedIn Operating Apache Samza at Scale: Apache Samza is a framework that gives a developer the tools they need to process messages at an incredibly high rate of speed while still maintaining fault tolerance. It allows us to perform some interesting near-line calculations on our data without major architectural changes. Samza has the ability to retrieve messages from Kafka, process them in some way, then output the results back to Kafka.
- Murat with more of his patent pending paper summaries. Paper Summary: Calvin, Distributed transactions for database systems. Paper Summary: Granola, Low overhead distributed transaction coordination.
- Adventures in Encodings: Ideally, we could have small ziplists linked together to allow for quick single-ziplist operations while still not limiting the number of elements we can store in an efficient, pointer-free data structure.
- The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox: the data management community has focused heavily on the design of systems to support training complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the deployment and serving of models at scale.
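The sticky 95/5 split from the Anycasted DNS item above is easy to picture in code. What follows is only a minimal sketch, assuming a hash-the-user-into-100-buckets scheme; the `bucket` and `route` functions, the `aws_percent` knob, and the `user-N` identifiers are all made up for illustration and say nothing about how the real nameserver is written.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) by hashing the id (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, aws_percent: int = 5) -> str:
    """Send the lowest `aws_percent` buckets to AWS and everyone else to colo.

    The bucket depends only on the user id, so the same users land on AWS
    every time (sticky), and raising aws_percent only ever adds users to AWS.
    """
    return "aws" if bucket(user_id) < aws_percent else "colo"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]  # hypothetical user ids
    for pct in (5, 20):  # 20 stands in for shifting weight when colo is overloaded
        share = sum(route(u, pct) == "aws" for u in users) / len(users)
        print(f"aws_percent={pct}: {share:.1%} of users routed to AWS")
```

One property of this kind of scheme: because the cutoff only moves upward when weight shifts to AWS, every user who was already on AWS at 5% stays there at 20%, which is the sort of behavior you want if cache locality is the goal.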
Reader Comments (2)
I'm not sure that James Hamilton quote is that impressive, that "Every day, AWS adds enough new server capacity to support all of Amazon's global infrastructure when it was a $7B annual revenue enterprise."
Looks to me that Amazon made $7B/year in revenue back in 2004. I'm not sure of their exact server count then, but it probably was on the order of a few hundred servers, maybe a couple thousand cores. That would suggest that adding 100k or so machines a year to AWS would be sufficient to make that statement true?
That jibes roughly with estimates of how many machines are in AWS too, so it's probably about right. But, in any case, my point is that the stat seems impressive on first glance, but really is comparing growth of AWS now with server counts in Amazon's retail business way back a decade ago, which is an odd thing to compare and not clear what it means. I suppose James' point is that they add enough servers daily to AWS to support a 2004-sized Amazon.com business? So, 365 of those sized businesses per year?
Certainly there's a "gee, isn't this big" aspect to it without actually understanding what big means. But operationally it's impressive. Presumably it took some time for Amazon to build up its infrastructure to that $7B/year mark. Now the process is such that it can be done not just in a day, but every day. That implies a lot about efficiency, organization, and core competencies.
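The back-of-envelope estimate in the first comment can be checked in a few lines. This is only a sketch of the commenter's arithmetic: the per-day server counts below are assumed stand-ins for "a few hundred servers", not figures from Amazon.

```python
DAYS_PER_YEAR = 365


def servers_added_per_year(servers_per_day: int) -> int:
    """Yearly additions if one 2004-sized fleet's worth of servers is added every day."""
    return servers_per_day * DAYS_PER_YEAR


if __name__ == "__main__":
    # Assumed guesses spanning "a few hundred servers" per 2004-sized fleet.
    for per_day in (200, 300, 500):
        print(f"{per_day} servers/day -> {servers_added_per_year(per_day):,} servers/year")
```

With a guess of 300 servers per day the total is 109,500 a year, which lines up with the "100k or so machines a year" in the comment.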