Stuff The Internet Says On Scalability For November 30th, 2018
Friday, November 30, 2018 at 8:59AM
Wake up! It's HighScalability time:
We all know the oliphant in the room this week (re:Invent)
Do you like this sort of Stuff? Please support me on Patreon. I'd really appreciate it. Know anyone looking for a simple book explaining the cloud? Then please recommend my well reviewed (30 reviews on Amazon and 72 on Goodreads!) book: Explain the Cloud Like I'm 10. They'll love it and you'll be their hero forever.
- 8: successful Mars landings; $250,000: proposed price for Facebook Graph API; 33: countries where mobile internet is faster than WiFi; 1000s: Facebook cache poisoning; 8.2 million: US Nintendo Switch sales; 40+%: Rust users feel productive; 15 terabytes: monthly measurements of third-party web transparency tracking data; $133.20: total music sales by Imogen Heap on blockchain; 8.3 million: concurrent Fortnite players; 6.2 billion: fuel costs saved by smart car drivers; 80: salad bags assembled per minute by smart machines, 2x the output of a worker; 1/10th: power used by ebike compared to Nissan Leaf; 100,000: new micro industries; 40MW: solar plant floats on water; 20%: car crashes reduced using Waze-fed AI platform; 14%: decline in Sillycon Valley median wage over 20 years; 36.7%: smartphone e-commerce sales; many: Google Earth datasets; 6 zettatonnes: earth's mass.
- Quotable Quotes:
- What is Amazon's Quantum Ledger Database (QLDB) good for? Colm MacCárthaigh, Senior Principal Engineer at AWS, says in a tweet: "As a QLDB user for a few years now ... it is awesome, and game-changing. In EC2, we are big big fans of it ... and thrilled that we're opening it up to everyone!" How enigmatic. What's he talking about? He won't say. Directly. But—Dan Brown-like—he points to this talk as solving the mystery: AWS re:Invent 2018: Close Loops & Opening Minds: How to Take Control of Systems, Big & Small (ARC337). It's an awesome talk. Well worth watching. Lots of deep stuff not covered elsewhere.
- The talk is about creating simple and stable control systems. So presumably QLDB—a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority—is infrastructure for building control systems as large as the one used to control S3.
- What's a control plane? CloudFront—a distributed HTTP cache—is given as an example. The cache is the data plane. The control plane launches new sites, adds servers, directs user requests, measures latency, and so on. So control planes: manage the life cycle of resources; provision software; provision service configuration; provision user configuration.
- Building high quality systems is about habit. Teams paying attention to detail. Building test cases. Worrying about what could go wrong.
- High quality designs come from diverse creative minds working in a fearless environment; systematic reviews and mechanisms to share lessons; using well-worn patterns where possible and focusing invention where it is truly needed; testing, testing, testing.
- Every design requires making tradeoffs. Order of priority: security, durability, availability, speed.
- Control planes are a lot of effort. You need to approach building them with the right level of investment and effort.
- What makes a stable control system? Some kind of measurement process to tell what state it's in. An algorithm that issues corrections to move to the desired state. An actuator to perform the actions. It's done in a closed loop—constantly measuring, constantly deciding, constantly actuating. Systems like this are tremendously sensitive to lag, so they should be fast. Every stable control system contains a PID (proportional–integral–derivative) controller somewhere, which is a mathematical component. If you understand PID you can quickly spot what's wrong with a control system. He recommends the book Designing Distributed Control Systems.
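- To make the closed-loop idea concrete, here's a minimal, generic PID loop in Python. It's not from the talk or from AWS; the gains, the measure/actuate hooks, and the interval are all made-up placeholders:

```python
import time

class PIDController:
    """Textbook proportional-integral-derivative controller (illustrative only)."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def correction(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def control_loop(measure, actuate, controller, interval=1.0):
    """Closed loop: constantly measure, decide, actuate. Lag matters, so keep the interval small."""
    while True:
        observed = measure()                                     # measurement process
        adjustment = controller.correction(observed, interval)   # decision
        actuate(adjustment)                                      # actuator
        time.sleep(interval)
```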
- Pattern 1: Checksum all the things. An S3 outage was caused by a single bit corrupted by one box with a bad network card. Because it was an open loop system the single bit error propagated everywhere. MD5 checksums are used everywhere now. Also, don't push configuration state around in YAML. A truncated YAML file can still be valid YAML, just the wrong file.
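- A tiny sketch of the checksum-everything idea, assuming a JSON envelope and MD5 as mentioned in the talk; the publish/consume names are invented for illustration, not an AWS API:

```python
import hashlib
import json

def publish(config: dict) -> bytes:
    """Serialize config with an embedded MD5 digest so corruption or truncation is detected."""
    body = json.dumps(config, sort_keys=True).encode()
    digest = hashlib.md5(body).hexdigest()
    return json.dumps({"md5": digest, "body": body.decode()}).encode()

def consume(blob: bytes) -> dict:
    """Refuse to apply any config whose digest does not match: a truncated or bit-flipped file fails here."""
    envelope = json.loads(blob)
    body = envelope["body"].encode()
    if hashlib.md5(body).hexdigest() != envelope["md5"]:
        raise ValueError("config checksum mismatch: refusing to apply")
    return json.loads(body)
```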
- Pattern 2: Cryptographic authentication everywhere. Make sure nothing malicious is inserted. Use HMAC signatures. Everything talking to everything else should have a strong sense of identity and authenticity. Basing a system on credentials, where you can revoke credentials and rotate credentials, is great for operational health. It means you can create a system humans shouldn't have access to. You know machines are being managed by the system and not from someone's desk. You can also make sure a preproduction control plane can't talk to a production data plane.
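- A minimal sketch of HMAC-signed messages using Python's standard library; the environment variable holding the shared credential is hypothetical, and real systems would layer key distribution and rotation on top:

```python
import hashlib
import hmac
import os

SECRET = os.environ["CONTROL_PLANE_KEY"].encode()  # hypothetical per-service credential

def sign(message: bytes) -> bytes:
    """Attach an HMAC-SHA256 tag so the receiver can verify who produced the message."""
    return hmac.new(SECRET, message, hashlib.sha256).digest()

def verify(message: bytes, signature: bytes) -> bool:
    """Constant-time comparison; reject anything not signed by a holder of the credential."""
    expected = hmac.new(SECRET, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```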
- Pattern 3: Limit blast radius. Assume everything can fail and try to limit the damage it can do. Divide control planes horizontally: regions don't know about each other. Compartmentalize in a vertical direction. User input from APIs is guarded by bulkheads. One example is a poison taster: put input through as much of the processing the data plane will do before accepting a request.
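- A rough sketch of what a poison taster could look like, assuming the parse, validate, and dry-run hooks are the same functions the data plane itself uses; all names here are hypothetical:

```python
class RejectedInput(Exception):
    """Raised at the API bulkhead when input fails poison tasting."""

def accept_request(raw: bytes, parse, validate, dry_run_apply):
    """Exercise the data plane's own parse/validate/apply path against a throwaway
    copy before the request is accepted into the system."""
    try:
        config = parse(raw)            # the exact parser the data plane uses
        validate(config)               # the exact validation the data plane uses
        dry_run_apply(config)          # apply to a scratch copy, no side effects
    except Exception as exc:
        raise RejectedInput(f"input failed poison tasting: {exc}") from exc
    return config
```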
- Pattern 4: Asynchronous coupling. Separate systems by using async calls. Put work on a queue and drive it asynchronously through a state machine. That makes the system much more deterministic. Workflows and queues can retry and won't cause as much amplification as big trees of sync calls. Cross-region replication is all async, so one region won't impact another.
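- A toy sketch of async coupling, with an in-process queue standing in for a real queueing service; the process() handler and the state names are made up for illustration:

```python
import queue
import threading
import time

work_queue = queue.Queue()   # stand-in for SQS, Kafka, or a workflow service

def process(request):
    """Hypothetical handler; a real one would provision resources, update state, etc."""
    time.sleep(0.1)

def enqueue(request):
    """API handler: accept the request, put the work item on the queue, return immediately."""
    work_queue.put({"state": "PENDING", "request": request})

def worker():
    """Drive each item through a simple state machine; retries stay on the queue
    instead of amplifying through big trees of synchronous calls."""
    while True:
        item = work_queue.get()
        try:
            item["state"] = "IN_PROGRESS"
            process(item["request"])
            item["state"] = "DONE"
        except Exception:
            item["state"] = "PENDING"
            work_queue.put(item)      # retry later
        finally:
            work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```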
- Pattern 5: Closed feedback loops. If you're not measuring your system you have no hope of achieving stability.
- Pattern 6: Push or pull? It's the wrong question. The better way to think about it is to consider the relative size of the fleets. You don't want big fleets connecting to small fleets. The big fleet will hammer the small fleet and it will never recover. Choose the direction that will prevent thundering herds.
- Pattern 7: Avoid cold starts and cold caches. Another pattern to prevent thundering herds. A control plane is small relative to S3, for example. Caches are bi-modal: fast when they have entries and slow when they are empty. Try not to have caches. Avoid them. If you need them, self-warm them before accepting requests. You can also serve stale cache entries (DNS resolvers should do this, for example).
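- One possible shape for a self-warming, stale-tolerant cache; the fetch callback and TTL are placeholders rather than anything from the talk:

```python
import time

class StaleTolerantCache:
    """Cache that pre-warms at startup and keeps serving expired entries
    when the origin is unavailable, avoiding the empty-cache failure mode."""
    def __init__(self, fetch, ttl=30.0):
        self.fetch, self.ttl = fetch, ttl
        self.entries = {}   # key -> (value, fetched_at)

    def warm(self, keys):
        """Self-warm before accepting requests."""
        for key in keys:
            self.entries[key] = (self.fetch(key), time.time())

    def get(self, key):
        value, fetched_at = self.entries.get(key, (None, 0.0))
        if time.time() - fetched_at < self.ttl:
            return value
        try:
            value = self.fetch(key)
            self.entries[key] = (value, time.time())
            return value
        except Exception:
            if key in self.entries:
                return self.entries[key][0]   # serve stale rather than fail
            raise
```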
- Pattern 8: Throttles. You need to be able to throttle incoming requests, at least until the system recovers. Throttling is also a form of smart prioritization. Load balancers are deliberately throttled during recovery.
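- A common way to implement a throttle is a token bucket; this is a generic sketch, not AWS's implementation, and the rate and capacity numbers are arbitrary:

```python
import time

class TokenBucket:
    """Simple token-bucket throttle: admit at most `rate` requests per second,
    with bursts up to `capacity`, and shed the rest while the system recovers."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Example: allow roughly 100 requests/second with bursts of 200, drop the rest.
throttle = TokenBucket(rate=100.0, capacity=200.0)
```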
- Pattern 9: Deltas. It's more efficient to compute deltas and distribute patches rather than push all the original data around. The best way to get deltas is a versioned data scheme. Add a version column to your key-value pairs, and make them append-only and immutable so that every version of a value is recorded. Keep a versioned data structure in your data plane; it makes snapshots and rollbacks easy to implement. I'm assuming this is where QLDB comes in.
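- A minimal sketch of an append-only, versioned key-value log that makes deltas and snapshots cheap; this is purely illustrative, not how QLDB works internally:

```python
class VersionedStore:
    """Append-only, versioned key/value store: every write records a new version,
    which makes deltas, snapshots, and rollbacks straightforward."""
    def __init__(self):
        self.log = []          # append-only list of (version, key, value)
        self.version = 0

    def put(self, key, value):
        self.version += 1
        self.log.append((self.version, key, value))
        return self.version

    def snapshot(self, at_version=None):
        """Materialize the latest value per key as of a given version (or the latest)."""
        view = {}
        for version, key, value in self.log:
            if at_version is None or version <= at_version:
                view[key] = value
        return view

    def delta(self, since_version):
        """Only the changes after `since_version`, i.e. the patch you'd distribute."""
        return [entry for entry in self.log if entry[0] > since_version]
```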
- Pattern 10: Modality and constant-work. You want to minimize the number of modes of operation and the number of code branches. Every time you execute a combination of code that has never occurred before you are at risk. The more paths and the more state you have, the more untestable the system becomes.
- Relational databases are terrible for control planes because they have a lot of modes. Their query optimizers have emergent behavior, so you can't predict what will happen: performance can change radically in an instant even though you didn't do anything, and that mode shift can be catastrophic. NoSQL databases are much easier to reason about. Performance characteristics, what the code is doing, how joins work, how merges occur: all of it is obvious to the programmer, which makes it easier to build stable systems.
- Do things the dumb way every time. Always do full table scans. Always do full joins.
- For Route 53 network health checks they don't use deltas; they do something really dumb. A user calls an API that edits a configuration file on S3. There are no queues or workflows pushing data to the data plane. Instead, the data plane is in a loop, fetching that file from S3 every 10 seconds whether it has changed or not. It's very reliable and robust. Very resilient. It won't build up backlog. It won't build up queues. It will just work. The system doesn't care how much changes. If every file changes, that's OK, because they are doing the same thing every time. This is how Route 53 health checks work. Distributed fleets all over the world check your service in real-time. That aggregates into a bitmap with two bits per health check status, indicating whether it's healthy or stale. Those bits are pushed to the DNS no matter what. If the DNS sees a one in the bitmap, it serves that IP. Every health check in a zone could fail and the system wouldn't get bogged down. This is a constant-work operation, and it's a surprisingly cheap way of doing things: a hundred nodes fetching a config every 10 seconds costs $1200 a year, which is nothing compared to developer time. Few systems work like this.
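- The constant-work loop might look something like this sketch. The URL and apply_config() are placeholders, and this is not the actual Route 53 code; the point is that the loop does the same full fetch-and-apply every cycle, no matter what changed:

```python
import json
import time
import urllib.request

CONFIG_URL = "https://example-bucket.s3.amazonaws.com/healthcheck-config.json"  # placeholder

def apply_config(config):
    """Hypothetical: apply the entire configuration every cycle, changed or not."""
    print(f"applied {len(config)} entries")

def constant_work_loop():
    """Do the same dumb thing every cycle: fetch the whole config file and apply all of it.
    No queues, no deltas, no backlog, no modes."""
    while True:
        with urllib.request.urlopen(CONFIG_URL) as response:
            config = json.load(response)
        apply_config(config)
        time.sleep(10)
```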
- When you're using API Gateway and AWS Lambda you're using a system that has all these ideas baked in.
- How do you institutionalize these learnings? Weekly tech talks. Design review process. Operational readiness review assessment. Codify processes and make them the default in the stack.
- Who was the first hacker? Medea. Ancient Dreams of Technology, Gods & Robots. Her method was figuring out how systems worked so she could hack them. Sound familiar? Over two thousand years ago Greeks were imagining all sorts of non-biological automata—things given life through craft. Hephaestus—the god of blacksmithing, forging, invention, and technology—was a combination of Elon Musk and Steve Wozniak for lazy Olympians. He made lots of cool stuff, like automatic gates for Mount Olympus and a fleet of automatic butlers to deliver ambrosia at parties. The very first robot to walk the earth was Talos, a giant bronze automaton invented by who else? Hephaestus. Talos patrolled the island of Crete, protecting it from sea invaders. It identified which ships were friend or foe, showering boulders on the less righteous. Talos could also kill enemies by heating its body red hot and hugging them to its chest. Talos was powered by ichor that ran through a single artery, and the entire system was sealed by a bronze bolt on its ankle. Medea, sensing Talos would like to live forever, promised Talos she could make him immortal if only she could remove the bolt. A very human-like Talos agreed. Using a wrench, Jason, as in Jason and the Argonauts, removed the bolt. Ichor poured out like molten metal, and Talos tumbled over as a tear fell from his cheek. Medea also reasoned out how to thwart an unstoppable skeleton army that grew from planted dragon's teeth. The skeletons had one drive: go forward and attack. Medea figured out how to trigger their programming so they would destroy themselves. She advised Jason to throw rocks at the army. The blows on their shields triggered their attack programming and they hacked each other to death. Magic or technology, it doesn't matter, the issues are all the same.
- How Cheap Labor Drives China’s A.I. Ambitions. Here's a sentence that could only make sense this year: "Hou Xiameng runs a data factory out of her in-laws’ former cement factory in the Hebei city of Nangongshi." A data factory?: Inside, Hou Xiameng runs a company that helps artificial intelligence make sense of the world. Two dozen young people go through photos and videos, labeling just about everything they see. That’s a car. That’s a traffic light. That’s bread, that’s milk, that’s chocolate. That’s what it looks like when a person walks...the ability to tag that data may be China’s true A.I. strength, the only one that the United States may not be able to match. In China, this new industry offers a glimpse of a future that the government has long promised: an economy built on technology rather than manufacturing...We’re the assembly lines 10 years ago...now employs 300 workers but plans to expand to 1,000...
- Amazon's new Graviton processor is not meant to be a speed demon. The A1 instance type will do best on highly scalable, loosely coupled workloads. They targeted price/performance. James Hamilton explains in AWS Designed Processor: Graviton: "The AWS Graviton Processor powering the Amazon EC2 A1 Instances targets scale-out workloads such as web servers, caching fleets, and development workloads. These new instances feature up to 45% lower costs and will join the 170 different instance types supported by AWS, ranging from the Intel-based z1d instances which deliver a sustained all core frequency of 4.0 GHz, a 12 TB memory instance, the F1 instance family with up to 8 Field Programmable Gate Arrays, P3 instances with NVIDIA Tesla V100 GPUs, and the new M5a and R5a instances with AMD EPYC Processors. No other cloud offering even comes close." Why not RISC-V? What Arm brings to the table is massive volume with over 90 billion cores delivered. It is a commercially licensed rather than open sourced design but because they are amortizing their design costs over a very broad usage base, the cost per core is remarkably low. Still not free but surprisingly good value. Because Arms are used in such massive volume, there is a well developed software ecosystem, the development tools are quite good, and the Arm licensing model allows Arm cores to be part of special purpose ASICs. Arm cores can be inexpensive enough to be used in very low cost IoT devices. They perform well enough they can be used in specialized, often very expensive embedded devices. They are excellent power/performers and are used in just about every mobile device. And, they deliver an excellent price/performing and power/performing solution for server-side processing. It’s an amazingly broadly used architecture that continues to evolve quickly
- Azure will soon have on-prem competition. AWS Outposts bring native AWS services, infrastructure, and operating models to virtually any data center, co-location space, or on-premises facility. You can use the same APIs, the same tools, the same hardware, and the same functionality across on-premises and the cloud to deliver a truly consistent hybrid experience. Outposts can be used to support workloads that need to remain on-premises due to low latency or local data processing needs. Workloads like this: Peter Sbarski: Google’s Bret McGowen, at Serverlessconf San Francisco ’18, gave an example of a real-life customer out on an oil rig in the middle of an ocean with poor Internet connectivity. The customer needed to perform computation with terabytes of data but uploading it to a cloud platform over a connection equivalent to a 3G modem wasn’t feasible. “They cannot use cloud and it’s totally unfair to say — sorry buddy, hosted functions-as-a-service or bust — their developers deserve to have the same serverless experience as the rest of us” was McGowen’s explanation why, in this case, running Knative locally on the oil rig made sense.
- Your One-Stop Shop For Everything React Boston 2018: GraphQL was a big player; ReasonML was the subject of an enthusiastic presentation; React is everywhere.
- 5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code: every time you go to work on a new piece of infrastructure, go through a checklist (it's a long list); using infrastructure as code is an investment: there’s an up-front cost to get going, but if you invest wisely, you’ll earn big dividends over the long-term; build your code out of small, standalone, reusable, composable modules; if your infrastructure code does not have automated tests, it’s broken; follow a specific process for building and managing infrastructure. Good discussion on HN.
- CockroachDB 2.1 is now 50x more scalable than Amazon Aurora. awoods187: I'm the author. We've introduced transactional write pipelining (covered in a forthcoming blog post), load-aware rebalancing, and completed general performance tuning which all contribute to our improved performance numbers. manigandham: CRDB is a great product with some of the easiest operations (although key management is a nightmare that they do not have a good plan for). It's fast enough for point-lookups and makes it easy to distribute and replicate your data across zones and regions. All nodes are part of a single cluster so read and write latencies will be high for global deployment, with the enterprise version having a workaround for local regional reads using pinned covering indexes. That works, but further lowers write performance. It also has trouble with large transactions and the middle ground between OLTP and OLAP with heavy joins. Good choice if you need easy scalability and SQL interface over performance and complex queries.
- At one point in time you make a build decision rather than a buy decision. Time marches on and things change. Hardware improves (cheap SSDs improve random IO, better NICs push more bandwidth). Competing systems improve (Kafka supports replication). Do you make the effort to reevaluate your decision? Twitter did. Twitter’s Kafka adoption story. It was a decision based on several months of evaluating Kafka under workloads similar to those run on EventBus. Kafka had significantly lower latency, regardless of the amount of throughput. Kafka requires fewer network hops, uses zero-copy, and fsyncs in the background. Kafka requires fewer machines, resulting in 68% resource savings, and for fanout cases with multiple consumers we saw a 75% resource savings. In practice fanout has not been extreme enough to merit separating the serving layer, especially given the bandwidth available on modern hardware.
- How many years ago was it that building your own datacenter was just how things were done? The hype cycle has gone to that place where it's now OK to make fun of a behaviour that was once SOP. Like smoking. AWS Chief Executive Andy Jassy now says building one’s own data centers is “kind of wacky."
- Emerging Memories Today: Understanding Bit Selectors: Every bit cell in a memory chip requires a selector. This device routes the bit cell’s contents onto a bus that eventually makes its way to the chip’s pins, allowing it to be read or written. The bit cell’s technology determines the type of selector that is appropriate: SRAMs use two transistors, DRAMs use one transistor, and flash memories combine a transistor with the bit cell so that the transistor both stores the bit and selects it. Emerging memory technologies use simpler selectors than are required for today’s leading memory technologies. They can get by with either two-terminal selectors or three-terminal selectors. The circuits for both types of cells, with three-terminal and two-terminal selectors, are shown in the graphic to the left. You can see that there’s not much difference between the two. In both cases the selector controls the current through the cell either by turning it off with a transistor, or by turning it off when the current reverses with a diode (or something similar)...All of this has been presented to explain that the selector has a significant influence on the amount of area a memory array consumes, and that the array’s cost is proportional to its area. For this reason memories that can use a two-terminal selector are much more likely to compete against established memories than are memories that must use a three-terminal selector.
- An excellent and thorough description. Stack Overflow: How We Do Monitoring - 2018 Edition.
- It's free! Pattern Recognition and Machine Learning: This leading textbook provides a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed.
- Cloudflare explains how they can drop 8 million packets per second using only 10% CPU. The key is using eBPF: The softirq CPU usage (blue) shows the CPU usage under which XDP / L4drop runs, but also includes other network related processing. It increased by slightly over a factor of 2, while the number of incoming packets per second increased by over a factor of 40!
- @RogerGrosse: Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread]. Measuring the Effects of Data Parallelism on Neural Network Training: In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured in the number of steps necessary to reach a goal out-of-sample error. Eventually, increasing the batch size will no longer reduce the number of training steps required, but the exact relationship between the batch size and how many training steps are necessary is of critical importance to practitioners, researchers, and hardware designers alike.
- No time to read AI research? We summarized top 2018 papers for you. They covered 10 papers. Titles include: Universal Language Model Fine-tuning for Text Classification, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, World Models.