Saturday, December 19, 2020

Stuff The Internet Says On Scalability For December 19th, 2020

Hey, it's HighScalability time once again!

 

NASA

 

Do you like this sort of Stuff? Without your support on Patreon this Stuff won't happen. 

 

Know someone who could benefit from becoming one with the cloud? I wrote Explain the Cloud Like I'm 10 just for them. On Amazon it has 212 mostly 5 star reviews. Here's a load-balanced and fault-tolerant review:

Number Stuff:

  • 1 billion: TikTok users, reached in half the time it took YouTube, Instagram, and Facebook. 
  • 3x: how much faster Ruby 3.0 is than Ruby 2.0. In practice the Rails speedup is closer to double than triple, since much of Rails' time isn't spent running Ruby.
  • 150 million: peak requests per second served by Pinterest's cache fleet.
  • 2.8x: how much more likely young adults are to become depressed within six months when they use social media for more than 300 minutes a day.
  • $72,000: Google cloud bill because of...wait for it...infinite recursion in a web scraper. That's 116 billion Firebase reads and 16,000 hours of Cloud Run Compute time. Firebase handled about one billion reads per minute. But should you really silently upgrade a free plan to a paid plan?
  • 312: days an AI can keep a Loon balloon hovering over an area. I'd like these as platforms for fire watchers.
  • 110%: increase in daily online US grocery sales in April of 2020. DoorDash is dominant, Uber Eats is gaining.
  • 75%: of databases will be cloud-hosted in 2022 according to Gartner.
  • 52 million: daily Reddit users in October, up 44% from the same month a year earlier.
  • 1200+: asteroids bigger than a meter that have collided with the earth since 1988. Only five were detected before impact, never with more than a day's notice.
  • $2.25: Spotify's payment for a song that was streamed 125,000 times.
  • $50/kg: projected cost to send 1 kg to low Earth orbit using Starship. $5000/kg with a Saturn V rocket. Falcon 9 delivers to LEO for $2800/kg. Falcon Heavy for $1400/kg. 
  • $5.1 billion: record Thanksgiving online shopping. Up 21.5% on 2019. 47% of sales via mobile. 
  • 1.2 trillion: transistors in the Cerebras CS-1 chip, which performed 200 times faster than a supercomputer when simulating combustion within a power plant.
  • 91%: increase in US broadband speeds from 2019.
  • 28%: of 2100 companies surveyed by Sumo Logic used Redis, making it the most-used database on AWS.
  • Hundreds of Billions: TinyML devices within a few years.
  • 0.42% to 0.58%: increase in African GDP within the first two to three years of subsea cables going live in 2023–2024. 
  • 27 picograms: weight of a red blood cell.

Quotable Stuff:

  • deanCommie: Being right isn't enough. You need to be able to convince people of the fact that you're right.
  • Kurt Vonnegut: One thing I don't like about computers in the home is it cheats people out of the experience of becoming.
  • Joe Morrison: I no longer believe venture-backed companies can responsibly pursue a strategy of giving away the software at the core of their value proposition. I no longer think it’s a feasible model for companies with ambitions of becoming very large or those actively avoiding consulting work. Eventually, if they’re successful, they will be forced to choose between betraying their loyal early adopters and dying a long, slow death by rubber chicken bludgeoning...Cloud killed open core.
  • Discover: His [Erik Hoel] new idea is that the purpose of dreams is to help the brain to make generalizations based on specific experiences. And they do this in a similar way to machine learning experts preventing overfitting in artificial neural networks.
  • Leo Kelion: The two new laws involved - the Digital Services Act and the Digital Markets Act - have yet to be passed, so would only come into force after the Brexit transition period has ended...Competition Commissioner Margrethe Vestager described the two laws as "milestones in our journey to make Europe fit for the digital age... we need to make rules that put order into chaos".
  • Periscope:  The truth is that the Periscope app is in an unsustainable maintenance-mode state, and has been for a while. Over the past couple of years, we’ve seen declining usage and know that the cost to support the app will only continue to go up over time. Leaving it in its current state isn’t doing right by the current and former Periscope community or by Twitter.
  • Stilgherrian: A study by Massachusetts Institute of Technology (MIT) researchers labelled assertions that Internet- and blockchain-based voting would boost election security "misleading," adding that they would "greatly increase the risk of undetectable, nation-scale election failures." The MIT team analyzed previous research on the security risks of online and offline voting systems, and found blockchain solutions are vulnerable to scenarios where election results might have been erroneously or deliberately changed. The MIT researchers proposed five minimal election security mandates: ballot secrecy to deter intimidation or vote-buying; software independence to verify results with something like a paper trail; voter-verifiable ballots, where voters themselves witness that their vote has been correctly recorded; contestability, where someone who spots an error can persuade others that the error is real; and an auditing process.
  • tarruda: A few years ago I released a bug in production that prevented users from logging into our desktop app. It affected about ~1k users before we found out and rolled back the release. I still remember a very cold feeling in my belly, barely could sleep that night. It is difficult to imagine what the people responsible for this are feeling right now. 
  • Forrest Brazeal: Their fancy computer science degrees taught them to balance binary search trees and negotiate the Border Gateway Protocol but not how to say “I write programs that run on someone else's computers.”
  • Cass Warner Sperling: Director Frank Capra recalled, “It was a shock to the audience to see Jolson open his mouth and hear words come out of it. It was one of those once-in-a-lifetime experiences to see it happen on screen. A vision … a voice that came out of a shadow.”
  • Pat Helland: I want to emphasize that I see this as a great thing.  While, in my dreams, it would be great to have a dedicated freeway lane for my commute home from the office, it’s not really practical.  Instead, we share freeway lanes in spite of the inherent risk of delay in an open queuing network that allows lots of incoming traffic without any constraints.  Not many of us can build a freeway dedicated to our personal use, hence we share.  Cloud computing is sharing.
  • Luke Rissacher: It may sound obvious, but - optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster. Optimizing to 1/100 the time (reducing requests from say 1.5 sec to 15ms) is like adding 99 servers. That’s a 1U server doing the work of two 42U server racks, formerly busy turning inefficient code into heat.
  • hinkley: The biggest thing we're going to regret looking back on the early Cloud era is the foolish notion that you need 100x as many servers to do 100x as much work. Servers don't scale linearly. It's more likely you'll need, at a minimum, 107-110x as many servers. Between 100 + log(100) and 100 + sqrt(100). So making your code 100x faster saves you more than 100 servers.
  • James S. A. Corey: Wars never ended because one side was defeated. They ended because the enemies were reconciled. Anything else was just a postponement of the next round of violence. That was her strategy now. The synthesis of her arguments with Bobbie. The answer she wished they’d found together, when they were both alive.
  • nickjj: This topic just came up recently on a podcast I was on where someone said a large service was down for X amount of time and the service being down tanked his entire business while it was down for days. But he was compensated in hosting credits for the exact amount of down time for the 1 service that caused the issue. It took so long to resolve because it took support a while to figure out it was their service, not his site. So then I jokingly responded with that being like going to a restaurant, getting massive food poisoning, almost dying, ending up with a $150,000 hospital bill and then the restaurant emails you with "Dear valued customer, we're sorry for the inconvenience and have decided to award you a $50 gift card for any of our restaurants, thanks!". If your SLA agreement is only for precisely calculated credits, that's not really going to help in the grand scheme of things.
  • @t3rmin4t0r: SQL is very hard to restrict from a DoS perspective, unless you disable joins and declare everything a view + only allow project, filter, aggregate over a set of views. Something like "select * from customers, orders;" needs to be prevented.
  • @glenc: tl;dr things scale linearly until they don’t
  • @TysonTrautmann: First, the post misses the nuance of hard versus soft dependencies. Hard dependency failures cause outages but you can engineer around failures in soft dependencies. AWS teams may get this wrong on occasion, but they have the processes in place to learn from mistakes. 2/x
  • @jeffbarr: Woodside Energy ran a million-vCPU workload spanning 3 AWS Regions and got results in 2 hours vs. industry standard of weeks (150x faster).
  • @JoePWilliams31: Exclusive: Capital One has abandoned its last data center, making the $40 billion financial titan the first US bank (and one of the few major publicly-traded companies) to go all-in on the cloud
  • Google: Google Cloud Platform and Google Workspace experienced a global outage affecting all services which require Google account authentication for a duration of 50 minutes. The root cause was an issue in our automated quota management system which reduced capacity for Google's central identity management system, causing it to return errors globally. As a result, we couldn’t verify that user requests were authenticated and served errors to our users.
  • @danoot: computer person is just a larval phase, in 15-30 years they hatch into blacksmiths, cheesemakers, bookbinders or ultramarathoners, or something
  • @cloud_opinion: Zoom has "picked" AWS as its preferred cloud provider. This is how you play the cloud game people. First tell AWS, we are going with Oracle - get a sweet deal, AWS asks for PR, get another 30% discount.  Do the PR. All the while running your workloads on AWS 
  • @DonMacAskill: Seven more large-scale @Flickr services moved to @awscloud @Arm Graviton2 yesterday. The vast majority of non-GPU compute now runs on Arm, with only a few straggler services left. We'll get them, too. ~40% savings every time we flip the switch. Awesome.
  • @muratdemirbas: Is it time to rekindle the threads versus events debate? 1995: Ousterhout argued why threads are a bad idea https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf 2003: Brewer retorted with why events are a bad idea http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf
  • Sarah Rose Cavanagh: We’re synchronous beings, and the contents of our minds spread from one of us to another easily and effortlessly, whether in person or online. Fear and love and hate are infectious, and they spread over new media.
  • wink: While the result is true I don’t like the intro or the premise. If that’s a one-off calculation 30 mins is perfectly fine, the original code is very readable and straightforward. I’m pretty sure anyone who hasn’t solved this specific problem would take longer than 30min to develop the fast, more complicated solution. And at some point (running it 5-10x) it’s worth it, absolutely. 
  • DeepMind: We trained this system on publicly available data consisting of ~170,000 protein structures from the protein data bank together with large databases containing protein sequences of unknown structure. It uses approximately 16 TPUv3s (which is 128 TPUv3 cores or roughly equivalent to ~100-200 GPUs) run over a few weeks, a relatively modest amount of compute in the context of most large state-of-the-art models used in machine learning today. 
  • ipsocannibal: This took way too many words to say that as AWS's internal service dependency graph gets deeper and more complex, the higher the likelihood that any one service failure cascades into a systemic failure.
  • Twirrim: An anecdote I've shared in greater detail here before: Years ago we had a service component that needed to be created from scratch and had to be done right. There was no margin for error. If we'd made a mistake, it would have been disastrous to the service. Given what it was, two engineers learned TLA+, wrote a formal proof, found bugs, and iterated until they got it fixed. Producing the Java code from that TLA+ model proved to be fairly trivial because it almost became a fill-in-the-blanks. Once it got to production, it just worked. It cut what was expected to be a 6 month creation and careful rollout process down to just 4 months, even including time to run things in shadow mode worldwide for a while with very careful monitoring. That component never went wrong, and the operational work for it was just occasional tuning of parameters that had already been identified as needing to be tune-able during design review.
  • @bengoerz: Instead of blaming AWS for “smart” vacuums and doorbells not working during the outage, we should be blaming the R&D engineers who thought it was a good idea to design products that become bricks when the remote server is unreachable.
  • Kurt Vonnegut: It works. I’m grateful for things that work. Not many things do work, you know
  • Tim Bray: And like the Devourers, they’re destroying the ecosystem that they’re farming — eventually all the quality storytelling on the Internet will retreat behind the paywalls of the very few operations that can manage the pivot to subscriptions. Which, among other things leads to a future where The Truth Is Paywalled But The Lies Are Free, not exactly what our society needs right now.
  • @thomascollett: 2.5 years ago people were telling me that serverless is crazy. Those same people are now saying that a multi-account architecture is crazy. It’s the cost of being early to the party.
  • @ShortJared: I now spend a time trying to figure out how I can reduce operation burden, and a lot of time that means figuring out how I can not even need Lambda functions. If I can glue two services together with some declarative mapping I can skip the inherent risk of my own code!
  • DSHR: Duplicating systems is never a good approach to fault tolerance; they must be triplicated. In the 70s BA used Tridents on the Edinburgh to London shuttle. Their autoland systems were triplicated, and certified for zero-visibility landing. I experienced my first go-round when, on my way from Edinburgh to Miami for a conference, the approach to LHR in heavy cloud was interrupted by the engines spooling up and an abrupt climb. The captain calmly announced that one of the autopilots disagreed with the other two and, as a precaution, we were going around for another try. On the second approach there was no disagreement. We eventually landed in fog so thick I couldn't see the wingtips. Only the Tridents were landing, nothing was taking off. My Miami flight was delayed and after about 10 hours I was re-routed via LGA.
  • Benedict Evans: In the USA, where lockdowns have been pretty patchy, ecommerce was ~17% of addressable retail at the beginning of the year. It spiked up to 22.5% in Q2 and in Q3, reported last week, it dipped back to close to 20%. In the UK, where we have monthly data that makes the picture clearer, penetration was already 20%, spiked to over 30% and now seems to be stabilising in the high 20s. 
  • DSHR: This graph shows that the result of the hard disk vendors' efforts is that, despite the technological advances by the flash industry (see below), the vast majority of bytes shipped continues to be in the form of hard disk.
  • @asymco: In Japan there are more than 33,000 businesses at least 100 years old. Over 3,100 have been running for at least 200 years. Around 140 have existed for more than 500 years. And at least 19 claim to have been continuously operating for a millennium. How old is your business?
  • Geoff Huston: For short responses UDP is an efficient and reliable transport vehicle. However, when the size of the UDP response is larger than the network path MTU and UDP fragmentation is required, then fragmentation packet losses create serious problems for the protocol, and it becomes unreliable. For that reason, TCP will be far more reliable than fragmented UDP for larger responses on average. However, TCP is slower and far less efficient than UDP and its basic reliability rate is worse than unfragmented UDP. If carriage efficiency and reliability is a consideration for the DNS, then unfragmented UDP is clearly superior to TCP, while TCP is clearly superior to fragmented UDP.
  • lyptt: One thing I've found about k8s is running a cluster can be very costly, and using a managed cluster on one of the big three clouds is even more so. Also, when rolling your own cluster it requires a lot of elbow grease to get things in a working state, and even more to keep things working. It feels awesome from a developer standpoint once you work out all the kinks. It's very natural describing your service with a bit of yaml, and then being able to then spawn it into your infrastructure and have it automatically scale out.
  • usul: I was a software engineer in America's largest company for healthcare payment transactions and let me tell you, as stated in the article, we will justify our existence and do anything in order to continue existing so that the cash flow can continue. We automate the writing of templates that say, "Sorry Mr/Mrs. Smith, your insurance claim includes x-y-z specific services which make this claim invalid, so you must experience financial turmoil as a result." Sometimes the eligibility period would close with a hard deadline, despite healthcare delays. I also wrote code to reject or approve financial transactions for claims.
  • zomgwat: As a one-man team operating a 10+ year old SaaS product, I’ve done two things that have helped keep things sustainable. The frontend continues to be server generated. PJAX style partial page updates is (mostly) dynamic enough. The maintenance burden of operating a JavaScript frontend is too high to justify for a 1-2 person team. The second thing I’ve done is avoid containers. VMs work fine for many types of applications. That’s especially true if provisioning and configuration is mostly automated. A product like Google Cloud Run looks interesting for a few low-volume backend services that I operate. Containers are a good fit there because it simplifies operations rather than increases complexity. The core infrastructure will remain VMs for the foreseeable future.
  • CraftThatBlock: SaaS business single founder here, moved from a mix of GCP's Cloud Run, Cloud SQL, and Compute Engine to DO's managed Postgres and Kubernetes, and it has had the following effects: - Much cheaper, reducing bill from >100$/month to ~40$. Important for early stage startups - More performance, easier scaling. Found that my application was much better suited to run in K8S, but this is definitely specific to my use-case - Consolidation of resources. Except Postgres/Redis, everything is in my cluster, simplifying management and maintenance
  • davnicwil: If I were to attempt a one liner for the key takeaway from this, it'd be "use frameworks as much as possible, use managed services as much as possible".
  • @mattbeane: This @WSJ is one of many pieces showing that our massive shift to #remotework is making it harder to learn on the job via normal, legitimate pathways. My prior work on #shadowlearning shows we should expect more people to bend rules to learn anyway.
  • Cass Warner Sperling: Griffith’s dream did not bring sound to the screen and he quickly gave the idea up, saying it was “professional suicide.” After all, only 5 percent of the world spoke English, and he would be depriving his films of 95 percent of their potential audience. A journalist of the early ’20s wrote: “Silent films made us all one people all around the world with one language.” Theater moguls agreed: They would stick with the silents.
  • @swardley: If you didn't notice that quiet bombshell that 50% of AWS new services are built on serverless (Lambda etc) then pay more attention next time. Can't be long until conversational programming kicks in. Give it a few more years. Meet Alexa, your new developer and operations manager.
  • lmilcin: Yes, it is possible to do really low latency in Java. My experience is my optimized Java code is about 20-30% slower than very optimized C code, which is absolutely awesome. We are talking real research project funded by one of larger brokerage houses and the application was expected to respond to market signals within single digit microseconds every single time. The issue with Java isn't that you can't write low latency code. The issue is that you are left almost completely without tools, as you can't use almost anything from the Java standard library, and you are left scratching your head how to solve even the simplest problem, like managing memory, without something coming along and suddenly creating a lot of latency where you don't want it to happen. Can't use interfaces, can't use collections, can't use exceptions, etc. You can forget about freely allocating objects; except for very short lived objects you must relegate yourself to managing preallocated pools of objects.
  • Geoff Huston: What is evident here is that we are seeing loss-based and rate-based control systems in competition for network resources. Rate-based control paradigms are fundamentally different to the loss-based systems, and the concepts of fairness probably need to be re-examined. There are some very interesting research questions in this work, and questions about the future of congestion control algorithms that are widely used on the Internet in terms of evolutionary pressures. 
  • divulgingwords: At my current job, I was asked to do hackerrank. I asked them if they want someone who can architect apps and features that solve real world problems and can talk to non tech people and clients just as well as developers or someone who spends all their time learning how to solve interview algorithm brain teasers that don’t mean anything to the business. This has always been my go-to. I’ve got enough experience in development to not waste my time doing that crap.
  • Ed Sperling: even the most die-hard proponents of Moore’s Law recognize that the benefits of planar device scaling have been dwindling since 28nm. Intel, AMD and Marvell have shifted their focus to chiplets and high-speed die-to-die interconnects, and all of the major foundries and OSATs have embraced multi-die, multi-dimensional packaging as the path forward. Unlike in the past, when 30% to 50% PPA improvements were possible at each new node, scaling beyond 7nm provides a 10% to 20% improvement in performance and power — and even those numbers are becoming more expensive to achieve. The price tag can be as high as a couple hundred million dollars for incremental engineering time, IP, and EDA tools at 5nm or 3nm. To make those chips also requires billions of dollars for new process technology
  • easton_s: My tiny team has been using Cloud Functions, Pub/Sub, and Apache Beam(Dataflow) to process tens of millions of daily events. I'm middle of the road on them. Pro: - They are stupid fast to develop, deploy, and update. - Firebase tooling emulators makes local dev much easier. - Simple enough a developer can build, deploy and monitor; Cost less than hiring another person to handle devops. - Now has decent IAM restrictions - Baked in access to almost all other GCP tools. Con: - Like any google product could be deprecated at any time. - They have poor versioning compared to App Engine or Cloud Run. No rollback, A/B, or multi-version deployments - Cold starts are slow. - Like any google product could be deprecated at any time.
  • CockroachDB: For these reasons, CockroachDB uses a divide the space approach, leveraging the S2 library. The totally ordered cell-IDs are easily representable in CockroachDB’s lexicographic total order. This choice comes with some additional advantages: (a) bulk ingestion becomes simple, and (b) compactions in our log-structured merge tree (LSM tree) approach to organizing storage can proceed in a streaming manner across the input files, which minimizes memory consumption. In contrast, BKD trees, which are a divide the objects approach, also permit a log-structured storage organization, but, to the best of our understanding, compactions need to load all the input files to redivide the objects.
  • manoj_mm: If you're a mobile engineer at a small startup, with a small-to-medium size codebase - trust me, despite your frustrations with Google/Apple, your development speed is probably better than a mobile engineer working at a large tech company. Edit: Engineering is all about tradeoffs, not just for code but in terms of people, product growth, business, cost, etc. I see a lot of misinformation being spread, mostly by people who've probably never done mobile engineering at a large tech company with a million+ user app. Most of Uber's problems are a consequence of very fast growth & trying to do a lot in a short time. Uber runs buses running at fixed schedules across fixed bus stops, and the app lets you book these & ride. Probably none of you reading this might have even known about this; but this is just one out of the dozens of experiences that a user on Uber may have (that you have never even heard of). And all this growth happened within 3-4 years. The tradeoffs made were definitely not perfect, but imo, they were decent enough considering the circumstances. 
  • _jal: This is IBM signaling that the good-ole' days are in the past. If you want to run RH stuff, they will get their pound of flesh. If you're not paying, you are not something they're concerned with. When the corporate tone projected externally changes, that usually signals more than just a product change.
  • Timnit Gebru: This is not peer review. This is not reviewer #2 telling you, “Hey, there’s this missing citation.” This is a group of people, who we don’t know, who are high up because they’ve been at Google for a long time, who, for some unknown reason and process that I’ve never seen ever, were given the power to shut down this research.
  • ryandvm: Now you're in your 40s and the real obligations of life have started piling up. You've gone from 7 hours of free time a day to 1, and you sure as hell aren't going to spend it learning about the latest iteration of Angular/React/Vue/Svelte/whatever the fuck is next. So you start losing your technical edge. Meanwhile at work you have some dipshit in sprint planning that wants to have the 57th discussion about how story points do not represent time. Oh, by the way, it's December, so we need you to watch that 45 minute security training video again so you don't accept USB flash drives from strangers in the parking lot. Now it's 4 PM and you didn't get anything done today and you start thinking about how pointless all this shit is because all we do is make an app that helps people schedule dog massages...
  • Dishy McFlatface: The speed of light is faster in vacuum than in fiber, so the space lasers have exciting potential for low latency links. They will also allow us to serve users where the satellites can't see a terrestrial gateway antenna - for example, over the ocean and in regions badly connected by fiber. We did have an exciting flight test earlier this year with prototype space lasers on two Starlink satellites that managed to transmit gigabytes of data. But bringing down the cost of the space lasers and producing a lot of them fast is a really hard problem that the team is still working on.
  • DishyMcFlatface: We challenge ourselves every day to push Starlink to the fundamental limitations of physics. Current Starlink satellites operate at 550 km, where light travel time is 1.8 milliseconds to Earth. The roundtrip from your house to a gaming server and back is at best 4 times 1.8 milliseconds at these altitudes, or under 8 milliseconds.
  • @ben11kehoe: Multi-region failover for #serverless arch built in the wake of this week’s outage is going to be like the fire extinguisher in your kitchen: in theory it will save you, but when you eventually need it, it will fail because you never checked (never had to check!) it was working
  • David Gerard: Coinbase doesn’t go into detail in the announcement as to what’s so hard about implementing physical settlement — but when the CFTC was considering the rule, Coinbase’s submission to the CFTC said that their internal systems use a ledger in an ordinary database … because the Bitcoin blockchain couldn’t possibly scale to their transaction load.
  • @ben11kehoe: Wednesday, there was not much to do. AWS IoT was hit hard by the Kinesis outage, which meant lots of stuff was simply not going to work. And CloudWatch outage meant we couldn’t see what was and wasn’t working cloud-side.
  • Texas Attorney General Ken Paxton: Google repeatedly used its monopolistic power to control pricing, engage in market collusions to rig auctions in a tremendous violation of justice. If the free market were a baseball game, Google positioned itself as the pitcher, the batter and the umpire.
  • Anne Trafton: The researchers saw little to no response to code in the language regions of the brain. Instead, they found that the coding task mainly activated the so-called multiple demand network. This network, whose activity is spread throughout the frontal and parietal lobes of the brain, is typically recruited for tasks that require holding many pieces of information in mind at once, and is responsible for our ability to perform a wide variety of mental tasks. “It does pretty much anything that’s cognitively challenging, that makes you think hard,” Ivanova says.
  • Alexander Osipovich: Because light travels nearly 50% faster through air than glass, it takes about one-third less time to send data through hollow-core fiber than through the same length of standard fiber.
  • Philip Monk: Technology shouldn't force us into a single global community. Discoverability isn't an unmitigated good. Most communities should be closed off from each other — that's what makes them a community. This allows them to develop their own ideas and norms. These bleed into other communities and may spread if they provide some benefit.
  • Tyler Rogoway: The major obstacle to having the F-22 and F-35 “talk” to one another is the different digital “languages” and waveforms that their stealthy datalinks use. While the F-22 is equipped with the Intra-Flight Data Link (IFDL), the F-35 employs the Multifunctional Advanced Data Link (MADL). Traditionally, if pilots of these different aircraft needed to share data with each other inflight, or transmit it to a command and control center, they would have to utilize legacy tactical connections, such as Link 16, which are non-stealthy. Link 16 broadcasts omnidirectional and they are more easily detectable by enemy forces, giving up the advantage of stealth. 
  • DSHR: The result is that the M1's fast cores are effectively processing instructions twice as fast as Intel's and AMD's at the same clock frequency. And their efficiency cores are processing about as many using much less power. Using much less power for the same workload is one of the main reasons ARM dominates the mobile market, where battery life is crucial. That brings us to the second interesting recent RISC development. ARM isn't the only RISC architecture, it is just by a long way the most successful. Among the others with multiple practical implementations, RISC-V is I believe unique; it is the only fully open-source RISC architecture.
  • snej: And after the feel-good C parser (“Hey dude, I’m not sure that void* really points to a Foo, but I’m not going to harsh your mellow, go for it”), the borrow checker would feel like some sadistic English headmaster. “Class! Tompkins, that blithering moron in your midst, has yet again attempted to reuse an already-moved value. Fetch my cricket bat and lower your trousers, boy.”
  • William Blake: I find more & more that my style of designing is a species by itself, and in this which I send you have been compelled by my Genius or Angel to follow where he led; if I were to act otherwise it would not fulfill the purpose for which alone I live, which is … to renew the lost art of the Greeks. I attempted every morning for a fortnight together to follow your dictate, but when I found my attempts were in vain, resolved to show an independence which I know will please an author better than slavishly following the track of another, however admirable that track may be. At any rate, my excuse must be: I could not do otherwise; it was out of my power! I know I begged of you to give me your ideas and promised to build on them; here I counted without my host. I now find my mistake.

Useful Stuff:

  • It's amazing how much complexity one person can wrangle. The Tech Stack of a One-Man SaaS
    • treat infrastructure as cattle instead of pets
    • choose boring technology
    • Moved from DigitalOcean to Linode because of stability issues. Then moved to AWS due to a good deal. 
    • Migrations were relatively easy because the infrastructure was described via Terraform and Kubernetes manifests. 
    • Python/Typescript
    • Django/React/NextJS/Celery/Bootstrap 4
    • Clickhouse/PostgreSQL/Redis/RDS
    • Terraform/Docker/K8s/GitHub Actions
    • AWS/Cloudflare/Let's Encrypt/Namecheap
    • Prometheus/Grafana/Sentry/Loki
    • Fastmail/Sendgrid
    • GitHub/PyCharm/VS Code/Poetry/Yarn/Invoked

  • Should you adopt a new unproven language for a complete rewrite? It's easy to say no, but what would you have done? Who doesn't want to run away from Objective-C as fast as possible? 
    • @StanTwinB: Alright folks, gather round and let me tell you the story of (almost) the biggest engineering disaster I’ve ever had the misfortune [at Uber] of being involved in. It’s a tale of politics, architecture and the sunk cost fallacy [I’m drinking an Aberlour Cask Strength Single Malt Scotch]
    • An approach we called “Let builders build” meant that the app architecture was complicated and fragile. Uber at the time was extremely heavy on client side logic so the app would break a lot. We were constantly doing hot fixes, burning releases, etc. The design was also scaling badly.
    • But once Swift started to scale past ten engineers the wheels started coming off. The Swift compiler is still much slower than Objective-C's, but back then it was practically unusable. Build times went through the roof. Typeahead/debugging stopped working entirely.
    • The app looks simple from the user perspective. But on the global scale the business rules are insanely complicated. Every region has custom rules/regulations. Different products have different workflows. Some regions have cash payments. Every airport has different pickup rules...
    • He told me that if this project fails he might as well pack his bags. The same was true for his boss all the way up to the VP
    • So said brilliant engineer in Amsterdam built an annealing algorithm into the release build to reorder the optimization passes in such a way as to minimize size. This shaved a whopping 11 MB off the total machine code size and bought us enough runway to keep development going.
    • Why is the Uber app so big? It's complicated. Here's an explanation. Part of the problem is the app store makes you download one app and it can't be dynamically configurable. Ideally you could compile these different scenarios into separate apps that would do only what they need to do. It should be possible these days to compile a custom app for each individual user.
    • uberiosthrow: As someone who was there, a rewrite was necessary, and with hindsight we should have done it with Objective-C. The original Obj-C app was built with under 10 iOS engineers in mind, and now Uber had 100s of mobile engineers working on one app in some form or another.
      We didn't do a simple build scalability test until we were well into the project. If we had, it would have revealed Swift's build problems to us. Our Swift experience is what slowed down our Kotlin migration significantly. Today Uber Android is still a majority Java app.
    • throwaway_googler: Google isn't like this...Just kidding. Google is totally promotion driven. The promotion cycle and release cycle are linked. It doesn't matter too much because most people just want the promo in order to get better pay and for that you should just find a new job.

  • Videos from the Open Source Cubesat Workshop 2020 are now available. You might like “How to build a distributed ground station network alone.”

  • Even AWS has a bad day every once in a while. Summary of the Amazon Kinesis Event in the Northern Virginia (US-EAST-1) Region. My service, Best Sellers Rank, is hosted out of us-east-1, so I experienced this event as a personal tragedy. I can only hope one day to overcome the trauma :-)
    • The proximate cause: new capacity had caused all of the servers in the fleet to exceed the maximum number of threads allowed by an operating system configuration. As this limit was being exceeded, cache construction was failing to complete and front-end servers were ending up with useless shard-maps that left them unable to route requests to back-end clusters
    • The ripple of evil: this caused Kinesis failures. It turns out Kinesis is used by a number of backend services, so a failure cascade ensued.
    • New capacity coming on-line causes problems all the time. I'm a little surprised threshold alarms on each node didn't fire as hard or soft limits were being approached. That's a very basic protection mechanism that can cause backpressure and load shedding to occur (see the sketch at the end of this list). But hey, that's how things improve—one fault at a time. 
    • Forrest Brazeal in the The cold reality of the Kinesis Incident argues this was a systemic failure, not a random event, and that AWS must do better. I doubt AWS would disagree, but when your focus is on feature velocity, who wants to harden the old stuff? No promotions in that.
    • What to do about it is always the difficult part. As a customer, if you really care then you should use a multi-region design. AWS never said their stuff would never fail. They give you a way to deal with failures. But for most of us it's way too hard and expensive. AWS really should make this a lot easier, but keeping it simple and enduring the occasional failure is where the smart money is.
    • People are full of suggestions of what AWS should do. Yes, cascading failures are bad, but do you really want services to depend on half-baked internal services or on first class external services? It doesn't make sense for Cognito to create a completely parallel security mechanism that can be used when Kinesis fails, because it will likely fail too. And it did.
    • Cell-based architectures are the way to go here: talawahtech: Multi-AZ doesn't protect against a software/OS issue like this; Multi-AZ would be relevant if it was an infrastructure failure (e.g. underlying EC2 instances or networking). The relevant resiliency pattern in this case would be what they refer to as cell-based architecture, where within an AZ services are broken down into smaller independent cells to minimize the blast radius. Cellularization in combination with workload partitioning would have helped, e.g. don't run Cloudwatch, Cognito and Customer workloads on the same set of cells. It is also important to note that cellularization only helps in this case if they limit code deployment to a limited number of cells at a time. Another relevant point made in the video is that they restrict cells to a maximum size which then makes it easier to test behavior at that size. This would have also helped avoid this specific issue since the number of threads would have been tied to the number of instances in a cell.
    • otterley: Nearly all AWS services are regional in scope, and for many (if not most) services, they are scaled at a cellular level within a region. Accounts are assigned to specific cells within that region. There are very, very few services that are global in scope, and it is strongly discouraged to create cross-regional dependencies -- not just as applied to our customers, but to ourselves as well. IAM and Route 53 are notable exceptions, but they offer read replicas in every region and are eventually consistent: if the primary region has a failure, you might not be able to make changes to your configuration, but the other regions will operate on read-only replicas. This incident was regional in scope: us-east-1 was the only impacted region. As far as I know, no other region was impacted by this event. So customers operating in other regions were largely unaffected. (If you know otherwise, please correct me.) As a Solutions Architect, I regularly warn customers that running in multiple Availability Zones is not enough. Availability Zones protect you from many kinds of physical infrastructure failures, but not necessarily from regional service failures. So it is super important to run in multiple regions as well: not necessarily active-active, but at least in a standby mode (i.e. "pilot light") so that customers can shed traffic from the failing region and continue to run their workloads.
    • But for Cognito this is impossible: Corrado: However, Cognito is very region specific and there is currently no way to run in active-active or even in standby mode. The problem is user accounts; you can't sync them to another region and you can't back-up/restore them (with passwords). Until AWS comes up with some way to run Cognito in a cross-region fashion, we are pretty much stuck in a single region and vulnerable to this type of outage in the future.
    • Which brings up the idea of partial failure. Should Cognito continue to give service when part of its security layer is down? Certainly the Kinesis path could have been short circuited. But would that be the right thing to do, prioritizing availability over security? 
    • Twirrim: side note: One of the key values engineers get evaluated on before reaching "Principal Engineer" level is "respect what has gone before". No one sets out to build something crap. You likely weren't involved in the original decisions, you don't necessarily know what the thinking was behind various things. Respect that those before you built something as best as suited the known constraints at the time. The same applies forwards. Respect what is before you now, and be aware that in 3-5 years someone will be refactoring what you're about to create. The document you present to your team now will help the next engineers when they come to refactor later on down the line. Things like TLA+ models will be invaluable here too.
    • Google has some bad days too. Google Cloud Infrastructure Components Incident #20013 and Google Cloud Issue Summary. Migrations are tough. 
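    • Just to make the threshold alarm idea above concrete, here is a minimal sketch (definitely not AWS's code): watch the process's thread usage against an OS limit, alarm early, and shed new work before the hard limit is hit. The limit lookup assumes Linux, and notify is a hypothetical alarm hook.
```python
# A minimal sketch, not AWS's implementation: alarm as thread usage approaches
# an OS-level limit and shed new work before the hard limit is hit.
# Assumes Linux; `notify` is a hypothetical alarm/paging hook.
import resource
import threading

ALARM_RATIO = 0.6  # page someone well before the limit
SHED_RATIO = 0.8   # stop admitting new work at 80% of the limit

def thread_limit() -> int:
    # RLIMIT_NPROC caps processes per user; on Linux each thread counts toward it.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY:
        return 4096  # arbitrary fallback for the sketch
    return soft if soft != resource.RLIM_INFINITY else hard

def admit_new_work(notify) -> bool:
    limit = thread_limit()
    used = threading.active_count()  # threads started via threading in this process
    if used >= ALARM_RATIO * limit:
        notify(f"thread usage {used}/{limit} above {ALARM_RATIO:.0%}")
    # Backpressure: refuse new requests rather than blow past the limit and fail
    # halfway through building shared state such as a shard map.
    return used < SHED_RATIO * limit
```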

  • How about a new internet for xmas? SCION: the first clean-slate Internet architecture designed to provide route control, failure isolation, and explicit trust information for end-to-end communication. SCION organizes existing ASes into groups of independent routing planes, called isolation domains, which interconnect to provide global connectivity. Isolation domains provide natural isolation of routing failures and misconfigurations, give endpoints strong control for both inbound and outbound traffic, provide meaningful and enforceable trust, and enable scalable routing updates with high path freshness. As a result, the SCION architecture provides strong resilience and security properties as an intrinsic consequence of its design. Besides high security, SCION also provides a scalable routing infrastructure, and high efficiency for packet forwarding. As a path-based architecture, SCION end hosts learn about available network path segments, and combine them into end-to-end paths that are carried in packet headers. 
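    • Not SCION's wire format or API, just a toy sketch of the path-based idea: the end host splices the path segments it has learned into an end-to-end path that rides in the packet header, so the sender, not the network, picks the route. All names below are made up for illustration.
```python
# A toy illustration of the path-based idea, not SCION's real wire format:
# the end host combines learned path segments (up, core, down) into one
# end-to-end path and carries it in the packet header.
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Hop:
    isd: str    # isolation domain (independent routing plane)
    as_id: str  # autonomous system within that domain

@dataclass
class Packet:
    payload: bytes
    path: List[Hop] = field(default_factory=list)  # sender-chosen path in the header

def build_path(up: List[Hop], core: List[Hop], down: List[Hop]) -> List[Hop]:
    # Up-segment: my AS toward my domain's core. Core-segment: between domains.
    # Down-segment: destination domain's core down to the destination AS.
    return up + core + down

pkt = Packet(b"hello", build_path(
    up=[Hop("ISD-1", "AS-110"), Hop("ISD-1", "AS-130")],
    core=[Hop("ISD-1", "AS-130"), Hop("ISD-2", "AS-210")],
    down=[Hop("ISD-2", "AS-210"), Hop("ISD-2", "AS-225")],
))
```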

  • Designing an infrastructure to handle spike loads used to be a challenge. Not so much anymore. A technical deep dive into processing €5 million in donations in 2 hours using Cloudflare Workers:
    • We record a peak of 151 donations per second. By midnight, the platform has raised over 5 million euros, with the bulk of donations (over a million) in just one hour, from 11pm to midnight. Our Cloudflare workers received over 4 million requests.
    • Cloudflare Workers might be less well known than AWS Lambda. However, they would allow our code to run on Cloudflare’s global edge network rather than on regional instances. Cloudflare, unlike other cloud computing services, doesn’t use containers or virtual machines, but V8 isolates, the technology created by the Google Chrome team to run JavaScript in the browser. Practically for us this meant code would run closer to our end-user making the request, and invocations wouldn’t incur any cold start, which can run into the 100s of ms.
    • the results from the evening as measured by Cloudflare were even more impressive: 99% of the requests used less than 6.6 ms CPU time, and this stayed flat throughout the evening and the next day. Only in the 99.9th percentile did we see a little bump, which coincided with the donation peak at around 11.34pm. With AWS Lambda we were used to seeing functions running for 100s of ms, and seconds during the warm-up phase, so having functions consistently run for less than 10ms, without the inevitable penalty, was quite the sight!
    • Keep in mind every service you use must also scale to handle the load. It's not enough just to scale compute: 
      • Then activity goes off the charts, and we see the first warnings of Stripe’s API returning rate limiting errors coming from Sentry and the Stripe Dashboard, followed by users reporting the same on Twitter. Whilst the platform itself is responding quickly, a percentage of users are unable to get through due to rate limiting from Stripe. We get on the phone with the team over at RTÉ and the account manager at Stripe. All teams involved are incredibly quick to respond and less than half an hour later Stripe’s limits are raised 5x. In the meantime, we get a call from Stripe’s CTO, who reassured us we shouldn’t see any further issues.
    • At the core, we have a single Worker Site which does two things: it serves the static ReactJS front-end and responds to a back-end route. That back-end route is the core of handling payments, creating a payment intent against Stripe’s APIs using the user-submitted data and returning any feedback (a rough sketch of that flow follows at the end of this list).
    • We automated testing of the platform through Cypress.io, which simulates a full payment journey on the staging environment using Stripe’s test mode, and run this for every commit to our main branch using Github Actions
    • All warnings are logged in Sentry, and sent through to our Slack
    • Fortunately for us, and a testament to our extensive cross-browser testing using Browserstack ahead of time, none of the reported errors and warnings were deemed to be critical to the user journey.
    • Serverless is faster when running at the edge (via Cloudflare Workers, or AWS Lambda@Edge). In addition, Cloudflare Workers don’t suffer from a typical “cold start”, keeping request times flat throughout the event as traffic ramped up.
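    • The actual back-end route is a Cloudflare Worker written in JavaScript, so treat this as a rough Python sketch of the same flow using Stripe's Python SDK: validate the donation, create a PaymentIntent, return its client secret, and surface Stripe rate limiting (the one thing that did bite on the night). The key, amounts, and retry hint are placeholders.
```python
# Rough sketch of the back-end route's flow (the real one is a Cloudflare
# Worker in JavaScript): validate the submitted donation, create a Stripe
# PaymentIntent, return its client secret, and handle rate limiting.
import stripe

stripe.api_key = "sk_test_placeholder"  # placeholder key

def create_donation_intent(amount_cents: int, email: str) -> dict:
    if amount_cents < 100:  # basic server-side validation
        return {"error": "minimum donation is 1 euro"}
    try:
        intent = stripe.PaymentIntent.create(
            amount=amount_cents,
            currency="eur",
            receipt_email=email,
            payment_method_types=["card"],
        )
    except stripe.error.RateLimitError:
        # What the platform hit at peak until Stripe raised the limits 5x:
        # ask the client to retry instead of failing the whole donation journey.
        return {"error": "busy", "retry_after_ms": 2000}
    return {"client_secret": intent.client_secret}
```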

  • A Second Conversation with Werner Vogels:
    • When I joined Amazon in 1998, the company had a single US-based website selling only books and running a monolithic C application on five servers, a handful of Berkeley DBs for key/value data, and a relational database. That database was called "ACB" which stood for "Amazon.Com Books," a name that failed to reflect the range of our ambition.
    • one of the tenets up front was don't lock yourself into your architecture, because two or three orders of magnitude of scale and you will have to rethink it. Some of the things we did early on in thinking hard about what an evolvable architecture would be—something that we could build on in the future when we would be adding functionality to S3—were revolutionary. We had never done that before.
    • One of the biggest things that we learned early on is—and there's this quote that I use—"Everything fails, all the time." Really, everything fails, all the time, in unexpected ways, things that I never knew. Bit flips in memory, yes. You need to protect individual data structures with a CRC or checksum on it because you can't trust the data in it anymore. TCP (Transmission Control Protocol) is supposed to be reliable and not have any flips in bits, but it turns out that's not the case. (A tiny checksum sketch of this idea follows at the end of this list.)
    • We went down a path, one that Jeff [Bezos] described years before, as building tools instead of platforms. A platform was the old-style way that large software platform companies would use in serving their technology.
    • If you build everything and the kitchen sink as one big platform, you build with technology that is from five years before, because that's how long it takes to design and build and give everything to your customers. We wanted to move much faster and have a really quick feedback cycle with our customers that asks, "How would you develop for 2025?"
    • Now let's say, to make it simple, you have a 2,000-line algorithm. That's something you can evaluate with formal verification tools; with 50,000 lines, forget about it. Simple building blocks allow you to have a culture that focuses exactly on what you want to do, whether it's around auditing or whether it's around using TLA+ or durability reviews or whatever. Everything we change in S3 goes through a durability review, making sure that none of these algorithms actually does anything other than what we want them to do.
    •  I am never, ever going to combine account and identity at the same time again. This was something we did in the early days; we didn't really think that through with respect to how the system would evolve. It took us quite a while actually to rip out accounts. An account is something you bill to; identity is something you use in building your systems. These are two very different things, but we didn't separate them in the early days; we had one concept there. It was an obvious choice in the moment but the wrong choice.
    • One of the costs that we didn't anticipate was the number of requests, and request handling. We added this later to the payment model in S3, but it was clearly something we didn't anticipate. Services that came after S3 have been able to learn the lessons from S3 itself.
    • I remember during an earlier period in Amazon Retail, we had a whole year where we focused on performance, especially at the 99.9 percentile, and we had a whole year where we focused on removing single points of failure, but then we had a whole year where we focused on efficiency. Well, that last one failed completely, because it's not a customer-facing opportunity. Our engineers are very well attuned to removing single points of failure because it's good for our customers, or to performance, and understanding our customers. Becoming more efficient is bottom-line driven, and all of the engineers go, "Yes, but we could be doing all of these other things that would be good for our customers."
    • you need to be ready to talk about your operational results of the past week. An important part of that is there are senior people in the room, and there are junior folks who have just launched their first service. A lot of learning goes on in those two hours in that room that is probably the highest-value learning I've ever seen.
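    • A tiny illustration of the "protect data structures with a CRC" point above (not Amazon's code, just the idea): wrap a record with a checksum on write and verify it on read, so a silent bit flip is detected instead of trusted.
```python
# Tiny illustration, not Amazon's code: checksum a record on write and verify
# on read so a silent bit flip is detected rather than trusted.
import zlib

def wrap(record: bytes) -> bytes:
    return zlib.crc32(record).to_bytes(4, "big") + record

def unwrap(blob: bytes) -> bytes:
    stored, record = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(record) != stored:
        raise ValueError("checksum mismatch: data corrupted at rest or in transit")
    return record

blob = wrap(b"order:12345")
assert unwrap(blob) == b"order:12345"
corrupted = blob[:-1] + bytes([blob[-1] ^ 0x01])  # flip one bit
# unwrap(corrupted)  # raises ValueError instead of silently returning bad data
```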

  • Videos from EmacsConf 2020 are now available.

  • M1 Macs are the new Lisp Machines. If you built your own custom hardware wouldn't you add instructions to optimize expensive operations? That's what Apple did. 
    • Apple Silicon M1: Black. Magic: Retain and release are tiny actions that almost all software, on all Apple platforms, does all the time. … The Apple Silicon system architecture is designed to make these operations as fast as possible. It’s not so much that Intel’s x86 architecture is a bad fit for Apple’s software frameworks, as that Apple Silicon is designed to be a bespoke fit for it … retaining and releasing NSObjects is so common on MacOS (and iOS), that making it 5 times faster on Apple Silicon than on Intel has profound implications on everything from performance to battery life. 
    • Also, Microsoft Designing Its Own Chips for Servers, Surface PCs
    • ChuckMcM: Everything old is new again right? IBM, DEC, HP all built their own chips as part of their development. That got eaten alive by people like Sun and Apollo who started building workstations on commodity microprocessors, which got better and better so that even the "toy" computers (which is what the IBM PC started out as) became capable of eating their lunch, so they moved "into chips" with SPARC, PA-RISC, PowerPC which forced Intel to abortively try Itanium except that AMD kicked them in the nuts with AMD64. And that was where we lived until the computer architecture "for the masses" became the phone, with ARM chips and they started trickling down into the masses, and then Samsung and Apple started pushing advantages because they could customize their SoC chips and others couldn't, and all the while Intel kept adding specialized instruction sets to try to hold off ARM and AMD from their slipping hold on the Data center and what was left of the "laptop" business. 

  • A lesson people continually relearn. Introducing Glommio - a Thread-per-Core Crate for Rust & Linux
    • Research recently demonstrated that a thread-per-core architecture can improve tail latencies of applications by up to 71%
    • Thread-per-core programming eliminates threads from the picture altogether. Each core, or CPU, runs a single thread, and often (although not necessarily), each of these threads is pinned to a specific CPU. As the Operating System Scheduler cannot move these threads around, and there is never another thread in that same CPU, there are no context switches.
    • To take advantage of thread-per-core, developers should employ sharding: each of the threads in the thread-per-core application becomes responsible for a subset of the data (sketched after this list).
    • The biggest advantage of this model is that locks are never necessary.
    • github.com/DataDog/glommio
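    • Glommio itself is Rust, so purely to make the sharding idea concrete, here is a toy Python sketch: one worker process pinned to each core, each owning a private shard of the data, work routed by hashing the key, and no locks anywhere because no shard is ever shared. Pinning via os.sched_setaffinity assumes Linux.
```python
# Toy sketch of thread-per-core sharding (Glommio itself is Rust): one worker
# per core, pinned with sched_setaffinity (Linux only), each owning a private
# shard of the data, so no locks are ever taken.
import multiprocessing as mp
import os

NUM_CORES = os.cpu_count() or 1

def shard_for(key: str) -> int:
    return hash(key) % NUM_CORES  # every key is owned by exactly one core

def worker(core: int, inbox: mp.Queue) -> None:
    os.sched_setaffinity(0, {core})  # pin this worker to its core
    shard = {}                       # private to this worker: no locking needed
    while True:
        op, key, value = inbox.get()
        if op == "stop":
            break
        if op == "set":
            shard[key] = value

if __name__ == "__main__":
    queues = [mp.Queue() for _ in range(NUM_CORES)]
    workers = [mp.Process(target=worker, args=(c, q)) for c, q in enumerate(queues)]
    for w in workers:
        w.start()
    queues[shard_for("user:42")].put(("set", "user:42", "Ada"))  # route to owner
    for q in queues:
        q.put(("stop", None, None))
    for w in workers:
        w.join()
```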

  • Scaling Cache Infrastructure at Pinterest
    • Pinterest’s distributed cache fleet spans an EC2 instance footprint consisting of thousands of machines, caching hundreds of terabytes of data served at more than 150 million requests per second at peak. This cache layer optimizes top-level performance by driving down latency across the entire backend stack and provides significant cost efficiency by reducing the capacity required for expensive backends.
    • Every API request incoming to Pinterest internally fans out to a complex tree of RPCs through the stack, hitting tens of services before completion of its critical path. This can include services for querying core data like boards and Pins, recommendation systems for serving related Pins, and spam detection systems.
    • At Pinterest, the most common use of the distributed cache layer is storing such results of intermediate computations with lookaside semantics. This allows the cache layer to absorb a significant share of traffic that would otherwise be destined for compute-expensive or storage-expensive services and databases. With both single-digit millisecond tail latency and an extraordinarily low infrastructure dollar cost per request served, the distributed cache layer offers a performant and cost-efficient mechanism to scale a variety of backends to meet growing Pinterest demand. (A minimal lookaside sketch follows this list.)
    • Cache clients use a universal routing abstraction layer that ensures applications have a fault-tolerant and consistent view of data.
    • The server fleet can additionally be scaled out independently of the application layer to transparently adjust memory or throughput capacity to accommodate changes in resource usage profiles.
    • Memcached and mcrouter form the backbone of Pinterest’s distributed caching infrastructure and play a critical role in Pinterest’s storage infrastructure stack.
    • Memcached is highly efficient: a single r5.2xlarge EC2 instance is capable of sustaining in excess of 100K requests per second and tens of thousands of concurrent TCP connections without tangible client-side latency degradation,
    • At Pinterest, memcached’s extstore drives huge wins in storage efficiency for use cases ranging from Visual Search to personalized search recommendation engines. Extstore expands cached data capacity to a locally mounted NVMe flash disk in addition to DRAM, which increases available per-instance storage capacity from ~55 GB (r5.2xlarge) to nearly 1.7 TB (i3.2xlarge) for a fraction of the instance cost. In practice, extstore has benefitted data capacity-bound use cases without sacrificing end-to-end latency despite several orders of magnitude in difference between DRAM and SSD response times. 
    • We operate ~100 distinct memcached clusters, many of which have different tenancy characteristics (dedicated versus shared), hardware instance types, and routing policies. While this presents a sizable maintenance burden on the team, it also allows for effective performance and availability isolation per use case, while also providing opportunities for efficiency optimization by choosing parameters and instance types most appropriate for a particular workload’s usage profile.
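    • The lookaside semantics mentioned above are simple to sketch: check the cache, on a miss call the expensive backend, then write the result back so later requests hit. A minimal sketch, assuming an in-process map as a stand-in for a memcached client and a made-up key format; nothing here is Pinterest's actual code:

```rust
use std::collections::HashMap;

// Stand-in for a memcached client: get/set on a key-value store.
struct Cache {
    map: HashMap<String, String>,
}

impl Cache {
    fn new() -> Self { Cache { map: HashMap::new() } }
    fn get(&self, key: &str) -> Option<String> { self.map.get(key).cloned() }
    fn set(&mut self, key: &str, value: String) { self.map.insert(key.to_string(), value); }
}

// Stand-in for an expensive backend call (e.g. a recommendation service).
fn expensive_backend_call(user_id: u64) -> String {
    format!("recommendations-for-{}", user_id)
}

// Lookaside read: the application, not the cache, is responsible for
// populating entries on a miss.
fn get_recommendations(cache: &mut Cache, user_id: u64) -> String {
    let key = format!("recs:v1:{}", user_id);
    if let Some(hit) = cache.get(&key) {
        return hit; // cache absorbs the traffic
    }
    let value = expensive_backend_call(user_id);
    cache.set(&key, value.clone());
    value
}

fn main() {
    let mut cache = Cache::new();
    println!("{}", get_recommendations(&mut cache, 42)); // miss -> backend
    println!("{}", get_recommendations(&mut cache, 42)); // hit  -> cache
}
```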

  • Episode #76: Building Well-Architected Serverless using CDK Patterns. While you will learn a lot about the CDK in this podcast, the part I found most interesting, and will concentrate on, is how to diffuse a new technology into a large organization that already has an existing process.
    • Matt Coulter is a Technical Architect at Liberty IT and an AWS Community Builder. Liberty is a large company with 1000s of engineers and many teams. 
    • The CIO mandated that Liberty be a serverless-first company. Buy-in at the top is essential. That way everyone knows they eventually have to adopt the serverless religion. A diffusion gradient has been set up towards serverless. Now people only have to figure out how to go serverless. That's where Matt comes in.
    • Matt didn't begin by requiring engineers to do this or that. If you walk in and tell people "I know better than you," they'll just say no.
    • He started in a unique way: he created a public external site—CDK Patterns—to document a set of serverless architecture patterns. The idea was to be able to say this is an actual thing in the real world that's supported by industry. It lists the AWS Heroes who talk about the patterns, along with links to all the supporting articles. For example: Web Hooks, Cloud Formation, Alexa Skills. Community validation is a form of social proof. 
    • The advantage of patterns is they create a vocabulary around which people can have conversations. It gets everyone on the same page. 
    • The amount of AWS "stuff" out there is overwhelming. By providing a curated list of CDK patterns it helps developers to focus on what is most useful.
    • Internally, Liberty runs Well Architected design reviews. Design reviews serve as a way to have conversations with teams. When someone is trying to do X, you can say "have you considered this CDK pattern to implement it?" Because the solution already exists, and is coded, documented, and available, you've reduced the barrier for them to go in the direction you want them to go rather than forcing people into a direction against their will.
    • What does Well Architected mean? AWS has a Well Architected framework that talks about how best to design systems that work on AWS. It also has a Serverless Lens that talks about how best to design serverless solutions.
    • The Serverless Lens is used as a structure to conduct design reviews with teams. Having this objective third party framework for what constitutes a good design has helped drive adoption across the organization. 
    • CDK constructs can be used to encapsulate best practices so they can be reused across the organization without each team having to create their own. 
    • Liberty has a tailored version of every CDK pattern. They also have an internal tool to set up patterns quickly and easily. You want to make adoption as easy as possible. Reduce friction everywhere.
    • Tests can be built into CDK, which helps standardization. 
    • John Hagel: The commitment is to scale the edge as rapidly as possible and to drive transformation on the edge, rather than trying to get the core to transform. As the edge scales, it will pull more and more of the people and resources from the core out to the edge. In a world of accelerating change, edges can scale at a pace that would have been unimaginable a few decades ago. Before we know it, the edge has scaled to the point where it has become the new core of the institution, not just a diversification effort or growth initiative.

  • LOL! We Rewrote Everything in Rust, and our Startup Still Failed:
    • The team brainstormed ideas for what we would like to build, and set to work.
    • We rewrote our microservices from Node to Go to C++ and then back to Node.
    • We pushed back the launch for two months to tweak our build system to perfection.
    • Had we had more funding, we would have taken the time to rewrite our app in Haskell instead. 

  • Thread-Per-Core Buffer Management for a modern Kafka-API storage system
    • software does not run on category theory, it runs on superscalar CPUs with wide, multi-channel GB/s memory units and NVMe SSD access times in the order of 10-100’s of microseconds. The reason some software written a decade ago - on a different hardware platform - feels slow is because it fails to exploit the advances in modern hardware.
    • The new bottleneck in storage systems is the CPU. SSD devices are 100-1000x faster than spinning disks and are 10x cheaper today[1] than they were a decade ago, from $2,500 down to $200 per Terabyte. Networks have 100x higher throughput in public clouds from 1Gbps to 100Gbps.
    • This is all to say that the rise of readily available, many-core systems necessitates a different approach for building infrastructure. Case in point[9]: in order to take full advantage of 96 vCPUs on a i3en.metal on AWS, you’ll need to find a way to exploit sustained CPU clock speed of 3.1 GHz, 60 TB of total NVMe instance storage, 768 GiB of memory and NVMe devices capable of delivering up to 2 million random IOPS at 4 KB block sizes. This kind of beast necessitates a new kind of storage engine and threading model that leverages these hardware advances.
    • Redpanda - a Kafka-API compatible system for mission critical workloads[3] - addresses all of these issues. It uses a thread-per-core architecture with Structured Message Passing (SMP) to communicate between these pinned threads. Threading is a foundational decision for any application, whether you are using a thread-pool, pinned threads with a network of Single Producer Single Consumer SPSC[7] queues, or any other of the advanced Safe Memory Reclamation (SMR) techniques, threading is your ring-0, the true kernel of your application. It tells you what your sensitivity is for blocking - which for Redpanda is less than 500 microseconds - otherwise, Seastar’s[4] reactor will print a stack trace warning you of the blocking since it effectively injects latency on the network poller.
    • Redpanda uses a single pinned thread per core architecture to do everything. Network polling, submitting async IO to the kernel, reaping events, triggering timers, scheduling compute tasks, etc. Structurally, it means nothing can block for longer than 500 microseconds, or you’ll be introducing latency in other parts of your stack. 
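    • This isn't Redpanda's or Seastar's code, just a minimal std-only sketch of the shape being described: two threads playing the role of pinned cores, connected by a bounded queue (std's sync_channel standing in for a purpose-built SPSC queue), so cross-core work is handed off as messages with backpressure instead of shared, locked state. CPU pinning and the request fields are assumptions omitted or invented for the example.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// A request handed from the "network" core to the "storage" core.
struct WriteRequest {
    partition: u32,
    payload: Vec<u8>,
}

fn main() {
    // Bounded queue: if the storage core falls behind, the producer
    // blocks instead of growing an unbounded backlog.
    let (tx, rx) = sync_channel::<WriteRequest>(1024);

    // "Storage core": the only thread that touches the log for its partitions.
    let storage = thread::spawn(move || {
        let mut bytes_written = 0usize;
        for req in rx {
            // Append to the partition's log; exclusive ownership means no locks.
            bytes_written += req.payload.len();
            let _ = req.partition;
        }
        bytes_written
    });

    // "Network core": parses requests and forwards them as messages.
    for i in 0..10_000u32 {
        tx.send(WriteRequest { partition: i % 8, payload: vec![0u8; 128] })
            .expect("storage core hung up");
    }
    drop(tx); // close the channel so the storage loop ends

    println!("wrote {} bytes", storage.join().unwrap());
}
```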

  • Corey Quinn opines, or perhaps sublimes, in Day Two Cloud 078: Cloud Economics Are Ridiculous:
    • Feature velocity outweighs cost optimization. It's easier for businesses to make more money than it is for them to cut costs. For people it's the exact opposite.
    • Saving money for many companies is not the primary goal. What companies really care about is predicting the bill. What will the financial model look like for the company over an 18-36 month span? When this month's bill is 20% higher than last month's bill, is it an aberration or the new normal?
    • Your cloud bill is often less dependent on the number of customers you have and more predicated on the number of engineers you have.
    • The way larger customers pay for AWS is by signing multi-year contracts with committed spending amounts. They commit to using $X of cloud services. The irony is these contracts remove the flexibility of the cloud. Once you've committed to using EC2, for example, and Lambda comes out, you can't use Lambda because you're already paying for EC2. The win for the cloud was you pay for what you use, but now the cloud provider wants you to tell them what you'll use for the next three years. You need to do the analysis so you know what commitment to sign up for. 
    • 60% of spend globally on AWS is EC2. There's a very long tail.
    • Cloud bills are ridiculously complex and hard to understand. Nobody understands them at first. It's not intuitive. Everyone makes mistakes. Don't feel bad. Humans are bad at this stuff, yet oddly AI has not been deployed by AWS to optimize bills.
    • You used to be able to play vendors against each other, but when you have one cloud provider you can't play that game anymore.
    • When you start spending $500 million a year on cloud, you probably want a team to analyze your bill. When you're spending $2000 a month it's usually not enough for a company to worry about.
    • There are three roles: You need someone that has strategic visibility into where the business is and where it's going; someone who can figure out how all the accounting pieces work; someone deep on the engineering side. It's hard to find all those skills in one person, which is why you need to hire the Duckbill Group.
    • At some point, after you find all the big stuff, you'll spend more money trying to find ways to save money than you'll actually save. 
    • AWS wants you to spend money over the long term; they aren't interested in tricking you into spending money. 
    • AWS is complex, so there's always something new to learn when examining a bill. The Duckbill Group can spread all those learnings across their customer base. While one customer's bill may not warrant spending two weeks analyzing X, after it's done for a customer where it does matter, everyone benefits.
    • You have to answer the business question: when traffic spikes do you want it to scale, or would you rather it fail to scale and stop serving traffic? Both are valid business options. 
    • Auto-scaling often doesn't scale down, so infrastructure costs may stay even as traffic declines. There's also usually a constant traffic baseline. Usually it's not like traffic drops off at night and only exists during the day.
    • Understand why your bill is the way it is. Does it align with what your business is focusing on? 
    • Don't use your datacenter procurement procedures in the cloud. Don't incentivize people to do the wrong thing. If accounting turns off idle instances then developers won't let them go because they need them. Build guard-rails, but get out of their way; don't hover over their shoulder like they're going to embezzle money.
    • Focus on the economics of your business. Saving money on the cloud is not a strategic priority for most businesses. Almost no company has a comprehensive cloud economics system. It's not always a burning priority. The people who understand this stuff are expensive and hard to find. You probably want your engineers working on product, not cloud cost optimization. 
    • There is no nefarious plot to screw customers over. 
    • The cost of data egress is the Achilles heel of the cloud. Oracle Cloud has great egress bandwidth pricing and a large free tier. Zoom chose Oracle for this reason. 
    • In the cloud you can't pay for different levels of network performance. You pay for great network performance; you can't say you're willing to pay less for worse latency or less bandwidth. For storage and CPU you can pay for different tiers of service. 
    • Every cloud provider negotiates terms for everything. That's how it works. If you spend $100 million a year you can pretty much dictate terms in some cases around some things. 
    • There are some workloads that should be on-prem. Latency used to be a big issue, not so much today. Anything that needs to be close to manufacturing is a candidate. If you've already spent money building datacenters, don't feel bad about what you are doing. If your business is generating revenue in your datacenter, do you really need to move to the cloud?
    • The way you actually figure out what a service will really cost is to use it. No one knows how all this stuff works. 
    • If you organize around two pizza teams anything that requires teams to come together—like the console or the billing system—will be a disaster. Companies are vast and don't talk to each other. Anything that unifies everything is super hard. Companies ship their culture.
    • If you get a surprise bill contact your provider and they'll reverse it. Once.
    • Also, AWS Well Architected Framework in Serverless: Cost Optimization. Also also, AWS Lambda Power Tuning

  • Serverless Cassandra: AWS Keyspaces and Azure Cosmos DB in Comparison. These kinds of comparisons are always tricky to evaluate, but there's a lot of detail that can help in your decision making.

  • Episode 16: Kelsey Hightower, Kubernetes and Google Cloud
    • Millions of people  are still using the LAMP stack. They may have 10 servers processing millions of requests per day and building a viable business on top of that. They're good.
    • At a KubeCon everyone was talking about how big their cluster was etc. A small group wondered what all these people were talking about. They build their apps, deploy to AppEngine, and it just works...for eight years.
    • The end game for any platform is I have this app and here's how I want it run. 
    • Why does k8s exist at all? So everyone doesn't have to figure out how to glue all the separate components together themselves. The datacenter is turned into a computer and k8s is its operating system. K8s will follow the same trajectory as Linux. It will be at the bottom of the stack and be so reliable and invisible you'll barely notice that it's there, even though it's actually everywhere.

  • Cloudflare continues trying to make the internet better by expanding their hosting platform. Introducing Cloudflare Pages: the best way to build JAMstack websites
    • Consider Cloudflare Pages as a Netlify/Vercel competitor. Given Cloudflare's stateful workers, first class CDN and edge network, a promised database, and competitive pricing, Cloudflare is becoming a contender as a platform provider, especially for hosting static sites. You can put your static site in S3 and use CloudFront, but that's not your only option now. Cloudflare's low bandwidth charges are attractive. 
    • JAMStack is "A modern web development architecture based on client-side JavaScript, reusable APIs, and prebuilt Markup". 
    • The advantage is it's "a modern web development architecture that gives devs an opportunity to rely on the advantages of a static website, which include better web performance and security benefits, while still retaining the dynamic attributes of a database-oriented CMS without the database. Jamstack websites serve static files immediately when a request is made. It is able to do so because there is no need to query the database as the files are already compiled and get served to the browser from a CDN." 
    • How is this different than a SPA? Good question. It's all kind of a blur to me. The main idea is JAMstack ships static sites with prerendered HTML. SPAs tend to dynamically generate HTML on the fly from data returned from some sort of API (REST, GraphQL). Though an SPA is a static site too, so you can mix and match however you want. 
    • Also, Changing Lanes: How Lyft is Migrating 100+ Frontend Microservices to Next.js

  • True of most consoles and dashboards. Why is the Google Cloud UI so slow? shadowgovt: There's a meta-answer, which is Google is shipping its org chart. Google Cloud UI is one giant Angular app with components written by sub-teams across wildly disparate timezones, much less offices. Their ability to consolidate resources is poor. They came late-to-the-game on tooling up an infrastructure team to provide both standardized libraries and rigor on how the architecture is used. The duplication of component logic is a direct consequence of this, as those components are ending up in the codebase as a side-effect of different segments of code, worked on by different teams, probably in different offices, "reusing" the same component---but not really, since they might be skewed on the version they're depending on. It's DLL hell in Javascript client form, basically.

  • Arch Conf 2020 videos are now available.

  • Interesting use cases of bloom filters:
    • Bloom filters can be so much more than a space efficient hashmap!
    • The first and most obvious is to use it as a lookup cache. 
    • You can also use bloom filters to mitigate cache busting attacks on your website. 
    • Akamai also uses bloom filters to avoid one-hit-wonder items filling the cache.
    • Spelling checkers were also a use case for bloom filters back in the day when you really had to worry about your memory usage.
    • Consider a distributed social network where I want two people to determine if there is any overlap in their contact lists. I want to do so without having to share the lists, for privacy reasons. You could build a filter for each user's contacts and swap them. Then check against the filter to determine if there is an overlap, without the possibility of reconstructing either list, even if you held both filters. (A sketch of this idea follows this list.)
    • You can extend this idea further as well. Say I want to determine the overlap between two users' social circles. Build two filters of the same size containing each user's friends. Then compare the bits of the two filters. The more overlap, the more shared bits. This also applies to finding the distance between words or sentences, and it works regardless of how long the text is as it's driven more by the words/terms than by how often they appear. You split the document into ngrams (say trigrams), add each into your filter, and compare them the same way. The more overlapping bits, the closer they are.
    • The idea of sharing reasonable proof of ownership is an interesting use case for bloom filters.
    • You can also use bloom filters to make a search engine.
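    • A minimal sketch of the contact-overlap idea above, assuming a tiny hand-rolled filter (fixed 4096-bit size, 4 hashes derived from std's DefaultHasher, all illustrative choices): build two equal-sized filters with the same hash scheme, then count the bits set in both. More shared bits suggests more overlap, though it's only a probabilistic signal, not an exact count.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BITS: usize = 1 << 12; // 4096-bit filter, sized for small contact lists
const HASHES: u64 = 4;

// A tiny Bloom filter: k hash functions derived by seeding one hasher differently.
struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn new() -> Self {
        Bloom { bits: vec![false; BITS] }
    }

    fn positions(item: &str) -> impl Iterator<Item = usize> + '_ {
        (0..HASHES).map(move |seed| {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            item.hash(&mut h);
            (h.finish() as usize) % BITS
        })
    }

    fn add(&mut self, item: &str) {
        for p in Self::positions(item) {
            self.bits[p] = true;
        }
    }
}

// Count bits set in both filters: a rough, privacy-preserving overlap signal.
fn shared_bits(a: &Bloom, b: &Bloom) -> usize {
    a.bits.iter().zip(&b.bits).filter(|(x, y)| **x && **y).count()
}

fn main() {
    let mut alice = Bloom::new();
    let mut bob = Bloom::new();
    for c in ["carol", "dave", "erin"] { alice.add(c); }
    for c in ["erin", "frank"] { bob.add(c); }

    // Neither side ever sees the other's raw contact list, only filter bits.
    println!("shared bits: {}", shared_bits(&alice, &bob));
}
```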

  • I thought for sure this was going to be another Nagle’s Algorithm problem. It wasn't. It's a not-using-a-real-time-OS problem. But the fun part is the investigation and the analysis. Life of a Netflix Partner Engineer — The case of the extra 40 ms:
    • Let’s take a moment to talk about the audio/video pipeline in the Netflix application. Everything up until the “decoder buffer” is the same on every set top box and smart TV, but moving the A/V data into the device’s decoder buffer is a device-specific routine running in its own thread. This routine’s job is to keep the decoder buffer full by calling a Netflix provided API which provides the next frame of audio or video data. In Ninja, this job is performed by an Android Thread. There is a simple state machine and some logic to handle different play states, but under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again.
    • To play a 60fps video, the highest frame rate available in the Netflix catalog, the device must render a new frame every 16.66 ms, so checking for a new sample every 15ms is just fast enough to stay ahead of any video stream Netflix can provide.
    • The grey line, the time between calls invoking the handler, tells a different story. In the normal playback case you can see the handler is invoked about every 15 ms. In the stutter case, on the right, the handler is invoked approximately every 55 ms. There are an extra 40 ms between invocations, and there’s no way that can keep up with playback. But why?
    • A bug deep in the plumbing of Android itself meant this extra timer value was retained when the thread moved to the foreground. Usually the audio handler thread was created while the application was in the foreground, but sometimes the thread was created a little sooner, while Ninja was still in the background. When this happened, playback would stutter.

  • There's an entire YouTube channel for AWS serverless related videos. Serverless Land. Like in the land of fairy, once you eat in serverless land, you can never return to your previous world.

  • How do you fight a juggernaut? By somehow differentiating yourself. Comparing Fauna and DynamoDB
    • Strangely, they say DynamoDB is basically a web-server in front of a database. I'm not even sure what that means. It has an API. There's nothing web-serverish about it.
    • DynamoDB is certainly all about single-table design. That sucks and is great in equal measure. You as the developer are on the hook for all the low-level details. In that sense DynamoDB is like C. It's structured assembler for databases. 
    • For the longest time DynamoDB did not have a query language. They now have PartiQL, which I haven't used, so I can't say if it's good or bad. I couldn't find any examples of it being used with the SDK, so I'm not sure how useful it is.
    • In FaunaDB everything is a transaction, like in a typical SQL database. This is a big win for developers. Single table design is a PITA. Now, does this scale as well and perform as well as DynamoDB? Since they didn't talk about performance we can assume it doesn't.
    • FaunaDB uses a commit pricing plan model. You have to agree to a minimum spend to get a certain feature set. This sucks. With DynamoDB you pay for what you use and all features are always available. 
    • Given the Fauna cost examples it seems DynamoDB will be cheaper for simpler use cases and FaunaDB will be cheaper for more complex use cases. 

  • How Facebook keeps its large-scale infrastructure hardware up and running
    • In addition to server logs that record reboots, kernel panics, out-of-memory events, etc., there are also software and tooling logs in our production system. But the scale and complexity of all these means it’s hard to examine all the logs jointly to find correlations among them.
    • We implemented a scalable root-cause-analysis (RCA) tool that sorts through millions of log entries (each described by potentially hundreds of columns) to find easy-to-understand and actionable correlations. With data pre-aggregation using Scuba, a realtime in-memory database, we significantly improved the scalability of a traditional pattern mining algorithm, FP-Growth, for finding correlations in this RCA framework. We also added a set of filters on the reported correlations to improve the interpretability of the results. 
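    • Not Facebook's tool or FP-Growth itself, just a naive co-occurrence count to illustrate the kind of correlation such an RCA tool surfaces: tally each column=value attribute over all rows and over failed rows, then flag attributes that are heavily over-represented in failures. The column names, sample rows, and threshold are made up for the example.

```rust
use std::collections::HashMap;

// Each log row is a set of column=value attributes plus a failure flag.
struct Row {
    attrs: Vec<(&'static str, &'static str)>,
    failed: bool,
}

fn main() {
    let rows = vec![
        Row { attrs: vec![("kernel", "5.4"), ("rack", "r1")], failed: true },
        Row { attrs: vec![("kernel", "5.4"), ("rack", "r2")], failed: true },
        Row { attrs: vec![("kernel", "5.6"), ("rack", "r1")], failed: false },
        Row { attrs: vec![("kernel", "5.6"), ("rack", "r2")], failed: false },
    ];

    // Count each attribute overall and among failures.
    let mut total: HashMap<(&str, &str), u32> = HashMap::new();
    let mut failed: HashMap<(&str, &str), u32> = HashMap::new();
    for row in &rows {
        for &attr in &row.attrs {
            *total.entry(attr).or_insert(0) += 1;
            if row.failed {
                *failed.entry(attr).or_insert(0) += 1;
            }
        }
    }

    // Report attributes whose failure rate stands out.
    for (attr, f) in &failed {
        let rate = *f as f64 / total[attr] as f64;
        if rate >= 0.9 {
            println!("{}={} correlates with failures ({:.0}% of hosts with it failed)",
                     attr.0, attr.1, rate * 100.0);
        }
    }
}
```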

  • Moving OkCupid from REST to GraphQL:
    • Our GraphQL API has been in production for 1½ years, and we stopped adding new features to our REST API over a year ago. The graph handles up to 170k requests per minute, and it is made up of 227 entities.
    • Our first draft of this API took nearly twice the time of the REST API, which, obviously, was not cool. Releasing a shadow request allowed us to triage these performance issues without affecting real users’ experience on the site.
    • After working with the graph for a while now, we’ve realized that the business logic works best when centralized in the back-end, and that the role of our graph is to fetch, format, and present the back-end’s data in a way that makes sense to clients.

  • Do we suffer from the same sort of blindspots when planning for system failures? Everything We’ve Learned About Modern Economic Theory Is Wrong
    • His beef is that all too often, economic models assume something called “ergodicity.” That is, the average of all possible outcomes of a given situation informs how any one person might experience it. But that’s often not the case, which Peters says renders much of the field’s predictions irrelevant in real life. In those instances, his solution is to borrow math commonly used in thermodynamics to model outcomes using the correct average.
    • The problem, Peters says, is the model fails to predict how humans actually behave because the math is flawed. Expected utility is calculated as an average of all possible outcomes for a given event. What this misses is how a single outlier can, in effect, skew perceptions. Or put another way, what you might expect on average has little resemblance to what most people experience.

  • How the BBC World Service migrated 31 million weekly readers to an isomorphic react app and improved page performance by up to 83%:
    • Over the past 12 months we’ve migrated our pages which are spread across 41 discrete sites from a legacy PHP monolith to a new React based application. This application is called Simorgh, an open source, isomorphic single page application
    • Lighthouse performance score saw a 224% increase from 24 > 94. Lighthouse best practice score saw a 27% increase from 79 > 100. Total number of requests dropped by 85% to 17 down from 112. Blocking JS requests dropped by 100% from 9 to 0. JS requests dropped by 79%. Total page weight is now 60% smaller than before. JS size dropped by 61%. Dom Content Loaded is 85% faster at just 0.4s down from 2.6s. Visually complete time dropped by 62% down to just 1.8s vs the previous 4.7.

  • From “Secondary Storage” To Just “Storage”: A Tale of Lambdas, LZ4, and Garbage Collection
    • We use Lambda to accelerate the “secondary” part of secondary storage queries. For each segment (a roughly 1GB chunk of contiguous data), our servers call a Lambda function which pulls the data from S3, reads the necessary bits, and calculates its own local query result. This all happens concurrently, sometimes involving thousands of simultaneous Lambda jobs. Our servers then take the serialized responses and merge them together with the primary storage result. So, rather than acting as a server replacement, Lambda is effectively a force multiplier, with each of our own CPU cores overseeing the work of hundreds of others. When the query is done, all these extra resources disappear back into the cloud, to be summoned – and paid for – again only when needed. (A sketch of this fan-out/merge pattern follows below.)
    • The result is that querying over a lot of secondary storage data now needn’t take much longer than querying over a little, something which would be cost-prohibitive to achieve without an on-demand compute service. For larger queries, this can be fully ten times faster, sometimes more.
    • Designing a new compression-friendly file layout allowed us to switch from gzip to the speedy LZ4 without sacrificing much compression. It seems like a small thing, but the result was file reads around three times faster than the old gzipped format, and the speedup applied to all of our secondary storage data.
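    • A sketch of the fan-out/merge shape described above, with plain threads standing in for the Lambda invocations and a trivial count standing in for each segment's partial result (both assumptions, not the actual system): each worker computes its segment's result concurrently and the caller merges them with the primary-storage result.

```rust
use std::thread;

// Partial result computed for one ~1GB segment (here: just a count).
fn query_segment(segment_id: u32) -> u64 {
    // Stand-in for: invoke a worker that pulls the segment from S3,
    // scans the relevant bits, and returns a serialized partial result.
    (segment_id as u64) * 10
}

fn main() {
    let segments: Vec<u32> = (0..16).collect();

    // Fan out: one concurrent job per segment.
    let handles: Vec<_> = segments
        .into_iter()
        .map(|id| thread::spawn(move || query_segment(id)))
        .collect();

    // Merge: combine partial results with the primary-storage result.
    let primary_storage_count: u64 = 1_000;
    let secondary_count: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();

    println!("total = {}", primary_storage_count + secondary_count);
}
```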
  • Egnyte on The Journey to 7X Search Performance Improvement
    • upgrade the Elasticsearch cluster to the latest 7.x version to take advantage of the latest performance optimizations and tools
    •  we divided each cluster into 4 clusters per region. These smaller clusters also allowed us to reduce our full cluster backup times
    • We split big indices into smaller indices and provisioned separate indices for some of the larger customers.
    • We concluded that the first 50KB of the document content gave us the same quality for search. It was a great finding for us, as it reduced data size by 30% and also reduced the document size leading to faster reads from the disk.
    • we found that removing trigrams did not reduce search quality and improved the search response time. Based on that, we decided to abandon trigrams and support only exact and prefix matches. 
    • Tests showed that the exact term match was not required as the match query on analyzed tokens would also match documents that would additionally match with the term query. 
    • We took several initiatives to reduce the number of deleted documents.
    •  performance tests on the index with parent-child documents showed that search time increased with an accompanying increase in disk space as well. Hence we dropped this idea.
    • we came up with the idea of a staging index. The idea was to introduce a second index adjacent to the main index, that will store documents temporarily before they are updated with content
    • We decided to increase the memory from 30GB to 200GB on each data node to allow more data to remain in the cache and, therefore, less amount of paging. It resulted in a big improvement in search response times.
    • we decided to move all of our clusters to SSD. Migrating to faster disks was of course beneficial and came at a higher cost.
    • we could clearly see that on a bigger data set such as ours, using ids to update documents gives better results than using paths

Soft Stuff

  • github.com/ElAlev/Wayeb: a Complex Event Processing and Forecasting (CEP/F) engine written in Scala. It is based on symbolic automata and Markov models.

  • github.com/coverclock/com-diag-diminuto: A Linux/GNU systems programming library in C.

  • github.com/visionspacetec: an open source on-board computer platform for CubeSats.

  • github.com/bbc/simorgh (article): The BBC's Open Source Single Page Application. 

  • github.com/HyperdimensionalComputing/collection: we aim to provide a comprehensive collection of projects using hyperdimensional computing. The way the brain works suggests that rather than working with numbers that we are used to, computing with hyperdimensional (HD) vectors, referred to as “hypervectors,” is more efficient. Computing with hypervectors, offers a general and scalable model of computing as well as well-defined set of arithmetic operations that can enable fast and one-shot learning (no need of backpropagation). 

  • github.com/donnemartin/system-design-primer: This repo is an organized collection of resources to help you learn how to build systems at scale.

  • radicle: a peer-to-peer stack for code collaboration 🌱. It enables developers to collaborate on code without relying on trusted intermediaries. Radicle was designed to provide similar functionality to centralized code collaboration platforms — or "forges" — while retaining Git’s peer-to-peer nature, building on what made distributed version control so powerful in the first place.

Hard Stuff

  • SuperCell: a large-area coverage solution that leverages towers up to 250 meters high and high-gain, narrow-sectored antennas to increase mobile data coverage range and capacity.  Our field measurements found that a 36-sector SuperCell base station mounted on a 250-meter tower can serve a geographical coverage area up to 65 times larger than a standard three-sector rural macro base station on a 30-meter tower in the same topography. 

Pub Stuff

  • Protean: VM Allocation Service at Scale: We describe the design and implementation of Protean – the Microsoft Azure service responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A single instance of Protean serves an entire availability zone (10-100k machines), facilitating seamless failover and scaleout to customers. 

  • High availability in cheap distributed key value storage: The paper talks about using NVMM for building a distributed key-value store. NVMMs are a new technology. They are slowly rolling into the datacenters, but there are still questions about their performance and how many writes they can handle before wearing out.

  • EPIC: Every Packet Is Checked in the Data Plane of a Path-Aware Internet (article): We propose EPIC, a family of data-plane protocols that provide increasingly strong security properties, addressing all three described requirements. The EPIC protocols have significantly lower communication overhead than comparable systems: for realistic path lengths, the overhead is 3–5 times smaller compared to the state-of-the-art systems OPT and ICING. Our prototype implementation is able to saturate a 40 Gbps link even on commodity hardware due to the use of only few highly efficient symmetric cryptographic operations in the forwarding process. Thus, by ensuring that every packet is checked at every hop, we make an important step towards an efficient and secure future Internet.

  • Building a fault-tolerant quantum computer using concatenated cat codes: We find that with around 1,000 superconducting circuit components, one could construct a fault-tolerant quantum computer that can run circuits which are intractable for classical supercomputers. Hardware with 32,000 superconducting circuit components, in turn, could simulate the Hubbard model in a regime beyond the reach of classical computing.

  • Achieving 100Gbps intrusion prevention on a single server: Our experiments with a variety of traces show that Pigasus can support 100Gbps using an average of 5 cores and 1 FPGA, using 38x less power than a CPU-only approach.

  • Concurrent and Distributed Systems: This course considers two closely related topics, Concurrent Systems and Distributed Systems, over 16 lectures.

  • REST vs GraphQL: A Controlled Experiment: Our results show that GraphQL requires less effort to implement remote service queries when compared to REST (9 vs 6 minutes, median times). These gains increase when REST queries include more complex endpoints, with several parameters. Interestingly, GraphQL outperforms REST even among more experienced participants (as is the case of graduate students) and among participants with previous experience in REST, but no previous experience in GraphQL.
    • tsimionescu: GraphQL puts most of the onus on the client to know the data model, define their own queries, understand how to join data etc. REST-style APIs do all of this on the server side, and provide the most interesting query results directly....when you say 'give me all cars and all users joined by user ID = owner id', that's business logic. With a good REST API you would just do a GET on /cars and find any user details that are needed already in each car (perhaps under a link, which may lead to the N+1 problem, but that's another discussion).
    • jillesvangurp: I introduced graphql a few months ago to basically unblock myself from having to think about and design gazillions of custom REST endpoints for our mobile client developers. Turns out, that I don't miss doing that. REST has been a huge drain intellectually on this industry ever since people got pedantic over Roy Fielding's thesis and insisted that we stop treating HTTP like yet another RPC mechanism. The amount of debates I've been involved in over such arcane detail as the virtues of using a PUT vs POST and exactly which is more 'correct' in what situation is beyond ridiculous. I appreciate a well designed REST API as much as anyone but for most projects where the frontend code is the single customer of the API, it's not very relevant. If you are shipping SDKs to third parties, it's a different matter of course.
    • Lt_Riza_Hawkeye: It makes perfect sense for facebook. You write the moral equivelent of __attribute__((graphql)) on your code, and boom, you can query it. You want mutations? __attribute__((graphql_root_mutation)). If your object is stored in TAO, everything works perfect. You can add custom fields or even custom objects implemented in PHP that can do whatever the hell they want. You never have to think about a database. And you barely even have to think about security if you're using pre-existing objects, the rules for which users should be allowed to see which objects are written in one centralized place and enforced everywhere in the codebase, graphql included. Of course, it only works that well because there are multiple ~20-30 person teams maintaining that infrastructure. And GraphQL was designed with Facebook's infrastructure in mind. Outside of Facebook, I cannot see myself using GraphQL for any reason.
    • mirekrusin: We use jsonrpc over ws on f/e-b/e and between b/e-b/e services in trading system, typescript types for api, runtime type assertion combinator library for io boundary type checks, backend teams co-maintain client libraries to access the service, it works very fast, it is safe and easy to maintain/track changes etc.
    • Also, How Netflix Scales its API with GraphQL Federation (Part 2)

Reader Comments (1)

Another great scalability blog, thanks for putting these together. Happy holidays!

December 23, 2020 | Unregistered CommenterDave
