8x: V8 Promise.all parallel performance improvement; 1.3%: print sales increase; 11%: over 65 shared a hoax; 40%: add jobs after deploying AI; 3%: Eventbot's revenue pledged to open source; 51%: successful Ethereum attack; $308,620: cost of a Bitcoin 51% attack; $30: Apple services revenue per device per year; .3 cents: earnings from selling private data; 2,000: baguettes a day produced on French aircraft carrier; 5.6 nm: future smallest grains on a magnetic disk; 11,000: free books from 1923;
Quotable Quotes:
@mekkaokereke: He joined SpaceX as a "founding employee." He designed the Merlin engine. He's CTO of Propulsion. His name is Tom Mueller. Everyone knows Elon Musk. No one knows Tom Mueller, even though Tom is the one currently designing a rocket that will put humans on Mars. 🤷🏿‍♂️
Dr. Rachael Tatman: My universal advice is to not get a Ph.D. I even wrote a blog post about it a while ago. The blog's about linguistics specifically, but most of it applies to machine learning as well. I think that having a Ph.D. can be an advantage when you're looking for data science jobs, but unless you really want to 1) do research or 2) be a professor there's really no benefit to getting a Ph.D. that you can't get more quickly doing something else.
@unclebobmartin: Unit tests are written by programmers for programmers. Acceptance tests are specified by customers for customers. Integration tests are written by architects for architects.
@wrathofgnon: A city can be seen as an attempt to solve the predicament that is human existence. No building can solve all problems, and as soon as we come up with what we think is a solution, we are faced with a new (often far worse) set of problems. Historically cities have been compromises.
cdixon: Smartphones are a good example of a broader historical pattern: technologies usually arrive in pairs, a strong form and a weak form...But it's strong technologies that end up defining new eras...Weak technologies adapt to the world as it currently exists. Strong technologies adapt the world to themselves. Progress depends on strong technologies.
@Tr0llyTr0llFace: Today's Bitcoin dump was a deluge of fat-tail events: a waterfall of five 1-minute candles above 7 standard deviations in length (8σ, 10σ, 14σ, 8σ, 9σ). Good luck day-trading that.
Geoff Huston: In the last couple of months, we have seen evidence that points to large scale deployment of IPv6 services in China. This is most evident in the regional networks of China Mobile and in ChinaNet...If one was to look to China to be the last piece in a critical mass of IPv6 deployment that will propel the Internet's migration over the coming years, then the picture is looking very encouraging.
@danveloper: What is the hill you die on? Mine is that WebAssembly is the future. I think the next three years will bring some great platforms built on WASM.
Matt Klein: The frank reality is that, at scale, how well an organization does with code sharing, collaboration, tight coupling, etc. is a direct result of engineering culture and leadership, and has nothing to do with whether a monorepo or a polyrepo is used.
@DrShaffopolis: Spy agencies: Opening back doors in software for ourselves doesn't mean bad guys can get in through those same back doors. Also spy agencies: This bad guy opened a back door for himself and then we got in through that same back door to catch him.
Dropbox: It is a common misconception that encryption is expensive. Symmetric encryption is actually blazingly fast on modern hardware. A desktop-grade processor is able to encrypt and authenticate data at a 40 Gbps rate on a single core.
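To make "blazingly fast" concrete, here's a minimal Go sketch of authenticated symmetric encryption using AES-256-GCM, the kind of AEAD construction the quote refers to; on CPUs with AES-NI hardware support this class of cipher routinely runs at gigabytes per second per core. The key and nonce handling below is illustrative only, not Dropbox's code.

```go
// Minimal sketch: symmetric encryption + authentication in one pass
// (AES-256-GCM). Illustrative only; key/nonce management is elided.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

func main() {
	key := make([]byte, 32) // AES-256 key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}

	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block) // AEAD: encrypts and authenticates
	if err != nil {
		panic(err)
	}

	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}

	plaintext := []byte("encrypt and authenticate in one pass")
	ciphertext := gcm.Seal(nil, nonce, plaintext, nil)

	decrypted, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		panic(err) // fails on tampering, not just bad data
	}
	fmt.Printf("%s\n", decrypted)
}
```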
CydeWeys: If you're not on the largest coin for any particular interchangeable hashing algorithm then you're susceptible to these [51%] attacks, as people from a larger coin could simply turn their hardware against you and take you out. That means: For SHA256^2-specific hardware, Bitcoin (the real one, not Cash, Gold, or SV), for scrypt, Litecoin, and for anything mined on GPUs, Ethereum (not Classic).
@JaredNaude: Someone sent a message using IPv6 packets, constructed in Morse code so that it is visible from the monitoring system: "Use more bandwidth"
@jack_daniel: I'm an old person in a movie. In spite of the fact that my generation and those before me invented computers and the internet we're portrayed as technically incompetent by younger people who can't use switches with ping.
Brenon Daly: No matter where shoppers looked last year, the tech M&A market was a pricey place to be. Unprecedented valuations are one of the main reasons why overall acquisition spending basically matched the highest level since the dot-com collapse. 451 Research's M&A KnowledgeBase recorded $573bn worth of deals in 2018, nearly equaling 2015's record but on 20% fewer transactions.
donjulioanejo: IDK I find CodeBuild almost prohibitively expensive in a medium-sized team with a medium to large Java project when you pay like $0.10/minute for a 30+ minute build time (with unit tests). That's like $3/build, with easily dozens of them a day if you want to do per-commit CI.
Jeff Barr: This job [Western Digital HDD Simulation at Cloud Scale - 2.5 Million HPC Tasks, 40K EC2 Spot Instances] ran 8 hours and cost $137,307 ($17,164 per hour). The folks I talked to estimated that this was about half the cost of making the run on an in-house cluster, if they had one of that size!
DSHR: This gives rise to a Gresham's Law of preservation, in which low-quality services, economizing on replication, metadata, fixity checks and so on, out-compete higher-quality services. Services such as DPN and Duracloud, which act as brokers layered on top of multiple services and whose margins are thus stacked on those of the underlying services, find it hard to deliver enough value to justify their margins, and are strongly incentivized to use the cheapest among the available services.
Tim Bray: And when you're working out the costs of serverless vs serverful, ask yourself: How much is it worth to you to not have to manage hosts, or containers, or capacities, or Kubernetes? I don't know the number, but I'm pretty sure it's not zero.
GREGORY BARBER: First I signed up for an app called Datum. With a tap, my GPS location was shared; in exchange I was promised 1 DAT, a token that can be traded on the Ethereum blockchain. Next I scrolled to Doc.ai, where users share everything from their prescriptions to results of microbiome tests in return for a coin called NRN.
Sam Bashton: In a single morning we refactored the code to use a single Lambda invocation every minute, operating on 20% of the customer base each time. This Lambda spawns a Goroutine per customer, which spawns a Goroutine per region, which spawns a Goroutine per AWS service. The Lambda execution time hasn't increased significantly, because as before we're mainly waiting on network I/O - we're just waiting on a lot more responses at the same time in a single Lambda function. Cost per customer is now much more manageable however, becoming lower and lower with every sign up.
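Since the quote describes a Go fan-out, here's a minimal sketch of the shape Bashton describes, assuming hypothetical names (checkService stands in for an AWS API call): one goroutine per customer, per region, per service, all blocked on I/O concurrently, so wall time tracks the slowest call rather than the sum of all calls.

```go
// Sketch of nested goroutine fan-out inside one Lambda invocation.
// checkService is a hypothetical stand-in for a network call.
package main

import (
	"fmt"
	"sync"
	"time"
)

func checkService(customer, region, service string) {
	time.Sleep(50 * time.Millisecond) // stand-in for network I/O
	fmt.Printf("checked %s/%s/%s\n", customer, region, service)
}

func main() {
	customers := []string{"cust-a", "cust-b"} // e.g. 20% of the base per run
	regions := []string{"us-east-1", "eu-west-1"}
	services := []string{"ec2", "rds", "s3"}

	var wg sync.WaitGroup
	for _, c := range customers {
		for _, r := range regions {
			for _, s := range services {
				wg.Add(1)
				go func(c, r, s string) {
					defer wg.Done()
					checkService(c, r, s)
				}(c, r, s)
			}
		}
	}
	wg.Wait() // wall time tracks the slowest call, not the sum of all calls
}
```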
CALM: Recent work showed that state-of-the-art multiprocessor key-value stores can spend 90% of their time waiting for coordination; a coordination-free implementation called Anna ran over two orders of magnitude faster by eliminating that coordination.
Stonebraker: [A] pick-up team of volunteers, none of whom have anything to do with me or Berkeley, have been shepherding that open source system ever since 1995. The system that you get off the web for Postgres comes from this pick-up team. It is open source at its best and I want to just mention that I have nothing to do with that and that collection of folks we all owe a huge debt of gratitude to.
Joerg Blumtritt: After ten years of research and development, privacy-preserving computation is finally ready for commercial application. Homomorphic encryption plays a key role in solving the most pressing problem in data protection: getting useful information from data without breaking privacy.
Ed Sim: Still in second inning for enterprise move to cloud: Regardless of what economic cycle we endure, the Fortune 500 march to a cloud-native architecture will continue. For the more advanced enterprises who have migrated to the cloud, this will be a year of net new technology and building applications. Along these lines, we are starting to hear serverless more and more from the Fortune 500 and see this trend reflected in the sales pipeline at iopipe, which has gone from mostly startups to larger companies.
Mick Semb Wever: The trade-off for better supporting wide partitions in Cassandra 3.11.3 is increased read latency as row offsets now need to be read off disk. However, modern SSDs and kernel pagecaches take advantage of larger configurations of physical memory providing enough IO improvements to compensate for the read latency trade-offs.
David Gerard: Proof-of-stake in its simplest form is "them what has, gets." The idea is those with the most resources have the most interest in maintaining the coin. This could be an even more powerful centralising force than proof-of-work.
Amy Nordrum: For most of the past 50 years, the areal density of hard disks (a measure of how many bits of data engineers can squeeze into a given area) increased by an average of nearly 40 percent each year. Lately, though, that rate has slowed to around 10 percent.
NovaX: I'm confused by the read/write performance numbers. I don't disagree that lock-free reads are better, but 25M reads/sec at 16 threads is really bad. I can do 13.5M with a naive exclusive lock on an LRU cache, and 380M reads / 48M writes on a concurrent cache. The left/right concurrency isn't novel and I feel bad being so very underwhelmed. What am I missing?
MrTonyD: I've been working in Open Source for a long time now - decades. The problem I see is that so many rich and wealthy companies (and their billionaire owners) are the big beneficiaries of open source. Look at Spark, Hadoop, Linux, gnu tools and others - who uses them at a large scale? It is the wealthy companies who avoid paying salaries for the development of those tools. So I've become convinced that we should distinguish between small companies and large companies, and that small companies should get to use open source, while the big wealthy companies should be required to pay. It should be analogous to the free software given to education - with restrictions in license.
John Mark: Once you begin with the premise of "I need an open source business model", it leads you down a path of "I need to monetize my project" rather than "I need to build a product that delivers value".
Wally Rhines: We've hit the limits on the pipelines. It's the same regarding performance for predictive branching, where you do speculative execution, and if you took the wrong branch you would go back. As long as the wrong choice was a rare event, that improves the performance. But now we've gotten to the point where the pipelines are so long that speculative execution produces wrong decisions too many times, and it ends up not speeding up. So now you've topped out putting big pipes together to get parallelism that way, and you're looking for other ways to get that parallelism. There are a lot of ways to do that, but they require special architectures.
niftich: For many of these newer projects, the libre aspect isn't a heartfelt belief -- it's a sort of loss-leader strategy to enable access to a particular type of audience, and unlock a particular type of language for marketing. Handfuls of people may exercise their rights to fork and/or redistribute, but plenty of intrinsic barriers exist to keep these from being a competitive threat -- until a sufficiently equipped and dedicated party like AWS or Google Cloud, that is.
@whereistanya: After several attempts to stop paying AWS 80c every month I spent an hour searching the console and finally found the stray service I hadn't deleted. And I was *sure* I had it this time until... I just got an AWS bill for 23c. This thing is the goddamn Hotel California.
@Carnage4Life: A recent study by Harvard researchers showed these results from those who switched to an open plan office: • 73% less time in face-to-face interactions • 67% more time on email • 75% more time on instant messenger. Only real benefit is cost effectiveness
@sarahjeong: i'm never going to stop thinking about the venture capitalist who scolded me for scent pod fragrance machine skepticism and compared the invention to the home computer
@JoeEmison: I have a hard time getting worked up over AWS / OSS / Mongo / DocumentDB. I'd rather spend more time pushing all the corporations who use OSS and don't help support it at all to do so (perhaps through @tidelift). AWS has given the world and tech communities a lot.
@kellabyte: Every coder's experience coming back to work after time off from work. 1. I better run this command to make sure this code is still working. Nope it's broke. Hmm I thought it was working when I left. 2. Run make. Nothing compiles. Hmm I thought this was compiling before I left.
UofG: "Thus, these neurons act like a gate for the incoming information, which is normally closed. But when feedback comes in, the gate is opened, allowing those synapses that take care of the primary sensory information to increase their strength. With this study we have identified how feedback possibly optimizes synaptic connections to better prepare for future incoming information," she adds.
Rachel Traylor: Just as in chemistry and physics, packet flow in a network has microscopic behavior controlled by various protocols, and macro-level dynamics. We see this in queueing theory as well: we can study (typically in steady-state to help us out, but in transient state as well) the stochastic behavior of a queue, but find in many cases that even simple attempts to scale the analysis up to networks (such as retaining memorylessness) can become overwhelming. What ends up happening in many applied cases is a shift to an expression of the macro-level properties of the network in terms of average flow. The cost of such smoothing is an unpreparedness to model and thus deal effectively with erratic behavior. This leads to overprovisioning and other undesirable and costly design choices to mitigate those risks.
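To make the cost of smoothing concrete with a standard textbook result (my illustration, not from Traylor's post): for an M/M/1 queue with arrival rate λ, service rate μ, and utilization ρ = λ/μ, the mean number of requests in the system is

```latex
L = \frac{\rho}{1-\rho}, \qquad \rho = \frac{\lambda}{\mu}
```

Average flow looks benign at ρ = 0.5 (L = 1), but L = 19 at ρ = 0.95 and L = 99 at ρ = 0.99: the averages hide exactly the erratic near-saturation behavior that drives overprovisioning.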
A lively reddit discussion on Is it just me or is aws a nightmare for beginners? faceyjim: As somebody who has used the platform since 2011 and also regularly uses gcp I can't say that it's easier or harder than gcp. But I don't think cloud is difficult at all, it's just doing things "correctly" instead of being lazy and taking shortcuts which I think typical sysadmins do. (Not saying you are, just typical sysadmins) wrensdad: No. It's a nightmare for experienced folks too. Interacting with AWS seems like what happens when you let the developers write the platform AND the UI. birdstweeting: Well.. in the old days ... *stamps pipe* .... we had separate teams for network, Windows, Linux, storage, database, data centre management, etc etc, and you only had to specialise in one of those areas. But now it's all bundled together. So yeah, AWS can be complicated, but it's pretty much covering the whole stack from bare metal up to the front end.
Santa delivered Marco Arment a lump of Xmas coal in the form of a nasty, service-impacting bug. What followed was an instructive debugging experience report, written under dire visiting-relatives-in-a-remote-land conditions: The Nightmare After Christmas. After trying many things, as one does, Linode stepped up and found a load balancer deluged with traffic. Was it a connection management bug? Was it a DoS attack? We'll never know. Linode suggested switching to Cloudflare, and that, along with some last-minute code performance improvements, fixed the problem. Achievement unlocked. Some lessons: If you're a service, make your pricing clear and make it easy to sign up; you're losing customers otherwise. Making a horizontally scalable system is still a good idea. Marco smartly followed the 2x rule when testing server sizing. Don't mess around: double the amount of server resources to see if that removes bottlenecks. It didn't, so the problem was elsewhere. The lack of observability across an entire mobile and server stack makes it brutally difficult to track down problems. PHP. Enough said.
Here's a benchmark of Lambda cold start times across regions, VPC versus non-VPC, and different memory configurations. VPC is a lot slower. Smaller memory configs generally have lower latencies.
The highest-order lesson I draw comes from the fact that Postgres defied Fred Brooks' "Second System Effect" [Bro75]. Brooks argued that designers often follow up on a successful first system with a second system that fails due to being overburdened with features and ideas. Postgres was Stonebraker's second system, and it was certainly chock full of features and ideas. Yet the system succeeded in prototyping many of the ideas, while delivering a software infrastructure that carried a number of the ideas to a successful conclusion. This was not an accident: at base, Postgres was designed for extensibility, and that design was sound.
Another lesson is that a broad focus ("one size fits many") can be a winning approach for both research and practice. To coin some names, "MIT Stonebraker" made a lot of noise in the database world in the early 2000s that "one size doesn't fit all." Under this banner he launched a flotilla of influential projects and startups, but none took on the scope of Postgres. It seems that "Berkeley Stonebraker" defies the later wisdom of "MIT Stonebraker," and I have no issue with that.
A final lesson I take from Postgres is the unpredictable potential that can come from open-sourcing your research. In his Turing talk, Stonebraker speaks about the "serendipity" of PostgreSQL succeeding in open source, largely via people outside Stonebraker's own sphere.
Murat is not reading books or watching Netflix because he's reviewing papers for you: An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems: Almost all (97%) of crash recovery bugs involve no more than four nodes. This finding indicates that we can detect crash recovery bugs in a small set of nodes, rather than thousands. A majority (87%) of crash recovery bugs require a combination of no more than three crashes and no more than one reboot. It suggests that we can systematically test almost all node crash scenarios with very limited crashes and reboots. Crash recovery bugs are difficult to fix. 12% of the fixes are incomplete, and 6% of the fixes only reduce the possibility of bug occurrence. This indicates that new approaches to validate crash recovery bug fixes are necessary. Also, Paper review. Serverless computing: One step forward, two steps back
My how things change. You know Apple has become a services company because they've given up the iPhone tax that held services back. The question is: what does Apple's services revenue push mean for privacy? Apple Music support hits Amazon Echo. Apple is putting iTunes on Samsung TVs. Apple's privacy pitch has always been we don't care about your data because we sell you impossibly expensive slabs of glass. When your growth opportunity is in services, doesn't data's siren song sing louder?
For companies like Apple, Google, and Amazon, building their own chips has been enabled by toolchain improvements and a different way of thinking. Chip Industry In Rapid Transition: One reason [companies can develop custom chips] is high-level synthesis. Most of the change came in the datapath, and by going to high-level synthesis you could increase the amount of simulation by several orders of magnitude to test out new architectures. That was an enormous boon. If you look at who's doing all of these leading-edge algorithmic chips, the leaders tend to be companies that weren't leaders before. There are system companies (Google, Facebook, Amazon) and new people coming into automotive. They don't have as much legacy know-how with the traditional "start by writing lines of Verilog." They're more open to writing their algorithms in C++ and then synthesizing them. That's one of the contributors. Another contributor is the ability to buy shared emulation so that you don't have to be a big company to afford access to an emulator. There are others. With AI-driven simulation, we're looking at at least a half-order of magnitude to an order of magnitude speed-up just from the ability to do the right simulation rather than simulating everything. Then there are the advantages of the Portable Stimulus, allowing you to minimize redundant simulation...Yes, and that's a necessary condition. Whenever you make a fundamental change in abstraction, it's really hard for people to change. Having new people come into the business with a clean sheet of paper allows you to move a lot faster than if you have years and years of legacy. When we went to RTL from schematic capture, it was a very slow process. But the new startups and new college graduates were very quick to adopt it.
Fair use gone wrong. Rick Beato really knows music. He creates a lot of educational content on his channel. But to teach music you have to play music. You can guess what happens. His videos get blocked or demonetized because of the clips he plays. So he doesn't make money on those videos. Does that incent him to make more educational videos? Of course not. Allowing copyright to be used as a weapon hurts us all. It encourages the production of random crap, not the curated wisdom that's the promise of the digital age.
By making a MongoDB clone is Amazon giving the middle finger to open source? Arguments both yes and no. By the letter of the law, no, that's what open source lets you do. But by the spirit of the law? It's a strange move. Doesn't Amazon have enough to support already? Why take on the forever project of supporting a downrev version of MongoDB? Usually Amazon's moves make more sense than this. If it really is a move to discourage licenses like the Server Side Public License (SSPL), which impose conditions on companies using a technology, then it's not a middle finger; it's trying to choke off someone's air supply.
The software development process has stagnated for decades. Evidence can be found in the fact that so many arguments never seem to die. This is literally an argument from 30 years ago. Monorepos: Please don't! There's a deep parallel here between building a centralized versus a distributed system. Centralization makes some things easier and other things harder. Decentralization makes some things easier and other things harder. Want to apply a security checker across the entire code base? Try that with 50 repos. Want to trace and fix a bug? Try that if components cross 50 other repos. Thinking there's only one way to do it? Crazy, but even crazier is that this is still an issue. Also, Why does decentralization matter?
Looking for a new stack? Good HN discussion on Would you still pick Elixir in 2019? Yes would be the general consensus, but there's an interesting thread on distributed objects vs message queues as the superior design primitive. quaunaut: Until I used Elixir, I thought workers/queues were enough. But after the last nearly-three-years, I've actually fallen into a place where workers/queues are almost always strictly inferior. jashmatthews: Erlang/Elixir has some really great advantages in concurrency and parallelism but what you're describing are just badly designed systems. Shopify, for example, use Resque (Ruby + Redis) to process thousands of background jobs per second. toast0: I'm a big fan of Erlang, but I think you can achieve similar things in other languages with queues and workers. Erlang's advantage here is that you can easily do a worker per client connection, for almost any number of client connections; for data processing queues, the lack of data sharing between processes (threads) strongly pushes you towards writing things in a way that easily scales to multiple queue workers -- you can of course write scalable code in other languages, but it's easier to write code with locking on shared state when shared state is easier. arvidkahl: I've been working with Elixir in a single-developer production system for over a year now. I'm running it in Docker containers on Kubernetes, in the cloud. It has been extremely stable, scaling has been a non-issue. Error reporting has become easier and easier, now that companies like Sentry and AppSignal have integrations for Elixir.
Building your own framework can add huge system-wide wins because you operate at the meta infrastructure level. You aren't stuck focusing most of your efforts on working around framework quirks. Look at the leverage built by Courier: Dropbox migration to gRPC: We settled on gRPC primarily because it allowed us to bring forward our existing protobufs. For our use cases, multiplexing HTTP/2 transport and bi-directional streaming were also attractive...Courier implements our standard service identity mechanism. All our servers and clients have their own TLS certificates...After identity is confirmed and the request is decrypted, the server verifies that the client has proper permissions. Access Control Lists (ACLs) and rate limits can be set on both services and individual methods. They can also be updated via our distributed config filesystem (AFS). This allows service owners to shed load in a matter of seconds, without needing to restart processes...Our code generation adds per-service and per-method stats for both clients and servers. Server stats are broken down by the client identity...Every gRPC request includes a deadline, indicating how long the client will wait for a reply. Since Courier stubs automatically propagate known metadata, the deadline travels with the request even across API boundaries...Another common problem that our legacy RPC clients have to solve is implementing custom exponential backoff and jitter on retries. This is often necessary to prevent cascading overloads from one service to another...Having the ability to get an insight into the runtime state is a very useful debug feature, e.g. heap and CPU profiles could be exposed as HTTP or gRPC endpoints.
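One of those features is easy to show in miniature. Here's a generic Go sketch of exponential backoff with full jitter, the retry technique the post says legacy RPC clients had to hand-roll; this is the standard pattern, not Courier's actual implementation.

```go
// Generic exponential backoff with full jitter; illustrative, not Courier's code.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

func retryWithJitter(attempts int, base, maxBackoff time.Duration, call func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(); err == nil {
			return nil
		}
		backoff := base << uint(i) // exponential ceiling: base * 2^i
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
		// Full jitter: sleep a uniform random duration in [0, backoff),
		// decorrelating retries across clients to avoid cascading overload.
		time.Sleep(time.Duration(rand.Int63n(int64(backoff))))
	}
	return err
}

func main() {
	calls := 0
	err := retryWithJitter(5, 100*time.Millisecond, 2*time.Second, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient failure")
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```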
We take map views for granted these days, but they are really hard to get right, as Uber shows in Building a Scalable and Reliable Map Interface for Drivers. It's the same idea as repos and distributed computing: we want clean layers, but as soon as we divide things into layers we have to spend a lot of time working out how those layers can work together again. This is all a recapitulation of Descartes' mind-body dualism.
I'm so guilty of this. Save time and money with AWS Lambda using asynchronous programming. The idea is to send multiple items of work to a lambda function and operate on them in parallel at the handler level. For me it's a lot harder to identify and parallelize tasks than it is to parallelize data. It would also be nice if AWS made Step Functions as cheap as it needs to be. Also, SF-5: Serverless Bills?
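A minimal sketch of the idea, assuming a handler that receives a batch of items (the names are hypothetical; a real function would be wired up through the aws-lambda-go runtime): process the items concurrently inside one invocation, so you pay for roughly one item's latency instead of the sum.

```go
// Sketch: parallelize a batch of work items at the handler level.
// handleBatch is a hypothetical handler body, not AWS API code.
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

func handleBatch(items []string) []string {
	results := make([]string, len(items))
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		go func(i int, item string) {
			defer wg.Done()
			time.Sleep(30 * time.Millisecond) // stand-in for I/O-bound work
			results[i] = strings.ToUpper(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(handleBatch([]string{"a", "b", "c"})) // [A B C]
}
```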
S3 is used as a data lake for Epic. Spark is front and center in Epic's architecture. How Fortnite approaches analytics, cloud to analyze petabytes of game data: Fortnite processes 92 million events a minute and sees its data grow 2 petabytes a month...The company invited 125 million people to participate at the same time. Akamai said Fortnite set a game traffic record on its network July 12 with 37 terabytes per second delivered across its platform...everything is stored in AWS S3. It's a real-time pipeline that integrates everything from S3 to Spark to scores to telemetry data to Tableau and SQL. We use the data for everything from ARPU to game analysis and improvements...Architecture is critical. A company like Epic--like other gaming companies--provides good lessons for enterprises. Why? One million customers can show up at a first day product launch.
Microsoft/SEAL (article): Microsoft Simple Encrypted Arithmetic Library (Microsoft SEAL) is an easy-to-use homomorphic encryption library developed by researchers in the Cryptography Research group at Microsoft Research. SEAL is written in modern standard C++ and has no external dependencies, making it easy to compile and run in many different environments.
This could be big. triggermesh/knative-lambda-runtime: Knative build templates that can be used to run an AWS Lambda function in a Kubernetes cluster installed with Knative. The execution environment where the AWS Lambda function runs is a clone of the AWS Lambda cloud environment thanks to a custom AWS runtime interface and some inspiration from the LambCI project. With these templates, you can run your AWS Lambda functions as is in a Knative powered Kubernetes cluster.
Future *is* predictable: Welcome to the WardleyMaps book project! This project should lead to a freely downloadable book explaining in detail the art of strategic play at the corporate level.
Keeping CALM: When Distributed Consistency is Easy: A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present an informal introduction to the CALM Theorem, which answers this question precisely by moving up from traditional storage consistency to consider properties of programs.
Noria: dynamic, partially-stateful data-flow for high-performance web applications (video): We introduce partially-stateful data-flow, a new streaming data-flow model that supports eviction and reconstruction of data-flow state on demand. By avoiding state explosion and supporting live changes to the data-flow graph, this model makes data-flow viable for building long-lived, low-latency applications, such as web applications. Our implementation, Noria, simplifies the backend infrastructure for read-heavy web applications while improving their performance. A Noria application supplies a relational schema and a set of parameterized queries, which Noria compiles into a data-flow program that pre-computes results for reads and incrementally applies writes. Also, Faster: A Concurrent Key-Value Store with In-Place Updates