- High Scalability

Wake up! It's HighScalability time:

Forrest Brazeal warns you not to spend your limited innovation credits building bespoke, complex systems that provide no direct value to your business. Instead, position yourself at the top of what he calls the Wisdom/Cleverness Curve.

Do you like this sort of Stuff? Your support on Patreon is appreciated more than you can know. I also wrote Explain the Cloud Like I'm 10 for everyone who needs to understand the cloud (which is everyone). On Amazon it has 84 mostly 5 star reviews (147 on Goodreads). Please recommend it. You'll be a real cloud hero.

Number Stuff:

20%: drop in tech M&A spending in 2019.
$6 billion: cost of fake clicks out of $300 billion spent on online ads in 2019. Potentially tens of billions of dollars in fraud have not been accounted for.
380: undersea cables spanning more than 745,000 miles—or more than three times the distance to the moon. Google is a part or sole owner of 15 different undersea cables. One of the fastest undersea cables in operation today is the Marea cable, partially owned by Microsoft, Amazon, and Facebook. It transmits data at 160 terabits per second.
60: more Starlink satellites sent into space by SpaceX.
100 million: active Apple News users.
32: strands of DNA used to calculate the square root of square numbers 1, 4, 9, 16, 25 and so on up to 900. The team controls hybridisation in such a way that it changes the overall fluorescent signal so that it corresponds to the square root of the original number. The number can then be deduced from the colour.
200 million: Alexa powered devices. 2x growth in one year.
500 million: Google Assistant users.
$1: invested after the Civil War would be worth $1 million today (even allowing for inflation).
11000 watts: our social metabolic rate is 11,000 watts (a dozen elephants, 3/4 of blue whale). It only takes 90 watts of energy (2000 calories a day, a lightbulb) to keep us alive.
1%: of CPU and 4% of RAM globally at Google is used by hash tables.
3: concurrent k8s clusters running on F-16s. The Department of Defense is making a big bet on containers, Kubernetes and Istio. It’s a flexible but universal development platform for software teams across the military and prevents vendor lock-in.
-65%: Fortnite year-over-year decline in net revenue on iOS. It is likely that Fortnite is generating less than half its total mobile revenue on Android.
$1 billion: spent through eBay's Buy APIs.
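
A couple of the numbers above are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is only a rough check: the conversion constants (4184 joules per food calorie and an average Earth-Moon distance of about 238,855 miles) are standard reference values and are not taken from the post, and the helper name is made up for illustration.

```python
# Rough sanity checks for two figures in the list above.
# Constants are standard reference values, not numbers from the post.
KCAL_TO_JOULES = 4184            # joules in one food calorie (kcal)
SECONDS_PER_DAY = 24 * 60 * 60
MOON_DISTANCE_MILES = 238_855    # average Earth-Moon distance


def kcal_per_day_to_watts(kcal_per_day: float) -> float:
    """Convert a daily energy intake into average power in watts."""
    return kcal_per_day * KCAL_TO_JOULES / SECONDS_PER_DAY


# 2000 calories a day expressed as continuous power.
biological_watts = kcal_per_day_to_watts(2000)

# How many times larger the quoted social metabolic rate is than 90 W.
social_multiplier = 11_000 / 90

# 745,000 miles of cable measured in Earth-Moon distances.
cable_moon_trips = 745_000 / MOON_DISTANCE_MILES

print(f"2000 kcal/day is roughly {biological_watts:.0f} W")
print(f"11,000 W is roughly {social_multiplier:.0f}x the 90 W baseline")
print(f"745,000 miles is roughly {cable_moon_trips:.1f}x the distance to the moon")
```

The first result lands a little above the quoted 90 watts, which is close enough for a lightbulb comparison, and the cable figure does come out at slightly more than three Earth-Moon distances.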

Quotable Stuff:

Forrest Brazeal: Here’s the problem: K8s is so complex that we avoid even spelling out the word, like it’s the Hebrew name for God. The orchestration and configuration requirements to run it in production are far beyond many teams’ comfort level. That’s why hosted versions of Kubernetes like GKE and EKS are so popular. When your open-source software is so complex that it effectively requires a cloud provider to run it on your behalf, you’ve stumbled into back door lock-in. And you don’t even get the advantage of traditional cloud lock-in, which is deep integration between native services.
Bertrand Meyer: The Shortest Possible Schedule theorem confirms what good project managers know: you can, within limits, shorten delivery times by bringing all hands on deck. The precise version deserves to be widely known.
Adrian Colyer: The central irony referred to in this paper is that the more we automate, and the more sophisticated we make that automation, the more we become dependent on a highly skilled human operator.
@ID_AA_Carmack: My formative memory of Python was when the Quake Live team used it for the back end work, and we wound up having serious performance problems with a few million users. My bias is that a lot (not all!) of complex “scalable” systems can be done with a simple, single C++ server.
Chris Swan: The emerging message is simple – once your data is in a cloud storage is very cheap relative to the cost of getting the data back out. The corollary is also alluringly simple – data gravity is real, so you should put all of your data (and associated workload) into the same cloud provider.
@ben11kehoe: Christmas Day operations at iRobot so far: requested a limit increase for a firehose stream. That’s about it. Anyone who says #serverless isn’t ready for production doesn’t know what they are talking about. There are two important beneficiaries of our serverless architecture today: our customers, obviously, but also our employees: their minds can be at ease, able to celebrate if they are celebrating.
Linus Torvalds: I repeat: do not use spinlocks in user space, unless you actually know what you're doing. And be aware that the likelihood that you know what you are doing is basically nil. There's a very real reason why you need to use sleeping locks (like pthread_mutex etc). In fact, I'd go even further: don't ever make up your own locking routines. You will get them wrong, whether they are spinlocks or not. You'll get memory ordering wrong, or you'll get fairness wrong, or you'll get issues like the above "busy-looping while somebody else has been scheduled out".
@sophiebits: Someone asked me today about the differences between Google’s and Facebook’s engineering cultures. Based on what I know, here’s what I said: 1. G’s tech stacks differ widely between products (many teams build their own infrastructure or frameworks), whereas FB favors consistency across teams (eg: 95% of FB’s web properties use the same stack). 2. G has a stronger culture of “only some people can work on certain code” (OWNERS files, “readability”); FB tries more to encourage cross-team contributions (eg: there are posters with “Nothing at Facebook is someone else’s problem”) – as well as internal mobility. 3. G historically rewarded building technology for the sake of building technology; FB has been more focused on building products and seeing technology only as a means towards that.
@AWSreInvent: @clare_liguori shares how AWS Fargate takes advantage of Firecracker microVMs that are based on Nitro. These VMs require less than 5mb of memory & allow for extremely high-density containers on bare metal instances. #reInvent
Eric Berger: Some have questioned whether China, which has flown six human spaceflights in the last 16 years, can really build a large low-Earth space station, send taikonauts to the Moon, return samples from Mars, and more in the coming decade or two. But what seems clear is that the country's authoritarian government has long-term plans and is taking steps toward becoming a global leader in space exploration.
@cloud_opinion: AWS is executing MSFT's playbook from the 90s MSFT is executing GOOG playbook from the 2000s GOOG is executing Xerox playbook from the 80s
@melissamcewen: Software Engineering Pro Tip: Avoid releasing software. Just keep it on your localhost. Releases just cause problems, not just on Holidays or weekends, but weekdays too.
spankalee: Yeah, App Engine would have cost $0 since that traffic would fit in the free tier. Cloud Run would be the slightly more general and newer way to do this too, and the free tier there is 2 million req/month. I _think_ he could have migrated containers directly from GKE.
Adrian Colyer: This puts me in mind of Kernighan’s Law (“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”). If we push ourselves to the limits of our technological abilities in automating a system, how then are we going to be able to manage it?
Linus: Reality is messy.
kevstev: The performance ceilings aren't that different, and not that impactful, at least not until you get to FANG scale, and I mean literally only FANG scale. We were running a billion dollar business on 8 fairly small VMs for the API layer, which handled all of the ecommerce transaction handling. I remember at one point we encountered a memory leak of some sort in node, and the instances were falling over and dying about once an hour, but restarting and recovering- this was causing a few % error rates to our customers. I was insistent that we get all hands on deck to figure this out ASAP, and our head of Ops type person said "kevstev, we can throw hardware at this problem to meet SLOs until you get it under control. Your monthly server costs are less than my studio apartment cost me per month in Jersey City 15 years ago."
@iamharaldur: Seven years ago I led a team to create the first Google Santa Tracker. I was not supposed to be in that role. But it changed my life. It started as a pretty standard call. I had been working with Upperquad and their founder, my friend Phil, on a number of projects, many of them for Google. This time around, the Google Maps team wanted a way for kids to follow Santa on Christmas Eve.
Ariel Ellis: When I started at Backblaze, SSDs were more than ten times the cost of conventional hard drives. Now they’re about three times the cost. But for Backblaze’s business, three times the cost is not viable for the pricing targets we have to meet.
Renan Ranelli~ Do your reads in the same transaction as your writes. Pay special attention to updates; be ready to retry transactions if you want consistency, and do not mindlessly think transactions will always be successful; you, the backend developer, are the guardian of primary data's integrity.
@copyconstruct: Subtweeting here: Requests/second is a garbage metric. It means jack shit unless you’re willing go into what kind of request you’re serving, what the cache hit ratio is, what your traffic distribution looks like, what’s the expected response time and more.
@fchollet: Our field isn't quite "artificial intelligence" -- it's "cognitive automation": the encoding and operationalization of human-generated abstractions / behaviors / skills. The "intelligence" label is a category error
@benawad: The dark side of GraphQL: performance. My resolver took 19ms to fetch 20 recipes from postgres with ALL fields then GraphQL took 426ms resolving/validating the recipe fields... 🙃
Qnovo: The cost of lithium-ion batteries declined in the past decade from over $1,100 per kWh to $150 per kWh in 2019. Forecasters expect this figure to drop below $100 in 2023. At such levels, electric vehicles will reach cost parity with traditional vehicles using internal combustion engines (ICE) — without government buyer incentives.
rahvin: Spinning Rust is still way cheaper and will remain way cheaper for mass storage for likely a very long time. NAND prices collapsed because of an oversupply, like the DRAM crash in the early 2010's NAND prices are going to correct, every producer has slashed production and most have ended entire production lines. By the end of 2020 I expect prices to at least double if not triple on NAND. In the meantime with these new storage technologies mass storage will continue to be dominated by spinning rust and the amount of storage that's needed in the cloud companies is staggering, demand that couldn't be met by NAND because enough isn't produced to meet it. NAND will likely takeover the consumer market completely, but NAND has mostly already taken over that market so it doesn't really hurt the drive producers. Spinning Rust is frankly the only solution for the exabyte storage needs of the cloud companies, at least for the foreseeable future.
Boeing Employees: "I still haven't been forgiven by God for the covering up I did last year" ; "Would you put your family on a Max simulator trained aircraft? I wouldn't," says one employee to another, who responds, "No."
@callmevlad: My group chat stack: • Wife+kids → iMessage • Besties → Marco Polo • Siblings → Telegram • Wife’s fam → Viber • Church friends → Messenger • EU friends → WhatsApp • Tech friends → Twitter • Work → Slack Wanted seamless communication, got annoying fragmentation.
pjc50: For some reason, people have a fantasy that RISC-V is going to give them a sub-dollar microcontroller that runs Linux, rather than being like ARM where "microcontroller" and "Linux capable" and "server grade" are three entirely different market segments.
flomo: Plus the old chestnut that 50%-80% of software projects are failures, and looking out for number one means resume-oriented development for the next job. You are probably better off with a failed project on a 'cool' framework than a successful project on an uncool one.
@mulligan: Startup Guidance 2010s: • Work 120 hours a week • Ignore profitability • Grow at all costs • Bay Area or bust • Hustle or die trying 2020s: • Work however much feels right • Focus on revenue • Build a sustainable company • Hire remote teams • Take care of yourself
@hillelogram: Everybody says "Write simple code". Everybody says "complexity is bad". But we don't have a "concept" of complexity. We can't "do" anything with it besides say it's bad. There's only two popular treatments of complexity as a concept, and neither goes deeper than a basic level.
Tim Anderson: The big [cloud] picture did not change radically in 2019. All the big three cloud providers (or big four if you consider Alibaba’s growth in East Asia) continued to grow at a dramatic pace. In the quarter ending September 30 2019, AWS reported 34 per cent revenue growth and Microsoft Azure 59 per cent growth compared to the same quarter a year ago. Google did not offer an exact figure for the quarter but said that (PDF): “Other revenues for Google were $6.4bn, up 39 per cent year-over-year, once again fuelled by cloud.” AWS remains the largest IaaS (Infrastructure as a Service) provider by some distance. How much though? Gartner said in July that AWS had a 47.8 per cent share, ahead of Azure at 15.5 per cent and Google at 4.0 per cent. Canalys in October put AWS at 32.6 per cent, Azure at 16.9 per cent and Google at 6.9 per cent. Most seem to agree that while AWS is still growing fast, its market share overall is slipping just a little.
@alexbdebrie: General rule: DynamoDB won't let you write a bad query. It *makes* you plan in advance. Great predictability over time, but more work up front. There are two areas where DDB might not scale -- those are covered in the post as well.
Henry: Most of these sets were empty for the entirety of their life, but each one was consuming enough memory to hold the default number of entries. Huge numbers of these empty sets were created and together they consumed over half a gigabyte of memory. Over a third of our available heap. My simple change set the initial capacity to 0, greatly reducing the overhead.
aevitas: Exactly. When you're launching something quick and under pressure, choosing the technology stack you know and are comfortable with is a good choice.
David Rosenthal: In other words, most transactions pay only around 10% of the cost of their routing. This greater than Uber level of subsidy could make business sense only in the context of investing in a future monopoly capable of massively raising prices. But prices are capped by transaction fees on the Bitcoin blockchain, making it impossible. Even with this level of subsidy, the network doesn't work well. Béres et al estimate that the network attempts about 7,000 transactions per day. They simulated lower and higher transaction rates and summarize the rates of failed transactions in Figure 22. Note that at 7,000 transactions per day one-third of them fail. This is not a practical payment system.
Kevin Mitchell: For A.I., it still may be best to try and keep such priors to a minimum, to make a general-purpose learning machine. On the other hand, making a true A.I. – something that qualifies as an agent – may require building something that has to do more than solve some specific computational problems in splendid isolation. If it has to be embodied and get around in the world, it may need as much help as it can get.
Geoff Huston: And if we want to reduce buffer size and maintain efficient and fair performance how can we achieve it? One view is that sender pacing can remove much of the pressure on buffers, and self-clocking flows can stabilise without emitting transient bursts that need to be absorbed by buffers. Another view, and one that does not necessarily contradict the first, is that the self-clocking algorithm can operate with higher precision if there was some form of feedback from the network on the state of the network path. This can be as simple as a single bit (ECN) or a complete trace of path element queue state (HPCC).
Alessandro Vespignani: There is no general theory of networks.
Brian Bailey: 2019 has been a tough year for semiconductor companies from a revenue standpoint, especially for memory companies. On the other hand, the EDA industry has seen another robust growth year. A significant portion of this disparity can be attributed to the number of emerging technology areas for semiconductors, none of which has reached volume production yet.
Danny Hillis: As we are becoming more entangled with our technologies, we are also becoming more entangled with each other. The power (physical, political, and social) has shifted from comprehensible hierarchies to less-intelligible networks. We can no longer understand how the world works by breaking it down into loosely-connected parts that reflect the hierarchy of physical space or deliberate design. Instead, we must watch the flows of information, ideas, energy and matter that connect us, and the networks of communication, trust, and distribution that enable these flows.
softwaredoug: Move fast/break things works great when the risks associated with the “breaking” are minuscule, experiments can be measured and isolated, and the upside for experimentation high (UI tweaks, recsys or search algo improvements, social media, etc). Move slow, keep things working works well when breaking things has a huge downside and safe experimentation on live customers near impossible. Such as medicine, legal compliance, building airplanes, utilities.
@dtemkin: Nearly every trip I’ve taken with @Uber since they moved their drivers to their in house navigation has been a suboptimal route with extra distance and time. Numerous examples where the route was 2x the distance it needed to be. Have others seen this?
Daniel Abadi: Indeed, one of the main obstacles to building decentralized database systems like what we are proposing is how to secure the confidentiality, integrity, and availability of data, query results, and payment/incentive processing when the participants in the system are mutually distrustful and no universally-trusted third party exists. Until relatively recently, the security mechanisms necessary for building such a system did not exist, were too inefficient, or were unable to scale. Today, we believe recent advances in secure query processing, blockchain, byzantine agreement, and trusted execution environments put secure decentralized database systems within reach.
Werner Vogels: The reason AWS has the most purpose-built databases of any cloud provider is so that customers have more choices and freedom. In addition to graph, you might have other datasets that work better in a different database type, like relational, time series, or in-memory. That's fine too—that's modern application development. For example, Neptune is part of the toolkit that we use to continually expand Alexa's knowledge graph for tens of millions of customers. Alexa also uses other databases, like Amazon DynamoDB for key-value and document data and Amazon Aurora for relational data. Different types of data come with different types of challenges, and picking the right database for each unique use case allows for greater speed and flexibility.
@lakhera2015: Dear recruiters, if you are looking for- Java,Python, PHP - React,Angular - PostgreSQL, Redis, MongoDB - AWS, S3, EC2, ECS, EKS - *nix system administration - Git and CI with TDD - Docker, Kubernetes That's not a Full Stack Developer. That’s an entire IT department
cddotdotslash: Not a single post on Hacker News can use the term "serverless" without the exact same replies being posted every time. It's as if a certain portion of the HN crowd simply cannot fathom that a new term exists and is in use, and instead resort to the same, tired responses.
@benedictevans: We wonder about the next S curve. Sometimes, it looks like an accessory to the previous one. In 2000 or 2005 almost anyone would have said mobile was a PC accessory. Now PCs are smartphone accessories.
@brianleroux: The serverless email server link I shared yesterday got negative reactions to using s3 for data. S3 is *amazing* for data. Read/write throughput with Lambda is excellent. 3,500 PUT/POST/DELETE and 5,500 GET requests per second. Max obj size 5TB. It's what it was designed for!
Sumit Khanna: Microservices can be done right, or rather, after systems evolve at an organization, some teams can have really good, well thought out services, with large numbers of unit and integration tests. Yet, they are still the ultimate product of an often weird evolutionary process, that tends to be muddled with technical debt, company policies, legal requirements and politics. People who try to start with a microservice model are asking for a world of pain and hurt. Good microservices come from using the foundation of well written monoliths as a template for splitting out and creating smaller components. You don’t build a city out of molecules. You have several layers of abstraction in place so you can build with bricks, structures and buildings.
Sabine Hossenfelder: Most of you will know that if you sum up this series all the way to infinity it will converge to a finite value, in this case that’s 10 days. This means that even if you have an arbitrarily fine grid and you know the initial condition precisely, you will only be able to make predictions for a finite amount of time. And this is the real butterfly effect. That a chaotic system may be deterministic and yet still be non-predictable beyond a finite amount of time.
Sr_Noodles: [On Shopify] some 29% of the Best-Sold Products on the list are under 25$, but 50% of the top stores have their Best-Sold product at a price point above 50$.
Sabine Hossenfelder: There are infinitely many sets of axioms that are mathematically consistent but do not describe our universe. The only rationale scientists have to choose one over the other is that the axioms give rise to correct predictions. But there is no way to ever prove that a particular set of axioms is inevitably the correct one. Science has its limits. This is one of them.
Gwen Shapira: There is only one thing that's missing in my picture, which is that we lost the state on the way. I mentioned earlier that having state in my microservice is incredibly powerful and we'll miss it when it's gone. It's gone and I miss it. Why do I miss state so much? I miss having states because sometimes my rules are dynamic. I cannot hard code them, I have to look them up somewhere. Sometimes, my events contain some of the data that I need, an ID, but not the rest of the data that they need, so I have to look it up somewhere. Sometimes, I have to join multiple events. Netflix had a good talk about it earlier in the day. Sometimes, I just want to aggregate number of events per second, number of orders per second, dollars per hour. All these things are important, so I need state.
cooperadymas: So we're now at a Hacker News thread, linking to a Reddit thread, linking to a Quora question, which pulls from a Google+ post, which originated as a website inside Google. Can't wait until 2030 when we reminisce about the 10 year anniversary of this thread.
Glen Berseth: The key insight utilized by our method is that, in contrast to simple simulated domains, realistic environments exhibit dynamic phenomena that gradually increase entropy over time. An agent that resists this growth in entropy must take active and coordinated actions, thus learning increasingly complex behaviors. This is different from commonly proposed intrinsic exploration methods based on novelty, which instead seek to visit novel states and increase entropy. SMiRL holds promise for a new kind of unsupervised RL method that produces behaviors that are closely tied to the prevailing disruptive forces, adversaries, and other sources of entropy in the environment.
DTLACoder: The performance with elasticsearch as the backing store was atrocious for us. We rewrote the thing in Java GraphQL deployed on EKS and literally saw a 50%+ improvement in response time
derefr: This closely matches what happened to processor naming. It used to be that any two (CISC) CPUs with the same frequency were about on par with one-another, so we just called processors by their frequency. As soon as processors started adding things like SSE, though, that went out the window, since now “one cycle” could do arbitrary amounts of work, and also consume arbitrary amounts of electricity. So now we instead group processors by their manufacturer and model, and compare processors by “generation”, which ends up being a loose count of each time each vendor has done a large redesign that enabled efficiency increases beyond “just” a process-node shrink.
Alexey Ivanov (Dropbox): Overall, BBRv2 is a great improvement over the BBRv1 and indeed seems way closer to being a drop-in replacement for Reno/CUBIC in cases where one needs slightly higher bandwidth. Adding experimental ECN support to that and we can even see a drop-in replacement for Data Center TCP (DCTCP).
Kevin Berger: Art is representations with concepts. That’s what it is to me. AI art does that. AI art has concepts because it’s generated by scientific means. It goes beyond the science, but it has a scientific basis to it, and so it has concepts. I have my own definition of aesthetics, too. It makes art historians’ hair stand on end. Aesthetics equals the image in a work of art—and the image need not be a visual image—but an image from our five senses, plus the apparatus that generates it. For example, at CERN, I asked a physicist for his definition of aesthetics, and he said, “My notion of aesthetics is nicely laid out wires.” I saw them. There are these units of parallel wires, nothing crossed over, and the wires are color coded. It was beautiful.
Daniel Wolf Savin: From the formation of a dense cloud to the ignition of fusion at the heart of a star is a process whose complexity far exceeds what came before it. In fact, even the most sophisticated computer simulations available have yet to reach the point where the object becomes stellar in size, and fusion begins. Simulating most of the 200-million-year process is relatively easy, requiring only about 12 hours using high speed, parallel processing computer power. The problem lies in the final 10,000 years. As the density of the gas goes up, the structure of the cloud changes more and more rapidly. So, whereas for early times one needs only to calculate how the cloud changes every 100,000 years or so, for the final 10,000 years one must calculate the change every few days. This dramatic increase in the required number of calculations translates into more than a year of non-stop computer time on today’s fastest machines. Running simulations for the full range of possible starting conditions in these primordial clouds exceeds what can be achieved in a human lifetime.
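
Renan Ranelli's advice above (do your reads in the same transaction as your writes, and be ready to retry) can be sketched in a few lines. This is only a minimal illustration using Python's built-in `sqlite3` module, not anything from his talk: the `accounts` table, the `transfer` and `make_demo_db` helpers, and the retry and backoff settings are all hypothetical, and a production system would retry only on errors it knows to be transient.

```python
import sqlite3
import time


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical accounts table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             max_retries: int = 5) -> bool:
    """Move `amount` from src to dst. Returns False if funds are insufficient."""
    for attempt in range(max_retries):
        try:
            # Take the write lock up front so the balance we read below
            # cannot change before we update it.
            conn.execute("BEGIN IMMEDIATE")
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < amount:
                conn.execute("ROLLBACK")
                return False
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute("COMMIT")
            return True
        except sqlite3.OperationalError:
            # Typically "database is locked": undo any partial work,
            # back off briefly, then try the whole transaction again.
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            time.sleep(0.01 * 2 ** attempt)
    raise RuntimeError("transaction did not succeed after retries")


demo = make_demo_db()
print(transfer(demo, "alice", "bob", 60))   # enough funds
print(transfer(demo, "alice", "bob", 60))   # not enough funds left
```

The point of the shape is that the balance check and the two updates either all happen or none do, and the caller never assumes the first attempt will succeed.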

Useful Stuff:

Yay, a good old-fashioned architecture where they have a hard problem and they build something to solve it. Not one mention of serverless anywhere! Mastercard explains the architecture they use for scaling beyond a billion transactions per day with strict SLAs. This is a very well done talk. Lots of good info.
- Mastercard's Decision Management Platform is their multipurpose transaction processing engine based on a plug-in architecture. It supports 20 Mastercard products. When you swipe a card it goes through this system before it is approved. It can handle 60,000 transactions per second with average response times under 100ms (target is 50ms) using parallel processing on 100s of commodity servers.
- Uses transaction features enriched with historical and real-time aggregates to execute several risk models and hundreds of decision rules. Calculates hundreds of variables in real-time and consumes even more offline generated data to evaluate risk of each transaction.
- Scaling data is the challenge because DMP consumes 30+ billion aggregates in real-time. Many terabytes of data are shared between instances. Requirements: sub-millisecond reads at several million reads per second; applying atomic updates to large entries in under 3 milliseconds to hundreds of thousands of entries per second.
- With hundreds of reads and writes per transaction they can't go to disk, so the data is in memory. They chose Geode/GemFire for their real-time scoring solution, but that didn't solve all their problems. For real-time processing, their distributed data system still needed: data access scalability through co-location, even data distribution, concurrent operations, and latency consistency. All these issues need to be addressed at large scale.
- GemFire is an in-memory data grid. We covered one of these way back in 2008. I thought data grids would be more popular, but the cloud may have interrupted that trend. A move to stateless serverless may be the next evolution of the data grid.
- For scalability of data access, related data should be stored together. Related data for the same account should be stored on the same node so one request can return all data. Note that this is similar to NoSQL. This is one of the most important strategies. It supports 8 million reads per second and gives the ability to grow the cluster without increasing the number of network calls.
- Keys must distribute evenly or the cluster won't scale. Choose the number of buckets wisely.
- As always with Java, GC pauses are a killer. GemFire supports larger heap sizes by minimizing the impact of short-lived objects and using byte array storage. Data is already in on-the-wire format. To handle large data sets they chose large heaps with a Pauseless JVM (Azul Zing). Requires no tuning.
- After growing horizontally (more nodes) and vertically (more RAM) and using a Pauseless JVM they were able to use 40 terabyte clusters with 600 GB heaps—without sacrificing latency.
- Large entries caused problems on the client. Problems were: long transfer times, long deserialization times, slow to run business logic. Pulling lots of data to the client was way too slow. The solution was to keep the logic where the data lives using data aware distributed functions (think stored procedures). Benefits: parallelizes execution, reduces client load, reduces network utilization.
- Updating large chunks of data within a distributed function holds locks too long so they use delta propagation to apply changes. It's like a SQL update where only certain fields are updated.
- Using data aware function execution and delta propagation resulted in: 95% reduction in network traffic; 50% latency reduction for function execution; 40% CPU reduction on server nodes; 50% reduction in network traffic when updating large values.
- What happens when there's a hot partition, when an entry is experiencing a high rate of updates? For example, the Home Shopping Network submits a batch of all their updates nightly. Delta propagation can only help so much; they still couldn't support the desired update rate to single entries. So they relax strong consistency constraints. Replication, partitioning redundancy, and partitioning are all suspended. The entry is put in RAM on the same node. Entries are replicated once per second for safety. This one second window is an acceptable risk. They went from supporting 1000 updates per second to supporting 100,000 updates per second.
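
Three of the ideas in the list above (co-locating an account's data on one node, running logic where the data lives, and sending only changed fields) fit together in a small toy model. The sketch below is plain Python and does not use the Geode/GemFire API; every name in it (`Cluster`, `Node`, `apply_delta`, `execute`, the bucket and node counts, the field names) is an assumption made up for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, Dict

NUM_BUCKETS = 113   # hypothetical; more buckets than nodes so data spreads evenly
NUM_NODES = 4       # hypothetical cluster size


def bucket_for(account_id: str) -> int:
    """Hash the account id so keys spread evenly across buckets."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def node_for(account_id: str) -> int:
    """All data for one account maps to one bucket, and so to one node."""
    return bucket_for(account_id) % NUM_NODES


class Node:
    def __init__(self) -> None:
        # account_id -> {aggregate name: value}
        self.entries: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def apply_delta(self, account_id: str, delta: Dict[str, Any]) -> None:
        # Delta update: only the changed fields arrive, not the whole entry.
        self.entries[account_id].update(delta)

    def execute(self, account_id: str,
                fn: Callable[[Dict[str, Any]], Any]) -> Any:
        # Data-aware function: run the logic next to the data and
        # return only the small result to the caller.
        return fn(self.entries[account_id])


class Cluster:
    def __init__(self) -> None:
        self.nodes = [Node() for _ in range(NUM_NODES)]

    def apply_delta(self, account_id: str, delta: Dict[str, Any]) -> None:
        self.nodes[node_for(account_id)].apply_delta(account_id, delta)

    def execute(self, account_id: str,
                fn: Callable[[Dict[str, Any]], Any]) -> Any:
        return self.nodes[node_for(account_id)].execute(account_id, fn)


cluster = Cluster()
cluster.apply_delta("acct-42", {"spend_24h": 120.0, "txn_count_24h": 3})
cluster.apply_delta("acct-42", {"txn_count_24h": 4})   # only one field changes
avg_ticket = cluster.execute(
    "acct-42", lambda e: e["spend_24h"] / e["txn_count_24h"])
print(avg_ticket)
```

Because routing depends only on the account id, every read and write for an account is a single hop to one node, which is why adding nodes need not add network calls per transaction.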

A very thorough answer. Distributed System Design: How to design Twitter? Part I - Interview question at Facebook, Google, Amazon, Netflix.

Dang, CI/CD has come to the movies, possibly all digital content? There may never again be a version of digital content all of us have consumed and remember. Digital has become the ultimate unreliable narrator. Scratch that: Cats film to be 'resupplied' with 'improved visuals'

Conclusion from the Cockroach Labs 2020 Cloud Report: Interestingly, the highest performing machine types from each cloud are also the same machine types which performed the best on the CPU and Network Throughput tests. Both AWS’s c5n.4xlarge and GCP’s c2-standard-16 won the CPU, Network Throughput, and Network Latency tests while Azure’s Standard_DS14_v2 won the CPU and Network Throughput tests. However, the machine types which performed best on the read and write storage tests (e.g., AWS i3.4xlarge and i3en.6xlarge, GCP’s n2-standard-16, and Azure’s Standard_GS4) varied in their TPC-C performance. This suggests that these tests are less influential in determining OLTP performance. These results match our expectation that OLTP workloads like TPC-C are often limited by compute resources due to their relatively high ratio of transactions to data size.

Performance testing HTTP/1.1 vs HTTP/2 vs HTTP/2 + Server Push for REST APIs: If speed is the overriding requirement, keep using compound documents; If a simpler, elegant API is the most important, having smaller-scoped, many endpoints is definitely viable; Caching only makes a bit of difference; Optimizations benefit the server more than the client.

The Universal Laws of Growth and Pace | Geoffrey B. West. To analogize from this talk: the Fate of IT = the Fate of the Cloud. We're in the cloudocene era, as the cloud has come to dominate IT. The city as an engine for growth and innovation has a lot of parallels to the cloud. You could almost consider the cloud as the tech form of urbanization. Exponential growth is one marker. Cities provide greater material well-being, opportunity, buzz, events, education, etc.; the city is a magnet attracting people. The cloud is the same sort of attractor. It's where all the ideas are created, where all the wealth is created, and where everything happens. Cities are made of complex evolving interactive systems, as is the cloud. While companies may die, the cloud, driven by very efficient metabolisms and efficient network structures, will live on. Clouds optimize their energy use so they can maximize the allocation of resources to users. It seems likely the pace of life on the cloud is superlinear: it's a positive feedback loop creating ever more capable building blocks that themselves become the building blocks for more and more systems to be created and grow.

Facebook with their Systems @Scale Tel Aviv 2019 recap. You might like: Scaling Facebook’s data center infrastructure; Managing trade-offs for Data prefetching.

Need to check if an object is in a set? Xor filters take a bit longer to build, but once built, they use less memory and are about 25% faster. Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters: The Bloom filter provides fast approximate set membership while using little memory. Engineers often use these filters to avoid slow operations such as disk or network accesses. As an alternative, a cuckoo filter may need less space than a Bloom filter and it is faster.
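
If approximate set membership is new to you, a minimal sketch may help. This is a plain Bloom filter in Python, not an xor filter (those are built in one pass from a known key set and are more involved). The class name, the default sizes, and the choice of salted SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Approximate set: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Typical use is as a cheap guard in front of a slow lookup: only go to disk or the network when `might_contain` returns True.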

Keep in mind all end of history style arguments prove false in the end. Change is the one constant. In fact, Geoffrey B. West suggests we'll have another innovation that's equivalent to IT within the next 20-25 years. The End of the Beginning: The implication of this view should at this point be obvious, even if it feels a tad bit heretical: there may not be a significant paradigm shift on the horizon, nor the associated generational change that goes with it. And, to the extent there are evolutions, it really does seem like the incumbents have insurmountable advantages: the hyperscalers in the cloud are best placed to handle the torrent of data from the Internet of Things, while new I/O devices like augmented reality, wearables, or voice are natural extensions of the phone. In other words, today’s cloud and mobile companies — Amazon, Microsoft, Apple, and Google — may very well be the GM, Ford, and Chrysler of the 21st century. The beginning era of technology, where new challengers were started every year, has come to an end; however, that does not mean the impact of technology is somehow diminished: it in fact means the impact is only getting started.

Even in the cloud, servers are underutilized. That means there's a profit opportunity. Google Cloud E2 machines: Overselling (finally) comes to the cloud: This week, Google announced a new "machine type" (E2) that takes this further. They know that adapting applications to effectively use auto-scaling is hard. Instead, Google is going to oversell their hardware. You buy cores and RAM as usual, and Google promises that nearly all the time, they will be there when your application actually wants to use them. However, on some rare occasions they might not, and your application will pause as they shuffle things around to make it available. This is a brilliant idea, and I'm surprised that it has taken this long for a cloud provider to oversell their hardware. Providers are in a much better position to do the necessary "bin packing" than customers, since they can see all the workloads. This seems like a great way to improve the overall efficiency of our global computing infrastructure, and I expect we will see more overselling in the cloud in the future.
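
For a feel of what "bin packing" means here, below is a small sketch of first-fit decreasing, a common textbook heuristic. It is not a description of how Google places E2 workloads; the function name, the demand numbers, and the idea of packing by observed usage are assumptions for illustration.

```python
from typing import List


def first_fit_decreasing(demands: List[int], capacity: int) -> List[List[int]]:
    """Pack workloads onto hosts, largest first, using the first host with room."""
    hosts: List[List[int]] = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds host capacity {capacity}")
        for host in hosts:
            if sum(host) + demand <= capacity:
                host.append(demand)
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append([demand])
    return hosts


# Hypothetical observed core usage per workload, packed onto 10-core hosts.
placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
```

Packing by what workloads actually use rather than what customers bought is what lets a provider fit more onto each host, at the cost of the occasional shuffle described above.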

LinkedIn reaped big rewards by moving from Clojure to Java, but that's not all they did. Making the LinkedIn experimentation engine 20x faster.
- It handles up to 23 trillion experiment evaluations per day
- It is used in about 500 production services; We made a proof-of-concept language parser and evaluator in Java. The results were astonishing: our code achieved 2-3 times better performance than the previous version without much optimization work
- We decided to interpret DSLs by parsing them into evaluation trees and then executing them
- A proper type resolution code is about 3 times faster than Java reflection, at 15ns versus 45ns per call
- It is surprising, but we measured a 2.5x improvement in performance by switching from the naive approach to the current approach because: CPUs prefer sequential memory access and smaller data structures, as CPU caches are small and main memory access is quite slow. By switching to three plain arrays, we have also eliminated the overhead of virtual calls on Java ArrayList data structure
- Using auto-generated code significantly reduces implementation complexity, as developers only need to implement processing logic for specific argument types.
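
The "parse into an evaluation tree, then execute" approach can be sketched in a few lines. This is a toy in Python, not LinkedIn's Java engine: the nested-tuple DSL, the `Eq`/`And`/`Or` node types, and the `parse` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class Node:
    """Base class for evaluation-tree nodes."""

    def evaluate(self, ctx: Dict[str, Any]) -> bool:
        raise NotImplementedError


@dataclass
class Eq(Node):
    attr: str
    value: Any

    def evaluate(self, ctx: Dict[str, Any]) -> bool:
        return ctx.get(self.attr) == self.value


@dataclass
class And(Node):
    children: List[Node]

    def evaluate(self, ctx: Dict[str, Any]) -> bool:
        return all(child.evaluate(ctx) for child in self.children)


@dataclass
class Or(Node):
    children: List[Node]

    def evaluate(self, ctx: Dict[str, Any]) -> bool:
        return any(child.evaluate(ctx) for child in self.children)


def parse(expr: tuple) -> Node:
    """Turn a nested-tuple expression into a tree once, so it can be evaluated many times."""
    op, *args = expr
    if op == "eq":
        attr, value = args
        return Eq(attr, value)
    if op == "and":
        return And([parse(a) for a in args])
    if op == "or":
        return Or([parse(a) for a in args])
    raise ValueError(f"unknown operator: {op}")


rule = parse(("and", ("eq", "country", "US"),
              ("or", ("eq", "plan", "pro"), ("eq", "plan", "team"))))
in_experiment = rule.evaluate({"country": "US", "plan": "team"})
```

The parsing cost is paid once per rule, and each evaluation is just a walk over a small tree, which is the property that matters when evaluations number in the trillions per day.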

It's always the dang clocks. Boeing’s Starliner won’t make it to the ISS now because its internal clock went wrong. They couldn't reset the clock because they were out of satellite communication range.

Gergely Orosz with his Distributed systems learnings in 2019.
- Building a new distributed system is easier than migrating the old system over to it
- A better way to improve reliability: measure, report, repeat. Start simple; Idempotency changes should be treated as breaking changes - even if they technically don't qualify as such; Reliability & postmortem reviews are more impactful when going further and looking at systems issues that are hurting multiple teams
- Unique and unexpected challenges when running your own data centers
- Deploy on Fridays, Saturdays, and any day - but think about when code chills are sensible tradeoffs
- Financial / end-user impact of outages is just as important as the systems impact; A simple way to determine who owns a service: who owns the oncall?

At its simplest level, Moore’s Law refers to a doubling of transistors on a chip with each process generation. Some nice spin, Intel, but that's not what Moore's law says; it says the number of transistors on a chip will double about every two years.

PHP 7.4 took the gold in 17/17 (5 N/A). 🏆🚀 The Definitive PHP 5.6, 7.0, 7.1, 7.2, 7.3, and 7.4 Benchmarks (2020): we benchmarked six different PHP versions across 22 different platforms/configurations; including WordPress, Drupal, Joomla!, Laravel, Symfony, and many more.

Starting Simple + Growth Now = Refactoring Later. Let me tell you about the time we had 15,000 direct connections to our database. From 15,000 database connections to under 100: DigitalOcean's tale of tech debt.
- From 2012 to 2016, DigitalOcean’s user traffic grew over 10,000%...Like GitHub, Shopify, and Airbnb, DigitalOcean began as a Rails application in 2011. The Rails application, internally known as Cloud, managed all user interactions in both the UI and public API. Aiding the Rails service were two Perl services: Scheduler and DOBE (DigitalOcean BackEnd)
- Neither Cloud, Scheduler, nor DOBE talked directly to one another. They communicated via a MySQL database. This database served two roles: storing data and brokering communication. All three services used a single database table as a message queue to relay information.
- For four years, the database message queue formed the backbone of DigitalOcean’s technology stack...To keep up with the increased Droplet demand, we were adding more and more servers to handle the traffic. Each new hypervisor meant another persistent connection to the database. By the start of 2016, the database had over 15,000 direct connections...To tackle the database dependencies, DigitalOcean engineers created Event Router. Event Router served as a regional proxy that polled the database on behalf of each DOBE instance in each data center. Instead of thousands of servers each querying the database, there would only be a handful of proxies doing the querying. When Event Router went live, it slashed the number of database connections from over 15,000 to less than 100
- The updated Scheduler completely revamped the ranking system. Instead of querying the database for the server metrics, it aggregated them from the hypervisors and stored it in its own database. Additionally, the Scheduler team used concurrency and replication to make their new service performant under load
- The centralized MySQL message queue was still in use – bustling even – by early 2017. It was handling up to 400,000 new records per day, and 20 updates per second.
- As Harpoon pushed new events to the queue on one side, the workers pulled them from the other. And since RabbitMQ replaced the database's queue, the workers were free to communicate directly with Scheduler and Event Router. Thus, instead of Scheduler V2 and Event Router polling for new changes from the database, Harpoon pushed the updates to them directly.
- Also, Strategies for Working with Message Queues, Apache Pulsar, Content-based Filtering
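
Here is a rough sketch of the table-as-message-queue pattern with a single proxy doing the polling, in the spirit of Event Router. It uses Python's built-in `sqlite3` in place of MySQL, and the `events` schema and function names are hypothetical, not DigitalOcean's.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List


def make_queue_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, target TEXT NOT NULL, "
        "payload TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, target: str, payload: str) -> None:
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO events (target, payload) VALUES (?, ?)", (target, payload)
        )


def poll_once(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """One poll by the proxy: claim pending events and group them by target."""
    batches: Dict[str, List[str]] = defaultdict(list)
    with conn:
        rows = conn.execute(
            "SELECT id, target, payload FROM events WHERE done = 0 ORDER BY id"
        ).fetchall()
        for _event_id, target, payload in rows:
            batches[target].append(payload)
        conn.executemany(
            "UPDATE events SET done = 1 WHERE id = ?", [(row[0],) for row in rows]
        )
    return dict(batches)


db = make_queue_db()
enqueue(db, "hv-1", "create")
enqueue(db, "hv-2", "destroy")
enqueue(db, "hv-1", "resize")
first_poll = poll_once(db)   # one connection serves every hypervisor
second_poll = poll_once(db)  # nothing pending, so this comes back empty
```

With one proxy per region holding the connection, the database sees a handful of pollers instead of one per hypervisor. Polling still burns queries when nothing is pending, which is the cost a push-based broker removes.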

Rob Sutter has serverless predictions for 2020:
- On the cutting edge, FaaS fades into the background as service integrations become king
- Leading serverless-first companies (Roger's "Innovators") slash the number of lines of code they maintain by a full order of magnitude
- This accelerates value delivery and experimentation even further at even lower short- and long-term cost, helping them outpace their competitors even faster.
- Meanwhile, the incredible partner and open source community brings more and more tools for Early Majority organizations to deploy their applications as they are. Think framework transpilers

Go’s FastHTTP came on top by peaking at nearly 210k responses per second. Java-based Netty is a not so distant second with almost 170k. Webserver Benchmark: Erlang vs Go vs Java vs NodeJS: Go’s built-in webserver peaked slightly above 120k, NodeJS cluster at 90k, Erlang-based Cowboy 1.x at 80k. In the 50-60k range, we have another Erlang-based webserver, Mochiweb, then Cowboy 2.x, and Java-based Rapidoid. Finally, non-clustered NodeJS scored 25k...In all tests, with the notable exception of non-clustered NodeJS, the limiting factor was the CPU being fully saturated. In essence, the test has shown that all webservers were scaling to all available CPUs with varying degrees of efficiency. Clustered NodeJS and Rapidoid both crashed by running out of RAM once overloaded.

Let's hope people aren't choosing a cloud service based on "popularity." That would be like picking chocolate as your favorite ice cream flavor because a random sample of mimes voted chocolate as their favorite. Microsoft Is Winning The ‘Cloud War’ Against Amazon: Report.
- code4tee: These results are misleading for the same reasons why Microsoft’s market share claims are misleading. Microsoft counts things like Office 365 and Azure AD as “cloud.” If you look at people truly using their cloud products in terms of things that pair off against AWS offerings the picture looks vastly different.
- okareaman: I'm a retired programmer and now Uber driver in Silicon Valley so I often talk shop with my riders. I was giving a guy a ride to make a pitch for his product to a Japanese company. He related that his biggest challenge so far was when Walmart licensed his product and they insisted it absolutely could not be hosted on AWS so he had to transfer over to Azure. Obviously Amazon and Walmart are competitors and that might apply in many other situations.
- oxfordmale: I think this article is comparing apples and pears to some extent. My company is solely cloud based and is using Azure, AWS and GCP, although 98% of our bespoke infrastructure runs on AWS. We could migrate this infrastructure to GCP, as Google has been catching up with AWS on cloud infrastructure, however, I would dread to migrate it all to Azure. We mostly use Azure for Azure AD (and Office 365), and AWS to build out APIs. Basically we are using Microsoft for out of the box services and AWS for anything else.
- mcv: The bank I'm working for recently decided to switch to Azure despite the vast majority of developers preferring AWS. From what I've heard, it's probably related to the company switching to Office 365. Sounds to me like Microsoft is leveraging its dominance in one market to also dominate another market.
- moksly: I work for a Danish municipality with roughly 10,000 employees. I’m not sure if you know, but our public sector has been competing with Estonia at being the most digitised in the world for a decade. We operate an estimated 300-500 different IT-systems, some of them enterprise sized SAP solutions deployed on old IBM mainframes with multiple layers of APIs to make their fronts somewhat web-based (don’t ask). Others are minor time registration systems or automated vacation-payouts. I said estimated because a day-care institution is free (though advised not to) to buy IT-systems without talking with any part of the centralised organisation. Microsoft has been one of our better partners in all of this. They aren’t cheap, but they listen to us when we need something. We have a direct line to Seattle, and parts of what we ring up as tickets have made it into the global 365 infrastructure. Stuff like making it easier to hide teams-emails from the company-wide outlook address-book. More than that though, our tech-team is trained and certified in Microsoft technologies. The combination of in-house staff and a 30+ year long good business relationship makes Azure such an obvious choice for cloud. Some of the co-municipal systems we buy through a joint-owned organisation called KOMBIT operate in AWS (support and operations is handled by private sector companies), and it’s not like we’re religious about not using AWS or something other, but we’d need to build a relationship and retrain staff to do so.

Lots of good tutorials at raywenderlich.com. I started the iOS and SwiftUI course and I like it so far.

RDS Pricing Has More Than Doubled. Looks like SaaS will not follow the same price reduction curve we've experienced with IaaS. throaway_oct302018: Well first of all, cloud services do not strictly sell technology, they sell aversion to switching costs. How averse are you to moving off of RDS? With each day the aversion will rise, not fall. This is true of any cloud service. For example, if you initially have 10mb in S3, the aversion to switching is low....you could easily copy 10mb over to another storage model. But when you later have 10pb, the aversion is much higher...it would be really difficult to migrate that data. Aeiedil: Every penny of the thousands my company has spent on RDS over the years has been worth it. Prior to RDS I had to manage a traditional server setup, and maintaining the DB was a constant headache. The ease of securing, scaling, and restoring RDS just takes so much hassle off my hands leaving me free to worry about the less solved problems! Also, The Amazon Premium

Cool table of GCP inter-region latencies between all available regions. The lowest latency I saw (the table is updated every hour) is 0.3 ms between us-west1 and us-west1. The highest latency I saw was 387.4 ms between asia-south1 and southamerica-east1.

An advanced form of rolling your own. Segment on Serving 100µs reads with 100% availability:
- This is the story of how we built ctlstore, a distributed multi-tenant data store that features effectively infinite read scalability, serves queries in 100µs, and can withstand the failure of any component.
- At the center of the read path is a SQLite database called the LDB, which stands for Local Database. The LDB has a full copy of all of the data in ctlstore. This database exists on every container instance in our fleet, the AWS EC2 instances where our containerized services run. It’s made available to running containers using a shared mount. SQLite handles cross-process reads well with WAL mode enabled so that readers are never blocked by writers. The kernel page cache keeps frequently read data in memory. By storing a copy of the data on every instance, reads are low latency, scale with size of the fleet, and are always available. A daemon called the Reflector, which runs on each container instance, continuously applies a ledger of sequential mutation statements to the LDB. This ledger is stored in a central MySQL database called the ctldb. These ledger entries are SQL DML and DDL statements like REPLACE and CREATE TABLE.
- The LDB tracks its position in the ledger using a special table containing the last applied statement’s sequence number, which is updated transactionally as mutation statements are applied. This allows resuming the application of ledger statements in the event of a crash or a restart. The implications of this decoupling is that the data at each instance is usually slightly out-of-date (by 1-2 seconds). This trade-off of consistency for availability on the read path is perfect for our use cases.
- @KoboldUnderlord: I built a crappy version of this for handling the anonymization of all US operational data and it was able to backfill 53m rows to S3 in about 15 minutes using a similar strategy. Local light dbs are wildly powerful esp if you can take advantage of giving it in memory for stuff
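
The ledger mechanism described above (apply statements in order, record the sequence number in the same transaction, resume from it after a restart) fits in a short sketch. This uses Python's `sqlite3` with an in-memory database. It is not ctlstore's code, and the `_ledger_position` table, the function names, and the sample ledger are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def open_ldb() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN/COMMIT below control the transaction boundaries explicitly.
    ldb = sqlite3.connect(":memory:", isolation_level=None)
    ldb.execute(
        "CREATE TABLE _ledger_position ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), seq INTEGER NOT NULL)"
    )
    ldb.execute("INSERT INTO _ledger_position (id, seq) VALUES (1, 0)")
    return ldb


def last_applied(ldb: sqlite3.Connection) -> int:
    return ldb.execute("SELECT seq FROM _ledger_position WHERE id = 1").fetchone()[0]


def apply_ledger(ldb: sqlite3.Connection, ledger: Iterable[Tuple[int, str]]) -> None:
    """Apply entries newer than the stored position, one transaction per entry."""
    for seq, statement in ledger:
        if seq <= last_applied(ldb):
            continue  # already applied before a crash or restart
        ldb.execute("BEGIN")
        try:
            ldb.execute(statement)
            ldb.execute("UPDATE _ledger_position SET seq = ? WHERE id = 1", (seq,))
            ldb.execute("COMMIT")
        except Exception:
            ldb.execute("ROLLBACK")
            raise


LEDGER = [
    (1, "CREATE TABLE flags (name TEXT PRIMARY KEY, value TEXT)"),
    (2, "REPLACE INTO flags (name, value) VALUES ('dark_mode', 'on')"),
    (3, "REPLACE INTO flags (name, value) VALUES ('dark_mode', 'off')"),
]

ldb = open_ldb()
apply_ledger(ldb, LEDGER[:2])  # the reflector stops after two entries
apply_ledger(ldb, LEDGER)      # on resume, entries 1 and 2 are skipped
```

Because the statement and the position update commit together, replaying the whole ledger is safe: an entry is either fully applied and counted, or not applied at all.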

We've always been at war with The Monolith. Why do we need distributed systems?: Distributed systems offer better availability; Distributed systems offer better durability; Distributed systems offer better scalability; Distributed systems offer better efficiency.

Who knew DNS responses can differ by as much as 200 ms, depending on the domain! Is your fancy new domain hurting your performance?: The biggest shockers were the .info and .org domains that showed really poor performance especially in the 85th percentile range; The .net and .com were very slightly slower than we expected in Europe and North America, but otherwise offer great and stable performance across all regions as we can see in the global median; Another interesting thing to see was the performance for .co, .biz and .in domains that ended up way ahead of the rest

When is it better to use on-prem, or hybrid, or multi-cloud? Jonathan Ellis: There are three main areas to consider when evaluating the infrastructure options for an application. The best approach will depend on what you want to optimize for. The first thing to consider is agility—cloud services offer significant advantages on how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side. But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure comes with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage. The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises. A hybrid cloud strategy can let you take advantage of the agility of the cloud where speed is the most important factor, while optimizing for cost or for control where those are more critical. This approach is popular for DataStax customers in the financial services sector, for instance. They like the flexibility of cloud, but they also want to retain control over their on-premises data center environment. We have partnered with VMware on delivering the best experience for public/private cloud deployments here.

How Plaid 30x'd our Node parallelism
- There's a lot of smart work here. If you thought just turning on parallelism would work, it doesn't. As with all software systems, removing one bottleneck only reveals another. What's great about this article is they show you in depth how they found and solved the chain of bottlenecks using detailed monitoring, flamegraphs, understanding the Node runtime, and reading the docs for every system layer.
- We were running 4,000 Node containers (or "workers") for our bank integration service. The migration to parallel workers has reduced our annual EC2 instance spend by around $300k and greatly simplified our architecture. We now run about 30x fewer containers in production, and our system is more robust to increases in external request latencies or spikes in API traffic from our customers.
- By adding logic in our load balancing layer to prioritize user-present requests over transaction updates, we could handle API spikes of 1,000% or more at the expense of transaction freshness.
- Latency spikes on bank requests were similarly causing our worker capacity to decrease. We decided that increasing parallelism was the best way to remove application bottlenecks and improve our service reliability.
- Our primary goal during any rollout is to maintain reliability, and we can't just YOLO a parallelism increase. We expected this rollout to be especially risky: it would affect our CPU usage, memory usage, and task latency in hard-to-predict ways. Since Node’s V8 runtime processes tasks on an event loop, our main concern was that we might do too much work on the event loop and reduce our throughput.
- To mitigate these risks, we made sure that the following tooling and observability was in place before the first parallel worker was ever deployed to production.
- After this preliminary work was done, we created a new ECS cluster for our "parallel workers". These are workers that use LaunchDarkly feature flags to dynamically set their maximum parallelism.
- They ran into memory allocation problems: We hypothesized that increasing the Node maximum heap size from the default 1.7GB may help. To solve this problem, we started running Node with the max heap size set to 6GB (--max-old-space-size=6144 command-line flag).
- They ran into task throughput problems: Our best guess was that GC wasn’t happening frequently enough on old objects, causing the worker to accumulate more allocated objects as it processed more tasks. We searched through our code for fire-and-forget operations, also known as "floating promises". Bingo! Heap usage on our parallel workers now remains stable for an extended period of time.
- They ran into S3 bottlenecks: We compress our debugging data, which can be quite large because it includes network traffic. We then upload the compressed data to S3. We eventually dug into the AWS Node documentation and found something surprising: the S3 client reduces maxSockets from Infinity to 50
- They ran into JSON serialization bottlenecks: Even with a minimal test, bfj was around 5x slower than another package, JSONStream. We quickly replaced bfj with JSONStream, and immediately observed significant increases in performance.
- They ran into garbage collection bottlenecks: So garbage collection is being run too often on our Node processes. We could just increase the size of the new space by bumping the limit on the “semi space” in Node (--max-semi-space-size=1024 command-line flag). This allows for more allocations of short-lived objects before V8 runs its scavenging, thereby reducing frequency of GC. Increasing the new space size resulted in a precipitous drop of time spent on scavenge garbage collection, from 30% down to 2%.
- After all this work, we were satisfied with the results. Tasks running on parallel workers had latencies almost on par with those running on single workers at a concurrency of around 20.
- They ran into CPU bottlenecks that they were able to reduce but did not cover.
- bjacokes: Hi, Plaid engineer here (not the author, but I helped with the post). I don't think we've tried to assert that the old system is perfect. We went into some detail in the post about why it took us this far. Certainly, the single request per container approach wouldn't scale if our unit economics were different. We didn't get into this too much in the post, but the Node service sits behind a couple of layers of Go services, so we had more control over scaling API traffic than it might appear. Likewise, I hope we didn't give the impression that the new system is perfect. We've explored other languages for integrations in the past (even Haskell, at one point), and are continuing to do so. A migration away from our years-old Node integrations codebase would be a massive undertaking at this point. Absent that, it doesn't seem consistent to say "you're incompetent for handling 1 request per container" and also "you're incompetent for writing this post" – if you believe the former then it makes sense to be an advocate for this project, at least until a language migration can be done. I think the set of hoops we had to jump through in order to add concurrent requests without adding latency is a good demonstration of why we didn't do this sooner. It wasn't a massive undertaking by any means, but it wasn't trivial. At any rate, we're not really looking for a gold star here – just putting this out there and hoping this will be useful for others who are, as other commenters have put it, building their own "Frankensteins" :)
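
The core idea of a worker with a configurable maximum parallelism can be sketched with a semaphore. This is Python's `asyncio`, not Plaid's Node code; `run_with_max_parallelism` and `fake_bank_request` are hypothetical names, and the sleep stands in for a slow external request.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Tuple


async def run_with_max_parallelism(
    tasks: List[Callable[[], Awaitable[Any]]], max_parallelism: int
) -> Tuple[List[Any], int]:
    """Run all tasks on one event loop, never more than max_parallelism at once."""
    semaphore = asyncio.Semaphore(max_parallelism)
    in_flight = 0
    peak = 0

    async def guarded(task: Callable[[], Awaitable[Any]]) -> Any:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await task()
            finally:
                in_flight -= 1

    # Awaiting every task here means nothing is left "floating" after return.
    results = await asyncio.gather(*(guarded(t) for t in tasks))
    return list(results), peak


async def fake_bank_request(i: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a slow external call
    return i * 2


def demo() -> Tuple[List[int], int]:
    tasks = [lambda i=i: fake_bank_request(i) for i in range(10)]
    return asyncio.run(run_with_max_parallelism(tasks, max_parallelism=3))
```

Calling `demo()` runs ten fake requests three at a time. In a real rollout the limit would come from something like a feature flag so it can be raised gradually while watching CPU, memory, and task latency.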

It was an age of heroes. Debugging a live Saturn V:
- "Bill, how sure are you that this relay is the problem? Are we going to send people to the pad to rewire the rocket and not be able to launch because we guessed wrong?" said "AC" Filbert C. Martin
- "It's worth a shot, the signal is not reaching the vehicle and that relay module is the only active component between the Firing Room Console and the Vehicle. You snap out the old Relay Module and snap in the new one and we will be able to tell if that was the problem a few seconds later."
- "Well, we are a little concerned about sending a team to the pad with a fully loaded vehicle. We thought your team would do a lot of blueprint trouble shooting -- I'm not sure we planned to actually send anybody out to a fueled vehicle"
- "Just don't let them launch this mother till we are at least half way back from the pad -- OK!"

Soft Stuff

tensorflow/fairness-indicators: Fairness Indicators is designed to support teams in evaluating and improving models for fairness concerns in partnership with the broader Tensorflow toolkit.

lyft/flyte (article): Flyte is an open source, K8s-native extensible orchestration engine that manages the core machine learning pipelines at Lyft: ETAs, pricing, incentives, mapping, vision, and more.

betrusted-io (article): Betrusted is a protected place for your private matters. It’s built from the ground up to be checked by anyone, but sealed only by you. Betrusted is more than just a secure CPU – it is a system complete with screen and keyboard, because privacy begins and ends with the user.

gchq/stroom: a data processing, storage and analysis platform. It is scalable - just add more CPUs / servers for greater throughput. It is suitable for processing high volume data such as system logs, to provide valuable insights into IT performance and usage.

iximiuz/producer-consumer-vis (article): Producer-consumer problem visualization.

FlaSpaceInst/EZ-RASSOR (article): An inexpensive, autonomous, regolith-mining robot.

Pub Stuff:

AnyLog: a Grand Unification of the Internet of Things (article): AnyLog is a decentralized platform for data publishing, sharing, and querying IoT (Internet of Things) data that enables an unlimited number of independent participants to publish and access the contents of IoT datasets stored across the participants. AnyLog provides decentralized publishing and querying functionality over structured data in an analogous fashion to how the world wide web (WWW) enables decentralized publishing and accessing of unstructured data. However, AnyLog differs from the traditional WWW in the way that it provides incentives and financial reward for performing tasks that are critical to the well-being of the system as a whole, including contribution, integration, storing, and processing of data, as well as protecting the confidentiality, integrity, and availability of that data. Another difference is how Anylog enforces good behavior by the participants through a collection of methods, including blockchain, secure enclaves, and state channels.

The Community Ecology of Herbivore Regulation in an Agroecosystem: Lessons from Complex Systems: The ant is oddly nonrandom in its spatial distribution: When you find a nest (almost always in a shade tree), you frequently find another nest nearby, but large sections of shaded farms have no nests at all. Quantitative sampling verifies this simple observable fact (Vandermeer et al. 2008, Jackson et al. 2014, Li et al. 2016), an important feature of the regulation of all three of the herbivores. The question first arises as to where this pattern comes from. There is now substantial evidence that the spatial pattern of the ants is self-organized, which is to say that it emerges from the internal dynamics of the ant population itself, not from any underlying forces such as moisture or temperature or particular vegetation formations (Vandermeer et al. 2008, Liere et al. 2014, Li et al. 2016). The pattern is formed in a complicated fashion by a process similar to that described by Alan Turing in 1952.

Amazon Aurora: Papers Review: Engineers often don't read papers, which is a big mistake because they often lose out on deep-dive tech content related to algorithms, data structures, techniques, and lessons learned. So if you are reading this, start reading papers.

2019 in Review: 10 AI Papers That Made an Impact: As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019.

KRAKsat Satellite Mission - Lessons Learned: Space 4.0 age has come with numerous changes in a way space exploration is being perceived. This area of study evolved from the political and economic ground to academic and commercial fields of interest. Such a rapid spread led to quick growth of the CubeSat market, as more and more units are engaging in small satellites development projects.