Wednesday
Aug 16, 2023

The Swedbank Outage shows that Change Controls don't work

This week I’ve been reading through the recent judgment from the Swedish FSA on the Swedbank outage. If you’re unfamiliar with this story, Swedbank had a major outage in April 2022 that was caused by an unapproved change to their IT systems. It temporarily left nearly a million customers with incorrect balances, many of whom were unable to meet payments. 

After investigation, the regulator found that Swedbank had not followed their change management process and issued a SEK 850M (~USD 85M) fine. That’s a lot of money to you and me, but probably didn’t impact their bottom line very much. Either way, I’m sure the whole episode will have been a big wake-up call for the people at the bank whose job it is to ensure adequate risk and change controls. So, what went wrong and how could it have been avoided?

How did the Swedbank incident happen?

The judgment doesn’t describe the technical details behind the incident, but it does provide glimpses into how they assessed what went wrong:

  • “The deficiencies that were present in Swedbank’s internal control made it possible to make changes to one of the bank’s most central IT systems without following the process in place at the bank to ensure continuity and reliable operations. This violation is therefore neither minor nor excusable.”
  • “none of the bank’s control mechanisms were able to capture the deviation and ensure that the process was followed”
  • “one of the main causes underlying the IT incident was non-compliance with the change management process and that it is probable that this also resulted in a slower analysis of the incident and a greater impact on the operations.”
  • “good internal control is a prerequisite for a bank to be able to fulfill the requirements on risk management”

Even if you think $85M isn’t much of a fine - simply the cost of doing business - the full range of options open to the regulator included removing Swedbank’s banking license:  “It is therefore not relevant to withdraw Swedbank’s authorisation or issue the bank a warning. The sanction should instead be limited to a remark and an administrative fine.” Gulp.

Change management doesn’t mitigate risk

What really interests me about cases like this is that, even when followed to the letter, the old ways of managing change with manual approvals and change meetings do not mitigate risk in today’s technology organizations. These processes don’t work because complying with them is no guarantee that changes are being made safely and securely. 

Tell me if you’ve heard this one before?

  • Bank has a major IT outage/incident
  • A change was applied without following change management process
  • Bank claims the risk controls work if they are followed
  • Regulator fines bank for not following process + having insufficient controls
  • Bank adds more change controls

The regulator’s position is self-referential logic: you said you’d do something to manage risk; it wasn’t done; therefore you are in violation. But is change management the best way to manage IT risk?

What the UK FCA says about change

I’ve written previously on some fantastic research published by the Financial Conduct Authority in the UK. They took a data-driven approach to understand the workings of change management processes, which uncovered some provocative findings:

“One of the key assurance controls firms used when implementing major changes was the Change Advisory Board (CAB). However, we found that CABs approved over 90% of the major changes they reviewed, and in some firms the CAB had not rejected a single change during 2019. This raises questions over the effectiveness of CABs as an assurance mechanism.”

Change as a control gate doesn’t work, but everyone does it. Why? To avoid $85M USD fines. In the UK and USA these can be issued to individuals as well as organizations. So, if you have followed the process, at the very least you are compliant and not liable for heavy financial penalties. It’s also about covering your back - “It’s not my fault, I ticked all the boxes.” But is the bank actually safe? Are the systems themselves secure?

Change management gathers documentation of process conformance, but it doesn’t reduce risk in the way that you’d think. It reduces the risk of undocumented changes, but risks in changes that are fully documented can sail through the approval process unnoticed. This is an important and quite shocking finding: adherence to traditional change management doesn’t work to manage the risk of changes.

Research shows external approvals don’t work

The science of DevOps backs this up. Here’s the unvarnished truth on external approvals and CABs based on research by Dr. Nicole Forsgren, Jez Humble, and Gene Kim in their 2018 book, Accelerate: Building and Scaling High Performing Technology Organizations.

“We found that external approvals were negatively correlated with lead time, deployment frequency, and restore time, and had no correlation with change fail rate. In short, approval by an external body (such as a change manager or CAB) simply doesn’t work to increase the stability of production systems, measured by the time to restore service and change fail rate. However, it certainly slows things down. It is, in fact, worse than having no change approval process at all.”

Worse than no change approval process at all. So, if you want to avoid fines, cover your back AND reduce the likelihood of production incidents, what would you do? 

Change is not the problem. It’s unaddressed risk

If change is not the problem, then what is?

What would work? Well, the FCA has some insights on this:

“Frequent releases and agile delivery can help firms to reduce the likelihood and impact of change related incidents:

Overall, we found that firms that deployed smaller, more frequent releases had higher change success rates than those with longer release cycles. Firms that made effective use of agile delivery methodologies were also less likely to experience a change incident.”

In short - paperwork doesn’t reduce risk. Less risky changes reduce risk. I’m going out on a limb here, but if Swedbank had in fact followed its processes and still had the outage, I believe Finansinspektionen (the Swedish FSA) would still have issued a fine, but for insufficient risk management.
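To make the FCA’s finding concrete: change success rate is just successful changes divided by total changes, and you can cut it by release size to see whether your own deployment history shows the same pattern. The sketch below uses made-up records purely to show the calculation; the field names and thresholds are illustrative, not taken from any particular tool.

```python
# Illustration of the metric behind the FCA finding: change success rate,
# grouped by release size. The deployment records below are made up.
from collections import defaultdict

deployments = [
    # (lines_changed, succeeded)
    (40, True), (55, True), (35, True), (60, True), (45, False),
    (900, False), (1200, True), (750, False), (1500, True),
]

buckets = defaultdict(lambda: [0, 0])          # bucket -> [successes, total]
for lines, ok in deployments:
    bucket = "small (<100 lines)" if lines < 100 else "large (>=100 lines)"
    buckets[bucket][0] += int(ok)
    buckets[bucket][1] += 1

for bucket, (successes, total) in buckets.items():
    print(f"{bucket}: change success rate = {successes / total:.0%} ({successes}/{total})")
```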

Story time: streams feeding the lake

We can think of software changes as streams feeding into our environments, which are lakes. Change management puts a gate in the stream to control what flows into the lake, but doesn’t monitor the lake.

If it is possible to make a change to production without detection, then change management only protects one source of risk. The only way to be sure you don’t have undocumented production changes is with runtime monitoring.
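As a concrete illustration of what monitoring the lake could look like, here is a minimal sketch that reconciles what is actually running in an environment against the approved change record and flags anything undocumented. The data sources are hypothetical stand-ins - a real implementation would query your orchestrator for what is deployed and your change system for what was approved.

```python
# Minimal sketch: detect undocumented production changes by reconciling what is
# running against the approved change record. Both input lists are hypothetical
# stand-ins for an orchestrator API and a change management system.
from dataclasses import dataclass

@dataclass(frozen=True)
class Deployment:
    service: str
    version: str   # e.g. an image digest or build number

def find_undocumented_changes(running, approved):
    """Return deployments present in production but absent from the change record."""
    approved_set = {(d.service, d.version) for d in approved}
    return [d for d in running if (d.service, d.version) not in approved_set]

if __name__ == "__main__":
    running_now = [
        Deployment("payments", "sha256:aaa111"),
        Deployment("balances", "sha256:bbb222"),   # nobody filed a change for this one
    ]
    approved_changes = [Deployment("payments", "sha256:aaa111")]
    for drift in find_undocumented_changes(running_now, approved_changes):
        # A real system would page someone or open an incident, not just print.
        print(f"ALERT: undocumented change: {drift.service} -> {drift.version}")
```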

For me, what is really interesting about this story is the echoes and parallels it has with the Knight Capital incident so well documented by the SEC. In both cases, an incomplete understanding of how changes had been applied to production systems, caused by insufficient observability and traceability, prolonged the outages and amplified their impact.

And it leaves an open question: how many similar changes have been made that didn’t cause an outage? Without monitoring, it is really hard to know.

If change management doesn’t work, why do we do it?

It all goes back to software history. Traditionally changes were rare, big and risky.  It was the annual upgrade, or the monthly patch. Because these big batches of change were risky, companies introduced long testing and qualification processes, change management, service windows, and a large number of checklists to help mitigate the risks and test-in quality.

Before we had modern practices of test automation, continuous delivery, DevSecOps, rolling deployments with fast rollback, this was the only way. The trouble is, the financial services industry is packed full of legacy systems and outsourcing where implementing these practices is technically challenging and uneconomic.

Maybe it is time we acknowledge legacy software, risk management, and outsourcing are a major systemic risk in the financial sector?

The flipside is also true. Many next generation systems in financial services are so dynamic and distributed that it is really hard to get a handle on the volume of changes occurring.

Risk management that works

The only way to not get burned is to avoid playing with fire. Checklists can help, but if you have a lot of IT risk the only way to really reduce it is to do the technical work to make changes less risky, and to move to smaller, more frequent changes. You can reduce the toil by automating change controls and documentation, and by introducing monitoring and alerting systems to detect unauthorized changes. It’s all part of a DevSecOps approach to change management that harmonizes the speed of software delivery with the demands that cybersecurity, audit, and compliance place on organizations.
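One way to automate the controls and the documentation is to have the delivery pipeline emit the change record itself, so the evidence auditors want is a by-product of shipping rather than a separate manual step. The sketch below is a hypothetical illustration: the field names and the JSON log destination are assumptions, not any specific tool’s format.

```python
# Illustrative sketch: a CI/CD step that writes a change record automatically,
# so documentation is a by-product of deployment rather than manual paperwork.
# Field names and the JSON-lines destination are assumptions for illustration.
# Note: must be run inside a git checkout for the commit lookup to work.
import json
import subprocess
from datetime import datetime, timezone

def build_change_record(service: str, environment: str, tests_passed: bool) -> dict:
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return {
        "service": service,
        "environment": environment,
        "commit": commit,
        "tests_passed": tests_passed,
        "deployed_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    record = build_change_record("payments", "production", tests_passed=True)
    # Append to an audit log; a real pipeline would post this to the system of record.
    with open("change_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    print("Change record written for commit", record["commit"])
```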

Sunday
Jul 16, 2023

Lessons Learned Running Presto at Meta Scale

Presto is a free, open source SQL query engine. We’ve been using it at Meta for the past ten years, and learned a lot while doing so. Running anything at scale - tools, processes, services - takes problem solving to overcome unexpected challenges. Here are four things we learned while scaling up Presto to Meta scale, and some advice if you’re interested in running your own queries at scale.

Scaling Presto rapidly to meet growing demands: What challenges did we face?

 

Deploying new Presto releases

Click to read more ...

Sunday
Jul 16, 2023

Gossip Protocol Explained

You can subscribe to the system design newsletter to excel in system design interviews and software architecture. The original article was published on the systemdesign.one website.



What Is Gossip Protocol?

The typical problems in a distributed system are the following [1], [11]:

  • maintaining the system state (liveness of nodes)
  • communication between nodes

The potential solutions to these problems are as follows [1]:

  • centralized state management service
  • peer-to-peer state management service
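To make the peer-to-peer option concrete, here is a toy, single-process sketch of gossip-style dissemination: each node periodically bumps its own heartbeat and exchanges its view with one random peer, so liveness information spreads without a central coordinator. It is an illustration under simplified assumptions, not a description of any particular implementation.

```python
# Toy sketch of gossip-style state dissemination (illustration only).
# Each node tracks the highest heartbeat it has seen per node; every round it
# bumps its own heartbeat and does a push-pull exchange with one random peer.
import random

class Node:
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers              # names of the other nodes
        self.state = {name: 0}          # node name -> highest heartbeat seen

    def gossip_round(self, cluster):
        self.state[self.name] += 1      # my own heartbeat
        peer = cluster[random.choice(self.peers)]
        # Push-pull: both sides keep the freshest heartbeat for every known node.
        merged = {n: max(self.state.get(n, 0), peer.state.get(n, 0))
                  for n in set(self.state) | set(peer.state)}
        self.state, peer.state = dict(merged), dict(merged)

names = ["a", "b", "c", "d", "e"]
cluster = {n: Node(n, [p for p in names if p != n]) for n in names}
for _ in range(10):                     # a few gossip rounds
    for node in cluster.values():
        node.gossip_round(cluster)
print(cluster["a"].state)               # "a" has now heard heartbeats from every node
```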

Click to read more ...

Wednesday
Feb 22, 2023

Consistent hashing algorithm

 

This is a guest article by NK. You can view the original article, Consistent hashing explained, on the systemdesign.one website.

At a high level, consistent hashing performs the following operations:

  1. The output of the hash function is placed on a virtual ring structure (known as the hash ring)
  2. The hashed IP addresses of the nodes are used to assign a position for the nodes on the hash ring
  3. The key of a data object is hashed using the same hash function to find the position of the key on the hash ring
  4. The hash ring is traversed in the clockwise direction starting from the position of the key until a node is found
  5. The data object is stored or retrieved from the node that was found
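The five steps above translate almost directly into code. Here is a bare-bones sketch under simplifying assumptions (MD5 chosen arbitrarily as the hash function, and no virtual nodes); refinements such as virtual nodes are left out.

```python
# Bare-bones consistent hashing sketch following the five steps above.
# Simplifications: MD5 as the hash function, no virtual nodes.
import bisect
import hashlib

class HashRing:
    def __init__(self, node_ips):
        # Steps 1-2: place each node on the ring at the hash of its IP address.
        self.ring = sorted((self._hash(ip), ip) for ip in node_ips)
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Step 3: hash the key with the same hash function.
        pos = self._hash(key)
        # Steps 4-5: walk clockwise to the first node at or after the key's
        # position, wrapping around the ring if necessary.
        idx = bisect.bisect_left(self.positions, pos) % len(self.positions)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.node_for("user:42"))   # the node responsible for this key
```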

 

Click to read more ...

Friday
Dec 16, 2022

What is Cloud Computing? According to ChatGPT.

DALL-E picture generated from the answer.

I was curious, given all the ChatGPT love, what it would make of some of our favorite topics. I was both impressed and unimpressed.

Impressed. ChatGPT instantly generated a response that might make a good answer in a Miss Universe contest. You know, the kind of answer delivered with a radiant smile, calculated to offend no one, yet still sounding comprehensive and deep.

Unimpressed. The answers are a mile wide and an inch deep. There's not a lot of insight. It's more of a zeitgeist oracle. Nothing wrong with that, but I don't think we'll be unemployed soon. It's as many have predicted about AI/ML. These are tools to augment us, flawed humans, not replace us...yet.

Thank you for agreeing to be interviewed today.

You're welcome! I'm happy to answer any questions you have. Please go ahead and ask your first question.

Explain cloud computing to me at a professional level?

Cloud computing is a model of computing that delivers computing services over the internet, including storage, data processing, and networking. It allows users to access and use shared computing resources, such as servers, storage, and applications, on demand and without the need to manage the underlying infrastructure. This model of computing has become increasingly popular in recent years, as it offers a number of benefits, including cost savings, flexibility, scalability, and increased efficiency. Additionally, cloud computing allows for greater collaboration and innovation, as it enables users to access and share data and resources from anywhere, at any time.

Can you expand?

Certainly. Cloud computing has become a widely-used model of computing, as it offers a number of benefits over traditional, on-premises computing systems. One of the key benefits of cloud computing is its scalability, which allows users to easily and quickly scale up or down their use of computing resources, depending on their changing needs. This means that users only pay for the computing resources they actually use, rather than having to invest in expensive hardware and software upfront. This can result in significant cost savings, as well as increased flexibility and agility.

Another key benefit of cloud computing is its reliability and availability. Because cloud computing services are delivered over the internet, users can access their data and applications from anywhere, at any time. This means that they are not tied to a specific location, and can work remotely or on the go. Additionally, because cloud computing services are provided by large, established companies, they are typically highly reliable and available, with robust security and privacy measures in place to protect user data.

Finally, cloud computing enables greater collaboration and innovation. By allowing users to access and share data and resources from anywhere, at any time, cloud computing makes it easier for teams to work together and share ideas. This can facilitate greater collaboration and innovation, as it enables users to access the latest tools and technologies, and work together in real time to solve complex problems. Overall, cloud computing offers a number of benefits that make it an attractive option for businesses and individuals alike.

Which cloud provider would you recommend?

I'm sorry, but as a large language model trained by OpenAI, I don't have the ability to browse the internet or keep up-to-date with current events. I am not able to recommend specific cloud providers. Additionally, choosing a cloud provider is a complex decision that depends on a number of factors, including the specific needs and requirements of your organization. It's best to carefully evaluate the options and compare the different services and features offered by different providers before making a decision.

Do you think HEY and Basecamp should leave the cloud or go on-prem?

As a large language model trained by OpenAI, I do not have the ability to browse the internet or keep up-to-date with current events. I am not familiar with the specific circumstances of HEY and Basecamp, and do not have enough information to make a recommendation on whether they should leave the cloud and go on-premises. As I mentioned earlier, choosing a computing model, whether it be cloud or on-premises, is a complex decision that depends on a number of factors, including the specific needs and requirements of your organization. It's best to carefully evaluate the options and compare the different services and features offered by different providers before making a decision.

Are you hosted on the cloud?

As a large language model trained by OpenAI, I exist purely as a software program and do not have a physical presence. I do not reside on any specific type of hardware or infrastructure, and am not hosted on the cloud or on-premises. I exist as a set of algorithms and data that enable me to process natural language input and generate responses based on that input. I am designed to be run on a variety of different hardware and software platforms, including mobile devices, desktop computers, and cloud-based servers.

Explain serverless to me at a professional level.

Serverless computing is a cloud computing model in which the cloud provider dynamically manages the allocation of computing resources, such as server and storage capacity, in response to the specific needs of the user. This means that the user does not need to worry about managing and scaling the underlying infrastructure, and only pays for the specific resources they use, such as the amount of data processed or the number of requests handled. Serverless computing is often used for applications that require scalable, on-demand computing resources, such as web and mobile applications, data processing and analysis, and real-time streaming data. By using a serverless model, developers can focus on building and deploying their applications, without worrying about the underlying infrastructure. This can result in faster development and deployment times, as well as reduced costs, as users only pay for the resources they actually use.

Serverless computing is also often used in conjunction with other cloud computing services, such as containers and managed services, to provide a fully-managed and scalable solution for deploying and running applications. Overall, serverless computing offers a number of benefits, including increased scalability, flexibility, and cost savings, making it an attractive option for many organizations.

Is serverless the future of cloud computing?

It's difficult to say for certain whether serverless computing will become the dominant model of cloud computing in the future. While serverless computing does offer a number of benefits, including increased scalability, flexibility, and cost savings, it may not be the best fit for every organization or use case. Additionally, there are still some challenges and limitations to the serverless model, such as the need for careful monitoring and management to ensure that applications are running optimally and to avoid over-provisioning or under-provisioning of resources.

It's likely that serverless computing will continue to evolve and become more widely adopted in the coming years, but it's also possible that other models of cloud computing, such as containers or managed services, may also gain popularity. Ultimately, the future of cloud computing will depend on the specific needs and requirements of organizations, as well as the continued development and evolution of cloud computing technologies.

What do you think of the book Explain the Cloud Like I'm 10?

As a large language model trained by OpenAI, I do not have the ability to read books or browse the internet, so I am unable to comment on the book "Explain the Cloud Like I'm 10". As a language model, my primary function is to process natural language input and generate responses based on that input. I am not capable of evaluating or providing opinions on books or other written materials.

Thank you for your time.

You're welcome! I'm here to help answer any questions you have, so feel free to ask me anything.

Friday
Dec 2, 2022

Stuff The Internet Says On Scalability For December 2nd, 2022

Never fear, HighScalability is here!

  1958: An engineer wiring an early IBM computer. 2021: An engineer wiring an early IBM quantum computer. @enclanglement

 

My Stuff:

  • I'm proud to announce a completely updated and expanded version of Explain the Cloud Like I'm 10! This version adds 2x more coverage, with special coverage of AWS, Azure, GCP, and K8s. It has 482 mostly 5 star reviews on Amazon. Here's a 100% organic, globally sourced review:

  • Love this Stuff? I need your support on Patreon to keep this stuff going.

Number Stuff:

Click to read more ...

Monday
Oct 24, 2022

The Cloud is Not a Railroad - An Argument Against the Vertical Separation of Cloud Providers

 

There's a move to regulate cloud providers by vertically separating the services they offer.

Like the railroads of yore, which were not allowed to provide freight services on top of their base services, cloud providers would not be allowed to provide services on top of their base platform services.

Vertical separation would be new to the cloud industry. Is it a good idea? Would it actually solve any problems? My answers are no and no, but probably not for the reasons you think. Let's dive in.

Here are a few useful resources for exploring this argument:

  • Maintaining monopolies with the cloud by Cory Doctorow. https://pluralistic.net/2022/09/28/other-peoples-computers.
  • A group advocating for fair licensing. https://www.fairsoftwarelicensing.com/our-principles.
  • Cloud Infrastructure Services -- An analysis of potentially anti-competitive practices by Professor Frédéric Jenny. https://www.fairsoftwarestudy.com/.

Now Cory is about 100x smarter than I am, but this analogy is wrong:

That's why the trustbusters were so big on "structural separation": the principle that a business can own a platform or use the platform, but not both.

So this remedy is also wrong:

Rather than trying to construct rules that kept the referee honest even when they played on one of the teams, they ordered the companies to choose one role – either you were the ref, or you were a player, but you couldn't referee a match where you were also in the competition.

Because this idea, while trivially true, misses the point:

There is no cloud, there is only other people's computers.

This is like saying there is no human body, it's just a collection of cells. While true, it is the lesser truth that fails to recognize the importance of the body, organs, and the adaptive function of the brain. That's what a cloud provider is; it's the body electric for the cloud.

The report by Professor Jenny seems to be the main support for cleaving cloud providers in two.

I've read the report. Is there a smoking gun? Not that I found. But then, I'm not interested in enterprise licensing woes with Oracle or Office 365. What I'm concerned about, as a developer, is the collateral damage such ham-fisted regulation might cause.

The job of a cloud is to run workloads. Workloads are implemented using services. A cloud offers those services. The more high-quality services a cloud offers, the better products we can build.

Limit the cloud services a cloud provider can offer and you limit the quality of the software we can build. It would be like removing legos from a lego set. You can't do a proper build without the full set.

There is no arbitrary dividing line that can be made such that platform services are on one side of the line and non-platform services are on the other side. A cloud is a platform all the way down. Without cloud services, there is no cloud because a cloud is literally defined by the services it offers.

A cloud provider packages up vast pools of compute resources and resells them as cloud services. There is no structural separation as there is with a train service. What some consider infrastructure or platform is just another cloud service.

Would you really trust some committee or government agency to draw this line correctly? I would not because I don't think the people who think this way actually understand how the cloud works.

Does anyone really want to go back to the VM-centric days when we rolled everything ourselves? No.

What it means to be cloud-native has gone through several evolutions: VM to container to serverless. Each cloud-native evolution is about using the hardware more efficiently.

Do you think those evolutions would have happened with a structural separation in place? No.

Where's the incentive unless you derive a benefit from optimizing the whole stack? We would still be building higher and higher on top of VMs because that's where platform innovation would have stopped evolving.

This is because the cloud is not a railroad. They are different in kind. Building software is not moving freight. A railroad is fixed. The cloud is ever evolving.

This is my best argument for why the cloud should not be vertically separated. It would make the job of building quality software even harder and slower and more expensive.

Admittedly, it's a developer-focused argument, but there are more technical arguments against the move. Let's walk through them.

Is The Cloud a Natural Monopoly?

Wikipedia:

A natural monopoly is a monopoly in an industry in which high infrastructural costs and other barriers to entry relative to the size of the market give the largest supplier in an industry, often the first supplier in a market, an overwhelming advantage over potential competitors.

Let's look at the size of the market. For all the attention the cloud gets, it's still a child or maybe a pre-teen in terms of the overall IT industry.

The market value of the worldwide IT industry is $5 trillion. The global cloud computing industry was valued at $371 billion in 2020 - less than a tenth of the total.

So, the cloud is not a monopoly. Why should it be regulated as one?

Examples of traditional natural monopolies are dams, railroads, power plants, water supply, and sewer services. The idea is it's practical to only have one of these for a given area. And even that list is not invulnerable. With solar power, the centralization of power generation is giving way to a more decentralized model.

Does "there can be only one" apply to the cloud? No. Network effects are not the same as monopoly control.

Apple is not a monopoly in the smartphone market simply because people like their smartphones. Neither are clouds. People choose to run their workloads on the cloud from a large set of options. It's a choice. It's not because there are no other options available to them because of structural issues.

While high infrastructure costs do create a barrier to entry to creating a cloud provider, this misses an important point: the benefits of the cloud come from the cloud model, not any particular cloud implementation.

Here's an informal definition of a cloud provider taken from my book Explain the Cloud Like I'm 10:

A cloud provider installs a bunch of computers in a bunch of datacenters spread throughout the world, pools all those datacenter resources together, offers them over the internet as cloud services through an API, and rents them to the public in an on-demand, pay-for-what-you-use basis.

The cloud model means accessing services over a network through an API, safely running multiple workloads on a single server, cloud storage, cloud computing, auto-scaling, elasticity, resource pooling, on-demand pay-for-what-you-use billing, load balancing, fault tolerance, leveraging managed services, and building software using a cloud-native paradigm.

The cloud model has rightly attracted customers because it's a powerful way to create, distribute, and use software, but the cloud model is not relegated to only the cloud.

The cloud model and, thus, cloud-native software development can work in any context—on-prem, colo, hybrid, edge, bare-metal. It’s the model that matters. Not who implements it or how it’s implemented.

This is what I mean when I say people aren't understanding the cloud. They over-index on specific cloud implementation instances and not the value and flexibility of the cloud model itself.

Does the free market work for the cloud?

Another way to address the natural monopoly question is to answer: does the free market work for the cloud?

This is where we have a problem. Like telecom, the cloud provider market has slipped into oligopoly. There is little incentive for the Big 3 to drive costs lower, but that's not the same as a monopoly that justifies splitting up and regulating an industry.

There are many alternatives for running workloads. Take a look at this still incomplete list of cloud providers from Wikipedia. The problem is there are no easy alternatives for running workloads, and that's because there is no standard cloud API.

We have a cloud model that offers a cloud-native method of software development, but it may take several more technological evolutions for the model to advance enough to have a minimal viable portability layer. With container-native PaaS (App Runner and Cloud Run) we are getting close.

We have several players like Cloudflare innovating to become alternative lower-cost cloud providers. And we have k8s striving to become a cloud in a box droppable into any datacenter. You can switch between clouds with effort. It has been done. You can even go old school and use non cloud-native architectures. Anyone can rent a rack in a colo and stand up a system. Did anyone promise you a rose garden? I don't think so.

The larger point is that there is a large, genuinely competitive market with many competing enterprises. There's just no easy solution to the perceived lack of fungible cloud alternatives.

Is there a way to create greater competition and expand economic choice for cloud customers? I don't know, but high switching costs aren't a proper test for regulating an industry. They're a technical problem, fixable with effort and investment. Maybe start investing in open source alternatives rather than complaining they don't already exist? Invent the future you want.

A Cloud Provider Absorbs Huge Fixed and Sunk Costs

Cloud providers incur huge fixed costs for creating and maintaining a network of datacenters spread throughout the world. Let's say you regulate cloud providers by requiring them to decouple platform services from their other services, whatever those services may be.

Then we must rely on third parties to fill in the gap with managed services of all varieties that must operate out of every datacenter while surviving datacenter level and region level failures.

That's easy to do, right? Of course not. Very few organizations in the world have the technological know-how to create such managed services, especially without low-level control of the entire system. But let's ignore that.

The fixed and sunk costs incurred by the cloud provider will dwarf any investment from third party service providers.

How do you create the right incentives for the cloud provider to keep investing huge amounts of capital in infrastructure while recovering its fixed costs? If competitive service pricing does not cover the cost of the infrastructure, then prices would have to be raised to a level that shuts out lower-margin users.

We've seen telecom rest and vest on its installed infrastructure, rent-seeking through a combination of regulatory capture and avoidance of the infrastructure investments that would improve and maintain the system.

The incentive to game service pricing levels in order to create short-term profits at the expense of long-term investment will be overwhelming. It will be impossible to set service prices at a level that provides the ROI required by cloud providers and third-party service providers while minimizing costs for customers so they can also earn an adequate ROI. There's simply not enough room for everyone to get their piece of the pie. What we'll inevitably see are the constantly rising prices and declining service levels experienced in other industries that are regulated as utilities.

This is the single greatest weakness of the proposal for vertically separating cloud providers.

Is that the kind of cloud you want?

Benefits of Vertical Integration

Where is the optimal point of vertical separation in a cloud provider's stack of services? What are the consequences of making the wrong decision?

Any deliberation must recognize the technological interdependence between cloud infrastructure technology and the cloud services that can be offered.

There are huge short-term and long-term efficiencies that depend on the successful coordination of cloud services and infrastructure.

For example, AWS created Nitro. Nitro is a revolutionary combination of purpose-built hardware and software designed to provide performance and security. Would Nitro have been invented if AWS was restricted to being a platform provider? Unlikely. It would have had no way of propagating Nitro across an entire vertical stack of hardware and software services. So why bother innovating?

Vertically-integrated cloud providers can optimize the entire system by ensuring that each component is combined in sub-systems that work together and are optimized as a whole.

One consequence of this tight integration is the overall high reliability offered by clouds. How reliable would the cloud be if a cloud provider did not control their entire stack of services? Given the experience with finger pointing in other highly interdependent systems, it would be less reliable and outages would no doubt last longer.

Are there any positive externalities from integrated cloud providers that might justify leaving them intact? Very large integrated cloud providers might reduce the overall number of servers required to run the world's workloads, but that alone isn't a sufficient reason not to regulate them. I think my other arguments are reason enough.

Please leave our legos alone.

Monday
Jul 18, 2022

Sponsored Post: G-Core Labs, Close, Wynter, Pinecone, Kinsta, Bridgecrew, IP2Location, StackHawk, InterviewCamp.io, Educative, Stream, Fauna, Triplebyte

Who's Hiring? 

  • Close is building the sales communication platform of the future and we're looking for a Site Reliability Engineer to help us accomplish that goal. If you have a passion for databases (both SQL and NoSQL), significant experience building, managing, and monitoring infrastructure, databases, and backend services at scale and want to work with a rag-tag team of hardworking, but humble humans, then come check us out!

  • Wynter is looking for system administrators, engineers, and developers to join its research panel. Participate in research surveys, get paid ($90-$180/hr) for your feedback and comments. Super low key commitment, 10-15 mins per survey. Learn more and sign up.

  • DevOps Engineer: At Kinsta, we set out to create the best managed hosting platform in the world. If you are an experienced DevOps Engineer who is constantly looking for ways to innovate and improve, we might just be the place for you! As Kinsta’s DevOps Engineer, you will be instrumental in making sure that our infrastructure is always on the bleeding edge of technology, remaining stable and high-performing at all times. If you love working with Linux, have a background in PHP, and have worked with cutting-edge technologies, such as Ansible and LXC, check us out and apply here.

  • SysOps Engineer: As Kinsta’s SysOps Engineer, you’ll play a key role in caring for the health of our servers, preventing issues, and responding immediately to mitigate any problems in our infrastructure. If you have experience in hosting and with the WordPress stack, have worked with Ubuntu or Debian-based systems and cutting-edge technologies such as Ansible and LXC, you should definitely check us out and apply here!

  • JavaScript Developer: Kinsta’s Development Team aims to create truly effective solutions for our customers and our internal teams. We believe the only way to maintain and scale our standards is to focus on quality code. If you are a talented JavaScript developer and you would like to work with an ownership-embracing, highly skilled team in a relaxed environment where culture is as important as technical excellence, apply here.

  • InterviewCamp.io has hours of system design content. They also do live system design discussions every week. They break down interview prep into fundamental building blocks. Try out their platform.

  • Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.

  • Need excellent people? Advertise your job here! 

Cool Products and Services

  • G-Core Labs CDN is a next-generation content delivery network that helps you accelerate the dynamic and static content of your websites or applications. CDN has many built-in web security features and operates through a rapidly growing Anycast/DNS infrastructure with over 140 points of presence worldwide. Learn more here.

  • Search like you mean it. Pinecone is a vector database that makes it easy to add semantic search to your applications. It combines state-of-the-art vector search libraries like Faiss, advanced features such as filtering, and fully managed distributed infrastructure for high performance and reliability at any scale. Give your users better search results and recommendations and make them happy. Get started for free.

  • Bridgecrew is the cloud security platform for developers. By leveraging automation and delivering security-as-code, Bridgecrew empowers teams to find, fix, and prevent misconfigurations in deployed cloud resources and in infrastructure as code. Get started for free!

  • IP2Location has been an IP address geolocation service provider since 2002. The geolocation database or API detects location, proxy, and more than 20 other parameters. The technology has been cited in more than 700 research papers and is trusted by many Fortune 500 companies. Try it today!

  • Developers care about shipping secure applications. Application security products and processes, however, have not kept up with advances in software development. There is a new breed of tools hitting the market that enables developers to take the lead on AppSec. Learn how engineering teams are using products like StackHawk and Snyk to add security bug testing to their CI pipelines.

  • Stateful JavaScript Apps. Effortlessly add state to your Javascript apps with FaunaDB. Generous free tier. Try now!

  • Learn to balance architecture trade-offs and design scalable enterprise-level software. Check out Educative.io's 5-part learning path: Scalability and System Design for Developers. In addition to other content written by seasoned industry experts, this path includes core material from our bestselling course Grokking the System Design Interview. Join more than 500,000 other learners from top tech companies around the world. 

  • Build, scale, and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates, so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java, and .NET. Stream is currently also hiring DevOps and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number.

Fun and Interesting Events

  • Advertise here!

 

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Make Your Job Search O(1) — not O(n)

Triplebyte is unique because they're a team of engineers running their own centralized technical assessment. Companies like Apple, Dropbox, Mixpanel, and Instacart now let Triplebyte-recommended engineers skip their own screening steps.

We found that High Scalability readers are about 80% more likely to be in the top bracket of engineering skill.

Take Triplebyte's multiple-choice quiz (system design and coding questions) to see if they can help you scale your career faster.


If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Monday
Jul 11, 2022

Stuff The Internet Says On Scalability For July 11th, 2022

Never fear, HighScalability is here!

My Stuff:

  • Love this Stuff? I need your support on Patreon to help keep this stuff going.

  • Know anyone who needs to go from zero to cloud? My book teaches them all they need to know about the cloud. Explain the Cloud Like I'm 10. It has 440 mostly 5 star reviews on Amazon. Here's a 100% sincere, heartfelt review:

  • Do you like Zone 2, Zone 5, interval, REHIT, HIIT, and bodyweight workouts? I made an app for that. Max Workout. I’m not just the programmer, I’m a client. I use it 4 times a week and if you want to retrain your mitochondria, get fit, get healthy, and live longer, I think you’ll like it too.

Number Stuff:

Click to read more ...

Friday
Jun 10, 2022

Sponsored Post: G-Core Labs, Close, Wynter, Pinecone, Kinsta, Bridgecrew, IP2Location, StackHawk, InterviewCamp.io, Educative, Stream, Fauna, Triplebyte

Who's Hiring? 

  • Close is building the sales communication platform of the future and we're looking for a Site Reliability Engineer to help us accomplish that goal. If you have a passion for databases (both SQL and NoSQL), significant experience building, managing, and monitoring infrastructure, databases, and backend services at scale and want to work with a rag-tag team of hardworking, but humble humans, then come check us out!

  • Wynter is looking for system administrators, engineers, and developers to join its research panel. Participate in research surveys, get paid ($90-$180/hr) for your feedback and comments. Super low key commitment, 10-15 mins per survey. Learn more and sign up.

  • DevOps Engineer: At Kinsta, we set out to create the best managed hosting platform in the world. If you are an experienced DevOps Engineer who is constantly looking for ways to innovate and improve, we might just be the place for you! As Kinsta’s DevOps Engineer, you will be instrumental in making sure that our infrastructure is always on the bleeding edge of technology, remaining stable and high-performing at all times. If you love working with Linux, have a background in PHP, and have worked with cutting-edge technologies, such as Ansible and LXC, check us out and apply here.

  • SysOps Engineer: As Kinsta’s SysOps Engineer, you’ll play a key role in caring for the health of our servers, preventing issues, and responding immediately to mitigate any problems in our infrastructure. If you have experience in hosting and with the WordPress stack, have worked with Ubuntu or Debian-based systems and cutting-edge technologies such as Ansible and LXC, you should definitely check us out and apply here!

  • JavaScript Developer: Kinsta’s Development Team aims to create truly effective solutions for our customers and our internal teams. We believe the only way to maintain and scale our standards is to focus on quality code. If you are a talented JavaScript developer and you would like to work with an ownership-embracing, highly skilled team in a relaxed environment where culture is as important as technical excellence, apply here.

  • InterviewCamp.io has hours of system design content. They also do live system design discussions every week. They break down interview prep into fundamental building blocks. Try out their platform.

  • Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.

  • Need excellent people? Advertise your job here! 

Cool Products and Services

  • G-Core Labs CDN is a next-generation content delivery network that helps you accelerate the dynamic and static content of your websites or applications. CDN has many built-in web security features and operates through a rapidly growing Anycast/DNS infrastructure with over 140 points of presence worldwide. Learn more here.

  • Search like you mean it. Pinecone is a vector database that makes it easy to add semantic search to your applications. It combines state-of-the-art vector search libraries like Faiss, advanced features such as filtering, and fully managed distributed infrastructure for high performance and reliability at any scale. Give your users better search results and recommendations and make them happy. Get started for free.

  • Bridgecrew is the cloud security platform for developers. By leveraging automation and delivering security-as-code, Bridgecrew empowers teams to find, fix, and prevent misconfigurations in deployed cloud resources and in infrastructure as code. Get started for free!

  • IP2Location has been an IP address geolocation service provider since 2002. The geolocation database or API detects location, proxy, and more than 20 other parameters. The technology has been cited in more than 700 research papers and is trusted by many Fortune 500 companies. Try it today!

  • Developers care about shipping secure applications. Application security products and processes, however, have not kept up with advances in software development. There is a new breed of tools hitting the market that enables developers to take the lead on AppSec. Learn how engineering teams are using products like StackHawk and Snyk to add security bug testing to their CI pipelines.

  • Stateful JavaScript Apps. Effortlessly add state to your Javascript apps with FaunaDB. Generous free tier. Try now!

  • Learn to balance architecture trade-offs and design scalable enterprise-level software. Check out Educative.io's 5-part learning path: Scalability and System Design for Developers. In addition to other content written by seasoned industry experts, this path includes core material from our bestselling course Grokking the System Design Interview. Join more than 500,000 other learners from top tech companies around the world. 

  • Build, scale, and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5-minute interactive tutorial. Stream is free up to 3 million feed updates, so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java, and .NET. Stream is currently also hiring DevOps and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number.

Fun and Interesting Events

  • Advertise here!

 

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Make Your Job Search O(1) — not O(n)

Triplebyte is unique because they're a team of engineers running their own centralized technical assessment. Companies like Apple, Dropbox, Mixpanel, and Instacart now let Triplebyte-recommended engineers skip their own screening steps.

We found that High Scalability readers are about 80% more likely to be in the top bracket of engineering skill.

Take Triplebyte's multiple-choice quiz (system design and coding questions) to see if they can help you scale your career faster.


If you are interested in a sponsored post for an event, job, or product, please contact us for more information.