Strategy: Use TensorFlow.js in the Browser to Reduce Server Costs


One of the strategies Jacob Richter describes in How we built a big data platform on AWS for 100 users for under $2 a month, as part of his relentless drive to lower AWS costs, is moving ML from the server to the client.

Moving work to the client is a time-honored way of saving on server-side costs. Not long ago, running ML in the browser might have seemed like an insane thing to do. But what was crazy yesterday often becomes standard practice today, especially when it works, and especially when it saves money:

Our post-processing machine learning models are mostly K-means clustering models to identify outliers. As a further step to reduce our costs, we are now running those models on the client side using TensorFlow.js instead of using an autoscaling EC2 container. While this was only a small change using the same code, it resulted in a proportionally massive cost reduction. 
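The post doesn't include code, but the technique is easy to sketch. Here is a minimal, hypothetical version of client-side K-means outlier detection in plain JavaScript (no TensorFlow.js; the data, cluster count, and distance threshold are all made up for illustration):

```javascript
// Client-side K-means outlier detection in plain JavaScript.
// Data, k, and the distance threshold are all made up for illustration.

function dist(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Naive K-means: initialize centroids from the first k points, then
// alternate assignment and mean-update steps for a fixed iteration count.
function kmeans(points, k, iterations = 20) {
  let centroids = points.slice(0, k).map(p => p.slice());
  let labels = new Array(points.length).fill(0);
  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: each point joins its nearest centroid.
    labels = points.map(p =>
      centroids.reduce((best, c, i) =>
        dist(p, c) < dist(p, centroids[best]) ? i : best, 0));
    // Update step: each centroid moves to the mean of its members.
    centroids = centroids.map((c, i) => {
      const members = points.filter((_, j) => labels[j] === i);
      if (members.length === 0) return c;
      return [
        members.reduce((s, p) => s + p[0], 0) / members.length,
        members.reduce((s, p) => s + p[1], 0) / members.length,
      ];
    });
  }
  return { centroids, labels };
}

// A point is an outlier if it sits far from its own cluster centroid.
function outliers(points, k, threshold) {
  const { centroids, labels } = kmeans(points, k);
  return points.filter((p, i) => dist(p, centroids[labels[i]]) > threshold);
}

// Two tight clusters plus one stray point.
const data = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [30, 30]];
console.log(outliers(data, 2, 10)); // flags only the [30, 30] point
```

Running this in the page costs the server nothing: the data that was already shipped to the browser gets clustered there.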

To learn more, Alexis Perrier has a good article, Tensorflow with Javascript Brings Deep Learning to the Browser:

Tensorflow.js has four layers: the WebGL API for GPU-supported numerical operations, the web browser for user interactions, and two APIs: Core and Layers. The low-level Core API corresponds to the former deeplearn.js library. It provides hardware-accelerated linear algebra operations and an eager API for automatic differentiation. The higher-level Layers API is used to build machine-learning models on top of Core. The Layers API is modeled after Keras and implements similar functionality. It also allows you to import models previously trained in Python with Keras or TensorFlow SavedModels and use them for inference or transfer learning in the browser. 

If you want a well-paved path to success, this is not it. I couldn't find many success stories yet of using TensorFlow in the browser.

One clear use case came from kbrose:

There are a lot of services that offer free or very cheap hosting of static websites. If you are able to implement your ML model in JS, this allows you to deploy your product/app/whatever easily and at low cost. In comparison, requiring a backend server running your model is harder to set up and maintain, in addition to costing more. 
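kbrose's scenario is easy to picture. A hedged sketch, assuming a tiny hypothetical model whose pretrained weights ship as a static asset alongside the page (the weights, names, and inputs here are invented for illustration):

```javascript
// Hypothetical pretrained logistic-regression model, shipped as static
// JSON next to the site's other assets. Weights are made up.
const model = {
  weights: [1.2, -0.7],
  bias: 0.1,
};

const sigmoid = z => 1 / (1 + Math.exp(-z));

// Score one feature vector entirely in the visitor's browser.
function predict(model, features) {
  const z = model.weights.reduce((s, w, i) => s + w * features[i], model.bias);
  return sigmoid(z);
}

// The "backend" is just a CDN serving this file and the weights.
console.log(predict(model, [2.0, 1.0]) > 0.5); // true for this input
```

The same idea scales up to real models via TensorFlow.js, but even a hand-rolled inference function like this one is enough to turn a static host into an "ML-powered" app.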

Related Articles



Stuff The Internet Says On Scalability For April 20th, 2018

Hey, it's HighScalability time:


Freeman Dyson dissects Geoffrey West's book on universal scaling laws, Scale. (Image: Steve Jurvetson)

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate if you would recommend my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics. 

  • 5x: BPF over iptables; 51.21%: SSL certificates now issued by Let's Encrypt; 15,000x: acceleration from a genomics co-processor on long read assembly; 100 Million: Amazon Prime members; 20 minutes: time it takes a robot to assemble an Ikea chair; 1.7 Tbps: DDoS Attack; 200 Gb/sec: future network fabric speeds; $7: average YouTube earnings per 1000 views; 800 million: viruses cascading onto every square meter of the planet each day; <10m: error in  Uber's GPS enhancement; $45 million: total value of Bitcoin ransomware extortion; 

  • Quotable Quotes:
    • @sachinrekhi: Excited to read the latest Amazon shareholder letter. Amazing the scale they are operating at: 100M prime members, $20B AWS business, >50% of products sold from third-party sellers...Bezos seems intent on stressing the economic value Amazon creates for so many, including "560,000 employees, 2 million sellers, hundreds of thousands of authors, millions of AWS developers, and hundreds of millions of divinely discontent customers around the world"...@JeffBezos and I share a belief that perks and benefits are not the best way to create a great culture. Instead high standards are: "people are drawn to high standards – they help with recruiting and retention."
    • @kellabyte (good thread): We could really use a mature stable ring buffer implementation in #golang as an alternative to channels for high performance use cases. I need to pump hundreds of millions through a multi-producer single consumer data structure. Channels can’t keep up with that.
    • @allspaw (good thread): Advocates of “error budgets” - please research past similar approaches to quantifying risk (probabilistic risk assessment (PRA) and human reliability analysis (HRA)). The limitations these methods have and illusions they produce are greater than you might think.
    • @cloud_opinion: They [Google] are victims of internal politicking, currently container faction seems to be winning.
    • @aprilwensel~ Last tweet reminds me… I really need to follow through on my project idea "April on Joel on Software," wherein I'll detail the various toxic beliefs spread by Joel Spolsky's articles and explain how they've contributed to the mess of a tech industry we have now...I'm mostly referring to his early 2000s posts relating to interviews and hiring and what makes a good engineer. Glorifying "smart" at the cost of being a decent person, reinforcing harmful stereotypes, promoting arrogance and elitism, etc. There are no doubt gems in there too. :)
    • Murat: To motivate the importance of the cache hit rate research, Asaf mentioned the following. The key-value cache is 100x faster than database. For Facebook if you can improve its cache hit rate of 98%, by just another additional 1%, the performance would improve 35%.
    • Jacob Richter: As a further step to reduce our costs, we are now running those models on the client side using TensorFlow.js instead of using an autoscaling EC2 container. While this was only a small change using the same code, it resulted in a proportionally massive cost reduction.
    • @JeffDean: We just posted new DAWNBench results for ImageNet classification training time and cost using Google Cloud TPUs+AmoebaNet (architecture learned via evolutionary search).  You can train a model to 93% top-5 accuracy in <7.5 hours for <$50.
    • Packet Pushers: Businesses are forecast to spend $1.67 trillion on technology (hardware, software, and services) in 2018. Roughly half of that spending (50.5%) will come from the IT budget while the other half (49.5%) will come from the budgets of technology buyers outside of IT.  
    • There are so many more quotes...
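Murat's cache-hit arithmetic in the quotes above is easy to check. Assuming, as the quote does, that the cache is 100x faster than the database, average read latency in cache-time units is hitRate * 1 + (1 - hitRate) * 100:

```javascript
// Average read latency in units of one cache access, given a database
// that is 100x slower than the cache (per the quote above).
const avgLatency = hitRate => hitRate * 1 + (1 - hitRate) * 100;

const before = avgLatency(0.98); // 2.98
const after = avgLatency(0.99);  // 1.99
const improvement = (before - after) / before;

console.log(improvement); // ~0.33, close to the ~35% in the quote
```

Going from 98% to 99% halves the miss rate, and since misses dominate the average, the payoff is outsized.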

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Google: A Collection of Best Practices for Production Services

This excerpt—Appendix B - A Collection of Best Practices for Production Services—is from Google's awesome book on Site Reliability Engineering. Worth reading if it hasn't been on your radar. And it's free!


Fail Sanely

Sanitize and validate configuration inputs, and respond to implausible inputs by both continuing to operate in the previous state and alerting to the receipt of bad input. Bad input often falls into one of these categories:

Incorrect data

Validate both syntax and, if possible, semantics. Watch for empty data and partial or truncated data (e.g., alert if the configuration is N% smaller than the previous version).

Delayed data

This may invalidate current data due to timeouts. Alert well before the data is expected to expire.

Fail in a way that preserves function, possibly at the expense of being overly permissive or overly simplistic. We’ve found that it’s generally safer for systems to continue functioning with their previous configuration and await a human’s approval before using the new, perhaps invalid, data.


In 2005, Google's global DNS load- and latency-balancing system received an empty DNS entry file as a result of a file permissions problem. It accepted this empty file and served NXDOMAIN for six minutes for all Google properties. In response, the system now performs a number of sanity checks on new configurations, including confirming the presence of virtual IPs, and will continue serving the previous DNS entries until it receives a new file that passes its input checks.

In 2009, incorrect (but valid) data led to Google marking the entire Web as containing malware [May09]. A configuration file containing the list of suspect URLs was replaced by a single forward slash character (/), which matched all URLs. Checks for dramatic changes in file size and checks to see whether the configuration is matching sites that are believed unlikely to contain malware would have prevented this from reaching production.
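Both incidents suggest cheap input checks. A minimal sketch, with hypothetical names and thresholds (this is not Google's actual code), of the empty-file, catch-all-pattern, and dramatic-shrinkage checks:

```javascript
// Reject configs that are empty, contain an obvious catch-all pattern,
// or are dramatically smaller than the previous version.
// Names and thresholds are illustrative only.
function validateSuspectUrlConfig(newUrls, previousUrls, maxShrinkPercent = 50) {
  if (newUrls.length === 0) {
    return { ok: false, reason: "empty config" };
  }
  if (newUrls.some(u => u === "/" || u === "*")) {
    return { ok: false, reason: "catch-all pattern would match every URL" };
  }
  const shrink =
    100 * (previousUrls.length - newUrls.length) / previousUrls.length;
  if (shrink > maxShrinkPercent) {
    return { ok: false, reason: `config is ${shrink.toFixed(0)}% smaller than previous` };
  }
  return { ok: true };
}

// The 2009 incident: a URL list replaced by a single "/".
console.log(validateSuspectUrlConfig(["/"], ["evil.example/a", "evil.example/b"]));
// On failure, keep serving the previous config and alert a human.
```

The point of failing sanely is the last comment: a rejected config leaves the old one in place rather than taking anything down.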

Progressive Rollouts

Click to read more ...


Stuff The Internet Says On Scalability For April 13th, 2018

Hey, it's HighScalability time:


Bathroom tile? Grandma's needlepoint? Nope. It's a diagram of the dark web. Looks surprisingly like a tumor.

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate if you would recommend my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics. 

  • $23 billion: Amazon spend on R&D in 2017; $0.04: cost to unhash your email address; $35: build your own LIDAR; 66%: links to popular sites on Twitter come from bots; 60.73%: companies report JavaScript as primary language; 11,000+: object dataset provides real objects with associated depth information; 150 years: age of the idea of privacy; 30%~ AV1's better video compression; 100s of years: rare-earth materials found underneath Japanese waters; 67%: better image compression using Generative Adversarial Networks; 1000 bit/sec: data exfiltrated from air-gapped computers through power lines using conducted emissions; 

  • Quotable Quotes:
    • @Susan_Hennessey: Less than two months ago, Apple announced its decision to move mainland Chinese iCloud data to state-run servers.
    • @PaulTassi: Ninja's New 'Fortnite' Twitch Records: 5 Million Followers, 250,000 Subs, $875,000+ A Month via @forbes
    • @iamtrask: Anonymous Proof-of-Stake and Anonymous, Decentralized Betting markets are fundamentally rule by the rich. If you can write a big enough check, you can cause anything to happen. I fundamentally disagree that these mechanisms create fair and transparent markets.
    • David Rosenthal: The redundancy needed for protection is frequently less than the natural redundancy in the uncompressed file. The major threat to stored data is economic, so compressing files before erasure coding them for storage will typically reduce cost and thus enhance data survivability.
    • @mjpt777: The more I program with threads the more I come to realise they are a tool of last resort.
    • JPEG XS~ For the first time in the history of image coding, we are compressing less in order to better preserve quality, and we are making the process faster while using less energy. Expected to be useful for virtual reality, augmented reality, space imagery, self-driving cars, and professional movie editing.
    • Martin Thompson: 5+ years ago it was pretty common for folks to modify the Linux kernel or run cut down OS implementations when pushing the edge of HFT. These days the really fast stuff is all in FPGAs in the switches. However there is still work done on isolating threads to their own exclusive cores. This is often done by exchanges or those who want good predictable performance but not necessarily be the best. A simple way I have to look at it. You are either predator or prey. If predator then you are mostly likely on FPGAs and doing some pretty advanced stuff. If prey then you don't want to be at the back of the herd where you get picked off. For the avoidance of doubt if you are not sure if you are prey or predator then you are prey. ;-)
    • Brian Granatir: serverless now makes event-driven architecture and microservices not only a reality, but almost a necessity. Viewing your system as a series of events will allow for resilient design and efficient expansion. DevOps is dead. Serverless systems (with proper non-destructive, deterministic data management and testing) means that we’re just developers again! No calls at 2am because some server got stuck? 
    • @chrismunns: I think almost 90% of the best practices of #serverless are general development best practices. be good at DevOps in general and you'll be good at serverless with just a bit of effort
    • David Gerard: Bitcoin has failed every aspiration that Satoshi Nakamoto had for it. 
    • @joshelman: Fortnite is a giant hit. Will be bigger than most all movies this year. 
    • @swardley: To put it mildly, the reduction in obscurity of cost through serverless will change the way we develop, build, refactor, invest, monitor, operate, organise & commercialise almost everything. Micro services is a storm in a tea cup compared to this category 5.
    • James Clear: The 1 Percent Rule is not merely a reference to the fact that small differences accumulate into significant advantages, but also to the idea that those who are one percent better rule their respective fields and industries. Thus, the process of accumulative advantage is the hidden engine that drives the 80/20 Rule.
    • Ólafur Arnalds: MIDI is the greatest form of art.
    • Abraham Lincoln: Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
    • @RichardWarburto: Pretty interesting that async/await is listed as essentially a sequential programming paradigm.
    • @PatrickMcFadin: "Most everyone doing something at scale is probably using #cassandra" Oh. Except for @EpicGames and @FortniteGame They went with MongoDB. 
    • Meetup: In the CloudWatch screenshot above, you can see what happened. DynamoDB (the graph on the top) happily handled 20 million writes per hour, but our error rate on Lambda (the red line in the graph on the bottom) was spiking as soon as we went above 1 million/hour invocations, and we were not being throttled. Looking at the logs, we quickly understood what was happening. We were overwhelming the S3 bucket with PUT requests
    • Sarah Zhang: By looking at the polarization pattern in water and the exact time and date a reading was taken, Gruev realized they could estimate their location in the world. Could marine animals be using these polarization patterns to navigate through the ocean? 
    • Vinod Khosla: I have gone through an exercise of trying to just see if I could find a large innovation coming out of big companies in the last twenty five years, a major innovation (there’s plenty of minor innovations, incremental innovations that come out of big companies), but I couldn’t find one in the last twenty five years.
    • Click through for lots more quotes.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Sponsored Post: InMemory.Net, Educative, Triplebyte, Exoscale, Loupe, Etleap, Aerospike, Scalyr, Domino Data Lab, MemSQL

Who's Hiring? 

  • Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • 5 Signs You’ve Outgrown DynamoDB. Companies often select a database that seems to be the best choice at first glance, as well as the path of least resistance, and then are subsequently surprised by cost overruns and technology limitations that quickly hinder productivity and put the business at risk. This seems to be the case with many enterprises that chose Amazon Web Services' (AWS) DynamoDB. In this white paper we’ll cover elements of costing as well as the results of benchmark-based testing. Read 5 Signs You’ve Outgrown DynamoDB to determine if your organization has outgrown this technology.

  • Advertise your event here!

Cool Products and Services

  • Datadog is a cloud-scale monitoring platform that combines infrastructure metrics, distributed traces, and logs all in one place. With out-of-the-box dashboards and seamless integrations with over 200 technologies, Datadog provides end-to-end visibility into the health and performance of modern applications at scale. Build your own rich dashboards, set alerts to identify anomalies, and collaborate with your team to troubleshoot and fix issues fast. Start a free trial and try it yourself.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net
  • For heads of IT/Engineering responsible for building an analytics infrastructure, Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike older enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own. Read stories from customers like Okta and PagerDuty, or try Etleap yourself.

  • Educative provides interactive courses for software engineering interviews created by engineers from Facebook, Microsoft, eBay, and Lyft. Prepare in programming languages like Java, Python, JavaScript, C++, and Ruby. Design systems like Uber, Netflix, Instagram and more. More than 10K software engineers have used Coderust and Grokking the System Design Interview to get jobs at top tech companies like Facebook, Google, Amazon, Microsoft, etc. Ace your software engineering interviews today. Get started now

  • Gartner’s 2018 Magic Quadrant for Data Science and Machine Learning Platforms. Read Gartner’s most recent 2018 release of the Magic Quadrant for Data Science and Machine Learning Platforms. A complimentary copy of this important research report into the data science platforms market is offered by Domino. Download the report to learn: 
    • How Gartner defines the Data Science Platform category, and their perspective on the evolution of the data science platform market in 2018. 
    • Which data science platform is right for your organization. 
    • Why Domino was named a Visionary in 2018.

  • Exoscale GPU Cloud Servers. Powerful on-demand GPU. Perfect for your machine learning, artificial intelligence, and encoding workloads. GPU instances work exactly like other instances: they are billed by the minute and integrate seamlessly with your existing infrastructure. Tap the GPU's full power with direct passthrough access. Speed-up Tensorflow or any other Deep Learning, Big Data, AI, or Encoding workload. Start your GPU instances via our API or with your existing deployment management tools. Add parallel computational power to your stack with no effort. Get Started

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here:

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Make Your Job Search O(1) — not O(n)

Triplebyte is unique because they're a team of engineers running their own centralized technical assessment. Companies like Apple, Dropbox, Mixpanel, and Instacart now let Triplebyte-recommended engineers skip their own screening steps.

We found that High Scalability readers are about 80% more likely to be in the top bracket of engineering skill.

Take Triplebyte's multiple-choice quiz (system design and coding questions) to see if they can help you scale your career faster.

The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don't be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Give Meaning to 100 billion Events a Day - The Analytics Pipeline at Teads

This is a guest post by Alban Perillat-Merceroz, Software Engineer at Teads.

In this article, we describe how we orchestrate Kafka, Dataflow, and BigQuery together to ingest and transform a large stream of events. When adding scale and latency constraints, reconciling and reordering them becomes a challenge; here is how we tackle it.

Teads for Publisher, one of the webapps powered by Analytics


In digital advertising, day-to-day operations generate a lot of events we need to track in order to transparently report campaign’s performances. These events come from:

  • Users’ interactions with the ads, sent by the browser. These events are called tracking events and can be standard (start, complete, pause, resume, etc.) or custom events coming from interactive creatives built with Teads Studio. We receive about 10 billion tracking events a day.
  • Events coming from our back-ends, regarding ad auctions’ details for the most part (real-time bidding processes). We generate more than 60 billion of these events daily, before sampling, and should double this number in 2018.

In this article we focus on tracking events, as they are on the most critical path of our business.

Simplified overview of our technical context with the two main event sources


Tracking events are sent by the browser over HTTP to a dedicated component that, amongst other things, enqueues them in a Kafka topic. Analytics is one of the consumers of these events (more on that below).
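That front edge of the pipeline can be sketched as follows, with an in-memory array standing in for the Kafka producer and hypothetical event field names (Teads' actual component is not shown in the article):

```javascript
// Stand-in for the Kafka producer: in a real deployment this would be
// a Kafka client publishing to the tracking-events topic.
const trackingTopic = [];

// The dedicated ingestion component: take a raw event from the browser,
// stamp it, and enqueue it for downstream consumers such as Analytics.
function enqueueTrackingEvent(rawEvent) {
  const event = {
    type: rawEvent.type,    // e.g. "start", "complete", "pause", "resume"
    adId: rawEvent.adId,
    receivedAt: Date.now(), // used later when reconciling and reordering
  };
  trackingTopic.push(event);
  return event;
}

enqueueTrackingEvent({ type: "start", adId: "ad-123" });
enqueueTrackingEvent({ type: "complete", adId: "ad-123" });
console.log(trackingTopic.length); // 2
```

The key design choice is the same one the article describes: the HTTP component only stamps and enqueues, so ingestion stays fast while consumers like Analytics absorb the heavy transformation work asynchronously.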

We have an Analytics team whose mission is to take care of these events; it is defined as follows:

We ingest the growing amount of logs,
We transform them into business-oriented data,
Which we serve efficiently and tailored for each audience.

To fulfill this mission, we build and maintain a set of processing tools and pipelines. Due to the organic growth of the company and new product requirements, we regularly challenge our architecture.

Why we moved to BigQuery

Click to read more ...


Stuff The Internet Says On Scalability For April 6th, 2018

Hey, it's HighScalability time:


Programmable biology - engineered cells execute programmable multicellular full-adder logics. (Programmable full-adder computations)

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate if you would recommend my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics. 

  • $1: AI turning MacBook into a touchscreen; $2000/month: BMW goes subscription; 20MPH: 15′ Tall, 8000 Pound Mech Suit; 1,511,484 terawatt hours: energy use if bitcoin becomes world currency; $1 billion: Fin7 hacking group; 1.5 million: ethereum TPS, sort of; 235x: AWK faster than Hadoop cluster; 37%: websites use a vulnerable Javascript library; $0.01: S3, 1 Gig, 1 AZ; 

  • Quotable Quotes:
    • Huang’s Law~ GPU technology advances 5x per year because the whole stack can be optimized. 
    • caseysoftware: Metcalfe lives here in Austin and is involved in the local startup community in a variety of ways.  One time I asked him how he came up with the law and he said something close to: "It's simple! I was selling network cards! If I could convince them it was more valuable to buy more, they'd buy more!" As an EE who studied networks, etc in college, it was jarring but audacious and impressive.  He was either BSing all of us ~40 years ago or in that conversation a few years ago.. but either way, he helped make our industry happen.
    • Adaptive nodes: the consensus that the learning process is attributed solely to the synapses is questioned. A new type of experiments strongly indicates that a faster and enhanced learning process occurs in the neuronal dendrites, similarly to what is currently attributed to the synapses
    • @dwmal1: Spotted this paper via @fanf. The idea is amazing: "We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread", and find that several frameworks fail to beat a single core even when given 128 cores. 
    • David Rosenthal: But none of this addresses the main reason that flash will take a very long time to displace hard disk from the bulk storage layer. The huge investment in new fabs that would be needed to manufacture the exabytes currently shipped by hard disk factories, as shown in the graph from Aaron Rakers. This investment would be especially hard to justify because flash as a technology is close to the physical limits, so the time over which the investment would have to show a return is short.
    • @asymco: There are 400 million registered bike-sharing users and 23 million shared bikes in China. There were approximately zero of either in 2016. Fastest adoption curve I’ve ever seen (and I’ve seen 140).
    • @anildash: Google’s decision to kill Google Reader was a turning point in enabling media to be manipulated by misinformation campaigns. The difference between individuals choosing the feeds they read & companies doing it for you affects all other forms of media.
    • The Memory Guy: has recently been told that memory makers’ research teams have found a way to simplify 3D NAND layer count increases.
    • @JohnONolan: First: We seem to be approaching (some would argue, long surpassed) Slack-team-saturation. It’s just not new or shiny anymore, and where Slack was the “omg this is so much better than before” option just a few years ago — it has now become the “ugh… not another one” thing
    • Memory Guy: The industry has  moved a very long way over the last 40 years, but I need not mention this to anyone who’s involved in semiconductors.  In 1978 a Silicon Valley home cost about $100,000, or about the cost of a gigabyte of DRAM.  Today, 40 years later, the average Silicon Valley home costs about $1 million and a gigabyte of DRAM costs about $7.
    • CockroachDB: A three-node, fully-replicated, and multi-active CockroachDB 2.0 cluster achieves a maximum throughput of 16,150 tpmC on a TPC-C dataset. This is a 62% improvement over our 1.1 release. Additionally, the latencies on 2.0 dropped by up to 81% compared to 1.1, using the same workload parameters. That means that our response time improved by 544%.
    • @Carnage4Life: Interesting thread about moving an 11,000 user community from Slack to Discourse. It's now quite clear that Slack is slightly better than email for small groups but is actually worse than the alternatives for large groups
    • Paul Barham: You can have a second computer once you’ve shown you know how to use the first one.
    • Nate Kupp~ petabyte hadoop cluster Apple uses to understand battery life on iphone and ipad looking at logging data coming off those devices.
    • More quotes. More stuff. Go get it.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Do you have too many microservices? - Five Design Attributes that can Help

This is a guest Post by Jake Lumetta, Founder and CEO, ButterCMS, an API-first CMS. For more content like this, follow @ButterCMS on Twitter and subscribe to our blog.

Are your microservices too small or too tightly coupled? Are you confident in your decision-making about service boundaries? In interviews, dozens of experienced CTOs offered the design attributes they consider when creating a set of microservices. This article distills that wisdom into five key principles to help you better design microservices.

The importance of microservice boundaries

The design attributes discussed below matter because reaping the benefits of microservices requires designing thoughtful microservice boundaries.

One of the major challenges of creating a new system with a microservice architecture came up when I mentioned that one of the core benefits of developing new systems with microservices is that the architecture allows developers to build and modify individual components independently, but problems can arise when it comes to minimizing the number of callbacks between each API. The solution, according to McFadden, is to apply the appropriate service boundaries.

But in contrast to the sometimes difficult-to-grasp and abstract concept of domain-driven design (DDD), a framework for microservices, I'll be as practical as I can in this article as I discuss the need for well-defined microservice boundaries with some of our industry's top CTOs.

First, avoid arbitrary rules

Click to read more ...


How ipdata serves 25M API calls from 10 infinitely scalable global endpoints for $150 a month

This is a guest post by Jonathan Kosgei, founder of ipdata, an IP Geolocation API. 

I woke up on Black Friday last year to a barrage of emails from users reporting 503 errors from the ipdata API.

Our users typically call our API on each page request on their websites to geolocate their users and localize their content. So this particular failure was directly impacting our users’ websites on the biggest sales day of the year. 

I only lost one user that day but I came close to losing many more.

This sequence of events, its inexplicable nature (CPU, memory, and I/O were nowhere near capacity), and concerns about how well (if at all) we would scale given our outage were a big wake-up call to rethink our existing infrastructure.

Our tech stack at the time

Click to read more ...


Stuff The Internet Says On Scalability For March 30th, 2018

Hey, it's HighScalability time:


“Objective painting is not good painting unless it is good in the abstract sense. A hill or tree cannot make a good painting just because it is a hill or tree. It is lines and colors put together so that they may say something.” – Georgia O’Keeffe


If you like this sort of Stuff then please support me on Patreon. And I'd appreciate if you would recommend my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.


  • 6,000: new viruses spotted by AI; 300,000: Uber requests per second; 10TB & 600 years: new next-gen optical disk; 32,000: sites running Coinhive’s JavaScript miner code; $1 billion: Uber loss per quarter; 3.5%: global NAND flash output lost to power outage; 100TB: new SSD; 48TB: RAM on one server; 200 million: Telegram monthly active users; 2,000: days Curiosity Rover on Mars; 225: emerging trends; 4,425: SpaceX satellites approved

  • Quotable Quotes:
    • @msuriar: Uber's worst outage ever: - Master log replication to S3 failed. - Logs backup up on primary. - Alerts fire but are ignored. - Disk full on primary. - Engineer deletes unarchived WAL files. - Config error prevents failover/promotion. #SREcon
    • @thecomp1ler: Most powerful Xeon is the 28 core Platinum 2180 at $10k RSP and >200W TDP. Due to the nature of Intel turbo boost it almost always operates at TDP under load. Our ARM is the Centriq 2452 at less than $1400 RSP, 46 cores and 120W TDP, that it never hits. Beats Xeon 9/10 workloads.
    • Arjun Narayan: This is also finally the year when people start to wake up and realize they care about serializability, because Jeff Dean said its important. Michael Stonebraker, meanwhile, has been shouting for about 20 years and is very frustrated that people apparently only care when The Very Important Googlers say its important.
    • @xleem: Simple Recipe: 1) Identify system boundaries 2) Define capabilities exposed 3) Plain english definitions of availability 4) Define technical SLO 5) measure baseline 6) set targets 7) iterate #SREcon
    • Jordan Ellenberg: For seven years, a group of students from MIT exploited a loophole in the Massachusetts State Lottery’s Cash WinFall game to win drawing after drawing, eventually pocketing more than $3 million. How did they do it? How did they get away with it? And what does this all have to do with mathematical entities like finite geometries, variance of probability distributions, and error-correcting codes?
    • Jeff Dean: ML hardware is at its infancy. Even faster systems and wider deployment will lead to many more breakthroughs across a wide range of domains. Learning in the core of all of our computer systems will make them better/more adaptive. There are many opportunities for this.
    • Teller: Here's a compositional secret. It's so obvious and simple, you'll say to yourself, "This man is bullshitting me." I am not. This is one of the most fundamental things in all theatrical movie composition and yet magicians know nothing of it. Ready? Surprise me.
    • @kcoleman: OH (from an awesome Lyft driver): “Today has been great. I’ve been blessed by the algorithm.” Immediately had an eerie feeling that this could become an increasingly common way to describe a day.
    • Jim Whitehurst [Red Hat chief executive]: We added hundreds of customers in the last year, while Pivotal only added 44 new customers. Their average deal size is $1.5 million, quite large. So they are more the top-down, big company kind of focus. We have over 650 customers [for OpenShift], we added hundreds this past year, and we are growing faster than Pivotal. We thought we were performing favorably compared to them, but this is the first time we had the data to really compare.
    • @danctheduck: My GDC takeaway: Everyone who is making games-as-a-service is getting most of their actual traction by building co-op MMOs. But very few of them realize this is what they are doing. So they keep sabotaging their communities with bizarro design philosophies. Short version (1/2) - People want to do fun activities with friends. The higher order bit. Super sticky. - Core gameplay collapses to this. Ex: Team vs Team plus match making is just another way of make 'fair' Team vs Environment (aka coop) - We focus a lot on PvP, competition or esports. But many of those are *aspirational* for players. Not actually desirable. - Or we fixate on single player genre tropes. Which may be a familiar reason to *start* playing, but aren't always key to why people *continue* playing.
    • @iamtrask: Lots of folks are optimistic about #blockchain. I recently came across a difficult question...If a zero-knowledge proof can prove to users that a centralized service performed honest computation, why decentralize it? We live in free markets... I see a correction coming...
    • Katherine Bourzac: Shanbhag thinks it’s time to switch to a design that’s better suited for today’s data-intensive tasks. In February, at the International Solid-State Circuits Conference (ISSCC), in San Francisco, he and others made their case for a new architecture that brings computing and memory closer together. The idea is not to replace the processor altogether but to add new functions to the memory that will make devices smarter without requiring more power. Industry must adopt such designs, these engineers believe, in order to bring artificial intelligence out of the cloud and into consumer electronics.
    • @dylanbeattie: When npm was first released in 2010, the release cycle for typical nodeJS package was 4 months, and npm restore took 15-30 seconds on an average project. By early 2018, the average release cycle for a JS package was 11 days, and the average npm restore step took 3-4 minutes. 1/11
    • @davemark: THREAD:  J.C.R. Licklider was one of the true pioneers of computer science. Back in about 1953, Licklider built something called a Watermelon Box. If it heard the word watermelon, it would light up an LED. That’s all it did. But it was the start of a huge wave. //@reneritchie
    • smudgymcscmudge: I have to admit that the switch from “free software” to “open source” worked on me. Early in my career I was intrigued by the idea, but couldn’t get past how “free” software was a sustainable model. I started to get it at around the same time the terminology changed.
    • @msuriar: CPU attack: spin up something that burns 100% CPU. (openssl or something). What do you expect to happen? What actually happens? #SREcon
    • Forrest Brazeal: The way I describe it is: functions as a service are cloud glue. So if I’m building a model airplane, well, the glue is a necessary part of that process, but it’s not the important part. Nobody looks at your model airplane and says: “Wow, that’s amazing glue you have there.” It’s all about how you craft something that works with all these parts together, and FaaS enables that.
    • You want more quotes? There are lots more. Can you handle the truth? Click through and test yourself.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...