Stuff The Internet Says On Scalability For December 7th, 2018
Friday, December 7, 2018 at 8:59AM
Posted by HighScalability Team in hot links
Wake up! It's HighScalability time:
Do you like this sort of Stuff? Please support me on Patreon. I'd really appreciate it. Know anyone looking for a simple book explaining the cloud? Then please recommend my well-reviewed (31 reviews on Amazon and 72 on Goodreads!) book: Explain the Cloud Like I'm 10. They'll love it and you'll be their hero forever. And if you know someone with hearing problems, they might find Live CC very useful.
- $181.5M: top 10 YouTube star earnings; 153 million/1.2 billion/27 billion: Reddit posts/comments/votes; 12%: cut for Epic's new Steam competitor; 2,368: Baidu AI patents; 360: Walmart cleaning robots; 2:1: AMD CPUs outselling Intel; 11,157: images in Disguised Faces in the Wild dataset; 7,518: FCC approved SpaceX LEO satellites; $40 million: saved Down Under by Tesla’s giant battery; intense beam of ultra-high energy heavy ions: testing AI chips for space; $100 million: bot heist; 16TB: HAMR hard drives; 12.8Tbps: new Tofino ASIC; 85%: cloud-hosted TensorFlow workloads run on AWS; 39%: tech workers depressed; 16%: drop in GPU shipments; 4 x 10⁸⁴: photons ever emitted in the universe; 0.00%: blockchain success rate;
- Quotable Quotes:
- How do you level up in China?
- Is AWS running out of ideas? Not yet, though the pace may be slowing, as must eventually happen when you've reached Total Addressable Feature Space. You can watch re:Invent videos from a curated list or get the whole enchilada on YouTube. Don't want to spend the rest of your life watching video? Recaps to the rescue! The Cloudcast podcast covered All of the 2018 AWS reInvent Announcements. re:Invent 2018 Security Review. Comic Relief's Takeaways from AWS re:Invent. InfoQ's Recap of AWS re:Invent 2018 Announcements. Jeremy Daly with his re:Capping re:Invent. James Beswick with What I learned from AWS re:Invent 2018. There's The somewhat different AWS re:Invent recap. Jennine Townsend with Notable AWS re:invent Sessions. And of course there's Netflix at AWS re:Invent 2018.
- Scaling has always meant specialization. Bleeding-edge hyperscalers throughout history have built custom solutions, and their ideas have trickled down to become industry standard practice. In a fascinating twist, James Hamilton talks about how a similar process works for hardware in the cloud. AWS Inferentia Machine Learning Processor: Whereas in the past it was nearly impossible for an enterprise to financially justify hardware specialization in all but fairly exotic workloads, in the cloud there are thousands to possibly tens of thousands of even fairly rare workloads. Suddenly, not only is it possible to use hardware optimized for a specific workload type, but it would be crazy not to. In many cases it can deliver an order of magnitude in cost savings, consume as little as 1/10th the power, and these specialized solutions can allow you to give your customers better service at lower latency. Hardware specialization is the future. Believing that hardware specialization is going to be a big part of server-side computing going forward, Amazon has had a custom ASIC team focused on AWS since early 2015 and, prior to that, we worked with partners to build specialized solutions. Two years ago at re:Invent 2016, I showed the AWS custom ASIC that has been installed in all AWS servers for many years (Tuesday Night Live with James Hamilton). Even though this is a very specialized ASIC, we install more than a million of these ASICs annually and that number continues to accelerate. In the server world, it’s actually a fairly high volume ASIC.
- Videos from All Things Open are now available.
- Uber has an interesting way of dealing with network lag and offline mode. How Uber’s New Driver App Overcomes Network Lag: Any component of the driver app capable of operating optimistically begins the flow by submitting an optimistic request. An optimistic request has the ability to serialize and deserialize to disk, very similar to a regular network request, and every optimistic request is paired with an optimistic transform. When an optimistic request is submitted to the client, the transform associated with the request is applied immediately to move the app into an optimistic state, making it appear that the request has completed. The optimistic state outputted from the transform will be maintained until a response from the server is received with the actual state, syncing app and server...we have observed that the average time saved per optimistic operation is about 13.5 seconds. Even at this early stage in the new driver app’s life we are totaling over a year’s worth of continuous driver time saved in aggregate each and every day.
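The pattern is worth seeing in miniature. Below is a toy Python sketch of the optimistic request/transform idea as the post describes it; all the names (OptimisticRequest, AppState, optimistic_transform) are illustrative, not Uber's actual API.

```python
import json
from dataclasses import dataclass, field

# A toy version of the optimistic request/transform pattern.
# Names are illustrative, not Uber's actual API.

@dataclass
class OptimisticRequest:
    endpoint: str
    payload: dict

    def serialize(self) -> str:
        # Like a regular network request, an optimistic request can be
        # persisted to disk so it survives restarts while offline.
        return json.dumps({"endpoint": self.endpoint, "payload": self.payload})

@dataclass
class AppState:
    data: dict = field(default_factory=dict)

def optimistic_transform(state: AppState, request: OptimisticRequest) -> AppState:
    # Applied immediately on submit: the app behaves as if the request
    # had already completed.
    return AppState({**state.data, **request.payload})

def on_server_response(server_state: dict) -> AppState:
    # When the real response arrives, the server's actual state replaces
    # the optimistic one, syncing app and server.
    return AppState(dict(server_state))

state = AppState({"trip_status": "idle"})
req = OptimisticRequest("/trip/accept", {"trip_status": "accepted"})
state = optimistic_transform(state, req)   # UI flips instantly, no waiting
print(state.data)                          # {'trip_status': 'accepted'}
```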
- Analysis showed that it was 90% cheaper to run the video transcoders in-house. Here's how Egnyte serves video at scale. The first thing they did was look at the kinds of video users actually consumed and tailored their system to fit. For their global messaging system they started with the latest Google PubSub client but reverted to a previous version, since the latest one would not renew leases for a longer duration. FFmpeg is by far the best free software available for transcoding videos and it has very good HLS support. Their first preference was to leverage a serverless architecture for deploying video transcoders. Video transcoding is a CPU-intensive operation and needs specific hardware like dedicated CPUs (or even GPUs) with enough memory and native ffmpeg installed on it. It was challenging to build this on serverless. They determined Kubernetes suited them better, so they created Alpine docker containers of FFmpeg + Python and deployed these within Kubernetes. They found that video transcoding jobs run faster on GPUs, but doing this isn’t cost effective. The best trade-off between speed and cost was allocating 4 CPUs to each video transcoder job. At 4 CPUs, we were able to process videos at about 25-40% of the video play time. In other words, a 1-hour video would take about 15-25 minutes to transcode. Adding more CPUs to a video transcoder job did not produce linear benefits, so it was best to stick to 4 CPUs per job and instead provision more jobs. They set up a bulkhead between the video service and their regular service because video can go viral. Their video service is based on OpenResty, with all authentication and video discovery written in Lua. It's deployed on a cache on dedicated infrastructure fronted by a dedicated domain name such as media.egnyte.com. This video service does not share any infrastructure components like firewalls, switches, and ISP links with our primary services, which allows us to scale the video service and rate limit users purely on our video needs.
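For a concrete flavor of such a job, here's a minimal Python sketch of a containerized 4-CPU HLS transcode using standard ffmpeg flags. Egnyte hasn't published their exact command, so the flags and names below are assumptions, not their pipeline.

```python
import subprocess

def transcode_to_hls(src: str, dest_playlist: str, cpus: int = 4) -> None:
    """Transcode one video to HLS, capped at 4 CPUs per job.

    A sketch of the kind of job Egnyte describes; exact flags assumed.
    """
    cmd = [
        "ffmpeg", "-i", src,
        "-threads", str(cpus),            # 4 CPUs per job was their sweet spot
        "-codec:v", "libx264",            # H.264 video for broad playback
        "-codec:a", "aac",
        "-hls_time", "6",                 # segment length in seconds
        "-hls_playlist_type", "vod",
        "-f", "hls", dest_playlist,
    ]
    subprocess.run(cmd, check=True)

# transcode_to_hls("talk.mp4", "talk/index.m3u8")
```

Since a 4-CPU job runs at roughly 25-40% of play time and extra CPUs don't scale linearly, the right knob is more concurrent jobs, not bigger ones.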
- IMHO academics don't need to dumb down content to reach a broader audience, they just need to write better. The Myth of ‘Dumbing Down’.
- Put on those shades, the future looks k8s. The State of K8s 2018: Kubernetes has crossed the chasm. About 60% of respondents are using Kubernetes today, and 65% expect to be using the technology in the next year...Half of the organizations running Kubernetes are doing so in production. The bigger and more complex the organization, the more likely they’re already in production; 77% of organizations with more than 1,000 developers and 88% of organizations with more than 1,000 containers...63% are running stateful apps, 53% have entrusted data analytics to the platform and 31% operate IoT apps on Kubernetes... 63% of organizations that have deployed Kubernetes are immediately using their resources more efficiently. And 58% have shortened their software development cycles.
- Good example of evolving a small system to a more complex system. Distributed Systems: When you should build them, and how to scale. A step-by-step guide: My main point is: don’t try to build the perfect system when you start your product. Most of your design choices will be driven by what your product does and who is using it...Focus on figuring out what people need, and try to come up with a solution to their problem, even if it has a lot of manual steps. Then think about ways to automate, spend your time coding and destroying, and use third parties where it makes sense...Don’t scale but always think, code, and plan for scaling. Build your system step by step, don’t address system design issues based on features that are not mature yet, and finally always try to find the best trade-off between the time you will spend and the gain in performance, money, and lowered risk.
- Facebook's Mobile @Scale — Tel Aviv recap. Titles include: Learnings for scaling mobile dev at Facebook; Building for emerging markets.
- Bringing the operational infrastructure in-house is a huge undertaking. It may be less efficient early on, but it has opened the gates for scaling Periscope Data to what it is today and has laid the groundwork for future growth and optimization. 9 Lessons Learned Migrating From Heroku To Kubernetes with Zero Downtime: As we grew, so did our infrastructure requirements. Heroku wasn't able to keep up with all of those requirements and that's why we moved to hosting Kube ourselves...Kubernetes offers an excellent set of tools to manage containerized applications. You can think of it as managing a desired state for the containers...Lesson #1: A reverse proxy app can be a powerful tool to manage HTTP requests...Lesson #2: Horizontally scalable services don't always scale across different deployments...Lesson #3: A carefully chosen ratio of concurrency resources can lead to a highly optimized setup...Lesson #4: Achieve zero downtime during migration by using a reverse proxy app...Lesson #5: Managing releases to two production environments is highly error prone...Lesson #6: Database connection pooling is a great optimization in conserving database resources...Lesson #7: Make sure to explicitly specify CPU and memory request and limit values...Lesson #8: CPU throttling looks like 5xx errors...Lesson #9: Be proactive with training and enablement.
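Lessons #7 and #8 are easy to act on. Here's a minimal sketch using the official kubernetes Python client; the values are illustrative, not Periscope Data's actual settings.

```python
from kubernetes import client

# Lesson #7 in code: set explicit requests and limits so the scheduler
# can place pods correctly, and so CPU throttling (Lesson #8) doesn't
# surprise you as 5xx errors. Values below are illustrative.
resources = client.V1ResourceRequirements(
    requests={"cpu": "500m", "memory": "512Mi"},  # guaranteed to the container
    limits={"cpu": "1", "memory": "1Gi"},         # hard ceiling; CPU beyond
)                                                 # the limit is throttled
container = client.V1Container(
    name="web",
    image="example/app:latest",
    resources=resources,
)
```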
- Didn't we know this three years ago? 'Low code' and 'no code' products is the hottest trend in enterprise startups. Also, The Cloud Is the New OS - A Developer's Perspective.
- Playing games is no game. It takes a lot of work. Riot Games built several competing authentication and authorization implementations running in parallel. But there can be only one. The one is OpenID Connect, which they made their standard for handling authentication and authorization. The how is covered in a richly detailed article on Globalizing Player Accounts. There's also a re:Invent talk. The context: "We deploy League of Legends to 12 disparate game shards in Riot-operated regions, and many more in China and southeast Asia via our publishing partners Tencent and Garena. With 10 clustered databases storing hundreds of millions of player account records, hundreds of thousands of valid logins and failed authentication requests, and over a million account lookups per minute, we have our work cut out for us." They chose the Continuent Tungsten Clustering suite, which consists of a number of processes that live alongside MySQL and wrap around and manage the cluster. Database instances are deployed using Terraform and allocated static IP addresses when provisioned. The services are all containerized and launched via docker-compose, which is written through userdata startup scripts. All general maintenance, restarts, and upgrades are managed by Ansible. It's deployed as a multi-region composite cluster with the intended primary residing in us-west-2. There are three nodes per region, with a primary or relay and two secondaries. A relay functions as a local read-only primary that replicates off of the current global primary node. The secondaries local to each relay replicate off of their local relay node. The connector is configured to send read requests to the most up-to-date node in the same AWS region, but if there is a local outage, requests will be proxied off to the other regions. This global cluster has a single write primary, and each of our backend services that writes to the database connects to the appropriate primary over the DirectConnect backend by leveraging a connector. There's also an interesting story about how high CPU consumption can cause havoc through failed health checks.
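The connector's read-routing rule is simple to state: prefer the most up-to-date healthy node in the local region, and only fall back to remote regions on a local outage. Here's a toy Python sketch of that rule; the real logic lives inside the Tungsten connector, so these names and fields are purely illustrative.

```python
# Toy read routing: local region first, least replication lag within it;
# fall back to remote regions only if nothing local is healthy.

def pick_read_node(nodes, local_region):
    local = [n for n in nodes if n["region"] == local_region and n["healthy"]]
    if local:
        # Most up-to-date = smallest lag behind the global primary.
        return min(local, key=lambda n: n["replication_lag_s"])
    remote = [n for n in nodes if n["healthy"]]
    if not remote:
        raise RuntimeError("no healthy replicas anywhere")
    return min(remote, key=lambda n: n["replication_lag_s"])

nodes = [
    {"region": "us-west-2", "healthy": False, "replication_lag_s": 0.0},
    {"region": "us-west-2", "healthy": True,  "replication_lag_s": 1.2},
    {"region": "eu-west-1", "healthy": True,  "replication_lag_s": 0.4},
]
# Local healthy secondary wins even though a remote node has lower lag:
print(pick_read_node(nodes, "us-west-2"))
```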
- Videos from Code Mesh LDN 2018 are now available. You might love: From quadcopters to helicopters: formal verification for safer vehicles.
- What if all that magic neural net dust isn't necessary? What to do with Big Data? Making ML useful is a platform problem: Despite the widespread collection of data at scale, there’s little evidence that most enterprises are successful in efficiently realizing this value...on this structured data, much simpler models often perform nearly as well. Instead, the bottleneck is in simply putting the data to use...Buried on page 12 of the Supplemental Materials, we see that logistic regression (appearing in lecture 3 of our intro ML class at Stanford) “essentially performs just as well as Deep Nets” for these predictive tasks, coming within 2-3% accuracy without any manual feature engineering...For many use cases, putting data to work doesn’t require a new deep network, or more efficient neural architecture search. Instead, it requires new software tools...Help navigate organizations’ existing data at scale...Provide results users can trust...Work alongside users.
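The claim is easy to test on your own structured data: fit a logistic regression first and treat it as the bar any deep net has to clear. A minimal scikit-learn sketch, with synthetic data standing in for the clinical records in the cited study:

```python
# Baseline-first modeling: on structured data, benchmark logistic
# regression before reaching for a deep net.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"logistic regression accuracy: {baseline.score(X_te, y_te):.3f}")
# If a deep net beats this by only 2-3%, the bottleneck probably
# isn't the model -- it's putting the data to use.
```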
- AWS Aurora vs Google Cloud SQL Pricing. AWS Aurora (Reserved Instance): $145.23/mo. Google Cloud SQL: $322.62/mo. Also, How to Figure Out AWS Pricing for a Mobile Application.
- We fight against it, but every force is aligning in the direction of removing humans from the hunt-kill and other OODA loops. And we know how humans love to take the easier path. In the military context, it’s going to be manned and unmanned teaming. No one believes that all pilots will leave all the airplanes. It’s not going to happen. There’s only so many variables that you can program into any piece of software. A retired Navy captain explains how drones will shape the future of war: It used to be that a warrior prepares, trains, deploys to a foreign location where he is face-to-face with an enemy, he may or may not survive, and at the end, he comes home...One of the other things that’s important for drones is not only that there is no pilot or crew aboard, but they also have the ability to stay over the target for 24 hours or more...there’s an unmanned surface ship that’s in sea trials right now. It’s 132 feet and called the Sea Hunter, and it’s designed to go off and do missions of up to 10,000 miles on a single tank of fuel with no one on board...There’s another interesting project that attempts to take relatively small unmanned aerial vehicles, launch them out of the back of a cargo plane, have them do their mission, and recover them in midair and bring them back into the airplane...Another focus will be transportation on roads. The biggest killer of soldiers is improvised explosive devices, so work is being done on automated convoys...when you talk about unmanned submarines, you just can’t talk to them when they’re below the water. So you have to make sure you have secure communications. That’s a big vulnerability. Also, Artificial Intelligence, China And The U.S. - How The U.S. Is Losing The Technology War. Also also, Chip wars: China, America and silicon supremacy.
- The good thing about starting over is you get to pick a new stack. React Native at Picnic: we decided to build this app in React Native instead...an important requirement for the new application was that there should be no device or operating system lock-in...needs to operate well under uncertain networking conditions. Hence, offline support is very important...we decided to use [TypeScript] instead of Flow...For navigation, we use React Navigation...we use Microsoft CodePush...For state persistence, we use redux combined with redux-persist for offline support...axios as our HTTP client...On the UI-side, we use styled components for styling and storybook to document our UI components. Snapshots are automatically generated for each story by using StoryShots and React Native Storybook Loader...Any argument about syntax, we defer to Prettier. Finally, as the cherry on the cake, we use husky to run pre-commit and pre-push hooks that verify that all code that we check in is up to the standards that we have set for ourselves.
- The ETL pipeline operates a micro-batching window of one minute and processes a few billion events per day. The pipeline runs on our YARN cluster and uses 64 single core containers with 8 GB of memory. Sessionizing Uber Trips in Real Time: We refer to the data underlying each trip as a session, which begins when a user opens the Uber app...A typical trip lifecycle like this might span across six distinct event streams, with events generated by the rider app, driver app, and Uber’s back-end dispatch server. These distinct event streams thread into a single Uber trip...How do we contextualize these event streams so they can be logically grouped together and quickly surface useful information to downstream data applications? The answer lies in defining a time-bounded state machine modeling the flow of different user and server-generated events towards completion of a single task. We refer to this type of state machine, consisting of raw actions, as a “session.”...Putting all the relevant events for our session lifecycle in one place unlocks a wide variety of use cases, such as: our Demand Modeling team can compare app impressions, and our Forecasting team can see how many sessions are in the Shopping state within a given area during a particular time window...We used Spark Streaming to implement the Rider Session State Machine...we’re looking at moving to Flink due to its deeper support for out-of-the-box event time processing and wider support at Uber...Clock synchronization: Given the wide array of handsets and variations of mobile operating systems, not to mention user settings, you can never really trust the timestamps sent from mobile clients...Back-pressure and rate limiting: Spark Streaming uses a PID rate estimator to control the input rate of subsequent batches.
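A session here is just a keyed, time-bounded state machine folded over merged event streams. A toy Python sketch of the idea; the states, transitions, and timeout below are illustrative, not Uber's actual values.

```python
# Toy time-bounded session state machine over merged event streams.
# States, transitions, and the timeout are illustrative.

SESSION_TIMEOUT_S = 30 * 60  # expire a session after 30 idle minutes

TRANSITIONS = {
    ("shopping", "request_ride"): "requested",
    ("requested", "driver_accept"): "on_trip",
    ("on_trip", "trip_complete"): "completed",
}

class Session:
    def __init__(self, user_id, opened_at):
        self.user_id = user_id
        self.state = "shopping"      # a session begins when the app opens
        self.last_event_ts = opened_at

    def on_event(self, name, ts):
        # Events from the rider app, driver app, and dispatch server are
        # keyed by user and folded into one session.
        if ts - self.last_event_ts > SESSION_TIMEOUT_S:
            self.state = "shopping"  # stale session: treat as a fresh open
        self.state = TRANSITIONS.get((self.state, name), self.state)
        self.last_event_ts = ts

s = Session("rider-42", opened_at=0)
for name, ts in [("request_ride", 60), ("driver_accept", 90), ("trip_complete", 1200)]:
    s.on_event(name, ts)
print(s.state)  # completed
```

In a streaming engine this per-key fold maps onto stateful operators (Spark Streaming's mapWithState, or Flink's keyed state with event-time timers), which is why Flink's out-of-the-box event time support is attractive here.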
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (article): GPipe is a scalable pipeline parallelism library that enables learning of giant deep neural networks. It partitions network layers across accelerators and pipelines execution to achieve high hardware utilization. It leverages recomputation to minimize activation memory usage. For example, using partitions over 8 accelerators, it is able to train networks that are 25× larger, demonstrating its scalability. It also guarantees that the computed gradients remain consistent regardless of the number of partitions. It achieves an almost linear speedup without any changes in the model parameters: when using 4× more accelerators, training the same model is up to 3.5× faster. We train a 557-million-parameter AmoebaNet model and achieve a new state-of-the-art 84.3% top-1 / 97.0% top-5 accuracy on ImageNet.
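The scheduling trick is easy to show in miniature: split the layers into stages (one per accelerator), split the mini-batch into micro-batches, and let stage k work on micro-batch i+1 while stage k+1 works on micro-batch i. A toy Python sketch of the forward pass follows; real GPipe places each stage on its own accelerator and also handles the backward pass and recomputation.

```python
# Toy GPipe-style pipelined forward pass. At clock step t, stage k
# processes micro-batch t - k, so all stages stay busy once the
# pipeline fills.

def pipeline_forward(stages, micro_batches):
    n_stages, n_micro = len(stages), len(micro_batches)
    outputs = list(micro_batches)
    for step in range(n_stages + n_micro - 1):
        for k in range(n_stages):
            i = step - k
            if 0 <= i < n_micro:
                outputs[i] = stages[k](outputs[i])  # ideally on device k
    return outputs

# Four "accelerators", each owning a slice of the layers:
stages = [lambda x: x + 1 for _ in range(4)]     # stand-ins for layer slices
print(pipeline_forward(stages, [0, 10, 20, 30])) # -> [4, 14, 24, 34]
```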
- FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network: FastGRNN then extends the residual connection to a gate by reusing the RNN matrices to match state-of-the-art gated RNN accuracies but with a 2-4x smaller model. Enforcing FastGRNN’s matrices to be low-rank, sparse and quantized resulted in accurate models that could be up to 35x smaller than leading gated and unitary RNNs. This allowed FastGRNN to accurately recognize the “Hey Cortana” wakeword with a 1 KB model and to be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models.
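The cell itself is compact: a single gate that reuses the candidate state's W and U matrices, plus two trainable scalars weighting the residual connection. A numpy sketch of one step, following the update rule in the paper; the matrix shapes and hyperparameter values below are illustrative.

```python
import numpy as np

# One FastGRNN step: z_t = sigmoid(W x_t + U h_{t-1} + b_z),
# h~_t = tanh(W x_t + U h_{t-1} + b_h),
# h_t = (zeta * (1 - z_t) + nu) * h~_t + z_t * h_{t-1}.
# W and U are shared by gate and candidate, which is the 2-4x size win;
# making them low-rank, sparse, and quantized shrinks the model further.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fastgrnn_step(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
    pre = W @ x_t + U @ h_prev               # computed once, used twice
    z = sigmoid(pre + b_z)                   # gate
    h_tilde = np.tanh(pre + b_h)             # candidate state
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev

# Tiny demo: 8-dim input, 16-dim hidden state, five time steps.
rng = np.random.default_rng(0)
W, U = rng.normal(size=(16, 8)), rng.normal(size=(16, 16))
h = np.zeros(16)
for x in rng.normal(size=(5, 8)):
    h = fastgrnn_step(x, h, W, U, 0.0, 0.0, zeta=1.0, nu=1e-4)
print(h.shape)  # (16,)
```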