advertise
Monday
Dec112017

Netflix: What Happens When You Press Play?

 

This article is a chapter from my new book Explain the Cloud Like I'm 10. The first release was written specifically for cloud newbies. I've made some updates and added a few chapters—Netflix: What Happens When You Press Play? and What is Cloud Computing?—that level it up to a couple ticks past beginner. I think even fairly experienced people might get something out of it.

So if you are looking for a good introduction to the cloud or know someone who is, please take a look. I think you'll like it. I'm pretty proud of how it turned out. 

I pulled this chapter together from dozens of sources that were at times somewhat contradictory. Facts on the ground change over time and depend who is telling the story and what audience they're addressing. I tried to create as coherent a narrative as I could. If there are any errors I'd be more than happy to fix them. Keep in mind this article is not a technical deep dive. It's a big picture type article. For example, I don't mention the word microservice even once :-)

 

Netflix seems so simple. Press play and video magically appears. Easy, right? Not so much.

 

Given our discussion in the What is Cloud Computing? chapter, you might expect Netflix to serve video using AWS. Press play in a Netflix application and video stored in S3 would be streamed from S3, over the internet, directly to your device. 

A completely sensible approach…for a much smaller service. 

But that’s not how Netflix works at all. It’s far more complicated and interesting than you might imagine.

To see why let’s look at some impressive Netflix statistics for 2017.

  • Netflix has more than 110 million subscribers.
  • Netflix operates in more than 200 countries. 
  • Netflix has nearly $3 billion in revenue per quarter.
  • Netflix adds more than 5 million new subscribers per quarter.
  • Netflix plays more than 1 billion hours of video each week. As a comparison, YouTube streams 1 billion hours of video every day while Facebook streams 110 million hours of video every day.
  • Netflix played 250 million hours of video on a single day in 2017.
  • Netflix accounts for over 37% of peak internet traffic in the United States.
  • Netflix plans to spend $7 billion on new content in 2018. 

What have we learned? 

Netflix is huge. They’re global, they have a lot of members, they play a lot of videos, and they have a lot of money.

Another relevant factoid is Netflix is subscription based. Members pay Netflix monthly and can cancel at any time. When you press play to chill on Netflix, it had better work. Unhappy members unsubscribe.

Netflix operates in two clouds: AWS and Open Connect.

How does Netflix keep their members happy? With the cloud of course. Actually, Netflix uses two different clouds: AWS and Open Connect. 

Both clouds must work together seamlessly to deliver endless hours of customer-pleasing video.

The three parts of Netflix: client, backend, CDN.

You can think of Netflix as being divided into three parts: the client, the backend, and the CDN. 

The client is the user interface on any device used to browse and play Netflix videos. It could be an app on your iPhone, a website on your desktop computer, or even an app on your Smart TV. Netflix controls each and every client for each and every device. 

Everything that happens before you hit play happens in the backend, which runs in AWS. That includes things like preparing all new incoming video and handling requests from all apps, websites, TVs, and other devices.

Everything that happens after you hit play is handled by Open Connect. Open Connect is Netflix’s custom global content delivery network (CDN). When you press play the video is served from Open Connect. Don’t worry; we’ll talk about what this means later.

Interestingly, at Netflix they don’t actually say hit play on video, they say clicking start on a title. Every industry has its own lingo.

By controlling all three areas—client, backend, CDN— Netflix has achieved complete vertical integration. 

Netflix controls your video viewing experience from beginning to end. That’s why it just works when you click play from anywhere in the world. You reliably get the content you want to watch when you want to watch it. 

Let’s see how Netflix makes that happen.

In 2008 Netflix Started Moving to AWS

Click to read more ...

Friday
Dec082017

Stuff The Internet Says On Scalability For December 8th, 2017

Hey, it's HighScalability time: 


AWS Geek creates spectacular visual summaries.

 

If you like this sort of Stuff then please support me on Patreon. And please recommend my new book—Explain the Cloud Like I'm 10—to those looking to understand the cloud. I think they'll like it.


  • 127 terabytes: per year growth in blockchain if bitcoin wins; 4: hours from tabula rasa to chess god; 1.4 billion: Slack jobs per day; 400: hyperscale data centers worldwide by 2018; 9.8X: Machine Learning Engineer job growth; 14%: Ethereum transactions are for Cryptokitties; 80: seconds per hash on 55 year old IBM 1401 mainframe; $110 billion: app stores spending in 2018; 25: years since first text message; 4,000: AWS code pushes per day; two elephants: of space dust hits earth every day; 

  • Quotable Quotes:
    • @DavidBrin: Now that's what I call engineering! [Voyager 1] Thrusters that haven't been used in 37 years - still reliable!
    • drkoalamanSo despite not supporting other cryptos the majority of my time on the DNM's I think its officially time to step away from bitcoin, at least for the time being. Went to do a direct deal today with a vendor, realized my $250 purchase would end up costing me $315 or so with fees and would still take probably 24 hours to get to him. As of this morning the lowest electrum fee was approx $32 to send coin.... and people reporting at the highest level still not having coin move 12-16 hours later. Vendors are loving this surge but its creating a sellers market and backlogging the blockchain and fees are just crazy... Not to mention not knowing if your $250 will be worth $300 when it gets to the vendor or a random drop in BTC causing it to be less...
    • Alex Lindsay: 30 years ago you couldn’t get cash on Sunday. Now you can send cash on your watch.
    • @prestonjbyrne: “We’re launching on Ethereum” == “100% uptime, unless someone makes a cat app, at which point all bets are off”
    • @GossiTheDog: So I got somebody to talk, without names, about one of the big S3 bucket leaks. A developer set a bucket to open by mistake. They had open S3 bucket monitoring scripts running and got warning emails, which nobody did anything with - nobody had ownership of S3 buckets.
    • @jaksprats: reInvent 2017 Amazon Time Sync Service … Prediction: by reInvent 2018 either they build their own Spanner or they acquire CockroachDB
    • @PatrickMcFadin: 4/ I don’t think we’ll see many more big AWS database announcements after this year. What they have is “good enough” for them and the consensus is they are moving to AI and “everything Alexa” quickly.
    • Eric Horvitz~ in 50 summers, the aviation industry went from canvas flopping on a beach to the Boeing 707...And what is this thing called consciousness, that we use the word consciousness to refer to. Where does that come from? What are these subjective states? We have no idea. We have theories and reflections, but they are not really based in any scientific theories just yet. However, is it possible in 50 summers, we have a whole new world. We have big surprises. We understand how minds work.
    • @martinkl: Google Realtime API is shutting down … — It’s so risky to rely on proprietary services for building apps.
    • @jeremiahg: Equifax’s stock price isn’t recovering post-breach as expected. If the stock remains flat over the next two months, it’ll been interesting to discuss why — what made them different.
    • @xmal: A possible solution to the Fermi Paradox is that any sufficiently advanced civilization is dedicating all its resources to bitcoin mining.
    • @rbhar90: The AI future where megacorporations control enormous datasets and near infinite compute letting them machine learn to predict our every action terrifies me. At NIPS, it's clear this future is nearer rather than further.
    • Sue Hartley: This plant has built this little structure. It's sort of like a barracks for the ant army. And they live inside. When herbevores arrive the ant army comes out and attacks the herbivores...The plant can spend up to 20% of its resources housing and feeding its army.
    • Netflix: With great elasticity comes great responsibility.
    • Michael Widmann: There’s a change in the (NATO) mindset to accept that computers, just like aircraft and ships, have an offensive capability. I need to do a certain mission and I have an air asset, I also have a cyber asset. What fits best for the me to get the effect I want?
    • There are so many more quotes. More. More. More. More...

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Dec052017

Sponsored Post: Symbiont, Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, Zohocorp

Who's Hiring? 

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • On-demand Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. In this session, guest speakers Michele Goetz, Principal Analyst at Forrester Research and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix, discuss: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. View now

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • The Practical Guide to Managing Data Science at Scale. The ability to manage, scale, and accelerate an entire data science discipline increasingly separates successful organizations from those falling victim to hype and disillusionment. Download this practical guide for data science management, if you're currently, or aspiring to be, a data science manager. The paper demystifies and elevates the current state of data science management.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, this includes apps with 30 million users. With your help we'd like to ad a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Click to read more ...

Monday
Dec042017

The Eternal Cost Savings of Netflix's Internal Spot Market

 

Netflix used their internal spot market to save 92% on video encoding costs. The story of how is told by Dave Hahn in his now annual A Day in the Life of a Netflix Engineer. Netflix first talked about their spot market in a pair of articles published in 2015: Creating Your Own EC2 Spot Market Part 1 and Part 2.

The idea is simple:

  • Netflix runs out of three AWS regions and uses hundreds of thousands of EC2 instances; many are underutilized at various parts in the day.

  • Video encoding is 70% of Netflix’s computing needs, running on 300,000 CPUs in over 1000 different autoscaling groups.

  • So why not create a spot market out of their own underutilized reserved instances to process video encoding?

Before proceeding let's define what a spot market is:

Spot Instances enable you to request unused EC2 instances, which can lower your Amazon EC2 costs significantly. The hourly price for a Spot Instance (of each instance type in each Availability Zone) is set by Amazon EC2, and adjusted gradually based on the long-term supply of and demand for Spot Instances. Your Spot Instance runs whenever capacity is available and the maximum price per hour for your request exceeds the Spot price.

At any point in time AWS has a lot of underutilized instances. It turns out so does Netflix. To understand why creating an internal spot market helped Netflix so much, we'll first need to understand how they encode video.

How Netflix Encodes Video

Click to read more ...

Friday
Dec012017

Stuff The Internet Says On Scalability For December 1st, 2017

Hey, it's HighScalability time: 

  Isn't this all of software? @thomasfuchs: Here we see a group of JavaScript engineers implementing a method that adds two numbers

 

If you like this sort of Stuff then please support me on Patreon. And there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • 82%: chance a file on GitHub is a duplicate; 11: new AWS regions; 42%: AWS yearly growth; 1,100: new AWS services in 2017; 300%: year of year growth in Lambda; 00000000: code to launch a Minuteman missile; 100 megawatts in 100 days: biggest battery in the world; 40: months in prison for VW engineer; 3,000 cores: Raspberry Pi cluster; 11: lost cities found by building a database from 4,000-year-old clay tablets; 1.25 million: Riot Games builds per year; 41.78: miles walked at reinvent; 

  • Quotable Quotes:
    • @gigastacey: This FCC is going to destroy net neutrality, strangle competition in media, let wireline providers off the hook for replacing copper with fiber or an equivalent to copper AND kill broadband access for the poor. This is an unprecedented attack on consumers.
    • @randyshoup: “My service is stateless, by which I mean I have state, but I store it somewhere else.” @samnewman #reInvent
    • @StuFlemingNZ: "Hi, I've found a fault with the English language and I need an entomologist" "An etymologist you mean?" "Νo. It's a bug, not a feature"
    • @copyconstruct: The future where "all the code you ever write is business logic" is one that will be facilitated by the huge cloud providers, leaving most infra startups either acquired or in the dust.
    • Mark Callaghan: At high-concurrency mysqld with jemalloc or tcmalloc can get ~4X more QPS on sysbench read-only compared to mysqld with glibc malloc courtesy of memory allocation stalls.
    • @aisipos: AWS Lambda functions can now use top memory size of 3GB. #reinvent2017
    • @cloud_opinion: It feels like AWS is putting more stress on containers than on serverless - Is it because they want to balance long game with short term revenue to fund the retail business? #reInvent
    • @__apf__: "how was your day" "today I parallelized a thing and slowed it down 100x" "you mean sped it up 100x?" "nope"
    • @mipsytipsy: It’s this simple: if you don’t sample, you don’t scale.
    • Daniel Dennett: The key insight, which I’ve known for years, is that we have to get away from the idea of there being the pure ultimate fixed proposition that captures the information in any informational state.
    • @kelseyhightower: I need to put my hands on EKS before I can speak on it, but my initial reaction: this is a good thing for the community and adds weight to the Kubernetes anywhere promise.
    • @Koffie_kopjes: Ok, so far for #Cloud9 It could be a great IDE, but requiring third party cookies.... thought @Werner told developers are the new security team, but if they require third party cookies in 2017, they aren't very aware... ;) #security #reinvent2017
    • @GossiTheDog: I honestly think IT is backsliding in InfoSec across the world at the moment. I’ve said it before, but a decade ago we had two factor VPNs etc - now there’s a massive tilt towards open RDP, AWS keys everywhere etc etc.
    • melissa mcewen: If I won the lottery would I still code? I would, but it would not be like work. It would be projects I enjoyed. And it would be fewer hours.
    • olalonde: I feel like that should be the other way around. When all the "blockchain startups" and ICOs blow up, Bitcoin will be left standing. The true innovation behind the "blockchain" was its decentralised consensus mechanism. That mechanism is only secure as long as no single entity controls over 50% of the hash rate. Some of the largest Bitcoin miners have so much hash rate today that they could attack any (SHA-256 based) blockchain but the Bitcoin one.
    • @ben11kehoe: "The future" in this keynote is apparently 2020, which will still be containers for most customers. #serverless is on a bit longer timeline for the masses #reinvent
    • @irwin: I’ve seen things you people wouldn’t believe. Gopher, Netscape with frames, the first Browser Wars. Searching for pages with AltaVista, pop-up windows self-replicating, trying to uninstall RealPlayer. All those moments will be lost in time, like tears in rain. Time to die. 
    • Andy Jassy: We're just at the beginning of mainstream enterprise mass migration to the cloud...The torrid pace of adoption and innovation in the serverless (Lambda) space has totally blown us away...in fact, he says that if Amazon.com were starting today, it would go serverless
    • Andy Jassy: In our [AWS] business, you have to be able to have access to capital. It's part of why I think it's hard at the scale that we're operating at. It’s hard for others to start from scratch and pursue it because not only do you need hundreds of services to have a competitive offering, but you need large amounts of capital.
    • RightScale: 70 percent of the 104 price points we include in our comparison have gone down since our last comparison in April 2017. Although these comprise a fraction of the total price points, they represent some of the most commonly used instances
    • RightScale: Overall, Azure is the cost leader, with the lowest price across scenarios about 71% of the time with the highest price just 8% of the time. AWS fell in the middle, while surprisingly, Google Cloud had the highest price half the time, 
    • Takashi Nishikawa: The power grid is quite robust against the propagation of failures — perhaps surprisingly robust, when we consider all the complexities involved
    • @vgcerf: Today is the 40th anniversary of the first three-network test of the Internet Protocols: joining ARPANET, Packet Radio and Packet Satellite networks linking the US and Europe!
    • Andy Jassy: What's different is with every successive year, as we launch a thousand plus features and services, we just have the capabilities to make it easier for the rest of the market to use us. So I think the total addressable market for the areas that we touch, which is infrastructure software, hardware and data center services, is trillions of dollars
    • @GossiTheDog: Again: stop paying the ransoms. We’re creating a billion dollar criminal industry instead of, well, setting up backups. We are monetising low skill crime.
    • @cogconfluence: Asked a bunch of mechanical turkers what one question they would ask to determine if they were talking to a human or AI. fave reply: When is the last time your teeth felt like they had little sweaters on them?
    • @EricJorgenson:  I still find this concept absolutely staggering: "On a daily basis, 15 percent of searches -- 500 million -- have never been seen before by Google's search engine, and that has continued for 15+ years"
    • @mijndert: Things I’m most excited about from the @awscloud #reInvent announcements: Fargate, EKS, Launch Templates, Aurora Multi-Master, Aurora Serverless, MediaLive, Inter-Region VPC Peering, and GuardDuty.
    • @0x424c41434b: You are probably tired of hearing me talk about rust but one reason I like it is that, I feel like a better programmer because it takes out that fear of something going wrong. Concentrating on the logic only made me do things much quicker than I did in the past. More confidence
    • Steve Konves: For those of us developers who have a unwavering love for our craft, there will always exist a bias to make decisions based on our passion for coding rather than profitability or cost savings.
    • @dcaoyuan: With new tuned #akka http client, our crawler can fetch and process 300k+ web page per day, 100 millis per year, on a 16 cores 64G memory machine.
    • @benschwarz: When Amazon changed their pricing to per minute billing I implemented an aggressive autoscaling policy. This policy (with tweaks and improvements along the way) has reduced EC2 costs by >30% and improved service dramatically.
    • @CodyBrown: Seriously. I don’t think people quite understand how many lawyers are getting into Space Law right now. Satellite internet is so feasible and the economics are changing. Massssssive terrestrial infrastructure is about to get competition
    • @Silver_Watchdog: The Evolution of Bitcoin 1. It's the future of global payments. The revolution! 2. So what about Mt.Gox and robo traders. Stocks are rigged too. 3. Yes it forks and creates more supply. Your point? 4. Everyone knows it's all traded for speculation and not used for payments. Geez
    • DHH: Etsy corrupted itself when it sold its destiny in endless rounds of venture capital funding. This wasn’t inevitable, it was a choice. One made by founders and executives who found it easier to ask investors for money than to develop the habits and skills to ask customers.
    • @bitfield: “In cloud-native, network issues—mapping IP addresses, latency, retries—are now falling into the lap of developers.”
    • There's more. So much more more more more.

    Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

    Click to read more ...

Tuesday
Nov212017

Sponsored Post: Symbiont, Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, Zohocorp

Who's Hiring? 

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • On-demand Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. In this session, guest speakers Michele Goetz, Principal Analyst at Forrester Research and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix, discuss: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. View now

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • The Practical Guide to Managing Data Science at Scale. The ability to manage, scale, and accelerate an entire data science discipline increasingly separates successful organizations from those falling victim to hype and disillusionment. Download this practical guide for data science management, if you're currently, or aspiring to be, a data science manager. The paper demystifies and elevates the current state of data science management.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, this includes apps with 30 million users. With your help we'd like to ad a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Click to read more ...

Friday
Nov172017

Stuff The Internet Says On Scalability For November 17th, 2017

Hey, it's HighScalability time: 


The BOSS Great Wall. The largest structure yet found in the universe. Contains 830 galaxies. A billion light years across. 10,000 times the mass of the Milky Way.

 

If you like this sort of Stuff then please support me on Patreon. And there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • $25 billion: Alibaba's Singles' Day sales; 6+ million: Slack daily active users; 4ms: boot time for a unikernel based VM; 1 billion: out of date Android devices; 10-20%: increase in RAM prices; 8 million: lines of code in F-35; $3 million: lost by Isaac Newton in the stock market; 30: it's RAID's birthday!; thousands: bugs fixed with Pentagon hackathon; 6+ terabytes: earth satellite data downloaded per day; 

  • Quotable Quotes:
    • Berners-Lee: When I invented the web, I didn’t have to ask Vint Cerf [the ‘father of the internet’] for permission to use the internet
    • Germaine de Stael: Ridicule dries up the imagination.
    • Alex Hudson: A lot of technical write-ups focus on scaling, performance and large-scale systems. It’s definitely interesting to see what problems Netflix have, and how they respond to them. It’s important to understand why Google take decisions in the way they do. However, most of their problems don’t apply to anyone else, and therefore many of the solutions may or may not be appropriate.
    • @jpetazzo: Step functions: they're great, but they don't support dynamic fan out (i.e. invoking an arbitrary number of "sub-lambdas" in parallel).
    • parasubvert: Perhaps one of the lessons of architecture that is missing is to teach people how to evaluate tradeoffs, or in other words, “taste”. I don’t think we’ve ever really had good taste as an industry. Buzzword bingo has always ruled, with some exceptions.
    • Calvin Biesecker: The cost to change one line of code on a piece of avionics equipment is $1 million, and it takes a year to implement. For Southwest Airlines, whose fleet is based on Boeing’s 737, it would “bankrupt” them if a cyber vulnerability was specific to systems on board 737s, he said, adding that other airlines that fly 737s would also see their earnings hurt.
    • @QConSF: @natekupp shares some of Thumbtack's learnings on their journey to scale: from a PHP/PostgreSQL monolith with a self-managed Hadoop cluster, to Dockerized #microservices paired with managed/serverless data infrastructure #qconsf
    • Bail Bloc: Mine Monero, waste electricity, generate CO2 and send less money to a charity than you could have just sent directly! What’s not to like?
    • @Xof: The notion that the only way to be a good programmer is to let it consume your life is toxic.
    • @swardley: AMZN is now worth an IBM + Oracle + CISCO and you'd still have enough  change left over to buy most of VMware. Not bad for a decade of growth.
    • @swardley: I've been a bit gobsmacked by who is using Lambda recently ... there was me thinking that big / traditional enterprise would be testing the waters slowly. How wrong.
    • @crichardson: If GoLang becomes #1 it will primarily due to fashion rather fitness for purpose. It's far too low level/lacking in expressiveness for many kinds of applications. Eg. Enterprise/business applications. It is not what Java's successor should be.
    • Dropbox: IPv6 does show slightly better performance over IPv4. However, without detailed client-side and network information, it is hard to say definitely where the IPv6 performance gain is from.
    • Stack Overflow: Two tags stand out in this analysis, both with tremendous growth, and they have something in common. Swift is Apple’s language for developing iOS apps that is a successor to Objective-C, and the angular tag
    • @Falkvinge: I've said it before and I'm saying it again and again: in order to beat old-world banking, crypto must be at least an order of magnitude better. Old-world banking offers free instant tx between private accts, and 15-cent txs to merchant accounts. Beat that or be obsoleted.
    • Alex Hudson: I want to hear more about projects that deferred decisions and put off architecting until much later in the process. I want to hear more about delivery at real speed. Small pieces of software that are not necessarily interesting but deliver business value are the real heroes in our industry, and the developers who create them the real stars. I especially want to hear more about developers working with systems that have constraints. I want to hear from people pushing standard stuff beyond its limits. I think we grossly underestimate what off-the-shelf systems can do, and grossly overestimate the capabilities of the things we develop ourselves. It’s time to talk much more about real-world, practical, medium-enterprise software architecture.
    • David Gerard: BTC is very clogged at the moment, with around 100,000 unconfirmed transactions as I write this, and peaks of 160,000 a few days ago. Transaction fees peaked at around $20 just to get your transaction through. This wasn’t helped by long delays between blocks, as mining capacity moved to BCH — the time between blocks peaking at 63 minutes a few days ago, on 11 November. fork.lol, which charts the relative profitability of the two, was overloaded and inaccessible. If shutdowns of mining progress in China, then whoever remains in mining will become the power. This is currently divided between Iceland, India, Japan, Georgia and the Czech Republic.
    • linkmotif: There’s a very common ethos that if people just focused on shipping they would somehow magically ship but that’s not how software works. You can’t just will shipping. You need to know what you’re doing.
    • sp527: I had a serious epiphany when I read that Braintree managed to vertically scale a two node (“HA”) Postgres setup to transaction volume in the millions and a massive valuation. Stack Overflow has had a similarly lean footprint for much of its history.
    • @ben11kehoe: This graphic from @googlecloud App Engine is nonsense. GAE literally makes you select instance sizes
    • @danielbryantuk: "Any change made to a complex adaptive system is a gamble. We mitigate risks, but we can't eliminate them" @relix42 #qconsf
    • ivanstepin: Flickr implemented Lanczos algorithm while Discord uses near-neighbor ( much less resource-consuming, but with slightly less quality ) algo. It may turn out that the gpu mem<->cpu mem data transfer can eat all the benefits for such simple algo as near-neighbor scaling.
    • There's more. Lots more.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Monday
Nov132017

Cassandra NoSQL Data Model Design 

We at Instaclustr recently published a blog post on the most common data modelling mistakes that we see with Cassandra. This post was very popular and led me to think about what advice we could provide on how to approach designing your Cassandra data model so as to come up with a quality design that avoids the traps.

There are a number of good articles around that with rules and patterns to fit your data model into: 6 Step Guide to Apache Cassandra Data Modelling and Data Modelling Recommended Practices.

However, we haven’t found a step by step guide to analysing your data to determine how to fit in these rules and patterns. This white paper is a quick attempt at filling that gap.

Phase 1: Understand the data

This phase has two distinct steps that are both designed to gain a good understanding of the data that you are modelling and the access patterns required.

Define the data domain

The first step is to get a good understanding of your data domain. As someone very familiar with relation data modelling, I tend to sketch (or at least think) ER diagrams to understand the entities, their keys and relationships. However, if you’re familiar with another notation then it would likely work just as well. The key things you need to understand at a logical level are:

• What are the entities (or objects) in your data model?
• What are the primary key attributes of the entities?
• What are the relationships between the entities (i.e. references from one to the other)?
• What is the relative cardinality of the relationships (i.e. if you have a one to many is it one to 10 or one to 10,000 on average)?

Basically, these are the same things you’d expect in from logical ER model (although we probably don’t need a complete picture of all the attributes) along with a complete understanding of the cardinality of relationships that you’d normally need for a relational model. An understanding of the demographics of key attributes (cardinality, distribution) will also be useful in finalising your Cassandra model. Also, understand which key attributes are fixed and which change over the life of a record.

Define the required access patterns

Click to read more ...

Friday
Nov102017

Stuff The Internet Says On Scalability For November 10th, 2017

Hey, it's HighScalability time: 


Ah, the good old days. This is how the FBI stored finger prints in 1944. (Alex Wellerstein). How much data? Estimates range from 30GB to 2TB.

 

If you like this sort of Stuff then please support me on Patreon. Also, there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • 1 million: times we touch our phones per year; 13 million: lines of Javascript @ Facebook; 256K: RAM needed for TensorFlow on a microcontroller; 2,502%: increase in the sale of ransomware on the dark web; 800 million: monthly Instagram users; 40%: VMs in Azure run Linux; 40%: improved GCP network latency from new SDN stack; 50%: fat content of a woolly mammoth; 

  • Quotable Quotes:
    • Sean Parker: And that means that we [Facebook] need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that's going to get you to contribute more content, and that's going to get you ... more likes and comments
    • David Gerard: I spent yesterday afternoon on Twitter and /r/buttcoin, giggling. It was a popcorn overload moment for every acerbic cryptocurrency sceptic who ever thought that immutable, unfixable smart contracts were an obviously stupid idea that would continue to end in tears and massive losses, as they so often had previously.
    • @jessfraz: I remember now why I put everything into containers in the first place, it's because all software is 💩
    • Amin Vahdat: What we have found running our applications at Google is that latency is as important, or more important, for our applications than relative bandwidth. It is not just latency, but predictable latency at the tail of the distribution. If you have a hundred or a thousand applications talking to one another on some larger task, they are chatty with one another, exchanging small messages, and what they care about is making a request and getting a response back quickly, and doing so across what might a thousand parallel requests.
    • @SteveBellovin: Why anyone with any significant programming experience--and hence experience with bugs--every liked smart contracts is a mystery to me.
    • Neha Bagri: Startups worship the young. But research shows people are most innovative when they’re older
    • @manisha72617183: OH: I no longer tolerate complicated programming languages. My mental space is like Silicon Valley; rent is high and space is at a premium
    • @atoonk: On days like today, we're yet again reminded that the Internet is held together with duct tape.. #rockSolid #BGP #comcast #outage
    • @bradfitz: 0 days since last high impact bug in an experimental programming language on the Ethereum VM affecting millions of dollars.
    • TheScientist: The genetic, molecular, and morphological diversity of the brain leads to a functional diversification that is likely necessary for the higher-order cognitive processes that are unique to humans.
    • Woods' Theorem: As the complexity of a system increases, the accuracy of any single agent's own model of that system decreases rapidly.
    • Carlos E. Perez: The brain performs compensation when it encounters something it does not expect. It learns how to correct itself through perturbative methods. That’s what Deep Learning systems also do, and it’s got nothing to do with calculating probabilities. It’s just a whole bunch of “infinitesimal” incremental adjustments.
    • @erickschonfeld: “What can one expect of a few wretched wires?”—telegraph skeptic, 1841
    • @ErikVoorhees: The average Bitcoin transaction fee ($10.17) is now more than twice the cost of Bitcoin itself when I first learned of it ($5) in 2011 :(
    • LightShadow: StackOverflow should be one of the first internet companies to accept cryptocurrency micro payments. All they'd have to do is skim a small percentage from people tipping each other pennies for good answers
    • @lworonowicz: I feel like I killed a family dog - had to decommission an old #Solaris server with uptime of 6519 days.
    • Google: Andromeda 2.1 latency improvements come from a form of hypervisor bypass that builds on virtio, the Linux paravirtualization standard for device drivers. Andromeda 2.1 enhancements enable the Compute Engine guest VM and the Andromeda software switch to communicate directly via shared memory network queues, bypassing the hypervisor completely for performance-sensitive per-packet operations.
    • iAfrikan News: The first-ever fiber optic cable with a route between the U.S. And India via Brazil and South Africa will soon be a reality. This is according to a joint provisioning agreement entered into by Seaborn Networks ("Seaborn") and IOX Cable Ltd ("IOX").
    • @iamdevloper: 1969: -what're you doing with that 2KB of RAM? -sending people to the moon 2017: -what're you doing with that 1.5GB of RAM? -running Slack
    • Eric Schmidt: Bob Taylor invented almost everything in one form or another that we use today in the office and at home.
    • @ben11kehoe: I am so on board with CRDT-based data stores providing state to FaaS at the edge. 
    • VMG: “Code is Law” fails again.
    • Paul Frazee: In Bitcoin, acceptance of a change is signaled by the miners - once some percent of the miners agree, the change is accepted. This means that hashing power is used as a measure of voting power, and so the political system is essentially plutocratic. How is that significantly better than the board of a publicly traded company?
    • gtrubetskoy: Professor Tanenbaum is one of the most respected computer scientists alive, and for Intel to include Minix in their chip and not let him know is kind of unprofessional and not very nice to say the least. That is his only (and quite fair) point.
    • jsolson: Both approaches have tradeoffs, although I think even with ENA AWS hits ~70µs typical round-trip-times while GCE gets down to ~40µs. Amazon's largest VMs in some families do advertise higher bandwidth than GCE does currently.
    • @brendangregg: AWS put lots of work into optimizing Xen, including net & disk SR-IOV (direct metal access). But their new optimized KVM is even better.
    • @wheremattisat: “Facebook and Google are proto-AIs and we are their microbiome. The objective function of those AIs today is to make more money” @timoreilly
    • @ossia: "Weeks of programming can save you hours of planning." - Anonymous
    • @sallamar: For kicks, we run over 6.2 billion requests a month on lambda (450% yoy) at @ExpediaEng. Still cheaper than renting an apartment for a year.
    • @Joab_Jackson: At this point,IBM #openwhisk is the most viable #open source #serverless platform—@ryan_sb @thecloudcastnet #podcast
    • Polvi: I think PaaS is dead. That's why you see OpenShift and Cloud Foundry and everyone pivoting to Kubernetes. What's going to happen is PaaS will be reborn as serverless on the other side of the Kubernetes transition.
    • mmgutz: We're running our Debian farm on Azure thanks to startup perks. It's been up 100% for us the last 2.5 years. Azure service is no less or better than AWS.
    • zzzeek: I switch between multiple versions of MySQL and MariaDB all day long. If you aren't using specific things like MySQL's JSON type or NDB storage engine or expecting CHECK constraints to enforce on MySQL (oddly omitted from this feature comparison!), there is nothing different at all from a developer point of view, beyond the default values of flags which honestly change more between MySQL releases than anything else.
    • SEJeff: They [Azure] allow you to have native RDMA[1] for your VMs, something neither amazon or google will give you. As an oldhat Linux/Unix guy, it is somewhat amusing to think of Microsoft's cloud offering as the high perf one, but the facts don't lie. If you have true HPC style workloads such as bioinformatics, oil/natgas exploration, finance, etc, the extra node to node communication bits are necessary. The QDR fabric they have has a native speed of 40 Gbps. It is a shame they don't have FDR (56G) or EDR (100G), but still is quite impressive depending on your app. This also could be a game changer for large MPI jobs.
    • johnnycarcin: I've honestly yet to see a customer moving to Azure who has more than 50% Windows based systems. Almost everyone I've worked with only uses Windows Server for their SQL Server services, outside of that it's RHEL, CentOS or Ubuntu.
    • lurchedsawyer: So to answer your question as to what is needed for Azure to become a viable alternative to AWS: I would say about 10 years.
    • @mjpt777: If Google thinks latency trumps bandwidth then they should look to software before hardware for the main source of latency.
    • Ben Kehoe: Like so many things in life, serverless is not an all-or-nothing proposition. It’s a spectrum — and more than that, it has multiple dimensions along which the degree of serverlessness can vary
    • zzzeek: If I was doing brand new development somewhere I'm sure I'd use Postgresql, since from a developer point of view it's the most consistent and flexible. While for the last few years I've worked way more with MySQL / MariaDB and at the moment the MySQL side of things is a bit more familiar to me, I still appreciate PG's vastly superior query planner and index features.
    • There's more. Much more. Click through for more. More. More. More.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Nov072017

Sponsored Post: Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, Zohocorp

Who's Hiring? 

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • On-demand Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. In this session, guest speakers Michele Goetz, Principal Analyst at Forrester Research and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix, discuss: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. View now

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • The Practical Guide to Managing Data Science at Scale. The ability to manage, scale, and accelerate an entire data science discipline increasingly separates successful organizations from those falling victim to hype and disillusionment. Download this practical guide for data science management, if you're currently, or aspiring to be, a data science manager. The paper demystifies and elevates the current state of data science management.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, this includes apps with 30 million users. With your help we'd like to ad a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Click to read more ...