advertise
Monday
Oct202014

Facebook Mobile Drops Pull For Push-based Snapshot + Delta Model

We've learned mobile is different. In If You're Programming A Cell Phone Like A Server You're Doing It Wrong we learned programming for a mobile platform is its own specialty. In How Facebook Makes Mobile Work At Scale For All Phones, On All Screens, On All Networks we learned bandwidth on mobile networks is a precious resource. 

Given all that, how do you design a protocol to sync state (think messages, comments, etc.) between mobile nodes and the global state holding servers located in a datacenter?

Facebook recently wrote about their new solution to this problem in Building Mobile-First Infrastructure for Messenger. They were able to reduce bandwidth usage by 40% and reduced by 20% the terror of hitting send on a phone.

That's a big win...that came from a protocol change.

Facebook Messanger went from a traditional notification triggered full state pull:

Click to read more ...

Friday
Oct172014

Stuff The Internet Says On Scalability For October 17th, 2014

Hey, it's HighScalability time:


What could this be? Swarms of drones painting 3D light sculptures against the night sky!
  • Quotable Quotes:
    • Visnja Zeljeznjak: Steve Jobs' product pricing formula: cost of materials x 3 + 33%
    • Benedict Evans: We now have over 2bn iOS and Android devices on earth, and this will grow in the next few years to well over 3bn.
    • @ClearStoryData: It's true! Avg beer drinker attracts 4.4% more Mosquitos than water drinker #Strataconf
    • Leslie Lamport: The core idea of the problem of that notion of causality came about because of my familiarity with special relativity...where one event could causally effect another depended on weather or not information from one could physically reach the other.
    • @laurelatoreilly: Fascinating session about cargo ships going dark to shift market prices #IoT #strataconf "your decisions are only as good as your data"
    • @muratdemirbas: Distributed/decentralized coordination is expensive & hard to scale. Centralized coordination is cheap & scales easily using hierarchies.
    • @froidianslip: ”Kafka is awesome. We heard it cures cancer." -- @gwenshap #Strataconf
    • @timoreilly: RT @grapealope: The self-driving car has 6000 sensors, and takes readings at 4Hz. That's a lot of data. @MCSrivas #strataconf #MapR
    • @froidianslip: Love the paraphrase borrowed from Ray Bradbury, "Any sufficiently complex configuration is indistinguishable from code." #Strataconf
    • @matei_zaharia: Spark shatters MapReduce's 100 TB and 1 PB sort records... with 10x fewer nodes
    • @msallstr: “Synchronous calls in this environment are the crystal meth of programming”  @mjpt777 on the new   reactive manifesto 
    • @postwait: “If you put them under enough stress, perfectly rational people will panic and start believing in science” #priceless
    • Ilya Grigorik: It's great to see access from mobile is around 30% faster compared to last year.
    • @ryandotsmith: Recently migrated an async system to SQS. Much simple. Tiny latency. Here is the code (maybe a gem?)

  • People just don't appreciate the power of messy. The problematic culture of "Worse is Better". There's an implied notion here that people can't recognize better when they see it. Better is not a platonic ideal. It can't be proved by argument. Better, like evolution, is something that works itself out in practice. Like evolution, Worse is Better is an algorithm for stepping through a possibility space by jumping from one working phenotype to the next more adapted working phenotype. And for many, that's better. Not Ideal, but Better.

  • The Times They Are a-Changin'. Docker and Microsoft partner to drive adoption of distributed applications. What's the goal? nickstinemates: Package your Windows app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Windows host. Package your Linux app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Linux host.

  • Leandro Pereira writes a fine autobiography in Life of a HTTP request, as seen by my toy web server. All the stages of life are there. Socket creation. Acceptance. Scheduling. Coroutines. Reading requests. Parsing requests. All the way to the reply and the death of the connection. A lot to learn if you want to look at the simplified internals of a service.

  • Wonderful talk: Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions. Kyle Kingsbury takes a detailed look at different partition problems in different databases. There are split brains. Masters dying. Lost data. General network mayhem. It's great. The lesson: what's written down in the marketing documentation is not always what you get. Test your application and see what really happens. The world is not simple. A dumb solution where you understand the failure modes can be a good choice.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Oct152014

Using a SSD Cache in Front of EBS Boosted Throughput by 50%, for Free

Using EBS has lots of advantages--reliability, snapshotting, resizing--but overcoming the performance problems by using Provisioned IOPS is expensive. 

Swrve, an integrated marketing and A/B testing and optimization platform for mobile apps, did something clever. They are using the c3.xlarge EC2 instances, that have two 40GB SSD devices per instance, as a cache.

They found through testing RAID-0 striping using a 4-way stripe along with enhanceio, effectively increased throughput by over 50%, for free. With no filesystem corruption problems.

How is it free? "We were planning on upgrading to the C3 class of instance anyway, and sticking with EBS as the backing store. Once you’re using an instance which has SSD ephemeral storage, there are no additional fees to use that hardware."

For great analysis, lots of juicy details, graphs, and configuration commands, please take a look at How we increased our EC2 event throughput by 50%, for free

Tuesday
Oct142014

Sponsored Post: Apple, Hypertable, VSCO, Gannett, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for a outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested by push of a button and is considered ready to be deployed if we choose so. Please apply here
    • Site Reliability Engineer. As a member of the Apple Pay SRE team, you’re expected to not just find the issues, but to write code and fix them. You’ll be involved in all phases and layers of the application, and you’ll have a direct impact on the experience of millions of customers. Please apply here.
    • Software Engineering Manager. In this role, you will be communicating extensively with business teams across different organizations, development teams, support teams, infrastructure teams and management. You will also be responsible for working with cross-functional teams to delivery large initiatives. Please apply here

  • VSCO. Do you want to: ship the best digital tools and services for modern creatives at VSCO? Build next-generation operations with Ansible, Consul, Docker, and Vagrant? Autoscale AWS infrastructure to multiple Regions? Unify metrics, monitoring, and scaling? Build self-service tools for engineering teams? Contact me (Zo, zo@vs.co) and let’s talk about working together. vs.co/careers.

  • Gannett Digital is looking for talented Front-end developers with strong Python/Django experience to join their Development & Integrations team. The team focuses on video, user generated content, API integrations and cross-site features for Gannett Digital’s platform that powers sites such as http://www.usatoday.com, http://www.wbir.com or http://www.democratandchronicle.com. Please apply here.

  • Platform Software Engineer, Sprout Social, builds world-class social media management software designed and built for performance, scale, reliability and product agility. We pick the right tool for the job while being pragmatic and scrappy. Services are built in Python and Java using technologies like Cassandra and Hadoop, HBase and Redis, Storm and Finagle. At the moment we’re staring down a rapidly growing 20TB Hadoop cluster and about the same amount stored in MySQL and Cassandra. We have a lot of data and we want people hungry to work at scale. Apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/

  • November TokuMX Meetups for Those Interested in MongoDB. Join us in one of the following cities in November to learn more about TokuMX and hear TokuMX use cases. 11/5 - London;11/11 - San Jose; 11/12 - San Francisco. Not able to get to these cities? Check out our website for other upcoming Tokutek events in your area - www.tokutek.com/events.

Cool Products and Services

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. Scalyr replaces all your monitoring and log management services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. Engineers say it's powerful and easy to use. Customer support teams use it to troubleshoot user issues. CTO's consider it a smart alternative to Splunk, with enterprise-grade functionality, sane pricing, and human support. Trusted by in-the-know companies like Codecademy – learn more!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Oct132014

How League of Legends Scaled Chat to 70 million Players - It takes Lots of minions.

How would you build a chat service that needed to handle 7.5 million concurrent players, 27 million daily players, 11K messages per second, and 1 billion events per server, per day?

What could generate so much traffic? A game of course. League of Legends. League of Legends is a team based game, a multiplayer online battle arena (MOBA), where two teams of five battle against each other to control a map and achieve objectives.

For teams to succeed communication is crucial. I learned that from Michal Ptaszek, in an interesting talk on Scaling League of Legends Chat to 70 million Players (slides) at the Strange Loop 2014 conference. Michal gave a good example of why multiplayer team games require good communication between players. Imagine a basketball game without the ability to call plays. It wouldn’t work. So that means chat is crucial. Chat is not a Wouldn’t It Be Nice feature.

Michal structures the talk in an interesting way, using as a template the expression: Make it work. Make it right. Make it fast.

Making it work meant starting with XMPP as a base for chat. WhatsApp followed the same strategy. Out of the box you get something that works and scales well...until the user count really jumps. To make it right and fast, like WhatsApp, League of Legends found themselves customizing the Erlang VM. Adding lots of monitoring capabilities and performance optimizations to remove the bottlenecks that kill performance at scale.

Perhaps the most interesting part of their chat architecture is the use of Riak’s CRDTs (commutative replicated data types) to achieve their goal of a shared nothing fueled massively linear horizontal scalability. CRDTs are still esoteric, so you may not have heard of them yet, but they are the next cool thing if you can make them work for you. It’s a different way of thinking about handling writes.

Let’s learn how League of Legends built their chat system to handle 70 millions players...

Stats

Click to read more ...

Friday
Oct102014

Stuff The Internet Says On Scalability For October 10th, 2014

Hey, it's HighScalability time:


Social climber: Instagram explorer scales to new heights in New York.

 

  • 11 billion: world population in 2100; 10 petabytes: Size of Netflix data warehouse on S3; $600 Billion: the loss when a trader can't type; 3.2: 0-60 mph time of probably not my next car.
  • Quotable Quotes:
    • @kahrens_atl: Last week #NewRelic Insights wrote 618 billion events and ran 237 trillion queries with 9 millisecond response time #FS14
    • @sustrik: Imagine debugging on a quantum computer: Looking at the value of a variable changes its value. I hope I'll be out of business by then.
    • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Every cell contains thousands of such nanomachines, each of them dedicated to a different chemical reaction. And all their complex activities take place in a tiny space where the molecular building blocks of life are packed more tightly than a Tokyo subway at rush hour. Amazing.
    • Eric Schmidt: The simplest outcome is we're going to end up breaking the Internet," said Google's Schmidt. Foreign governments, he said, are "eventually going to say, we want our own Internet in our country because we want it to work our way, and we don't want the NSA and these other people in it.
    • Antirez: Basically it is neither a CP nor an AP system. In other words, Redis Cluster does not achieve the theoretical limits of what is possible with distributed systems, in order to gain certain real world properties.
    • @aliimam: Just so we can fathom the scale of 1B vs 1M: 1,000,000 seconds is 11.5 days. 1,000,000,000 seconds is 31.6 YEARS
    • @kayousterhout: 92% of catastrophic failures in distributed data-intensive systems caused by incorrect error handling https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf … #osdi14
    • @DrQz: 'The purpose of computing is insight, not numbers.' (Hamming) Sometimes numbers ARE the insight so, make them accesible too. (Me)

  • Robert Scoble on the Gillmor Gang said that because of the crush of signups, ello had to throttle invites. Their single PostgreSQL server couldn't handle it captain.

  • Containers are getting much larger with new composite materials. Not that kind of container. Shipping containers. High oil costs have driven ships carrying 5000 containers to evolve. Now they can carry 18,000 and soon 19,000 containers!

  • If you've wanted to make a network game then this is a great start. Making Fast-Paced Multiplayer Networked Games is Hard: Fast-paced multiplayer games over the Internet are hard, but possible. First understanding your constraints then building within them is essential. I hope I have shed some light on what those constraints are and some of the techniques you can use to build within them. No doubt there are other ways out there and ways yet to be used. Each game is different and has its own set of priorities. Learning from what has been done before could help a great deal.

  • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Environmental change requires complexity, which begets robustness, which begets genotype networks, which enable innovations, the very kind that allow life to cope with environmental change, increase its complexity, and so on, in an ascending spiral of ever-increasing innovability...is the hidden architecture of life.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Oct082014

That's Not My Problem - I'm Renting Them

Scott Hanselman gives a hilarious and insightful talk in Virtual Machines, JavaScript and Assembler, a keynote at Velocity Santa Clara 2014. The topic of his talk is an intuitive understanding of the cloud and why it's the best thing ever. 

At about 6:30 into the video Scott is at his standup comic best when he recounts a story of a talk Adrian Cockroft gave on Netflix’s move to SSDs. An audience member energetically questioned the move to SSDs saying they had high failure rates and how moving to SSDs was a stupid idea.

To which Mr. Cockroft replies:

That's not my problem, I'm renting them.

Scott selected the ideal illustration of the high level of abstraction the cloud provides. If you are new to the cloud that's a very hard idea to grasp. "That's not my problem, I'm renting them" is the perfect mantra when you find yourself worried about things you don't need to be worried about anymore.

Monday
Oct062014

How Clay.io Built their 10x Architecture Using AWS, Docker, HAProxy, and Lots More

This is a guest repost by Zoli Kahan from Clay.io. 

This is the first post in my new series 10x, where I share my experiences and how we do things at Clay.io to develop at scale with a small team. If you find these things interesting, we're hiring - zoli@clay.io.

The Cloud

CloudFlare

CloudFlare

CloudFlare handles all of our DNS, and acts as a distributed caching proxy with some additional DDOS protection features. It also handles SSL.

Amazon EC2 + VPC + NAT server

Click to read more ...

Friday
Oct032014

Stuff The Internet Says On Scalability For October 3rd, 2014

Hey, it's HighScalability time:


Is the database landscape evolving or devolving?

 

  • 76 million: once more through the data breach; 2016: when a Zettabyte is transfered over the Internet in one year
  • Quotable Quotes:
    • @wattersjames: Words missing from the Oracle PaaS keynote: agile, continuous delivery, microservices, scalability, polyglot, open source, community #oow14
    • @samcharrington: At last count, there were over 1,000,000 million containers running in the wild. http://stats.openvz.org  @jejb_ #ccevent
    • @mappingbabel: Oracle's cloud has 30,000 computers. Google has about two million computers. Amazon over a million. Rackspace over 100,000.
    • Andrew Auernheimer: The world should have given the GNU project some money to hire developers and security auditors. Hell, it should have given Stallman a place to sleep that isn't a couch at a university. There is no f*cking justice in this world.
    • John Nagle: The right answer is to track wins and losses on delayed and non-delayed ACKs. Don't turn on ACK delay unless you're sending a lot of non-delayed ACKs closely followed by packets on which the ACK could have been piggybacked. Turn it off when a delayed ACK has to be sent. I should have pushed for this in the 1980s.
    • @neil_conway: The number of < 15 node Hadoop clusters is >> the number of > 15 node Hadoop clusters. Unfortunately not reflected in SW architecture.

  • In the meat world Google wants devices to talk to you. The Physical Web. This will be better than Apple's beacons because Apple is severely limiting the functionality of beacons by requiring IDs be baked into applications. It's a very static and controlled world. In other words, it's very Apple. By using URLs Google is supporting both the web and apps; and adding flexibility because a single app can dynamically and generically handle the interaction from any kind of device. In other words, it's very Google. Apple has the numbers though, with hundreds of millions of beacon enabled phones in customer hands. Since it's just another protocol over BLE it should work on Apple devices as well.

  • Did Netflix survive the great AWS rebootathon? The Chaos Monkey says yes, yes they did: Out of our 2700+ production Cassandra nodes, 218 were rebooted. 22 Cassandra nodes were on hardware that did not reboot successfully. This led to those Cassandra nodes not coming back online. Our automation detected the failed nodes and replaced them all, with minimal human intervention. Netflix experienced 0 downtime that weekend. 

  • Google Compute Engine is following Moore's Law by announcing a 10% discount. Bandwidth is still expensive because networks don't care about silly laws. And margin has to come from somewhere.

  • Software is eating...well you've heard it before. Mesosphere cofounder envisions future data center as ‘one big computer’: The data center of the future will be fully virtualized, with everything from power supplies to storage devices consolidated into a single pool and managed by software, according to an executive whose company intends to lead the way.

  • Companies, startups, hacker spaces, teams are all intentional communities. People choose to work together towards some end. A consistent group killer is that people can be really sh*tty to each other. There's a lot of work that has been done around how to make intentional communities work. Holacracy is just one option. Here's a really interesting interview with Diana Leafe Christian on what makes communities work. It requires creating Community Glue, Good Process and Communication Skill, Effective Project Management, and good Governance and Decision making. Which is why most communities fail. Did you know there's even something called Non-Defensive Communication? If followed the Internet would collapse.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Sep302014

Sponsored Post: Apple, Flipboard, All Your Base, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Siri Operations Developer. Apple is looking for talented developers to help build the next generation internal cloud platform for Siri. This person should be excited about solving difficult distributed systems problems as well as constantly improving user-experience. This person will be working with a highly technical and motivated team solving the hard problems. Please apply here.
    • Site Reliability Engineer. The Apple Pay Site Reliability Team is hiring for multiple roles focused on the front line customer experience and the back end integration of Apple systems with our Network and Banking partners. Please apply here.
    • Senior Software Engineer, iTunes Infrastructure. Hands-on senior software engineering for the iTunes digital media supply chain engineering team. We are looking for a self starting, energetic individual who is not afraid to question assumptions and with excellent written and oral communication skills. Please apply here
    • iTunes - Content Management Tools Engineer. The candidate should have several years experience developing large-scale web-based applications using object-oriented languages. Excellent understanding of relational databases and data-modeling techniques is also a must. Please apply here
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for a outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested by push of a button and is considered ready to be deployed if we choose so. Please apply here

  • Flipboard's Site Reliability Engineering Team is hiring! This team offers great challenges solving unique problems unlike any you have seen!  They work exclusively in the cloud, ensuring a highly available and performant product to millions of users daily.  If you have a passion for large-scale systems, next generation provisioning and orchestration tools apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • All Your Base is the only curated database conference of its kind in the UK. Listen to talks from database creators, industry leaders and developers working at the coal face on where to store and how to handle your data. Book tickets.

Cool Products and Services

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Better, Faster, Cheaper: Pick Three. Scalyr is your universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs”; our columnar data store enables enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – get on board!

  • Are You a Startup in need of Reliable Speed at Scale? The Aerospike Startup Special provides free access to the Aerospike Enterprise Edition software with Community Support for qualifying startups. Learn more and see if you qualify! http://www.aerospike.com/blog/special-startups/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...