Entries by HighScalability Team (1576)

Friday
Jan 04, 2013

Stuff The Internet Says On Scalability For January 4, 2013

It's HighScalability time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jan 02, 2013

Why Pinterest Uses the Cloud Instead of Going Solo - To Be Or Not To Be

I love it when a somewhat older article like Pinterest Cut Costs From $54 To $20 Per Hour By Automatically Shutting Down Systems hits Hacker News and generates a good conversation. One of the common sentiments about the cloud was raised: why doesn't Pinterest save a lot of money by running their own hardware instead of using the cloud?

Ryan Park, Operations Engineer at Pinterest, responds in what I think is the perfect modern response to that ultimate existential design dilemma:

Click to read more ...

Monday
Dec 31, 2012

Designing for Resiliency will be so 2013

A big part of engineering for a quality experience is bringing in the long tail. An improbable severe failure can ruin your experience of a site, even if your average experience is quite good. That's where building for resilience comes in. Resiliency used to be outside the realm of possibility for the common system. It was simply too complex and too expensive.

An evolution has been underway, making 2013 possibly the first time resiliency is truly on the table as a standard part of system architectures. We are getting the clouds, we are getting the tools, and prices are almost low enough.

Even Netflix, real leaders in the resiliency architecture game, took some heat for relying completely on Amazon's ELB and not having a backup load balancing system, leading to a prolonged Christmas Eve failure. Adrian Cockcroft, Cloud Architect at Netflix, said they've investigated creating their own load balancing service, but that "we try not to invest in undifferentiated heavy lifting."

So resiliency is still not part of the standard package. There's an ROI calculation that has to be made. Although the path Netflix would have to take in creating a hybrid architecture is fairly clear, Netflix prefers to concentrate on features rather than long tail events. That's a big difference. At one time designing for resiliency would have been unthinkable; now it's becoming a choice.

A good New Year's resolution might be to learn more about resilience. It's a new way of thinking compared to straightforward high availability. It's a full stack, full team, full system, environment centric mode of thought.

Fortunately, Dr. Richard Cook, Professor of Healthcare Systems Safety and Chairman of the Department of Patient Safety at the Kungliga Tekniska Högskolan, has been thinking about resilience for a long time. He gave a fascinating talk on resilience, How Complex Systems Fail, that is just detailed enough to be practical and high level enough to inspire new directions.

Here's a gloss of the essentials from his talk:

Why Don’t Systems Fail More Often?

Click to read more ...

Friday
Dec 28, 2012

Stuff The Internet Says On Scalability For December 28, 2012

It's HighScalability time:

  • 306 items per second: Orders on Amazon
  • Quotable Quotes:
    • @hackofalltrades: When positive change is only viewed through its scalability, bad things happen.
    • @faizanj: Is it time for #Netflix to move to a hybrid cloud architecture similar to Zynga zCloud?
    • @adrianco: we try not to invest in undifferentiated heavy lifting
    • @qui_oui: "scalability": a word that makes me think of how likely you are to have the ability to grow scales.
    • @Ninad_M: The question is, is #antifragile conceptually opposite of #bigdata
    • @pbailis: Batch your disk/network IO, kernel interrupts, customer package shipments -> delay arrival but increase efficiency
    • @Carnage4Life: One lesson that is hard for people to learn. Knowing that something occurred is different from knowing why it occurred
  • The best tech documentation both informs about the technology and teaches the wider context in which it plays a part. That fits the 400+ page Akka Documentation perfectly. In it you'll find excellent information on actors and the various architectures that can be created with them. Much to learn here. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Dec 26, 2012

Ask HS: What will programming and architecture look like in 2020?

This topic has been ripped directly from Lambda the Ultimate's What will programming look like in 2020? post. They are having a lively discussion and if you are interested in flexing your holiday thought muscles we might have a good discussion too.

Eight years is a difficult prediction horizon. It's too short to simply project out current trends and it's too long to discount potential technological breakthroughs coming to market. There's the challenge.

Some of my lousy predictions: 

  • Programmers Will Form Guilds Around New Gamified Training Hubs
  • The Web Will Become More Closed Before it Becomes More Open
  • Not Everyone Will Become a Programmer
  • Focus Will Shift to Creating Bigger People Instead of Chasing Big Ideas

Programmers Will Form Guilds Around New Gamified Training Hubs

Click to read more ...

Tuesday
Dec 25, 2012

Sponsored Post: Flurry, Rumble Games, Duolingo, Booking, aiCache, Teradata Aster, Hadapt, Aerospike, Percona, ScaleOut, New Relic, NetDNA, GigaSpaces, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Flurry has built large-scale app measurement and advertising services that are used by more than 80,000 media companies and independent developers to monetize mobile and related platforms. If you're interested in joining a thriving, growing team, please check us out.
  • Rumble Games is looking for a Senior Platform Engineer to build massively scalable and shared services for the next generation of online games. We have the best team this industry has seen, and we will transform the way people play together. Join us.
  • Duolingo, a fast-growing (>11% per week), free (no ads, no fees, no subscriptions) language learning site is looking for an infrastructure engineer to scale Duolingo to millions of users, please apply here.
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html
  • Teradata Aster is looking for Distributed Systems, Analytic Applications,  and Performance Architects. As a member of the Architecture Group you will help define the technical roadmap for the product.
  • Hadapt is looking for software engineers. Come shape a cutting-edge technology while working in the fun, collaborative environment of a fast-paced start-up. 
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your web-site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • Aerospike: Two Trillion Transactions per month...100 million stored user profiles...25% of all video ads processed on the internet - mere realities of success for Aerospike customers. Industry leaders reveal their secrets
  • ScaleOut Software. In-memory Data Grids for the Enterprise. Download a Free Trial.
  • Follow the Cloudify blog to learn more about our open source PaaS stack – latest integration recipes, builds, features, and other cool stuff.  Visit the GigaSpaces blog to learn how to take your application to the next level of scalability and performance.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Friday
Dec 21, 2012

Stuff The Internet Says On Scalability For December 21, 2012

We at HighScalability are betting the over on the whole Mayan end of the world thingy:

  • 200M: monthly active Twitterers; 120: number of Netflix reencodings; 1.2 Million Years: Pr0n Watched Since 2006; 100M: Google Core-Hours Awarded to Science
  • Quotable Quotes:
    • @shipilev: I've settled on saying that if performance is the scalar field in state space, then scalability is just it's gradient.
    • @AndiMann: "Only 1% of #Amazon users should care about #cloud scalability, elasticity". Brilliant! 
    • @Guerrero_FJ: Always remember: 'scalability problems should be solved when there are scalability problems.' #leanstartup
  • Santa's Architecture: It's a little-known fact that Santa Claus was an early queue innovator. Faced with the problem of delivering a planet full of presents in one night, Santa, in his hacker's workshop, created a Present Distribution System using thousands of region-based priority present queues for continuous delivery by the Rudolphs. Rudolphs? You didn't think there was only one Rudolph, did you? Presents are delivered in parallel by a cluster of sleighs, each with redundant reindeer in a master-master configuration. Each Rudolph is a cluster leader and they coordinate work using an early and more magical version of the ZooKeeper protocol.
  • ...

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Tuesday
Dec 18, 2012

Georeplication: When Bad Things Happen to Good Systems

Georeplication is one of the standard techniques for dealing with bad things--failure and latency--happening to good systems. The problem is always: how do you do that? Murat Demirbas, Associate Professor at SUNY Buffalo, has a couple of really good posts that can help: MDCC: Multi-Data Center Consistency and Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary.

In MDCC: Multi-Data Center Consistency Murat discusses a paper that says synchronous wide-area replication can be feasible. There's a quick and clear explanation of Paxos and various optimizations that is worth the price of admission. We find that strong consistency doesn't have to be lost across a WAN:

The good thing about using Paxos over the WAN is you /almost/ get the full CAP  (all three properties: consistency, availability, and partition-freedom). As we discussed earlier (Paxos taught), Paxos is CP, that is, in the presence of a partition, Paxos keeps consistency over availability. But, Paxos can still provide availability if there is a majority partition. Now, over a WAN, what are the chances of having a partition that does not leave a majority? WAN has a lot of redundancy. While it is possible to have a data center partitioned off the Internet due to a calamity, what are the chances of several knocked off at the same time. So, availability is also looking good for MDCC protocol using Paxos over WAN.
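The majority argument in that quote can be made concrete with a small sketch. This is not the MDCC protocol itself, only the quorum arithmetic behind "Paxos can still provide availability if there is a majority partition"; the function names and the example site counts are made up for illustration.

```python
from typing import Sequence


def can_make_progress(reachable_sites: int, total_sites: int) -> bool:
    """A Paxos-style quorum needs a strict majority of all sites."""
    return reachable_sites > total_sites // 2


def available_partition(partition_sizes: Sequence[int]) -> bool:
    """True if some side of a network split still holds a majority."""
    total = sum(partition_sizes)
    return any(can_make_progress(size, total) for size in partition_sizes)


# Hypothetical deployment of five data centers.
one_dc_cut_off = available_partition([4, 1])       # majority survives
three_way_split = available_partition([2, 2, 1])   # no side has 3 of 5
```

With five sites, losing one data center still leaves a working majority; availability is only lost when the split leaves no side with at least three sites.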

In Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary Murat describes a paper that tries to hide the price of WAN latency for some classes of operations. In particular:

To alleviate this latency versus consistency tension, this paper proposes RedBlue consistency, which enables blue operations to be fast/asynchronous (and eventually consistent) while the remaining red operations are strongly-consistent/synchronous (and slow). So a program is partitioned into red and blue operations, which run with different consistency levels. While red operations must be executed in the same order at all sites (which make them slow), the order of execution of blue operations can vary from site to site (allowing them to be executed without requiring coordination across sites). "In systems where every operation is labeled red, RedBlue consistency is equivalent to serializability; in systems where every operation is labeled blue, RedBlue consistency allows the same set of behaviors as eventual consistency."
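As a rough illustration of why some operations can be blue, here is a minimal sketch using a hypothetical bank account. It is not code from the paper: the deposit and withdraw operations, and the rule that a withdrawal is rejected when funds are short, are assumptions chosen to show the difference between operations that commute and operations that do not.

```python
from itertools import permutations


def deposit(amount):
    # Deposits commute with each other, so any site may apply them in any order.
    return lambda balance: balance + amount


def withdraw(amount):
    # The outcome depends on the balance seen at execution time, so reordering
    # can change the result. A RedBlue system would have to order such an
    # operation consistently across sites.
    def op(balance):
        return balance - amount if balance >= amount else balance
    return op


def apply_all(balance, ops):
    for op in ops:
        balance = op(balance)
    return balance


def converges(initial, ops):
    """True if every execution order of ops yields the same final state."""
    finals = {apply_all(initial, order) for order in permutations(ops)}
    return len(finals) == 1


blue_ops = [deposit(10), deposit(25), deposit(5)]
mixed_ops = [deposit(10), withdraw(15)]

blue_ok = converges(10, blue_ops)     # all orders end at the same balance
mixed_ok = converges(10, mixed_ops)   # order changes whether withdraw succeeds
```

Sites that see the deposits in different orders end up in the same state, which is what lets them skip cross-site coordination. Once the withdrawal is mixed in, different orders produce different balances, so that operation needs the slower, ordered path.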

Just a little fun holiday reading :-)

Murat also has a number of excellent posts that are a great boon for understanding the innards of distributed systems:

Click to read more ...

Monday
Dec 17, 2012

11 Uses For the Humble Presents Queue, er, Message Queue

It's a little-known fact that Santa Claus was an early queue innovator. Faced with the problem of delivering a planet full of presents in one night, Santa, in his hacker's workshop, created a Present Distribution System using thousands of region-based priority present queues for continuous delivery by the Rudolphs. Rudolphs? You didn't think there was only one Rudolph, did you? Presents are delivered in parallel by a cluster of sleighs, each with redundant reindeer in a master-master configuration. Each Rudolph is a cluster leader and they coordinate work using an early and more magical version of the ZooKeeper protocol.

Programmers have followed Santa's lead and you can find a message queue in nearly every major architecture profile on HighScalability. Historically they may have been introduced when a first generation architecture needed to scale up from its two-tier system into something a little more capable (asynchronicity, work dispatch, load buffering, database offloading, etc). If software has anything like a standard structural component, the way architecture has the arch or the beam, it's the message queue.
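To make work dispatch and load buffering concrete, here is a minimal in-process sketch using Python's standard queue and threading modules. A real deployment would typically use a networked message broker instead; the function name, the worker count, and the squaring handler are illustrative assumptions only.

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=3):
    """Buffer jobs in a queue and dispatch them to a small worker pool."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:           # sentinel: no more work for this worker
                q.task_done()
                return
            out = handler(job)
            with lock:                # protect the shared results list
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:                  # the producer only enqueues; it never waits on a worker
        q.put(job)
    for _ in threads:                 # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return results


squares = sorted(run_workers(range(10), lambda x: x * x))
```

The producer and the workers never call each other directly. The queue absorbs bursts, and adding capacity is a matter of raising the worker count, which is the decoupling the paragraph above describes.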

An article from Iron.io, Top 10 Uses For A Message Queue, has a nice summary of why message queues are so dang useful:

Click to read more ...

Friday
Dec 14, 2012

Stuff The Internet Says On Scalability For December 14, 2012

In a hole in the Internet there lived HighScalability:

  • $140 Billion: trivial cost of Google fiber everywhere; 5,200 GB: data for every person on Earth; 6 hours: time it takes for a 25-GPU cluster to crack all the passwords
  • Quotable Quotes:
    • hnriot: Good architecture eliminates the need for prayer.
    • @adrianco: we break AWS, they fix it. Stuff that's breaking now is mostly stuff other clouds haven't got to yet.
    • Scalability Rules: Design for 20x capacity. • Implement for 3x capacity. • Deploy for ~1.5x capacity.
  • Fast typing Aaron Delp with his AWS re:Invent Werner Vogels Keynote Live Blog. Some key points: Decompose into small loosely coupled, stateless building blocks; Automate your application and processes; Let Business levers control the system; Architect with cost in mind; Protecting your customer is the first priority; In production, deploy to at least two availability zones; Integrate security into your application from the ground up; Build, test, integrate and deploy continuously; Don't think in single failures; Assume Nothing.
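
The Scalability Rules quote in the list above reads as three multipliers on today's load. A minimal sketch of that arithmetic follows. Treating "capacity" as a multiple of the current measured peak is an assumption, and the function name and the 1,000 requests per second figure are hypothetical.

```python
def capacity_targets(current_peak):
    """Apply the 20x / 3x / ~1.5x multipliers to a measured peak load."""
    return {
        "design": current_peak * 20,     # what the architecture should allow for
        "implement": current_peak * 3,   # what the built code should handle
        "deploy": current_peak * 1.5,    # what is actually provisioned today
    }


# Hypothetical service peaking at 1,000 requests per second.
targets = capacity_targets(1000)
```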

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...