Entries by HighScalability Team (1576)

Friday
Mar 15, 2013

Stuff The Internet Says On Scalability For March 15, 2013

Hey, it's HighScalability time: 

  • 0: # of Google Readers; 2.5 billion/day: new pieces of content added to Facebook; 2.7 billion/day: likes added to Facebook; 7PB/month: photos added to Facebook
  • Quotable Quotes:
    • @cwgem: It seems like cutting down API access is the stock scalability answer these days
    • @abenik: @Prismatic surfaced this article on their architecture for me. How meta.
    • @NewsBlur: The waters are rocky now, but take note that I have some time to get things right. I'm working this week to get things stable, then scale.
    • @Pinboard: Just learned that Google Reader no longer offers direct JSON export. I guess they held the annual "What should we ruin next?" staff retreat
    • @DEVOPS_BORAT: You can not able have unlimit scalability without unlimit outage.
    • Jeff: Amazon RDS Scales Up - Provision 3 TB and 30,000 IOPS Per DB Instance
    • @migueldeicaza: Google recently hired all of the Twitter's scalability team to work on Google IO checkout.
    • @skamille: Interesting to consider the greatly diminished role of networked file systems in modern distributed computing
    • @vambenepe: The server huggers have regrouped. Now they’re VM huggers, ironically. Fighting PaaS with all their might.
    • @jezhumble: If the developers can't self-service everything they need programmatically through an API, it's not a private cloud.
    • @Bremmel: Foursquare users crawl the real world like Google's spiders crawl the web - Dennis Crowley
    • @mollstam: SimCity's API (and I'm guessing region storage) is on Amazon. How can it not be auto-scaling? How can it take three days to add servers?
    • @josephmartz: Scalability gurus: It's about low coupling merging with high cohesion. More encapsulation and f*ck scaling out. I just want one #node.
    • @SQLSniper: great recipe for #sqlserver scaling from @GlennAlanBerry precon :) "scale up is like pets, scale out is like cattle" 
  • @NewsBlur's tweet feed is a great blow-by-blow of the crush that happened when Google Reader became a dead app walking. A signup a second...Now fetching millions of new feeds hourly...Suspended free accounts, premium accounts only....Prices increased...Redis suffered from memory corruption...Moved from one app server to 6...Dropped SES for Mailgun...Hosting provider died...Bringing up PostgreSQL read slaves...DB server upgrades...Introduction of HAProxy...More app servers.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Mar 13, 2013

Iron.io Moved From Ruby to Go: 28 Servers Cut and Colossal Clusterf**ks Prevented

For the last few months I've been programming a system in Go, so I'm always on the lookout for information to feed my confirmation bias. An opportunity popped up when Iron.io wrote about their experience using Go to rewrite IronWorker, their ever busy job execution system, originally coded in Ruby.

The result:

Click to read more ...

Tuesday
Mar 12, 2013

If Your System was a Symphony it Might Sound Like This...

I am in no way a music expert, but when I listen to Symphony No. 4 by Charles Ives, I imagine it's what a complex software/hardware system might sound like if we could hear its inner workings. Ives uses a lot of riotously competing rhythms in this work. It can sound discordant, yet the effect is deeply layered and eventually harmonious, just like the systems we use, create, and become part of.

I was pointed to this piece by someone who said there were two conductors. I'd never heard of such a thing! So I was intrigued. The first version of the performance sounds and looks great, but it unfortunately does not use two conductors. The second version uses two conductors, but is unfortunately just a snippet.

It's strikingly odd to see two conductors, but I imagine different parts of our systems using different conductors too, running at different rhythms, sometimes slow, sometimes fast, sometimes with outbursts, sometimes in vicious conflict. Yet conceptually it all still seems to hang together.

Charles Ives Symphony No. 4, BBC Symphony Orchestra/David Robertson, cond./Ralph van Raat, piano:

Click to read more ...

Monday
Mar 11, 2013

Low Level Scalability Solutions - The Conditioning Collection

We talked about 42 Monster Problems That Attack As Loads Increase. And in The Aggregation Collection we talked about the value of prioritizing work and making smart queues as a way of absorbing and not reflecting traffic spikes.

Now we move on to our next batch of strategies where the theme is conditioning, which is the idea of shaping and controlling flows of work within your application...

Use Resources Proportional To a Fixed Limit

This is probably the most important rule for achieving scalability within an application. What it means:

Click to read more ...

Friday
Mar 8, 2013

Stuff The Internet Says On Scalability For March 8, 2013

Hey, it's HighScalability time: 

  • Quotable Quotes:
    • @ibogost: Disabling features of SimCity due to ineffective central infrastructure is probably the most realistic simulation of the modern city.
    • antirez: The point is simply to show how SSDs can't be considered, currently, as a bit slower version of memory. Their performance characteristics are a lot more about, simply, "faster disks".
    • @jessenoller: I only use JavaScript so I can gain maximum scalability across multiple cores. Also unicorns. Paint thinner gingerbread
    • @liammclennan: high-scalability ruby. Why bother?
    • @scomma: Problem with BitCoin is not scalability, not even usability. It's whether someone will crack the algorithm and render BTC entirely useless.
    • @webclimber: Amazing how often I find myself explaining that scalability is not magical
    • @pneuman42: Game servers are the *worst* scalability problem. Most services start small and scale up over time, solving problems along the way
    • @jeffsussna: OH: "Amazon outages involve server auto-scaling failures. Microsoft outages involve credit card auto-renewal failures"
    • carlosthecharlie: Writing Map/Reduce jobs is like making debt payments on technical debt you don't yet owe
    • anonymous: *eye twitches* You maintain secondary indexes in dynamo db fields, managed in application code? Dude. DUDE!
  • LinkedIn: Secrecy Doesn't Scale. Winston Churchill: Truth is so precious that she should always be attended by a bodyguard of lies.
  • So eternal vigilance really can be crowdsourced: Bill Introduced to Re-Legalize Cellphone Unlocking.
  • Engaging discussion with George Dyson: Turing’s Cathedral and the Dawn of the Digital Universe. Template based addressing. DNA is searched by template. You don't have to know the exact location of a protein and the match doesn't have to be exact. Google is template searching for data. He thinks this template idea is a third revolution in computing. Much more flexible and robust. Because of errors you have to build architectures that are more flexible and can deal with ambiguity, which is what nature does. Google as an Oracle Machine. Alan Turing said machines will never be intelligent unless they are allowed to make mistakes. Deterministic computing is limited. A non-deterministic element, an Oracle, is required. Machines need to learn by making mistakes, tolerating mistakes, and learning from mistakes. Google is made up of deterministic machines. We humans are in Google's loop to act as the non-deterministic signal, as Oracle Machines.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Mar 7, 2013

It's a VM Wasteland - A Near Optimal Packing of VMs to Machines Reduces TCO by 22%

In Algorithm Design for Performance Aware VM Consolidation we learn some shocking facts (gambling in Casablanca?):

  • Average server utilization in many data centers is low, estimated between 5% and 15%. This is wasteful because an idle server often consumes more than 50% of peak power.
  • Surely that's just for old style datacenters? Nope. In Google data centers, workloads that are consolidated use only 50% of the processor cores. Every other processor core is left unused simply to ensure that performance does not degrade.

It's a VM wasteland. The goal is to reduce waste by packing VMs onto machines without hurting performance or wasting resources. The idea is to select VMs that interfere the least with each other and place them together on the same server.

It's an NP-Complete problem, but this paper describes a practical method that performs provably close to optimal. Interestingly, they can optimize for performance or power efficiency, so you can use different algorithms for different workloads.
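To make the packing problem concrete, here is a minimal sketch of first-fit decreasing, a classic bin-packing heuristic. It is not the paper's algorithm: it ignores interference between VMs entirely and only treats each VM as a single CPU demand. The names (Server, pack_first_fit_decreasing), the percent-of-cores units, and the 80% capacity cap are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical server with a CPU budget expressed in percent of cores."""
    capacity: int
    vms: list = field(default_factory=list)  # list of (vm_name, load) pairs

    @property
    def used(self) -> int:
        return sum(load for _, load in self.vms)


def pack_first_fit_decreasing(vm_loads: dict, capacity: int = 80) -> list:
    """Place each VM on the first server with room, largest VMs first.

    This is a generic bin-packing heuristic, not the interference-aware
    method from the paper. `capacity` below 100 leaves headroom so that
    co-located VMs are less likely to degrade each other.
    """
    servers = []
    # Sorting largest-first tends to leave small VMs to fill the gaps.
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"VM {name!r} needs {load}, more than one server offers")
        for server in servers:
            if server.used + load <= capacity:
                server.vms.append((name, load))
                break
        else:
            # No existing server has room, so bring up a new one.
            server = Server(capacity)
            server.vms.append((name, load))
            servers.append(server)
    return servers


if __name__ == "__main__":
    demo = {"a": 50, "b": 40, "c": 30, "d": 30, "e": 20, "f": 10}
    for i, server in enumerate(pack_first_fit_decreasing(demo)):
        print(i, server.used, [name for name, _ in server.vms])
```

A real consolidation scheme would replace the simple `server.used + load <= capacity` check with a test that accounts for how much the candidate VMs degrade each other when they share a machine, which is the hard part the paper addresses.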

When optimizing for performance, the result is utilization between 75% and 80%, compared to 50% from the naive method. This gives a 22% reduction in TCO, a significant savings at scale:

Click to read more ...

Wednesday
Mar 6, 2013

Low Level Scalability Solutions - The Aggregation Collection

What good are problems without solutions? In 42 Monster Problems That Attack As Loads Increase we talked about problems. In this first post (OK, there was an earlier post, but I'm doing some reorganizing), we'll cover what I call aggregation strategies.

Keep in mind these are low level architecture type suggestions of how to structure the components of your code and how they interact. We're not talking about massive scale-out clusters here, but about what your applications might look like internally, way below the service interface level. There's a lot more to the world than evented architectures.

Aggregation simply means we aren't using stupid queues. Our queues will be smart. We are deeply aware of queues as containers of work that eventually dictate how the entire system performs. As work containers we know intimately what requests and data sit in our queues and we can use that intelligence to our great advantage.

Prioritize Work

The key idea to it all is an almost mindful approach to design that has programmers consider, as a first class concept, the priority of what work gets done, why it gets done, and when it gets done, in every aspect of their creation.
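As a rough illustration of a queue that knows the priority of the work it holds, here is a minimal sketch built on Python's heapq module. The class name PriorityWorkQueue, the integer priorities (lower runs first), and the sample requests are assumptions for the example, not anything prescribed by the post.

```python
import heapq
import itertools


class PriorityWorkQueue:
    """A hypothetical work queue that hands out the most urgent request first.

    Lower numbers mean higher priority. A running counter breaks ties so that
    requests with equal priority come out in the order they were added.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, priority: int, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def get(self) -> str:
        # Raises IndexError if the queue is empty.
        _priority, _order, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: PriorityWorkQueue) -> list:
    """Remove and return every request in the order it would be served."""
    served = []
    while len(queue):
        served.append(queue.get())
    return served


if __name__ == "__main__":
    q = PriorityWorkQueue()
    q.put(2, "render report")
    q.put(0, "health check")
    q.put(1, "user login")
    print(drain(q))
```

Because the queue can see priorities, a loaded system could also choose to drop or delay the lowest priority entries instead of letting urgent work wait behind them, which a plain FIFO queue cannot do.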

Preventing Cascading Failures

Click to read more ...

Tuesday
Mar 5, 2013

Sponsored Post: Fitbit, OLO, Amazon, aiCache, Aerospike, Percona, ScaleOut, New Relic, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Fitbit is hiring a Site Operations Lead to help us on our mission to make the world a healthier place! Fitbit's wearable fitness devices are worn by people across the world, each syncing with the web site, wirelessly and automatically, every 15 minutes. Join our mission here
  • OLO's food ordering platform powers some of the largest restaurant chains and feeds millions of consumers. We're looking for Senior C# Software Engineers and DevOps Engineers to help us scale our system. Apply here.
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
  • Aerospike is Hiring! You dream in C - and like it? Then join us as a Senior Distributed Systems Engineer or Client / Application Engineer. People covet your bag of tricks for troubleshooting systems and network issues? Join our Operations and QA team. See if these positions are a fit for you!

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your web site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 10x faster than Cassandra, Couchbase and MongoDB. Read it now!
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Mar 4, 2013

NoSQL Style - A Gangnam Style Parody

Listen up all you IT people...NoSQL, it's the rage now, so turn the page now and boost your stack...Hey, mighty people...Go, go, go, hey, hey, hey, hey, hey, hey...Go NoSQL style...

I for one feel both edified and entertained...can't wait for the Harlem Shake version. 

Monday
Mar 4, 2013

7 Life Saving Scalability Defenses Against Load Monster Attacks

We talked about 42 Monster Problems That Attack As Loads Increase. Here are a few ways you can defend yourself, secrets revealed by scaling masters across the ages. Note that these are low level programming level moves, not large architecture type strategies.

Use Resources Proportional To a Fixed Limit

Click to read more ...