Wednesday
Mar 06, 2013

Low Level Scalability Solutions - The Aggregation Collection

What good are problems without solutions? In 42 Monster Problems That Attack As Loads Increase we talked about problems. In this first post (OK, there was an earlier post, but I'm doing some reorganizing), we'll cover what I call aggregation strategies.

Keep in mind these are low-level architecture suggestions for how to structure the components of your code and how they interact. We're not talking about massive scale-out clusters here, but about what your applications might look like internally, well below the service interface level. There's a lot more to the world than evented architectures.

Aggregation simply means we aren't using stupid queues. Our queues will be smart. We are deeply aware of queues as containers of work that eventually dictate how the entire system performs. As work containers we know intimately what requests and data sit in our queues and we can use that intelligence to our great advantage.
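As a rough illustration of what a "smart" queue could look like, here is a minimal sketch of a queue that knows what it holds and collapses repeated requests for the same key into one pending item. The class name `CoalescingQueue` and the key/payload shape are assumptions made for this example; the post does not prescribe this particular technique.

```python
from collections import OrderedDict


class CoalescingQueue:
    """FIFO queue that keeps at most one pending item per key (hypothetical example)."""

    def __init__(self):
        # key -> latest payload, kept in order of first arrival
        self._pending = OrderedDict()

    def put(self, key, payload):
        # A newer request for a key already in the queue replaces the
        # stale payload but keeps the key's original place in line.
        self._pending[key] = payload

    def get(self):
        # Pop the oldest key together with its most recent payload.
        # Raises KeyError if the queue is empty.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)


queue = CoalescingQueue()
queue.put("user:1", "profile v1")
queue.put("user:2", "profile v1")
queue.put("user:1", "profile v2")  # supersedes the first request for user:1
print(len(queue))                  # two items of work instead of three
```

Under load, a queue like this does less work as duplicates pile up, whereas a plain FIFO would process every stale request in turn.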

Prioritize Work

The key idea is an almost mindful approach to design, one in which programmers treat as a first-class concept the priority of what work gets done, why it gets done, and when it gets done, in every aspect of their creation.
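One simple way to make priority explicit in code is a priority queue in which every item of work carries a priority and the most urgent item is always served first. The sketch below uses Python's standard `heapq` module; the class name `PriorityWorkQueue`, the convention that lower numbers are more urgent, and the sample work items are assumptions for illustration only.

```python
import heapq
import itertools


class PriorityWorkQueue:
    """Serve the most urgent work first; FIFO among equal priorities (hypothetical example)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority work keeps arrival order.
        self._counter = itertools.count()

    def put(self, priority, work):
        # Assumed convention: a lower number means more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), work))

    def get(self):
        # Raises IndexError if the queue is empty.
        _priority, _order, work = heapq.heappop(self._heap)
        return work

    def __len__(self):
        return len(self._heap)


work_queue = PriorityWorkQueue()
work_queue.put(5, "rebuild search index")
work_queue.put(0, "handle link failure")
work_queue.put(5, "send nightly report")

while len(work_queue):
    print(work_queue.get())
```

The failure handling runs before the two background jobs even though it was queued second, and the background jobs run in the order they arrived.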

Preventing Cascading Failures

Click to read more ...

Tuesday
Mar 05, 2013

Sponsored Post: Fitbit, OLO, Amazon, aiCache, Aerospike, Percona, ScaleOut, New Relic, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Fitbit is hiring a Site Operations Lead to help us on our mission to make the world a healthier place! Fitbit's wearable fitness devices are worn by people across the world, each syncing with the web site, wirelessly and automatically, every 15 minutes. Join our mission here
  • OLO's food ordering platform powers some of the largest restaurant chains and feeds millions of consumers. We're looking for Senior C# Software Engineers and DevOps Engineers to help us scale our system. Apply here.
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
  • Aerospike is Hiring! You dream in C - and like it? Then join us as a Senior Distributed Systems Engineer or Client / Application Engineer. People covet your bag of tricks for troubleshooting systems and network issues? Join our Operations and QA team. See if these positions are a fit for you!

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 10x faster than Cassandra, Couchbase and MongoDB. Read it now!
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Mar 04, 2013

NoSQL Style - A Gangnam Style Parody

Listen up all you IT people...NoSQL, it's the rage now, so turn the page now and boost your stack...Hey, mighty people...Go, go, go, hey, hey, hey, hey, hey, hey...Go NoSQL style...

I for one feel both edified and entertained...can't wait for the Harlem Shake version. 

Monday
Mar 04, 2013

7 Life Saving Scalability Defenses Against Load Monster Attacks

We talked about 42 Monster Problems That Attack As Loads Increase. Here are a few ways you can defend yourself, secrets revealed by scaling masters across the ages. Note that these are low-level programming moves, not large architecture-type strategies.

Use Resources Proportional To a Fixed Limit

Click to read more ...

Friday
Mar 01, 2013

Stuff The Internet Says On Scalability For March 1, 2013

Hey, it's HighScalability time: 

Unfortunately with Delicious still down access to all my lovingly curated links is out. But the show must go on...

  • Quotable Quotes:
    • @muratdemirbas: In the cloudcomputing webservices domain antifragility= elastic scalability + network effect
    • @SQLPerfTips: More hardware won't solve response time problems. Proper indexing does.
    • Stefan Boberg: The fastest I/O request is the one you don't!
    • Alan Kay: The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.
    • antirez: One thing, more than everything else, keeps me focused while programming: never interrupt the flow.
  • The NPR apps team shows How to build a news app that never goes down and costs you practically nothing. Two servers provide high reliability at very little cost. Even on election night only one server was required. They use Flask, Jinja, LESS, JST, Bootstrap, Fab, git, Python, Node.js, and S3 to serve static content. Excellent description of their low cost, low overhead process.
  • Spotify sings In praise of “boring” technology: More often than not, the right tool for the job is a piece of software that has been around for some time, with proven success. One example would be writing a backend service in Java or Python instead of Go or Node.JS. Another example would be storing data in MySQL or PostgreSQL instead of MongoDB or Riak. < They are looking into Zookeeper to replace DNS for more dynamic configuration and Cassandra to handle distributed writes.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Feb 27, 2013

42 Monster Problems that Attack as Loads Increase

For solutions take a look at: 7 Life Saving Scalability Defenses Against Load Monster Attacks.

This is a look at all the bad things that can happen to your carefully crafted program as loads increase: all hell breaks loose. Sure, you can scale out or scale up, but you can also choose to program better. Make your system handle larger loads. This saves money because fewer boxes are needed, and it makes the entire application more reliable and improves response times. And it can be quite satisfying as a programmer.

Large Number Of Objects

We usually get into scaling problems when the number of objects gets larger. Clearly resource usage of all types is stressed as the number of objects grows.

Continuous Failures Make An Infinite Event Stream

During large network failure scenarios there is never time for the system to recover. We are in a continual state of stress.

Lots of High Priority Work

Click to read more ...

Monday
Feb 25, 2013

SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé

Should you use PaaS for your next project? Often the answer is no because you want control, but here's an example from SongPop showing why the promise of PaaS is not passé. SongPop was able to autoscale to 60 million users, 1 million daily active users, deliver 17 terabytes/day of songs and images worldwide, handle 10k+ queries/second, all with a 6 person engineering team, and only one engineer working full-time on the backend.

Unfortunately there aren't a lot of details, but what there is can be found in Scaling SongPop to 60 million users with App Engine and Google Cloud Storage. The outline follows the script. You start small. Let PaaS do the heavy lifting. And when you need to scale you just buy more resources and tune a little (maybe a lot). The payoff is you get to focus on feature development and can get by with a small team.

Here's a diagram of their architecture:

Click to read more ...

Friday
Feb 22, 2013

Stuff The Internet Says On Scalability For February 22, 2013

Hey, it's HighScalability time: 

  • Quotable Quotes:
    • @p337er: I have committed some truly horrendous crimes against scalability today.
    • @ErrataRob: doubling performance doesn't double scalability.
    • @rsingel: In 2008 when Yahoo.com  linked out, I had a Wired story get 1M visitors in an hour from their homepage.
    • @philiph: Lets solve this scalability problem with a queuing system
    • @jaykreps: Transferring data across data centers? Read this page and go tune your TCP buffer sizes...
    • @gwestr: In which the node community showers schadenfreude upon the rails community for "scalability is not my problem" architectures
    • @pbailis:  Makes sense, though I think there's a tradeoff re: coordination and scalability (always homogeneous vs dynamically heterogenous)
    • @pembleton: To summarize Yoav's philosophy: we started as quick as we can and then we accelerated #operationgrandma in #reversim
    • @surfichris: “We chose Heroku because we believed we could just `heroku scale web=X` when needed.” - Yeah, because scalability is magic, and unlimited...
    • kent langley: cloud computing had already been brewing for decades with its roots reaching far back in time. Grids, clusters and more were all precursors. However, it is striking how far things have come in just about five years.
    • @caspereeko: So Youtube views count is being generated from MapReduce from Google CDNs access logs and stored later in cache
    • @michaellperry: Three reasons for using async: offloading (do work on different thread), concurrency (start multiple), and scalability (use fewer resources)
    • @otomillo: one thing to remember about big O notation, it’s that it’s a measure of scalability; not a measure of performance as is commonly believed

  • Late refund? The IRS may be having a few Congress Induced Scalability Problems...IRS Statement on "Where's My Refund?" Tool

  • The eternal Law of Frustration predicts every competent person eventually gets so tired of working with the messiness of the real world that they just want to press reset and start all over again. Greenfields are so...well...green. No, I'm not talking about another master criminal trying to take over the world, but Google creating a replacement web protocol on top of UDP. But it always ends up the same, you realize Layer 1 needs to be replaced and that it's the only layer that really exists. Good discussion on Hacker News.

  • As we've seen major web properties transition back to black hole portal type sites, where attention enters and never escapes the event horizon, Pinterest thinks we are also trending back to human curated content, with humans doing the indexing. Or is that just what the machines want them to think?

  • Complexity accretes. Rails, You Have Turned into Java. Congratulations! Which is why every few years we need to all get naked, have a party, declare a Jubilee, and start over again, hopefully a little wiser.

  • What is NoSQL and is it pornography? I think Emin's lesson is you can't turn something into pornography by adding more layers of clothing.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Feb 20, 2013

Smart Companies Fail Because they Do Everything Right - Staying Alive to Scale

Wired has a wonderful interview with Clayton Christensen, author of the tech ninja's bible, Innovator's Dilemma. Innovation is the name of the game in Silicon Valley and if you want to understand the rules of the game this article is a quick and clear way of learning. Everything is simply explained with compelling examples by the man himself.

Just as every empire has fallen, every organization is open to disruption. It's the human condition to become comfortable and discount potential dangers. It takes a great deal of mindfulness to outwit and outlast the human condition. If you want to be the disruptor and avoid being the disruptee, this is good stuff.

He also talks about his new book, The Capitalist's Dilemma, which addresses this puzzle: if corporations are doing so well, why are individuals doing so badly?

If someone can help you see a deep meaningful pattern in life then they haven't brought you a fish, they've taught you how to fish. That's what Christensen does. Here's a gloss of his world view changing points:

Click to read more ...

Tuesday
Feb 19, 2013

Puppet monitoring: how to monitor the success or failure of Puppet runs  

This is a guest post by LogicMonitor's Director of Tech Ops, Jesse Aukeman, about the different ways they're monitoring the success or failure of Puppet runs.

If you are like us, you are running some type of Linux configuration management tool. The value of centralized configuration and deployment is well known and hard to overstate. Puppet is our tool of choice. It is powerful and works well for us, except when things don't go as planned. Failures of Puppet can be innocuous and cosmetic, or they can cause production issues, for example when crucial updates do not get properly propagated.

Why?

In the most innocuous cases, the Puppet agent craps out (we run the Puppet agent via cron). As nice as Puppet is, we still need to goose it from time to time to get past some sort of network or host resource issue. A more dangerous case is when an administrator temporarily disables Puppet runs on a host in order to perform some test or administrative task and then forgets to reenable it. In either case it’s easy to see how a host may stop receiving new Puppet updates. The danger here is that this may not be noticed until that crucial update doesn't get pushed, production is impacted, and it’s the client who notices.
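Both failure modes described above come down to a host whose last successful run is too old, so a staleness check is one plausible way to catch them. The following is a minimal sketch, not LogicMonitor's implementation: the function name `puppet_run_status`, the two-hour threshold, and the severity labels are all assumptions, and the inputs (time of the last run, number of failed events, whether the agent is disabled) are assumed to be collected elsewhere, for example from whatever state the agent records on the host.

```python
import time


def puppet_run_status(last_run_epoch, failed_events, disabled,
                      now=None, max_age_seconds=2 * 60 * 60):
    """Classify a host's Puppet state as OK, WARNING, or CRITICAL (hypothetical check).

    last_run_epoch: Unix time of the last completed run, or None if unknown.
    failed_events:  number of failures reported by that run.
    disabled:       True if an administrator has disabled the agent.
    """
    now = time.time() if now is None else now

    if disabled:
        # The "forgot to reenable it" case.
        return "CRITICAL", "agent is disabled"
    if last_run_epoch is None:
        return "CRITICAL", "no record of a completed run"

    age = now - last_run_epoch
    if age > max_age_seconds:
        # The "agent silently stopped running" case.
        return "CRITICAL", "last run was %d minutes ago" % (age // 60)
    if failed_events > 0:
        return "WARNING", "%d failed events in last run" % failed_events
    return "OK", "last run was %d minutes ago" % (age // 60)


# Example: a host whose last run finished three hours ago.
level, message = puppet_run_status(
    last_run_epoch=time.time() - 3 * 60 * 60, failed_events=0, disabled=False)
print(level, message)
```

The threshold should be comfortably longer than the cron interval so that a single skipped run does not raise an alert.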

How to implement monitoring?

Click to read more ...