How to get started with sizing and capacity planning, assuming you don't know the software behavior?
Here's a common situation and question on the black art of capacity planning, asked by Avinash Agrawal on the mechanical-sympathy Google group:
How to get started with sizing and capacity planning, assuming we don't know the software behavior and it's a completely new product to deal with?
Gil Tene, Vice President of Technology and CTO & Co-Founder of Azul Systems, wrote a very understandable and useful answer that is worth highlighting:
Start with requirements. I see way too many "capacity planning" exercises that go off spending weeks measuring some irrelevant metrics about a system (like how many widgets per hour can this thing do) without knowing what they actually need it to do.
There are two key sets of metrics to state here: the "how much" set and the "how bad" set:
In the "How Much" part, you need to establish, based on expected business needs, Numbers for things (like connections, users, streams, transactions or messages per second) that you expect to interact with at the peak time of normal operations, and the growth rate of those metrics that you need to be able keep up with. Also state expected things like data set size, data interaction rates, and data set growth rates.
For the "How Bad" part, you need to make sure your metrics include a description of what acceptable behavior is, remembering that without describing what us not acceptable, you have not described what acceptable is. Be specific. Saying "always fast" is not nearly as useful as "never slower than X", and saying "mostly" (or "on the average") is usually a way to avoid facing (or even considering) potential consequences of non typical behavior, the best approach here is to think of how often it is ok to have certain levels of bad things happen. (Don't get too greedy and ask for "perfect" here, or you'll get a big bill at the end.) So consider things like how often is it ok for the system to be out of commission for longer than X (for multiple values of X like a year, a week, a day, an hour, a minute, etc.). Also consider how often it is ok for the system react in longer than T (for multiple values of T, like an hour, a minus, a second, 50msec, etc.). Both of these are usually best stated as levels at percentiles, with availability being stated at percentiles of time, and responsiveness stared at percentiles of actual interactions. Don't forget to state the worst acceptable case for each.
Once you have a feel for business-driven requirements stated with "how much" and "how bad" metrics, design a set of experiments to test "how much" (measured in whatever capacity metrics your requirements use) the system can handle without EVER showing even the slightest hint of failing your "how bad" availability and responsiveness requirements. This will invariably include repeated testing under a wide range of "how much" levels to see how far things go before they start to fail.
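One possible shape for that experiment loop is sketched below, assuming an SLO dict shaped like the responsiveness example above. The run_load_test() function is a hypothetical stand-in for a real load harness, assumed to return every per-request latency (in ms) observed at a given rate:

```python
# Sketch of the "how much" ramp: record the highest load level that
# never violates the "how bad" responsiveness requirements.
def run_load_test(target_tps: int, duration_s: int) -> list[float]:
    raise NotImplementedError("drive the real system under test here")

def meets_slo(latencies: list[float], slo: dict[float, float]) -> bool:
    """True only if every percentile requirement holds, worst case included."""
    ordered = sorted(latencies)
    for pct, max_ms in slo.items():
        idx = min(int(pct * len(ordered)), len(ordered) - 1)
        if ordered[idx] > max_ms:
            return False
    return True

def find_capacity(levels_tps: list[int], slo: dict[float, float]) -> int:
    """Highest level that never hinted at failing the requirements."""
    best = 0
    for tps in levels_tps:
        if meets_slo(run_load_test(tps, duration_s=600), slo):
            best = tps
        else:
            break  # first failing level; no credit for anything above it
    return best
```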
Then run your experiments...
The rest, like padding for business requirements underestimating reality, and for being optimistically wrong in various ways in measurement, is a relatively easy exercise of arm wrestling between wanting to sleep well at night and wanting to have more beans left to count.
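That padding step is simple arithmetic once the experiments are done; the factors below are hypothetical, and picking them is exactly the arm wrestling being described:

```python
# Toy derating: start from the capacity the experiments proved, then
# discount for the ways the forecast and the measurements might both be
# optimistic. Both factors are assumptions, not recommendations.
proven_tps                 = 3_000  # highest level that never broke the SLO
requirements_underestimate = 1.3    # business forecast might be 30% low
measurement_optimism       = 1.2    # tests may flatter reality by ~20%

safe_planning_tps = proven_tps / (requirements_underestimate * measurement_optimism)
print(f"plan deployments around ~{safe_planning_tps:.0f} tps")
```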
An important note: Before you run the actual experiments and start considering your results, validate that the experimental setup can actually measure what you want. The best way to do that is by artificially introducing certain conditions and verifying that the setup correctly reports on what you know to have actually happened. My favorite tools for this step are physically pulling out network cables and power cords, and using ^Z (or equivalent signals). You may find yourself spending a good amount of time calibrating the experimental setup so that you can actually trust the results, but that is time well spent, as wasting your time (and risking your business) by analyzing and relying on badly measured data is a very expensive proposition.
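As a rough illustration of that calibration step, the sketch below injects a known stall into a toy workload with SIGSTOP (the signal behind ^Z) and checks that the measurement side actually reports it. The heartbeat child process and the 5-second pause are stand-ins for your real system and your real fault; Unix-only:

```python
import os, signal, subprocess, time

KNOWN_PAUSE_S = 5.0

# A trivial stand-in "system under test" that prints a heartbeat.
worker = subprocess.Popen(
    ["python3", "-c",
     "import time\nwhile True: print(time.time(), flush=True); time.sleep(0.1)"],
    stdout=subprocess.PIPE, text=True)

time.sleep(1.0)                          # let it run normally
os.kill(worker.pid, signal.SIGSTOP)      # the ^Z equivalent
time.sleep(KNOWN_PAUSE_S)                # the condition we KNOW happened
os.kill(worker.pid, signal.SIGCONT)
time.sleep(1.0)
worker.terminate()

# The "measurement" under calibration: largest gap between heartbeats.
stamps = []
for line in worker.stdout:
    try:
        stamps.append(float(line))
    except ValueError:
        pass  # ignore a possibly truncated final line
max_gap = max(b - a for a, b in zip(stamps, stamps[1:]))
print(f"injected a {KNOWN_PAUSE_S:.1f}s stall, measured a {max_gap:.1f}s gap")
assert max_gap >= 0.9 * KNOWN_PAUSE_S, "setup failed to report a known stall"
```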
Reader Comments (4)
"The Woman Question" to put along side this post ? http://en.wikipedia.org/wiki/The_woman_question How is this even relevant to this post ?
I use it to mark questions, usually Ask HS questions, but it seemed to fit for this case as well, in case anyone had something productive to add.
Good advice, Todd, thanks. Too often, capacity planning starts from the other end: the system is subjected to increasing load until it completely wedges, and based on the load at that point, the whole system is deemed sufficient or not. It's much more interesting to know the mean, 90th and 99th percentile performance on a wide variety of metrics under increasing load. Those will reveal bottlenecks with more precision. And since real-world load is rarely very similar to test load, often the bottleneck that's hit is not the one you expected. More-detailed planning allows for longer and more accurate forecasting.
On another note, I would also say that I find the graphic you chose to be inappropriate.
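A minimal sketch of the percentile summary described in the comment above (mean, 90th and 99th percentile under increasing load); the samples are synthetic, so substitute the latencies your own harness records:

```python
import random, statistics

def summarize(latencies_ms: list[float]) -> dict[str, float]:
    q = statistics.quantiles(latencies_ms, n=100)  # q[89]~p90, q[98]~p99
    return {"mean": statistics.fmean(latencies_ms), "p90": q[89], "p99": q[98]}

for load_tps in (100, 500, 1_000, 2_000):
    # a made-up model in which the tail degrades faster than the mean
    samples = [random.expovariate(1.0) * load_tps / 100 for _ in range(10_000)]
    s = summarize(samples)
    print(f"{load_tps:5d} tps  mean={s['mean']:7.1f}ms  "
          f"p90={s['p90']:7.1f}ms  p99={s['p99']:7.1f}ms")
```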
Perhaps a follow-on to the topic should be "OK, how do you do budgetary sizing for a new project, assuming you don't know how the software behaves?" Getting the business requirements is the first step, as the poster described, but we're still not able to do any meaningful testing. Assuming we're at the beginning of a budget cycle, it's very likely that the code hasn't even been started, other than perhaps on a few whiteboards. Put another way, the finance people need to see a number next to the project's infrastructure line item, and they need it by the last meaningful workday of the calendar year, which in many outfits was about 4 hours ago (end of the Friday before Christmas). What then?

One honestly can't use a scientific approach to this, but the business requires that a budget for the project is allocated. In this case, the safe route is probably to find the closest analogue to the new project within the existing environment. Size the new project based on the business requirements gathered in step 1 as if it were this analogous project. If you think there is something radical that is missing, put in a best guesstimate, some kind of fudge factor. In this situation, you can't really rely upon a scientific approach, but have to rely on experience and get consensus from the project stakeholders. The goal is to fund the project amply enough that it doesn't come up short when the time comes, but not so amply that it gets rejected outright.
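A minimal sketch of that sizing-by-analogy arithmetic; every figure is a hypothetical placeholder, and the shape of the calculation is the point rather than the numbers:

```python
# Budgetary sizing by analogy with the closest existing system.
analog_peak_tps  = 800      # measured peak of the closest existing system
analog_servers   = 12       # what that analogue runs on today
new_expected_tps = 2_000    # from the step-1 business requirements
fudge_factor     = 1.5      # guesstimate padding for the radical unknowns

servers = analog_servers * (new_expected_tps / analog_peak_tps) * fudge_factor
cost_per_server = 4_000     # budgetary unit cost, also a placeholder
print(f"budget line item: ~{servers:.0f} servers, ~${servers * cost_per_server:,.0f}")
```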
And what's with all the wet blankets about the girl holding the question mark? Why read into it all sorts of undertones which aren't there? It's just a girl holding a sign, and it's more tasteful than most of what's on prime-time television in 2013.