Sunday
Feb 1, 2009
More Chips Means Less Salsa

Yes, I just got through watching the Super Bowl, so chips and salsa are on my mind and in my stomach. In recreational eating, more chips require downing more salsa. With multicore chips it turns out to be the opposite: as cores go up, salsa goes down, salsa obviously being a metaphor for speed.
Sandia National Laboratories found in their simulations a significant increase in speed going from two to four cores, but an insignificant increase from four to eight. Beyond eight cores, speed actually decreases: sixteen cores perform barely as well as two, and after that a steep decline is registered as more cores are added. The problem is a lack of memory bandwidth, along with contention between processors over the memory bus available to each processor.
The implication for those following a diagonal scaling strategy is to work like heck to make your system fit within eight multicores. After that you'll need to consider some sort of partitioning strategy. What's interesting is the research on where the cutoff point will be.
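The shape of that curve, rising, flattening, then falling, can be illustrated with a toy back-of-the-envelope model. This is my own sketch, not Sandia's simulation: it assumes the parallel compute time shrinks with core count while traffic on the shared memory bus stays serialized, and adds a made-up per-core-pair contention cost for bus arbitration. All parameter values are illustrative, not measured.

```python
# Toy model (illustrative only, not Sandia's data): cores share one
# memory bus, so compute parallelizes but memory traffic does not,
# and bus arbitration overhead grows with the number of core pairs.

def modeled_speedup(cores, compute=1.0, mem_traffic=0.5, contention=0.005):
    """Speedup relative to one core under a shared memory bus.

    compute     - perfectly parallel work (time units on one core)
    mem_traffic - serialized time spent on the shared memory bus
    contention  - assumed per-core-pair bus arbitration overhead
    """
    t_one = compute + mem_traffic
    t_n = (compute / cores                     # parallel part shrinks
           + mem_traffic                       # bus traffic stays serial
           + contention * cores * (cores - 1) / 2)  # pairwise contention
    return t_one / t_n

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} cores -> speedup {modeled_speedup(n):.2f}")
```

With these assumed numbers the model reproduces the qualitative story: a real gain from two to four cores, almost nothing from four to eight, and a decline after that, with sixteen cores doing no better than two.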
Reader Comments (5)
Isn't that already known? I'm not sure, but I remember one of my professors telling me this a few years ago. It has to do with communication overhead and lock contention.
Well, from the article it seems this depends on the application, as well as the system's architecture, rather than being a set-in-stone rule of nature.
So right now it makes sense to design for a partitioning system regardless of what progress the system and algorithm designers make on the issue, as there will most likely always be limits. Whether they are cost, speed, or time, you'll hit some limiting factor that will eventually point in the direction of partitioning.
If no one has stated that as a generally true axiom, someone should. I'd propose it, but I'm not a researcher, so no one would ever call it Newton's conjecture.
Do things like OpenCL help to alleviate this? Some GPUs have hundreds of cores now, and they seem to work well on certain parallel problems and, together, provide higher overall "speed." I wonder if some framework (e.g., OpenCL) can help to solve or minimize this effect as CPUs move beyond 8 cores.
Well, the drum has been beating on this one for a while (Herb Sutter, the functional programming people)... you're going to have to use new tools to scale gracefully to higher CPU counts. The Haskell people are apparently working on 8+ cores now; we'll see.
In any case, I can't think of many web-bound workloads that currently require 16+ Intel i7s. People, the electricity bills on these things are scary.
Multicore and multithreaded processors are becoming mainstream. This shift in computer architecture is very well explained in the updated edition of Computer Architecture, Fourth Edition: A Quantitative Approach (http://www.amazon.com/gp/product/0123704901). Most architecture books either ignore multi-core processor designs or cover them poorly. This book is easy to read and covers all the topics from ILP to multi-core design changes.
There are in fact limits to multicore and multithread scalability, but this is the most reasonable way forward if we want to harness the benefits of Moore's Law in the coming decade.