Sunday, August 24, 2008

A Scalable, Commodity Data Center Network Architecture

Looks interesting...

Abstract:
Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance.
In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
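For context on the numbers in the comments below: the paper builds a three-tier fat tree from identical k-port switches, supporting k³/4 hosts at full bisection bandwidth with 5k²/4 switches. A minimal Python sketch of that sizing arithmetic (the function names are mine, not the paper's):

```python
# Sizing arithmetic for a three-tier fat tree built from identical
# k-port switches, per the paper: k^3/4 hosts, 5k^2/4 switches.

def fat_tree_hosts(k: int) -> int:
    """Hosts supported at full bisection bandwidth."""
    return k ** 3 // 4

def fat_tree_switches(k: int) -> int:
    """k^2 pod switches (edge + aggregation) plus (k/2)^2 core switches."""
    return 5 * k ** 2 // 4

k = 48  # the "48 port case" discussed in the comments
print(fat_tree_hosts(k))     # 27648
print(fat_tree_switches(k))  # 2880
```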


Reader Comments (2)

A very small addition to the 48-port case: all those switching nodes tend to have 4 10GigE links on them as well.

These can connect one fat tree to another across those 10GigE links, thus doubling your maximum cluster size for the cost of a mere 96 10GigE links on hardware that would already be purchased, increasing the cheap cluster size from 27,648 full-bandwidth nodes to 55,296 full-bandwidth nodes (sketched below). 10GigE fiber also helps with the physical limitations imposed by ~500 racks of 27k hosts. Most hosts also have two Ethernet ports; that second port could be used for connecting up to another fat tree, or for doubling the bandwidth into the first tree (but halving the number of hosts).
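A sketch of the doubling arithmetic above (Python; the ~48-hosts-per-rack density is my assumption, and note the 96-link figure is corrected in the follow-up comment):

```python
# The comment's doubling arithmetic for the 48-port case.
k = 48
hosts_per_tree = k ** 3 // 4   # 27,648 full-bandwidth nodes per tree
print(2 * hosts_per_tree)      # 55,296 nodes across two linked trees

# Rack-count ballpark, assuming ~48 hosts per rack (my assumption;
# the comment says "500 racks of 27k hosts"):
print(hosts_per_tree // 48)    # 576 racks -- the same ballpark
```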

By adding another layer of 96 cheap 48+4 switches you can connect one 27k-host fat tree to 3 of its friends, or to 7 others if you are willing to run lots of GigE cable.

Or, if you decide to step up to the big 10GigE 'core' switches at this point, you can connect as many as N fat trees of 27k nodes each through N switches with N ports each, where N <= 96, for a full-bisection-bandwidth network of 2,654,208 nodes (sketched below). That's about 40 racks cubed, not counting support gear. Enough?
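A quick check of that scaling claim, taking the comment's topology at face value (the hosts-per-rack figure is inferred, not stated):

```python
# N fat trees of 27,648 nodes each, joined through big N-port
# 10GigE core switches, N <= 96.
hosts_per_tree = 27_648
N = 96
print(N * hosts_per_tree)               # 2,654,208 nodes

# "About 40 racks cubed": 40^3 = 64,000 racks, which implies
# roughly 41.5 hosts per rack at this scale.
print((N * hosts_per_tree) / 40 ** 3)   # ~41.5
```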

Another cute trick is hanging storage nodes off the 10GigE links on the intermediate switching/routing nodes. This does screw up the symmetry of the solution, but it gives you vast quantities of parallel I/O for basically zero cost.

Maybe this is why Google is spinning its own rack switch... hmmm.

Note this isn't all that new: the 1992 CM-5 was based on a fat-tree interconnect. That said, it's still a good reminder.

December 31, 1999 | Unregistered Commenter gulfie

Screwed up the math in at least two places, sorry about that {insert excuse here}.

Linking two fat trees would take ~27,648 / 10 10GigE links, or 2,765 or so, plus or minus a few (obviously; sorry about that).

Grouping these fat trees together on high-end 128-port full-bisection-bandwidth 10GigE switches/routers could yield a 27,648 × 128 = 3,538,944-node full-bisection-bandwidth interconnect, which is about 42³ racks... hmmm... 42. (Both corrected figures are sketched below.)
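The corrected figures, sketched (assuming GigE hosts, so aggregate Gb/s divides by 10 for 10GigE link counts):

```python
hosts_per_tree = 27_648            # 48-port three-tier fat tree

# Linking two trees at full bisection: each GigE host contributes
# 1 Gb/s of cross-tree traffic, carried over 10GigE links.
print(round(hosts_per_tree / 10))  # 2765 links, give or take

# Grouping trees on 128-port full-bisection 10GigE core switches:
print(hosts_per_tree * 128)        # 3,538,944 nodes
```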

As for going to 10GigE for interconnect... the phone company has gotten a lot of mileage out of running lots of parallel copper cables... cheaply.

Sorry again about the math... it's probably still wrong in places.

December 31, 1999 | Unregistered Commenter gulfie
