Wednesday, December 14, 2011

Virtualization and Cloud Computing Are Changing the Network to East-West Routing

It’s called “east-west” networking, which, compared to its predecessor, “north-south” networking, evokes images of maelstroms and hurricane winds and tsunamis for some reason. It could be the subtle parallel between the transformative effect this change in networking patterns has on the data center and El Niño’s transformative power over weather patterns across the globe.

Traditionally, data center networks have focused on North-South network traffic. The assumption is that clients on the edge would mainly communicate with servers at the core, rather than across the network to other clients.

But server virtualization changes all this, with servers, virtual appliances and even virtual desktops scattered across the same physical infrastructure. These environments are also highly dynamic, with workloads moving to different physical locations on the network as virtual servers are migrated (in the case of data center networks) and clients move about the building (in the case of wireless LANs).

-- Distributed Core And East-West Routing--The Network Is Changing (Stephen Foskett, Network Computing Magazine)

Though the term “east-west networking” really describes the maelstrom of traffic inside the data center, it applies just as well to traffic patterns outside it. The advent of cloud and virtualization has made integration of cloud-hosted resources more and more appealing, if not already a done deal. That means a lot more east-west networking between “data centers” – between the corporeal data center of the enterprise and the ethereal data center out there, in the cloud. Such communication must occur to maintain the efficiencies and cost savings gained by leveraging those cloud resources; otherwise, organizations find themselves managing two completely separate sets of processes and policies governing the delivery of the applications they have deployed. Even setting aside the impact of monitoring and management traffic, it’s important to note that modern distributed architectures may produce the same sort of traffic patterns for live, user-oriented traffic.

Closely related to the concept of directional networking is that of trombone networking, a phenomenon we’ve been seeing more and more of as virtualization takes hold of data centers the world over. It is equally applicable to multi-data center deployments:

When L2 domains stretch across multiple data centers, traffic flows belonging to a single user session might have to traverse Data Center Interconnect (DCI) link multiple times.

-- TRAFFIC TROMBONE (WHAT IT IS AND HOW YOU GET THEM) 
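A back-of-the-envelope way to see the trombone effect is to count DCI crossings for a single exchange. The following Python sketch is purely illustrative; the topology and hop names are assumptions, not drawn from the article. When a VM migrates to a second data center but its load balancer and default gateway stay behind, every request/response pair traverses the DCI more than once.

```python
# Hypothetical sketch of trombone traffic: count DCI crossings when a VM
# has migrated but its load balancer and default gateway have not.

def dci_crossings(path):
    """Count hops that cross the Data Center Interconnect, i.e. consecutive
    hops that sit in different data centers."""
    return sum(1 for a, b in zip(path, path[1:]) if a[1] != b[1])

# Each hop is (device, data_center). The VM now lives in DC2, but the
# load balancer and default gateway remain in DC1 (illustrative topology).
request  = [("client", "DC1"), ("load-balancer", "DC1"), ("vm", "DC2")]
response = [("vm", "DC2"), ("default-gateway", "DC1"), ("client", "DC1")]

print(dci_crossings(request) + dci_crossings(response))  # 2 crossings per exchange
```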

INFRASTRUCTURE is IRRELEVANT – NOT

A pet peeve of mine is the notion that cloud somehow makes infrastructure irrelevant. That organizations no longer deal with the infrastructure themselves does not make it irrelevant. It may make it less frustrating, less costly, and less troublesome on a day-to-day basis, but it is not in any way irrelevant. The infrastructure must still exist and, as we’re seeing in the real world, it is not going away despite the eager predictions of pundits that it would. Enterprises need to retain some measure of governance over applications, and they recognize that this means control over the infrastructure so vital to delivering them.

But it is important to recognize that infrastructure is changing, whether inside or outside the data center. The architectures and networking techniques of yesterday are not necessarily well-suited to the architectures and applications of this afternoon and tomorrow. There are unintended consequences to not paying enough attention to the network infrastructure and the changes being wrought on traffic patterns by the disruptive force that is cloud and virtualization. This is no summer rain shower we’re looking at; we’re facing a full-on technological squall that will, if ignored, hit the data center head on.

IT professionals need to take the wheel and be very aware of the impact of the change in traffic patterns wrought by cloud and virtualization – and not just inside the data center, but outside as well. An architectural solution to one challenge may inadvertently change traffic in the network in ways that cause congestion, overload, or simply poor performance. Extending a VLAN across data centers and into the cloud sounds appealing from the perspective of managing components in a heterogeneous deployment model, but it may in fact become a source of performance and, ultimately, availability issues due to trombone traffic or east-west networking. Depending on the type of traffic bouncing back and forth, the result could be anything from a TCP retransmission storm to remote nodes flapping up and down because the load balancer’s health-check interval, tuned for local latency, is too short when the check must traverse the stretched LAN to a remote site.
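To make that health-check failure mode concrete, here is a minimal sketch with purely illustrative numbers (the timeout and RTT values are assumptions, not measurements): a monitor timeout chosen with sub-millisecond local round trips in mind fails every probe once the same check has to cross the DCI to a remote site.

```python
# Illustrative sketch: a health-check timeout tuned for local latency
# falsely marks remote nodes down once probes traverse the stretched LAN.
# All numbers below are assumptions for illustration only.

def probe_succeeds(timeout_ms: float, rtt_ms: float, server_ms: float) -> bool:
    """A probe succeeds only if the network round trip plus the server's
    own response time fits inside the monitor's timeout window."""
    return rtt_ms + server_ms <= timeout_ms

timeout = 50.0                       # ms, chosen with only the local LAN in mind
local_rtt, remote_rtt = 0.5, 80.0    # ms; the remote path crosses the DCI

print(probe_succeeds(timeout, local_rtt, server_ms=10))   # True: node stays up
print(probe_succeeds(timeout, remote_rtt, server_ms=10))  # False: node flaps down
```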

A thorough understanding of networking, of the infrastructure that is its foundation, and of its relationship to applications is necessary when architecting a data center network capable of not just supporting but adapting to the challenges that arise from virtualization and cloud computing. A silo-based IT organization cannot effectively address these impacts because no one team holds all the necessary pieces. A more collaborative approach, or one in which a team of cross-functional experts is at the fore, will be required to navigate the coming storms.

 

Reader Comments (2)

I've always maintained there are three problems we have to resolve in building a system of any kind:

1. Speed of light (inclusive of speed of electrons moving in media e.g. processing capacity).
2. Rotational latency of disks.
3. Entropy increases.

Using "cloud" doesn't change any of those fundamental "laws" if you will. They just move where someone is dealing with it out to where someone else can deal with it. Now one thing a cloud-y provider gives you is someone else to deal with 2 and 3. Spread disk requests out over potentially TB of spindles and you get faster because you have multiple systems sharing a data path that is larger than any would have on their own (unless they're oversubscribed, but this is theory-land, so let's ignore that for now). Entropy increasing can be mitigated by rigorous automation/monitoring/operations work so that someone who runs 1000 servers may (though not necessarily) run them more efficiently and reduce the probability of failures to the point your individual server(s) have more reliability (less entropy impacts) over time.

But number 1 - that we still can't do anything about. And where that mostly shows up is in networking.

Glad to see someone is finally talking about this.

December 14, 2011 | Unregistered CommenterMary

Network load inside datacenters (or 'clouds') can be significantly reduced by smart data sharding across processing and storage units. Every incoming request must be routed to the corresponding 'shard', which has enough information to process the request with minimal (ideally, zero) external communication. A 'shard' is a set of tightly coupled processing and storage units with fast interconnect. Ideally these units should share the same physical machine.

Request routing can be performed using sticky load balancing.
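As a hypothetical sketch of what such sticky routing could look like (the shard names and key scheme are illustrative assumptions, not from the comment): hash a stable request key, such as a user ID, onto a consistent-hash ring, so every request from the same user lands on the same shard and that shard's caches stay hot.

```python
import bisect
import hashlib

class StickyRouter:
    """Illustrative sticky request routing: a consistent-hash ring maps a
    stable request key (e.g. a user ID) to the same shard every time."""

    def __init__(self, shards, vnodes=100):
        # Place several virtual nodes per shard on the ring for even spread.
        self._ring = sorted(
            (self._hash(f"{shard}:{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, request_key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(request_key)) % len(self._ring)
        return self._ring[idx][1]

router = StickyRouter(["shard-a", "shard-b", "shard-c"])
assert router.shard_for("user:42") == router.shard_for("user:42")  # sticky
```

A side benefit of the ring: adding a shard during re-sharding only remaps the keys nearest its new positions, rather than reshuffling everything.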

But what happens if a shard's hardware breaks? How do you re-shard to a higher number of shards when traffic increases? Obviously, shards must be interchangeable, so they shouldn't store any persistent data. All persistent data should be flushed to external storage (which can be distributed, scalable, replicated and highly available :) ). Shards should just cache data from persistent storage in order to reduce external communication. Shard cache sizes can be enormous - they can hold hot data in RAM while flushing cold data to SSD or HDD. All contemporary 64-bit operating systems can do this transparently for you - just put the caches in the multi-TiB virtual memory of a 64-bit process.
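A minimal sketch of such an interchangeable shard, assuming a dict-like external persistent store (the names here are hypothetical): reads are served from an in-process LRU cache of hot data, while writes go through to the external store first, so losing the shard loses nothing that can't be rebuilt.

```python
from collections import OrderedDict

class ShardCache:
    """Illustrative interchangeable shard: hot data lives in an in-process
    LRU cache; every write goes through to external persistent storage, so
    the shard itself holds no state that can't be rebuilt after failure."""

    def __init__(self, persistent_store, max_items=1_000_000):
        self._store = persistent_store   # assumed dict-like external storage
        self._cache = OrderedDict()      # insertion order doubles as LRU order
        self._max = max_items

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)     # mark as recently used (hot)
            return self._cache[key]
        value = self._store[key]             # one external round trip on a miss
        self._put_local(key, value)
        return value

    def put(self, key, value):
        self._store[key] = value             # write-through: persistence first
        self._put_local(key, value)

    def _put_local(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the coldest entry
```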

Since each request is processed by a shard that already has all the data required to process it in its caches, there is no need for numerous RPCs to external services on every request, unlike the 'stateless nodes' approach popular among current 'cloud' providers. Obviously, the sharding approach not only saves network bandwidth inside the 'cloud', but also reduces CPU load and response times, because there is no need to construct, send, receive and parse RPCs to external services. Additionally, this approach increases effective cache sizes, since distinct shards don't duplicate shard-specific data in their caches, as traditional 'stateless servers' usually do. Moreover, the shard approach could save a lot of energy and make datacenters greener than they are today :)

The most obvious splitting policies for shards are:
- By user. This is useful for any user-centric service.
- By URL. This is useful for media sharing services such as picasa, youtube, facebook photos, mediafire or dropbox.

It's a pity that the current 'cloud' providers usually don't provide sticky load balancing, which is essential for the sharding approach described above.

December 16, 2011 | Unregistered Commentervalyala
