GEO-aware traffic load balancing and caching at CNBC.com
Saturday, February 6, 2010 at 9:56AM
Rashid Karimov in Strategy

CNBC, like many large web sites, has relied on a CDN for content delivery. Recently, we started looking into whether we could improve on this model. Our criteria were:

- improve response time
- have better control over traffic (real-time reporting, change management and alerting)
- better utilize our internal datacenters and their infrastructure
- shield users from any troubles in the origin infrastructure
- reduce cost



After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler. We've had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009.

Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steering users to the geographically closest origin point. The traffic balancing rules can be very flexible. For example (a sketch of such rules in code follows this list):


- send 70% of US East Coast traffic to origin point A and 30% to a CDN
- send 50% of West Coast traffic to origin point B, 25% to C and the rest to a CDN
- send everything in the EU to origin point D
- send everything in Asia to a CDN
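
To make those rule semantics concrete, here is a minimal sketch of how such per-region, weighted rules might be modeled. The region names, destinations and percentages mirror the examples above; everything else (the data structure, the weighted random pick) is my own illustration, not Dyn's actual implementation.

    import random

    # Per-region rules: lists of (destination, percentage) pairs.
    RULES = {
        "us-east": [("origin-a", 70), ("cdn", 30)],
        "us-west": [("origin-b", 50), ("origin-c", 25), ("cdn", 25)],
        "eu":      [("origin-d", 100)],
        "asia":    [("cdn", 100)],
    }

    def pick_destination(region: str) -> str:
        # Weighted random pick: over many resolutions, traffic splits
        # in the configured ratios.
        destinations, weights = zip(*RULES[region])
        return random.choices(destinations, weights=weights)[0]

    print(pick_destination("us-east"))  # "origin-a" about 70% of the time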

Dyn also offers a set of value-add services, such as automatic monitoring, failover, alerting and traffic reporting, via a flexible and easy-to-use web portal.


I will now provide a non-scientific explanation of how Anycast works. In principle, to direct a user to the geographically closest origin point, one has to have an idea of the user's location. The traditional way of doing that requires some form of database mapping IP addresses to locations. Such databases are widely available and are used in all sorts of products, including geo-targeted ads.
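
For illustration, here is a minimal sketch of that database-lookup approach, using Python's standard ipaddress module. The prefixes and regions below are made up (documentation ranges); real geo-IP databases are commercial products with far finer granularity.

    import ipaddress

    # Prefix-to-region table; real databases hold millions of entries.
    GEO_DB = {
        ipaddress.ip_network("203.0.113.0/24"):  "asia",
        ipaddress.ip_network("198.51.100.0/24"): "eu",
        ipaddress.ip_network("192.0.2.0/25"):    "us-east",
        ipaddress.ip_network("192.0.2.128/25"):  "us-west",
    }

    def locate(ip: str) -> str:
        addr = ipaddress.ip_address(ip)
        for network, region in GEO_DB.items():
            if addr in network:
                return region
        return "unknown"

    print(locate("192.0.2.200"))  # -> us-west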

A very different, albeit less granular, way to accomplish the same thing is to use Internet routing (the BGP protocol) to advertise routes to the same IP addresses from multiple points of presence. For example, let's imagine one has 4 DNS clusters, each containing 4 nodes with IPs of 1.1.1.1, .2, .3 and .4.

Each cluster is positioned at a major peering point: one on the US East Coast, one on the West Coast, one in the EU and one in Asia. From each location, one advertises, via BGP, the same subnet with the DNS servers on it. Mission accomplished! Through the magic of routing, users in Asia will have their DNS requests land on the DNS servers in Asia, EU users on the EU servers, and so on. It is easy to see how this implied knowledge of the requestor's geo location can now be used to direct their traffic in a location-specific way. We use a low DNS TTL, so that any DNS changes take effect within a reasonably short amount of time.
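
To see why advertising one prefix from several places implies location, consider this toy model: every cluster advertises the same prefix, and BGP's best-path selection (reduced here to "shortest AS path wins") delivers each client's packets to the nearest cluster. The client regions and path lengths below are invented purely for illustration.

    CLUSTERS = ["asia", "eu", "us-east", "us-west"]

    # Invented AS-path lengths from two sample client regions to each cluster.
    AS_PATH_LENGTH = {
        ("tokyo", "asia"): 2, ("tokyo", "eu"): 7,
        ("tokyo", "us-east"): 5, ("tokyo", "us-west"): 4,
        ("berlin", "asia"): 7, ("berlin", "eu"): 2,
        ("berlin", "us-east"): 4, ("berlin", "us-west"): 6,
    }

    def serving_cluster(client: str) -> str:
        # BGP best-path selection, reduced to "prefer the shortest AS path".
        return min(CLUSTERS, key=lambda c: AS_PATH_LENGTH[(client, c)])

    print(serving_cluster("tokyo"))   # -> asia
    print(serving_cluster("berlin"))  # -> eu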

Put a different way: when a user wants to visit www.cnbc.com, their browser requests DNS resolution for www.cnbc.com. The DNS request naturally flows to the closest Dyn DNS cluster. The DNS servers at that cluster have implied awareness of their own location, so they can infer that the requests are coming from users in the same geo area. Based on that inference and the set of rules we configure, the server directs the requesting user to the proper origin point for www.cnbc.com.
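
From the answering server's point of view, the flow might look like the sketch below: the cluster's own region stands in for the user's region, and the returned record carries a short TTL so that rule changes propagate quickly. The addresses and the 60-second TTL are my assumptions for illustration, not Dyn's actual values.

    MY_REGION = "us-east"  # fixed per cluster at deploy time

    # Illustrative origin addresses per region (documentation-range IPs).
    ORIGINS = {"us-east": "192.0.2.10", "us-west": "192.0.2.20",
               "eu": "198.51.100.10", "asia": "203.0.113.10"}

    def answer(qname: str) -> str:
        # The short TTL caps how long resolvers cache the old answer,
        # so a rule change takes effect within about a minute.
        return f"{qname}. 60 IN A {ORIGINS[MY_REGION]}"

    print(answer("www.cnbc.com"))  # -> www.cnbc.com. 60 IN A 192.0.2.10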


For origin points, we chose our own datacenters on the East and West Coasts of the US, each with multiple gigabits of egress capacity.

To actually deliver the traffic, the decision was simple: we went with the aiScaler caching engines that we have by now used for close to a year on other projects. Just 4 common 1RU servers, 2 at each location, are all we needed to deliver all of the traffic to our US user base. The latest iteration of the aiScaler product, v6, has been tested to handle in excess of 250,000 RPS on a common HP DL360 server. It runs on any Linux distro, including our regular RedHat 5.x. In our case, requests to www.cnbc.com peak at over 3,000 RPS, so we have plenty of excess capacity for any possible traffic spikes.
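
As a back-of-the-envelope check on that headroom claim, using only the numbers above (and treating the vendor benchmark figure as an upper bound):

    servers, rated_rps, peak_rps = 4, 250_000, 3_000
    capacity = servers * rated_rps      # 1,000,000 RPS in aggregate
    print(capacity / peak_rps)          # ~333x headroom at peak load
    print(rated_rps / peak_rps)         # ~83x even on a single server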

Here are our results so far:

- we were able to shave about one second (about 30%!) off page load times, as reported by Keynote
- the load on our CMS infrastructure has dropped by more than 80%, which should have a positive impact on the overall stability of the CMS environment
- we now have a complete, real-time view of traffic, down to RPS, response times, connection counts, cached responses, etc.
- we can now report, chart (via Zabbix) and alert on any traffic parameter, all in real time!
- our CDN traffic has dropped by about 80% as well, complete with an 80% reduction in CDN fees
- we're now making better use of our own datacenters' capacity
- we now have the ability to change our caching rules or load distribution instantly
- we catch and redirect mobile clients right at aiScaler, sparing the CMS servers unnecessary hits (see the sketch after this list)
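
Here is a rough sketch of that mobile-redirect idea in plain Python. aiScaler expresses this in its own configuration, which I'm not reproducing here, so the user-agent tokens and the m.cnbc.example target are purely illustrative.

    MOBILE_TOKENS = ("iPhone", "Android", "BlackBerry", "Windows Phone")

    def handle(user_agent: str, path: str):
        if any(token in user_agent for token in MOBILE_TOKENS):
            # Redirect at the edge; the request never reaches the CMS.
            return 301, {"Location": "http://m.cnbc.example" + path}
        return 200, {}  # fall through to cache / origin as usual

    print(handle("Mozilla/5.0 (iPhone; ...)", "/id/100"))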


We still have a CDN configured as an overflow/safety valve, just in case.

Summary: Dyn's dynamic, geo-aware DNS load balancing solution and aiScaler's proven caching software have enabled a top-tier financial news website to shave 30% off response times, save money, and put in place a better, real-time monitoring, reporting and alerting setup.

A bit about myself: an EE major by training, I've been working with the Internet since 1992, from an ISP node in Russia back in the days of UUCP over 2400bps MNP-5 modems to running some of the world's busier sites.


And lastly: the above doesn't constitute, in any way, shape or form, an endorsement of the mentioned products, vendors and/or solutions by CNBC, NBC, GE or any of their subsidiaries. It is merely a sharing of my experience, for the benefit of the Internet community at large.

Rashid Karimov, Platform, CNBC.com
