Saturday, Feb 06, 2010

GEO-aware traffic load balancing and caching at CNBC.com

CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking at whether we could improve on this model. Our criteria were:

- improve response time
- have better control over traffic (real time reporting, change management and alerting)
- better utilize internal datacenters and their infrastructure
- shield users from any troubles at the origin infrastructure
- take cost out



After researching the market, we turned to two vendors: Dyn (Dynamic Network Services) and aiScaler. We have had about a year's worth of experience with aiScaler (search for "CNBC" to see my previous post), but Dyn was a new vendor for us. We started building our relationship at the Velocity conference in the summer of 2009.

Dyn has recently started offering a geo-aware DNS load balancing solution, using Anycast and the distributed nature of their DNS presence to enable a key component of what we were trying to achieve: steering users to the geographically closest origin point. The traffic balancing rules can be very flexible. For example:


- send 70% of US East Coast traffic to origin point A, 30% to a CDN
- send 50% of West Coast traffic to origin point B, 25% to C and rest to a CDN
- send everything in EU to origin point D
- send everything in Asia to a CDN
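
Rules like these amount to a weighted choice per region. Here is a minimal sketch of that idea, not Dyn's actual configuration format or algorithm: the region keys, the target labels and the `pick_target` helper are all made up for illustration, and the West Coast "rest" is taken to be the remaining 25%.

```python
import random

# Hypothetical rule table mirroring the example rules above.
# Region names and target labels are illustrative only.
RULES = {
    "us-east": [("origin-A", 70), ("cdn", 30)],
    "us-west": [("origin-B", 50), ("origin-C", 25), ("cdn", 25)],
    "eu": [("origin-D", 100)],
    "asia": [("cdn", 100)],
}


def pick_target(region, rng=random):
    """Pick a target for one request according to the region's weights."""
    targets, weights = zip(*RULES[region])
    # random.choices draws one item with probability proportional to its weight.
    return rng.choices(targets, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    sample = [pick_target("us-west", rng) for _ in range(10_000)]
    for target in ("origin-B", "origin-C", "cdn"):
        print(target, sample.count(target) / len(sample))
```

Over many requests the observed shares converge on the configured percentages, while any single request simply lands on one target.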

Dyn also offers a set of value-add services, such as automatic monitoring, failover, alerting and traffic reporting, via a flexible and easy-to-use web portal.


I will now provide a non-scientific explanation of how Anycast works. In principle, to direct a user to the geographically closest origin point, one has to have an idea of the user's location. The traditional way of doing that requires some form of database mapping IP addresses to locations. Such databases are widely available and are used in all sorts of products, including geo-targeted ads.

A very different, albeit less granular, way to accomplish the same thing is to use Internet routing (BGP) to advertise routes to the same IP addresses from multiple points of presence. For example, let's imagine one has 4 DNS clusters, each containing 4 nodes with the IPs 1.1.1.1, 1.1.1.2, 1.1.1.3 and 1.1.1.4.

Each cluster is positioned at a major peering point: one on the US East Coast, one on the West Coast, one in the EU and one in Asia. From each location, one advertises, via BGP, the same subnet with the DNS servers on it. Mission accomplished! Through the magic of routing, users in Asia will have their DNS requests arrive at the DNS servers in Asia, users in the EU at the servers in the EU, and so on. It is easy to see how this implied knowledge of the requestor's geo location can now be used to direct their traffic in a location-specific way. We use a low DNS TTL, so that any DNS changes take effect within a reasonably short time.

Put a different way: when a user wants to visit www.cnbc.com, his or her browser requests DNS resolution for www.cnbc.com. The DNS request naturally flows to the closest Dyn DNS cluster. The DNS servers in that cluster have implied awareness of their own location. From that, a DNS server infers that the requests it receives come from users in the same geo area and, based on that and the set of rules we configure, it directs the requesting user to the proper origin point for www.cnbc.com.
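
To make the "implied awareness" concrete, here is a rough sketch of the lookup as I picture it. It is not Dyn's implementation: the cluster names, the region-to-origin table, the 30-second TTL and the `resolve` function are assumptions, and the addresses come from the ranges reserved for documentation.

```python
# Each anycast DNS cluster knows only where it sits. Because BGP delivers a
# query to the nearest cluster, the cluster's own region stands in for the
# user's region; no IP-to-location database is consulted.
CLUSTER_REGION = {  # hypothetical cluster names
    "dns-nyc": "us-east",
    "dns-sjc": "us-west",
}

ORIGIN_FOR_REGION = {  # documentation-range addresses, not real origins
    "us-east": "192.0.2.10",
    "us-west": "198.51.100.10",
}

FALLBACK_ADDRESS = "203.0.113.10"  # stands in for "send it to the CDN"
LOW_TTL_SECONDS = 30  # assumed value; a low TTL lets rule changes spread fast


def resolve(hostname, receiving_cluster):
    """Return (address, ttl) for a query that arrived at the given cluster."""
    # hostname is unused here; a real server would look up per-zone rules.
    region = CLUSTER_REGION.get(receiving_cluster)
    address = ORIGIN_FOR_REGION.get(region, FALLBACK_ADDRESS)
    return address, LOW_TTL_SECONDS


if __name__ == "__main__":
    print(resolve("www.cnbc.com", "dns-nyc"))
    print(resolve("www.cnbc.com", "dns-sjc"))
    print(resolve("www.cnbc.com", "dns-unknown"))
```

Note that the answer depends on which cluster received the query, not on the client's IP address, which is why this approach is less granular than a database lookup.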


For origin points, we chose our own datacenters, each with multiple gigabits of egress capacity, on the East and West Coasts of the US.

To actually deliver the traffic, the decision was simple: we went with the aiScaler caching engines that we have been using for close to a year on other projects. Just 4 common 1RU blade servers, 2 at each location, are all we needed to deliver all of the traffic to our US user base. The latest iteration of the aiScaler product, v6, has been tested to in excess of 250,000 RPS per common HP DL360 server. It runs on any Linux distro, including our regular RedHat 5.x. In our case, requests to www.cnbc.com peak at over 3000 RPS, so we have a lot of excess capacity for any possible traffic spikes.
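
For a feel of how much excess capacity that is, here is the back-of-the-envelope arithmetic using the round numbers above. Both figures are approximate ("in excess of" and "over"), and the tested RPS number comes from a benchmark rather than our production traffic mix, so treat the ratios as rough.

```python
TESTED_RPS_PER_SERVER = 250_000  # vendor-tested figure quoted above
PEAK_SITE_RPS = 3_000            # approximate peak for www.cnbc.com
SERVERS_TOTAL = 4                # 2 at each of the 2 locations


def headroom(servers):
    """Ratio of tested capacity to peak load for a given server count."""
    return servers * TESTED_RPS_PER_SERVER / PEAK_SITE_RPS


if __name__ == "__main__":
    print(f"all {SERVERS_TOTAL} servers: {headroom(SERVERS_TOTAL):.0f}x peak")
    print(f"one location (2 servers): {headroom(2):.0f}x peak")
    print(f"a single server: {headroom(1):.0f}x peak")
```

Even with one whole location out of service, the remaining pair would sit at well over a hundred times the observed peak by these numbers.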

Here are our results so far:

- we were able to shave about 1 sec (about 30%!) off page load times, as reported by Keynote
- the load on our CMS infrastructure has dropped by more than 80%, which is sure to have a positive impact on the overall stability of the CMS environment
- we now have a complete, real-time view of traffic, down to RPS, response time, number of connections, cached responses etc.
- we can now report, chart (Zabbix) and alert on any of the traffic parameters, all in real time!
- our CDN traffic has seen about an 80% reduction as well, complete with an 80% reduction in CDN fees
- we're now better utilizing our own datacenters' capacity
- we now have the ability to change our caching rules or load distribution instantaneously
- we catch and redirect mobile clients right at aiScaler, sparing the CMS servers unnecessary hits
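
The mobile redirect in the last item comes down to a check made at the caching tier before anything reaches the CMS. The sketch below shows the general idea only. It is not aiScaler configuration or our actual rule set: the User-Agent tokens, the `m.example.com` host and the `route` function are hypothetical.

```python
# Hypothetical substrings that mark a mobile browser; a real list is longer.
MOBILE_TOKENS = ("iphone", "android", "blackberry", "windows ce", "opera mini")
MOBILE_HOST = "m.example.com"  # placeholder for the mobile site


def route(user_agent, path):
    """Return (status, location) for a request seen at the caching tier."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in MOBILE_TOKENS):
        # Redirect at the edge; the CMS never sees this request.
        return 302, f"http://{MOBILE_HOST}{path}"
    # Not mobile: serve normally (from cache where possible).
    return 200, None


if __name__ == "__main__":
    print(route("Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1 like Mac OS X)", "/id/1"))
    print(route("Mozilla/5.0 (Windows NT 6.1)", "/id/1"))
```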


We still have a CDN configured as an overflow/safety valve, just in case.

Summary: Dyn's dynamic, geo-aware DNS load balancing solution and aiScaler's proven caching software have enabled a top-tier financial news website to shave 30% off response time, save money, and put in place a better, real-time monitoring, reporting and alerting setup.

A bit about myself: an EE major, I've been working with the Internet since 1992, from an ISP node in Russia back in the days of UUCP over 2400b MNP-5 modems to running some of the world's busier sites.


And lastly: the above doesn't constitute, in any way, shape or form, an endorsement of the mentioned products, vendors and/or solutions by CNBC, NBC, GE or any of their subsidiaries. It is merely a sharing of my experience, for the benefit of the Internet community at large.

Rashid Karimov. Platform CNBC.com

Reader Comments (7)

Thank you for sharing, Rashid.

February 9, 2010 | Unregistered CommenterEric

Is ai cache primarily being used for small objects to reduce traffic on the Akamai CDN, or is it also being used to store and deliver video?

February 24, 2010 | Unregistered CommenterJim

Rashid,

nice article, txs.

What was the impact of the 30% drop in page load time on business KPIs?
Did time on site go up? Did traffic go up? Did (ad) revenues go up?

February 25, 2010 | Unregistered CommenterAaron Peters

Hi Rashid,

Thanks for your article, though it is not clear to me where the benefits came from:

was it the caching mechanism, or was it the decreasing CDN traffic in favour of your own datacenters?

March 1, 2010 | Unregistered CommenterPeter

Folx,

to answer some of your questions:

- we cache all content except video. Video is delivered from a number of sources/CDNs, depending on format (VOD vs Live streaming vs Mobile video). Some CDNs have a really long delay from input to screen, to the point where we cannot possibly use them for Live streaming. Mobile streaming is still more of an art than a science.

- on benefits: I was pleasantly surprised to see this turn into one of those rare IT projects where you hit a whole _number_ of objectives. The site is faster, we have more control (real time reporting, instant changes, automatic alerting) and visibility, it costs less to run, we reduced the load on the CMS to where the CMS is much more stable, and we're now shielded from CMS outages.

March 24, 2010 | Registered CommenterRashid Karimov

A very broad geo targeted solution aimed at increasing internal efficiency. But what happens when one needs more specificity in delivering localized content, i.e. on a country or city level basis? Sure, at city and country levels the performance increase alone would not justify the use of a local presence, but what about for marketing purposes? Is there any advantage to having a localized copy of content? Our studies have shown that there is. Geo Cloud Hosting is a solution designed for geo specific load balancing that combines the intelligence of anycast and BGP with data mapped DNS based request routing to ensure a formidable web presence in one's desired geo specific target markets. The effects can be anywhere in between amazing and unimpressive.
Thank you for sharing Rashid. Paka

October 11, 2010 | Unregistered CommenterTechnogenics

Did you consider other vendors, e.g. Akamai? Were there clear advantages with your current vendors that you can share?

October 13, 2010 | Unregistered CommenterJeff Dalton
