Monday, February 16, 2009

Handle 1 Billion Events Per Day Using a Memory Grid

Moshe Kaplan of RockeTier shows the life cycle of an affiliate marketing system that starts off as a cub handling one million events per day and ends up a lion handling 200 million to even one billion events per day. The resulting system uses ten commodity servers at a cost of $35,000.

Mr. Kaplan's paper is especially interesting because it documents a system architecture evolution we may see a lot more of in the future: database centric --> cache centric --> memory grid.

As scaling and performance requirements for complicated operations increase, leaving the entire system in memory starts to make a great deal of sense. Why use cache at all? Why shouldn't your system be all in memory from the start?

General Approach to Evolving the System to Scale

  • Analyze the system architecture and the main business processes. Detect the main hardware bottlenecks and the related business process causing them. Focus efforts on points of greatest return.
  • Rate the bottlenecks by importance and provide immediate, practical recommendations to improve performance.
  • Implement the recommendations to provide immediate relief. Risk is reduced by avoiding a full rewrite and by not spending a fortune on more resources.
  • Plan a road map toward the next-generation solution.
  • Scale up and scale out when redesign is necessary.

    One Million Event Per Day System

  • The events are common advertising system operations like: ad impressions, clicks, and sales.
  • Typical two tier system. Impressions and banner sales are written directly to the database (a minimal sketch of this direct-write path follows the list).
  • The immediate goal was to process 2.5 million events per day, so something needed to be done.
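
A minimal sketch of that direct-write path in Java (illustrative only: the original system was PHP-based, and the table and column names here are assumptions, not from the paper):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Illustrative baseline: one synchronous database write per ad event.
// At a few million events/day this works; beyond that the database
// becomes the bottleneck, as the later sections describe.
public class DirectWriteHandler {
    private final Connection db;

    public DirectWriteHandler(String jdbcUrl) throws Exception {
        this.db = DriverManager.getConnection(jdbcUrl);
    }

    // Called once per impression, click, or sale.
    public void recordEvent(long bannerId, String eventType) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO ad_events (banner_id, event_type, created_at) VALUES (?, ?, NOW())")) {
            ps.setLong(1, bannerId);
            ps.setString(2, eventType);
            ps.executeUpdate(); // one database round trip per event
        }
    }
}
```

Every request pays a full database round trip, which is exactly the cost the later stages remove.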

    2.5 Million Event Per Day System

  • PerfMon was used to check web server and DB performance counters. CPU usage hit 100% at peak load.
  • Immediate fixes included: tuning SQL queries, implementing stored procedures, using a PHP compiler, removing include files, and fixing other programming errors (one of these fixes is sketched after this list).
  • The changes successfully doubled the performance of the system within 3 months. The next goal was to handle 20 million events per day.
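
One of those immediate fixes, sketched under the same illustrative schema: the ad hoc per-request SQL is replaced by a call to a parameterized stored procedure (the procedure name record_ad_event is hypothetical, not from the paper), so the database reuses one compiled plan instead of parsing fresh SQL on every hit.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// Hypothetical sketch of the "implement stored procedures" fix:
// the insert logic lives in the database as record_ad_event(banner_id, event_type),
// and the application only issues a parameterized call.
public class TunedWriteHandler {
    private final Connection db;

    public TunedWriteHandler(String jdbcUrl) throws Exception {
        this.db = DriverManager.getConnection(jdbcUrl);
    }

    public void recordEvent(long bannerId, String eventType) throws Exception {
        try (CallableStatement cs = db.prepareCall("{call record_ad_event(?, ?)}")) {
            cs.setLong(1, bannerId);
            cs.setString(2, eventType);
            cs.execute();
        }
    }
}
```

Together with query tuning and compiling the PHP, fixes of this kind are what bought the 2x headroom without touching the architecture.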

    20 Million Event Per Day System

  • To make this scaling leap, a rethinking of how the system worked was in order.
  • The main load of the system was validating inputs in order to prevent forgery.
  • A cache was maintained in the application servers to cut unnecessary database access. The result was 50% reduction in CPU utilization.
  • An in-memory database was used to accumulate transactions over time (impression counting, clicks, sales recording).
  • A periodic process was used to write transactions from the in-memory database to the database server (a minimal sketch of this accumulate-and-flush pattern follows the list).
  • This architecture could handle 20 million events per day using existing hardware.
  • Business projections required a system that could handle 200 million events per day.
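
A minimal sketch of the accumulate-and-flush part of that design, assuming a plain in-process map standing in for the in-memory database (the class, table names, and flush interval are illustrative, not from the paper):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch: counters accumulate in memory per (banner, event type),
// and a periodic job writes the aggregates to the database, so the hot path
// never touches the database directly.
public class EventAccumulator {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
    private final Connection db;

    public EventAccumulator(String jdbcUrl, long flushSeconds) throws Exception {
        this.db = DriverManager.getConnection(jdbcUrl);
        flusher.scheduleAtFixedRate(this::flush, flushSeconds, flushSeconds, TimeUnit.SECONDS);
    }

    // Hot path: no database access, just an in-memory increment.
    public void recordEvent(long bannerId, String eventType) {
        counters.computeIfAbsent(bannerId + ":" + eventType, k -> new LongAdder()).increment();
    }

    // Periodic flush: one batched write per interval instead of one write per event.
    private void flush() {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO ad_event_totals (banner_id, event_type, cnt) VALUES (?, ?, ?)")) {
            for (Map.Entry<String, LongAdder> e : counters.entrySet()) {
                long count = e.getValue().sumThenReset();
                if (count == 0) continue;
                String[] key = e.getKey().split(":");
                ps.setLong(1, Long.parseLong(key[0]));
                ps.setString(2, key[1]);
                ps.setLong(3, count);
                ps.addBatch();
            }
            ps.executeBatch();
        } catch (Exception ex) {
            ex.printStackTrace(); // in a real system: retry / alert
        }
    }
}
```

The hot path never touches the database; the scheduled flush turns millions of per-event writes into a handful of batched aggregate writes per interval.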

    200 Million Event Per Day System

  • The next architectural evolution was to a scale-out grid product. It's not mentioned in the paper but I think GigaSpaces was used.
  • A Layer 7 load balancer is used to route requests to sharded application servers. Each app server supports a different set of banners.
  • Data is still stored in the database as the data is used for statistics, reports, billing, fraud detection and so on.
  • Latency was slashed because logic was separated out of the HTTP request/response loop into a separate process, and database persistence was done offline (a routing sketch follows the list).
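
A minimal sketch of the banner-based sharding idea, assuming the Layer 7 balancer (or an equivalent front end) keys on the banner id in the request; the modulo mapping here is an assumption, since the paper does not describe the exact routing rule:

```java
import java.util.List;

// Illustrative shard router: requests for a given banner always land on the
// same application server, so each server only caches and validates its own
// subset of banners.
public class BannerShardRouter {
    private final List<String> appServers; // e.g. ["10.0.0.11:8080", "10.0.0.12:8080", ...]

    public BannerShardRouter(List<String> appServers) {
        this.appServers = appServers;
    }

    public String serverFor(long bannerId) {
        int shard = (int) Math.floorMod(bannerId, (long) appServers.size());
        return appServers.get(shard);
    }
}
```

Because a given banner always lands on the same application server, each server's in-memory state only has to cover its own slice of the banner population, which is what makes near-linear scale-out plausible.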

    At this point the architecture supports near-linear scaling, and it's projected that it can easily scale to a billion events per day.

    Related Articles

  • GridGain: One Compute Grid, Many Data Grids

    Reader Comments (17)

    Dear Todd,

    Thank you for the great post,
    I'll be glad to receive any reader comments and to clarify any issues here, via my email moshe.kaplan @ rocketier.com or on my blog: http://top-performance.blogspot.com/

    Best,
    Moshe

    December 31, 1999 | Unregistered Commenter Moshe Kaplan

    Let me see if I have this right.

    You have a case study that purportedly shows 200 million txns/day. Let's assume we believe that this is true.

    Then you suggest that based on this a 5x increase should be trivial. And therefore you report 1 billion transactions per day because you extrapolated the use case in your head?

    Are you crazy?

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    Wurd. It's never that simple.

    December 31, 1999 | Unregistered Commenter Anonymous

    > Are you crazy?

    Would you really trust my answer?

    December 31, 1999 | Unregistered Commenter Todd Hoff

    > Would you really trust my answer?

    Depends. What's the answer?

    It's not that I think 200 million pages per day or even 1 billion is astronomical. I think these are very achievable numbers with commodity solutions today.

    What I object to is the suggestion that a system is "linear" (or even "near-linear") and therefore should scale to such a degree. If we were talking about 10 txn/s and you said it should handle 20 or even 100/s I might say sure, go ahead, make the leap.

    But the leap from 200 million to 1 billion has me scratching my head. Is there enough bandwidth in the network fabric? Does this account for peak traffic or is it just an average? What other systems are involved in the infrastructure?
    Are there load balancers, databases, middleware, NAS or SAN? How much more hardware would it really require? Are there enough network ports, enough rack space to accommodate all that?

    Making such a claim without fully evaluating the impact a 5x jump of such a magnitude would have does make me disinclined to lend any credibility to the post.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    Dear Taylor,

    I'll try to better answer your questions:
    1. The architecture is fully functional and supports the customer's growth, which is very impressive.

    2. Our focus in working with software companies is detecting the 1% of the code that will deliver the 20x boosting factor. That means choosing the right business processes and employing the right horizontal sharding/partitioning or scale-out mechanism, which results in being able to grow in a semi-linear way.

    3. Regarding the numbers:
    - Impression requests are optimized HTTP requests, meaning each packet is about a few hundred bytes, resulting in 40 Mbps average and 80 Mbps peak network bandwidth consumption for 1 billion impression events per day (and about twice that for the whole system). These numbers are reasonable in today's datacenters and do not require any special network devices (see the quick sanity check after this list).
    - Load balancers like Radware AppDirector support up to 16 Gbps of raw traffic. Layer 7 load balancing is much more resource intensive, but it is still far from reaching these devices' limits.
    - Servers - each commodity server (and we use real commodity servers: 1 quad-core CPU per server with 4 GB RAM) handles about 20 million events per day (the average in a follow-the-sun case) or 10 million in case the business is focused on a defined geographic location. That means going from 200 million to 1 billion is a matter of growing to 100 servers. You may accept that this is trivial in a world where Animoto grows from 50 servers to 3,500 servers in minutes.
    - Regarding how much you can do with GigaSpaces, I recommend taking a look at the linked case study: http://blog.gigaspaces.com/2009/02/09/ultra-scalable-and-blazing-fast-the-sun-fire-x4450-intel-7460-gigaspaces-xap-platform-18-million-operationssec/
    - Regarding databases, smart sharding can enable MySQL (or any other database) to support enormous amounts of data. You just have to design the sharding the right way. We did not cover the OLAP/reporting mechanism in this paper, but you can be sure that we took care of this issue as well.
    - Regarding storage - storage becomes less expensive every day. I would recommend taking a look at the XIV solution that was purchased not long ago by IBM.

    4. Last but not least, you are absolutely right that going from 10 events per second to 100, 1,000 and 25K is not trivial, but otherwise this case study would not have been worth publishing, and nobody should read or download it. That is what we do for a living, and the results are pretty impressive so far.
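
    For reference, a quick sanity check of the figures in point 3 (the ~400 bytes per request and the 2x peak factor are assumptions consistent with the text above, not measured values):

```java
// Back-of-the-envelope check of the 1-billion-events/day figures above.
public class CapacityCheck {
    public static void main(String[] args) {
        long eventsPerDay = 1_000_000_000L;
        long bytesPerRequest = 400;                                          // "a few hundred bytes" (assumed)
        double requestsPerSec = eventsPerDay / 86_400.0;                     // ~11,600 requests/sec
        double avgMbps = requestsPerSec * bytesPerRequest * 8 / 1_000_000.0; // ~37 Mbps average
        double peakMbps = 2 * avgMbps;                                       // ~74 Mbps at an assumed 2x peak
        long perServerPerDay = 10_000_000L;                                  // conservative single-region figure
        long servers = eventsPerDay / perServerPerDay;                       // 100 servers
        System.out.printf("%.0f req/s, %.0f Mbps avg, %.0f Mbps peak, %d servers%n",
                requestsPerSec, avgMbps, peakMbps, servers);
    }
}
```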

    I'll be glad to answer any other comment or issue,

    Best Regards and have a nice weekend,
    Moshe Kaplan. RockeTier (http://www.rocketier.com). The Performance Experts.
    moshe.kaplan at rocketier.com

    December 31, 1999 | Unregistered Commenter Moshe Kaplan

    Right. In other words, in theory the system should handle 1 billion, but you are (based on the claims) currently doing 200 million.

    I object only to the sensationalist title and conclusion of the article - that 1 billion is easily extrapolated from 200 million. Even given your explanation, without actually testing it and deploying it, I think it's suspect and intentionally deceptive to make such claims.

    If it can do 1 billion then by all means test it and then report back. Until then, don't make outlandish claims. That's all I am saying.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    Dear Taylor,

    I hope that you and Terracotta are doing well these days, and I hope that you'll provide similar case studies, or even better ones, so we will all be able to learn from your methodologies and capabilities.

    I'll keep you updated when we reach 1 billion events per day in production.

    You may find several Q&As that I received in my personal mail on my blog: http://top-performance.blogspot.com/2009/02/case-study-handle-1-billion-events-per.html

    Best,
    Moshe Kaplan. RockeTier (http://www.rocketier.com). The Performance Experts
    moshe.kaplan at rocketier.com

    December 31, 1999 | Unregistered Commenter Moshe Kaplan

    Sounds good. Please do report back here when you see 1 Billion.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    Just for your reference, there is already a benchmark report that shows how you can generate more than 1 billion (~1.4B) pages a day, published recently on our GigaSpaces blog: http://blog.gigaspaces.com/2009/02/09/ultra-scalable-and-blazing-fast-the-sun-fire-x4450-intel-7460-gigaspaces-xap-platform-18-million-operationssec/

    In this benchmark we were using the standard PetClinic application backed with GigaSpaces as the data grid and MySQL as the database. The benchmark ran on Sun Fire X4450 multi-core servers using Intel Xeon 7460 processors (4 CPUs with six cores each) with GigaSpaces XAP.

    December 31, 1999 | Unregistered Commenter natis

    Nati,

    With all due respect, my objection is to the use of extrapolated numbers. Your Petstore benchmark admits the numbers are EXTRAPOLATED.

    It also fails to indicate how many page requests generate a read only page view vs. a page view that generates a session write.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    Hi Taylor

    If you call the conversion of pages/sec to pages/day extrapolation, then we may have a different view of the definition of extrapolation. The only thing that was extrapolated in this benchmark is the number of concurrent users, not the pages/sec.

    The pages that were generated were dynamic pages that included the actual page generation plus a data query from a remote clustered space. The query involved data aggregation using a map/reduce pattern across data-grid partitions. We didn't use any optimization like local-cache (http://www.gigaspaces.com/wiki/display/XAP7/Local+Cache+and+Local+View+Components), which would obviously speed up the results quite significantly.

    If you look at the results you will notice that we got 45,000 writes/sec with a single write operation per writer. Each write went through synchronous replication to a backup (which ends up as two network calls per write). The test ran with 30 concurrent clients. I expect that with batch writes (writeMultiple) the result would have been even higher. The single-write remote benchmark translates into 3,888,000,000 writes/day, that is roughly 3.9 billion writes/day, with only two servers involved.

    Based on those results I expect that there would be no significant difference between the read and read/write scenarios, as in our case both reads and writes are done purely in memory. All writes to disk are done asynchronously.

    HTH
    Nati S.

    December 31, 1999 | Unregistered Commenter natis

    Hi Nati.

    Well, I do consider measuring pages/sec and then reporting pages/day misleading. I am sure you will agree that things can go wrong with Java when stressed for long periods of time - garbage collection etc. So while I don't consider it particularly egregious, it's better just to keep the measurements at what you measured, rather than make claims about what is possible given the measurement.

    A good (non-technical) example would be an auto manufacturer that, let's say, is able to measure the horsepower of its new engine. It can also measure the weight of the vehicle. Would it be justifiable to then take those measurements and compute what the 0-60 or 1/4 mile times *should* be? Of course not. They have to run the 0-60 and 1/4 mile tests, since there are all kinds of other variables to consider in the real world.

    As the saying goes, "In theory there is no difference between theory and practice. But, in practice, there is."

    Back to your test. I'm sorry if I don't understand the benchmark you ran. Is there code somewhere that I can try? If so, that would be great in helping me grok it a little more.

    In the meanwhile, I don't understand what you extrapolated. So let's simplify things.

    Can you just list the exact measurements for the test you claim gets 1.4B pages/day? If I understand correctly, that's the Petstore benchmark listed at 16K pages/sec. Please focus on that and don't bring all the other numbers into the discussion.

    So can you tell us how many concurrent users, how many servers, how many threads, and how many pages were generated? And how many of those pages were read requests vs. write requests (e.g. updating the user's session object vs. displaying their session or displaying a catalog page)?

    Btw, what did you use to generate the HTTP Load?

    Thanks.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    We do have a different view of the meaning of extrapolation - see the Wikipedia definition (http://en.wikipedia.org/wiki/Extrapolation):

    "In mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points"

    In our case we use conversion of the units of measurement for the same quantity of data, rather than extrapolation. When you measure speed in km/hour or meters/hour it doesn't mean that you actually drove for an hour and measured the speed. Of course, over the course of an hour you might hit traffic lights and bumps and the speed would be different, but if you try to measure speed that way your measurement also becomes a function of the road, not the car engine, which leads to incomparable results. The two (extrapolation and conversion) are very distinct and different. When you use the term "extrapolation" it hints that someone "invented" numbers based on some theoretical assumptions. I suggest that we both be more careful when we use that term.

    Having said that, I don't want to get into this theoretical argument about the units of measurement; if you want to stick to pages/sec I'm fine with it.

    See below the data points you asked for.

    - How many concurrent users, how many threads - 100 threads in total

    - How many pages generated - Each thread generated a new page every ~6 ms, which means roughly 160 pages per sec per thread, or 16,000 pages per sec in total (the conversion to pages/day is spelled out at the end of this comment).

    - How many of those pages were read requests vs. write requests - as I mentioned earlier, all 16,000 were reads running an aggregated query over the entire set of data partitions. We had a separate test that measured write-only operations, which I mentioned earlier. In our case both reads and writes go to memory, i.e. the database is completely decoupled from the application, so I don't expect a real difference here, but again, we didn't measure that exact scenario.

    Notes: In our case we consumed the entire client CPU, but not the server CPU or that of the servers running the spaces, which means that by running more client machines we could get an even higher number. If we had used key-based queries instead of aggregation, I expect we would have seen better results. We also used the Apache soft load balancer and not a high-end hardware load balancer. The purpose of the test was not to measure our scalability limits on all fronts but to get a good estimate of how much you can achieve with a relatively small cluster and multi-core hardware.

    I think we're missing the main points that come from those results:

    1. You can use fairly commodity servers and a relatively simple architecture to deal with high load.
    2. The bottleneck is not really the data layer anymore; i.e. there is a fixed overhead for accessing the data, but it is not really a point of concurrency contention anymore.
    3. Latency is mostly a function of the network (~100 ms over WAN), not the time it takes to access the data (>1 ms, <10 ms).

    You can scale easily by increasing the number of data partitions and the number of web servers. I assume that the next bottleneck would be the load balancer itself.
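
    For completeness, the unit conversion under discussion is nothing more than multiplying by the number of seconds in a day (a tiny Java check, not part of the benchmark itself):

```java
// pages/sec and writes/sec converted to per-day figures (86,400 seconds/day).
public class PerDayConversion {
    public static void main(String[] args) {
        long secondsPerDay = 86_400;
        long pagesPerSec = 16_000;     // measured page rate from the benchmark
        long writesPerSec = 45_000;    // measured single-write rate from the benchmark
        System.out.println(pagesPerSec * secondsPerDay);   // 1,382,400,000 (~1.4 billion pages/day)
        System.out.println(writesPerSec * secondsPerDay);  // 3,888,000,000 (~3.9 billion writes/day)
    }
}
```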

    HTH
    Nati S. (http://natishalom.typepad.com)

    December 31, 1999 | Unregistered Commenter natis

    Nati,

    Thanks for clearing that up. I don't agree with you that there won't be any difference between read and write. I am sure for the constrained test you have run here it may not make any difference, but in any real world app this is not a distinction to throw away lightly.

    December 31, 1999 | Unregistered Commenter Taylor Gautier

    A single dual quad-core server can nearly handle that. One dual quad-core server with 8 GB of RAM can easily handle 5,000 hits/sec with a typical PHP-based website using Varnish, Lighttpd & MySQL. Let's do the math:

    5,000 per sec * 60 seconds = 300,000 per minute.
    300,000 per minute * 60 minutes = 18,000,000 per hour.
    18,000,000 * 24 hours = 432,000,000 per day.

    If you are struggling to support a billion hits then you need a redesign. 4 commodity servers can handle that easily.

    December 31, 1999 | Unregistered Commenter Enzo

    Hi Enzo,

    You are absolutely right; we never designed this system to run on a single server. This system was designed from day 1 to work using modest servers (200-500 events per second per server is absolutely enough for the most basic servers: 1 CPU/1 GB RAM). This way you can easily implement this kind of system in the cloud (Amazon AWS, for example) and gain effectively unlimited growth without investing a single dollar in the infrastructure.

    Keep Performing,
    Moshe Kaplan. RockeTier. The Performance Experts.
    moshe.kaplan at rocketier.com | blog: http://top-performance.blogspot.com | Web: http://www.rocketier.com | Meet us at Cloud Slam 09: http://cloudslam09.com

    December 31, 1999 | Unregistered Commenter Moshe Kaplan
