Strategy: Break Up the Memcache Dog Pile
 Friday, August 7, 2009 at 1:44AM
Friday, August 7, 2009 at 1:44AM Update: Asynchronous HTTP cache validations. A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience.
Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache.
Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. Or maybe the amount of data you have is simply larger than physical memory on any one machine. Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale. 
Typical examples are for caching article vote counts, comment threads, and event streams. One familiar example that bit me hard is displaying the the top N blog articles. Do you want to scan through your entire access log  table for every page display? Absolutely not. Especially when the nightly backups are going on and the network is very slow. Not good :-)  Yet you still want to update the results every X minutes so the stats stay fresh. 
Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and a 1000 requests simultaneously try to recalculate the expensive to calculate data? Database load spikes and the world nearly ends. And since memcached operations are not atomic it's possible stale data could be cached and you'll serve stale data. Which kind of defeats of the purpose of taking load off the data while providing accurate data. So, how do you unpile the dogs?
No Expire Solution
If cache items never expire then there can never be a recalculation storm. Then how do you update the data? Use cron to periodically run the calculation and populate the cache. Take the responsibility for cache maintenance out of the application space. This approach can also be used to pre-warm the the cache so a newly brought up system doesn't peg the database. 
The problem is the solution doesn't always work. Memcached can still evict your cache item when it starts running out of memory. It uses a LRU (least recently used) policy so your cache item may not be around when a program needs it which means it will have to go without, use a local cache, or recalculate. And if we recalculate we still have the same piling on issues.
This approach also doesn't work well for item specific caching. It works for globally calculated items like top N posts, but it doesn't really make sense to periodically cache items for user data when the user isn't even active. I suppose you could keep an active list to get around this limitation though.
Stale Date Solution
This solution introduces a stale date in addition to the expiration date. Glen describes it as:
The first client to request data past the stale date is asked to refresh the data,
while subsequent requests are given the stale but not-yet-expired data as if it
were fresh, with the understanding that it will get refreshed in a 'reasonable'
amount of time by that initial request
In the memcached FAQ a one key approach is described:
Alexey describes a different two key approach:
I dislike embedding meta data with data so I like Alexey's approach a bit better, even though it doubles the key space.
None of these options prevent the problem for ever happening, but they do greatly reduce the failure window for relatively little cost.

 
   
   
   
   
   
   
   
   
   
   
   
   
  
Reader Comments (8)
This is an interesting problem, I hadn't thought to do the whole "stale data vs. expired data" thing.
For top N blogs etc. type lists, we regenerate the data with crons, but for item level caching, we regenerate randomly, so we only hit high traffic items for recaching. For your average back-catalog trailer, a 30 minute cache on # of downloads etc. is fine for our media. But it won't hold for newer media for two reasons. New media that are getting lots of downloads are stale data much quicker, and if they're getting that many downloads, you can get dog-piled.
For example, if we have a new GTA4 trailer go up or something that's getting a whole bunch of traffic, a dog-pile on its download count could easily cripple our database. That said, we obviously can't cron recache our whole library. Our solution was that every page runs a rand(1, 1500), and if it hits 1, then we recache all items on that page. The 1500 value isn't significant, its just what we picked - your site's sweet spot may vary. The point is that pages (like the example trailer) that are getting 30,000 views every 15 minutes, has a high(er) refresh rate, but never dog-piles. It would've recached ~20 times, which we can easily handle, rather than 2000 people dog-piling on it when it expires every 30 minutes.
Nothing better about this solution than the stale/expire methods, except that there's no chance for mini-dogpiles during your "i'm recaching this" set. This was also very specific to our implementation, not sure how well it would work in other venues.
> This is an interesting problem
It's one of those cool little problems I love because the cause and the effect seem so improbably out of whack. You notice the database being slammed periodically like clock work and you have to bring in the CSIs to trace the problem back using a cloudy evidence trail. Once you find the problem it's like oh duh!.
I really like the rand solution. It removes round trips, is simple, elegant, and bounded. Thanks.
Alexey describes a different two key approach:
Create two keys in memcached: MAIN key with expiration time a bit higher than normal + a STALE key which expires earlier.
On a get read STALE key too. If the stale has expired, re-calculate and set the stale key again.
Can you explain a bit more about how this combats the dogpile effect? It seems to me that this just means the STALE key will cause the dogpile if all of the visitors are getting the STALE key also and the STALE key is, in effect, determining the actual re-calculation timing.
The problem with these solutions are that they place the onus of refreshing the cache on the user getting that data, causing him to have a bad user experience. You'd actually like to refresh the data just a bit earlier. No idea how, though...
I tackled this problem a little differently, though I have to say I like the stale key approach. For cases where showing cached data past its expiration is unacceptable though (for example, when data needs to be expired, like cached shopping cart contents), I'll describe my solution.
Basically, on a cache get failure, any requesting clients would generate a rand(1,1000) and cache it to a lock key using memcached's add(). Then, each client would read back the lock key value and compare it to the random value they tried to set. Since add() only sets a key if it doesn't already exist, only one client will acquire this lock (the others will read back that client's random value and not their own).
Clients that didn't successfully acquire the lock would sleep for a specified time (few hundred ms) and go back and retry the cache get. If the data still wasn't cached, they'd attempt to acquire the lock again, and repeat until either they successfully read refreshed data from the cache, or got the lock and refreshed themselves.
The lock key would expire quickly in case one client aborted before the refresh was successful, to avoid hanging the other clients.
I've had mixed results with this approach in practice, but it's the best I've seen so far for handling cases where showing stale data is unacceptable.
-Alex
"... on expiry immediately set a time in the future and re-store the data as is..."
Using a CAS operation here then use the memcache result code to determine whether the current thread needs to update the cache.
If the CAS operation is successful, then current thread should update the cache.
If fails with RES_NOTFOUND then attempt an ADD operation, if that succeeds then the current thread should update the cache.
If fails with RES_DATA_EXISTS then some other thread is updating the cache.
This eliminates multiple threads trying to update a stale item in the cache.
The problem is that thread becomes a single point of failure, and if it does fail no other threads can update till either the TTL used on the CAS operation or the embedded expiration expires.
I am not sure if I understand this solution. There are two issues
1. We are assuming cache is preloaded. So there is no race condition.
2. Even with stale key approach. There is get-test-set kind of operation which is done without using any mutex. So isnt the dog pile effect still going to cause when many threads read stale key and see that it is expired?
Rather than using memcache, and then having to deal with it sometimes ditching your keys for LRU reasons, why not use Redis?
You're not using a cache - you're storing precomputed data.
Redis is great as a shared in-memory data structure as it can be trusted not to drop keys unexpectedly (hardware failures permitting!), and has a much richer data structure than memcache. Have your apps read from Redis, and a cron job or some kind of trigger on updates that updates Redis.
It's also great for tracking live site usage stats in real time, too, and other neat things.