Tuesday
Feb 12, 2008
We want to cache a lot :) How do we go about it?

Our application depends heavily on our SQL databases, and we have heard that caching helps a lot with scaling and performance.
So the question is: what reliable software products could we consider in this space? We want to put frequently made database calls whose results rarely change into this caching layer.
Also, what would be an easy way to move only the database changes into the cache, as opposed to reloading the whole cache every few minutes or hours?
We need something smart that pushes changes to the caching layer as they happen. I guess we could build our own, but are there any good, reliable products out there? Please also mention pricing, because that will be a determining factor as well.
Thanks
Reader Comments (4)
Dear Anonymous? user,
It all really depends on your app. AFAIK there is no magic app that can sit between your app and your DB and make things faster.
Even if you find such a magic app or appliance, don't go that way. You might not understand what happened when it suddenly breaks.
With caching, you have to think in terms of objects and content.
Content caching is quite simple: you pull the blob from the DB, put it on the filesystem and serve it from there.
The best performance gain usually comes from putting caching into your app itself.
Perhaps you start by caching small parts, then bigger parts, and then the final result.
For example:
Say you have some code like $user = new User(id);
This code might make a gazillion DB calls (perhaps find the basic user account info, then the address info, then the profile info, then the preferences, then the rights, then the subscriptions, etc.). Of course, it is a very bad thing if this one call is doing all that.
So if you serialize the result, store it on disk or in memory, and load it from there next time, you save those gazillion calls.
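To make the idea concrete, here is a minimal Python sketch of serializing a composed user record and serving it from memory afterwards. Everything in it is an assumption for illustration: `load_user_from_db` is a hypothetical stand-in for the many queries behind `new User(id)`, the cache is a plain in-process dict, and `STATS` only exists to count the pretend database calls.

```python
import pickle

# Counts the pretend database calls (illustration only).
STATS = {"queries": 0}

# Hypothetical parts of a user record, one "query" each.
USER_PARTS = ("account", "address", "profile",
              "preferences", "rights", "subscriptions")


def load_user_from_db(user_id):
    """Stand-in for the expensive path: one database call per part."""
    user = {"id": user_id}
    for part in USER_PARTS:
        STATS["queries"] += 1
        user[part] = f"{part} of user {user_id}"
    return user


# In-memory store: cache key -> serialized user record.
_cache = {}


def get_user(user_id):
    """Return the user, touching the database only on a cache miss."""
    key = f"user:{user_id}"
    blob = _cache.get(key)
    if blob is not None:
        return pickle.loads(blob)       # cache hit: no queries at all
    user = load_user_from_db(user_id)   # cache miss: pay for the queries once
    _cache[key] = pickle.dumps(user)
    return user
```

Calling `get_user(7)` twice runs the six pretend queries only once; the second call is answered from the serialized copy. Nothing here invalidates the entry yet, which is the harder part.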
Then again, it might not be so easy :-) You will have to invalidate the cache when an event happens, for example when the user changes a preference, gets a new mail, or has a new bill generated, etc.
So you might have to put the cache at deeper levels,
for example in the part which gets and sets preferences.
function getPrefs(uid) {
    if /on/disk/prefs/uid exists
        return file
    else
        go and make gazillion queries
        save the result to /on/disk/prefs/uid
        return file
}
function setPrefs(uid, name, value) {
    delete /on/disk/prefs/uid
    set the pref
}
etc, etc.
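The getPrefs/setPrefs pseudocode above could look roughly like this in runnable form. This is only a sketch under stated assumptions: a temporary directory stands in for /on/disk/prefs, the `_DB` dict and `_query_prefs` are hypothetical stand-ins for the real database and its queries, and JSON is an arbitrary choice of file format.

```python
import json
import tempfile
from pathlib import Path

# Temporary directory standing in for /on/disk/prefs (assumption).
CACHE_DIR = Path(tempfile.mkdtemp(prefix="prefs-cache-"))

# Hypothetical stand-in for the real database: uid -> {name: value}.
_DB = {}
DB_READS = {"count": 0}  # counts the pretend expensive reads


def _cache_path(uid):
    return CACHE_DIR / f"{uid}.json"


def _query_prefs(uid):
    """Stand-in for the 'gazillion queries' that assemble the preferences."""
    DB_READS["count"] += 1
    return dict(_DB.get(uid, {}))


def get_prefs(uid):
    """Serve preferences from the cache file, filling it on a miss."""
    path = _cache_path(uid)
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        pass  # cache miss: fall through to the database
    prefs = _query_prefs(uid)
    path.write_text(json.dumps(prefs))
    return prefs


def set_prefs(uid, name, value):
    """Update the database, then drop the cached copy."""
    _DB.setdefault(uid, {})[name] = value
    _cache_path(uid).unlink(missing_ok=True)  # requires Python 3.8+
```

One deliberate difference from the pseudocode: this sketch writes to the database first and deletes the cache file afterwards, so that a reader arriving between the two steps is less likely to re-cache the old value. Under heavy concurrency neither order is fully safe without locking.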
best regards
Atif
What about products like Gigaspace and Tangosol? Are they worth considering?
I'm opposed to things like Gigaspace... I think rolling your own with Memcached is better.
I agree that memcached with an approach like Atif's is the best way to go for caching.
The key advantage is that the memcached process runs outside of the Java VM, so you don't use up precious Java VM memory and you avoid a lot of garbage collection.
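As a rough illustration of how application code talks to an out-of-process cache, here is a Python sketch of the usual get-or-load pattern. It assumes nothing about any particular client library: `FakeMemcache` is a hypothetical in-process stand-in with `get`, `set` and `delete` methods, used only so the example runs without a server, and `cached_query` and its `ttl` parameter are names invented for this sketch. With a real memcached client the stored values would live in the memcached process rather than in the application's heap.

```python
import time


class FakeMemcache:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # entry has expired
            return None
        return value

    def set(self, key, value, expire=0):
        # expire=0 means "never expire" in this sketch.
        expires_at = time.monotonic() + expire if expire else None
        self._data[key] = (value, expires_at)

    def delete(self, key):
        self._data.pop(key, None)


def cached_query(cache, key, run_query, ttl=300):
    """Return the cached result for key, running the query only on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, expire=ttl)
    return value
```

A caller would pass a function that performs the actual database query as `run_query`, and call `cache.delete(key)` whenever the underlying rows change.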
On 32-bit Windows, the Java virtual machine can allocate a maximum of about 1.5GB of memory, and even that much is a burden on the GC. If I have more memory, I usually split the website into several smaller instances.
On the other hand, if you need clustering instead of caching, then options that cluster at the VM level are much more interesting. Personally, I'm in love with terracotta.org: it seems easier to use than its competitors, and according to them it performs and scales very well. A fairly simple Terracotta setup will let you share an object structure such as a hashmap across many different instances and servers. That way you may be able to avoid some of the more complex clustering setups I've seen.
The drawback with Terracotta is that it lives within the VM, so I'm not sure it's good for caching large amounts of data. For a small number of shared objects (maybe fewer than 100,000) it's great, but for larger ones I'd go with memcached.