Ehcache - A Java Distributed Cache
 Tuesday, July 29, 2008 at 2:21PM
Ehcache is a pure Java cache with the following features: it is fast, simple, has a small footprint and minimal dependencies, provides memory and disk stores for scalability into gigabytes, scales to hundreds of caches, is a pluggable cache for Hibernate, is tuned for high concurrent load on large multi-CPU servers, provides LRU, LFU and FIFO cache eviction policies, and is production tested. Ehcache is used by LinkedIn to cache member profiles. The user guide says it's possible to get a 2.5 times system speedup for persistent Object Relational Caching, a 1000 times system speedup for Web Page Caching, and a 1.6 times system speedup for Web Page Fragment Caching.
From the website:
Introduction
Ehcache is a cache library. Before getting into ehcache, it is worth stepping back and thinking about caching generally.
About Caches
Wiktionary defines a cache as "a store of things that will be required in future, and can be retrieved rapidly". That is the nub of it.
In computer science terms, a cache is a collection of temporary data which either duplicates data located elsewhere or is the result of a computation. Once in the cache, the data can be repeatedly accessed inexpensively.
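As a minimal sketch of that definition, the class below keeps a bounded collection of temporary data in memory and discards the least recently used entry when it is full (LRU is one of the eviction policies listed earlier). It is built on the JDK's `LinkedHashMap` and is not how Ehcache itself is implemented; the class name `LruCacheSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Ehcache's implementation.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true orders entries from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion: evict the least recently used entry
        // once the cache holds more than maxEntries.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" is now least recently used
        cache.put("c", "3"); // exceeds the cap, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

This sketch is not thread-safe and has no disk store or expiry; it only shows the basic idea of a small, fast store in front of slower data.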
Why caching works
Locality of Reference
While ehcache concerns itself with Java objects, caching is used throughout computing, from CPU caches to the DNS system. Why? Because many computer systems exhibit locality of reference. Data that is near other data or has just been used is more likely to be used again.
The Long Tail
Chris Anderson, of Wired Magazine, coined the term The Long Tail to refer to Ecommerce systems. The idea is that a small number of items may make up the bulk of sales, a small number of blogs might get the most hits, and so on. While there is a small list of popular items, there is a long tail of less popular ones.
The Long Tail is itself a vernacular term for a Power Law probability distribution. Power laws don't just appear in ecommerce, but throughout nature. One form of a Power Law distribution is the Pareto distribution, commonly known as the 80:20 rule.
This phenomenon is useful for caching. If 20% of objects are used 80% of the time and a way can be found to reduce the cost of obtaining that 20%, then the system performance will improve.
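To see how a power law can produce something close to the 80:20 rule, the sketch below assumes that the item at popularity rank r is accessed with weight 1/r (a Zipf distribution with exponent 1). That distribution, the item count, and the class name `ZipfShareSketch` are assumptions chosen for illustration, not figures from the Ehcache documentation.

```java
// Hypothetical illustration: how concentrated accesses are under an assumed Zipf distribution.
final class ZipfShareSketch {
    // Fraction of all accesses that go to the most popular `topFraction` of `itemCount` items,
    // assuming the item at rank r is accessed with weight 1/r.
    static double shareOfTopItems(int itemCount, double topFraction) {
        int topCount = (int) Math.round(itemCount * topFraction);
        double topWeight = 0.0;
        double totalWeight = 0.0;
        for (int rank = 1; rank <= itemCount; rank++) {
            double weight = 1.0 / rank;
            totalWeight += weight;
            if (rank <= topCount) {
                topWeight += weight;
            }
        }
        return topWeight / totalWeight;
    }

    public static void main(String[] args) {
        // With 1000 items, the top 20% receive a little under 80% of accesses
        // under this assumed distribution.
        System.out.printf("%.3f%n", shareOfTopItems(1000, 0.20));
    }
}
```

Under this assumption, caching only the most popular fifth of the items would already serve most requests from the cache.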
Will an Application Benefit from Caching?
The short answer is that it often does, due to the effects noted above.
The medium answer is that it often depends on whether it is CPU bound or I/O bound. If an application is I/O bound, then the time taken to complete a computation depends principally on the rate at which data can be obtained. If it is CPU bound, then the time taken principally depends on the speed of the CPU and main memory.
While the focus for caching is on improving performance, it is also worth realizing that it reduces load. The time it takes something to complete is usually related to the expense of it. So, caching often reduces load on scarce resources.
Speeding up CPU bound Applications
CPU bound applications are often sped up by:
* improving algorithm performance
* parallelizing the computations across multiple CPUs (SMP) or multiple machines (Clusters).
* upgrading the CPU speed.
The role of caching, if there is one, is to temporarily store computations that may be reused again.
An example from ehcache would be large web pages that have a high rendering cost. Another is caching of authentication status, where authentication requires cryptographic transforms.
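Storing a computation for reuse can be sketched with the JDK alone. The example below memoizes a stand-in for an expensive page render using `ConcurrentHashMap.computeIfAbsent`; the class name `MemoizerSketch` and the fake render function are hypothetical, and this is not the Ehcache API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration of caching the result of an expensive computation.
final class MemoizerSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    MemoizerSketch(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        // Runs the computation only when the key is not already cached.
        return cache.computeIfAbsent(key, compute);
    }

    public static void main(String[] args) {
        AtomicInteger renders = new AtomicInteger();
        // Stand-in for a page with a high rendering cost.
        MemoizerSketch<String, String> pages = new MemoizerSketch<>(path -> {
            renders.incrementAndGet();
            return "<html>" + path + "</html>";
        });
        pages.get("/home");
        pages.get("/home"); // served from the cache, no second render
        System.out.println(renders.get()); // prints 1
    }
}
```

The sketch never evicts or expires entries, so it only suits results that stay valid and fit in memory; a cache library adds those policies.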
Speeding up I/O bound Applications
Many applications are I/O bound, either by disk or network operations. In the case of databases they can be limited by both.
There is no Moore's law for hard disks. A 10,000 RPM disk was fast 10 years ago and is still fast. Hard disks are speeding up by using their own caching of blocks into memory.
Network operations can be bound by a number of factors:
* time to set up and tear down connections
* latency, or the minimum round trip time
* throughput limits
* marshalling and unmarshalling overhead
The caching of data can often help a lot with I/O bound applications. Some examples of ehcache uses are:
* Data Access Object caching for Hibernate
* Web page caching, for pages generated from databases.
Increased Application Scalability
The flip side of increased performance is increased scalability. Say you have a database which can do 100 expensive queries per second. After that it backs up, and if connections are added to it, it slowly dies.
In this case, caching may be able to reduce the workload required. If caching can cause 90 of that 100 to be cache hits and not even get to the database, then the database can scale 10 times higher than otherwise.
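The arithmetic behind that claim can be sketched as follows, assuming that cache hits place no load on the database and that the cache itself is not a bottleneck. The class and method names are hypothetical.

```java
// Hypothetical illustration of how a cache hit ratio raises the request rate a database can sit behind.
final class OffloadSketch {
    // Requests per second the system can accept before the database saturates,
    // assuming only cache misses reach the database.
    static double sustainableRequestRate(double databaseQueriesPerSecond, double hitRatio) {
        if (hitRatio < 0.0 || hitRatio >= 1.0) {
            throw new IllegalArgumentException("hitRatio must be in [0, 1)");
        }
        return databaseQueriesPerSecond / (1.0 - hitRatio);
    }

    public static void main(String[] args) {
        // 100 queries per second of database capacity with 90 of every 100 requests as cache hits.
        System.out.printf("%.0f%n", sustainableRequestRate(100.0, 0.90)); // prints 1000
    }
}
```

With no hits the rate stays at 100 requests per second; at a 90% hit ratio only 10 of every 100 requests reach the database, which gives the tenfold figure.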
How much will an application speed up with Caching?
The short answer
The short answer is that it depends on a multitude of factors, including:
* how many times a cached piece of data can be and is reused by the application
* the proportion of the response time that is alleviated by caching
In applications that are I/O bound, which is most business applications, most of the response time is getting data from a database. Therefore the speed up mostly depends on how much reuse a piece of data gets.
In a system where each piece of data is used just once, there is no speed up. In a system where data is reused a lot, the speed up is large.
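A rough model of the short answer: if a piece of data is read several times, the first read is a cache miss and the rest are hits. The sketch below computes the resulting speed up for one piece of data. The costs (100 ms for a database read, 1 ms for a cache hit) and the class name `ReuseSpeedupSketch` are assumptions for illustration, and this simple model is not the more detailed treatment the documentation goes on to give.

```java
// Hypothetical illustration: speed up as a function of how often cached data is reused.
final class ReuseSpeedupSketch {
    // Speed up from caching one piece of data that is read `uses` times:
    // the first read is a miss at missCost, the remaining reads are hits at hitCost.
    static double speedup(int uses, double missCost, double hitCost) {
        if (uses < 1) {
            throw new IllegalArgumentException("uses must be at least 1");
        }
        double withoutCache = uses * missCost;
        double withCache = missCost + (uses - 1) * hitCost;
        return withoutCache / withCache;
    }

    public static void main(String[] args) {
        // Assumed costs in milliseconds: 100 for a database read, 1 for a cache hit.
        for (int uses : new int[] {1, 10, 100}) {
            System.out.printf("%d uses: %.2fx%n", uses, speedup(uses, 100.0, 1.0));
        }
        // A single use gives 1.00x, that is, no gain; the factor grows with reuse.
    }
}
```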
The long answer, unfortunately, is complicated and mathematical. It is considered next.
Reader Comments (6)
EHCache is not a distributed cache. It's clearly spelled out in the documentation. Thanks for playing.
Are you sure?
http://ehcache.sourceforge.net/documentation/distributed_caching.html
If you say not, then why not?
This page:
http://ehcache.sourceforge.net/documentation/distributed_design.html
describes how Ehcache is distributed. It can distribute by RMI, JMS, JGroups, etc. It can notify other cache instances by copying data or by invalidation.
Wayne
Hello Everyone,
Just to provide an update to earlier comments:
Ehcache has merged with Terracotta as of August 2009.
Ehcache can now be distributed into a coherent distributed cache using the Terracotta Server Array. So developers can have the simplicity, performance, and robust API of Ehcache, and the scalability, HA capabilities, and data coherence of Terracotta. Plus you get some really useful tools to help you see what's happening inside the cache, manage cache policies like eviction, TTL, etc., and generally tune the application much more easily.
If you use Ehcache today, distributing your cache with Terracotta is a very simple configuration change.
Check it out at http://www.terracotta.org/distributedcache or http://www.ehcache.org and send us feedback.
The appropriate updates to documentation for Ehcache and Terracotta are now underway.
Thanks,
Jeff
Terracotta Product Team
We were facing some scalability and performance problems, and the best approach we found was to use a distributed caching solution. So we decided to use NCache, which is a .NET distributed caching solution that also provides support for Java apps.
It seems that Ehcache out of the box can't be used as a distributed cache. You can only replicate caches via JMS, RMI, or JGroups.
To use it for cache distribution you also need the Terracotta Enterprise Suite, and it is a commercial solution. Isn't it?